Visual International Transliteration, Introduction

Copyright © 1996-1997 Vitaly Blokhin. This page is a part of Vital Network.
Revision: 19961121 (draft)

Visual International Transliteration
------------------------------------

This document contains VIT proposals for languages and other
character and sign systems.

Besides the existing natural languages, there are many other
character and sign systems to which the VIT principles may be
applied as well. The elements to be represented might be
letters, special characters, sounds, or just about
anything that can and has to be represented as a sequence of
elements.

Just a few examples:

IPA (International Phonetic Alphabet)
Constructed and artificial languages (like Esperanto)
Different sign systems
Archaic and "dead" languages (e.g. ancient Greek and Latin)

In this document, the term "language" is used but its meaning
is actually much wider than just "existing natural languages".


Basic universal transliteration ideas, requirements and principles to follow
----------------------------------------------------------------------------

1. It must use only ASCII characters, preferably letters (A..Z, a..z).
  Unfortunately, these days the only reliable way to transfer
  text information is to use 7-bit ASCII (or even just a basic subset
  of it), and this situation is not improving much. An example of such
  a set of characters (an ISO 646 subset, see TEI Guidelines for
  Electronic Text Encoding and Interchange, chapter 4.3), which will
  safely survive transmission over any link:
    a b c d e f g h i j k l m n o p q r s t u v w x y z
    A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
    0 1 2 3 4 5 6 7 8 9
    " % & ' ( ) * + , - . / : ; < = > ? _
  Eventually, this particular problem will be solved
  (which will not solve all the other internationalization problems).
  Until that happy day, we mere mortals still need to communicate,
  sometimes using different languages.

2. It must be unambiguous (100% reversible and convertible to
  and from any other existing unambiguous encoding), so that normal
  documents can be converted to transliterated form and back
  without garbling or loss of information.

3. It should be visual, compact, and as natural as possible, and
  it should reflect (be based on) the existing, widely used practice
  of transliteration (reading) and of entering text (writing, typing)
  from the de facto standard QWERTY keyboard for each language, as
  long as this doesn't contradict the other principles formulated in
  this document.
  If this is implemented well enough, it becomes possible to read
  and write virtually any language, and multilingual documents,
  even if the national fonts, keyboard drivers and other similar
  software for that language are unavailable to the reader/writer.

4. In addition to existing national standards for keyboard input,
  the characters may be typed the same way as they are
  transliterated, thus eliminating the need for keyboard drivers
  and for national characters printed on the QWERTY keyboard.
  Users who have such keyboards and national software on their
  computers can use the traditional methods, but when they switch to
  computers that don't have this nice stuff, transliteration
  becomes their only choice.
  Projects like Unicode are great for standardizing browsing, but they:
  a) require the presence of all the appropriate fonts on every machine
  b) don't specify any standard way to enter information
  c) use 2 to 4 bytes to encode each character, which is redundant
  d) don't provide good extensibility features
  e) are static, inflexible and almost impossible to modify
  f) are inconsistent, using different codes for the same characters
  g) still have a lot of problems that cannot be easily resolved
  h) cannot be used immediately; they will need a lot of time to be completed

5. A simple way of switching from one language to another should
  be provided, allowing for readable multilingual documents in
  plain ASCII (this is currently under construction) that are
  convertible to normal documents if the user has all the
  necessary fonts.

6. The same method may be applied to any character system, even if
  it doesn't correspond to any particular language (e.g. the
  International Phonetic Alphabet).

7. The suggested methods and principles are aimed first of all at
  computer-related information and documents (e.g. Internet, e-mail),
  but they may also be very useful in non-computer life, especially
  if their implementations get standardized.
  Examples:
  a) "Snail mail" addresses
  b) International Yellow and White pages
  etc.

8. Multilingual search, sorting, databases, references etc.
  In this case, the fact that a transliteration uses only
  letters may be really helpful because lexical elements
  (i.e. words) are recognizable in a language-independent fashion.

9. Supporting software, both for different operating systems and
  Web-based, should be implemented.
  The software should be open (all the tables may be modified
  externally by the user community), so that all the competing
  transliteration and encoding variants would be reduced
  to the best ones, and the rest would disappear.

10. There still are many open issues and unanswered questions.
  Just a few examples:

  a) When the order of characters is not left-to-right (Hebrew, Arabic,
  Chinese), how should mixed-language documents be written?
  b) How should one handle situations (which occur in some languages,
  like Arabic) in which different variants of a character are
  used depending on its position in a word (beginning, middle, end) or
  on some other criterion: should such characters transliterate into
  the same string or into different strings?
  c) ...
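
Principles 1 and 2 above can be checked mechanically for any proposed
transliteration table. The following is a minimal sketch in Python, not
a part of VIT itself. The function names and both three-letter Cyrillic
tables are hypothetical and serve only as an illustration; the safe set
is the one listed under principle 1.

```python
from itertools import product

# Safe subset listed under principle 1 (whitespace is not in that list).
SAFE_CHARS = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    "\"%&'()*+,-./:;<=>?_"
)


def uses_only_safe_chars(table):
    """Principle 1: every transliterated string stays inside the safe subset."""
    return all(ch in SAFE_CHARS for code in table.values() for ch in code)


def find_collision(table, max_len=3):
    """Principle 2: look for two different source strings that
    transliterate to the same ASCII string.

    Returns a pair of colliding source strings, or None. Only source
    strings of up to max_len letters are tried, so None means "no
    collision found", not a proof of reversibility.
    """
    seen = {}
    letters = sorted(table)
    for n in range(1, max_len + 1):
        for combo in product(letters, repeat=n):
            source = "".join(combo)
            encoded = "".join(table[ch] for ch in combo)
            if encoded in seen:
                return seen[encoded], source
            seen[encoded] = source
    return None


# Hypothetical tables (illustration only, not actual VIT tables).
# \u0441 = Cyrillic es, \u0445 = Cyrillic ha, \u0448 = Cyrillic sha.
AMBIGUOUS = {"\u0441": "s", "\u0445": "h", "\u0448": "sh"}
ONE_TO_ONE = {"\u0441": "s", "\u0445": "h", "\u0448": "w"}

print(uses_only_safe_chars(AMBIGUOUS))  # both tables use letters only
print(find_collision(AMBIGUOUS))        # sha collides with es + ha
print(find_collision(ONE_TO_ONE))       # no collision up to 3 letters
```

The first table shows the typical reversibility problem of common
practice: "sh" may stand for one letter or for two. The second table
avoids it at the price of a less familiar spelling, which is the kind of
trade-off principle 3 asks to keep as small as possible.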


Notes about possible implementations of VIT
-------------------------------------------

From the standpoint of transliteration, the character
systems used to represent languages may be classified
into the following categories (groups):

1. The character set needed contains only characters
that belong to the safe subset of ASCII (see above),
no additional characters are needed.

2. The character set needed contains only characters
that belong to the printable subset of ASCII
(>= 32, <= 126). In some rare situations, a few characters
would need to be transliterated using the safe subset
of ASCII.
(Example: English)

3. The character set needed contains mostly characters
that belong to the printable subset of ASCII
(>= 32, <= 126). A few modifiers may apply to some
standard Latin letters. For systems of this kind,
two variants are typically in common use:
 a) The standard ASCII part of the character table
 remains unchanged; the Extended ASCII set (>= 128, <= 255)
 is used to represent the missing characters, so
 the ASCII part and the specific part may coexist
 within one font. In most situations, the
 standard Extended ASCII (with international characters)
 is sufficient to represent all such characters.
 b) The missing characters are ASCII-transliterated, which
 eliminates the need for characters >= 128. Usually,
 only one transliteration method is in common use, but
 there are some reversibility problems.
(Examples: French, German, Italian, Spanish, Esperanto)

4. This group is similar to 3. The difference is that
many Latin/English letters cannot keep their original
meaning, so different transliteration ideas have to be
used. Typical approaches:
 a) Similar to 3.a, except that several different
 encoding standards are in common use.
 b) A variety of transliteration methods are currently in use.
 Unfortunately, most of them tend to be ambiguous and
 non-reversible.

(Examples: Russian, Greek)

5. If there are so many additional characters that they
cannot fit the range 128..255, they cannot coexist with
the standard ASCII part within one font (32..255), and more
than one byte would be required to represent the characters.
A separate font is typically used to represent all the
characters. In order to use ASCII, the user has to switch
back and forth between the ASCII and national fonts.
Some ASCII transliteration schemes are currently in use.
(Example: Indian languages)

6. The character set requires at least two bytes.
Standard one-byte fonts (at most 256 characters) are not
applicable to systems of this kind, which require special
national versions of operating systems and fonts.
The existence of any standards for ASCII transliteration
is questionable. Transliteration may be based on different
principles. One of them may use the existing ways of entering
characters from a standard QWERTY keyboard.

(Examples: Chinese, Japanese, Korean)
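
The size-related part of this classification can be approximated in
code. The sketch below is only a rough illustration: the function name,
the labels and the choice of three one-byte encodings (Latin-1, KOI8-R,
ISO 8859-7) are assumptions made here, and the test cannot tell group 3
from group 4, or group 5 from group 6, because those differ in how the
letters are used rather than in how many there are.

```python
# Safe subset of ASCII listed under principle 1.
SAFE_CHARS = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    "\"%&'()*+,-./:;<=>?_"
)

# Assumption: a small sample of one-byte encodings known to Python.
ONE_BYTE_CODECS = ("latin_1", "koi8_r", "iso8859_7")


def classify(text):
    """Return a rough label for the smallest character range that
    holds every non-whitespace character of text."""
    chars = [ch for ch in text if not ch.isspace()]
    if all(ch in SAFE_CHARS for ch in chars):
        return "safe ASCII subset"
    if all(32 <= ord(ch) <= 126 for ch in chars):
        return "printable ASCII"
    for codec in ONE_BYTE_CODECS:
        try:
            "".join(chars).encode(codec)
        except UnicodeEncodeError:
            continue
        return "one-byte encoding: " + codec
    return "more than one byte"


print(classify("Hello, world"))                          # group 1
print(classify("user@example.com"))                      # group 2
print(classify("Gr\u00fc\u00dfe"))                       # German, group 3
print(classify("\u041f\u0440\u0438\u0432\u0435\u0442"))  # Russian, group 4
print(classify("\u4e2d\u6587"))                          # Chinese, group 6
```

The non-ASCII samples are written as escapes so that the listing itself
stays within 7-bit ASCII.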


Examples
--------

Here are a few examples of real-life situations
in which VIT, if implemented, might be extremely helpful:

1. Imagine a Chinese guy who,
being stuck in a Brazilian airport, tries to get
an e-mail from (or send one to) his friend in China,
of course in Chinese, using public Internet access.
He would need a Chinese version of Windows, fonts
and keyboard drivers installed on that computer!

2. Another example: someone is searching for the address of
her friend who lives in Israel. White pages
are available, but all the names and addresses
are in Hebrew, which the searcher doesn't know, and
of course she doesn't have any support for Hebrew on
her computer. If only those great white pages were not
only in Hebrew but also in a form that is usable by
people who are not familiar with that language ...

