F.A.Q.
(Frequently Asked Questions)
This FAQ is based on the requests and questions we have
received from customers, prospects and end users. Please
look through it if you have a question. If you cannot find
the answer you need, feel free to contact us at:
contact@langbox.com
Content:
General FAQ
Arabic FAQ
Web FAQ
General FAQ
1- What is the difference between Internationalization,
Localization and Customization?
Internationalization
(I18n)
Note:
"Internationalization" contains 20 letters: an "i" followed
by 18 letters followed by an "n". It is commonly abbreviated
"i18n". "I18n" is pronounced "internationalization",
not "eye-eighteen-enn".
A program written for a specific locale may be difficult
to run in a different environment. Porting such a
program to operate in each desired environment would
be tedious and costly.
The goal of the developer, then, should be to write programs
that make no assumptions about language, locale customs
or coded character set. Such programs are said to
be "internationalized". Internationalized applications
can run in a user's native environment, following native
conventions with native messages, without recompiling
or relinking. Internationalization thus requires only
a single copy of the software for a world of different
users: it results in software that is "locale-independent",
that is, software that contains no code dependent on the
user's language, the characters needed to represent that
language, or any formats (such as date and currency)
that the user expects to see and interact with.
Localization
(L10N)
Note:
Localization is often abbreviated "l10n", using the same
convention as "i18n" above: an "l", 10 letters, then an "n".
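The abbreviation rule described in the two notes above can be
written down in a few lines. This is only a minimal Python
sketch; the function name numeronym is ours and purely
illustrative.

```python
def numeronym(word):
    """Abbreviate a word as first letter + number of inner letters + last letter."""
    if len(word) < 3:
        # Nothing to count between the first and the last letter.
        return word.lower()
    return f"{word[0]}{len(word) - 2}{word[-1]}".lower()


print(numeronym("internationalization"))  # i18n
print(numeronym("localization"))          # l10n
```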
Localization is the act of providing a locale-independent
application with the environment and data it needs in order
to operate in a particular locale. For example, adding
German system messages to Sun Solaris is part
of localizing Solaris for the German locale.
As this example shows, a large part of localization consists
of translating messages and adapting conventions (collation,
date, time, numbers and money...). Thus, localization
can be seen as a subset of internationalization.
Customization
Customization is the act of adapting software to a specific
customer's use. It may involve localization (if the software
already supports the language charset), internationalization,
or both.
2- Why might I need the LangBox product?
Most Unix systems support Latin-1 languages such as
French, German, Spanish, Italian, Dutch, ...
This feature is part of the XPG3 or MNLS (or NLS) subsystems.
However, this system extension is limited to languages
covered by the ISO 8859-1 codeset.
Under X Window, several Unix systems offer the X LOCALE
configuration, which adds support for non-Latin languages
such as Chinese, Japanese or East European languages.
However, most UNIX operating systems still lack proper
support for more complex languages: Arabic, Hebrew and
Farsi (which need bi-directional support), Thai (which
needs contextual glyph shaping), and Greek or Cyrillic
(which need a dual keyboard). These languages cannot be
handled properly by the operating system alone.
3- How does the LangBox product handle language support?
The LangBox family of language supports aims to be as
'transparent' as possible. This is done by extending
either the UNIX kernel or some system libraries of the
target operating system, so that the user application
talks to the I/O devices (screen, keyboard, mouse,
printer...) through a language layer that performs
exception processing.
4- What is 'Transparency'?
Transparency for applications means that all the
language-specific processing is controlled by the LangBox
system extension (either in the Unix kernel or in some
system libraries), and that the user application can use it
without any modification or recompilation of its source
code. The extension is enabled at run time, either by
loading the language-specific pseudo-device driver onto
the current TTY line, or by setting the LD_LIBRARY_PATH
shell variable, which tells an application where to load
its shared dynamic system libraries from.
However, it is recommended to write internationalized
applications according to the XPG3 specifications if your
operating system supports them. These specifications
require you to modify your source code so that the
application uses the user's locale: at a minimum, it must
call the setlocale() C function, which initializes the
C library's internal functions for sorting, character
classification (ctype), date/time formats...
If used, XPG3 compliance can coexist with the LangBox
product family, but in its absence the LangBox products
can handle the application's language support anyway.
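As an illustration of the setlocale() step, here is a minimal
sketch in Python, whose locale module wraps the same C library
call. The function name adopt_user_locale and the fallback to
the "C" locale are our own choices, not part of XPG3 or of any
LangBox product.

```python
import locale


def adopt_user_locale():
    """Initialise the C library locale from LANG / LC_* and return its name."""
    try:
        # The empty string asks for the locale described by the environment,
        # like setlocale(LC_ALL, "") in a C program.
        return locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        # The requested locale is not installed: fall back to the portable one.
        return locale.setlocale(locale.LC_ALL, "C")


if __name__ == "__main__":
    print("active locale:", adopt_user_locale())
    # localeconv() exposes the numeric conventions of the active locale.
    print("decimal point:", repr(locale.localeconv()["decimal_point"]))
```

Once this call has been made, the locale-dependent library
functions (collation, character classes, number and date
formats) follow the user's environment instead of the
compiled-in default.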
5- Why is there a TTY support and a GUI support?
Historically, the only UNIX user interfaces were
alphanumeric dumb terminals. These terminals are connected
through a serial RS232 line and handle the flow of
characters sent by the keyboard and received by the screen.
Each terminal includes a set of commands, defined as
escape sequences, for screen actions and for sending
special keys. These terminal-specific escape sequences
are described in the UNIX Terminfo database.
The way to support language-specific I/O on this interface
is to sit in the middle of this RS232 line, take control
of the whole input and output character flow, and perform
exception processing.
This first product family has been named "LANGBOX-XXX",
where XXX stands for the language supported.
With the development of graphical interfaces (like
X Window and OSF Motif), new kinds of applications have
been designed. These applications are clients that
communicate with a graphic server through network
connection facilities. Here, the main application routines
work directly with bitmaps, and the concept of a character
flow has disappeared. Transparent language processing is
more difficult to implement. We need at least to re-link
the application with a language processing library,
or to use a new dynamically linked library at run time
(if the operating system allows it).
This X Window oriented product family has been named
"XLANGBOX-XXX", where XXX stands for the language
supported.
6- Why don't you have support for my OS?
All currently available platforms are the result of
market demand. If your OS is not listed, it simply means
that nobody has requested it before.
We have no exclusivity agreement with any manufacturer.
If you need one of our products on any platform, just
contact us.
7- Why don't you have support for my language?
As for the platforms, all currently available languages
are the result of market demand. If your language is not
listed, it simply means that nobody has requested it
before. We have done a lot of research and preliminary
development for some other languages. If you need one
of our products for a specific language, just contact us.
8- How do I locate the language keys on the dual keyboard?
By default, our package comes with a set of keyboard
stickers. They are designed so that the original keytop
engraving remains visible, with the new language character
printed in red at the bottom-right of the keytop. This
is quick to set up, and we have been using it in house
for more than 6 years on some of our workstation keyboards.
For some customers, we had to use off-the-shelf engraved
keyboards (PC keyboards). Our product simply uses a
mapping table, which makes it possible to use a different
layout.
For an SGI customer, we also had to engrave the SGI keyboard
keytops. This is more delicate and more expensive, since
it is always done for a small number of keyboards. In this
case, only the keytops had to be shipped (there and back).
9- Is the LangBox collation and time/date format support
implemented on the POSIX locale model?
All LangBox products supply locale (NLS) files for their
host operating system. The format of these files depends
on the OS implementation, but it generally conforms to
POSIX.
Arabic FAQ
1- Why is Arabic more complex?
Arabic, as a calligraphic language, presents major
processing problems. An Arabic character may take one,
two, or sometimes four different shapes, yet it is
represented by a single code. The shape of the character
is determined by its position in the word.
- Each character can be displayed in up to four different
  shapes: Isolated, Initial, Middle and Final.
  Here is a sample for the 'Bah' Arabic letter:
- Multiple characters can be combined into a single
  ligature glyph. Example: "Lam-Alef":
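The "one code, several shapes" relationship can be observed with
the Unicode character database. The sketch below is only an
illustration: it relies on the Unicode Arabic presentation
forms, which give each contextual shape its own code point, and
says nothing about the codes or fonts actually used by the
LangBox products. The helper name stored_form is ours.

```python
import unicodedata

BEH = "\u0628"  # ARABIC LETTER BEH: the single code kept in storage

# Unicode presentation forms: one code point per contextual shape.
BEH_SHAPES = {
    "isolated": "\uFE8F",
    "final": "\uFE90",
    "initial": "\uFE91",
    "middle": "\uFE92",
}
LAM_ALEF_LIGATURE = "\uFEFB"  # one glyph standing for LAM followed by ALEF


def stored_form(glyph):
    """Map a display glyph back to the character code(s) kept in storage."""
    return unicodedata.normalize("NFKC", glyph)


for position, glyph in BEH_SHAPES.items():
    # Every shape folds back to the same stored letter.
    print(position, unicodedata.name(glyph), stored_form(glyph) == BEH)

# The ligature folds back to the two stored letters LAM + ALEF.
print(stored_form(LAM_ALEF_LIGATURE) == "\u0644\u0627")
```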
This is but one problem.
Another is the direction of writing. Arabic text is written
from right to left. This conflicts with English, which
is written in the opposite direction. When mixing
languages in a text, characters are appended in one
language and pushed in the other.
Some users speak only Arabic. They will not accept a cursor
positioned at the leftmost position of the screen. They
want an option that lets them start at the rightmost
position of the line; in brief, a mirror image of the
screen. The implication is that, in this mode, English
characters are pushed from right to left.
Yet one more complication: vocalization. The vocalization
marks (diacritics), like their counterparts in English,
the vowels, are a linguistic necessity, yet in Arabic
they appear above or below their respective consonants.
Diacritics should be rendered as non-spacing marks. Due
to font limitations, some implementations only support
spacing diacritics.
In fact, the Arabic language includes two text rendering
difficulties:
- Contextual shape determination (as in Thai or Hangul,
  for example)
- Bi-directionality (right-to-left and left-to-right,
  as in Hebrew)
These linguistic complications, and more, make Arabic
a difficult language to handle.
2- What is the LangBox approach for TTY or console support?
The LANGBOX-ARA product includes an extension to the TTY
driver that performs the following:
- Dual keyboard logical management (mapping and switching)
- Virtual screen page, mapping between the real screen
  and the visual screen XY positions.
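To give an idea of what "mapping and switching" means for a dual
keyboard, here is a small hypothetical sketch. The three-entry
table, the choice of Ctrl-N as the switch key and the class name
DualKeyboard are all assumptions made for the example; they are
not taken from the LANGBOX-ARA driver.

```python
# Illustrative Latin-key to Arabic-character table (three entries only).
ARABIC_MAP = {"f": "\u0628", "h": "\u0627", "g": "\u0644"}
TOGGLE = "\x0e"  # hypothetical switch key (Ctrl-N), an assumption


class DualKeyboard:
    """Translate incoming key codes according to the active language."""

    def __init__(self):
        self.arabic = False  # start in Latin mode

    def feed(self, key):
        """Return the character delivered to the application ('' for the toggle)."""
        if key == TOGGLE:
            self.arabic = not self.arabic
            return ""
        if self.arabic:
            # Keys without an entry in the table pass through unchanged.
            return ARABIC_MAP.get(key, key)
        return key

    def translate(self, keys):
        return "".join(self.feed(k) for k in keys)
```

The application itself only ever sees the translated characters,
which is what allows it to remain unmodified.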
The kernel extension can be either a pseudo-device (on
SCO UNIX, IBM AIX, ...) or a STREAMS module (on Solaris,
IRIX, ...). This kernel extension is delivered with
some new Unix commands that activate or disable it,
configure it, print files..., as well as with a set of
fixed-width screen fonts and printer fonts.
3- What is the LangBox approach for X11/Motif support?
The XLANGBOX-ARA product includes an Arabic-specific
context library (which defines an Arabic language-specific
API), as well as a new set of X11 (libX11.so) and Motif
(libXm.so) libraries. The application uses these libraries
instead of the original system ones when the
LD_LIBRARY_PATH variable is set accordingly. These
libraries are installed together with a set of X11 fonts
(fixed width and proportional width) and PostScript
printing tools.
4- I have an X GUI application. Do I need to modify
my source code?
No, there is absolutely no need to rewrite the application
source code:
- If the interface uses Motif, only some X resource
  files have to be adapted, for example to select Arabic
  fonts. No modification to the source code is needed at
  all. All the Arabic I/O management is done in the Motif
  library, which is dynamically linked to the application
  at run time.
- If the interface is built directly on X11 with
  proprietary code, some small changes might be needed
  for the bi-directionality handling (not necessarily;
  it depends on the existing code itself). All the Arabic
  I/O management is done in the X11 dynamic library.
Our XLANGBOX-ARA product is designed for that. For
example, in the SGI environment, we localized all the
desktop tools (icons, file manager, mailtool,
mediatool, ...) in Arabic without any access to the source
code, using just the standard (English) installed binaries.
On Sun Solaris, most of our VAR customers were in the
same situation. They just re-link (dynamically)
their English binary with our X11 and Motif libraries,
and get their application working with Arabic data.
Some of them also want to go further and embed the Arabic
API in their source code to get an Arabic binary, but
this is their choice.
5- How do I handle Arabic messages in my application?
The application messages (labels, menus, static strings...)
may be translated, but this is not an obligation:
most users can work with an English application (English
menus) while managing Arabic data. A full translation
of all messages is a plus that users appreciate (and
sometimes request).
If the English messages are already stored in external
resource files, there is no major problem here, just
a translation task for translators who are specialists
of the domain.
If the English messages are hard-coded in the source
code, it is more complicated, but LangBox has several
Unix tools that can browse C source code, extract all
string messages into external files, and replace each
message in the source code with a call to a specific
external function that returns the string by its index.
6- How do I sort Arabic data in my application?
The Arabic codeset is managed transparently by the libc
routines and the NLS OS supplement. The locale-aware C
functions that handle the charset (such as strcoll, as
opposed to the byte-wise strcmp) use the locale selected
by the LANG, LC_COLLATE and LC_CTYPE environment variables.
When these variables are set to "ar" (for Arabic), the
libc routines dynamically load the Arabic ISO 8859-6
codeset definition (sorting order, for example) and
perform correct sorting. XLANGBOX-ARA of course provides
the ISO 8859-6 codeset table definition files
for the operating system NLS.
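Locale-driven sorting can be sketched as follows in Python,
whose locale.strxfrm wraps the C library collation transform.
The function name sort_for_locale is ours, and the locale name
to pass (for instance "ar") depends on what is installed on the
system; when the requested locale is missing, this sketch simply
keeps the current collation rules.

```python
import locale


def sort_for_locale(words, locale_name=""):
    """Sort words with the collation rules of the given (or environment) locale."""
    previous = locale.setlocale(locale.LC_COLLATE)  # query, so we can restore it
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        # Locale not installed on this system: keep the current collation.
        pass
    try:
        # strxfrm builds a key that compares according to LC_COLLATE.
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)
```

The application code contains no Arabic-specific logic: the
ordering comes entirely from the locale definition files.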
7- What are the different charsets in Arabic?
- ISO 8859-6 : De facto standard in Unix environments.
- ASMO 449+  : Extension to ISO 8859-6 (compatible).
               Encodes Hindi digits.
- ASMO 708   : Identical to ISO 8859-6.
- CP 1256    : Microsoft Windows Arabic default codeset.
- UNICODE    : 16-bit codeset coding all languages.
8- What are the differences between ISO 8859-6 and ASMO
449+?
The difference between ISO 8859-6 and ASMO 449+ lies in
the content of columns 10 and 11, where ASMO 449+
includes dedicated 8-bit code values for characters such
as digits, !, ", #, $, %, &, ', ), (, ......., >, =, <.
Code positions 0xdb-0xdf and 0xfb-0xfe are also
used for the Arabic characters [, \, ], ^, _ and }, |, {,
~.
Each of these new characters is an 8-bit version
of its 7-bit counterpart, located at the same position
in the 7-bit area.
XLANGBOX-ARA may use the ASMO 449+ codeset in "Wordproc"
mode and switch to pure ISO 8859-6 in "Dataproc" mode.
There is no conflict between the two codesets.
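The "same position in the 7-bit area" relationship amounts to
clearing the top bit of the code. The sketch below derives its
ranges only from the description above (columns 10 and 11, that
is 0xA0-0xBF, plus 0xDB-0xDF and 0xFB-0xFE); it is not an
official ASMO 449+ table, and the function name is ours.

```python
# Ranges taken from the description above; not an official ASMO 449+ table.
EIGHT_BIT_RANGES = [(0xA0, 0xBF), (0xDB, 0xDF), (0xFB, 0xFE)]


def seven_bit_counterpart(code):
    """Return the 7-bit character sitting at the same position, or None."""
    for low, high in EIGHT_BIT_RANGES:
        if low <= code <= high:
            # Same row, same relative column: subtract 0x80.
            return chr(code - 0x80)
    return None


print(seven_bit_counterpart(0xB5))  # 5
print(seven_bit_counterpart(0xDB))  # [
print(seven_bit_counterpart(0xC8))  # None (an ordinary Arabic letter position)
```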
9- What does ASMO stand for?
ASMO stands for Arab Standardization and Metrology
Organization. Each country or group of countries has its
own standardization organization: for France it is AFNOR,
for Europe it is ECMA, etc.
10- Which products support ASMO 449+?
ASMO 449+ (codeset or code page) is supported by products
such as Arabic MS-DOS and specific Arabic hardware devices
(terminals and printers) from vendors such as Alis,
Tandberg, Genicom, Sedco...
On these products, this codeset is never supported alone,
but always together with ISO and the other Arabic codesets
that can be found on the Arab market.
11- What are Neutral characters?
One class of characters (Latin) is always written
from left to right. Another class of characters (Arabic)
is always written from right to left. This property
is a sort of CTYPE definition attached to each character
of a codeset.
However, in order to display correctly the output screens
of some applications (designed to build their screens
from left to right using Latin characters), we need to
define a new class whose display direction follows the
global language direction (defined in your section 3,
b). When defined as a Neutral, a character is written
from left to right in Latin orientation and from right
to left in Arabic orientation.
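The three classes can be illustrated with the bidirectional
categories of the Unicode character database, used here as a
stand-in for the CTYPE-like classes of a LangBox codeset (an
assumption made for the example). Treating digits as
left-to-right is also our simplification; the function name
display_direction is hypothetical.

```python
import unicodedata


def display_direction(char, global_direction):
    """Return 'LTR' or 'RTL' for one character, given the global orientation."""
    bidi_class = unicodedata.bidirectional(char)
    if bidi_class in ("L", "EN", "AN"):
        # Latin letters, and (by our simplification) digits: always left to right.
        return "LTR"
    if bidi_class in ("R", "AL"):
        # Hebrew and Arabic letters: always right to left.
        return "RTL"
    # Everything else (spaces, punctuation...) is neutral and follows
    # the global language direction.
    return global_direction
```

With this rule, an exclamation mark is drawn left to right on a
Latin-oriented screen and right to left on an Arabic-oriented
one, while letters keep their own direction in both cases.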
12- What is the Arabic floating point symbol?
Normally, a number written in "Arabic" digits uses the
point (.) (or the comma in France) as its decimal
separator. A number written in "Hindi" digits uses a
reversed comma defined in the ISO codeset. For historical
reasons, however, some users want to use the Arabic "Ra"
letter, which looks like a comma. In fact, they used this
letter only because the reversed comma did not exist on
their keyboard. LangBox products support this feature,
which may also be defined in the NLS files.
13- What is the CTL?
CTL has been adopted by X/OPEN.
CTL stands for "Complex Text Layout". This layout service
handles languages that require an output method to
transform a text string from its storage format to its
display format before rendering. In general, two types of
ordering are associated with complex text language
processing:
- Physical order: the order in which the text is rendered
  on the screen.
- Logical order: the order in which the text is input by
  the users and processed by applications.
Complex text languages are characterized as bi-directional
and/or context sensitive. The text flow of bi-directional
scripts is from right to left horizontally, with numbers
going from left to right. These languages include Arabic,
Hebrew, Urdu, Farsi, and Yiddish. In context-sensitive
scripts, characters may be rendered in a different
shape depending on the combination of characters
within a word. These languages include Thai, Lao,
Vietnamese and Korean.
Arabic has both bi-directional and context-sensitive
characteristics and is considered a Complex Text Language.
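The difference between logical and physical order can be shown
with a deliberately naive sketch. It handles a single
right-to-left line whose only left-to-right content is runs of
ASCII digits; it performs no shaping and is far simpler than a
real layout service. The function name is ours.

```python
import re


def logical_to_visual_rtl(text):
    """Convert a logical-order RTL line to left-to-right physical order."""
    # Right-to-left text: the first character typed ends up rightmost,
    # so the physical (left-to-right) order is the reverse of the input.
    visual = text[::-1]
    # Numbers still read left to right: put each digit run back in order.
    return re.sub(r"[0-9]+", lambda m: m.group(0)[::-1], visual)


logical = "\u0643\u062A\u0627\u0628 123"   # four Arabic letters, a space, "123"
visual = logical_to_visual_rtl(logical)
print([f"U+{ord(c):04X}" for c in visual])
```

In the result, the digits come first (leftmost on the screen)
and still read "123", while the Arabic letters appear in
reverse storage order, so that the first letter typed is the
rightmost one.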
14- How can I configure my application under Arabic
Motif/X11?
Basically, the configuration should include the iso8859-6
font name in the /usr/lib/X11/el/app-default/Class_name file.
You can list all the fonts used by an application with
the command:
appres Class_name | grep -i fontlist
The best approach is to append this list to the
.../X11/ar/app-default/Class_name file, after replacing
each iso8859-1 font name with the name of an existing
Arabic iso8859-6 font.
When this file is installed, you can launch the application
with:
LANG=ar
LD_LIBRARY_PATH=/usr/lib/alm/lib
export LANG LD_LIBRARY_PATH
application_launch_script
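The same launch can be done from a wrapper program instead of
the shell. The following Python sketch uses the values shown
above (LANG=ar and /usr/lib/alm/lib); the function names are
ours, and unlike the shell lines it prepends the library
directory to any existing LD_LIBRARY_PATH rather than replacing
it, which is a design choice of this example.

```python
import os
import subprocess


def arabic_environment(base_env, lib_dir="/usr/lib/alm/lib", lang="ar"):
    """Return a copy of base_env prepared for the Arabic X11/Motif libraries."""
    env = dict(base_env)
    env["LANG"] = lang
    existing = env.get("LD_LIBRARY_PATH")
    # Put the Arabic libraries first so the dynamic loader finds them
    # before the original system libX11.so / libXm.so.
    env["LD_LIBRARY_PATH"] = f"{lib_dir}:{existing}" if existing else lib_dir
    return env


def launch(command):
    """Run command (a list of arguments) in the Arabic environment."""
    return subprocess.call(command, env=arabic_environment(os.environ))
```

A call such as launch(["application_launch_script"]) then starts
the unmodified application with the Arabic libraries and locale.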
Web FAQ
1- What is the problem when browsing an Arabic HTML
document?
In fact, there are several problems when browsing an Arabic
HTML document:
2- Why AraMosaic?
In June 1996, after receiving several complaints from our
customers about the lack of an Arabic Web browser, the
LangBox team decided to investigate Arabic Web support
on UNIX platforms. At that time, the only solution found
for UNIX was the PMosaic product with its trilingual
support (English/Persian/Arabic), which unfortunately
supported neither the ISO 8859-6 codeset nor a correct
cut/paste feature.
In order to contribute to Arabic standards support
on the Internet, LangBox decided to study the arabization
of NCSA Mosaic using the XLANGBOX-ARA development
package, and to offer the result of this work to the
UNIX Arabic user community. The experience of LangBox
International in the arabization of applications
and its knowledge of all the related issues resulted
in the delivery of version "1.0" of AraMosaic.
Six months later, a second update was released to fix
some bugs and to add support for the CP1256 and ISIRI
codesets.
This product is free to use and runs on a standard
(non-arabized) Unix system.
3- How do I convert an MS CP1256 HTML document to an
ISO 8859-6 HTML document?
Just load the CP1256 HTML document in AraMosaic, and save
it using the "Save As..." option of the File menu.
Several tools are also available on the Internet, for
example on the http://www.ayna.com site or on
http://leb.net...
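For readers who prefer to script the conversion, the byte-level
step can be sketched with the codecs shipped with Python. This
is not how AraMosaic does it; the function name is ours, and
characters that exist in CP1256 but not in ISO 8859-6 (Persian
letters, for instance) are replaced by "?" in this sketch.

```python
def cp1256_to_iso8859_6(data):
    """Re-encode CP1256 bytes as ISO 8859-6 bytes."""
    text = data.decode("cp1256")
    # Characters with no ISO 8859-6 code are replaced by "?".
    return text.encode("iso8859_6", errors="replace")


word = "\u0643\u062A\u0627\u0628"
converted = cp1256_to_iso8859_6(word.encode("cp1256"))
print(converted.decode("iso8859_6") == word)
```

For a complete HTML document, any charset declaration inside
the page would also have to be updated to match the new
encoding.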
4- Is there a benefit to running AraMosaic under ALM or
XLANGBOX-ARA?
Under the ALM or XLANGBOX-ARA environments, thanks to
the Motif Arabic support, AraMosaic additionally allows
text to be input in Arabic in:
- Find/Search menus
- Mail window
- HTML <input> and <Textarea> fields in forms.
5- How do I use Netscape under ALM or XLANGBOX-ARA?
Like other UNIX applications, Netscape can be used under
ALM or XLANGBOX-ARA, but any text selection operation
will give a garbled display, because the HTML section is
managed by the Netscape code itself.
In any case, the Arabic text can be read. The procedure
to follow is:
- For Encoding: User Defined
- Use the proportional font: Naskhi (lbi, iso8859-6)
- Use the fixed font: Arabic96 (lbi, iso8859-6)
6- How do I view an Arabic Web page using a regular Latin
browser and environment?
AraWebParse is a new free service from LangBox
International. It requires a browser with Dynamic Fonts
support (such as Netscape 4.03 or higher, or MS IE 4.0
or higher*). It allows you to:
- Display an Arabic Web page dynamically in your Latin-based
  browser, using PFR Dynamic fonts
- Convert Arabic Web pages to ISO 8859-6
This engine has a "recursive browsing" option that lets
you continue to surf transparently. This tool is
particularly useful when you are traveling, away from
your Arabic environment.