Tools for the Internationalization of C Applications
This paper relates to applications written in C and running under UNIX. "Internationalization" is a method of giving applications written in English the capability to work in other languages (German, Spanish, Arabic, Greek, etc.). To achieve this objective, a number of problems must be solved. The most important are:
- An extended character set that allows information in the selected national languages to be stored and displayed.
- Keyboard mappings that follow the national language conventions.
- A full set of C function libraries that accept 8-bit codes (256 characters), including national sort functions.
- If the application is written with the curses package, curses must also support 8-bit codes.
In addition, a whole battery of utilities is needed to complete the process; for example, the user needs a text editor such as ed or vi that can accept an extended character set.
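The 8-bit requirement also affects ordinary C code. The sketch below shows two habits of 8-bit-clean code: reading bytes through unsigned char, and converting to unsigned char before calling the <ctype.h> functions. It assumes ISO 8859-1 (Latin-1), where 'é' is the byte 0xE9, purely for illustration; the paper does not name the code set LANGBOX uses, and the function names count_8bit and upcase are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* Count the bytes of s outside 7-bit ASCII (values 128..255).
 * Plain char may be signed, so a byte such as 0xE9 ('é' in Latin-1)
 * would compare as negative; reading via unsigned char avoids that. */
size_t count_8bit(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t n = 0;
    for (; *p != '\0'; p++)
        if (*p > 127)
            n++;
    return n;
}

/* Upper-case s in place. The <ctype.h> functions accept only EOF or
 * values representable as unsigned char; passing a negative char is
 * undefined behaviour, so each byte is converted first. */
void upcase(char *s)
{
    for (; *s != '\0'; s++)
        *s = (char)toupper((unsigned char)*s);
}
```

With a signed char, code that tests `c > 127` or calls `toupper(c)` directly silently mishandles exactly the national characters the application is supposed to support.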
The LANGBOX product family supports the following languages:
- SEMITIC LANGUAGES: Arabic, Farsi, Urdu, Pashto, and Swahili.
- WEST EUROPEAN: French, German, Italian, Spanish.
- SCANDINAVIAN LANGUAGES: Swedish, Finnish, Norwegian.
- EAST EUROPEAN: Russian, Bulgarian, Byelorussian, Croatian, Czech, Estonian, Hungarian, Latvian, Lithuanian, Polish, and Romanian.
- GREEK and TURKISH.
- OTHERS (please inquire).
The problems listed above have been solved, and an integrated international work environment has thus been created. In addition, these products contain supplementary tools that reduce the internationalization process to a purely manual translation effort.
Internationalization process
To simplify the case study, a general case has been chosen: a simple C program, written in English, with messages embedded in the source code through printf and puts. (The "curses" function library could have been used as well.)
The LANGBOX products come with a set of tools that read the source code, extract the character strings and put them in a special file. Another utility prints the message file as numbered lines.
A translator (not a programmer!) then creates, for example, the French version of these English messages. The new French text is entered into a file.
The rest of the effort is handled automatically by the system. When the program runs, it uses the French messages if the variable LANG is set to French, and the English messages if LANG is set to English.
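The selection itself amounts to a small decision at start-up. The following is a minimal sketch under stated assumptions: the value "French" follows this paper's wording, the "fr" prefix test additionally accepts POSIX-style names such as fr_FR, and the two-letter result matches the file suffixes used in this paper (mypg.en, mypg.fr). The function msg_lang is hypothetical and is not part of LANGBOX.

```c
#include <string.h>

/* Hypothetical: map a LANG value to a two-letter message-file suffix.
 * "French" follows this paper's wording; a leading "fr" also accepts
 * POSIX-style names such as fr_FR. Anything else, including an unset
 * LANG (NULL), selects the original English messages. */
const char *msg_lang(const char *lang)
{
    if (lang != NULL &&
        (strcmp(lang, "French") == 0 || strncmp(lang, "fr", 2) == 0))
        return "fr";
    return "en";
}
```

A caller would typically write `msg_lang(getenv("LANG"))` (with <stdlib.h>), so that the same binary switches language purely through the environment.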
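Schematically, the process is summarized as follows:
- mypg.c -> xtract -> xmypg.c (program without messages) and mypg.en (English messages)
- mypg.en -> xdisp -> mypgmsg.en (printed messages)
- mypgmsg.en -> manual translation, entered with xvi -> mypgmsg.fr
- mypgmsg.fr -> xind -> mypg.fr (French messages in accessible form)
- xmypg.c -> cc with -lmsgE -lcE -> xmypg, which displays English or French messages according to LANG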
The example is the following C source program, mypg.c, which will be run in French under the West European Languages product:
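```c
#include <stdio.h>
#include <stdlib.h>   /* exit() */
#include <string.h>   /* strcmp() */

int main(void)
{
    int n; char y[5] = "";
    printf("This program converts decimal numbers to hexadecimal\n\n");
    while (1) {
        printf("\nEnter decimal number: ");
        if (scanf("%d", &n) != 1)   /* stop on non-numeric input or EOF */
            exit(1);
        printf("\nNumber entered is <%d> decimal and <%x> hexa", n, (unsigned)n);
        printf("\nDo you want to continue? ");
        scanf("%4s", y);            /* y holds at most 4 characters + '\0' */
        if (strcmp(y, "yes")) {     /* any answer other than "yes" quits */
            printf("\n exiting ..\n");
            exit(0);
        }
    }
}
```
mypg.c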
Step 1 - Extracting the Messages
To extract the messages, the following command is used:
xtract -f printf -l en mypg xmypg.c mypg.c
This creates a file mypg.en containing the messages, and a file xmypg.c containing the program mypg.c without the messages. The option "-f printf" selects only the messages passed to the printf function. If this option is not used, all messages are extracted.
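As a rough illustration of what such an extraction involves (xtract's actual method is not described in this paper), the hypothetical function extract_literal below finds a call to a named function on one source line and copies out its first string-literal argument, mimicking the effect of -f printf. It deliberately ignores comments, calls split across lines, and names that merely end in the target (sprintf would match printf); a real tool would tokenize the source properly.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: find the first call to `func` on one line of C
 * source and copy its first string-literal argument, quotes and escape
 * sequences included, into out. Returns 1 on success, 0 if there is no
 * such call or literal, or if out is too small. */
int extract_literal(const char *line, const char *func, char *out, size_t outsz)
{
    const char *p = strstr(line, func);
    if (p == NULL)
        return 0;
    p += strlen(func);
    while (*p == ' ' || *p == '\t')
        p++;
    if (*p != '(')
        return 0;
    p++;
    while (*p == ' ' || *p == '\t')
        p++;
    if (*p != '"')
        return 0;                     /* first argument is not a literal */

    const char *start = p++;          /* remember the opening quote */
    while (*p != '\0' && *p != '"') {
        if (*p == '\\' && p[1] != '\0')
            p++;                      /* skip the escaped character, e.g. \" */
        p++;
    }
    if (*p != '"')
        return 0;                     /* unterminated literal */

    size_t len = (size_t)(p - start) + 1;   /* include the closing quote */
    if (len + 1 > outsz)
        return 0;
    memcpy(out, start, len);
    out[len] = '\0';
    return 1;
}
```

Applied to the line `printf("\nEnter decimal number: ");`, it yields the literal exactly as written in the source, escape sequences included, which is also how the messages appear in the printed message file.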
The new xmypg.c code is as follows:
```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

extern unsigned char *intl_m_msg(), *intl_f_msg();

int main(void)
{
    int n; char y[5] = "";
    printf(intl_m_msg("", "mypg", 1));
    while (1) {
        printf(intl_m_msg("", "mypg", 2));
        if (scanf("%d", &n) != 1)
            exit(1);
        printf(intl_m_msg("", "mypg", 3), n, (unsigned)n);
        printf(intl_m_msg("", "mypg", 4));
        scanf("%4s", y);
        if (strcmp(y, intl_m_msg("", "mypg", 6))) {   /* message 6 is "yes" */
            printf(intl_m_msg("", "mypg", 5));
            exit(0);
        }
    }
}
```
xmypg.c
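xmypg.c relies on LANGBOX's message library, whose internals are not described here. Judging only from the generated calls, intl_m_msg appears to take a message-file name ("mypg") and a message number; the meaning of the empty first argument is not explained. To exercise the program's logic without that library, one could substitute a stand-in such as the hypothetical demo_msg below, which keeps the mypg messages in memory and selects them by a language code. Its signature is an assumption, not the LANGBOX API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a message lookup: message k (1-based, as in
 * xmypg.c) is taken from the table for the requested language, falling
 * back to English. A real library would read compiled message files. */
static const char *const mypg_en[] = {
    "This program converts decimal numbers to hexadecimal\n\n",
    "\nEnter decimal number: ",
    "\nNumber entered is <%d> decimal and <%x> hexa",
    "\nDo you want to continue? ",
    "\n exiting ..\n",
    "yes",
};

static const char *const mypg_fr[] = {   /* same order and length as mypg_en */
    "Ce programme convertit les nombres décimaux en hexadécimal\n\n",
    "\nEntrer le nombre décimal:",
    "\nLe nombre entré est <%d> décimal et <%x> hexadécimal",
    "\nVoulez vous continuer?",
    "\nSortie ..\n",
    "oui",
};

#define N_MSGS (sizeof mypg_en / sizeof mypg_en[0])

const char *demo_msg(const char *lang, int k)
{
    const char *const *table =
        (lang != NULL && strcmp(lang, "fr") == 0) ? mypg_fr : mypg_en;
    if (k < 1 || (size_t)k > N_MSGS)
        return "";                      /* unknown message number */
    return table[k - 1];
}
```

Because the answer compared by strcmp is itself message 6, a French user continues by typing "oui" rather than "yes"; the comparison logic in the program does not change.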
Step 2 - Printing the Messages
The following command prints the extracted messages, which are now in mypg.en:
xdisp mypg.en > mypgmsg.en
The result, in mypgmsg.en, is as follows:
```
"This program converts decimal numbers to hexadecimal\n\n"
"\nEnter decimal number:"
"\nNumber entered is <%d> decimal and <%x> hexa"
"\nDo you want to continue?"
"\nexiting ..\n"
"yes"
```
mypgmsg.en
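The paper describes xdisp's output only as "numbered lines". A minimal sketch of such a listing is the hypothetical number_lines below, which assumes one message per line in the input file (the layout of mypg.en is not documented). The number printed before each message appears to correspond to the message number that xmypg.c passes to intl_m_msg, which is what lets a translator work from the listing alone.

```c
#include <stdio.h>

/* Hypothetical xdisp-style listing: copy a message file from in to out,
 * prefixing each line with its 1-based number and a tab. Works character
 * by character, so there is no line-length limit. Returns the number of
 * lines written. */
int number_lines(FILE *in, FILE *out)
{
    int c, n = 0, at_start = 1;
    while ((c = getc(in)) != EOF) {
        if (at_start) {
            fprintf(out, "%d\t", ++n);
            at_start = 0;
        }
        putc(c, out);
        if (c == '\n')
            at_start = 1;
    }
    return n;
}
```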
Step 3 - The Manual Translation
The messages in mypg.en are translated into French by hand. The translated version is in mypgmsg.fr:
```
"Ce programme convertit les nombres décimaux en hexadécimal\n\n"
"\nEntrer le nombre décimal:"
"\nLe nombre entré est <%d> décimal et <%x> hexadécimal"
"\nVoulez vous continuer?"
"\nSortie ..\n"
"oui"
```
mypgmsg.fr
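One thing the translator must preserve is the printf conversions: message 3 is used as a format string followed by two arguments, so its French version must still contain %d followed by %x. Such a check is not among the LANGBOX tools described here; the sketch below is a hypothetical helper, same_conversions, that compares the sequence of conversion characters in an original message and its translation. For brevity it skips flags, width and precision but does not handle length modifiers such as l or h.

```c
#include <string.h>

/* Copy the printf conversion characters of fmt (e.g. "dx" for
 * "<%d> ... <%x>") into out, ignoring "%%" (a literal percent sign)
 * and skipping flags, width and precision. */
static void conversions(const char *fmt, char *out, size_t outsz)
{
    size_t n = 0;
    for (const char *p = fmt; *p != '\0'; p++) {
        if (*p != '%')
            continue;
        p++;
        if (*p == '%')
            continue;                       /* "%%" prints a literal '%' */
        while (*p != '\0' && strchr("-+ #0123456789.*", *p) != NULL)
            p++;                            /* flags, width, precision */
        if (*p == '\0')
            break;                          /* stray '%' at end of string */
        if (n + 1 < outsz)
            out[n++] = *p;
    }
    out[n] = '\0';
}

/* Return 1 if the translation uses the same conversions, in the same
 * order, as the original message; 0 otherwise. */
int same_conversions(const char *orig, const char *trans)
{
    char a[32], b[32];
    conversions(orig, a, sizeof a);
    conversions(trans, b, sizeof b);
    return strcmp(a, b) == 0;
}
```

Order matters because the program always passes the arguments in the same order; a translation that swapped %d and %x would print the values under the wrong labels.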
Step 4 - Creation of the mypgmsg.fr File
All LANGBOX products supply an international version of ed and vi. In the West European Languages product, these editors are called xed and xvi respectively. xvi was used to enter the text of mypgmsg.fr, with the LANG variable set to French and running under the French shell, frsh, which also comes with LANGBOX.
Step 5 - Compilation of the mypgmsg.fr File
Once the file mypgmsg.fr has been created, it must be transformed into a format that makes these messages accessible to the program xmypg.c in a multilingual environment. This is done with the command:
xind -l fr mypg mypgmsg.fr
xdisp can then be used to reprint mypg.fr and verify its contents.
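The paper does not describe the format that xind produces. One plausible way to make messages quickly accessible by number, shown here purely as a hypothetical sketch, is an offset index: record where each message begins, then seek straight to message k instead of scanning the file. The functions build_index and read_msg and the one-message-per-line layout are assumptions.

```c
#include <stdio.h>
#include <string.h>

#define MAX_MSGS 256   /* arbitrary limit for this sketch */

/* Record in offs[k-1] the byte offset at which message k begins,
 * assuming one message per line. Returns the number of messages
 * indexed (at most MAX_MSGS). */
int build_index(FILE *f, long offs[MAX_MSGS])
{
    int c, n = 0, at_start = 1;
    long pos = 0;
    rewind(f);
    while ((c = getc(f)) != EOF) {
        if (at_start) {
            if (n == MAX_MSGS)
                break;
            offs[n++] = pos;
            at_start = 0;
        }
        if (c == '\n')
            at_start = 1;
        pos++;
    }
    return n;
}

/* Read message k (1-based) into buf by seeking to its recorded offset.
 * Returns buf without the trailing newline, or NULL if k is out of
 * range or the read fails. */
char *read_msg(FILE *f, const long offs[], int n, int k, char *buf, int size)
{
    if (k < 1 || k > n || fseek(f, offs[k - 1], SEEK_SET) != 0)
        return NULL;
    if (fgets(buf, size, f) == NULL)
        return NULL;
    buf[strcspn(buf, "\n")] = '\0';
    return buf;
}
```

The index costs one pass over the file at build time; afterwards each lookup is a single seek and read, regardless of how many messages precede the one requested.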
Step 6 - Compilation of the C Program
The program xmypg.c can now be compiled; it needs access to the message library and the Extended C library, as follows:
cc xmypg.c -o xmypg -lmsgE -lcE
xmypg then runs with the French or the English messages, depending on whether the LANG variable is set to French or English.
Advantages of Internationalization under the LANGBOX products
The simplicity of this approach to internationalization results in a number of strategic advantages. Here are some highlights:
- Programming is totally separated from translation. National teams in the target markets translate the application messages. The programmer no longer needs to be involved in this effort or to know the national languages.
- Application maintenance is centralized. Only the message files are updated or translated, so there is no need to coordinate code updates with national teams. The result is a more coherent product across linguistic frontiers.
- Because of the particular features of the LANGBOX products, programmers no longer need to worry about national characters or contextual shaping rules. These problems are handled entirely by the system.
- End users who wish to use the application in two or more languages can do so with the same set of binaries. There is no need to buy several versions, one for each language.
The LANGBOX product family solves these problems of internationalization. Its utilities allow the production of multilingual applications for all the languages of Western Europe, Greece, Eastern Europe and the Arabic world.