AraWebParse - Visual Rendering for Arabic Web pages
How to See (or Convert) Arabic Web Pages Using a Standard Latin Browser
AraWebParse is a free service from LangBox International. It requires either:
- A Web browser that supports Dynamic Fonts, such as Netscape 4.03 to 4.72 or MS IE 4.0 (Netscape 6 does not support PFR fonts, and IE 5.x already provides a plug-in for Arabic display)
- Adobe Acrobat 4.0, which can open Web pages.
It allows you to:
To illustrate this, check the following results:
1- Using a Web Browser: (image)
2- Using Adobe Acrobat: (image)
- Convert Arabic Web pages to ISO 8859-6:
  - Select the input codeset and force the output codeset to ISO 8859-6
  - Click the Convert button
  - Save the resulting page using your browser's File/Save As menu.
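For readers who want to see what such a conversion amounts to, here is a minimal Python sketch. It is not AraWebParse's own code: the function name `convert_codeset` and its parameters are made up for illustration, and it relies only on Python's built-in `cp1256` and `iso8859_6` codecs.

```python
def convert_codeset(data, codein="cp1256", codeout="iso8859_6"):
    """Re-encode Arabic text from one single-byte codeset to another.

    Hypothetical helper for illustration; not part of AraWebParse.
    """
    text = data.decode(codein)
    # Characters that have no slot in the target codeset (for example the
    # Persian letter PEH) become "?" instead of raising an error.
    return text.encode(codeout, errors="replace")


source = "سلام".encode("cp1256")       # bytes as saved under Arabic Windows
converted = convert_codeset(source)    # same text, now in ISO 8859-6
print(converted.decode("iso8859_6"))
```

A real HTML page would also need its declared charset updated to match the new codeset; the sketch only covers the byte conversion.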
Options for Parsing Arabic Web pages:
Please fill in the following form and click the OK button:
How to tune AraWebParse Options
Most AraWebParse options come from the AraMosaic product, and therefore from the XLANGBOX-ARA library available under Unix operating systems.
The options are described as follows:
Input Codeset
This is the codeset in which the Web document was saved. Four main Arabic codesets are found on the Web:
- ISO 8859-6: adopted by the ASMO organization; used in Unix and Mac environments
- Microsoft CP 1256: adopted by Microsoft, and the de facto standard for Arabic Web pages created under an Arabic Windows environment
- Persian ISIRI 3342: trilingual Latin/Arabic/Persian codeset, used by PMOSAIC
- UTF-8: full multi-byte Unicode encoding. This is the future unified codeset for Arabic, but it is not yet fully supported in all environments (and not yet by AraWebParse...)
It is very important to set the Input Codeset correctly when using AraWebParse; otherwise the result will make no sense.
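The effect of a wrong Input Codeset can be reproduced with a few lines of Python. This is only an illustration using Python's standard codecs, not AraWebParse itself; the sample word is arbitrary.

```python
# Bytes as an editor under Arabic Windows would save them (CP 1256).
raw = "مرحبا".encode("cp1256")

# Decoding with the matching codeset recovers the original text.
right = raw.decode("cp1256")

# Decoding the same bytes as ISO 8859-6 raises no error for this word,
# but some letters silently come out as different Arabic letters.
wrong = raw.decode("iso8859_6", errors="replace")

print(right)
print(wrong)
```

Because the two codesets share part of their layout, the damage is partial and easy to miss, which is why the input codeset has to be set explicitly.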
Output Codeset
This is the codeset in which AraWebParse presents the Web page after loading it. It can be:
- ISO 8859-6: useful for converting an HTML Web page created in another codeset to the ISO standard
- VISUAL: not really a codeset, but a way to present the Arabic Web page in a "readable" format, including bidirectional (Bi-Di) processing, automatic shape determination (shaping), and loading of Arabic Dynamic Fonts.
Global Direction
Unlike Latin-based languages, Arabic text is written from right to left. Because of this fundamental difference in writing direction, AraWebParse allows two kinds of sessions:
- The Latin (left-to-right, or LTR) session, where the initial starting position is at the leftmost position of the browser window, and text is read from left to right.
- The Arabic (right-to-left, or RTL) session, where the initial starting position is at the rightmost position of the browser window, and text is read from right to left.
Of course, this only sets the main screen presentation; in all cases, the implicit writing direction of each character is respected.
Number
Arabic digits, or numerals, are always written from left to right, as in Latin languages, but they may be displayed with either Hindi shapes (used in the Middle East) or Arabic (Latin) shapes, depending on the user's choice.
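As a rough illustration of the digit-shape choice, the sketch below switches between the two sets of shapes in Unicode text. The function name `set_digit_shape` and the option values "HINDI" and "ARABIC" are assumptions made for this example, not AraWebParse's actual option names.

```python
LATIN_DIGITS = "0123456789"
# Hindi (Arabic-Indic) digit shapes occupy U+0660 through U+0669.
HINDI_DIGITS = "".join(chr(0x0660 + i) for i in range(10))

TO_HINDI = str.maketrans(LATIN_DIGITS, HINDI_DIGITS)
TO_LATIN = str.maketrans(HINDI_DIGITS, LATIN_DIGITS)


def set_digit_shape(text, shape):
    """Return text with every digit drawn in the requested shape.

    shape is "HINDI" or "ARABIC" (hypothetical option values).
    Only the shapes change; the order of the digits is untouched.
    """
    if shape == "HINDI":
        return text.translate(TO_HINDI)
    if shape == "ARABIC":
        return text.translate(TO_LATIN)
    raise ValueError("unknown digit shape: " + shape)


print(set_digit_shape("Tel 2024", "HINDI"))
```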
Processing Mode
In DATAPROC mode, all numbers are treated as digits (and can be computed with). In WORDPROC mode, numbers are treated as plain strings, and their shapes are left exactly as stored. This is the difference you may see between the ASMO 708 and ASMO 449+ codesets.
Wrap line
AraWebParse handles an HTML document as a flow. It cannot know the real line length, in pixels, displayed in your browser. Therefore, when AraWebParse runs in RTL mode, each visual line is sent from its left side (i.e. the end of the logical line) to its right side (the beginning of the logical line). If this visual line is too long for your browser, the browser will cut the line somewhere on the right side. As a result, the logical beginning of the line is displayed on the next line, which is not correct. To avoid this default browser behavior, you may force AraWebParse to cut every logical line it reads at a certain number of characters, such as 80. This is not the most elegant solution, and you may need to tune this number according to the page you are parsing, but it works.
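The wrapping problem and its workaround can be sketched in Python as follows. This is a toy model, not the service's code: `to_visual` simply reverses the string to stand in for real Bi-Di reordering, uppercase Latin words stand in for Arabic text, and `textwrap.wrap` plays the role of the browser's line breaking.

```python
import textwrap


def to_visual(logical):
    """Toy stand-in for Bi-Di reordering: pure RTL text is reversed."""
    return logical[::-1]


def wrap_then_reorder(logical, width=80):
    """Cut the logical line first, then reorder each piece on its own."""
    pieces = textwrap.wrap(logical, width=width)
    return [to_visual(piece) for piece in pieces]


logical = "ONE TWO THREE FOUR"  # uppercase stands in for Arabic words

# Without the option: the whole line is reordered, then the "browser"
# wraps it, so the logical beginning ("ONE") lands on the second row.
naive_rows = textwrap.wrap(to_visual(logical), width=10)

# With the option: each row is a complete, correctly ordered piece,
# and the first row holds the logical beginning.
fixed_rows = wrap_then_reorder(logical, width=10)

print(naive_rows)
print(fixed_rows)
```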
Neutral Characters
Arabic and Latin characters conflict in display direction. When Arabic is written in an English line, characters are pushed onto the line as they come from the keyboard or from a file. The reverse effect happens when an English character is entered in an Arabic line. The user may define neutral characters, which follow the global writing direction regardless of their language. This feature is useful in RTL mode for characters such as (), {}, [], space...
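One way to picture the neutral-character rule is the small Python sketch below. It is an assumption about how such a rule could be expressed, not the XLANGBOX-ARA algorithm: the function `char_direction` and the default neutral set are hypothetical, and the standard `unicodedata` module is used only to recognize right-to-left letters.

```python
import unicodedata

# Hypothetical default set, taken from the examples in the text above.
DEFAULT_NEUTRALS = frozenset("(){}[] ")


def char_direction(ch, global_direction="RTL", neutrals=DEFAULT_NEUTRALS):
    """Return "RTL" or "LTR" for a single character.

    Neutral characters follow the global direction; every other
    character keeps the direction of its own script.
    """
    if ch in neutrals:
        return global_direction
    if unicodedata.bidirectional(ch) in ("R", "AL"):
        return "RTL"
    return "LTR"


print(char_direction("(", "RTL"))   # neutral: follows the RTL session
print(char_direction("a", "RTL"))   # Latin letter: stays LTR
```

With an empty neutral set, a parenthesis would be treated like any Latin character and break the right-to-left flow, which is the conflict the option is meant to avoid.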
Browsing Mode
This option is really powerful. In VISUAL mode, it lets you load a Web page in your browser and dynamically rewrites all the links of this page, so that you can continue to browse them recursively through AraWebParse using the same options as the parent Web page. Of course, some complex JavaScript cannot be recognized; see the known limitations.
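The link rewriting described above might look roughly like this Python sketch. It is a guess at the general idea, not the service's implementation: `rewrite_links` is a hypothetical helper, it only handles double-quoted `href` attributes, and it assumes the `url`, `codein` and `codeout` parameters shown in the link example on this page.

```python
import re
from urllib.parse import urljoin, urlencode

WEBPARSE = "http://www.langbox.com/cgi-bin/langbox/webparse"
HREF = re.compile(r'href="([^"]+)"', re.IGNORECASE)


def rewrite_links(html_text, base_url, options):
    """Point every link back at the parser, keeping the same options."""

    def wrap(match):
        # Resolve relative links against the parent page first.
        absolute = urljoin(base_url, match.group(1))
        query = urlencode({"url": absolute, **options}, safe=":/")
        # "&" must be written "&amp;" inside an HTML attribute.
        return 'href="{}?{}"'.format(WEBPARSE, query.replace("&", "&amp;"))

    return HREF.sub(wrap, html_text)


page = '<a href="news/today.html">News</a>'
options = {"codein": "MSCP1256", "codeout": "VISUAL"}
rewritten = rewrite_links(page, "http://your.domain.com/index.html", options)
print(rewritten)
```

Links built by JavaScript or pointing to non-HTML files never pass through such a rewrite, which matches the limitations listed further down.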
Links
How to use it from your site:
To use the AraWebParse tool from your Web site, simply insert a link as follows (on one line, without spaces):
<A HREF=http://www.langbox.com/cgi-bin/langbox/webparse?
url=http://your.domain.com/your_file.html&
codein=MSCP1256&codeout=VISUAL>
http://your.domain.com/your_file.html</A>
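If you generate such links from a script, the following Python sketch shows one way to build them. The helper name `webparse_link` is hypothetical; the endpoint and the `url`, `codein` and `codeout` parameters are copied from the example above, and the nested URL is deliberately left unescaped (via `safe=":/"`) so that the result matches that example.

```python
import html
from urllib.parse import urlencode

WEBPARSE = "http://www.langbox.com/cgi-bin/langbox/webparse"


def webparse_link(page_url, codein="MSCP1256", codeout="VISUAL"):
    """Return (target URL, HTML anchor) for one page. Hypothetical helper."""
    query = urlencode(
        {"url": page_url, "codein": codein, "codeout": codeout},
        safe=":/",
    )
    target = WEBPARSE + "?" + query
    # html.escape turns "&" into "&amp;" so the attribute is valid HTML.
    anchor = '<A HREF="{}">{}</A>'.format(
        html.escape(target), html.escape(page_url)
    )
    return target, anchor


target, anchor = webparse_link("http://your.domain.com/your_file.html")
print(target)
print(anchor)
```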
Sample Arabic site parsing:
As an illustration, here are some links already parsed by users:
Known Limitations
- Some problems remain in recursive mode when handling absolute/relative file names (still under testing)
- Recursive mode with a Java Basedir is not supported
- Recursive mode with JavaScript internal paths for file names is not supported
- Recursive mode with links to non-HTML files (such as images) is not supported
- Arabic text is displayed segment by segment. On the same line, the visual order of the segments might not be correct in RTL mode
- ... Please send problem reports to support@langbox.com
We would appreciate any additional comments that could help improve the performance of this service. Thanks!