LangBox International

Your Interface in Your Language
AraWebParse

 
AraWebParse - Visual Rendering for Arabic Web pages



Content:
Presentation
More options
Links
Limitations
Credit



How to See (or convert) Arabic Web pages using a Standard Latin Browser

AraWebParse is a LangBox International Free Service. It requires either:

  • A Web browser that supports Dynamic Fonts, such as Netscape 4.03 to 4.72 or MS IE 4.0 (Netscape 6 does not support PFR fonts, and IE 5.x already provides a plug-in for Arabic display)
  • Adobe Acrobat 4.0, which can open Web pages.

It allows you to:

  • View Arabic Web pages in a standard Latin browser
    1- Select the input codeset and force the output codeset to VISUAL

    2- Click on the Parse button

    3- Read the page, or print it using the print command of your browser or Acrobat :-)

To illustrate this, check the following results:

 

  • Convert Arabic Web pages to ISO 8859-6
    • Select the input codeset and force the output codeset to ISO 8859-6
    • Click on the Convert button
    • Save the resulting page using your browser's File/Save As menu.

Options for Parsing Arabic Web pages:

Please fill in the following form and click on the OK button:
1- URL:
AraWebParse will load the Web page located at this URL and display it dynamically after conversion.
  

2- Conversion Settings
Input Codeset
(Codeset in which the original document is stored)
ISO 8859-6 (Unix, Mac) | MS CP-1256 (Arabic Windows) | ISIRI 3342 (Persian)

Output Codeset
(Codeset you want to obtain after conversion)
ISO 8859-6 (ASMO 708) | Visual - with Dynamic fonts


3- Visual rendering options
(These settings are only valid when the Visual output codeset is selected.)
Direction: Left-To-Right | Right-To-Left
Number: Arabic (Latin) | Hindi
Processing mode: Data Processing (ASMO 708) | Word Processing (ASMO 449+)
Diacritics: With tashkils | Without tashkils
Force line wrap at: ___ characters (leave blank for no wrap)
    (This is needed to wrap long input lines when displaying in Right-To-Left mode.)
Neutral characters list: ___ (leave blank for the default list)
    (The Latin characters in this list will behave as neutral characters in Right-To-Left mode.)
Browsing mode: Recursive | Not Recursive
    (Recursive mode lets you continue browsing the converted document recursively under the AraWebParse engine.)


4- Validate and parse now:



How to tune AraWebParse Options

    Most AraWebParse options come from the AraMosaic product, and therefore from the XLANGBOX-ARA library available for Unix operating systems.

    The options are described as follows:

    Input Codeset: This is the codeset in which the Web document has been saved. There are four main Arabic codesets found on the Web:
    • ISO 8859-6 - Adopted by the ASMO organization; used in Unix and Mac environments
    • Microsoft CP 1256 - Adopted by Microsoft, and the de facto choice when creating Arabic Web pages in an Arabic Windows environment
    • The Persian ISIRI 3342 - Trilingual Latin/Arabic/Persian codeset, used by PMOSAIC
    • The UTF-8 encoding - Full multi-byte Unicode encoding. This is the future unified codeset for Arabic, but it is not yet fully supported in all environments (and not yet in AraWebParse...)
    It is very important to set the Input codeset correctly when using AraWebParse; otherwise the result will make no sense.
    Output Codeset: This is the codeset in which AraWebParse will present the Web page to you after loading it. It can be:
    • ISO 8859-6 - Useful for converting an HTML Web page created in another codeset to the ISO standard
    • VISUAL - This is not really a codeset, but a way to present the Arabic Web page in a "readable" format, including bidirectional (Bi-Di) processing, automatic shape determination (shaping), and loading of Arabic Dynamic fonts.
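    The conversion from an input codeset to ISO 8859-6 can be illustrated with a minimal Python sketch. It is not the AraWebParse implementation: the function name `convert` and the label "ISO8859-6" are assumptions made for illustration ("MSCP1256" is the value used in the sample link on this page), and ISIRI 3342 is left out because Python ships no codec for it.

```python
# Map option labels to Python codec names (the labels are illustrative).
CODECS = {
    "ISO8859-6": "iso8859_6",  # ISO 8859-6 (Unix, Mac)
    "MSCP1256": "cp1256",      # Microsoft CP 1256 (Arabic Windows)
}

def convert(data: bytes, codein: str, codeout: str = "ISO8859-6") -> bytes:
    """Decode bytes stored in `codein` and re-encode them in `codeout`."""
    text = data.decode(CODECS[codein])
    return text.encode(CODECS[codeout])

# The same Arabic word, first as CP 1256 bytes, then converted to ISO 8859-6.
word = "\u0633\u0644\u0627\u0645"
iso_bytes = convert(word.encode("cp1256"), "MSCP1256")
print(iso_bytes.decode("iso8859_6") == word)
```

    Declaring the wrong input codeset makes the decode step map bytes to the wrong letters, which is why the Input codeset setting matters.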
    Global Direction: As you probably already know, and unlike Latin-based languages, Arabic text is written from right to left. Because of this fundamental difference in writing direction, AraWebParse allows two kinds of sessions:
    • The Latin (left to right, or LTR) session, where the initial starting position is at the leftmost position of the browser window and text is read from left to right.
    • The Arabic (right to left, or RTL) session, where the initial starting position is at the rightmost position of the browser window and text is read from right to left.
    Of course, this only sets the overall screen presentation; in all cases, the implicit writing direction of each character is respected.
    Number: Arabic digits, or numerals, are always written from left to right, as in Latin languages, but they may be displayed with either Hindi shapes (used in the Middle East) or Arabic (Latin) shapes, depending on the user's choice.
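    As an illustration of the Number option, the sketch below switches digit shapes between Latin digits (0-9) and the Hindi shapes, taken here to be the Unicode Arabic-Indic digits U+0660 to U+0669. The function names are hypothetical, and this is not how AraWebParse itself is implemented.

```python
# Translation tables between Latin digits and Arabic-Indic ("Hindi") digits.
LATIN_TO_HINDI = {ord(str(i)): chr(0x0660 + i) for i in range(10)}
HINDI_TO_LATIN = {0x0660 + i: str(i) for i in range(10)}

def to_hindi_digits(text: str) -> str:
    """Replace Latin digits with Hindi shapes; other characters are untouched."""
    return text.translate(LATIN_TO_HINDI)

def to_latin_digits(text: str) -> str:
    """Replace Hindi-shaped digits with Latin digits."""
    return text.translate(HINDI_TO_LATIN)

print(to_hindi_digits("1996-2011"))
```

    Only the shapes change; the digits keep their left-to-right order in both cases.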
    Processing Mode: In DATAPROC mode, all numbers are treated as digits (and can therefore be computed). In WORDPROC mode, numbers are just strings, and their shapes remain unchanged, as determined by their stored values. This is the difference you may see between the ASMO 708 and ASMO 449+ codesets.
    Wrap line: AraWebParse processes the HTML document as a stream and cannot know the actual line length, in pixels, displayed by your browser. When AraWebParse runs in RTL mode, each visual line is sent from its left side (i.e. the end of the logical line) to its right side (the beginning of the logical line). If this visual line is too long for your browser, the browser will break it somewhere on the right side. As a result, the logical beginning of a line is displayed on the next line, which is not correct. To avoid this default browser behavior, you may force AraWebParse to cut every logical line it reads at a given number of characters, for example 80. This is not the most elegant solution, and you may need to tune this number for the page you are parsing, but it works.
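    The forced wrap can be sketched in Python as follows. This is only an illustration under assumptions: the helper name `wrap_logical_line` is hypothetical, and breaking at whitespace (via the standard `textwrap` module) is a choice made here, not a documented AraWebParse behavior.

```python
import textwrap

def wrap_logical_line(line, width=None):
    """Cut one logical line into pieces of at most `width` characters.

    A blank width (None) means no wrap, mirroring the form field.
    """
    if not width:
        return [line]
    # textwrap breaks at whitespace where possible and splits longer words.
    return textwrap.wrap(line, width=width) or [""]

print(wrap_logical_line("aaa bbb ccc", 7))
```

    Each resulting piece is short enough to be reordered and displayed as one visual line, so the browser never has to break it.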
    Neutral Characters: Arabic and Latin characters conflict in display direction. When Arabic is written in an English line, characters are pushed onto the line as they arrive from the keyboard or from a file. The reverse happens when an English character is entered in an Arabic line.
    The user may define neutral characters, which follow the global writing direction regardless of their language. This feature is useful in RTL mode for characters such as (), {}, [], space...
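    To make the interplay between global direction and neutral characters concrete, here is a deliberately simplified Python sketch of logical-to-visual reordering. It is an assumption-laden illustration, not the XLANGBOX-ARA algorithm: it ignores shaping and bracket mirroring, treats every character that is neither Arabic nor Latin/digit as neutral, and the names `char_direction` and `to_visual` are hypothetical.

```python
import unicodedata
from itertools import groupby

RTL_CLASSES = {"R", "AL"}        # Hebrew/Arabic letters
LTR_CLASSES = {"L", "EN", "AN"}  # Latin letters and digits (digits run LTR)

def char_direction(ch, base, neutrals):
    """Return 'R' or 'L'; neutral characters follow the global direction."""
    if ch in neutrals:
        return base
    cls = unicodedata.bidirectional(ch)
    if cls in RTL_CLASSES:
        return "R"
    if cls in LTR_CLASSES:
        return "L"
    return base

def to_visual(text, base="R", neutrals="(){}[] "):
    """Reorder a logical string into left-to-right display order."""
    runs = [(d, "".join(chars))
            for d, chars in groupby(text, key=lambda c: char_direction(c, base, neutrals))]
    if base == "R":
        runs.reverse()  # in an RTL session the first run sits rightmost
    # Characters of RTL runs are reversed; LTR runs keep their order.
    return "".join(s[::-1] if d == "R" else s for d, s in runs)

print(to_visual("\u0627\u0628 ab", base="R"))
```

    In the RTL call above, the space is neutral and therefore stays attached to the Arabic run, so the Latin word ends up on the left side of the line.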
    Browsing Mode: This option is really powerful. In VISUAL mode, it lets you load a Web page in your browser and dynamically rewrites all the links of this page so that you can continue browsing them recursively under AraWebParse, using the same options as the parent Web page. Of course, some complex JavaScript cannot be recognized - see the known limitations.
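    The link rewriting behind recursive browsing might look like the following Python sketch. It is a guess at the general technique, not the actual engine: the function `rewrite_links` is hypothetical, only double-quoted HREF attributes are handled, and only the `url`, `codein` and `codeout` parameters shown in the sample link on this page are passed along.

```python
import html
import re
from urllib.parse import urlencode, urljoin

SERVICE = "http://www.langbox.com/cgi-bin/langbox/webparse"

def rewrite_links(page, base_url, codein, codeout="VISUAL"):
    """Point every double-quoted href back at the parsing service."""
    def repl(match):
        # Resolve relative file names against the address of the parent page.
        target = urljoin(base_url, match.group(2))
        query = urlencode(
            {"url": target, "codein": codein, "codeout": codeout}, safe=":/")
        link = html.escape(f"{SERVICE}?{query}", quote=True)
        return f'{match.group(1)}"{link}"'
    return re.sub(r'(href\s*=\s*)"([^"]*)"', repl, page, flags=re.IGNORECASE)

print(rewrite_links('<a href="page2.html">next</a>',
                    "http://your.domain.com/dir/index.html", "MSCP1256"))
```

    A plain pattern match like this one cannot see links that are built by scripts, which is consistent with the JavaScript restrictions listed under the known limitations.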



Links

  How to use it from your site:

    To use the AraWebParse tool from your Web site, simply insert a link as follows (on one line and without spaces):

       <A HREF="http://www.langbox.com/cgi-bin/langbox/webparse?
       url=http://your.domain.com/your_file.html&amp;
       codein=MSCP1256&amp;codeout=VISUAL">
       http://your.domain.com/your_file.html</A>
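
    If you generate such links from a script, the address can be assembled as in this short Python sketch. The service address and the `url`, `codein` and `codeout` parameters come from the link above; the function name `build_parse_url` is hypothetical, and leaving ":" and "/" unescaped is a choice made here to match the form shown above.

```python
from urllib.parse import urlencode

SERVICE = "http://www.langbox.com/cgi-bin/langbox/webparse"

def build_parse_url(page_url, codein, codeout="VISUAL"):
    """Build the AraWebParse address for one page, on one line, no spaces."""
    query = urlencode(
        {"url": page_url, "codein": codein, "codeout": codeout},
        safe=":/",  # keep the nested URL readable; other characters are escaped
    )
    return f"{SERVICE}?{query}"

print(build_parse_url("http://your.domain.com/your_file.html", "MSCP1256"))
```

    Any "?" or "&" inside the page address is percent-encoded by `urlencode`, so it cannot be confused with the parameters of the service itself.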
                              

  Sample Arabic site parsing:




Known Limitations

  • Some problems remain in recursive mode when handling absolute/relative file names - still under testing
  • Recursive mode does not support Java Basedir
  • Recursive mode does not support JavaScript internal paths for file names
  • Recursive mode does not support links to non-HTML files (such as images)
  • Arabic text is displayed segment by segment. Within a single line, the visual order of the segments might not be correct in RTL mode

  • ... Please send problem reports to support@langbox.com - We would appreciate any additional comments that could help improve the performance of this service - Thanks!

 
© 1996-2011 LangBox International