BiDi Mozilla


Mozilla Language Enabling Feature
Arabic/Hebrew (Bi-Di) language Enabling
status update

Editor: Franck Portaneri <fportaneri[at]langbox.com>

Last Update: April 18th, 2002
Original page NEW location : http://www.langbox.com/bidimozilla 

 

This status page is updated weekly, mainly according to the discussions on
news:netscape.public.mozilla.i18n.
To add updates and news, please mail your input or text to me.

Last Events

Also, the Hebrew Mozilla l10n team (http://www.mozilla.org.il ) has started working, and it is reporting (in Hebrew) quite a few bugs in the RTL user interface...

Lina's comments are:

"Regarding the latest gfx code, forgot to mention one important thing.

In the meantime, BiDi gfx changes ensure correct rendering only on Win platform.
On all the other platforms, as I suppose, BiDi text would be displayed correctly in the following cases:

  • -- non-BiDi platform and visual text mode,
  • -- BiDi platform and implicit text mode.

Also, the Windows specific code should be enhanced by checking whether to reverse text on the basis of the embedding level, instead of testing for the presence of Hebrew characters as at present.


Summary of Bidi-related processing implemented in this code :

    • Parsing: consume text token taking into account BiDi category.
    • Retrieve "CSS display" part of the style sheet.
    • Content model: resolve text classification of each token, taking the CSS display property into account.
    • Frame model: sort frames according to the resolved embedding level of their content.
    • Rendering context: ensure that the text is displayed in the correct order.
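
The frame-sorting step in the summary above can be illustrated with a minimal sketch. It assumes each frame is an opaque item that already carries a resolved embedding level, and it applies the reversal step of the Unicode Bidi Algorithm (rule L2). The function name `reorder_by_level` is hypothetical and is not taken from the IBM code.

```python
def reorder_by_level(items, levels):
    """Return items in visual order, given one resolved embedding level per item.

    From the highest level down to the lowest odd level, reverse every
    contiguous run of items whose level is at least that value.
    """
    if len(items) != len(levels):
        raise ValueError("items and levels must have the same length")
    pairs = list(zip(levels, items))
    odd = [lvl for lvl, _ in pairs if lvl % 2 == 1]
    if not odd:
        # Only even (left-to-right) levels: logical order is visual order.
        return [item for _, item in pairs]
    for level in range(max(levels), min(odd) - 1, -1):
        i = 0
        while i < len(pairs):
            if pairs[i][0] >= level:
                j = i
                while j < len(pairs) and pairs[j][0] >= level:
                    j += 1
                pairs[i:j] = pairs[i:j][::-1]  # reverse this run in place
                i = j
            else:
                i += 1
    return [item for _, item in pairs]
```

An odd level marks right-to-left content, which is also why testing the embedding level is a more general criterion than testing for Hebrew characters. The sketch only orders whole frames; drawing the text inside a right-to-left frame is a separate step.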

 

To use this tar file, untar it in the mozilla directory (it overlays files!), then build it. (Note that Bidi is turned on in config.mak by defining IBMBIDI.)

  • Feb 10, 2000 : Matitiahu Allouche from IBM Israel posts a long document on the Bi-Di Mozilla status and his proposed design : This is a very complete and detailed document which presents:
    • Issue raised in past discussions
    • Principle for BiDi support in Mozilla
    • Proposed Design
    • Division in Tasks (Matitiahu mentions too that IBM Israel is working on Task T1, T2, T3, T4, T5, T6 and T13)

    --> Please send your remarks and comments...

  • Feb 9, 2000: Maha Abou El-Rous from IBM Egypt reports a status and gives a plan for BiDi mozilla project.
    • Status:
      • We have a preliminary design for the required changes for the User Interface, once finalized, it will be posted for your feedback/comments
      • Layout text
      • Layout table
      • Listbox, single element listbox, and scrollbar
      • Investigating CSS Bidi attributes
      • Arabic Shaping Engine

    • Plan:
      • Task                                      Module
        Code Page                                 Intl
        Font Support                              -
        UI: Preference page                       -
        UI: View menu - Bidi option               -
        UI: Character set menu additions          XPFE
        Layout Text                               htmlparser, layout
        Layout Table                              layout
        Controls: Layout RTL                      layout
        Controls: Bidi Multi/single line editor   editor
        Search                                    layout
        Selection                                 layout
        Clipboard operations                      -
        Shaping: Character shaping                gfx
        Shaping: Numeral shaping                  gfx
        HTML,CSS,XUL Bidi attributes              htmlparser, layout

    --> Please send your remarks and comments...

  • Jan 31, 2000: Mike Kaply from IBM informed me that : "The Bidi team is doing design work and they are getting ready to post some stuff. Unfortunately, with the holidays and some other personal issues, there have been some slight delays, but we are looking to post the task list and design very soon."...

  • Dec 8, 1999: Great news : Mike Kaply from IBM announced that IBM have assembled a staff of people in Israel and Egypt (Yaacov Akiba Slama and Maha Abou El-Rous) to help with Bidi work in Mozilla. This group already worked on Bidi enablement for the Netscape 4.X products on OS/2 and Windows...
    This new team introduction reactivated the BI-DI, Arabic, Persian and Hebrew (Logical/Visual) discussions on the news:netscape.public.mozilla.i18n.

  • Jonathan Rosenne also forwarded a message from the Hebrew SIG of the Israeli Internet Association, whose first meeting will address Hebrew and the Internet, with Mozilla as the first priority. 
 
Developers
Feature Owner:
Alexander Khalil  <iskandar@ee.tamu.edu>
Franck Portaneri <fportaneri[at]langbox.com>
WinFE:
Barak Ori <barak@comfy.co.il>
XFE:
Franck Portaneri <fportaneri[at]langbox.com>
Mark Leisher <mleisher@crl.nmsu.edu>
MacFE:
Adil Allawi <adil@diwan.com> starts an in-house project and might show a beta at the Gitex show. He is open to a collaboration with the Mozilla team.
XP:
BI-DI : Michael Kaply <mkaply@us.ibm.com>, Yaacov Akiba Slama <slama@il.ibm.com> and Maha Abou El-Rous
Arabic : Franck Portaneri <fportaneri[at]langbox.com> and Mark Leisher <mleisher@crl.nmsu.edu>
Hebrew : Dotan Dimet <dotan@usa.net>, Ariel Backenroth <arielb@rice.edu>
QA:
Alexander Khalil  <iskandar@ee.tamu.edu>
Anoosh Hosseini <anoosh@gpg.com>

JKL <jklnet@usa.net>
Doron Shikmoni  <doron@erez.cc.biu.ac.il>
Jonathan Rosenne  <rosenne@qsm.co.il>
Dov Grobgeld <dov@orbotech.co.il>
Ariel Magnum <amagnum@bigfoot.com>
Shay Elkin <antil_za@mailandnews.com>
Roozbeh Pournader <roozbeh@sina.sharif.ac.ir>  
Document:
Alexander Khalil  <iskandar@ee.tamu.edu>

To participate:

  1. Visit the mozilla.org site, especially http://www.mozilla.org/community.html
  2. Subscribe to the netscape.public.mozilla.i18n newsgroup ( mailto:mozilla-i18n-request@mozilla.org?subject=subscribe)
  3. Have a look at http://www.mozilla.org/docs/refList/i18n/scripts.html and http://www.mozilla.org/docs/refList/i18n/schedule.html
  4. Download the source tree and build it on your system
  5. Contact the project owner by e-mail, cc mozilla-i18n@mozilla.org, to introduce yourself.
Specifications

The main support is common to Arabic and Hebrew because both languages are bi-directional (Bi-Di).
Of course, the charsets are not the same, and neither is the final rendering step, which is more complex for Arabic because of "glyph shaping determination". So this part of the document is split into two sections, Arabic and Hebrew:

General BiDi
IBM Code review
Jan 08, 2001
by Steve Clark <buster@netscape.com>

Last Thursday, I held a design and code review meeting regarding the Bi-Di code submission from IBM. Thanks to all those who attended and sent me feedback. Here's a summary of where I think we are today.

1. Architecture

The overall design of the new code is fine, as far as we can tell. There are plenty of things that need to get fixed, but the basic concept is perfectly acceptable. However, there were a few issues that do need to be addressed before we could include the code on the trunk.

A) platform-specific code

In general, we do not allow #ifdef PLATFORM code in XP modules. You need to factor out the platform-specific portions of your code, and isolate platform code in its own module. Then the build system can do the right thing at build time, without polluting the XP modules with tons of #ifdef code. Along these lines...it is absolutely *not* required that you implement Bi-Di on all platforms. However, your implementation should strive to be free of platform-specific assumptions, so that others can implement it on their systems. Erik has volunteered to help validate your design against other platforms (I think he volunteered to validate Linux himself, and he "volunteered" Frank for Mac.)

B) illegal dependencies

You added a dependency between layout and the view system that isn't legal. Kevin Mcclusky can provide the details, but basically you are making bad assumptions about frames in the view code. Kevin, please elaborate.

C) misuse of interfaces

You have added concrete functions and member variables to several interfaces. This is illegal. XPCOM interfaces are abstract contracts that cannot include this sort of implementation. Also, you should not have #ifdef blocks on an interface. An interface is a public contract that sometime soon (probably Mozilla 1.0) will become immutable. It cannot depend on compile-time switches. If you need optional additional functionality, it has to be on a new interface that is optionally a subclass of whatever concrete class needs to support the methods.

2. Documentation

One thing that makes reviewing a submission of this size very difficult is a lack of documentation. Some of the individual code blocks are well documented, but there is no overview to guide us. To get this code successfully integrated into the branch, we need 4 levels of documentation:

A) an overview document.

This need not be long, or formal. Just something to help us understand the philosophy behind the changes. Where are major pieces of data stored (such as knowing whether Bi-Di is enabled, or required for a particular page?) What classes do which portion of the work? What work exactly is being done (i.e., frame reordering.) I don't think the overview document needs to be complete and polished before the code can go in, but I do think something is needed before the next round of reviews.

B) interface documentation.

Though we're not always good at it, we do try hard to get all major classes and public interfaces thoroughly documented. It would be a big help if each new method had a comment block that described what the method did, its arguments, its return value, and any possible side effects. We urge people to use a javadoc syntax, because there are tools that automatically build documentation from such comments. See nsIFrame.h for an example of a fairly-well documented interface.

C) code-level documentation.

For the most part, the submission was pretty good about including appropriate code-level comments. More is better, of course. In particular, documenting the use of member variables inside of classes is very helpful.

D) adhering to coding conventions.

Parts of the submission were very poor at sticking to the Mozilla coding conventions. This makes the code much more difficult to read. Please see http://www.mozilla.org/newlayout/doc/codingconventions.html

3. Performance

One of the biggest concerns is the impact on clients that are not interested in providing Bi-Di support. Let's break this down into several categories:

A) code size

Clearly, clients that are not interested in supporting Bi-Di should not have to pay a significant penalty for the additional code required for Bi-Di. The two ways we can think to minimize the impact are to factor as much as possible into a separate library, or to leave significant code chunks in #ifdef BIDI blocks. I'd like to urge people to think about which code could reasonably be factored into its own library, since the support costs for #ifdef code are high.

B) memory usage

Reading the code, it doesn't look like the Bi-Di code adds any significant amount of bloat. We'll have to take measurements once it's integrated to validate, but so far, it looks good.

C) performance

Most reviewers are less concerned with the performance of the code when Bi-Di is required, than the impact of the code when Bi-Di is not needed to lay out a page. There seemed to be a few areas where Bi-Di code was being executed unnecessarily. These could probably be fixed by simply checking whether anything on the page warranted Bi-Di calculation before executing the new code.
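
As an illustration of such a check, here is a minimal sketch in Python (not the Mozilla C++ code). It assumes that a page needs Bi-Di processing only if it contains a strongly right-to-left character or a right-to-left directional control; the set `RTL_CLASSES` and the name `needs_bidi` are hypothetical choices for this example.

```python
import unicodedata

# Bidi classes that signal right-to-left content. This set is an assumption
# made for the sketch; a real implementation may want a different test.
RTL_CLASSES = {"R", "AL", "RLE", "RLO", "RLI"}

def needs_bidi(text):
    """Return True if any character in text calls for Bi-Di processing."""
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in text)
```

A caller would run the cheap scan once per text node and skip the reordering code entirely when it returns False, so purely left-to-right pages pay only for the scan.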

4. Implementation problems

There are plenty of minor problems that need to get fixed. Too many to put in a newsgroup posting! But here are some general trends:

A) memory leaks

There are a few places where you leak objects because of early returns in a method. Using nsCOMPtr would prevent this.

B) null pointer checks

There are many places where pointers are used without first being checked for null. These include new allocations, method parameters, and returned out-parameters from function calls. At a minimum, assertions need to be added to validate the pointer. And unless you're guaranteed the pointer must be valid, you should put in a null pointer check and return an error if null.

C) 64-bit compatibility

Chris Waterson noticed some code that seemed to make bad assumptions about 32-bit pointers. We already have one 64-bit system, and in general we strive to avoid assumptions about the hardware. Chris, could you elaborate on the specifics here?

I'll forward individual comments separately.

Proposed plan
Dec 11, 1999
by Frank Tang <ftang@netscape.com>

Frank Tang proposes the following priorities:

  • 1. Add an XP bidi engine, grabbed from somewhere: free-bidi or the pretty-good-bidi.
    Mark Leisher did an excellent comparison page here
  • 2. Look at the layout code: resolve directionality and break text of different directions into different text frames.
  • 3. Add a directionality attribute to the text frame.
  • 4. We already flow text frames depending on the DIR, so we probably don't need to change that part.
  • 5. Make sure the LTR text frame calls GFX DrawString from left to right.
  • 6. Fix GFX bugs.
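
Step 2 of the plan above (breaking text into frames of a single direction) can be sketched as follows. This is a simplified illustration, not the layout code: it assumes that neutral characters (spaces, digits, punctuation) simply join the preceding run, whereas the full Unicode algorithm resolves them from both neighbours. The names `strong_direction` and `split_runs` are hypothetical.

```python
import unicodedata

def strong_direction(ch):
    """Return 'ltr' or 'rtl' for strongly directional characters, else None."""
    cls = unicodedata.bidirectional(ch)
    if cls == "L":
        return "ltr"
    if cls in ("R", "AL"):
        return "rtl"
    return None

def split_runs(text, base="ltr"):
    """Split text into (direction, substring) runs, one per future text frame."""
    runs = []
    current = base  # leading neutrals take the base direction
    for ch in text:
        current = strong_direction(ch) or current
        if runs and runs[-1][0] == current:
            runs[-1][1] += ch
        else:
            runs.append([current, ch])
    return [(direction, chunk) for direction, chunk in runs]
```

Each resulting pair would correspond to one text frame carrying the directionality attribute of step 3.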

Detail Design: Find public source code or write new code from scratch for the Bi-Di API

Three packages are free or almost open source now. They are the following, in the order of their announcement:

Mark Leisher did an excellent comparison of the results of these packages, as well as with IE 5.0.

However, under such systems, the GUI side (dialog boxes, text input forms...) will behave only in Latin (no dual keyboard management). This problem has to be fixed at the GTX level.

Details on these packages follow:


19-Nov-1999: Mark Leisher <mleisher@crl.nmsu.edu> announces the Version 2.3 of the UCData package, which includes the PGBA.

What is the PGBA? The PGBA is a small, simple, and fast one-pass Unicode bi-directional text reordering algorithm that works "pretty good" for most text. It provides an effective alternative to the Unicode Bidi algorithm for implicit reordering of bi-directional text. It does not currently support the explicit bi-directional codes available in Unicode. Support for logical and visual cursor motion through the reordered string is included.

Some problems with the PGBA have been fixed, speed has been improved, the code has been reduced in size and made somewhat clearer, a man page for the bidi API has been added, and the documentation has been improved a bit. The README file in the distribution details the changes. The home page will eventually have a section showing the results from the PGBA, the IBM ICU bidi implementation, and the FriBiDi implementation.

See http://crl.nmsu.edu/~mleisher/ucdata.html for documentation and download.

7-Oct-1999 : Mark Leisher <mleisher@crl.nmsu.edu> announced Version 2.1 of his UCData freeware package, which adds the "Pretty Good BiDi Algorithm". The good news is that Frank Tang embedded UCData 1.9 into the Mac, Win and Unix XPCOM in April 1999.

Mark Leisher says: << ... This release provides some bug fixes, and update for the new (apparently undocumented?) Unicode 3.0 bi-directional categories, and the addition of the "Pretty Good BiDi Algorithm." The PGBA is an elegant and simple one-pass BiDi reordering algorithm that works pretty dang good for most text. It has some deliberate, but (hopefully) minor shortcomings just so developers who use it have something to keep them occupied :-) The PGBA is in no way related to the Unicode BiDi Algorithm except by coincidence.

IMPORTANT: The PGBA is dependent on UCData because of the interpretation of certain 3.0 BiDi categories. To be explicit, the following BiDi category assumptions are made when building the character type data file:

  • "AL" is equivalent to the "R" property.
  • "BM", "NSM", "LRE", "RLE", "LRO", "RLO", "PDF" are all equivalent to the "ON" property.

If your character type package of preference has these assumptions, then using the PGBA will be no problem.

>>

Short and simple info page: http://crl.nmsu.edu/~mleisher/ucdata.html
The distribution is available in .tar.gz and .zip form from:

http://crl.nmsu.edu/~mleisher/ucdata-2.1.tar.gz
http://crl.nmsu.edu/~mleisher/ucdata21.zip
ftp://crl.nmsu.edu/CLR/multiling/unicode/ucdata.tar.gz
ftp://crl.nmsu.edu/CLR/multiling/unicode/ucdata.zip
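
The category assumptions quoted above for the PGBA can be expressed as a small mapping over the standard Unicode bidi classes. The sketch below uses Python's `unicodedata` module rather than UCData, and it assumes that the "BM" in the quoted list refers to the Unicode class BN (Boundary Neutral). The function name `simplified_bidi_class` is hypothetical.

```python
import unicodedata

# Assumptions taken from the quoted README text:
# AL behaves like R; the classes below behave like ON.
TREAT_AS_R = {"AL"}
TREAT_AS_ON = {"BN", "NSM", "LRE", "RLE", "LRO", "RLO", "PDF"}

def simplified_bidi_class(ch):
    """Return the bidi class of ch after applying the PGBA-style collapsing."""
    cls = unicodedata.bidirectional(ch)
    if cls in TREAT_AS_R:
        return "R"
    if cls in TREAT_AS_ON:
        return "ON"
    return cls
```

A character type package that already reports these collapsed classes would satisfy the stated assumptions directly.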


3-Nov-1999: Markus Scherer <schererm@us.ibm.com> from IBM Cupertino mentioned that ICU has had the Unicode 3.0 BiDi algorithm implemented since the end of September, as of ICU 1.3. The current version is ICU 1.4.2. Mark Leisher did some testing on it. If you have tried this BiDi API, please send feedback on it.


15-Jan-1999 : Dov Grobgeld <dov@imagic.weizmann.ac.il> announces the first alpha version of FriBidi, a Free BiDi library that adheres closely to the Unicode BiDi algorithm. See http://imagic.weizmann.ac.il/~dov/freesw/FriBidi for more info.

Detail Design : Use an HTML Explicit or Implicit description of the RTL management

    This part should determine whether Mozilla Arabic support expects all RTL/LTR management to be done:
      explicitly :
      i.e. forced only through the <dir> HTML tags and directives described in the HTML 4.0 proposal.

      implicitly :
      i.e. if the charset definition is something like :
      <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-6">
      then the default direction is forced to RTL (right justification).

      both allowed :
      with the introduction of something like :
      <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-6">  for implicit
      <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-6-e"> for explicit
    But this point should be in accordance with the HTML 4.0 definition. Please send your feedback here; this is really an open subject that needs more input and discussion...
Detail Design: Extend the Mozilla layout source code with the Bi-Di API (By Franck Portaneri) -
    The API function calls must be embedded within the Mozilla source tree to get the Bi-Di and Arabic support built in. This is a complex part where the following issues must be taken into account:
     
    • Dissociate the "Bi-Di" and "Glyph Shaping" processes (to allow both Arabic and Hebrew support)
    • Work on the full paragraph context (merge all text segments of a paragraph in order to do the rendering process)
    • Embed the "Output Rendering" process at the text display level.
    • Embed the "Text Selection highlight" process at the text display level.
    • Embed the "Mouse Position handling" process at the mouse pointing level (for selection operations)
    • Manage the full RTL presentation: right alignment, reversed scroll bar sliding....
    • Check the printing subsystem and coordinate with the "UNIX Non-Latin1 Printing Enhancement" module owner.
    • Take care of coexistence with a BiDi operating system and avoid conflicts
Detail Design : GFX code extension for Bi-Di (by Frank Tang)

18-Aug-1999, Frank Tang  fixed some bugs on the MacGFX for Unicode BiDi rendering. The Screen shot results are as follow :


for Arabic

and for Hebrew.


13-Sep-1999, Frank Tang  worked a lot on the WinGFX for Arabic and Hebrew

13-Sep-1999, The same code gives this under a Linux machine and the LangBox Font : Some bug have to be fixed in the Font Mapping.

To test this, try to use Mozilla to look at:

Detail Design : Text Selection algorithm for BiDi (by Mark Leisher)

The way we implemented it, the selection model depends on whether it is being done in a visual or logical fashion.
The trick with doing visual selections is not to extend the selection until a whole run of text in one direction has been crossed. For example, if we have the text LLLRRRLLL, starting the selection in the left L section and dragging right will not select the R section until the right L section is reached.

  • Problems: The selection can only cross directional boundaries by including whole runs and implementation is a little tricky.
  • Benefits: This avoids multiple selection boundaries and is behavior that some users expect.

Using the same text for logical selection, starting in the left L section and moving toward the right, when the left side of the R section is reached, the whole section is selected and moving further right, the R section is deselected a glyph at a time until the right L section is reached. At that point, the whole R section is selected again.

  • Problems: Strange selection highlighting behavior that confuses users.
  • Benefits: Can select portions of text across directional boundaries, implementation is easier, and this is behavior that some users expect (it is BTW, the way adopted on Arabic enabled Windows).
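
The visual selection rule described above can be sketched on the LLLRRRLLL example. This is a toy model, not Mark Leisher's implementation: it assumes the line is given as a string with one direction letter per visual cell, and that the selection runs from an anchor cell to the cell under the pointer. The name `visual_selection` is hypothetical.

```python
def visual_selection(directions, anchor, focus):
    """Return the selected visual cells as a half-open (start, end) range.

    directions: one letter per visual cell, e.g. "LLLRRRLLL".
    anchor:     index of the cell where the selection started.
    focus:      index of the cell currently under the pointer.
    """
    anchor_dir = directions[anchor]
    if directions[focus] != anchor_dir:
        # The pointer is inside an opposite-direction run that has not been
        # fully crossed yet: pull the focus back to the last cell that has
        # the anchor's direction. The loop ends at the anchor at the latest.
        step = 1 if focus > anchor else -1
        while directions[focus] != anchor_dir:
            focus -= step
    low, high = min(anchor, focus), max(anchor, focus)
    return (low, high + 1)
```

With `"LLLRRRLLL"`, an anchor at cell 1 and the pointer at cell 4 gives `(1, 3)`: the R run stays unselected. Moving the pointer to cell 6 gives `(1, 7)`, which takes in the whole R run at once.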

 

Arabic specific:

Document Charset:

There are several charsets commonly used on the web for Arabic/Hebrew languages. We decided to support the following:
 
Unicode : See http://www.unicode.org
It is the next-generation charset standard. The new layout engine, NGLayout, uses UCS-2 internally (in contrast to the current layout engine, which internally dealt with multiple encodings).
Mark Leisher <mleisher@crl.nmsu.edu> is working on this specific issue.
 
ISO-8859-6 : See http://www.langbox.com/codeset/iso8859-6.html
It is an international standard adopted by the Arab community, as well as under the UNIX X11 and Mac environments. It is commonly used on many web sites, such as :
ASMO 449+ : See http://www.langbox.com/arabic/asmo449.html
It is a national standard, fully compatible with ISO 8859-6.  All sites using ISO 8859-6 are directly readable under this format. However, some additional characters (Arabic digits, punctuation signs...) are added in this ASMO codeset.
cp1256 : See http://www.itsnet.com/~qamus/codepages/codepage_win95.htm
It is the code page Windows uses for fonts, and it is supported by many web sites, such as most sites developed or hosted on Arabic Windows machines...
Arabic-Mac Code Page  (Is there a specific name?) See http://www.itsnet.com/~qamus/codepages/codepage_mac.htm
It is the script code the Macintosh uses; it is compatible with ISO 8859-6 and ASMO 449+.

IRAN SYSTEM : http://sina.sharif.ac.ir/~roozbeh/farsiweb/iransystem.txt

Apparently, more than 90% of Persian pages on the internet are stored in this character set.

ISIRI 3342 : http://sina.sharif.ac.ir/~roozbeh/unicode/3342.txt

It is a Farsi codeset, not yet adopted by ISO, but adopted by the Iranian Group of Normalization. It is also used on the Web with the PMosaic browser. It is the current 8-bit standard for Farsi. The Farsi language cannot be handled by ISO 8859-6 alone.

Mail Charset:

We decided to use ISO 8859-6 as the Mail Charset since it is the de-facto standard common to all platforms.

Front-End Font Encoding

For Arabic, there is not really a Font Encoding definition: even though the codeset has been defined and fixed, the font itself must include many more glyphs than appear in the codeset. This is due to the "glyph shaping" characteristic of the Arabic language. So, depending on the software implementation, we can find different font set definitions.  At LangBox, we used to have 2 levels of font encoding, according to the device font capabilities and the requested quality :
For example, to read text on the web, the second set is quite enough. For publishing or printing purposes, it is preferable to use the first one.  Some ISO-8859-6-8 fonts are provided with the AraMosaic browser on UNIX, and can be used with Mozilla.
So, we propose the following :
XFE
ISO-8859-6-8 or ISO-8859-6-8X (includes more shapes in a 8 bits font)
WinFE
Arabic Windows fonts (used under Arabic Windows license) - Or any Free TTF fonts (any pointer here???)....
MacFE
Arabic Mac fonts
Printing
ISO-8859-6-16

Host Operating Systems Consideration:

There are two types of host operating systems :
  • Standard (English) Operating Systems (i.e. without any Arabic-specific add-on) :
  • On these systems, the Bi-Di process must be done by Mozilla to display HTML documents correctly, but all of the Operating System GUI will behave in Latin only (for <select...> , <textarea...> or <input...>  fields in forms, or for dialog boxes such as Edit/Find in Page...).

    The fontset must also be provided by Mozilla here.
     

  • Arabic Operating Systems : like Arabic Windows, Arabic Mac, or the Arabic Language Module on UNIX
  • On these systems, the Bi-Di rendering process is already done within the XDrawString() (UNIX X11) or TextOut() (Windows) functions, and there is a risk that the Bi-Di process is performed twice on the same string. This is not correct and will give garbled output. So there are two options here:

    • Disable the OS Bi-Di process from Mozilla before displaying text in the HTML page (preferred)
    • Just pass the original logical strings to the OS functions; in this case, however, the OS Bi-Di engine parameters (global direction, numeric type, diacritics, fontset...) MUST be the same as those assumed by the Mozilla Bi-Di engine.
The advantage of using an Arabic OS is that all GUI widgets and keyboard input will also work properly in Arabic. The system Arabic fonts can be used, or new fonts can be added, provided they follow the same fontset as the system's.
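
The double-processing risk can be shown with a toy model. The sketch below assumes a purely right-to-left string, for which reordering is a plain reversal, and a flag `os_reorders` that says whether the platform drawing call applies its own Bi-Di pass. It illustrates only the second option (handing logical strings to the OS), and both function names are hypothetical.

```python
def naive_visual_order(logical):
    """Toy reordering for a purely right-to-left string: reverse it."""
    return logical[::-1]

def prepare_for_draw(logical, os_reorders):
    """Return the string to hand to the platform text-drawing call.

    If the OS reorders by itself, pass the logical string through untouched;
    otherwise do the reordering here. Doing both would undo the reordering.
    """
    return logical if os_reorders else naive_visual_order(logical)
```

Applying `naive_visual_order` twice returns the original logical string, so a line reordered by both Mozilla and the OS would be drawn left to right in typing order, which is the incorrect output the text warns about.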

Detail Design : Introduce the Arabic new Charset :

 See the Frank Tang doc : How To Add Additional Charset : http://www.mozilla.org/docs/refList/i18n/addcharset.html

 

Hebrew specific:

This part has been created directly from the Dotan Dimet document "A Proposal For Preliminary Hebrew Support In Mozilla" (URL??), to which I made some light modifications (Please Dotan, send me your comments)

Document Charset:

There are several charsets commonly used on the web for Arabic/Hebrew languages. We decided to support the following:
ISO-8859-8 :
This is a VISUAL standard (according to RFC 1555). Apparently, 98% (??? to be verified) of Hebrew-language documents on the Internet use the webfont or visual encoding to display Hebrew. This codeset uses the same character codes as ISO 8859-8-i, but the Bi-Di rendering process has already been applied to the data stored in the HTML document. Thus, the Bi-Di process must NOT be applied a second time, and we just have to display the data as is, using an ISO 8859-8 font set. This support should be very easy to implement, and if that many sites really use it, it must be done first. However, the data cannot be used for editing purposes since the input sequence is lost.
It is commonly used on many web sites, such as :
  • (any URL here...)
ISO-8859-8-i :
It is an international standard adopted under the UNIX X11, Windows and Mac environments. It is used on web sites such as :
  • (any URL here, guys...)
This codeset is an IMPLICIT codeset, meaning that the rendering process has to follow the Bi-Di algorithm to reorder both Latin and Hebrew letters.
ISO-8859-8-e :
EXPLICIT encoding: apparently not used.
CP-1255 :
Default under Hebrew Windows.

    Mail Charset:

    We decided to use ISO 8859-8 as the Mail Charset since it is the standard on all platforms for data exchange (RFC 1555).

    Front-end Font Encoding

    XFE
    ISO-8859-8
    WinFE
    ISO 8859-8
    CP-1255
    MacFE
    ISO 8859-8

    Detail Design

     By Dotan Dimet (Email: dotan@usa.net )  (Modified by Franck Portaneri <fportaneri[at]langbox.com> - Dotan, any comments???):

    1 - Support of Hebrew Visual : This means adding support for "visual" display of the iso-8859-8 charset.

    Currently, most Hebrew-language documents on the Internet use the webfont or visual encoding to display Hebrew. The Visual encoding method does not rely on the OS or windowing environment for Hebrew support. In fact, it actively ignores such support by requiring the user to install special fonts and the page creator to write his Hebrew text in reverse (if he's using an application with Hebrew support) and use HTML tags such as PRE and NOBR to handle line-breaking. Despite the hassle, this lowest common denominator de-facto standard is in such wide use that it has been ratified officially, and Israeli standards bodies have determined that the following META tag should be used to label such pages:
    <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-8">

    Mozilla doesn't recognize this tag. Or rather, when it sees it, it sets the encoding to "Western (iso-8859-1)", and treats the Hebrew text as a standard (Western) 8-bit character set, without applying any Bi-Di algorithm. However, if the special "web fonts" are chosen for this encoding, the pages will be readable.

    Problems with this method include line-breaking (must be controlled by HTML tags, must not be done automatically by the display), printing (on systems with Hebrew support the BiDi algorithm kicks in, reversing text), and font choice (the limited selection of special web fonts is rather ugly).

    The two big advantages of this method are that it should work on systems without any built-in Hebrew support, and that it is the de-facto standard.

    The suggestion is to add support for this charset to the user interface. Instead of overriding the "Western" encoding, the user should have a separate entry for "ISO-8859-8 (visual)" where he can install his web fonts. A good improvement to this would be to bypass font/language association, and let the user use any installed Hebrew fonts to view pages. This in fact is what the Hebrew version of Internet Explorer allows you to do. You'll still need to install fonts if your system has no Hebrew support (and you'll still probably see the page title and any form elements as messed up), but if you have a Hebrew-aware system, you'll get more choice.

    The second level of this "Visual" support should be to make it available on Hebrew operating systems, either by disabling the system Bi-Di rendering in the TextOut (or equivalent) function, or by performing a reverse transformation on the Visual line to get back the logical (Implicit) one and letting the OS render it correctly (though this is a little tricky and resource-consuming).

    2. - Support of Hebrew Implicit: This means adding support for the logical or "implicit" interpretation of ISO-8859-8. Documents written with this method will not be reversed when viewed with applications that DON'T have Hebrew support; they will be shown in the input order. The charset tag used should be "iso-8859-8-i", and the Bi-Di algorithm should be used to present this text. It consists of support for codes that implicitly set the text's direction (e.g. Latin, digit or punctuation mark characters are considered LTR ("Left-To-Right") characters, while Hebrew characters are considered RTL ("Right-To-Left")). In fact, the Implicit coding represents and stores the exact sequence of keys pressed by the user when he/she wrote the text. Support for this encoding is necessary for text editing.

    On operating systems with Hebrew support, this implicit support is already there, and the Hebrew text will be displayed correctly; but without Bi-Di support within Mozilla, text selection for cut/paste operations and mouse pointing will not work properly. Here, however, we should take care that the Bi-Di process is not performed twice on the same line (in Mozilla and in the OS TextOut (or equivalent) functions).

    On standard (English) operating systems, if you use a font that the system knows is Hebrew to look at some text in the browser, it will be displayed the way it was written (and then cannot be read correctly).

    3 - The Fiddly Bits: These include support for tricky directionality codes, HTML 4 stuff, CSS(?), Forms, and Javascript.

    4- The support of Hebrew Explicit: This is really an optional case. Apparently, it is not really used for Web documents, unless someone can explain or give some input here. It consists of support for codes that explicitly set the text's direction (codes that exist in ISO-8859-8 and Unicode, as well as those in HTML 4) and that should be included to force a specific nested LTR or RTL sub-string within a line. The Bi-Di algorithm should attempt to interpret these codes and bypass the implicit ordering of characters to render its output text. The charset tag used could be "iso-8859-8-e".
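
    The three charset tags discussed above share one byte-to-character mapping and differ only in how the decoded text must be ordered. A minimal sketch, assuming Python's built-in iso8859_8 codec for decoding (the function name `decode_hebrew` and the mode strings are hypothetical):

```python
# Charset label -> handling mode, as described in the design items above.
HEBREW_MODES = {
    "iso-8859-8": "visual",      # already reordered: display as is
    "iso-8859-8-i": "implicit",  # logical order: run the Bi-Di algorithm
    "iso-8859-8-e": "explicit",  # logical order with explicit direction codes
}

def decode_hebrew(data, charset_label):
    """Decode bytes and report which Bi-Di handling the label calls for."""
    label = charset_label.strip().lower()
    if label not in HEBREW_MODES:
        raise ValueError("not an ISO-8859-8 family label: " + charset_label)
    # All three labels use the same character codes, so one codec serves them.
    text = data.decode("iso8859_8")
    return text, HEBREW_MODES[label]
```

    A renderer would skip reordering when the mode is "visual" and apply the Bi-Di algorithm otherwise, which avoids the double processing mentioned earlier.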

     
    Reference and Related Specification

    W3C Documents:

    RFC:

    Character Sets:

    MIME Charset Name

    Related Engineering Information:

    Related Information and Resources:

     
    Open Issues

    Any URL, pointer...??
    • How to deactivate or bypass BiDi system routines under a Bi-Di OS (Arabic Mac or Windows, for instance)

     

     
    Free Resources

    BiDi Algorithm Code

    XFE fonts:

     http://www.langbox.com/bidimozilla/fontXFE (See README file)

     
    Schedule

    To be determined ...