Something in HoneyComb!

Hi, I’ve just received a strange official answer from a Google guy regarding our issue.

I’ve submitted a BiDi patch for review in Android’s Gerrit. The reviewer, Kenny Root, explains that text layout received a major change for HoneyComb and that I should wait for HoneyComb’s source to be released before submitting work in this area. I wonder what the change is, but BiDi is one of the biggest gaps in Android’s text handling, so it should be relevant…

We’ll need to wait for HoneyComb to see. Meanwhile I’m targeting previous releases (Gingerbread, FroYo, Eclair) and the modders, and preparing to submit a patch for WebKit BiDi and Arabic reshaping….


Gingerbread – Another Sad Story

Many people have asked me. Sorry to say, but Gingerbread made no progress on RTL languages, and probably no progress on internationalization at all.

UPDATE: HoneyComb too (thanks PapaDocta).

Google is busy hunting manufacturers and expanding Android adoption in tablets, and before that it was busy doing performance polishing in FroYo. I guess we, the open-source guys in the Middle East, are the only realistic source for a solution.

I’m saying again: anyone seeking to solve the problem of RTL languages in Android should upload his patches to Android’s Gerrit Code Review and insist on them being approved. This makes the change an immediately-available official work. Another good place is CyanogenMod’s Gerrit because CM is used as a basis for many mods.

My Solution II – Arabic Reshaping for Android (in progress)

Now the bigger part is integrating reshaping without messing up things.

Integration of Arabic Reshaping

First, lots of the logic in StaticLayout should be replaced.

When StaticLayout breaks text into lines, it simply measures each character alone and sums the individual character widths in each word, applying rules from the Unicode Text Segmentation standard. Lines are extended sequentially to contain the maximum number of words that fit.
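
A minimal sketch of that greedy, width-summing approach, to make the idea concrete. This is not StaticLayout's actual code: the class and method names are hypothetical, the width function is supplied by the caller, and words are split on plain spaces instead of applying the Unicode segmentation rules.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntToDoubleFunction;

// Hypothetical illustration only; not part of android.text.
class GreedyLineBreaker {
    // A word's width is the sum of the widths of its individual characters.
    static double wordWidth(String word, IntToDoubleFunction charWidth) {
        double w = 0;
        for (int i = 0; i < word.length(); i++) {
            w += charWidth.applyAsDouble(word.charAt(i));
        }
        return w;
    }

    // Extends each line with as many words as fit into maxWidth.
    static List<String> breakLines(String text, double maxWidth,
                                   IntToDoubleFunction charWidth) {
        List<String> lines = new ArrayList<>();
        StringBuilder line = new StringBuilder();
        double lineWidth = 0;
        double spaceWidth = charWidth.applyAsDouble(' ');
        for (String word : text.split(" ")) {
            double w = wordWidth(word, charWidth);
            double needed = line.length() == 0 ? w : lineWidth + spaceWidth + w;
            if (line.length() > 0 && needed > maxWidth) {
                // The word does not fit: close the current line, start a new one.
                lines.add(line.toString());
                line.setLength(0);
                needed = w;
            }
            if (line.length() > 0) {
                line.append(' ');
            }
            line.append(word);
            lineWidth = needed;
        }
        if (line.length() > 0) {
            lines.add(line.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Every character is one unit wide in this toy measure.
        System.out.println(breakLines("aa bb cc dd", 5, c -> 1.0));
    }
}
```

Because each character is measured in isolation, any word whose displayed form differs from its stored form (as with Arabic) gets the wrong width here.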

After breaking words, reshaping should be performed where necessary so that Arabic (or Arabic-like) words are measured correctly. This is especially important for lam-alef and other combinations that significantly affect the displayed size of words (different shapes of the same Arabic letter might have different sizes). With that, StaticLayout is patched to break lines correctly. I removed the whole line-breaking logic and rewrote it in a separate class android.text.LineBreaker (not ready yet).
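
To illustrate why measuring before reshaping gives wrong widths, here is a small sketch that handles only the lam-alef case. The class, the one-unit-per-glyph measure, and the choice of always using the isolated ligature form (U+FEFB) are simplifying assumptions of mine, not the LineBreaker code.

```java
// Hypothetical illustration only; a real shaper also picks contextual forms.
class LamAlefMeasure {
    static final char LAM = '\u0644';
    static final char ALEF = '\u0627';
    // ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM (U+FEFB)
    static final char LAM_ALEF = '\uFEFB';

    // Replaces every lam followed by alef with a single ligature character.
    static String combineLamAlef(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == LAM && i + 1 < text.length() && text.charAt(i + 1) == ALEF) {
                out.append(LAM_ALEF);
                i++; // the alef was consumed by the ligature
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Toy measure: one unit per displayed glyph.
    static int measure(String glyphs) {
        return glyphs.length();
    }

    public static void main(String[] args) {
        String word = "\u0644\u0627"; // lam + alef
        System.out.println("bare:     " + measure(word));
        System.out.println("reshaped: " + measure(combineLamAlef(word)));
    }
}
```

With this toy measure the bare text is two units wide and the reshaped text is one, which is the kind of mismatch that makes lines break in the wrong place.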

The next and no less important part of the job is left to Layout. The methods that report cursor positions should also be modified to work according to Unicode Text Segmentation (currently they don’t), specifically letter boundaries (Unicode calls them Grapheme Boundaries). They should handle special cases like not stopping inside lam-alef sequences. They should also map real text indexes to indexes in the reshaped form and back.
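
One possible way to keep the two index spaces in sync is to record, for every character of the reshaped form, where it starts in the bare text. The sketch below does that for the lam-alef ligature only; the class and method names are hypothetical and this is not the patched Layout code.

```java
import java.util.Arrays;

// Hypothetical illustration only: maps reshaped indexes back to bare-text indexes.
class ShapedIndexMap {
    final String shaped;
    // logicalStart[i] is the bare-text index where shaped character i begins;
    // the last entry is the bare-text length.
    final int[] logicalStart;

    ShapedIndexMap(String text) {
        StringBuilder out = new StringBuilder();
        int[] starts = new int[text.length() + 1];
        int n = 0;
        for (int i = 0; i < text.length(); i++) {
            starts[n++] = i;
            char c = text.charAt(i);
            if (c == '\u0644' && i + 1 < text.length() && text.charAt(i + 1) == '\u0627') {
                out.append('\uFEFB'); // lam + alef become one ligature character
                i++;
            } else {
                out.append(c);
            }
        }
        starts[n] = text.length();
        shaped = out.toString();
        logicalStart = Arrays.copyOf(starts, n + 1);
    }

    // Bare-text index at which the given reshaped character starts.
    int shapedToLogical(int shapedIndex) {
        return logicalStart[shapedIndex];
    }

    // Moves a bare-text cursor offset back to the nearest valid stop,
    // so the cursor never lands between a lam and its alef.
    int snapToBoundary(int logicalOffset) {
        int result = 0;
        for (int start : logicalStart) {
            if (start > logicalOffset) {
                break;
            }
            result = start;
        }
        return result;
    }

    public static void main(String[] args) {
        // seen, lam, alef, meem
        ShapedIndexMap map = new ShapedIndexMap("\u0633\u0644\u0627\u0645");
        System.out.println(Arrays.toString(map.logicalStart));
        System.out.println(map.snapToBoundary(2));
    }
}
```

For the four-letter word in `main`, the reshaped form has three characters and offset 2 (between lam and alef) snaps back to offset 1.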

Next, we should modify android.text.Styled to perform Arabic reshaping when necessary. Styled is used by layouts to measure text and draw it; it is the medium between Layout and the abstract graphics layer.

Performance Considerations

The text itself is always stored in its bare form in layouts. The reshaped form is needed when doing measurements (cursor movement or placement) or drawing. Since the same layout might be redrawn several times and cursor movements are performed often, caching is the logical choice. I can think of big caches in Styled for general strings, or a cache in each Layout object that stores the shaped form of the text (only reshaped parts are stored).
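
As a rough sketch of the first option (a bounded cache for general strings), a LinkedHashMap in access order gives a simple least-recently-used cache. The class name, the capacity, and the pluggable shaper function are assumptions for illustration; nothing like this exists in Styled today.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration only: LRU cache from bare text to its reshaped form.
class ShapedTextCache {
    private final Map<String, String> cache;
    private final Function<String, String> shaper;
    int misses = 0; // counts how often the shaper actually had to run

    ShapedTextCache(int capacity, Function<String, String> shaper) {
        this.shaper = shaper;
        // accessOrder = true keeps the least recently used entry first.
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > capacity;
            }
        };
    }

    // Returns the reshaped form, running the shaper only on a cache miss.
    String shaped(String text) {
        String result = cache.get(text);
        if (result == null) {
            misses++;
            result = shaper.apply(text);
            cache.put(text, result);
        }
        return result;
    }

    public static void main(String[] args) {
        // toUpperCase stands in for a real reshaping function.
        ShapedTextCache c = new ShapedTextCache(2, String::toUpperCase);
        c.shaped("a");
        c.shaped("a");
        System.out.println("misses: " + c.misses);
    }
}
```

A per-Layout cache would trade this global bound for simpler invalidation, since the cached form can be dropped whenever the layout's text changes.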

Work Progress

  • Patch StaticLayout to break lines correctly. done
  • Patch Layout and Styled to measure and draw reshaped Arabic text. pending
  • Patch Layout to mark selections and move cursors correctly. pending

I haven’t uploaded any code changes regarding this step. I’ll do so incrementally.

My Solution I – BiDi for Android (done)

As I explained before, the solution is branched off the master of the stock Android source. This is the source on GitHub; the patches are under the bidi-dev branch.

Android’s source already includes the ICU project, which has solid implementations of the Unicode BiDi algorithm and of Arabic reshaping. ICU for Java (icu4j) isn’t there, but android.text.AndroidBidi provides native access to ICU for C’s BiDi algorithm (with a performance impact, anyway..). I’ll consider adding icu4j later, but the decision needs care because the icu4j jar is 6MB+ in size (huge in Android terms).

android.text.Bidirections was intended to replace Layout.Directions but modifying public API breaks the build and requires a special build ‘make update-api’ to update the public API index and carry on. For now Bidirections is simply wrapped in Directions, and I’ll change that if Gerrit reviewers agree.

The dumb BiDi analysis in StaticLayout is deprecated. BiDi is now performed externally by the wrapper around ICU’s library. Everything in Layout that reads directions is modified to correctly use BiDi data (visual order, logical order, reversals, …).
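
To give a feel for the kind of BiDi data a layout reads, the sketch below lists the directional runs of a string. It uses the standard java.text.Bidi class as a stand-in; it is not the android.text.AndroidBidi wrapper or the Bidirections class from the patches, and the class name here is hypothetical.

```java
import java.text.Bidi;

// Hypothetical illustration only, using java.text.Bidi as a stand-in for the ICU wrapper.
class BidiRuns {
    // Describes each logical run as [start,limit)@level.
    static String describeRuns(String text) {
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_LEFT_TO_RIGHT);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bidi.getRunCount(); i++) {
            sb.append('[').append(bidi.getRunStart(i))
              .append(',').append(bidi.getRunLimit(i))
              .append(")@").append(bidi.getRunLevel(i))
              .append(' ');
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // English text, a Hebrew word, then a number.
        System.out.println(describeRuns("I have \u05E4\u05D5\u05E8\u05E9\u05D4 80"));
    }
}
```

Runs with an odd level are drawn right to left, runs with an even level left to right, and a layout needs both the logical run boundaries and the levels to place text and cursors.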

With that part done, I can safely say that my patches provide reliable handling of Hebrew in Android, much like what you get on Windows. That is because Hebrew is RTL but doesn’t require reshaping.

Next step: Reshaping for Arabic characters.

Issues With Current Solutions

All current solutions are either incorrect or partial.

Since Android’s current handling of bidirectional text is equivalent to a two-level restriction on Unicode BiDi, patches on top of it do not solve the problem. The whole Layout.Directions should be replaced and the methods using it in Layout should be modified accordingly. This isn’t a system-wide change because Directions is never used outside the android.text package, mostly in Layout and its subclasses.

Reshaping letters in the wrong layer is the most common mistake. Almost all available patches (including those in CyanogenMod) perform reshaping in the graphics layer, either the abstract graphics (Canvas etc.) or the native graphics (Skia engine).

Text is handled by layouts. When a text is actually drawn, the action has mostly been requested through a layout object. Layout analysis includes many things like breaking lines and measuring contents. The graphics layer simply draws a sequence of letters to some rectangular area on the screen. If the low graphics layer modifies the text, we get unexpected behavior like blanks/gibberish at end of line, crazy cursor movements, and an inconsistent display.

Modifications regarding reshaping should go in layouts much like BiDi. This is because a layout must be aware of the displayed form of the text in order to keep everything consistent, especially measurements and cursor movement and placement (for example, place cursor in correct index when you touch a text).

To apply real BiDi and shaping, we should first go back to stock versions of all purpose-modified files (Layout, StaticLayout, Canvas, …). We should revert any changes to Skia too.

Current BiDi Handling in Android

Android’s current handling of BiDi (Bidirectional) text is very lacking.

Currently, the layout of any text containing RTL characters is android.text.StaticLayout. It divides a line of text into segments and the segments are always displayed in their original order, each either straight or reversed. Effectively this is equivalent to a two-level restriction on the full Unicode BiDi algorithm.

For example, the sentence I have followed by פורשה followed by 80 (I have an 80′ Porsche) will be displayed I have פורשה 08 instead of I have פורשה 80. The full Unicode BiDi algorithm gives the English part a level of 0, the Hebrew part a level of 1, and the number part a level of 2. But Android’s implementation results in two levels only, simply giving the numbers the same level as the Hebrew part and flipping them alongside it as a single RTL segment of text.
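
The difference can be reproduced with a short sketch. It takes the levels from the standard java.text.Bidi class, reorders characters by reversing runs from the highest level down (rule L2 of the Unicode BiDi algorithm), and then repeats the reordering with all levels clamped to 1 as a rough model of the two-level restriction. The class and method names are hypothetical, and the clamping is my approximation of the behavior described above, not Android's code.

```java
import java.text.Bidi;

// Hypothetical illustration only: full BiDi levels versus a two-level approximation.
class TwoLevelVsFullBidi {
    // Resolved embedding level of every character, for a left-to-right paragraph.
    static byte[] levels(String text) {
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_LEFT_TO_RIGHT);
        byte[] lv = new byte[text.length()];
        for (int i = 0; i < lv.length; i++) {
            lv[i] = (byte) bidi.getLevelAt(i);
        }
        return lv;
    }

    // Rough model of the restriction: nothing is allowed above level 1.
    static byte[] clampToTwoLevels(byte[] lv) {
        byte[] out = new byte[lv.length];
        for (int i = 0; i < lv.length; i++) {
            out[i] = (byte) Math.min(lv[i], 1);
        }
        return out;
    }

    // Rule L2: from the highest level down to the lowest odd level,
    // reverse every maximal run of characters at that level or higher.
    static String reorder(String text, byte[] lv) {
        char[] chars = text.toCharArray();
        int max = 0;
        int minOdd = Integer.MAX_VALUE;
        for (byte b : lv) {
            max = Math.max(max, b);
            if (b % 2 == 1) {
                minOdd = Math.min(minOdd, b);
            }
        }
        for (int level = max; level >= minOdd; level--) {
            int i = 0;
            while (i < chars.length) {
                if (lv[i] < level) {
                    i++;
                    continue;
                }
                int j = i;
                while (j < chars.length && lv[j] >= level) {
                    j++;
                }
                for (int a = i, b = j - 1; a < b; a++, b--) {
                    char t = chars[a];
                    chars[a] = chars[b];
                    chars[b] = t;
                }
                i = j;
            }
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        String text = "I have \u05E4\u05D5\u05E8\u05E9\u05D4 80";
        byte[] full = levels(text);
        // Strings below are in visual (left-to-right on screen) order.
        System.out.println(reorder(text, full));
        System.out.println(reorder(text, clampToTwoLevels(full)));
    }
}
```

With full levels the digits keep their order (80); with the clamped levels they are reversed together with the Hebrew word and come out as 08.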

Reshaping is changing a letter’s form according to its context (initial, medial, final, and isolated forms). It is relevant to Arabic and Arabic-like languages such as Persian and Urdu. Reshaping might also combine several letters into one: like ل followed by ا into لا.
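
A toy sketch of contextual form selection, covering only three letters (alef, beh, lam) through their Unicode presentation forms. It ignores ligatures, diacritics, and every other letter, and the class and its table are hypothetical; ICU's shaping code is the real reference.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: picks isolated/final/initial/medial forms for three letters.
class TinyArabicShaper {
    // Forms are ordered {isolated, final, initial, medial}.
    // Letters that never join to the following letter have only the first two.
    static final Map<Character, char[]> FORMS = new HashMap<>();
    static {
        FORMS.put('\u0627', new char[] {'\uFE8D', '\uFE8E'});                     // alef
        FORMS.put('\u0628', new char[] {'\uFE8F', '\uFE90', '\uFE91', '\uFE92'}); // beh
        FORMS.put('\u0644', new char[] {'\uFEDD', '\uFEDE', '\uFEDF', '\uFEE0'}); // lam
    }

    // True if the letter connects to the letter that follows it.
    static boolean joinsToNext(char c) {
        char[] forms = FORMS.get(c);
        return forms != null && forms.length == 4;
    }

    // True if the letter connects to the letter that precedes it.
    static boolean joinsToPrevious(char c) {
        return FORMS.containsKey(c);
    }

    static String shape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            char[] forms = FORMS.get(c);
            if (forms == null) {
                out.append(c); // not in the toy table: leave as-is
                continue;
            }
            boolean prev = i > 0 && joinsToNext(text.charAt(i - 1));
            boolean next = forms.length == 4
                    && i + 1 < text.length()
                    && joinsToPrevious(text.charAt(i + 1));
            int index = prev ? (next ? 3 : 1) : (next ? 2 : 0);
            out.append(forms[index]);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // beh, alef, beh: expected forms are initial, final, isolated.
        String shaped = shape("\u0628\u0627\u0628");
        for (char c : shaped.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }
    }
}
```

The second beh comes out isolated because alef does not connect to the letter after it, which shows that a letter's form depends on both neighbors.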

Naturally, Android doesn’t support reshaping either. The system always assumes that every character is displayed as-is, and that the number of characters in the text doesn’t change visually.