6.9.6 Internationalization and localization

A program may need to communicate with its user in the user’s language, and it may have users with different languages. We do not want to produce one version of the program for each language, so we write one internationalized program that can use localization features to communicate in the user’s language.

Apart from the words mentioned here, you will probably want to use Unicode to write the localized strings; you probably do not need to use the xchar words (see Xchars and Unicode) in that context, but they are there if you need them.

Moreover, you may need placeholders in localized strings (e.g., for amounts of currency) that are later replaced with the actual values. The word substitute and its relatives (see Substitute) have been designed for that purpose.

The basic idea in an internationalized program is that instead of, e.g.,

." Please enter your name:"

you write

L" Please enter your name:" locale@ type
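L" works the same way inside a colon definition. It parses the string when the definition is compiled, and the word pushes the lsid each time it runs. The following is a minimal sketch that assumes a Gforth version providing these gforth-experimental words. The word name .name-prompt is hypothetical.

```forth
\ Hypothetical prompt word: the L" string is the lookup key, and
\ locale@ returns its localization for whichever locale is current.
: .name-prompt ( -- )
  L" Please enter your name:" locale@ type ;

\ If nothing has been localized yet, this shows the L" text itself.
.name-prompt
```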

In the following examples, we use the locales defined with

locale: de    \ German (generic)
locale: de_AT \ German (as used in Austria)
locale: de_CH \ German (as used in Switzerland)
locale: de_DE \ German (as used in Germany)

In addition, there are the locales program and default.

You can activate a locale, i.e., make it the current locale, with, e.g.,

locales:de_AT

Note that, unlike most Forth words, locales are case-sensitive, so locales:de_at would not work.

The following examples output localized strings with this code (the localizations themselves are set up later in this section):

L" Please enter your name:" locale@ cr type
L" cauliflower" locale@ cr type
L" street" locale@ cr type
L" something else" locale@ cr type
L" bank [geography]" locale@ cr type
L" bank [finance]" locale@ cr type

When the current locale is de_AT, the output is:

Bitte geben Sie Ihren Namen ein:
Karfiol
Straße
something else
Ufer
Bank

When the current locale is de_CH, the output is:

Bitte geben Sie Ihren Namen ein:
Blumenkohl
Strasse
something else
Ufer
Bank

In the localization data used for this example (see below), most of these localizations are inherited from the locale de. The only de_AT-specific localization is “Karfiol”, and the only de_CH-specific one is “Strasse”. Neither de nor default has a localization for “something else”, so the text of L" something else" itself is used. This is also what we always get when the current locale is program.

So there is a sequence of fallbacks for looking up localizations. For a locale named X_Y, lookup falls back first to X, then to default, and finally to program. If the current locale’s name contains no underscore, the fallback sequence starts at default.
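The fallback chain can be tried out interactively. This minimal sketch assumes a Gforth that provides these gforth-experimental words. It uses locale! (described below) to store a localization for the current locale, and "..." string literals as in the locale! example below. It reproduces the “street” entries of this section’s example.

```forth
\ Define the fallback locale de before its variants.
locale: de
locale: de_AT
locale: de_CH

\ A generic German entry, plus a Swiss-specific override.
locales:de     "Straße"  L" street" locale!
locales:de_CH  "Strasse" L" street" locale!

\ de_AT has no entry of its own, so lookup falls back to de.
locales:de_AT  L" street" locale@ cr type
\ de_CH has its own entry, so no fallback is needed.
locales:de_CH  L" street" locale@ cr type
```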

The locale default is a fallback for when there is no more specific localization (typically because the localization is missing). Often, as for “something else”, default has no entry either, and lookup continues to the L" string itself. But some L" strings, such as L" bank [geography]" and L" bank [finance]", are too developer-oriented to show to users. For these, we put a user-oriented string (“bank” for both) in the default locale.
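The following sketch illustrates the default-locale idea, again using locale! (described below). Both developer-oriented strings get the user-oriented default “bank”, while the program locale still shows the original L" text. It assumes a Gforth that provides these gforth-experimental words.

```forth
\ Two distinct lsids for two concepts, one user-visible word.
locales:default  "bank" L" bank [finance]"   locale!
locales:default  "bank" L" bank [geography]" locale!

locales:program  L" bank [finance]" locale@ cr type  \ developer view
locales:default  L" bank [finance]" locale@ cr type  \ user view
```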

So how do we provide the localized string for a given program string? A simple way is to use locale!, e.g.:

locales:de_AT
"Karfiol" L" cauliflower" locale!

However, defining all localizations with locale! is cumbersome and error-prone. The string you give to L" before locale! must be spelled exactly like the corresponding L" string that is used with locale@ in the program.

A better alternative is to use locale-csv-out after loading the program to save all its L" strings and all the existing localizations to a CSV (comma-separated values) file. The first column of this file contains the L" strings, the others the various localizations for that string. E.g., the CSV file for the example in this section contains:

"program","default","de","de_AT","de_CH","de_DE"
"bank [finance]","bank","Bank","","",""
"bank [geography]","bank","Ufer","","",""
"Please enter your name:","","Bitte geben Sie Ihren Namen ein:","","",""
"cauliflower","","Blumenkohl","Karfiol","",""
"street","","Straße","","Strasse",""
"something else","","","","",""

The first line contains the names of the locales. Many entries are empty strings. An empty entry means that the locale of that column has no localization for the L" string, and locale@ uses the next non-empty fallback instead.
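The following is a minimal sketch of exporting. The file name l10n-export.csv is only an example. The single localization is set up with locale! so that the CSV contains something besides empty entries. .locale-csv (see below) prints the same data to the terminal. The sketch assumes a Gforth that provides these gforth-experimental words.

```forth
\ Set up one locale with one localization (example data).
locale: de
locales:de  "Blumenkohl" L" cauliflower" locale!

\ Write all known L" strings and their localizations to a CSV file.
locale-csv-out l10n-export.csv

\ Show the same CSV data on the terminal instead.
.locale-csv
```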

The way to add localizations is to edit the CSV file. This is not easy with a simple text editor if there are many columns. One way to work around this is to use a spreadsheet program or an editor that has good CSV support. Another way is to distribute the localizations across several files (which is also better for letting several people work on the localizations at the same time). E.g., one file could contain the localizations for de and its variants, while another could contain the localizations for fr (French) and its variants.

Once a localization has been edited into a CSV file, one can load the CSV file with locale-csv. All the locales mentioned in the CSV file will be defined automatically; if you use locale-csv, do not use locale: afterwards for the locales in the CSV file.
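To try locale-csv without preparing a file by hand, the sketch below first writes a tiny CSV file with standard Forth file words and then imports it. The file name l10n-import.csv and the value csv-fd are hypothetical, and the sketch assumes that locale-csv finds the file in the current directory. In line with the note above, it does not use locale: for de or de_AT, because locale-csv defines them.

```forth
\ Create a two-line CSV file (hypothetical name and contents).
s" l10n-import.csv" w/o create-file throw value csv-fd
s\" \"program\",\"default\",\"de\",\"de_AT\"\n"        csv-fd write-file throw
s\" \"cauliflower\",\"\",\"Blumenkohl\",\"Karfiol\"\n" csv-fd write-file throw
csv-fd close-file throw

\ Import it; this defines the locales de and de_AT automatically.
locale-csv l10n-import.csv

locales:de_AT  L" cauliflower" locale@ cr type  \ entry from the de_AT column
locales:de     L" cauliflower" locale@ cr type  \ entry from the de column
```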

The lines in the CSV file are in the order in which Gforth encountered the L" strings. If the CSV file is generated just from the L" strings in the program, this order can help when producing the localizations, because related L" strings are grouped together. If new L" strings are added during maintenance, first load the CSV file, then the program, and then write out a new CSV file. The new L" strings (still to be localized) will be at the end of that file.

A locale-string identifier (lsid) is an opaque, cell-sized token that identifies an L" string.

L" ( Interpretation "string<">" – lsid; Compilation "string<">" –  ) gforth-experimental “l-quote”

At text-interpretation time, parse string. At run time, push the lsid associated with string. Each string has a unique lsid. If no lsid for the string exists yet, a new one is created; otherwise, the existing lsid is returned. This means that you can refer to the same lsid with L" at different places in the source code by using the same string. You may want two different lsids (e.g., because you refer to two different concepts) but use the same user-centric text in L". In that case, append " [specifier]" to the text, e.g., L" bank [finance]" or L" bank [geography]". You may then want to add a user-centric, non-unique default localization (e.g., “bank”).
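The lsid rules above can be checked by comparing lsids directly, since an lsid is a single cell. This minimal sketch assumes a Gforth that provides these gforth-experimental words. The word street-id is hypothetical and only shows an L" at a different source location.

```forth
\ The same text always yields the same lsid...
L" street" L" street" = .                    \ -1 (true)

\ ...also when L" occurs at another place in the source code.
: street-id ( -- lsid )  L" street" ;
street-id L" street" = .                     \ -1 (true)

\ A specifier makes the lsid distinct.
L" bank [finance]" L" bank [geography]" = .  \ 0 (false)
```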

locales ( ) gforth-experimental

This case-sensitive vocabulary contains the locales. Typical use: locales:name, e.g., locales:de_AT.

native@ ( lsid – c-addr u  ) gforth-experimental “native-fetch”

c-addr u is the L" string for lsid (i.e., the text-interpretation argument of L").
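native@ ignores the current locale, which makes it useful for seeing which L" string a localized text belongs to. This minimal sketch assumes a Gforth that provides these gforth-experimental words. The default localization is example data.

```forth
locales:default  "bank" L" bank [finance]" locale!

\ locale@ depends on the current locale, native@ does not.
locales:default  L" bank [finance]" locale@ cr type  \ localized text
                 L" bank [finance]" native@ cr type  \ the L" string
```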

locale@ ( lsid – c-addr u  ) gforth-experimental “locale-fetch”

c-addr u is the localized string for lsid in the current locale. If the current locale has a name of the form X_Y and contains no localized string for lsid, lsid is looked up in locale X. If no localized string is found in X (or if the current locale’s name contains no underscore), lsid is looked up in the locale default. If no localized string is found in the locale default, lsid is looked up in the locale program (i.e., c-addr u is the text-interpretation argument of L").

program ( ) gforth-experimental

locales:program becomes the current locale. When this locale is current, locale@ produces the string used for identifying the lsid (i.e., the string parsed by L"). This locale is useful for development: One can see which lsid is used in which context.
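A quick way to see what program does is to compare locale@ with native@ while it is current. They agree even for an lsid that has a localization in another locale. This minimal sketch assumes a Gforth that provides these gforth-experimental words. The de localization is example data.

```forth
locale: de
locales:de  "Blumenkohl" L" cauliflower" locale!

\ With program current, locale@ returns the L" string (compare gives 0).
locales:program  L" cauliflower" locale@  L" cauliflower" native@  compare .
\ With de current, the localization is returned.
locales:de       L" cauliflower" locale@ cr type
```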

default ( ) gforth-experimental

locales:default is the default locale if the user has not set one. Most lsids don’t have a specific default string, so lookup falls back to the program locale. But if you have a developer-centric program string that is inappropriate for end users (in particular, if the program string contains an extra specifier), you will want to define a user-centric string in the default locale.

locale-csv-out ( "name" –  ) gforth-experimental “locale-csv-out”

Create file name and write the locale database to it in CSV format.

locale-csv ( "name" –  ) gforth-experimental “locale-csv”

Import comma-separated value (CSV) table into locales. The first line contains the locale names (column headers). The program locale must be leftmost. Fallback locales like de must precede more specific locales like de_AT. The other lines contain the L" string (first column) and the corresponding localizations. Each column contains the localizations for a specific locale. Empty entries mean that this locale does not define a localization for this L" string, resulting in using the localization from a fallback locale instead.

.locale-csv ( ) gforth-experimental “dot-locale-csv”

Write the locale database in CSV format to the user output device.

locale! ( c-addr u lsid –  ) gforth-experimental “locale-store”

After executing locale!, the localized string for lsid in the current locale is c-addr u.

Locale: ( "name" –  ) gforth-experimental “Locale-colon”

Defines a new locale l with name name in locales.
name execution: ( – ) l becomes the current locale.
For locales with names of the form X_Y, define X first in order to establish X as a fallback for X_Y.