A program may need to communicate with its user in the user’s language, and it may have users with different languages. We do not want to produce one version of the program for each language, so we write one internationalized program that can use localization features to communicate in the user’s language.
Apart from the words mentioned here, you will probably want to use Unicode to write the localized strings; you probably do not need to use the xchar words (see Xchars and Unicode) in that context, but they are there if you need them.
Moreover, you may need to put placeholders (e.g., for amounts of
currency) in localized strings and replace them with the real values
later. The word substitute and friends (see Substitute)
have been designed for that purpose.
The basic idea in an internationalized program is that instead of, e.g.,
." Please enter your name:"
you write
L" Please enter your name:" locale@ type
In the following examples, we use the locales defined with
locale: de \ German (generic)
locale: de_AT \ German (as used in Austria)
locale: de_CH \ German (as used in Switzerland)
locale: de_DE \ German (as used in Germany)
In addition, there are the locales program and default.
You can activate a locale, i.e., make it the current locale, with, e.g.,
locales:de_AT
Note that, unlike most Forth words, locales are case-sensitive, so
locales:de_at would not work.
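Because locale@ looks up the string when it is executed, the same word can serve several locales. The following is a minimal sketch of that idea; ask-name is a hypothetical word introduced only for illustration, and the sketch assumes that the locales have not been defined yet in the current session.

```forth
\ Sketch: one word, output depends on the locale that is current
\ when the word runs. ask-name is a hypothetical example word.
locale: de
locale: de_AT

: ask-name ( -- )
  L" Please enter your name:" locale@ cr type ;

locales:de_AT ask-name   \ looked up in de_AT and its fallbacks
locales:de    ask-name   \ looked up in de and its fallbacks
```

As long as no localizations have been defined, both calls fall back to the L" string itself.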
In the following examples, we use the following code to output localized strings, after first setting the strings (shown later):
L" Please enter your name:" locale@ cr type
L" cauliflower" locale@ cr type
L" street" locale@ cr type
L" something else" locale@ cr type
L" bank [geography]" locale@ cr type
L" bank [finance]" locale@ cr type
When the current locale is de_AT, the output is:
Bitte geben Sie Ihren Namen ein:
Karfiol
Straße
something else
Ufer
Bank
When the current locale is de_CH, the output is:
Bitte geben Sie Ihren Namen ein:
Blumenkohl
Strasse
something else
Ufer
Bank
In the localization data used for this example (see below), most of
these localizations are inherited from the locale de, with the
only de_AT-specific localization being “Karfiol” and the only
de_CH-specific localization being “Strasse”. There is no
de and no default localization for “something else”,
so the text in the string L" something else" is used (we always
get that if we use the program locale).
So there is a sequence of fallbacks for looking up localizations: For
the general locale X_Y, the first fallback is to
X, next to default, and finally to program.
If the current locale’s name contains no underscore, the fallback
sequence starts at default.
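The fallback order can be illustrated with a small model in standard Forth. This is only a sketch of the rule described above, not the way Gforth implements the lookup; underscore-pos and .fallbacks are hypothetical helper words, and the model handles ordinary locale names only (not default or program themselves).

```forth
\ Model of the fallback rule; not Gforth's actual implementation.

\ n is the index of the first underscore in the string, or u if none.
: underscore-pos ( c-addr u -- n )
  dup 0 ?do
    over i + c@ [char] _ = if
      2drop i unloop exit
    then
  loop
  nip ;

\ Print the locales that are consulted, in order, for a locale name.
: .fallbacks ( c-addr u -- )
  2dup type space              \ the locale itself
  2dup underscore-pos tuck <>  \ true if the name has the form X_Y
  if type space else 2drop then  \ X, if present
  ." default program" ;

cr s" de_AT" .fallbacks   \ prints: de_AT de default program
cr s" de"    .fallbacks   \ prints: de default program
```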
The locale default is a fallback if there is no more specific
localization (typically, if the localization is missing), as for
“something else”. In many cases (as for “something else”) there
is no default localization, and the fallback continues to the
L" string. But in some cases (e.g., in L" bank
[geography]" and L" bank [finance]") these strings are too
developer-oriented, and we put a user-oriented string (“bank” for
both) in the default locale.
So how do we provide the localized string for a given program
string? A simple way is to use locale!, e.g.:
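A sketch of what such definitions might look like for part of the example data in this section. It assumes the stack effect locale! ( c-addr u lsid – ), which is inferred from the glossary description of locale!, and it assumes that the locales have not been defined yet. The definitions are wrapped in a hypothetical word set-german so that the strings are compiled into the dictionary and remain valid whether or not locale! copies them.

```forth
\ Sketch only; assumes locale! ( c-addr u lsid -- ).
locale: de       \ define de first, so it becomes the fallback
locale: de_AT    \ for de_AT and de_CH
locale: de_CH

: set-german ( -- )   \ hypothetical helper word
  locales:de
  s" Bitte geben Sie Ihren Namen ein:" L" Please enter your name:" locale!
  s" Blumenkohl" L" cauliflower" locale!
  s" Straße"     L" street"      locale!
  locales:de_AT
  s" Karfiol"    L" cauliflower" locale!
  locales:de_CH
  s" Strasse"    L" street"      locale! ;

set-german
```

Each locale! stores the string for whichever locale is current at that point, so the locale has to be activated before the corresponding definitions.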
However, defining localizations for all L" strings with
locale! is cumbersome and error-prone: the L" string
passed to locale! must be spelled exactly like the L"
string that is used with locale@ in the program.
A better alternative is to use locale-csv-out after loading the
program to save all its L" strings and all the existing
localizations to a CSV (comma-separated values) file. The first
column of this file contains the L" strings, the others the
various localizations for that string. E.g., the CSV file for the
example in this section contains:
"program","default","de","de_AT","de_CH","de_DE"
"bank [finance]","bank","Bank","","",""
"bank [geography]","bank","Ufer","","",""
"Please enter your name:","","Bitte geben Sie Ihren Namen ein:","","",""
"cauliflower","","Blumenkohl","Karfiol","",""
"street","","Straße","","Strasse",""
"something else","","","","",""
The first line contains the names of the locales. Many entries contain
empty strings; in that case, there is no localization for the
L" string for the locale of that column (and locale@
will use the next fallback that is not empty).
The way to add localizations is to edit the CSV file. This is not
easy with a simple text editor if there are many columns. One way to
work around this is to use a spreadsheet program or an editor that has
good CSV support. Another way is to distribute the localizations
across several files (which is also better for letting several people
work on the localizations at the same time). E.g., one file could
contain the localizations for de and its variants, while
another could contain the localizations for fr (French) and its
variants.
Once a localization has been edited into a CSV file, one can load the
CSV file with locale-csv. All the locales mentioned in the CSV
file will be defined automatically; if you use locale-csv, do
not use locale: afterwards for the locales in the CSV file.
The lines in the CSV file are ordered by the order in which the
L" strings are found by Gforth. If the CSV file is generated
just from the L" strings in the program, this order may be
helpful for producing the localizations, because related L"
strings are grouped together. If, during maintenance, new L"
strings are added, and you first load the CSV file, then the program,
and then write out a CSV file, you will find the new L" strings
(for localizing) at the end of the new CSV file.
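The CSV round trip described above might look as follows. This is a sketch under assumptions: it assumes that locale-csv and locale-csv-out both parse the file name from the input stream, and the file names myprog.fs, myprog-l10n.csv and myprog-l10n-new.csv are hypothetical.

```forth
\ Sketch; file names are hypothetical, and both CSV words are
\ assumed to parse the file name that follows them.

\ First export: load the program, then write out its L" strings.
include myprog.fs
locale-csv-out myprog-l10n.csv

\ Later, in a fresh session, after editing the CSV file and adding
\ new L" strings to the program:
locale-csv myprog-l10n.csv          \ existing strings and localizations
include myprog.fs                   \ new L" strings are added after them
locale-csv-out myprog-l10n-new.csv  \ new strings appear at the end
```

Writing the second export to a new file keeps the edited file intact until the result has been checked.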
A locale-string identifier (lsid) is an opaque token that
occupies a cell, and it identifies the L" string.
L" ( "string<">" – lsid )
At text interpretation time, parse string. At run-time,
push the lsid associated with string. Each string has
a unique lsid. If no lsid for the string exists yet, a new one
is created. If an lsid for the string exists already, that
lsid is returned. This means that one can refer to and use the
same lsid with L" in different locations in the source
code, by using the same string. If you want two different
lsids (e.g., because you refer to two different concepts), but
would use the same user-centric text in L", append "
[specifier]" to the text, e.g. L" bank [finance]" or
L" bank [geography]". You may then want to add a
user-centric non-unique default localization (e.g.,
“bank”).
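Since an lsid occupies a cell, two lsids can be compared with =. The following sketch illustrates the uniqueness rule described above, assuming that L" is used at the interpreter prompt.

```forth
\ Sketch: the same string yields the same lsid,
\ different strings yield different lsids.
L" street" L" street" = .                     \ flag should be true
L" bank [finance]" L" bank [geography]" = .   \ flag should be false
```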
locales is a case-sensitive vocabulary that contains the locales. Typical use:
locales:locale, where locale is the name of a locale (e.g., locales:de_AT).
c-addr u is the L" string for lsid (i.e., the
text-interpretation argument of L").
locale@ ( lsid – c-addr u )
c-addr u is the localized string for lsid in the
current locale. If no localized string is found in the current
locale with a name of the form X_Y, lsid is
looked up in locale X. If no localized string is
found in the locale X, lsid is looked up in the
locale default. If no localized string is found in the
locale default, lsid is looked up in the locale
program (i.e., c-addr u is the text-interpretation
argument of L").
locales:program becomes the current locale. When this
locale is current, locale@ produces the string used for
identifying the lsid (i.e., the string parsed by L"). This
locale is useful for development: One can see which lsid is used in
which context.
locales:default is the default locale if the user has not
set one. Most lsids don’t have a specific default string, so
fallback to the program locale happens. But if you have a
developer-centric program string that is inappropriate for end
users (in particular, if the program string contains an extra
specifier), you will prefer to define a user-centric string in the
default locale.
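The difference between the program and default locales can be seen with a string that carries a specifier. This is a sketch that assumes the stack effect locale! ( c-addr u lsid – ), inferred from the glossary description of locale!; set-default is a hypothetical helper word.

```forth
\ Sketch only; assumes locale! ( c-addr u lsid -- ).
: set-default ( -- )   \ hypothetical helper word
  locales:default
  s" bank" L" bank [finance]" locale! ;
set-default

\ Developer view: the string parsed by L", including the specifier.
locales:program L" bank [finance]" locale@ cr type

\ User view: the user-centric string stored in the default locale.
locales:default L" bank [finance]" locale@ cr type
```

According to the descriptions above, the first lookup should produce the L" string with its specifier and the second should produce the string stored in the default locale.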
locale-csv-out: Create the file name and write the locale database to it in CSV format.
locale-csv: Import a comma-separated values (CSV) table into locales. The
first line contains the locale names (column headers). The
program locale must be leftmost. Fallback locales like
de must precede more specific locales like de_AT.
The other lines contain the L" string (first column) and
the corresponding localizations. Each column contains the
localizations for a specific locale. Empty entries mean that
this locale does not define a localization for this L"
string, resulting in using the localization from a fallback
locale instead.
Write the locale database in CSV format to the user output device.
After executing locale!, the localized string for
lsid in the current locale is c-addr u.
locale: name defines a new locale l with the name name in
locales.
name execution: ( – ) l becomes the
current locale.
For locales with names of the form
X_Y, define X first in order to
establish X as a fallback for X_Y.