A recognizer is a Forth word with the stack effect ( c-addr u -- translation ). c-addr u describes the string to be recognized. The returned translation is an abstract but transparent data type: on the top of the stack there is a single-cell translation token. If the recognizer does not recognize the string, it returns the translation token translate-none. If it does recognize the string, it returns a translation with a different translation token. The translation token also identifies how many other stack items the translation contains and how the translation will be processed later.
E.g., when you perform "5" rec-number, it pushes 5 translate-cell on the stack, which is a translation with the translation token translate-cell.
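As a quick check of the behaviour described above, the following interactive sketch inspects what rec-number leaves on the stack. It assumes a Gforth version that provides the recognizer words named in this section, and it compares translation tokens with = purely for illustration, relying on the token being a single cell on top of the stack.

```forth
\ Sketch only: assumes a Gforth with rec-number, translate-cell, translate-none.
s" 5" rec-number        \ stack per the text: 5 translate-cell
translate-cell = .      \ true flag if the top cell is the translate-cell token
.                       \ the remaining cell is the recognized number
s" no-number" rec-number
translate-none = .      \ true flag if the string was not recognized
```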
You typically write a recognizer as an ordinary colon definition that examines the string in some way and then pushes the appropriate translation. E.g., a simple variant of rec-tick can be implemented as follows:
: rec-tick ( addr u -- translation )
    2dup "`" string-prefix? if
        1 /string find-name dup if
            name>interpret translate-cell exit then
        drop translate-none exit then
    rec-none ;
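The same pattern carries over to other prefixes. As a minimal sketch, the hypothetical recognizer rec-len below (not part of Gforth; the len: prefix is invented for illustration) recognizes strings of the form len:ccc and translates them to the length of ccc as a single cell. Everything else is handed to rec-none, as in the example above.

```forth
\ Hypothetical example: rec-len and the len: prefix are made up for illustration.
: rec-len ( c-addr u -- translation )
    2dup s" len:" string-prefix? if
        4 /string           \ skip the "len:" prefix
        nip translate-cell  \ keep only the remaining length as the cell
        exit then
    rec-none ;              \ not our prefix: report "not recognized"
```

Under these assumptions, s" len:hello" rec-len should leave 5 translate-cell on the stack, and s" hello" rec-len should leave translate-none.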
The only appropriate use of a translation is to pass it to one of the words for performing translation actions (see Performing translation actions).
A number of translation tokens already exist in Gforth and can be used in a recognizer you write. If none of them is appropriate for your recognizer, read the next section about defining your own translation tokens.
The system-defined translation token words are documented as removing some stack items and pushing a complete translation on the stack, e.g., for translate-cell: ( x -- translation ). This makes the documentation uniform and avoids cumbersome descriptions. In the current implementation, however, each translation token word just pushes a cell-sized translation token on the stack (for translate-cell: ( -- translate-cell )); combined with the additional stack items already present (for translate-cell: ( x -- x translate-cell )), the result is a translation (for translate-cell: ( x -- translation )).
The text interpreter passes the output of the recognizer to a translation action (see Performing translation actions). Every translation action removes the translation from the stack, then may perform additional parsing, and finally performs the interpreting run-time of the translation token, or the compiling run-time, or the postponing run-time.
For each system-defined translation token we specify the interpreting run-time explicitly. Unless otherwise specified, the compiling run-time compiles the interpreting run-time. Unless otherwise specified, the postponing run-time compiles the compiling run-time.
In the rec-tick example above, if the recognizer recognizes, say, `dup, it returns xt-dup translate-cell. If the text interpreter then performs the compiling action, that action first removes this translation (these two cells) and then compiles code that pushes xt-dup.
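The difference between the interpreting and the compiling action can be observed through the text interpreter itself. The sketch below assumes that the backtick syntax is active in the default text interpreter of the Gforth version in use; dup-xt is a hypothetical name chosen for this example.

```forth
\ Sketch: assumes the backtick syntax is active in the text interpreter.
: dup-xt ( -- xt )  `dup ;   \ compiling action: compiles code that pushes the xt of dup
`dup dup-xt = .              \ interpreting action pushes the xt directly; true flag if equal
7 dup-xt execute + .         \ runs dup on 7 via the xt, adds the two copies, prints the sum
```

In both cases the translation itself (xt-dup translate-cell) never reaches user code: the translation action consumes it and leaves either the xt or compiled code behind.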
Interpreting run-time: ( ... -- ... )
Perform the interpretation semantics of nt.
Compiling run-time: ( ... -- ... )
Perform the compilation semantics of nt.
Interpreting run-time: ( -- x )
Interpreting run-time: ( -- xd )
Interpreting run-time: ( -- r )
Interpreting run-time: ( -- r1 r2 )
Interpreting run-time: ( -- c-addr2 u2 )
c-addr2 u2 is the result of translating the \-escapes in c-addr1 u1.
Every translation action also parses until the first non-escaped ". The string c-addr u and the parsed input are concatenated, then the \-escapes are translated, giving c-addr2 u2.
Interpreting run-time: ( -- c-addr2 u2 )
Interpreting run-time: ( -- c-addr2 u2 )
c-addr2 u2 is the content of the environment variable with name c-addr1 u1.
xt belongs to a value-flavoured (or defer-flavoured) word; n is the index into the to-table: for xt (see Words with user-defined to etc.).
Interpreting run-time: ( ... -- ... )
Perform the to-action with index n in the to-table: of xt. Additional stack effects depend on n and xt.
One way to write a recognizer r is to call a recognizer (for the whole input of r or a substring) that recognizes more strings (e.g., rec-forth), and then look at the result to see if something was recognized that r actually deals with.
E.g., the actual implementation of rec-tick passes its input without the prefix ‘`’ to rec-forth and checks whether the resulting translation is nt translate-name; if so, it converts nt to xt and replaces translate-name with translate-cell. The benefit of this approach compared to our example implementation above is that, e.g., `environment:max-n works, where rec-scope recognizes environment:max-n.
The specific check for an nt used in rec-tick is rec-forth-nt?; it is implemented on top of the more general rec-filter.
Execute rec ( c-addr u -- translation1 ); translation1 is then examined with filter ( translation1 -- translation1 f ). If f is non-zero, translation is translation1, otherwise translation is translate-none.
If rec-forth produces a result nt translate-name, return nt, otherwise 0.
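Putting these pieces together, a recognizer in the style of rec-tick might be sketched as follows. The name my-rec-tick is hypothetical, the stack effect ( c-addr u -- nt | 0 ) for rec-forth-nt? is an assumption inferred from the description above, and this is not claimed to be Gforth's actual source.

```forth
\ Sketch under assumptions: rec-forth-nt? ( c-addr u -- nt | 0 ) is inferred, not confirmed.
: my-rec-tick ( c-addr u -- translation )
    2dup s" `" string-prefix? if
        1 /string rec-forth-nt?      \ let rec-forth look up the rest of the string
        dup if
            name>interpret translate-cell exit then  \ nt -> xt, delivered as a cell
        drop translate-none exit then                \ prefix matched, but no name found
    rec-none ;                                       \ no backtick prefix
```

Compared with the find-name variant, this version accepts whatever rec-forth can resolve to an nt, which is what would make an input such as `environment:max-n work.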