A recognizer is a Forth word with the stack effect ‘( c-addr u -- ... translator | 0 )’. c-addr u describes the string to be recognized. If the recognizer does not recognize the string, it returns 0. If it does recognize the string, it returns a translator, and a translator-specific amount of additional data (“...”). When performing a translator action, the translator consumes this additional data. E.g., when you perform
"5" rec-number
it pushes ‘5 translate-cell’ on the stack, and when the compilation action of translate-cell is performed, both stack items are removed from the stack. This compilation action also compiles a literal 5 into the current definition.
You typically write a recognizer as an ordinary colon definition that examines the string in some way. If the recognizer accepts the string, it pushes a translator and (below that) the additional data. E.g., a simple variant of rec-tick can be implemented as follows:
: rec-tick ( addr u -- xt translate-cell | 0 )
    over c@ '`' = if
        1 /string find-name dup if
            name>interpret translate-cell then
        exit then
    rec-none ;
The only appropriate use of a translator (plus data) on the stack is to pass it to one of the words for performing translator actions (see Performing translator actions).
A number of translators already exist in Gforth and can be used in a recognizer you write. If none of them is appropriate for your recognizer, read the next section about defining your own translators.
For each translator, additional data is documented; a recognizer that returns a certain translator also has to return the additional data below it.
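As a further illustration of a recognizer that returns translate-cell together with its additional data, here is a minimal sketch. The word rec-kilo and its "digits followed by k" syntax are hypothetical and not part of Gforth; the sketch assumes, as described above, that translate-cell pushes the translator and that a recognizer reports failure by returning 0.

```forth
\ Hypothetical recognizer: accepts unsigned numbers in the current base
\ followed by 'k' (e.g. 5k) and returns the number multiplied by 1000.
: rec-kilo ( c-addr u -- n translate-cell | 0 )
    dup 2 u< if 2drop 0 exit then              \ need a digit and the 'k'
    2dup + 1- c@ 'k' <> if 2drop 0 exit then   \ last character must be 'k'
    1- 0. 2swap >number                        ( ud c-addr2 u2 )
    nip if 2drop 0 exit then                   \ unconverted characters left
    drop 1000 * translate-cell ;               \ additional data n, then translator
```

Under these assumptions, "5k" rec-kilo leaves 5000 and the translator on the stack, while "5" rec-kilo and "xk" rec-kilo leave 0. Signed input such as -5k is rejected, because >number does not accept a sign.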
The text interpreter passes the output of the recognizer to a translator action (see Performing translator actions), which removes the translator and all the additional data from the stack, may perform additional parsing, and then invokes the interpreting run-time of the translator, or the compiling run-time, or the postponing run-time.
For each system-defined translator we specify the interpreting run-time explicitly. Unless otherwise specified, the compiling run-time compiles the interpreting run-time, and the postponing run-time compiles the compiling run-time.
In the rec-tick example above, if the recognizer recognizes, say, `dup, it returns xt translator, where translator is the value returned by translate-cell, and xt is the execution token of dup. So xt is the additional data for this translator. If the text interpreter then performs the compiling action, that action first removes these two stack items, and compiles code that pushes xt.
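The two stack items can be inspected interactively. The following lines are a sketch that assumes rec-tick behaves like the example definition above and that the string syntax used earlier for "5" rec-number is available:

```forth
"`dup" rec-tick        \ stack: xt translator
translate-cell = .     \ flag: is the top item the translator pushed by translate-cell?
' dup = .              \ flag: is the additional data the execution token of dup?
```

Comparing the translator in this way is only for illustration; in a program, the translator and its data should be passed to one of the words for performing translator actions.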
Additional data: ( nt ).
Interpreting run-time: ( ... -- ... )
Perform the interpretation semantics of nt.
Compiling run-time: ( ... -- ... )
Perform the compilation semantics of nt.
Additional data: ( x ).
Interpreting run-time: ( -- x )
Additional data: ( xd ).
Interpreting run-time: ( -- xd )
Additional data: ( r ).
Interpreting run-time: ( -- r )
Additional data: ( r1 r2 ).
Interpreting run-time: ( -- r1 r2 )
Additional data: ( c-addr1 u1 ).
Interpreting run-time: ( -- c-addr2 u2 )
c-addr2 u2 is the result of translating the \-escapes in c-addr1 u1.
Additional data: ( c-addr1 u1 'ccc"' ).
Every translator action also parses until the first non-escaped ". The string c-addr1 u1 and the parsed input are concatenated, then the \-escapes are translated, giving c-addr2 u2.
Interpreting run-time: ( -- c-addr2 u2 )
Additional data: ( c-addr1 u1 ).
Interpreting run-time: ( -- c-addr2 u2 )
c-addr2 u2 is the content of the environment variable with name c-addr1 u1.
Additional data: ( n xt ).
xt belongs to a value-flavoured (or defer-flavoured) word, and n is the index into the to-table: for xt (see Words with user-defined to etc.).
Interpreting run-time: ( ... -- ... )
Perform the to-action with index n in the to-table: of xt. Additional stack effects depend on n and xt.
One way to write a recognizer is to call forth-recognize on a substring, and then look at the result to see if something was recognized that the whole-string recognizer actually deals with. E.g., rec-tick and rec-dtick do this and then check whether forth-recognize has pushed nt translate-name; the benefit of this approach is that, e.g., `environment:max-n works, where rec-scope recognizes environment:max-n.
The specific check for an nt used in rec-tick and rec-dtick is forth-recognize-nt?; it is implemented on top of the more general try-recognize.
Try to recognize c-addr u with rec-forth, then execute xt ( ... translator -- ... true | false ). If xt returns 0, reset the stacks to the depths at the start of try-recognize, drop three data stack items, and push 0. Otherwise return the results of executing xt.
If forth-recognize produces a result nt translate-name, return nt, otherwise 0.
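The substring approach described above can be sketched as follows. The word rec-tick-nt is a hypothetical name, not the actual Gforth definition of rec-tick, and the stack effect ( c-addr u -- nt | 0 ) for forth-recognize-nt? is an assumption based on the description above.

```forth
\ Hypothetical variant of rec-tick that recognizes the rest of the string
\ with forth-recognize-nt? instead of find-name.
: rec-tick-nt ( c-addr u -- xt translate-cell | 0 )
    dup 0= if 2drop 0 exit then              \ empty string: not recognized
    over c@ '`' <> if 2drop 0 exit then      \ must start with a backquote
    1 /string forth-recognize-nt?            ( nt | 0 )
    dup if name>interpret translate-cell then ;
```

Because the substring goes through the full recognizer sequence rather than a plain dictionary search, this variant would also accept forms such as `environment:max-n, provided the recognizer for them produces nt translate-name.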