lex - generate programs for simple lexical tasks
lex [-ctvn] [-V] [-Q[y|n]] [file]
The lex command generates programs to be used in simple lexical analysis of text. The input files (standard input by default) contain strings and expressions to be searched for, and C text to be executed when these strings are found. lex processes supplementary code set characters in program comments and strings, and single-byte supplementary code set characters in tokens, according to the locale specified in the LC_CTYPE environment variable.
lex generates a file named lex.yy.c. When lex.yy.c is compiled and linked with the lex library, the resulting program copies its input to its output except when a string specified in the file is found. When a specified string is found, the corresponding program text is executed. The actual string matched is left in yytext, an external character array. Matching is done in order of the patterns in the file: when more than one pattern can match, the one matching the most characters is chosen, and if several match the same number of characters, the pattern appearing first in the file is used.
The patterns may contain square brackets to indicate character classes, as in [abx-z] to indicate a, b, x, y, and z; and the operators *, +, and ? mean, respectively, any non-negative number of, any positive number of, and either zero or one occurrence of, the previous character or character class. Thus, [a-zA-Z]+ matches a string of letters. The character . is the class of all characters except new-line. Parentheses for grouping and vertical bar for alternation are also supported. The notation r{d,e} in a rule indicates between d and e instances of regular expression r. It has higher precedence than |, but lower than *, ?, +, and concatenation.
The character ^ at the beginning of an expression permits a successful match only immediately after a new-line, and the character $ at the end of an expression requires a trailing new-line. The character / in an expression indicates trailing context; only the part of the expression up to the slash is returned in yytext, but the remainder of the expression must follow in the input stream. An operator character may be used as an ordinary symbol if it is enclosed in double quotes (") or preceded by a backslash (\).
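To make the character-class and r{d,e} notation concrete, here is a small hand-written C model of matching a bracket expression such as [abx-z] repeated between d and e times. It is only an illustration: lex itself compiles patterns into generated state tables, and the helper names in_class and match_class_bounded are hypothetical. The sketch does not handle negated classes ([^...]) or escapes.

```c
#include <stddef.h>

/* Hypothetical helper: nonzero if c belongs to a bracket-expression body
   such as "abx-z" (ranges and single characters only; no ^ or escapes). */
int in_class(int c, const char *cls)
{
    for (size_t i = 0; cls[i] != '\0'; i++) {
        if (cls[i + 1] == '-' && cls[i + 2] != '\0') {
            /* range lo-hi, e.g. x-z */
            if (c >= (unsigned char)cls[i] && c <= (unsigned char)cls[i + 2])
                return 1;
            i += 2;                     /* skip "-hi"; the loop adds one more */
        } else if ((unsigned char)cls[i] == c) {
            return 1;                   /* single character (a trailing '-' is literal) */
        }
    }
    return 0;
}

/* Hypothetical helper modeling [cls]{d,e} at the start of s: consumes as many
   class characters as possible, up to e, and returns how many were matched,
   or -1 if fewer than d class characters begin the string. */
long match_class_bounded(const char *s, const char *cls, size_t d, size_t e)
{
    size_t n = 0;
    while (n < e && s[n] != '\0' && in_class((unsigned char)s[n], cls))
        n++;
    return n >= d ? (long)n : -1;
}
```

For example, [abx-z]{2,3} applied to "abzzq" matches the three characters "abz", and applied to "aq" does not match at all.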
Three macros are expected: input() to read a character; unput(c) to push back a character that was read; and output(c) to write an output character. They are defined in terms of the standard streams, but you can override them: input() and output(c) read from stdin and write to stdout, respectively. The program generated is named yylex, and the lex library contains a main that calls it.
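As a sketch of what an override might look like, the functions below provide string-backed stand-ins for input(), unput(c), and output(c). All names (toy_set_input, toy_input, toy_unput, toy_output, toy_copy_rest) are hypothetical. Returning 0 at end of input follows the convention of AT&T-derived lex; with such a lex, replacements are commonly hooked in from the %{ %} section by #undef-ing each macro and redefining it (for example #define input() toy_input()), but check your implementation, since flex uses a different mechanism (YY_INPUT).

```c
#include <stddef.h>

/* Hypothetical string-backed replacements for lex's I/O macros. */
static const char *in_str = "";   /* text being scanned */
static size_t in_pos = 0;         /* index of the next unread character */
static char pushback[64];         /* characters returned by toy_unput() */
static size_t npush = 0;
static char out_buf[256];         /* text written by toy_output() */
static size_t out_len = 0;

/* Start scanning a new string and clear all buffered state. */
void toy_set_input(const char *s)
{
    in_str = s;
    in_pos = 0;
    npush = 0;
    out_len = 0;
    out_buf[0] = '\0';
}

/* input(): next character, or 0 at end of input. */
int toy_input(void)
{
    if (npush > 0)
        return (unsigned char)pushback[--npush];   /* pushed-back characters first */
    if (in_str[in_pos] == '\0')
        return 0;
    return (unsigned char)in_str[in_pos++];
}

/* unput(c): push c back so that the next toy_input() returns it. */
void toy_unput(int c)
{
    if (npush < sizeof pushback)
        pushback[npush++] = (char)c;
}

/* output(c): append c to the output buffer (silently truncates when full). */
void toy_output(int c)
{
    if (out_len + 1 < sizeof out_buf) {
        out_buf[out_len++] = (char)c;
        out_buf[out_len] = '\0';
    }
}

/* Copy the remaining input to the output, like lex's default behavior
   for text that no pattern matches. */
void toy_copy_rest(void)
{
    int c;
    while ((c = toy_input()) != 0)
        toy_output(c);
}
```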
The function yymore() causes the next matched string to be appended to the current contents of yytext instead of replacing them. The function yyless(n) keeps the first n characters of yytext and pushes the remaining yyleng - n characters back into the input stream. (yyleng is an external int variable giving the length in bytes of yytext.)
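The bookkeeping behind yyless(n) and yymore() can be modeled with offsets into an input buffer. The structure and functions below (toy_scanner, toy_match, toy_yyless, toy_yymore) are hypothetical; they only simulate the effect on the token's start and length, and are not the lex library's code.

```c
#include <stddef.h>

/* Toy scanner state: the current token (yytext) is buf[start .. start+leng). */
struct toy_scanner {
    const char *buf;   /* entire input */
    size_t start;      /* offset of the current token */
    size_t leng;       /* length of the current token (models yyleng) */
    size_t next;       /* offset where the next match begins */
    int more;          /* set by toy_yymore(): append the next match */
};

/* Simulate lex matching k characters at s->next. */
void toy_match(struct toy_scanner *s, size_t k)
{
    if (!s->more)
        s->start = s->next;            /* normal case: a fresh token starts here */
    s->leng = s->next + k - s->start;  /* after yymore(), this includes the old token */
    s->next += k;
    s->more = 0;
}

/* Model of yymore(): the next match extends the current token. */
void toy_yymore(struct toy_scanner *s)
{
    s->more = 1;
}

/* Model of yyless(n): keep the first n characters of the token and return
   the other leng - n characters to the input. */
void toy_yyless(struct toy_scanner *s, size_t n)
{
    if (n > s->leng)
        n = s->leng;                   /* cannot keep more than was matched */
    s->leng = n;
    s->next = s->start + n;            /* scanning resumes right after the kept part */
}
```

If a rule matches all of "foobar" and then calls yyless(2), the token becomes "fo" and the 6 - 2 = 4 characters "obar" are scanned again.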
The function yywrap() is called whenever the scanner reaches end of file and indicates whether normal wrapup should continue: a nonzero return (typically 1) ends scanning, while 0 tells the scanner that more input has been arranged and scanning should continue. The action REJECT on the right side of a rule causes the current match to be rejected and the next suitable match executed. The action ECHO on the right side of a rule is equivalent to printf("%s", yytext).
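The return-value contract of yywrap() can be illustrated with a self-contained toy in which several in-memory strings stand in for input files. All names here (sources, toy_yywrap, toy_scan_all) are hypothetical; in a real scanner, yywrap() would typically switch input by reassigning yyin, the input stream variable that both AT&T lex and flex provide, before returning 0.

```c
#include <stddef.h>

/* Hypothetical in-memory "files" standing in for real input streams. */
static const char *sources[] = { "abc", "de", NULL };
static size_t cur_source = 0;   /* index of the source being read */
static const char *pos;         /* read position within the current source */

/* Toy yywrap(): 0 = another source is ready, keep scanning; 1 = wrap up. */
int toy_yywrap(void)
{
    if (sources[cur_source + 1] == NULL)
        return 1;               /* nothing left: let the scanner finish */
    pos = sources[++cur_source];
    return 0;
}

/* Toy scanner loop: counts characters across all sources, consulting
   toy_yywrap() each time the current source is exhausted. */
size_t toy_scan_all(void)
{
    size_t n = 0;

    cur_source = 0;
    pos = sources[0];
    for (;;) {
        if (*pos != '\0') {
            pos++;
            n++;
            continue;
        }
        if (toy_yywrap())       /* end of this source: stop or move on */
            break;
    }
    return n;
}
```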
Any line beginning with a blank is assumed to contain only C text and is copied; if it precedes %%, it is copied into the external definition area of the lex.yy.c file. All rules should follow a %%, as in yacc. Lines preceding %% that begin with a non-blank character define the string on the left to be the remainder of the line; it can be called out later by surrounding it with {}. For example, the line D [0-9] lets {D} stand for [0-9] in later rules. In this section, C code (and preprocessor statements) can also be included between %{ and %}. Note that curly brackets do not imply parentheses; only string substitution is done.
The external names generated by lex all begin with the prefix yy or YY.
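Because only string substitution is done, a definition X a|b used as {X}c expands to a|bc, which matches a or bc rather than ac or bc; defining X as (a|b) gives the grouping one might expect. The hypothetical helper expand_def below mimics that plain textual substitution for a single definition, to show exactly what text the rule ends up containing. It is an illustration, not lex's own definition handling.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: replaces every {name} in pat with body by plain string
   substitution (no parentheses are added) and writes the result to out.
   Returns 0 on success, or -1 if out is too small. */
int expand_def(const char *pat, const char *name, const char *body,
               char *out, size_t outsz)
{
    size_t namelen = strlen(name), bodylen = strlen(body);
    size_t i = 0, o = 0;

    if (outsz == 0)
        return -1;
    while (pat[i] != '\0') {
        if (pat[i] == '{' && strncmp(pat + i + 1, name, namelen) == 0
            && pat[i + 1 + namelen] == '}') {
            if (o + bodylen >= outsz)
                return -1;
            memcpy(out + o, body, bodylen);   /* copy the definition text verbatim */
            o += bodylen;
            i += namelen + 2;                  /* skip past "{name}" */
        } else {
            if (o + 1 >= outsz)
                return -1;
            out[o++] = pat[i++];               /* ordinary pattern character */
        }
    }
    out[o] = '\0';
    return 0;
}
```

With the definition O [0-7], the rule 0{O}+ becomes 0[0-7]+, which is the intended pattern because the definition is already a single bracket expression.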
Multiple files are treated as a single file. If no files are specified, standard
input is used.
Certain default table sizes are too small for some users. The table sizes for the resulting finite state machine can be set in the definitions section:
%p n   number of positions is n (default 20000)
%n n   number of states is n (default 4000)
%e n   number of parse tree nodes is n (default 8000)
%a n   number of transitions is n (default 16000)
%k n   number of packed character classes is n (default 20000)
%o n   size of output array is n (default 24000)
The use of one or more of the above automatically implies the -v option, unless the -n option is used.
-c        indicates C actions and is the default.
-n        does not display the -v summary.
-Q[y|n]   when -Qy is specified, writes version information to the output file lex.yy.c. When -Qn is specified, no version information is written. -Qn is the default.
-t        writes the lex.yy.c program to standard output instead of to a file.
-v        provides a two-line summary of statistics.
-V        displays version information on standard error.
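D       [0-9]
O       [0-7]
%{
void
skipcommnts(void)
{
        int c;

        for (;;) {
                /* discard characters up to the next '*' */
                while ((c = input()) != '*')
                        if (c == 0 || c == EOF)
                                return;         /* unterminated comment */
                if ((c = input()) == '/')
                        return;                 /* end of comment */
                if (c == 0 || c == EOF)
                        return;
                unput(c);       /* c may be the '*' that starts the closing delimiter */
        }
}
%}
%%
if      printf("IF statement\n");
[a-z]+  printf("tag, value %s\n", yytext);
0{O}+   printf("octal number %s\n", yytext);
{D}+    printf("decimal number %s\n", yytext);
"++"    printf("unary op\n");
"+"     printf("binary op\n");
"\n"    ;       /* no action */
"/*"    skipcommnts();
%%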
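The example rules depend on lex's match-selection policy: the rule matching the most characters wins, and on a tie the rule listed first wins. That is why "if" prints IF statement (it ties with [a-z]+ but comes first), "iffy" is a tag, and "017" is octal while "019" is decimal. The hand-written classifier below reproduces that choice for a single token at the start of a string. The function names are hypothetical, and the code is only a model of the policy: the scanner lex generates uses state tables, not one function per rule.

```c
#include <stddef.h>
#include <string.h>

/* Length of each example rule's match at the start of s (0 = no match). */
static size_t m_if(const char *s)     { return strncmp(s, "if", 2) == 0 ? 2 : 0; }
static size_t m_tag(const char *s)    { size_t n = 0; while (s[n] >= 'a' && s[n] <= 'z') n++; return n; }
static size_t m_decimal(const char *s){ size_t n = 0; while (s[n] >= '0' && s[n] <= '9') n++; return n; }
static size_t m_unary(const char *s)  { return strncmp(s, "++", 2) == 0 ? 2 : 0; }
static size_t m_binary(const char *s) { return s[0] == '+' ? 1 : 0; }

/* 0{O}+ : a '0' followed by at least one octal digit. */
static size_t m_octal(const char *s)
{
    size_t n = 1;
    if (s[0] != '0')
        return 0;
    while (s[n] >= '0' && s[n] <= '7')
        n++;
    return n > 1 ? n : 0;
}

/* Pick the rule lex would choose: longest match; on a tie, the earlier rule. */
const char *classify(const char *s)
{
    static const char *const names[] =
        { "IF", "tag", "octal", "decimal", "unary", "binary" };
    size_t (*const rules[])(const char *) =
        { m_if, m_tag, m_octal, m_decimal, m_unary, m_binary };
    size_t best = 0, best_len = 0;

    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        size_t len = rules[i](s);
        if (len > best_len) {   /* strictly longer only, so ties keep the earlier rule */
            best = i;
            best_len = len;
        }
    }
    return best_len > 0 ? names[best] : "none";
}
```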
See also: yacc(1)