comment on

To the best of my knowledge...
The problem is not just that Perl uses a NFA algorithm for its regular expressions and Flex uses a DFA one. It is that flex maintains considerable special state information and processes all of the rules in parallel. Consider the following flex code....


%x comment
%%
        int line_num = 1;

"/*"         BEGIN(comment);


<comment>[^*\n]*        /* eat anything that's not a '*' */
<comment>"*"+[^*/\n]*   /* eat up '*'s not followed by '/'s */
<comment>\n             ++line_num;
<comment>"*"+"/"        BEGIN(INITIAL);
[download]

Now this will eat anything in a comment... while maintaing that line count var.... But implied in that code is

A system for maintaining which runstate we are in. That is what remembers that we are in the Comment state.
A system for matching the greediest RE in the list. That is what keeps
```
<comment>[^*\n]*
[download]
```
from matching the * in */. The existance of that "*" + "/" rule.

Of course nothing is stopping you from writing a program that does all of those things in perl. But you cannot just use the regular expressions cart blanch. There are slight differences in perl's re and flex's use of yacc's re system. Ie

"/*"         BEGIN(comment);
[download]

is perl's

if ($bob =~ /^\/\*/) {
  $comment = 1;
}
[download]

But again.. you could write something that properly escaped these things.

Here is some perl code to do that flex code.

#!/usr/bin/perl
use strict;
my $bob = "/* THis ** comment** is\n\nMy\ngreatest\n Comment** */";
my $comment = 0;
my $line_num = 1;
while($bob) {
my @match_list;
print "STARTLOOP matching $bob\n";
if ($bob =~ /^\/\*/) {
  push @match_list,{rest=>$',code=>"\$comment=1",matched=>$&};
  print "start comment matched: $&\n";
}
if($comment && $bob =~ /^[^*\n]*/) {
  push @match_list,{rest=>$',code=>"",matched=>$&};
  print "not star matched: $&\n";
}
if($comment && $bob =~ /^\*[^*\n]*/) {
  push @match_list,{rest=>$',code=>"",matched=>$&};
 print "star with no slash matched: $&\n";
}
if($comment && $bob =~ /^\n/) {
  push @match_list,{rest=>$',code=>"++\$line_num;",matched=>$&};
  print "newline matched: $&\n";
}
if($comment && $bob =~ /^\*\//) {
  push @match_list,{rest=>$',code=>"\$comment=0",matched=>$&};
  print "end comment matched: $&\n";
}
my $max_length = 0;
my $match;
#should be priority queue.. but I'm too tired to
# use that module now...
foreach $match (@match_list) {
  if(length($match->{matched}) > $max_length) {
     $max_length = length($match->{matched});
  }
}
foreach $match (@match_list) {
  if(length($match->{matched}) == $max_length) {
    eval $match->{code};
    $bob = $match->{rest};
  }
}
}
print "line num is $line_num\n";
[download]

You should be able to see how you could make a program that took the flex code and made perl from it. It could just leave stub functions or "" in the code key of that hash.

DFA's are just NFA's that just don't allow multiple moves from one state to another via the same character. It is hard to get an NFA in DFA form, but the converse is not true.

Hope this makes sense.
---
Crulx
crulx@iaxs.net

In reply to Re: Flex, converting regexes, and other Interesting Stuff. by Crulx
in thread Flex, converting regexes, and other Interesting Stuff. by Petruchio

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.