
In my incoming text, HOP::Lexer seems to advance incorrectly to the next token on qualified names. Am I doing something wrong?

For example, when it encounters "Enterprise Warehouse"."Charge Date" AS "Charge Date", I get the following output:

next FQNAME2 Enterprise Warehouse.Charge Date
current: FQNAME2 Enterprise Warehouse.Charge Date
next FQNAME1 Enterprise Warehouse
current: FQNAME1 Enterprise Warehouse
next FQNAME1 Enterprise Warehouse
current: FQNAME1 Enterprise Warehouse

Code below:

use strict;
use warnings;
use HOP::Lexer 'make_lexer';

my $sql = <<END_SQL;
DECLARE DIMENSION "Enterprise Warehouse"."Charge Date" AS "Charge Date" UPGRADE ID 8444734 ON
    (
          "Enterprise Warehouse"."Charge Date"."Charge Date Total" ) DEFAULT ROOT  "Enterprise Warehouse"."Charge Date"."Charge Date Total"
    DESCRIPTION {This is the date hierarchy for the Charge date.} 
    PRIVILEGES ( READ);
END_SQL

my @keywords = (
    'DECLARE DIMENSION',
    'AS',
    'UPGRADE ID',
    'DESCRIPTION',  
    'PRIVILEGES',  
    'ON',
    'DEFAULT ROOT'
);

my @sql   = $sql;
my $lexer = make_lexer(
    sub { shift @sql },
    [ 'KEYWORD', qr/(?i:@{[join '|', map {$_} @keywords]})/ ],
    [ 'UPGRADEID',   qr/\d+/                            ],
    [ 'COMMA',    qr/,/                            ],
    [ 'PAREN',   qr/\(/,      sub { [shift,  1] } ],
    [ 'PAREN',   qr/\)/,      sub { [shift, -1] } ],
    [ 'BRACE',   qr/\{/,      sub { [shift,  1] } ],
    [ 'BRACE',   qr/\}/,      sub { [shift, -1] } ],
#   [ 'TEXT',     qr/\([^\(]+\)\)/, \&text        ],
#   [ 'TEXT',     qr/({[^{]+})/, \&text  ],
    [ 'FQNAME3',  qr/("[^"]+".){2}\"[^"]+"/, \&text  ],
    [ 'FQNAME2',  qr/("[^"]+".)\"[^"]+"/, \&text  ],
    [ 'FQNAME1',  qr/("[^"]+")/, \&text  ],
    [ 'PERIOD',   qr/\./],
    [ 'SPACE',    qr/\s*/,     sub {}              ],
);

sub text {
    my ( $label, $value ) = @_;
    $value =~ s/["']//;
    $value =~ s/["']$//;
    $value =~ s/\".\"/./;
    return [ $label, $value ];
}

my $inside_parens = 0;
while ( defined( my $token = $lexer->() ) ) {
    my ( $label, $value ) = @$token;
    $inside_parens += $value if 'PAREN' eq $label;
    print "current: $label $value \n";

    # peek returns undef once the input is exhausted, so check before dereferencing
    my $next = $lexer->('peek');
    last unless defined $next;
    my ( $next_label, $next_value ) = @$next;
    print "next $next_label $next_value \n";

#   next if $inside_parens || 'TEXT' ne $label;
    if ( 'COMMA' eq $next_label ) {
        print "$value\n";
    }
    elsif ( 'KEYWORD' eq $next_label && 'from' eq $next_value ) {
        print "$value\n";
        last;    # we're done
    }
}
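One possible explanation, offered as a sketch: assuming HOP::Lexer applies each rule the way the lexer in Higher-Order Perl does, by wrapping the rule's regex in one more capturing group and calling split /($pattern)/ on the remaining text, any capturing group inside a rule's own regex makes split return an extra field for every match. The standalone script below (plain Perl, no HOP::Lexer needed; the sample string and variable names are made up for illustration) compares an FQNAME2-style pattern with and without its inner capture.

```perl
use strict;
use warnings;

# Hypothetical sample input: a two-part qualified name followed by an alias.
my $input = q{"A"."B" AS "C"};

# Same shape as the FQNAME2 rule: the inner ( ... ) is a capturing group.
my $capturing = qr/("[^"]+".)"[^"]+"/;

# Identical pattern, but (?: ... ) groups without capturing.
my $non_capturing = qr/(?:"[^"]+".)"[^"]+"/;

# Wrap each pattern in one extra capturing group, as the lexer is assumed to do.
my @bad  = split /($capturing)/,     $input;
my @good = split /($non_capturing)/, $input;

# The inner capture shows up as an extra field right after the full match.
print scalar(@bad), " fields: ", join( '|', @bad ), "\n";
# prints: 4 fields: |"A"."B"|"A".| AS "C"

print scalar(@good), " fields: ", join( '|', @good ), "\n";
# prints: 3 fields: |"A"."B"| AS "C"
```

In the lexer above, a leftover field like "Enterprise Warehouse". from FQNAME2 would presumably be passed on as plain text and matched again by FQNAME1, and FQNAME1's own ("[^"]+") group would duplicate each quoted name, which would fit the repeated FQNAME1 Enterprise Warehouse tokens in the output. If that is the cause, switching the rules to non-capturing groups, e.g. qr/(?:"[^"]+"\.){2}"[^"]+"/, should help; escaping the . as in that regex also stops it from matching any character between the names.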

In reply to hop:lexer question by Freewilly3d
