Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I'm not sure I've got this completed narrowed down, but please bear with me...

I've got a multi-line SQL SELECT statement generated by a tool where I want to extract the table name. Okay, easy enough. Here's simple code which plucks out the table name when there is a WHERE clause and when there is not:

#!/usr/bin/perl my $stmt0 = "SELECT table.field0, table.field1\nFROM table;\n"; my $stmt1 = "SELECT table.field0, table.field1\nFROM table\nWHERE fiel +d0 = 'foo';\n"; my $tablename = $1 if $stmt0 =~ m/^.*FROM\s+(\S+)\s*.*;$/s; print "table name =\"$tablename\"\n"; $tablename = ''; $tablename = $1 if $stmt0 =~ m/^.*FROM\s+(\S+)\s*.*;$/s; print "table name =\"$tablename\"\n";

I get the same answer either way, I would think that I've got a generalized regular expression which takes care of all types of SELECT statements.

Or so I thought...

When I run this regular expression to extract the table name in the script which accesses the tool, the tool emits a multi-line statement with no WHERE clause, and does not match anything with the above regular expression. If I take the semicolon out of the RE:

$tablename = $1 if $stmt =~ m/^.*FROM\s+(\S+)\s*.*$/gs;

I get the table name as table;. I can hack around this problem by adding a substitution afterwards:

$tablename = $1 if $stmt =~ m/^.*FROM\s+(\S+)\s*.*$/gs; $tablename =~ s/;//;

...but this only points out that I don't know what's going on.

I then dumped the statement to a file and wrote a small C program to read the string character-by-character to see what really was in it. In stepping through execution with a debugger, I saw nothing out of the ordinary -- the SELECT keyword, two fully-qualified field names, a 13,10 newline character pair given that this is on Windows followed by the FROM clause.

I'm baffled as to why I need the kludgy substitution. Any insight any of you can provide would be appreciated. Thanks.

20031203 Edit by Corion: Changed PRE tags to CODE tags

Replies are listed 'Best First'.
Re: regular expression across lines?
by Roger (Parson) on Dec 02, 2003 at 23:50 UTC
    Add the m switch.
    $stmt =~ m/^FROM\s+(\S+)$/mgs;
    I have created a simple example below -
    use strict; use warnings; my $sql = q{ SELECT id FROM table WHERE name like 'A%' }; my ($table) = $sql =~ m/^FROM\s+(\S+)$/mgs; print "$table\n";
    And the output is as expected -
    table
    And of course if you want to capture multiple table names, with FROM on the same line, etc, the regular expression will be more complex, but that is out of scope.

      Thanks for your suggestion. I have a print statement in my code following the regular expression:
      print "'$tablename'\n";
      
      and the interpreter scolds me with Use of uninitialized value in concatenation (.) or string when I try your regular expression which focuses on only the line comprising the FROM clause.

      I'm still baffled...

        Drop the g switch. And then modify your regex to capture table names with \w+ instead of \S+.
        use strict; use warnings; my $stmt0 = "SELECT table.field0, table.field1\nFROM table0;\n"; my $stmt1 = "SELECT table.field0, table.field1\nFROM table1\nWHERE fie +ld0 = 'foo';\n"; my $nameA = $1 if $stmt0 =~ m/^FROM\s+(\S+)$/ms; my $nameB = $1 if $stmt1 =~ m/^FROM\s+(\S+)$/ms; my $name0 = do { $stmt0 =~ m/^FROM\s+(\w+)(;*?)$/ms; $1 }; my $name1 = do { $stmt1 =~ m/^FROM\s+(\w+)(;*?)$/ms; $1 }; print "table name A ='$nameA'\n"; print "table name B ='$nameB'\n"; print "table name 0 ='$name0'\n"; print "table name 1 ='$name1'\n";
        And the output is -
        table name A ='table0;' table name B ='table1' table name 0 ='table0' table name 1 ='table1'
        Notice that in table name A, there is a ';' because of the ; character. The new regex is capable of filtering it out.

        My bad. I tried the following:
        $tablename = $1 if $stmt =~ m/^FROM\s+(\S+)/mgs;
        #$tablename =~ s/;//g;
        print "'$tablename'\n";
        
        but I am back to table; as to what is assigned to $tablename. I guess I need to resort to refining the character class I want to save. Thanks again.
Re: regular expression across lines?
by allolex (Curate) on Dec 03, 2003 at 08:11 UTC

    Please, Anonymous Monk, try to use <code> tags in the future instead of <pre> or <xmp>. Pre tags mess up the formatting on this site. :)

    --
    Allolex