Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, I need help with some regular expressions. I'm trying to match the table and view names and the tables and view in the from section of a SQL statement. The problems include: - that the spacing can be irregular - They can go over the a single line - can be mixed casing I can match the names with different regular expressions, but I'm wondering if it's possible to merge them? What about the from clause? How do I strip them out over the multiple lines? I think I will need to parse the scripts twice to get the different part and change the paragraph symbol.
create view V_PAID_USERS as select * from users u, subcription s where u.user_id=s.user_id and s.balance =< 0; CREATE OR REPLACE VIEW V_OVERDUE AS select * from users u, subscription s where u.user_id=s.user_id and s.duedate > 60; create TABLE UNPAID_USERS as select * from users u, subcription s where u.user_id=s.user_id and s +.balance > 0; create view receipts as select * from unpaid_users; Here's the code: if($_ =~ /create\s/i .. /V_\w+/i) { if($_ =~ /(V_\w+)/i) { print OUTFILE "$path Line: $line_cnt: $1\n"; } } if($_ =~ /create\s/i .. /T_\w+/i) { if($_ =~ /(T_\w+)/i) { print OUTFILE "$path Line: $line_cnt: $1\n"; } }

Replies are listed 'Best First'.
Re: Pattern matching "dirty" data sources
by LameNerd (Hermit) on Apr 30, 2003 at 18:09 UTC
Re: Pattern matching "dirty" data sources
by pzbagel (Chaplain) on Apr 30, 2003 at 18:19 UTC
    You have two options.

    1. You can read the whole file in as one big line and then for a m//mg fo what you are looking for.

    { local $/=""; $file=<INFILE>; } #Now you can regex with m//mg all you want

    Those curly brackets in the snippet above are important, they change the scope of the local $/ (the input record separator) so you can clear it just for the duration of the assignment to $file.

    Option 2 is slightly different. Since it seems you are counting lines, this one might be better for what you are looking for. After you start reading an SQL line, simply keep reading until you hit the semicolon that concludes that line of code and concatenate everything together.

    while(<INFILE>){ chomp(); $line.=" " .$_; #This adds a space between each line if (/\;$/) { #This works as long as the last char on the line is ;, m +ay need to be tweaked. #regexes go here $line=""; }

    Cheers

Re: Pattern matching "dirty" data sources
by BrowserUk (Patriarch) on Apr 30, 2003 at 22:56 UTC

    I'm not at all sure I've read you question correctly, but is this the output you are looking for from your sample input? (The quotes are just a sanity check and easily removed)

    C:\test>254403 'V_PAID_USERS' 'V_OVERDUE' 'UNPAID_USERS' 'receipts'

    #! perl -slw use strict; $/ = ""; # Set paragraph mode (see Perlvar:$/) while( <DATA>) { m[(\w+)\s+as]i and print "'$1'"; } __DATA__ create view V_PAID_USERS as select * from users u, subcription s where u.user_id=s.user_id and s.balance =< 0; CREATE OR REPLACE VIEW V_OVERDUE AS select * from users u, subscription s where u.user_id=s.user_id and s.duedate > 60; create TABLE UNPAID_USERS as select * from users u, subcription s where u.user_id=s.user_id and s +.balance > 0; create view receipts as select * from unpaid_users;

    Setting $/ to '' enables paragraph mode, and which allows you to read each complete statement as a single string. Using the keyword "AS" as an anchor, seems to select the table or view name in each statement. You can easily change \w+ to be more selective about which ones get captured.

    As always, if I got it completely wrong, simply ignore me:)


    Examine what is said, not who speaks.
    1) When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.
    2) The only way of discovering the limits of the possible is to venture a little way past them into the impossible
    3) Any sufficiently advanced technology is indistinguishable from magic.
    Arthur C. Clarke.
Re: Pattern matching "dirty" data sources
by dragonchild (Archbishop) on Apr 30, 2003 at 17:51 UTC
    What is it you're trying to accomplish? There are several better ways, but it all depends on where you're trying to go.

    ------
    We are the carpenters and bricklayers of the Information Age.

    Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.

    Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.

      I want to create a report listing:

      - all tables and views created in a script

      - all the used tables and views in a script

      for each script file we use to create the databases. Probably 50-80 scripts and around 40K LOC.

      My first approach is to parse the script file and extract the names from the create statements and the from clauses. Maybe there's a better way?