MelaOS has asked for the wisdom of the Perl Monks concerning the following question:

Hi All, I am currently trying to create a perl script that takes sql statement that is stored within text files and add up the frequency for that sql statement in that text file and store the sql statement and the no of frequency in another file.

Below is a sample of the sql statement that i would be capturing.

SELECT F_PRODUCT.BUILDING, F_PRODUCT.TESTER, PEDB.Name, PEDB.Email, T1.Name AS PLMName, T1.Email AS PLMEmail, PEDB_1.Name AS QREName, PEDB_1.Email AS QREEmail,

The thing is, there's too much variety to do a efficient pattern matching on these statements.

So would it be possible to first add a " in front of each statement and a " at the back of each statement, and then by using " " as a delimiter i can take whatever sql statement that is found between two ", and use that to compare to other statements in the line to get the frequency and finally storing that statement along with it's no of frequency on another text file.

If you know of any better way or if there's already actual program/scripts that does that, pls advise.

Thank you for your time and attention.
  • Comment on create a perl script for pattern matching

Replies are listed 'Best First'.
Re: create a perl script for pattern matching
by holli (Abbot) on Jul 01, 2005 at 09:22 UTC
    Isn't there a "FROM" part missing? Anyway, this is a typical hash-count problem.
    use strict; use warnings; my %data; while (<DATA>) { chomp; next unless $_; $data{$_}++; } for ( keys %data ) { print "$data{$_} : $_\n"; }


    holli, /regexed monk/
      The catch I can see here is that SQL can be expressed on multiple lines and might have irrelevant variations (like one extra space after the SELECT) causing a statement to be entered under a separate count. Or what about 'WHERE A.A=C.A' being logically identical to 'WHERE C.A=A.A' - traversing output from a parser seems the only really reliable approach.

      But that aside, once the parsing problem is resolved, the hash count is indeed the best way to finish off.

      One world, one people

Re: create a perl script for pattern matching
by Ben Win Lue (Friar) on Jul 01, 2005 at 09:34 UTC
    It is rather easy to recognize a start of an SQL-Statement, because very few words are used. Mostly Select, Update, Insert or Delete and very few other words.
    The problem is to recognize the end of the Statement. Are there semicola (";") at the end?
    The rest is as easy as shown by holli.

      Slightly OT, but

      It is rather easy to recognize a start of an SQL-Statement
      Is it? Here are some random snippets from past code I've written.

      select max(prod_id) from product where effdt = (select max(effdt) from product)
      or
      insert order (prod_id, prod_cnt) select prod_id, count(*) from product group by prod_id
      How many statements was that? In fact it's two, but if you just count keywords, you'll get four.

      The point is not to underestimate the tasks that faces the OP. Trying to make it work by just matching keywords gets the wrong answer against normal, everyday SQL very easily. SQL has a definite grammar that's precise, but it's not easy to parse thru. Additional difficulty comes from recognizing all the vendor-specific extensions to the language, but let's not go into that. :-)

Re: create a perl script for pattern matching
by anonymized user 468275 (Curate) on Jul 01, 2005 at 09:31 UTC
Re: create a perl script for pattern matching
by TedPride (Priest) on Jul 02, 2005 at 08:11 UTC
    Pardon me for stating the obvious, but if you know what the start of every statement is going to look like, you don't have to be able to find the end. The next start defines your current end.