create a perl script for pattern matching

MelaOS has asked for the wisdom of the Perl Monks concerning the following question:

Hi All, I am currently trying to create a perl script that takes sql statement that is stored within text files and add up the frequency for that sql statement in that text file and store the sql statement and the no of frequency in another file.

Below is a sample of the sql statement that i would be capturing.

SELECT F_PRODUCT.BUILDING, F_PRODUCT.TESTER, PEDB.Name, PEDB.Email, T1.Name AS PLMName, T1.Email AS PLMEmail, PEDB_1.Name AS QREName, PEDB_1.Email AS QREEmail,

The thing is, there's too much variety to do a efficient pattern matching on these statements.

So would it be possible to first add a " in front of each statement and a " at the back of each statement, and then by using " " as a delimiter i can take whatever sql statement that is found between two ", and use that to compare to other statements in the line to get the frequency and finally storing that statement along with it's no of frequency on another text file.

If you know of any better way or if there's already actual program/scripts that does that, pls advise.

Thank you for your time and attention.

Comment on create a perl script for pattern matching

Replies are listed 'Best First'.
Re: create a perl script for pattern matching by holli (Abbot) on Jul 01, 2005 at 09:22 UTC
Isn't there a "FROM" part missing? Anyway, this is a typical hash-count problem. `use strict; use warnings; my %data; while (<DATA>) { chomp; next unless $_; $data{$_}++; } for ( keys %data ) { print "$data{$_} : $_\n"; }` [download] holli, /regexed monk/	[reply] [d/l]
Re^2: create a perl script for pattern matching by anonymized user 468275 (Curate) on Jul 01, 2005 at 13:06 UTC
The catch I can see here is that SQL can be expressed on multiple lines and might have irrelevant variations (like one extra space after the SELECT) causing a statement to be entered under a separate count. Or what about 'WHERE A.A=C.A' being logically identical to 'WHERE C.A=A.A' - traversing output from a parser seems the only really reliable approach. But that aside, once the parsing problem is resolved, the hash count is indeed the best way to finish off. One world, one people	[reply]
Re: create a perl script for pattern matching by Ben Win Lue (Friar) on Jul 01, 2005 at 09:34 UTC
It is rather easy to recognize a start of an SQL-Statement, because very few words are used. Mostly Select, Update, Insert or Delete and very few other words. The problem is to recognize the end of the Statement. Are there semicola (";") at the end? The rest is as easy as shown by holli.	[reply]
Re^2: create a perl script for pattern matching by VSarkiss (Monsignor) on Jul 01, 2005 at 14:20 UTC
Slightly OT, but It is rather easy to recognize a start of an SQL-Statement Is it? Here are some random snippets from past code I've written. `select max(prod_id) from product where effdt = (select max(effdt) from product)` [download] or `insert order (prod_id, prod_cnt) select prod_id, count() from product group by prod_id` [download] How many statements was that? In fact it's two, but if you just count keywords, you'll get four. The point is not to underestimate the tasks that faces the OP. Trying to make it work by just matching keywords gets the wrong answer against normal, everyday SQL very easily. SQL has a definite grammar that's precise, but it's not easy to parse thru. Additional difficulty comes from recognizing all the vendor-specific extensions to the language, but let's not go into that. :-) Do not rebuke them with harsh words ... but rather lead them gently - with URLs - so that they may learn wisdom.*	[reply] [d/l] [select]
Re: create a perl script for pattern matching by anonymized user 468275 (Curate) on Jul 01, 2005 at 09:31 UTC
There's a module that parses SQL, which might reduce your task: http://search.cpan.org/~duncand/SQL-Routine-SQLParser-0.01/lib/SQL/Routine/SQLParser.pm One world, one people	[reply]
Re: create a perl script for pattern matching by TedPride (Priest) on Jul 02, 2005 at 08:11 UTC
Pardon me for stating the obvious, but if you know what the start of every statement is going to look like, you don't have to be able to find the end. The next start defines your current end.	[reply]