Beefy Boxes and Bandwidth Generously Provided by pair Networks
Welcome to the Monastery

rewriting SQL

by hv (Parson)
on Jun 22, 2014 at 09:59 UTC ( #1090819=perlquestion: print w/replies, xml ) Need Help??

hv has asked for the wisdom of the Perl Monks concerning the following question:

Is there a module that can parse SQL statements into an object, then turn that object back into SQL? I've tried looking at SQL::Statement, which parses ok but which doesn't have any facility I can find for re-emitting SQL, and at SQL::Translator, which is all about parsing things in one format and emitting them in another, but only handles table definition stuff (CREATE, ALTER).

What I hope to achieve is to parse a bunch of SQL statements from logs, and change them into a canonical form so that I can compare logs of doing the same work with two different versions of my code and see more easily where they diverge. That'll include things like reordering lists of fields in ASCII order (eg for UPDATE statements, with corresponding reordering of the bind variables), and also spotting references to volatile fields such as session ids (randomly generated on each run) to minimize useless noise in the diffs.

I think my best way forward right now is to use SQL::Statement, and sort out the emitting myself (maybe as Data::Dumper output rather than SQL, as long as all I want to do is get a signature to compare), but I'd love to hear of an easier way: it sounds like the sort of useful thing that somebody would already have written.


Update: SQL::Parser also seems to have some issues, not sure right now whether it'd be easier to fix those or handroll my own. The first line it barfs on if like "insert into t1 (col1, col2) values (now(), 1)", in which it splits the values on parens without checking for balancing: while ( $val_str =~ m/\((.+?)\)(?:,|$)/g ) { ... }, then tries to parse "now(" as a complete row of data to insert.

Replies are listed 'Best First'.
Re: rewriting SQL
by wjw (Priest) on Jun 22, 2014 at 12:01 UTC

    I think this is going to depend a great deal on what gets put into those log files, as well as how and when things get written to the log files. In general, it seems like SQL::Statement::Structure is pretty close to what you describe you are looking for. But again, only you will know, as the log files you read from will be the key to the success of what you are attempting...

    Best of luck..

    ...the majority is always wrong, and always the last to know about it...

    Insanity: Doing the same thing over and over again and expecting different results...

    A solution is nothing more than a clearly stated problem...otherwise, the problem is not a problem, it is a facct

Re: rewriting SQL
by Anonymous Monk on Jun 23, 2014 at 00:01 UTC

    It might help to look at the original problem and see some examples of the SQL statements you want to compare, and to know what dialect (or dialects?) they're in. Without that, we don't know how big the differences you're trying to cover are. If they're not too big, and this tool is just supposed to help you during development, then perhaps you don't need a full parser, and a few well-designed regexes to iron out the differences will do the trick?

Re: rewriting SQL
by sundialsvc4 (Abbot) on Jun 22, 2014 at 12:59 UTC

    An interesting proposition.   If you truly need to parse any SQL statement, then it seems to me that you may(?) need to obtain (from the web) an actual, authoritative BNF grammar for the SQL dialect that you are using, then employ a Lex/Yacc style parser using it.   Which I can see that Perl can do, although I have not personally used that parser with Perl.   I can’t yet speak for Marpa, although it looks interesting and possibly applicable too.   I do not know, but certainly would like to know, if there are any Perl modules out there with this degree of rigor and completeness.

    The difficulty that I foresee is having to compare two statements by comparing two AST parse-trees, which I do think that you would have to do instead of trying to “un-parse” them back to a then-comparable string.

Re: rewriting SQL
by perlfan (Vicar) on Jun 22, 2014 at 20:17 UTC
    I don't have any suggestions as far as Perl-related tools go, but after some searching I found this product that looks promising - Good luck.

      There are many parsers, including Parse::RecDescent, that can probably be made to do this job.   (I’ve done things with that one that I was sure it couldn’t do ...)   All of them, however, are grammar-driven general purpose parsing engines, vs. homebrew stuff that does not take that approach.   A product like this one might be worth $150 to find out exactly how they did it, but it might be tangental to the task at hand.   Their description strongly suggests that a C-capable parser engine is at the core of this product ... which probably means Lex/Yacc but not necessarily.   It is not clear, though, whether they reveal all their secrets for less than $500, or even then.   And I do wonder where their grammar actually did come from, since I see many online references to BNFs of various SQL dialects.

      In any case, I think you’re going to have to find, or build, an honest-to-god grammar for your SQL, use it with an appropriate parser through a CPAN/Perl interface (there are several), and then do your actual comparisons by comparing parse-trees (or by getting clever with the exit-subroutines that you call from various key points within your grammar).

      I used this technique to rip-apart thousands of SAS scripts, Tivoli workload schedules, and Korn scripts to analyze what was a rather hosed-up production system that I didn’t write.   It was an adventure.

Log In?

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://1090819]
Approved by marto
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others perusing the Monastery: (3)
As of 2021-12-05 14:55 GMT
Find Nodes?
    Voting Booth?
    R or B?

    Results (31 votes). Check out past polls.