haricothoriz has asked for the wisdom of the Perl Monks concerning the following question:

Does anyone know of a module to pretty-print SQL statements?

I'm writing a utility to analyze postgresql log files. I'd like to be able to do two things:

  1. Identify parameters in a SQL statement to be able to identify families of statements that are essentially "the same" but use different constants. Just using bone-headed parsing of numbers and tick-delimited strings, I can do OK with this one, but I know I'm not doing it perfectly. One obvious thing I'm not handling yet is embedded ticks in strings, but maybe there are other things.

  2. Format the SQL statements in a more readable way, as an option. E.g. have each major clause start on a new line and maybe try to break up lengthy boolean clauses somehow. Again, it's easy to fake this by just looking for keywords, but it would be nice to be able to do it right.

Thanks,
Kevin

P.S. Right now my script's usage looks like this:

analyze_pg_log [options] file ... Options: --sort-by-count, -c Sort by the number of each type of quer +y (default) --sort-by-sql, -s Sort by the query text itself --sort-by-dupes, -d Sort by the ratio of instances of a que +ry to unique instances of a query --sort-by-weight, -w Sort by the total time spent on each ty +pe of query --params, -p Show the unique parameter sets used for + each query type --one-pid, -o Ignore entries for all but the first pi +d encountered. analyze_pg_log --help: Show this help message. analyze_pg_log --man: Show the full manual page. analyze_pg_log --version: Show the version number.

The output of the script looks like this (when not using --params) (I haven't added even primitive pretty-printing yet):

murphy@genometest$ analyze_pg_log --one-pid egdata.2.2.log There are 21 unique basic queries: Fam Count Uniques TotalTime Avg-Time Query ----- ----- ------- --------- -------- ----- A 1 1 0.001548 0.001548 SELECT main.* +FROM CustomFieldValues main WHERE ((main.CustomField = ?String-1?)) O +RDER BY main.SortOrder ASC B 1 1 0.005947 0.005947 SELECT main.* +FROM Tickets main WHERE ((main.EffectiveId = main.id)) AND ((main.Sta +tus != ?String-1?)) AND ((main.Type = ?String-2?)) AND\ ( ( (main.Status = ?String-3?)OR(main.Status = ?String-4?) ) AND ( (m +ain.Queue = ?String-5?) ) ) ORDER BY main.Priority DESC LIMIT ?Num-6? ... ---- ----- ----- --------- Totals: 161 0.091403 Approx. elapsed time of this section of log is 4.594 seconds.

Replies are listed 'Best First'.
Re: SQL pretty printer?
by kvale (Monsignor) on Apr 26, 2004 at 17:14 UTC
    The module SQL::Statement can parse and process SQL code:
    require SQL::Statement; # Create a parser my($parser) = SQL::Parser->new('Ansi'); # Parse an SQL statement $@ = ''; my ($stmt) = eval { SQL::Statement->new("SELECT id, name FROM foo WHERE id > 1", $parser); }; if ($@) { die "Cannot parse statement: $@"; } # Likewise, query the tables being used in the statement: my $numTables = $stmt->tables(); # Scalar context my @tables = $stmt->tables(); # Array context # @tables now contains SQL::Statement::Table instances # etc...

    -Mark

Re: SQL pretty printer?
by Fletch (Bishop) on Apr 26, 2004 at 17:10 UTC

    Not a complete solution, but you probably could use SQL::Statement to parse out the parts to apply formatting to presuming you can extract the query from the rest of the line (which shouldn't be too hard).