in reply to parse a query string

Any thoughts of an efficient method of obtaining these results?

For the first pass, I would not recommend that you worry about an efficient method of parsing these results. A slow method that works is surely better than a fast method that doesn't, yes?

Right off the bat, it looks like your SQL is ugly enough that you need a proper parser. I would recommend you look at Parse::RecDescent. If that turns out to be too slow, turning to Parse::Yapp or something similar might help.

However, then I got curious and decided to write a "rough draft" of the problem for you. You'll need to clean up the output, but this should pretty much print what you want.

#!/usr/bin/perl use strict; use warnings; my $sql = <<END_SQL; select the_date as "date", round(months_between(first_date,second_date),0) months_old ,product,extract(year from the_date) year ,case when a=b then 'c' else 'd' end tough_one from XXX END_SQL my $lparen = qr/\(/; my $rparen = qr/\)/; my $keyword = qr/(?i:select|from|as)/; # this is all this problem need +s my $comma = qr/,/; my $text = qr/(?:\w+|'\w+'|"\w+")/; my $op = qr/(?:=>|<=|[=+\-*\/])/; sub lexer { my $sql = shift; return sub { LEXER: { return ['KEYWORD', $1] if $sql =~ /\G ($keyword) /gcx; return ['COMMA', ''] if $sql =~ /\G ($comma) /gcx; return ['OP', $1] if $sql =~ /\G ($op) /gcx; return ['PAREN', 1] if $sql =~ /\G $lparen /gcx; return ['PAREN', -1] if $sql =~ /\G $rparen /gcx; return ['TEXT', $1] if $sql =~ /\G ($text) /gcx; redo LEXER if $sql =~ /\G \s+ /gcx; } }; } my $lexer = lexer($sql); my @tokens; my $nested = 0; while (my $token = $lexer->()) { $nested += $token->[1] if 'PAREN' eq $token->[0]; next if $nested or $token->[1] eq 'PAREN'; last if 'KEYWORD' eq $token->[0] && 'FROM' eq uc $token->[1]; push @tokens => $token; } foreach my $i (0 .. $#tokens) { my $token = $tokens[$i]; next unless 'TEXT' eq $token->[0]; print $token->[1], $/ if $i == $#tokens; print $token->[1], $/ if 'COMMA' eq $tokens[$i + 1][0]; } __END__ # prints out: "date" months_old product year tough_one

I'm tired and I need to hit the sack, so I can't explain this too much. Let's just say that I turned the SQL into a series of easy to parse tokens. Then I discarded all tokens that were inside of parentheses and after the FROM clause. Then I assumed that any TEXT token at the end or before a comma is what you were looking for. This works for the sample SQL you provided, but it's very fragile. I would write lots of tests to explore edge cases.

This code is not perfect, but it's a start.

Note that SQL::Statement cannot be used here because SQL::Parser does not recognize CASE statements.

Cheers,
Ovid

New address of my CGI Course.