in reply to Reguler Expression Problem

Several issues (and I'll tell you right now that your question is not fully answerable because of one of the issues).

The first issue we can fix easily. The [square] brackets form a character class. So your regex is matching any single character that is one of the letters I, N, S, E, R, T, C, A, or a | character. That's not what you want. You probably intended to group an alternation while capturing whichever choice matched. So m/^[INSERT|CREATE] should be m/^(INSERT|CREATE).
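To make the difference concrete, here is a small sketch. The sample lines are made up for illustration (I don't have your exact input), so treat it as a demonstration of the principle rather than a test against your data.

```perl
use strict;
use warnings;

# Hypothetical sample lines; they are not taken from your input.
my @lines = ( 'INSERT INTO t VALUES (1);', 'SELECT * FROM t;', 'TRUNCATE t;' );

my ( %class_hit, %alt_hit );
for my $line (@lines) {
    # Character class: one character out of the set I N S E R T C A |
    $class_hit{$line} = $line =~ m/^[INSERT|CREATE]/ ? 1 : 0;
    # Grouped alternation: the whole word INSERT or the whole word CREATE
    $alt_hit{$line} = $line =~ m/^(INSERT|CREATE)/ ? 1 : 0;
    print "$line  class=$class_hit{$line}  alternation=$alt_hit{$line}\n";
}
```

The character class version accepts all three lines, because S and T are in the set. The alternation version accepts only the INSERT line.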

The next issue is this construct: (.*).+ Perl has no idea (and neither do I, which is part of what makes this question unanswerable) where the dot-star capture is supposed to end and the dot-plus match is supposed to begin. Actually, Perl has a rule that governs what happens here, but it doesn't match what you intend, and Perl doesn't realize or care. The dot-star is going to capture as much as it possibly can, and then it will give back just enough for the dot-plus to match a single character right before a semicolon. The dot-plus will match the space character right before the semicolon, and nothing more. Is this what you wanted? If so, just use a single unquantified dot.
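You can watch the backtracking happen with a quick experiment. This is only a sketch: the input line is invented, and I've wrapped the dot-plus in an extra pair of capturing parentheses (not in your regex) purely so we can see what it ends up holding.

```perl
use strict;
use warnings;

# Hypothetical input line, with a space before the semicolon.
my $line = 'INSERT INTO t VALUES (1) ;';

my ( $star, $plus );
# The third group exists only to expose what the dot-plus matched.
if ( $line =~ m/^(INSERT|CREATE)(.*)(.+);/ ) {
    ( $star, $plus ) = ( $2, $3 );
    print "dot-star captured: [$star]\n";
    print "dot-plus matched:  [$plus]\n";
}
```

The dot-star keeps everything up to the closing parenthesis, and the dot-plus is left with the single space in front of the semicolon.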

Third: if you want . (dot) to match across multiple lines, you'll need the /s modifier.

Also, if you intend for line 24 of your input to initiate a new match, then the ^ will need to take on the meaning where it gets to match after a newline rather than only at the start of the string. That means you'll need the /m modifier.
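Here is a rough illustration of what each modifier buys you. The text is a made-up stand-in for your input: a statement that does not start at the beginning of the string and that spans two lines.

```perl
use strict;
use warnings;

# Hypothetical input: the statement starts on the second line and wraps.
my $text = "-- header\nINSERT INTO t\n  VALUES (1);\n";

# No modifiers: ^ only matches at the very start of the string.
my $plain  = $text =~ m/^(INSERT|CREATE)(.+);/   ? 1 : 0;
# /m alone: ^ can match at line 2, but . stops at the newline before the ;
my $m_only = $text =~ m/^(INSERT|CREATE)(.+);/m  ? 1 : 0;
# /s alone: . crosses newlines, but ^ is still pinned to the string start.
my $s_only = $text =~ m/^(INSERT|CREATE)(.+);/s  ? 1 : 0;
# /m and /s together: both problems are addressed.
my $both   = $text =~ m/^(INSERT|CREATE)(.+);/ms ? 1 : 0;

print "plain=$plain m=$m_only s=$s_only ms=$both\n";
```

Only the last variant matches this input, which is why both modifiers show up in the regex that follows.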

my( $start, $query ) = $line =~ m/^(INSERT|CREATE)(.+);/msig;

This is probably still broken, since the dot-plus is greedy, and will probably just devour both the first and the second query all at once.
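A sketch of that failure, again on invented input with two statements. For contrast I've also included a non-greedy variant, (.+?), which is my own assumption about what you might want and not something from your code. It still breaks on a semicolon inside a quoted string, so don't take it as a fix.

```perl
use strict;
use warnings;

# Hypothetical input holding two statements on separate lines.
my $text = "INSERT INTO t VALUES (1);\nCREATE TABLE u (id INT);\n";

# Greedy: one match whose second capture runs through to the LAST semicolon.
my @greedy = $text =~ m/^(INSERT|CREATE)(.+);/msig;

# Non-greedy (an assumption, not part of the original regex): each match
# stops at the first semicolon it reaches.
my @lazy = $text =~ m/^(INSERT|CREATE)(.+?);/msig;

print scalar(@greedy), " captures (greedy)\n";
print scalar(@lazy),   " captures (non-greedy)\n";
```

The greedy version returns two captures covering both statements as one, while the non-greedy version returns four, a keyword and a body for each statement.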

So while I've provided the regex above, I feel it really doesn't get you much closer to a workable and robust solution. Your question was seeking regex support, but I feel that's focusing too much on the tool you've chosen to use rather than on what actually needs to be accomplished. It's probably a better choice to turn to a module such as SQL::Statement to handle your SQL parsing for you in a more robust and predictable way.

And even that is not going to get you 100% of the way there, because the queries are embedded in another markup (probably XML), so another parser for that layer will be advisable.


Dave