
Several issues (and I'll tell you right now that your question is not fully answerable because of one of the issues).

The first issue we can fix easily. The [square] brackets form a character class. So your regex is matching any single character that is one of the letters I, N, S, E, R, T, C, A, or a | character. That's not what you want. You probably intended to constrain an alternation while capturing whichever choice matched. So m/^[INSERT|CREATE] should be m/^(INSERT|CREATE).
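To make the difference concrete, here is a small sketch. The sample lines and the helper names starts_with_class and leading_keyword are made up for illustration; they are not from the original question.

```perl
use strict;
use warnings;

# Character class: true if the first character is any one of I N S E R T C A or |.
sub starts_with_class {
    my ($line) = @_;
    return $line =~ m/^[INSERT|CREATE]/ ? 1 : 0;
}

# Grouped alternation: returns the keyword if the line starts with the whole
# word INSERT or CREATE, otherwise undef.
sub leading_keyword {
    my ($line) = @_;
    my ($word) = $line =~ m/^(INSERT|CREATE)/;
    return $word;
}

# Hypothetical sample lines, not taken from the original question.
for my $line ( 'INSERT INTO t VALUES (1);', 'SELECT * FROM t;', 'CREATE TABLE t (id INT);' ) {
    printf "%-26s class: %d  alternation: %s\n",
        $line, starts_with_class($line), leading_keyword($line) // 'no match';
}
```

The SELECT line is the telling one: the character class accepts it because S is in the class, while the alternation rejects it.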

The next issue is this construct: (.*).+ Perl has no idea (and neither do I, which is part of what makes this question unanswerable) where the dot-star capture is supposed to end and the dot-plus match is supposed to begin. Actually, Perl has a rule that governs what happens here, but it doesn't match what you intend, and Perl doesn't realize or care. The dot-star is going to capture as much as it possibly can, and then it will give back just enough for the dot-plus to match a single character right before a semicolon. The dot-plus will match the space character right before the semicolon, and nothing more. Is this what you wanted? If so, just use a single unquantified dot.
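A quick way to see this for yourself is to put capturing parentheses around the dot-plus as well. This is only a sketch: the input string is invented, and the third capture group and the name split_greedy are additions for illustration.

```perl
use strict;
use warnings;

# Returns (keyword, what dot-star captured, what dot-plus matched).
# The third capture group is added here only to make the dot-plus visible.
sub split_greedy {
    my ($text) = @_;
    return $text =~ m/^(INSERT|CREATE)(.*)(.+);/;
}

# Hypothetical input with a space before the semicolon.
my @parts = split_greedy('INSERT INTO t VALUES (1) ;');
print "dot-star: [$parts[1]]\n";    # everything up to, but not including, the last space
print "dot-plus: [$parts[2]]\n";    # the single space before the semicolon
```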

Third: if you want . (dot) to match across multiple lines, you'll need the /s modifier, which lets dot match a newline.

Also, if you intend for line 24 of your input to initiate a new match, then the ^ will need to take on the meaning where it gets to match after a newline rather than only at the start of the string. That means you'll need the /m modifier.
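Here is a rough sketch of what each modifier contributes. The multi-line input and the helper name body_for are assumptions made up for this example, not your actual data.

```perl
use strict;
use warnings;

# Returns what (.+) captured, or undef if the pattern does not match.
sub body_for {
    my ( $re, $text ) = @_;
    my ( undef, $body ) = $text =~ $re;
    return $body;
}

# Hypothetical input: the statement starts on the second line and spans
# several lines before its semicolon.
my $text = "-- schema\nCREATE TABLE t (\n  id INT\n);\n";

my @variants = (
    [ 'no modifiers' => qr/^(CREATE)(.+);/ ],      # ^ only matches at the start of the string
    [ '/m only'      => qr/^(CREATE)(.+);/m ],     # ^ matches after a newline, but dot stops at newlines
    [ '/m and /s'    => qr/^(CREATE)(.+);/ms ],    # ^ matches after a newline and dot crosses newlines
);

for my $variant (@variants) {
    my ( $label, $re ) = @$variant;
    my $body = body_for( $re, $text );
    printf "%-13s %s\n", $label, defined $body ? 'matched' : 'no match';
}
```

Only the last variant matches this input, because the statement neither starts at the beginning of the string nor ends on the line where it begins.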

my( $start, $query ) = $line =~ m/^(INSERT|CREATE)(.+);/msig;

This is probably still broken, since the dot-plus is greedy, and will probably just devour both the first and the second query all at once.
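To illustrate the greediness problem, the sketch below runs the same pattern against two statements, once with the greedy dot-plus and once with a non-greedy .+? swapped in. The non-greedy variant is not a recommendation, just a comparison; the input and the names captures_greedy and captures_lazy are invented for the example.

```perl
use strict;
use warnings;

# Greedy: with /s the dot-plus runs to the last semicolon in the whole string.
sub captures_greedy {
    my ($text) = @_;
    my @found = $text =~ m/^(INSERT|CREATE)(.+);/msig;
    return @found;
}

# Non-greedy: .+? stops at the first semicolon after each keyword.
sub captures_lazy {
    my ($text) = @_;
    my @found = $text =~ m/^(INSERT|CREATE)(.+?);/msig;
    return @found;
}

# Hypothetical input containing two statements.
my $text = "INSERT INTO t VALUES (1);\nCREATE TABLE u (id INT);\n";

my @greedy = captures_greedy($text);
my @lazy   = captures_lazy($text);

printf "greedy found %d statement(s)\n", @greedy / 2;    # both statements swallowed as one
printf "lazy found %d statement(s)\n",   @lazy / 2;      # one per semicolon
```

Even the non-greedy form stops at the first semicolon it sees, so a semicolon inside a quoted string value would still cut a statement short.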

So while I've provided the regex above, I feel it really doesn't get you much closer to a workable and robust solution. Your question was seeking regex support, but I feel that's focusing too much on the tool you've chosen to use rather than on what actually needs to be accomplished. It's probably a better choice to turn to a module such as SQL::Statement to handle your SQL parsing for you in a more robust and predictable way.

And even that is not going to get you 100% of the way there, because the queries are embedded in another markup (probably XML), so another parser for that layer will be advisable.

Dave

In reply to Re: Reguler Expression Problem by davido
in thread Reguler Expression Problem by sarf13
