sarf13 has asked for the wisdom of the Perl Monks concerning the following question:

Hi All

I am trying to retrieve sql query value. Can you tell me how i can prepare regular exp for this? I am attaching my sample file value. I just want to retrieve sql query value from the given sample file. i am also attaching the reguler exp which is not giving the expacted value

<query> insert into tab_wk ( ran_id, prnt_id , event_id ) (select roe_id, prnt , event_id from event_links where event_id in ( 2,5 ) AND link_type = 4 AND prnt is not NULL AND crrnt_id = ? AND prnt not in (select roe_id from tab where crrnt_id &lt) ; ) <queryParam>CURRENT_PRCSG_TC_ID +</queryParam> </query> <query> INSERT INTO table_name (column1, column2, column3,...) VALUES (value1, + value2, value3,...); <queryParam>BATCH_ID</queryParam> <message>Loader Processing postcheck failed: There are records in roe_ +rjctn_rsn_stgng table which are not present in roe_rjctn_stgng</messa +ge> </query>
my ($start, $query) = $line =~ m/^[INSERT|CREATE](.*).+;/gi;

i really apricate for your kind help and knowlage sharing.

Replies are listed 'Best First'.
Re: Reguler Expression Problem
by davido (Cardinal) on Jul 24, 2012 at 10:42 UTC

    Several issues (and I'll tell you right now that your question is not fully answerable because of one of the issues).

    The first issue we can fix easily. The [square] brackets form a character class. So your regex is matching any single character that has a letter I, N, S, E, R, T, C, R, E, A, T, E, in it, or a | character. That's not what you want. You probably intended to constrain an alternation while capturing whichever choice matched. So m/^[INSERT|CREATE] should be m/(INSERT|CREATE).

    The next issue is this construct: (.*).+ Perl has no idea (and neither do I, which is what part of what makes this question unanswerable) where the dot-star capture is supposed to end, and the dot-plus match is supposed to begin. Actually, Perl has a rule that will govern what happens here, but it doesn't match what you intend, and Perl doesn't realize or care. The dot-star is going to capture as much as it possibly can, and then it will give one character back so that the dot-plus can match right before a semicolon. The dot-plus will match the space character right before the semicolon, and nothing more. Is this what you wanted? If so, just use a single unquantified dot.

    Third: if you want . (dot) to match across multiple lines the /s modifier will be needed.

    Also, if you intend for line 24 of your input to initiate a new match, then the ^ will need to take on the meaning where it gets to match after a newline rather than only at the start of the string. That means you'll need the /m modifier.

    my( $start, $query ) = $line =~ m/^(INSERT|CREATE)(.+);/msig;

    This is probably still broken, since the dot-plus is greedy, and will probably just devour both the first and the second query all at once.

    So while I've provided the regex above, I feel it really doesn't get you much closer to a workable and robust solution. Your question was seeking regex support, but I feel that's focusing too much on the tool you've chosen to use rather than on what actually needs to be accomplished. It's probably a better choice to turn to a module such as SQL::Statement to handle your SQL parsing for you in a more robust and predictable way.

    And even that is not going to get you 100% of the way there, because the queries are embedded in another markup (probably XML), so another parser for that layer will be advisable.


    Dave

Re: Reguler Expression Problem
by tobyink (Canon) on Jul 24, 2012 at 10:33 UTC

    There are a couple of things wrong with your regular expression...

    • Square brackets are not a grouping construct. They are a character class. /[INSERT|CREATE]/ will not match the string "INSERT". It matches a single character which must be one of these: ACEINRST|. What you want are rounded parentheses: /(INSERT|CREATE)/.
    • Your assignment to my ($start, $query) seems to be assuming that two variables will be assigned. However, the things that get assigned are the matches of so-called capturing subexpressions within the query. Subexpressions are things within rounded parentheses; they are capturing subexpressions unless you do something to make them non-capturing, like placing ?: just inside the opening parenthesis.
    perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'
Re: Reguler Expression Problem
by choroba (Cardinal) on Jul 24, 2012 at 10:27 UTC
    You have the query in XML - use XML! Ask for the text child of the query tag and you are finished.
    BTW, [INSERT|CREATE] matches any of the characters ACEINRST|, not one of the words. See perlre.
      hii thanx for such quick reply.but i can not pass this as xml because i have to put some validation check on retrieve sql query.
        I do not understand. Where is the problem? Get the query from XML, then do the validation.

        Using regex to manipulate XML is like using a belt sander to remove a zit from your face: it will do the job, but it will be a lot more painful than you can ever imagine.

        Please read and heed the wise counsel of choroba. Perhaps some monk better versed than I in XML can suggest a couple of specific XML modules that might be of use?

        really? Why are you so sure about that?

        K(eep)
        I(t)
        S(imple)
        S(illy)
        Get your data in, as it is, exactly as it appears from your file

        *Then* do whatever validation/processing you need to do on it... don't make the problem harder than it needs to be...