in reply to Re: can I make my regexp match first pattern instead of last?
in thread can I make my regex match first pattern instead of last?

I actually tried that. It really got things honked up, just by adding that single question mark to the first (.+) expression in the regex.
Here's the result: ELEMENT Kurt (Item := "BrightLite", ItemID := 29, CatalogNumber := "BTLT-9274", Vendor := 100, END_ELEMENT ***** Found Mick (Item := "PetRock", ItemID := 36, CatalogNumber := "PTRK-3475/A", Vendor := 82, END_ELEMENT ***** Found Kurt's SMKY-1978 SeaMonkeys. (counter: 0) ***** ELEMENT Joe (Item := "Pong", ItemID := 24, CatalogNumber := "PONG-1482", Vendor := 5, END_ELEMENT ELEMENT Shane's SMKY-1978 SeaMonkeys. (counter: 1) ***** ELEMENT Kurt (Item := "Battleship", ItemID := 99, CatalogNumber := "BTLS-5234", Vendor := 529, END_ELEMENT ELEMENT Mick (Item := "SeaMonkeys", ItemID := 8, CatalogNumber := "SMKY-1978/F", Vendor := 77, END_ELEMENT ELEMENT Frank (Item := "PetRock", ItemID := 42, CatalogNumber := "PTRK-3475/B", Vendor := 82, END_ELEMENT ELEMENT Joe (Item := "SeaMonkeys", ItemID := 8, CatalogNumber := "SMKY-1979/A", Vendor := 77, END_ELEMENT

Re^3: can I make my regexp match first pattern instead of last?
by dreadpiratepeter (Priest) on Oct 24, 2008 at 21:34 UTC
    Well, that is because your expression doesn't do what you think it does. The (.+?) sections you have are matching a lot more than you expect them to. You start the match in one ELEMENT and eat through multiple records until you find your item name.
    As another poster suggested, you should first break the input up into records, then apply your regexes to each record on its own.


    -pete
    "Worry is like a rocking chair. It gives you something to do, but it doesn't get you anywhere."
      Okay. That's why I'm asking what I'm doing wrong. I was under the incorrect impression that if I put in that question mark, then it would shrink the match down to just the minimum chunk of data that matches the pattern, not find the first ELEMENT text then eat through multiple records to find my item name. Regex works differently than I have read about in books and online, apparently.

        A handy rule to know about Perl's regular expressions is that a regular expression will always find the leftmost match, not the overall shortest or longest one. The engine takes the earliest starting position at which the pattern can match, and greediness only decides how far the match extends from there. If you want a good treatise on how regular expressions work, I recommend the "Owl Book" (Jeffrey Friedl's Mastering Regular Expressions). Personally, I found the 1st edition quite good for Perl. The second edition adds Unicode and the third edition didn't add much beyond that, so in my opinion, if you find an older edition, it'll help you just as well.

        I was under the incorrect impression that if I put in that question mark, then it would shrink the match down to just the minimum chunk of data that matches the pattern

        No, you are correct. It finds the minimum chunk of data that matches /.+/, is followed by something that matches the rest of the pattern, and still allows the whole pattern to match.

        Without the non-greedy modifier, it finds the maximum chunk of data that matches /.+/, is followed by something that matches the rest of the pattern, and still allows the whole pattern to match.
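
A minimal sketch of the difference described above, using a made-up sample string (not data from the thread). Both patterns start matching at the same leftmost position; only the amount captured by the group changes.

```perl
use strict;
use warnings;

# Made-up sample string for illustration only.
my $text = 'x=1;x=2;';

# Greedy: .+ grabs as much as it can, then backs off until ';' can match.
my ($greedy) = $text =~ /x=(.+);/;

# Non-greedy: .+? grabs as little as it can, growing only until ';' can match.
my ($lazy) = $text =~ /x=(.+?);/;

print "greedy: $greedy\n";   # greedy: 1;x=2
print "lazy:   $lazy\n";     # lazy:   1
```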

        You aren't putting any bounds on what it throws away when looking for your tag. Here is a simpler example, using this data:
        a=1,b=2,a=2,a=3,b=4
        If you search using /b(.+?)4/, you will match b=2,a=2,a=3,b=4 because the .+? expression doesn't care about what it discards to find your match.
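
The example above can be checked with a short script. The second pattern is only an illustration of one way to put a bound on the match (a character class that refuses to cross a comma); it is an assumption for this sketch, not something proposed in the thread.

```perl
use strict;
use warnings;

my $data = 'a=1,b=2,a=2,a=3,b=4';

# Unbounded: the match starts at the first 'b' and .+? keeps growing,
# across commas, until a '4' finally turns up.
my ($loose) = $data =~ /(b.+?4)/;

# Bounded (illustrative): [^,] cannot cross a field separator, so the
# attempt at the first 'b' fails and the engine moves on to the next one.
my ($bounded) = $data =~ /(b[^,]+?4)/;

print "loose:   $loose\n";     # loose:   b=2,a=2,a=3,b=4
print "bounded: $bounded\n";   # bounded: b=4
```
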
        In your code above, you match the first ELEMENT tag, but your .+? expressions only care about the part number; that is why they eat multiple ELEMENT tags on their way to it.
        While you could do your match using very convoluted and fragile regexes, it is better to do it programmatically. Since you only want to match within one record of your data, the first step should be splitting the input into records (demonstrated in another response). Then your regexes will be limited to just the one record you are searching.
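
A rough sketch of the split-then-search approach, under stated assumptions: the input layout is guessed from the records visible in the output earlier in the thread (ELEMENT ... END_ELEMENT), and the variable names and the SMKY-1978 search term are chosen here for illustration. The original poster's actual file and code are not shown, so this would need adapting.

```perl
use strict;
use warnings;

# Hypothetical input modeled on records seen in the thread's output;
# the real file layout is an assumption.
my $input = <<'END';
ELEMENT Kurt (Item := "BrightLite", ItemID := 29, CatalogNumber := "BTLT-9274", Vendor := 100, END_ELEMENT
ELEMENT Mick (Item := "SeaMonkeys", ItemID := 8, CatalogNumber := "SMKY-1978/F", Vendor := 77, END_ELEMENT
ELEMENT Joe (Item := "SeaMonkeys", ItemID := 8, CatalogNumber := "SMKY-1979/A", Vendor := 77, END_ELEMENT
END

# Step 1: split into records. Each match runs from an ELEMENT keyword to
# the nearest following END_ELEMENT, so it covers exactly one record.
my @records = $input =~ /(\bELEMENT\b.*?\bEND_ELEMENT\b)/gs;

# Step 2: search inside one record at a time. A .*? here cannot wander
# into a neighbouring record, because the string ends with this record.
my @found;
for my $record (@records) {
    next unless $record =~ /^ELEMENT\s+(\w+)\s*\(.*?CatalogNumber\s*:=\s*"(SMKY-1978[^"]*)"/s;
    push @found, "$1: $2";
}

print "$_\n" for @found;   # Mick: SMKY-1978/F
```

Because each regex only ever sees a single record, a failed search simply fails for that record instead of succeeding somewhere further down the file.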


        -pete
        "Worry is like a rocking chair. It gives you something to do, but it doesn't get you anywhere."
