
Well, that is because your expression doesn't do what you think it does. The (.+?) sections you have are matching a lot more than you expect. You start the match in one ELEMENT and eat through multiple records until you find your item name.
As another poster suggested, you should first break the input up into records, then apply your regexes to each record individually.
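Here's a minimal sketch of that advice in Python, whose `re` module backtracks the same way Perl's engine does for these patterns. Your actual data format isn't shown in this thread, so the `<ELEMENT>` layout, the `<name>`/`<part>` fields and the item name `widget` are all hypothetical.

```python
import re

# Hypothetical input: three records, each wrapped in <ELEMENT>...</ELEMENT>.
data = (
    "<ELEMENT><name>bolt</name><part>111</part></ELEMENT>"
    "<ELEMENT><name>nut</name><part>222</part></ELEMENT>"
    "<ELEMENT><name>widget</name><part>333</part></ELEMENT>"
)

# One regex over the whole string: the match starts at the FIRST <ELEMENT>,
# and the lazy .+? keeps growing across record boundaries until "widget" follows.
whole = re.search(r"<ELEMENT>(.+?)<name>widget</name>", data)
print(whole.group(0))  # spans all three records

# Record-first approach: the lazy .*? stops at the first closing tag,
# so each findall result is exactly one (non-nested) record.
records = re.findall(r"<ELEMENT>.*?</ELEMENT>", data)

# Now the item regex can only ever see one record at a time.
part = None
for rec in records:
    m = re.search(r"<name>widget</name><part>(\d+)</part>", rec)
    if m:
        part = m.group(1)
        break

print("widget part:", part)  # widget part: 333
```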


-pete
"Worry is like a rocking chair. It gives you something to do, but it doesn't get you anywhere."

Re^4: can I make my regexp match first pattern instead of last?
by kleucht (Beadle) on Oct 25, 2008 at 18:47 UTC
    Okay. That's why I'm asking what I'm doing wrong. I was under the incorrect impression that if I put in that question mark, then it would shrink the match down to just the minimum chunk of data that matches the pattern, not find the first ELEMENT text then eat through multiple records to find my item name. Regex works differently than I have read about in books and online, apparently.
Re^4: can I make my regexp match first pattern instead of last?
by Anonymous Monk on Oct 25, 2008 at 18:45 UTC
    Okay. That's why I'm asking what I'm doing wrong. I was under the incorrect impression that if I put in that question mark, then it would shrink the match down to just the minimum chunk of data that matches the pattern, not find the first ELEMENT text then eat through multiple records to find my item name. Regex works differently than I have read about in books and online, apparently.

      A handy rule to know about Perl's regular expressions is that a regular expression always returns the leftmost match, not the overall shortest or longest one: the engine tries each starting position in turn and stops at the first one where the whole pattern can match. If you want a good treatise on how regular expressions work, I recommend the "Owl Book" (Jeffrey Friedl's Mastering Regular Expressions). Personally, I found the 1st edition quite good for Perl. The second edition adds Unicode and the third edition didn't add much beyond that, so in my opinion, if you find an older edition, it'll help you just as well.
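      A small illustration of the leftmost rule, using Python's `re` (same leftmost-first backtracking as Perl for this pattern). The string is made up for the example.

```python
import re

text = "a1a2c"

# Lazy quantifier, but the engine still starts at the leftmost "a" (index 0),
# so the result is not the shorter "a2c" that begins later in the string.
lazy = re.search(r"a.+?c", text)
print(lazy.group(0), lazy.start())  # a1a2c 0

# The shorter match only appears if the search is forced to begin later.
later = re.search(r"a.+?c", text[1:])
print(later.group(0))  # a2c
```

      The `?` shortens the match from where it starts; it never moves the start.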

      You aren't putting any bounds on what it throws away when looking for your tag. Here is a simpler example, with this data:
      a=1,b=2,a=2,a=3,b=4
      If you search using /b(.+?)4/, you will match b=2,a=2,a=3,b=4, because the match starts at the first b and the .+? expression doesn't care what it passes over on its way to the 4.
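      Running that example in Python (its `re` module backtracks the same way Perl does here) shows the over-match. The second pattern is an added illustration, not from the original post: it bounds the lazy quantifier with the character class `[^,]` so it cannot cross a comma.

```python
import re

data = "a=1,b=2,a=2,a=3,b=4"

# Unbounded: .+? may consume commas, so the match runs from the first "b"
# all the way to the "4" at the end of the string.
loose = re.search(r"b(.+?)4", data)
print(loose.group(0))  # b=2,a=2,a=3,b=4

# Bounded: [^,]+? cannot cross a comma, so the attempt starting at "b=2"
# fails and the engine moves on to the "b" in "b=4".
tight = re.search(r"b([^,]+?)4", data)
print(tight.group(0))  # b=4
```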
      In your code above, you match the first ELEMENT tag, but your .+? expressions only care about the part number; that is why they eat multiple ELEMENT tags on their way to it.
      While you could do your match using very convoluted and fragile regexes, it is better to do it programmatically. Since you only want to match within one record of your data, the first step should be splitting the input into records (demonstrated in another response). Then your regexes will be limited to just the one record you are searching.
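      Applied to the simple data above, the split-then-match approach could look like this minimal Python sketch (assuming, as in that example, that a comma separates records).

```python
import re

data = "a=1,b=2,a=2,a=3,b=4"

# Step 1: split the input into records (here the separator is a comma).
records = data.split(",")

# Step 2: run the regex against one record at a time; fullmatch anchors the
# pattern to the whole record, so it can never spill into a neighbour.
b_values = []
for rec in records:
    m = re.fullmatch(r"b=(.+?)", rec)
    if m:
        b_values.append(m.group(1))

print(b_values)  # ['2', '4']
```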


      -pete

      I was under the incorrect impression that if I put in that question mark, then it would shrink the match down to just the minimum chunk of data that matches the pattern

      No, you are correct. It finds the minimum chunk of data that matches /.+/ and is followed by something that matches the rest of the pattern, and still allows the whole pattern to match.

      Without the non-greedy modifier, it finds the maximum chunk of data that matches /.+/ and is followed by something that matches the rest of the pattern, and still allows the whole pattern to match.
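      A quick side-by-side in Python's `re` (which behaves like Perl here) of the greedy and non-greedy versions on a made-up string. Both matches begin at the same leftmost position; only the end point differs.

```python
import re

text = "xb=4,b=4"

greedy = re.search(r"b(.+)4", text)
lazy = re.search(r"b(.+?)4", text)

# Both matches begin at the same leftmost "b" (index 1); the quantifier
# only changes where the match ends.
print(greedy.start(), greedy.group(0))  # 1 b=4,b=4
print(lazy.start(), lazy.group(0))      # 1 b=4
```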
