kleucht has asked for the wisdom of the Perl Monks concerning the following question:

I'm probably doing this all wrong. I have a text file full of data and I want to match and replace patterns of "item" and "catalog number" that are in the file. But the order of each element in the file is very important, so I want to match/replace starting from the top of the file and then work my way down.

The code snippet below actually works, but when I execute it, it replaces the third instance of the "SeaMonkey" & "SMKY-1978" pattern and then it replaces the second instance of that pattern. What I'd like it to do is replace the first instance of the pattern and then the second.

So I'd like the output to say "Found Kurt's SMKY-1978 SeaMonkeys" and then "Found Shane's SMKY-1978 SeaMonkeys" and then leave Mick's SMKY-1978 SeaMonkeys alone since I only want to find and replace the first 2 instances of the pattern. Right now it says "Found Shane's SMKY-1978 SeaMonkeys" and "Found Mick's SMKY-1978 SeaMonkeys" because it is matching the last pattern each time the for loop is executed.

So am I missing a subtle, little-known regexp feature, or am I just doing what I want to do completely and utterly wrong?

Here is the working code:

# my regexp matches from the bottom to the top but I'd like it to replace from the top down
local $/=undef;
my $DataToParse = <DATA>;
my $item = "SeaMonkeys";
my $catNum = "SMKY-1978";
my $maxInstancesToReplace = 2;

parseData();
exit();

sub parseData {
    for (my $counter = 0; $counter < $maxInstancesToReplace; $counter++) {
        # Stick in a temporary text placeholder that I will replace later after more processing
        $DataToParse =~ s/(.+)\sELEMENT\s(.+?)\s\(Item := \"$item\".+?CatalogNumber := \"$catNum.+?END_ELEMENT(.+)/$1 ***** Found $2\'s $catNum $item. (counter: $counter) *****$3/s;
    }
    print("Here's the result:\n$DataToParse\n");
}

__DATA__
ELEMENT Kurt (Item := "BrightLite", ItemID := 29, CatalogNumber := "BTLT-9274", Vendor := 100, END_ELEMENT

ELEMENT Mick (Item := "PetRock", ItemID := 36, CatalogNumber := "PTRK-3475/A", Vendor := 82, END_ELEMENT

ELEMENT Kurt (Item := "SeaMonkeys", ItemID := 12, CatalogNumber := "SMKY-1978/E", Vendor := 77, END_ELEMENT

ELEMENT Joe (Item := "Pong", ItemID := 24, CatalogNumber := "PONG-1482", Vendor := 5, END_ELEMENT

ELEMENT Shane (Item := "SeaMonkeys", ItemID := 1032, CatalogNumber := "SMKY-1978/E", Vendor := 77, END_ELEMENT

ELEMENT Kurt (Item := "Battleship", ItemID := 99, CatalogNumber := "BTLS-5234", Vendor := 529, END_ELEMENT

ELEMENT Mick (Item := "SeaMonkeys", ItemID := 8, CatalogNumber := "SMKY-1978/F", Vendor := 77, END_ELEMENT

ELEMENT Frank (Item := "PetRock", ItemID := 42, CatalogNumber := "PTRK-3475/B", Vendor := 82, END_ELEMENT

ELEMENT Joe (Item := "SeaMonkeys", ItemID := 8, CatalogNumber := "SMKY-1979/A", Vendor := 77, END_ELEMENT

And here is what it currently outputs:

Here's the result: ELEMENT Kurt (Item := "BrightLite", ItemID := 29, CatalogNumber := "BTLT-9274", Vendor := 100, END_ELEMENT ELEMENT Mick (Item := "PetRock", ItemID := 36, CatalogNumber := "PTRK-3475/A", Vendor := 82, END_ELEMENT ELEMENT Kurt (Item := "SeaMonkeys", ItemID := 12, CatalogNumber := "SMKY-1978/E", Vendor := 77, END_ELEMENT ELEMENT Joe (Item := "Pong", ItemID := 24, CatalogNumber := "PONG-1482", Vendor := 5, END_ELEMENT ***** Found Shane's SMKY-1978 SeaMonkeys. (counter: 1) ***** ELEMENT Kurt (Item := "Battleship", ItemID := 99, CatalogNumber := "BTLS-5234", Vendor := 529, END_ELEMENT ***** Found Mick's SMKY-1978 SeaMonkeys. (counter: 0) ***** ELEMENT Frank (Item := "PetRock", ItemID := 42, CatalogNumber := "PTRK-3475/B", Vendor := 82, END_ELEMENT ELEMENT Joe (Item := "SeaMonkeys", ItemID := 8, CatalogNumber := "SMKY-1979/A", Vendor := 77, END_ELEMENT
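A minimal Perl sketch of why the substitutions land bottom-up, using a made-up string rather than the OP's data (the tokens a1, a2, a3 and the X markers are hypothetical). A greedy leading (.+) first swallows the whole string and then gives back only as much as it must, so the rest of the pattern matches the rightmost candidate; without that prefix, the engine tries start positions from left to right and the leftmost candidate wins. This only illustrates the ordering; it does nothing to keep a match inside one ELEMENT record.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the data: three "records" that all match
my $greedy_copy   = "a1 a2 a3";
my $leftmost_copy = "a1 a2 a3";

for my $n (0 .. 1) {
    # Like the OP's regex: the greedy (.+) prefix pushes the match as far
    # right as it can, so each pass hits the LAST remaining candidate
    $greedy_copy =~ s/(.+)(a\d)/${1}X$n/;

    # No prefix: the engine scans start positions left to right, so
    # each pass hits the FIRST remaining candidate
    $leftmost_copy =~ s/a\d/X$n/;
}

print "$greedy_copy\n";      # a1 X1 X0
print "$leftmost_copy\n";    # X0 X1 a3
```

The greedy version reproduces the OP's symptom: the third candidate is replaced on the first pass and the second on the next.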

Re: can I make my regexp match first pattern instead of last?
by GrandFather (Saint) on Oct 24, 2008 at 21:19 UTC

    Slurping files should generally be avoided, but your data seems to have a nice record separator (a blank line), so you can set $/ to that and then:

    use strict;
    use warnings;

    my $item = "SeaMonkeys";
    my $catNum = "SMKY-1978";
    my $remainingToReplace = 2;
    my @lines;

    local $/ = "\n\n";

    while (defined (my $block = <DATA>)) {
        @lines = split "\n", $block;
        next unless $remainingToReplace;
        next unless $lines[0] =~ m/\Q$item\E/;

        # Replacement needed here
        print "***** Found $item, counter $remainingToReplace *****\n";
        --$remainingToReplace;
    } continue {
        print join "\n", @lines, "\n";
    }

    __DATA__
    Data per OP's node

    Prints:

    ELEMENT Kurt (Item := "BrightLite", ItemID := 29, CatalogNumber := "BTLT-9274", Vendor := 100, END_ELEMENT ELEMENT Mick (Item := "PetRock", ItemID := 36, CatalogNumber := "PTRK-3475/A", Vendor := 82, END_ELEMENT ***** Found SeaMonkeys, counter 2 ***** ELEMENT Kurt (Item := "SeaMonkeys", ItemID := 12, CatalogNumber := "SMKY-1978/E", Vendor := 77, END_ELEMENT ELEMENT Joe (Item := "Pong", ItemID := 24, CatalogNumber := "PONG-1482", Vendor := 5, END_ELEMENT ***** Found SeaMonkeys, counter 1 ***** ELEMENT Shane (Item := "SeaMonkeys", ItemID := 1032, CatalogNumber := "SMKY-1978/E", Vendor := 77, END_ELEMENT ELEMENT Kurt (Item := "Battleship", ItemID := 99, CatalogNumber := "BTLS-5234", Vendor := 529, END_ELEMENT ...

    Perl reduces RSI - it saves typing

      Awesome! Thanks for the answer! This seems to work fine. I had to modify your suggested regex a bit to handle matching both the item and the catalog number, since that was part of my original problem. I may not have made that very clear above.

      My regex looks like this, and it's acting on the whole $block variable instead of acting on just the first line of each block.

      m/ELEMENT\s(.+?)\s\(Item := \"\Q$item\E\".+?CatalogNumber := \"\Q$catNum\E/s

      Thanks again for the quick help!

      I have one small remaining problem, though. My data is already a large string variable instead of a filehandle. I used the filehandle in my example above because it was easy to just copy and paste. So what do I do with this "while" line in your solution if I already have the whole dataset slurped up into a variable called $fileContents?

      while (defined (my $block = <DATA>))

        Use the string as a file:

        my $fileContents;    # holds the already-slurped data
        open my $scan, '<', \$fileContents or die "Can't open in-memory file: $!";
        while (defined (my $block = <$scan>)) {
            ...
        }
        close $scan;
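A sketch that combines the in-memory filehandle with the record-by-record loop and the item/catalog-number regex quoted earlier in this thread. Assumptions: the sample text in $fileContents is a hypothetical, abbreviated stand-in for the slurped data, records are separated by a single blank line, and @found is a made-up name that just collects the matched element names in file order (the actual replacement would go where the push is).

```perl
use strict;
use warnings;

# Hypothetical, abbreviated stand-in for the already-slurped data:
# one record per line, a blank line between records
my $fileContents = <<'END_DATA';
ELEMENT Kurt (Item := "SeaMonkeys", ItemID := 12, CatalogNumber := "SMKY-1978/E", Vendor := 77, END_ELEMENT

ELEMENT Joe (Item := "Pong", ItemID := 24, CatalogNumber := "PONG-1482", Vendor := 5, END_ELEMENT

ELEMENT Shane (Item := "SeaMonkeys", ItemID := 1032, CatalogNumber := "SMKY-1978/E", Vendor := 77, END_ELEMENT

ELEMENT Mick (Item := "SeaMonkeys", ItemID := 8, CatalogNumber := "SMKY-1978/F", Vendor := 77, END_ELEMENT
END_DATA

my $item   = "SeaMonkeys";
my $catNum = "SMKY-1978";
my $remainingToReplace = 2;
my @found;    # hypothetical: names of the matched records, in file order

{
    local $/ = "\n\n";    # one record per read, only inside this block
    open my $scan, '<', \$fileContents or die "Can't open in-memory file: $!";
    while (defined (my $block = <$scan>)) {
        last unless $remainingToReplace;
        # $block holds a single record, so the lazy .+? parts
        # cannot run on into the next record
        next unless $block =~ m/ELEMENT\s(.+?)\s\(Item := \"\Q$item\E\".+?CatalogNumber := \"\Q$catNum\E/s;
        push @found, $1;    # the replacement for this record would go here
        --$remainingToReplace;
    }
    close $scan;
}

print "Found: @found\n";    # Found: Kurt Shane
```

Because $/ is set with local inside a bare block, the default line-by-line reading comes back as soon as the block ends.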

Re: can I make my regex match first pattern instead of last?
by johngg (Canon) on Oct 24, 2008 at 22:01 UTC
    local $/=undef;
    my $DataToParse = <DATA>;
    ...
    exit();

    As GrandFather points out, you should be cautious about slurping files. When you do slurp, it is worth getting into the habit of making the local $/=undef; genuinely local, rather than letting it apply from that point onwards in your script, so that you avoid nasty surprises further down your code. You can do this either inside a bare code block

    my $DataToParse;
    {
        local $/;
        $DataToParse = <DATA>;
    }

    or, perhaps better, a do block

    my $DataToParse = do { local $/; <DATA>; };
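A small check of the scoping point, assuming an in-memory string in place of DATA so the snippet runs on its own (the $text contents are made up). The local inside the do block is undone as soon as the block exits, so $/ is back to its default of "\n" afterwards.

```perl
use strict;
use warnings;

# Hypothetical two-line input, opened as an in-memory file instead of DATA
my $text = "line one\nline two\n";
open my $fh, '<', \$text or die "Can't open in-memory file: $!";

# local $/ (undef, i.e. slurp mode) only lasts until the do block ends
my $slurped = do { local $/; <$fh> };
close $fh;

print "slurped ", length($slurped), " characters in one read\n";    # 18
print( (defined $/ && $/ eq "\n") ? "\$/ is restored\n" : "\$/ is still changed\n" );
```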

    I hope this is of interest.

    Cheers,

    JohnGG

    Update: Just noticed a horrible cut'n'paste error: there should not have been a my inside the bare code block. Corrected.

      Okay. Thanks for the advice.
Re: can I make my regexp match first pattern instead of last?
by dreadpiratepeter (Priest) on Oct 24, 2008 at 21:17 UTC
    Off the top of my head, it looks like you need to replace the first (.+) expression in your regex with (.+?) so that it doesn't eat all the text up until the last ELEMENT that follows it.


    -pete
    "Worry is like a rocking chair. It gives you something to do, but it doesn't get you anywhere."
      I actually tried that. It really got things honked up. Here is the output after adding just that single question mark to the first (.+) expression in the regex:
      Here's the result: ELEMENT Kurt (Item := "BrightLite", ItemID := 29, CatalogNumber := "BTLT-9274", Vendor := 100, END_ELEMENT ***** Found Mick (Item := "PetRock", ItemID := 36, CatalogNumber := "PTRK-3475/A", Vendor := 82, END_ELEMENT ***** Found Kurt's SMKY-1978 SeaMonkeys. (counter: 0) ***** ELEMENT Joe (Item := "Pong", ItemID := 24, CatalogNumber := "PONG-1482", Vendor := 5, END_ELEMENT ELEMENT Shane's SMKY-1978 SeaMonkeys. (counter: 1) ***** ELEMENT Kurt (Item := "Battleship", ItemID := 99, CatalogNumber := "BTLS-5234", Vendor := 529, END_ELEMENT ELEMENT Mick (Item := "SeaMonkeys", ItemID := 8, CatalogNumber := "SMKY-1978/F", Vendor := 77, END_ELEMENT ELEMENT Frank (Item := "PetRock", ItemID := 42, CatalogNumber := "PTRK-3475/B", Vendor := 82, END_ELEMENT ELEMENT Joe (Item := "SeaMonkeys", ItemID := 8, CatalogNumber := "SMKY-1979/A", Vendor := 77, END_ELEMENT
        Well, that is because your expression doesn't do what you think it does. The (.+?) sections you have are matching a lot more than you think they will. You start the match in one ELEMENT and eat through multiple records until you find your item name.
        As another poster suggested, you should first break the input up into records, then apply your regexes to each record individually.
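If you would rather keep the slurped string, one alternative (a different technique from the split-into-records advice above, not what pete posted) is to make the lazy parts unable to cross a record boundary. The sketch below uses a "tempered" token, (?:(?!END_ELEMENT).), which matches any character that does not begin END_ELEMENT, and restricts the name to \w+. The abbreviated data and the names $inRecord and $record_re are hypothetical; it assumes names are single words and that every record ends in END_ELEMENT.

```perl
use strict;
use warnings;

# Hypothetical, abbreviated stand-in for the slurped data
my $DataToParse = join "\n",
    'ELEMENT Mick (Item := "PetRock", CatalogNumber := "PTRK-3475/A", END_ELEMENT',
    'ELEMENT Kurt (Item := "SeaMonkeys", CatalogNumber := "SMKY-1978/E", END_ELEMENT',
    'ELEMENT Shane (Item := "SeaMonkeys", CatalogNumber := "SMKY-1978/E", END_ELEMENT',
    'ELEMENT Mick (Item := "SeaMonkeys", CatalogNumber := "SMKY-1978/F", END_ELEMENT';

my $item   = "SeaMonkeys";
my $catNum = "SMKY-1978";
my $maxInstancesToReplace = 2;

# Tempered token: any single character, as long as it does not start END_ELEMENT
my $inRecord = qr/(?:(?!END_ELEMENT).)/s;

my $record_re = qr/
    \bELEMENT \s+ (\w+) \s+                 # record header; the name is one word
    \( Item \s*:=\s* "\Q$item\E"            # the wanted item
    $inRecord*? CatalogNumber \s*:=\s* "\Q$catNum\E
    $inRecord*? END_ELEMENT                 # never run past this record's end
/x;

for my $counter (0 .. $maxInstancesToReplace - 1) {
    # No leading (.+), so each pass replaces the first remaining match
    $DataToParse =~ s/$record_re/***** Found ${1}'s $catNum $item. (counter: $counter) *****/;
}

print "$DataToParse\n";
```

Kurt's record is replaced on pass 0 and Shane's on pass 1, while the PetRock record and Mick's SMKY-1978/F record are left alone.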


Re: can I make my regex match first pattern instead of last?
by bart (Canon) on Oct 25, 2008 at 20:27 UTC
    s/// usually works from the top, so you must be doing something to change that.
    $DataToParse =~ s/(.+)\sELEMENT\s(.+?)\s\(Item := \"$item\".+?CatalogNumber := \"$catNum.+?END_ELEMENT(.+)/$1 ***** Found $2\'s $catNum $item. (counter: $counter) *****$3/s;
    Ah, yes. Drop the useless initial subpattern (.+) (you only reproduce what it captures anyway) and you'll be much closer to home. Capturing the rest of the string after the part you care about is unnecessary too, for the same reason.

    A regex doesn't have to match a whole string, you know.

    $DataToParse =~ s/\sELEMENT\s(.+?)\s\(Item := \"$item\".+?CatalogNumber := \"$catNum.+?END_ELEMENT/ ***** Found $1\'s $catNum $item. (counter: $counter) *****/s;
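Following on from the point that a regex doesn't have to match the whole string: once the leading (.+) is gone, s/// scans left to right, so the first N matches can also be replaced in a single /g pass with a counter inside an /e replacement, instead of re-running the substitution in a loop. This is a sketch of that swapped-in technique on a deliberately tiny, hypothetical string ($text, $replaced and the "[found ...]" marker are made up); on the real data the pattern would still need to be kept inside one record.

```perl
use strict;
use warnings;

# Hypothetical toy data: Kurt, Shane and Mick all "match"
my $text = 'Kurt:SMKY Joe:PONG Shane:SMKY Mick:SMKY';

my $maxInstancesToReplace = 2;
my $replaced = 0;

# One left-to-right pass with /g. With /e the replacement is Perl code:
# it rewrites a match while the counter is under the limit and otherwise
# puts the original text ($1) back unchanged.
$text =~ s{((\w+):SMKY)}{
    $replaced < $maxInstancesToReplace
        ? "[found $2 (" . $replaced++ . ")]"
        : $1
}ge;

print "$text\n";    # [found Kurt (0)] Joe:PONG [found Shane (1)] Mick:SMKY
```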

      That doesn't do it either. It's matching more than just one single ELEMENT block. Here's the output when I put in your suggestion (although I think I changed a catalog number in the data since my original post):

      Here's the result: ***** Found Kurt (Item := "BrightLite", ItemID := 29, CatalogNumber := "BTLT-9274", Vendor := 100, END_ELEMENT ***** Found Mick (Item := "PetRock", ItemID := 36, CatalogNumber := "PTRK-3475/A", Vendor := 82, END_ELEMENT ELEMENT Kurt's SMKY-1978 SeaMonkeys. (counter: 0) ***** ELEMENT Kurt (Item := "Battleship", ItemID := 99, CatalogNumber := "BTLS-5234", Vendor := 529, END_ELEMENT ELEMENT Mick's SMKY-1978 SeaMonkeys. (counter: 1) ***** ELEMENT Frank (Item := "PetRock", ItemID := 42, CatalogNumber := "PTRK-3475/B", Vendor := 82, END_ELEMENT ELEMENT Joe (Item := "SeaMonkeys", ItemID := 8, CatalogNumber := "SMKY-1978/A", Vendor := 77, END_ELEMENT
Re: can I make my regex match first pattern instead of last?
by ikegami (Patriarch) on Sep 18, 2009 at 18:10 UTC
    local $/ = '';    # Paragraph mode: one blank-line-separated record per read

    my $item = "SeaMonkeys";
    my $catNum = "SMKY-1978";
    my $maxInstancesToReplace = 2;

    # Match the Item and CatalogNumber fields wherever they sit in the record
    my $item_re   = qr/\bItem[ ]*:=[ ]*"\Q$item\E"/;
    my $catNum_re = qr/\bCatalogNumber[ ]*:=[ ]*"\Q$catNum\E/;

    my $instances = 0;
    while (<DATA>) {
        if ( $instances < $maxInstancesToReplace && /$item_re/ && /$catNum_re/ ) {
            ++$instances;
            # ...
        }
        print;
    }
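ikegami's $/ = '' (paragraph mode) and GrandFather's $/ = "\n\n" differ when records are separated by more than one blank line: paragraph mode treats any run of blank lines as a single separator, while the literal "\n\n" turns the extra newlines into records of their own. A small sketch of that difference, assuming a made-up $data string and a hypothetical helper read_records:

```perl
use strict;
use warnings;

# Hypothetical input: two blank lines (four newlines) between rec B and rec C
my $data = "rec A\n\nrec B\n\n\n\nrec C\n";

# Hypothetical helper: read every record from $data with the given $/
sub read_records {
    my ($separator) = @_;
    local $/ = $separator;
    open my $fh, '<', \$data or die "Can't open in-memory file: $!";
    my @records = <$fh>;
    close $fh;
    return @records;
}

my @paragraph = read_records('');        # paragraph mode, as in ikegami's reply
my @fixed     = read_records("\n\n");    # literal separator, as in GrandFather's reply

print scalar(@paragraph), " records in paragraph mode\n";
print scalar(@fixed), " records with a literal \"\\n\\n\"\n";
```

With the literal separator, the extra pair of newlines comes back as a record of its own, which a check like $lines[0] =~ m/.../ would then have to cope with.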