Argel has asked for the wisdom of the Perl Monks concerning the following question:

SOLVED! Thanks to everyone, especially ikegami, whose first solution in Re: Problems with matching? got the job done!! Original text follows....

I'm slurping in an entire file because I have data I want to substitute in more than one place. Thought it would be easier this way (one pass), but I'm either tripping on matching or making some silly mistake that I cannot see because I have been looking at the code for too long.

The goal is to replace network device names with random names -- basically, scrubbing the data before I send it to one of our vendors. But I am also trying to make sure the config file remains "well formed" so e.g. if a certain vendor going to be writing a parser for the entire file, it should work against the real thing.

Problem is as soon as I do the substitution, the next $old_name is the previous $new_full_name!! Probably missing something obvious here. :(

Same sample data:

unit 53 { description 1234-rwan-1; vci x.yy; } unit 54 { description 45ff-rwan-1; vci x.yz;
Here's the code:
# $1 $2 while( $file =~ m/^\s+description\s(\w\w\w\w)(-\w+-\w+);/gm ) { #say $1, $2 if $opt{debug}; # Way to reset $1, $2, etc? my $prefix = $1; my $suffix = $2; my $old_name = $prefix.$suffix; if( $old_name ~~ @old_names ) { # Should only be processing +each name once! say "ERROR: Already processed old_name $old_name."; reset; next; #<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< NEXT } push @old_names, $old_name; my $new_name = my_rand(); # Update: my_rand now takes care of +duplciates my $new_full_name = $new_name.$suffix; say "Replacing $old_name with $new_full_name..." if $opt{verbo +se}; # $file =~ s/$old_name/$new_full_name/g; reset; }
If I leave the substitution line above commented out I get (obviously the replacement is not actually happening):
Replacing 572l-rwan-1 with 4S1O-rwan-1... Replacing 25hu-rwan-1 with 9L8O-rwan-1... Replacing 51h2-rwan-1 with 2PDZ-rwan-1...
But if I uncomment the substitution line then I get the following. Note that the rlab-b3f2 also propagates, which is what makes me think it has something to do with matching (it's like $2 never changes).
Replacing 7301-rlab-b3f2 with 8SF6-rlab-b3f2... Replacing 8SF6-rlab-b3f2 with CAB7-rlab-b3f2... Replacing CAB7-rlab-b3f2 with 3X7L-rlab-b3f2...
Using a self-compiled Perl 5.10.1 on a Solaris 10 SPARC system. Anyone see anything obvious? Should I just do one pass to get the names I want to change and then read the file in again, line by line, making the substitution?

Update: The device names appear at least twice in the config file, hence why I was reading in the entire file. If it was a small file I may have just brute forced it, but there are around 8500 devices (so double that for the total number of substitutions) so I was hoping I could read it all into memory and pull it off that way.

Elda Taluta; Sarks Sark; Ark Arks

Replies are listed 'Best First'.
Re: Problems with matching?
by ikegami (Patriarch) on May 18, 2011 at 23:48 UTC

    The search you start with the match op is completed with the substitution op because they both use the same iterator (pos($file)). As a result, your match op always starts from the start of $file.

    my %translations; my %scrubbed; while ($file =~ /^ \s+ description \s (\w\w\w\w(-\w+-\w+)) ;/xmg) { my $real_dev = $1; my $name = $2; if (!$translation{$real_dev}) { # Make sure not to duplicate anything in %scrubbed. my $scrubbed_dev = ...; ++$scrubbed{$scrubbed_dev}; $translation{$real_dev} = $scrubbed_dev; } } my $pat = join '|', map quotemeta, keys %translations; my $re = qr/$pat/; $file =~ s/($re)/$translations{$1}/g;

    If you just want to change the device description instead of every occurrence, I suggest

    my $output = ''; my %translations; my %scrubbed; for my $line (split(//m, $file, -1)) { if ( my ($pre, $real_dev, $name, $post) = $line =~ /^ ( \s+ description \s ) (\w\w\w\w(-\w+-\w+)) ( ; . +* ) /xs) { ) { my $scrubbed_dev = $translation{$real_dev}; if (!$scrubbed_dev) { # Make sure not to duplicate anything in %scrubbed. $scrubbed_dev = ...; ++$scrubbed{$scrubbed_dev}; $translation{$real_dev} = $scrubbed_dev; } $output .= $pre . $scrubbed_dev . $post; } else { $output .= $line; } }
      THANKS!!! I knew I was overlooking something obvious but just couldn't see it!!

      Update1: Thanks for the second snippet, but the device names appear at least twice. And I need to use the full names because sometimes the 4 character sequence will match something unrelated in the config file. Didn't think scrubbing this data would be such a pain! :/

      Update2: Yep, I did miss it! Thanks for the help!!

      Elda Taluta; Sarks Sark; Ark Arks

        And I need to use the full names

        I used the full names. You might have missed my change to smarter capturing.

        You: / ... (\w\w\w\w)(-\w+-\w+) ... / Me: / ... (\w\w\w\w(-\w+-\w+)) ... /
Re: Problems with matching?
by wind (Priest) on May 18, 2011 at 23:13 UTC

    Turn your while into a s///eg

    $file =~ s{^\s+description\s\K(\w{4})(-\w+-\w+)(?:=;)}{ my $prefix = $1; my $suffix = $2; # Optional Transformation logic here, giving $prefix and $suffix n +ew values. $prefix.$suffix }mge;
      I added the reset later, with no effect. And how would I change the while m// to s///? I need to find what I am substituting first, right? I did update my_rand to deal with duplicates (will update code above).

      Update1; Oh, put code within the substitution!!

      Update2: How do I get it to loop? :/

      Elda Taluta; Sarks Sark; Ark Arks

        if you just want to replace the first occurance of a name, then use the following

        my %new_names; $file =~ s{^\s+description\s\K(\w\w\w\w)(?=-\w+-\w+;)}{ my $name = $1; if (!$new_names{$name}) { $name = $new_names{$name} = my_rand(); } $name }gme;

        Or if you want to replace all occurances of a name with the new name, then the following would work:

        my %new_names; $file =~ s{^\s+description\s\K(\w\w\w\w)(?=-\w+-\w+;)}{ $new_names{$1} //= my_rand(); }gme;

        Note: since the suffix appears to not be relevant, I added it to the positive look ahead assertion.

Re: Problems with matching?
by 7stud (Deacon) on May 19, 2011 at 00:00 UTC

    I'm not sure why you think you need to slurp the file:

    use strict; use warnings; use 5.010; my $str =<<'ENDOFSTRING'; unit 53 { description 1234-rwan-1; vci x.yy; } unit 54 { description 45ff-rwan-1; vci x.yz; ENDOFSTRING open(my $INPUT, '<', \$str) or die "Couldn't open string for IO: $!"; while (my $line = <$INPUT>) { if ($line =~ /^ \s+ description \s (\w{4}) (- \w+ - \w+); /xms ){ say "$1 <=> $2"; } } --output:-- 1234 <=> -rwan-1 45ff <=> -rwan-1
      My apologies. I should have made it clearer, but I thought the comment "# Should only be processing each name once!" would suffice since it implies names can exist more than once. In fact, pretty much all of them exist at least twice. If they only existed once this would have been easy!!

      Elda Taluta; Sarks Sark; Ark Arks