shelob101 has asked for the wisdom of the Perl Monks concerning the following question:

Fellow Monks, I've recently tackled a code update project at my day job that involves scanning program code for some crummy stuff and replacing it with smart stuff. (The code is Business Basic, for those interested). At any rate, the bad code attempts to PRINT some ASCII terminal control characters that are better represented by a sequence that is not terminal-type dependent. An example line looks like:
10 PRINT "<29>This should be dim.<30>EThis should be reversed.<28>This + is bold."
And so on. The pieces that look like <\d+> are what I'm interested in. What this should be replaced with is:
10 PRINT @(-38);"This should be dim.";@(-37);"This should be reversed. +";@(-40);"This is bold."
OK, no problem. So I set up a hash with keys in it like '<29>' and a value like '@(-38);'. And, using the /g switch of the pattern match operator, I can do:
while ($line=~/(<\d+>)/g) { next if ($1 eq '<30>'); $line=~s/$1/$fix{$1}/; }
I've ignored the codes like <30>E because they are a slighty different pattern, and I want to keep the code simple so I can set up a similar loop for those.
The problem is those darn double quotes in the PRINT statement. As you can see from the desired output, regular text should be bracketed in quotes but those ASCII codes should be changed to the @ statments outside quotes!
So can anyone come up with either a pattern match or substitute operator to successfuly change the bad line to the good? I believe this could be done with pos() and index() but that would be a lot of repetition in the code, and slick RegEx's are so much nicer :)

Thanks for any ideas

Replies are listed 'Best First'.
(tye)Re: Smart Search and Replacement with RegEx
by tye (Sage) on May 29, 2002 at 17:38 UTC

    This works for me:

    $line= <<END; 10 PRINT "<29>This is dim.<30>EThis is reversed.<28>This is bold.<27>" END my %fix= ( "<29>" => '@(-38)', "<28>" => '@(-40)', "<27>" => '@(-35)' ); $line =~ s{("?)(<\d+>)("?)}{ if( "<30>" eq $2 ) { "$1$2$3" } else { my $str= $1 ? "" : '";'; $str .= $fix{$2}; $str .= $3 ? "" : ';"'; $str; } }gex; print $line;
    Though I wish I could use return. (:

            - tye (but my friends call me "Tye")
      Here's another approach:
      $line= <<END; 10 PRINT "<29>This is dim.<30>EThis is reversed.<28>This is bold.<27>" END my %fix= ( "<29>" => '@(-38)', "<28>" => '@(-40)', "<27>" => '@(-35)' ); my $pattern = join("|", map "(?:$_)", keys %fix); 1 while $line =~ s/("[^"]*) ($pattern) (.*?")/$1";$fix{$2};"$3/gx; $line =~ s/"";//g; $line =~ s/;""//g; print $line;
      hey, thanks tye. I knew it would take mere seconds for the wise ones here to enlighten me. One thing I seriously underuse in my perl programming is the extended capabilities of the replace clause of the s/// (or in this case, s{}{}) operator. Solution works perfectly for me!
Re: Smart Search and Replacement with RegEx
by Abigail-II (Bishop) on May 30, 2002 at 13:36 UTC
    I wouldn't try to do it all in a jumbo regex. Why not a simple split?
    sub translate { my $str = shift; my @chunks = split /<(\d+)>/ => $str; my @results; while (@chunks) { my ($number, $string) = splice @chunks => 0, 2; if ($number == 30) { ... } else { push @results => '@(' . $hash {$number} . ')', qq {"$string"}; } } join ";" => @results; }

    Abigail

      very nice! I've never used the parenthetical technique in split before, but after playing with it, this seems to do the breakup and keep your search pattern instead of throwin g it away. Also the splice keeps the string stuff in order and still moves the pointer through the while loop. Thanks for showing me a couple of new tricks, Abigail
Re: Smart Search and Replacement with RegEx aka I am not crazy, I am insane.
by Jenda (Abbot) on May 29, 2002 at 20:04 UTC
    $line = qq{10 PRINT "<29>This should be dim.<30>EThis should be revers +ed.<28>This is bold."}; my %fix = ( "<29>" => '";@(-38);"', "<28>" => '";@(-40);"', "<27>" => '";@(-35);"', "<30>" => "<30>", ); $line =~ s/"([^"]*)"/my $s = $1;$s =~ s{(<\d+>)}{$fix{$1}}g;'"'.$s.'"' +/ge; #or # $line =~ s/("[^"]*")/my $s = $1;$s =~ s{(<\d+>)}{$fix{$1}}g;$s/ge;

    This is a regexp inside regexp. The outer one "extracts" all strings from a line, the second replaces the <\d+> thingies inside. This way it's safe even in the unlikely case that a <\d+> appears outside a string in the code.

    Another thingie is that I keep the whole replacements in the hash, including the double quotes and semicolons, and replace <30> by itself. And an unknown <\d+> by nothing.

    If you wanted to leave unknown alone you'd change the regregexpexp to

    $line =~ s/"([^"]*)"/my $s = $1;$s =~ s{(<\d+>)}{$fix{$1}||$1}ge;'"'.$ +s.'"'/ge;
    or
    $line =~ s/("[^"]*")/my $s = $1;$s =~ s{(<\d+>)}{$fix{$1}||$1}ge;$s/ge +;
    Then you would not have to include the <30> in the %fix hash.

    I assume that to include a doublequote in a string (in that version of Basic) you have to double it.

      Jenda