Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have two arrays of strings. One array holds shortened forms of parts of the strings in the other array. The part that gets shortened is whatever is enclosed in <>. For instance, the first array has Ut and the second has abcd<Utly>, and I need to replace Utly with Ut. I am trying to use regular expressions to find and replace the part that matches.
$1[0] = "Ut";
$2[0] = "abcd<Utly>";
foreach $thing(@1) {
    foreach $thing2(@2) {
        if($thing2 =~ /\s*<$thing\s*>/) {
            $thing2 =~ s/$thing/$thing/;
        }
    }
}

Re: replacing strings using reg exp
by BrowserUk (Patriarch) on Oct 17, 2003 at 00:31 UTC

    There are a couple of errors in your code:

    1. if($thing2 =~ /\s*<$thing\s*>/)

      For your examples this line equates to

      if( 'abcd<Utly>' =~ /\s*<Ut\s*>/)

      There is no way that \s* will ever match 'ly'.

    2. $thing2 =~ s/$thing/$thing/;

      Even if the match was made, this line is saying: Look in 'abcd<Utly>' for 'Ut' and if you find it replace 'Ut' with 'Ut', which probably isn't what you want:)
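
    Both points can be checked with a short script. This is only a sketch: the names $abbr, $str, $matched and $copy are illustrative stand-ins, not the OP's @1 and @2.

    ```perl
    use strict;
    use warnings;

    # Illustrative stand-ins for one element of each of the OP's arrays.
    my $abbr = 'Ut';
    my $str  = 'abcd<Utly>';

    # Point 1: \s* only matches whitespace, so the 'ly' before '>' blocks the match.
    my $matched = $str =~ /\s*<$abbr\s*>/ ? 1 : 0;
    print "original pattern matched: $matched\n";

    # Point 2: replacing 'Ut' with 'Ut' leaves the string exactly as it was.
    (my $copy = $str) =~ s/$abbr/$abbr/;
    print "after the substitution: $copy\n";
    ```

    The first line printed reports 0 and the second shows abcd<Utly> unchanged.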

    If your arrays are of any size, nested loops that compare every element of one array with every element of the other are going to take quite a long while to run. If they're small, that probably doesn't matter. You could make the process more efficient by using a hash, but it's probably better to get this version working before you move on.
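
    For later reference, here is one possible reading of the hash suggestion. It is a sketch under assumptions, not necessarily the approach meant above: it assumes each abbreviation is a prefix of the text inside <>, and the names %is_abbr and shorten, as well as the sample data, are made up for illustration.

    ```perl
    use strict;
    use warnings;

    # Hypothetical sample data; only 'Ut' and 'abcd<Utly>' come from the question.
    my @abbrs   = qw( Ut Ab );
    my @strings = ( 'abcd<Utly>', 'efgh<Abcde>', 'ijkl<Zzz>' );

    # Build the lookup once instead of looping over @abbrs for every string.
    my %is_abbr = map { $_ => 1 } @abbrs;

    for my $str (@strings) {
        # Rewrite the contents of every <...> group in a single pass.
        $str =~ s{<([^>]*)>}{ '<' . shorten($1) . '>' }ge;
    }

    # Return the longest prefix of $word that is a known abbreviation,
    # or $word itself when no prefix is known.
    sub shorten {
        my ($word) = @_;
        for my $len ( reverse 1 .. length $word ) {
            my $prefix = substr $word, 0, $len;
            return $prefix if $is_abbr{$prefix};
        }
        return $word;
    }

    print "$_\n" for @strings;
    ```

    Each string is scanned once, and the work per bracketed word depends on its length rather than on the number of abbreviations.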


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
    Hooray!

Re: replacing strings using reg exp
by sauoq (Abbot) on Oct 17, 2003 at 01:04 UTC

    Ugh. You can use @1 and @2 if you are masochistic, but please don't make us read code like that. It's hideous.

    That said, I think you are trying to do this:

    my @abbrs   = qw( Ut );
    my @strings = qw( abcd<Utly> );
    for my $abbr ( @abbrs ) {
        for my $str ( @strings ) {
            $str =~ s/<$abbr(.*)>/<$abbr>/;
        }
    }

    Notice that there is no reason for a separate if statement. The substitution only takes place if the pattern matches.
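
    To see why the if is redundant: s/// itself returns a true value only when it replaced something. A minimal sketch, where the second string and the $changed counter are added purely for illustration:

    ```perl
    use strict;
    use warnings;

    my @abbrs   = qw( Ut );
    my @strings = qw( abcd<Utly> wxyz<Other> );   # second entry is illustrative

    my $changed = 0;
    for my $abbr (@abbrs) {
        for my $str (@strings) {
            # s/// returns the number of replacements made, which is false
            # when the pattern did not match, so it doubles as the test.
            $changed++ if $str =~ s/<$abbr(.*)>/<$abbr>/;
        }
    }

    print "$_\n" for @strings;
    print "strings changed: $changed\n";
    ```

    Only the first string is rewritten; the second passes through untouched.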

    -sauoq
    "My two cents aren't worth a dime.";
    
      That's all good advice, but I'd make some corrections / improvements to your regexp.

      One thing that you should be doing is using \Q and \E in your pattern. That is, you want to match a literal abbreviation, not an abbreviation pattern. Also, you should limit that ".*". Granted, ".*?" would be good enough for most people, but I prefer to be explicit. Additionally, there is no reason to capture the latter half of the word being abbreviated (so, drop the parentheses). Last of all, you probably want to perform this replacement as many times as it occurs, so add a /g modifier:

      $str =~ s/<\Q$abbr\E[^>]*>/<$abbr>/g;
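
      A small comparison shows where these changes matter. The inputs are invented for illustration (a string with two bracketed words, and an abbreviation containing a regex metacharacter), so treat this as a sketch rather than the OP's data.

      ```perl
      use strict;
      use warnings;

      my $abbr = 'Ut';
      my $str  = 'ab<Utly>cd<Utter>';   # hypothetical: two bracketed words

      # Greedy .* runs on to the last '>' and swallows everything in between.
      (my $greedy = $str) =~ s/<$abbr(.*)>/<$abbr>/;

      # [^>]* stops at the first '>', and /g handles every occurrence.
      (my $bounded = $str) =~ s/<\Q$abbr\E[^>]*>/<$abbr>/g;

      print "greedy:  $greedy\n";
      print "bounded: $bounded\n";

      # \Q...\E matters when the abbreviation contains regex metacharacters.
      my $meta  = 'a.c';                # hypothetical abbreviation
      my $other = 'x<abcdef>';
      my $loose   = $other =~ /<$meta/     ? 1 : 0;   # '.' matches the 'b'
      my $literal = $other =~ /<\Q$meta\E/ ? 1 : 0;   # no literal 'a.c' present

      print "unquoted match: $loose, quoted match: $literal\n";
      ```

      With a single bracketed word per string and plain alphanumeric abbreviations, both versions behave the same.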

      ------------
      :Wq
      Not an editor command: Wq
        That's all good advice, but I'd make some corrections / improvements to your regexp.

        Well, I wouldn't go so far as to call any of that a "correction."

        You offer at least one improvement, however. There's certainly no reason to capture as I was. Using [^>]* is arguably better as well, although it wouldn't be likely to make much of a practical difference in this case.

        As for your other changes, while I am a firm believer in robust and explicit regular expressions, I also believe strongly in keeping things simple. Given my understanding of the OP's question, I'd guess that both your \Q quoting and the /g modifier are probably unnecessary. (In fact, my decision not to use /g and my decision not to use non-greedy matching went hand in hand.)

        My sense of the problem after reading the explanation and the code was that each string had only one such replacement. (Perhaps I should've suggested a last; in there as well.) After reading it again, I can see where that might be an incorrect assumption. I may have given too much weight to the single example he gave.

        Still, keeping the regex relatively close to the OP's original attempt might help his understanding, and that, after all, was my real intent.

        ++ for your rigor though and I do appreciate the feedback.

        -sauoq
        "My two cents aren't worth a dime.";