cranberry13 has asked for the wisdom of the Perl Monks concerning the following question:

I want to look through a file, locate any words that are in uppercase, and italicize them in HTML:
$myfile =~ s/([A-Z]+)/\<i\>$1\<\/i\>/g;
The line above doesn't work. What am I doing wrong?

Re: Quick question about pattern matching uppercase letters
by mce (Curate) on Apr 27, 2004 at 14:47 UTC
    Hi,

    Some context code would be nice.

    Anyway, this should do the trick:

    perl -p -e "s|\b([A-Z]+)\b|<i>\1</i>|g;" yourfile

    ---------------------------
    Dr. Mark Ceulemans
    Senior Consultant
    BMC, Belgium
      Better is s|\b([A-Z]+)\b|<i>$1</i>|gm;. Don't use backreferences if you don't have to. They're difficult to debug when they have an error. Plus, you'll need the /m modifier to match across multiple lines.

      Update: As pointed out to me, /m isn't needed here. This is a case of learning a rule early and never learning the reasons behind the rule. (/m is for "multiple lines")

      ------
      We are the carpenters and bricklayers of the Information Age.

      Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose

        //m changes the meaning of ^ and $, and there aren't any of those here, so it isn't needed. //m doesn't do anything else. And using \1 on the right side of a subst isn't actually a backreference and doesn't make anything harder to debug; it's just deprecated syntax.

        The use of \1 is a sign that the poster forgot to enable warnings, though.
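A quick way to see the point about warnings is to compile the \1 form with warnings enabled and capture what perl reports. This sketch uses a string eval so the substitution is compiled after a $SIG{__WARN__} handler is installed; the sample text is made up for illustration.

```perl
use strict;
use warnings;

my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };   # collect warnings instead of printing them

my $text = "hi MOM";
# The string eval compiles the s/// at run time, after the handler is in place,
# and inherits "use warnings" from the surrounding scope.
eval q{ $text =~ s/\b([A-Z]+)\b/<i>\1<\/i>/g; 1 } or die $@;

print "$text\n";    # \1 still works on the right-hand side, exactly as $1 would
print @warnings;    # perl's "\1 better written as $1" syntax warning
```

The substitution still succeeds; the only difference \1 makes is the warning.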

Re: Quick question about pattern matching uppercase letters
by dragonchild (Archbishop) on Apr 27, 2004 at 14:35 UTC
    Firstly, wrap your code in <code> tags.

    Secondly, you'll need to add more info. Namely:

    1. What isn't working?
    2. The rest of your script. The regex may be fine, but you might not be writing the altered text back to the file, for one thing.

    Remember, our mind-reading helmets are usually broken. :-)


Re: Quick question about pattern matching uppercase letters
by tkil (Monk) on Apr 28, 2004 at 04:42 UTC

    Consider using something besides / as your regex delimiter when the pattern or replacement uses slashes. Also, you don't need to escape less-than or greater-than signs:

    $myfile =~ s!([A-Z]+)!<i>$1</i>!g;

    That regex looks fine to me. If I test it on the command line, it works:

    $ perl -lwe '$_="hi MOM and DAD"; print; s!([A-Z]+)!<i>$1</i>!g; print'
    hi MOM and DAD
    hi <i>MOM</i> and <i>DAD</i>

    Although, if you really want words, you should use the word boundary assertions (\b) around your series of upper-case letters:

    $myfile =~ s!\b([A-Z]+)\b!<i>$1</i>!g;
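A quick comparison makes the difference concrete. The sample string below is invented; it mixes an all-caps word with a mixed-case one (McDONALD), so only the \b version leaves the latter alone.

```perl
use strict;
use warnings;

my $text = "hi MOM, meet McDONALD";

(my $loose  = $text) =~ s!([A-Z]+)!<i>$1</i>!g;      # any run of capitals
(my $strict = $text) =~ s!\b([A-Z]+)\b!<i>$1</i>!g;  # whole all-caps words only

print "$loose\n";   # hi <i>MOM</i>, meet <i>M</i>c<i>DONALD</i>
print "$strict\n";  # hi <i>MOM</i>, meet McDONALD
```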

    The fact that you are trying to bind it to something called $myfile worries me, though; is that a variable that contains the entire contents of the file, or is it the filehandle itself? If it is the contents, you need to write them back out for the changes to be visible on disk.

    If it is a filehandle, then you need to loop over all the lines in the file and apply the regex to each line. (Or read the whole thing in.) Either way, you end up as above: you need to write it out for it to be visible on disk.
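For the "read the whole thing in" case, a minimal sketch of the read, substitute, write-back cycle might look like this. The sub name italicize_caps_in_file is hypothetical, and unlike -i the sketch keeps no backup of the original.

```perl
use strict;
use warnings;

# Hypothetical helper: italicize all-caps words in a file, rewriting it in place.
# No backup is kept, so try it on a copy first.
sub italicize_caps_in_file {
    my ($path) = @_;

    open my $in, '<', $path or die "opening $path: $!";
    my $text = do { local $/; <$in> };    # undef $/ slurps the whole file
    close $in or die "closing $path: $!";

    $text =~ s!\b([A-Z]+)\b!<i>$1</i>!g;  # the substitution from the question

    open my $out, '>', $path or die "writing $path: $!";
    print {$out} $text;                   # without this, the change never reaches disk
    close $out or die "closing $path: $!";
    return $text;
}
```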

    If this is all you are doing, note that perl has a convenience switch (command-line option) to do exactly this: take a list of files, save the original, then apply a program to each file and write out the results to the original filename. See perlrun, look at the -i switch. It is most often used with -p or -n, and often with -l (dash ell), -a, and -e.

    As an example, to replace "mom" with "dad" in every .txt file in the current directory, saving backup copies of the original to .txt~ files:

    perl -i~ -plwe 's/mom/dad/g' *.txt
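When the edit is part of a larger program rather than a one-liner, the same behaviour can be had from inside a script by setting $^I (the variable behind -i, see perlvar) and localizing @ARGV. A sketch applying the question's substitution; the sub name is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: the in-script equivalent of
#   perl -i~ -pe 's!\b([A-Z]+)\b!<i>$1</i>!g' FILES
sub italicize_caps_in_place {
    my @files = @_;
    local $^I   = '~';      # like -i~: keep each original as FILE~
    local @ARGV = @files;   # <> reads these, and print writes back to them
    while (<>) {
        s!\b([A-Z]+)\b!<i>$1</i>!g;
        print;              # goes to the file being rewritten, not to STDOUT
    }
    return;
}
```

Localizing both variables keeps the in-place mode from leaking into any later use of <> elsewhere in the program.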

    If you are doing more processing than -i can accommodate, or if this is a part of another process, the template given in the -i documentation can help. To read in a text file and italicize all-caps words for display in HTML, I might do something like this:

    open my $fh, '<', "source-data.txt" or die "opening source-data.txt: $!";
    print "<blockquote>\n";
    while ( my $line = <$fh> ) {
        # protect against most egregious HTML violations
        $line =~ s/&/&amp;/g;
        $line =~ s/</&lt;/g;
        $line =~ s/>/&gt;/g;
        # mark upper-case words as italic. not locale-safe.
        $line =~ s!\b([A-Z]+)\b!<i>$1</i>!g;
        # output the result (to STDOUT, not to the input handle)
        print $line;
    }
    print "</blockquote>\n";
    close $fh or die "closing source-data.txt: $!";

    This code can also show why the default variable ($_) is so nice. Notice how much cleaner the while loop gets if we take advantage of the default variable:

    while ( <$fh> ) {
        # protect against most egregious HTML violations
        s/&/&amp;/g;
        s/</&lt;/g;
        s/>/&gt;/g;
        # mark upper-case words as italic. not locale-safe.
        s!\b([A-Z]+)\b!<i>$1</i>!g;
        # output the result
        print;
    }
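Pulling the per-line work into a small function makes it easy to check on its own. html_italicize_caps is a hypothetical name; the capture is written as ([A-Z]+) so the whole word, not just its last letter, ends up inside the tags, and the last lines show what happens when the + sits outside the group.

```perl
use strict;
use warnings;

# Hypothetical helper: HTML-escape one line, then italicize all-caps words.
# Like the loop above, it only knows about ASCII A-Z (not locale-safe).
sub html_italicize_caps {
    my ($line) = @_;
    $line =~ s/&/&amp;/g;    # first, so the entities added below aren't re-escaped
    $line =~ s/</&lt;/g;
    $line =~ s/>/&gt;/g;
    $line =~ s!\b([A-Z]+)\b!<i>$1</i>!g;    # + inside the group: capture the whole word
    return $line;
}

print html_italicize_caps("hi MOM & DAD <3"), "\n";
# hi <i>MOM</i> &amp; <i>DAD</i> &lt;3

# With + outside the group, $1 holds only the last letter the group matched:
(my $oops = "hi CAT") =~ s!\b([A-Z])+\b!<i>$1</i>!g;
print "$oops\n";
# hi <i>T</i>
```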
Re: Quick question about pattern matching uppercase letters
by dreadpiratepeter (Priest) on Apr 27, 2004 at 14:47 UTC
    don't you want:
    $myfile =~ s/([A-Z]+)/\<i\>$1\<\/i\>/g;
    Note the [] around the character class. You were matching a capital A followed by a dash, followed by one or more capital Z's.
    UPDATE: I just noticed that you were missing the code tags and that you may have put the [] in. In that case, ignore my response.
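For the record, here is what the bracket-less pattern dreadpiratepeter describes would match, compared with the character class. The sample strings are invented for illustration.

```perl
use strict;
use warnings;

# Outside [...], "-" is an ordinary character, so (A-Z+) means:
# a literal "A", a literal "-", then one or more "Z".
my %result;
for my $s ("A-ZZZ", "HELLO") {
    my $without = $s =~ /(A-Z+)/   ? $1 : 'no match';
    my $with    = $s =~ /([A-Z]+)/ ? $1 : 'no match';
    $result{$s} = [ $without, $with ];
    print "$s: (A-Z+) -> $without; ([A-Z]+) -> $with\n";
}
# A-ZZZ: (A-Z+) -> A-ZZZ; ([A-Z]+) -> A
# HELLO: (A-Z+) -> no match; ([A-Z]+) -> HELLO
```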


    -pete
    "Worry is like a rocking chair. It gives you something to do, but it doesn't get you anywhere."