Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I am using the following code to pull out every line that contains "href". I don't know how to make this code ignore case. I know it has something to do with /i, but I'm not exactly sure where to put it.
open (FILE, $filename);
while(<FILE>){ # walk each file
    my $line = $_ ;
    chomp $line;
    #grabbing and printing everything between the body tags
    if (/<body.*?>/i ... /<\/body.*?>/i){
        # this is a body line
        # extract the body
        ##changing .html to .asp in the links
        if (grep(/href.*\.html/,$line)) {
            (my $newline = $line ) =~ s/\.html/\.asp/g;
            print OUTFILE $newline . "\n";
            next;
        }
        $body_temp = $_;
        $body_temp =~ s/(.*?)\<body\>(.*?)\<\/body\>/$2/i;
        chomp($body_temp);
        $body = "$body_temp" ;
        # Write the body to the output file
        print OUTFILE $body . "\n";
    }
}
close(FILE);
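The `...` between the two matches in the `if` above is Perl's flip-flop (range) operator in scalar context: it turns true on the line where the left pattern matches and stays true through the line where the right pattern matches. A minimal sketch, assuming the HTML is already in an in-memory list (the sample `@lines` is invented for illustration), showing that the `/i` on each pattern lets it catch `<BODY>` as well as `<body>`:

```perl
use strict;
use warnings;

# Hypothetical sample input; in the real script these lines come from <FILE>.
my @lines = (
    '<HTML><HEAD><TITLE>t</TITLE></HEAD>',
    '<BODY bgcolor="white">',
    '<a href="page.html">link</a>',
    '</BODY>',
    '</HTML>',
);

my @body_lines;
for (@lines) {
    # Flip-flop: true from the line matching the first regex through the
    # line matching the second; /i makes both tag matches ignore case.
    if (/<body.*?>/i ... /<\/body.*?>/i) {
        push @body_lines, $_;
    }
}

print "$_\n" for @body_lines;
```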

Re: Ignoring case
by chromatic (Archbishop) on May 01, 2001 at 23:12 UTC
    The /i is a modifier at the end of a regex, just like /g. Your code to modify $body_temp has /i in the right place.
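For the question's `grep(/href.*\.html/,$line)`, that means the `i` goes right after the closing slash of the pattern. A small sketch (the sample line is made up) comparing the same pattern with and without the modifier:

```perl
use strict;
use warnings;

my $line = '<A HREF="index.HTML">Home</A>';   # hypothetical mixed-case input

# Without /i the match is case-sensitive, so HREF and .HTML are missed.
my $plain  = ($line =~ /href.*\.html/)  ? 1 : 0;

# With /i after the final slash, case is ignored.
my $nocase = ($line =~ /href.*\.html/i) ? 1 : 0;

print "without /i: $plain, with /i: $nocase\n";
```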

    If you were to use code like I provided in Re: Removing duplicate line, just add an i after the g in the second substitution. (I would recommend using my code, mostly because I'm full of True Perl Pride today.) Something like:

    s/(a href=.+?\.)html/$1asp/gi;

    The order of the modifiers doesn't matter in this case. (I can't think of a case where it does.) The important thing is that it's never /g/i or /i/g: just combine them after the final slash of the regex.
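A quick check, using chromatic's substitution on a made-up line with one upper-case and one lower-case link, that `/gi` and `/ig` produce the same result:

```perl
use strict;
use warnings;

my $input = '<A HREF="a.html">x</A> <a href="B.HTML">y</a>';  # hypothetical sample

# Same substitution, modifiers written in both orders.
(my $gi = $input) =~ s/(a href=.+?\.)html/$1asp/gi;
(my $ig = $input) =~ s/(a href=.+?\.)html/$1asp/ig;

print "$gi\n";
print $gi eq $ig ? "same result\n" : "different\n";
```

Both links are rewritten because `/g` keeps going after the first match, and `/i` lets `a href=` and `html` match `A HREF=` and `HTML`.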

    You can learn more about regular expressions and their modifiers in perldoc perlre.

Re: Ignoring case
by AidanLee (Chaplain) on May 01, 2001 at 23:12 UTC

    you're probably looking for a regular expression something like this:

    s/(<a href=".+?)\.html(">)/$1.asp$2/gi

    which should do a case-insensitive, global replacement of '.html' at the end of an href value throughout your whole file. If your markup is not consistent, you may need to make the pattern more flexible by including some '\s+' or '.*?' between portions of the tag.
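One possible loosened version, as a sketch only: the `\s+` and `\s*` pieces are the kind of flexibility described above, while swapping `.+?` for `[^"]*` (so a match cannot run past the closing quote of one href) and ending on `(")` instead of `(">)` (so other attributes may follow) are my own assumptions, not part of the original reply. The sample HTML is invented:

```perl
use strict;
use warnings;

# Hypothetical sample with inconsistent spacing and case.
my $html = '<A  HREF = "docs/intro.HTML" target="_top">Intro</A> <a href="faq.html">FAQ</a>';

# \s+ / \s* tolerate extra whitespace inside the tag; [^"]* keeps the
# match inside one quoted attribute value; /g rewrites every link and
# /i ignores case in both the tag name and the extension.
(my $out = $html) =~ s/(<a\s+href\s*=\s*"[^"]*)\.html(")/$1.asp$2/gi;

print "$out\n";
```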

Re: Ignoring case
by rchiav (Deacon) on May 01, 2001 at 23:45 UTC
    OK, this doesn't have much to do with your question, since others have answered it, but as I suggested in your other post, you don't need four different variables to reference the line you're working with. Here is your code with all the variables you don't need removed. Oh, and I added "use strict" and -w, in case you didn't have them already...
    #!/usr/bin/perl -w
    use strict;

    my $filename = './index.html';
    open (FILE, $filename) or die "Can't open $filename: $!";
    open OUTFILE, '>out.asp' or die "Can't open out.asp: $!";
    while(<FILE>){ # walk each file
        chomp;
        #grabbing and printing everything between the body tags
        if (/<body.*?>/i ... /<\/body.*?>/i){
            # this is a body line
            # extract the body
            #changing .html to .asp in the links
            if (/href.*\.html/i) {
                s/\.html/\.asp/gi;
            }
            s/(.*?)\<body\>(.*?)\<\/body\>/$2/i;
            # Write the body to the output file
            print OUTFILE "$_\n";
        }
    }
    close(FILE);
    close OUTFILE;
    I think everyone would agree that only using one variable to manipulate one line of text makes things much clearer.
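The same single-variable loop can be tried without touching the disk. This is a minimal sketch, assuming Perl 5.8 or later for in-memory filehandles (`open` on a scalar reference); the sample HTML is invented, and it leaves out the single-line `<body>...</body>` substitution from the code above. Everything still happens in `$_`:

```perl
use strict;
use warnings;

# Hypothetical in-memory input standing in for ./index.html.
my $html = <<'END_HTML';
<HTML>
<BODY>
<A HREF="one.HTML">one</A>
<p>no link here</p>
</BODY>
</HTML>
END_HTML

my $result = '';
open my $in,  '<', \$html   or die "Can't open input: $!";
open my $out, '>', \$result or die "Can't open output: $!";

while (<$in>) {
    chomp;                                   # work on $_ only
    if (/<body.*?>/i ... /<\/body.*?>/i) {   # lines between the body tags
        s/\.html/.asp/gi if /href.*\.html/i; # rewrite links, any case
        print {$out} "$_\n";
    }
}
close $in;
close $out;

print $result;
```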

    Rich

Re: Ignoring case
by JojoLinkyBob (Scribe) on May 02, 2001 at 04:10 UTC
    Hmm, you could always just do:
    my $line = lc $_;
    Then, as long as all of your patterns are lowercase, you'll be OK.
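One thing to watch with this approach: `lc` lowercases the whole copy, so printing `$line` would also turn `Home.HTML` into `home.html`. A small sketch (the sample lines are made up) that matches against the lowercased copy but keeps the original `$_`:

```perl
use strict;
use warnings;

my @lines = ('<A HREF="Home.HTML">Home</A>', '<p>Plain text</p>');  # hypothetical input

my @matched;
for (@lines) {
    my $line = lc $_;   # lowercased copy, used only for matching
    # The pattern itself must be all lowercase for this to work.
    push @matched, $_ if $line =~ /href.*\.html/;   # keep the original case
}

print "$_\n" for @matched;
```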
    Desert coder