Ignoring case

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I am using the following code to pull out any statement with an "href" in it. I don't know how to make this code ignore case. I know it has something to do with /i, but I'm not exactly sure where to put that.


open (FILE, $filename);
    while(<FILE>){
        
        

        # walk each file
        my $line = $_ ;
        chomp $line;

        
        #grabbing and printing everything between the body tags
        if (/<body.*?>/i ... /<\/body.*?>/i){
         # this is a body line
          # extract the body
        
        ##changing .html to .asp in the links        
        if  (grep(/href.*\.html/,$line)) {
            (my $newline = $line ) =~ s/\.html/\.asp/g;            
        print OUTFILE $newline . "\n";
        next;
        }

         $body_temp = $_;
         $body_temp =~ s/(.*?)\<body\>(.*?)\<\/body\>/$2/i;
         chomp($body_temp);

         $body = "$body_temp" ;
        
         # Write the body to the output file
         print OUTFILE $body . "\n";
        }
    }
    close(FILE);


Replies are listed 'Best First'.
Re: Ignoring case by chromatic (Archbishop) on May 01, 2001 at 23:12 UTC
The /i is a modifier at the end of a regex, just like /g. Your code to modify $body_temp has /i in the right place. If you were to use code like I provided in Re: Removing duplicate line, just add i after g in the second substitution. (I would recommend using my code, mostly because I'm full of True Perl Pride today.) We'll say something like: `s/(a href=.+?\.)html/$1asp/gi;` The order of the modifiers doesn't matter in this case. (I can't think of a case where it does.) The important thing is that it's never /g/i or /i/g. Just combine them after the final slash of the regex. You can learn more about regular expressions and their modifiers in perldoc perlre.
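To make the placement concrete, here is a minimal sketch of that substitution run against a made-up line. The contents of `$line` and the name `$new` are assumptions for illustration only, not part of the original script:

```perl
use strict;
use warnings;

# Hypothetical sample line with mixed-case markup (not from the original file).
my $line = '<A HREF="page.HTML">one</A> <a href="other.html">two</a>';

# Both modifiers are combined after the final delimiter: /gi (or /ig).
# /g replaces every match on the line, /i ignores case while matching.
(my $new = $line) =~ s/(a href=.+?\.)html/$1asp/gi;

print "$new\n";
```

Only the matched `html` is rewritten; the case of the rest of the tag is left exactly as it was in the input.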
Re: Ignoring case by AidanLee (Chaplain) on May 01, 2001 at 23:12 UTC
You're probably looking for a regular expression something like this: `s/(<a href=".+?)\.html(">)/$1.asp$2/gi` which should do a case-insensitive global replacement of '.html' at the end of an href value throughout your whole file. If your markup is not consistent, you may need to make it more flexible by including some '\s+' or '.*?' in between portions of your tag.
Re: Ignoring case by rchiav (Deacon) on May 01, 2001 at 23:45 UTC
OK.. not much to do with your question since others have answered it, but as I suggested in your other post, you don't need to use 4 different variables to reference the line you're working with. Here is your code, with all the variables that you don't need removed.. oh, and with "use strict" and -w added, if you didn't have them already...

#!/usr/bin/perl -w
use strict;

my $filename = './index.html';
open (FILE, $filename);
open OUTFILE, '>out.asp';

while(<FILE>){
    # walk each file
    chomp;

    #grabbing and printing everything between the body tags
    if (/<body.*?>/i ... /<\/body.*?>/i){
        # this is a body line
        # extract the body

        #changing .html to .asp in the links
        if (/href.*\.html/i) {
            s/\.html/\.asp/gi;
        }
        s/(.*?)\<body\>(.*?)\<\/body\>/$2/i;

        # Write the body to the output file
        print OUTFILE "$_\n";
    }
}
close(FILE);
close OUTFILE;

I think everyone would agree that using only one variable to manipulate one line of text makes things much clearer.

Rich
Re: Ignoring case by JojoLinkyBob (Scribe) on May 02, 2001 at 04:10 UTC
Hmm, you could always just: `my $line = lc $_;` Then, as long as all your matching strings are lowercase, you'll be OK. Desert coder
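A rough sketch of the `lc` approach, assuming a made-up input line (`$original` and `$matched` are hypothetical names used only for this illustration):

```perl
use strict;
use warnings;

# Hypothetical input line with mixed-case markup.
my $original = '<A HREF="Index.HTML">Home</A>';

# Lowercase a copy, then match with a lowercase-only pattern and no /i.
my $line = lc $original;
my $matched = ($line =~ /href.*\.html/) ? 1 : 0;

print "matched: $matched\n";
print "lowercased copy: $line\n";
```

Note that `lc` lowercases the whole copy, link text included, so if the case of the output matters, match against the lowercased copy but print or modify the original line.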