in reply to (Ovid -- don't use a regex) Re: changing data
in thread changing data

Thanks Ovid, I tried your script and it still didn't change any of my data as needed. Is there something I am doing wrong?

(Ovid) Re(3): changing data
by Ovid (Cardinal) on Feb 25, 2002 at 20:18 UTC

    My understanding was that you needed to recursively search through all HTML documents and munge the meta tags. That was a guess because I didn't really know if you wanted HTML docs or not. Here's a list of things to consider:

    • Are there any error messages generated?
    • Did you change the $root_dir variable to point to the root directory of the documents that you wanted to change?
    • To determine if we have a correct document type, I use the following regex to check the extension: /\.html?/i. Is that correct? If not, update the regex. Also, that regex has a bug: it is not anchored, so it also matches names such as foo.html.bak. It should be /\.html?$/i. Sorry 'bout that. (This bug merely creates extra .bak files, so it's recoverable.)
    • This program lists the files that it is processing and the files that it is skipping. Does that list match your expectations?
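To see why the anchor in the extension check matters, here is a minimal sketch. The file names are made up purely for illustration; they are not taken from anyone's actual directory.

```perl
use strict;
use warnings;

# Hypothetical file names, chosen only to show the difference.
my @names = ( 'index.html', 'about.htm', 'index.html.bak', 'notes.txt' );

# Unanchored: ".htm" or ".html" may appear anywhere in the name.
my @loose = grep { /\.html?/i } @names;

# Anchored with $: the extension must be at the end of the name.
my @strict = grep { /\.html?$/i } @names;

print "unanchored: @loose\n";
print "anchored:   @strict\n";
```

The unanchored pattern also picks up index.html.bak, which is how a second run ends up backing up the backups.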

    Regarding the last bullet point: in the &wanted subroutine, $_ is the current filename you are processing and that is what is getting printed. If you need to tweak which files get processed, this is the variable to take a look at. Read the docs for the File::Find module for more information.
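As a rough illustration of what &wanted sees, the sketch below builds a throwaway directory (the temporary directory and the three file names are assumptions for the demo, not part of the original script) and records which names would be processed or skipped.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);
use File::Spec;

# Build a throwaway directory so the sketch is self-contained.
my $root_dir = tempdir( CLEANUP => 1 );
for my $name ( 'a.html', 'b.txt', 'c.htm' ) {
    my $path = File::Spec->catfile( $root_dir, $name );
    open my $fh, '>', $path or die "Cannot create $path: $!";
    close $fh;
}

my ( @processed, @skipped );

sub wanted {
    # By default find() has already changed into the file's directory:
    # $_ is the bare file name, $File::Find::name the path under $root_dir.
    return unless -f $_;    # ignore directories
    if ( /\.html?$/i ) {
        push @processed, $_;
    }
    else {
        push @skipped, $_;
    }
}

find( \&wanted, $root_dir );

print "Processing: @{[ sort @processed ]}\n";
print "Skipping:   @{[ sort @skipped ]}\n";
```

Changing the test on $_ inside wanted is the place to tweak which files get picked up.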

    Check those things and you'll have a good idea of how to proceed. I just tossed out the shell of what you were looking for. You'll need to adjust it to suit your personal needs.

    Cheers,
    Ovid


      Thanks, now it works! If you have time, could you please explain some of this? I am new to Perl and still learning. Can you also explain what the -> is doing in this script? Is it pointing to a predefined Perl function?
      use strict;
      use File::Find;
      use HTML::TokeParser;

      my $bak_ext  = '.bak';
      my $root_dir = '/perl/bin';

      find(\&wanted, $root_dir);

      sub wanted {
          # if the extension fits...
          if ( /\.html?$/i ) {
              print "Processing $_\n";
              my $new = $_;
              my $bak = $_ . $bak_ext;
              rename $_, $bak or die "Cannot rename $_ to $bak: $!"; #WHAT IS THE + DOING HERE????
              open NEW, "> $new" or die "Cannot open $new for writing: $!";
              +
              my $p = HTML::TokeParser->new( $bak ); #WHAT IS THE-> new($bak)DOING
              while ( my $token = $p->get_token ) { #I AM COMPLETELY LOST ON THIS TOKEN PART??
                  # this index is the 'raw text' of the token
                  my $text_index = $token->[0] eq 'T' ? 1 : -1;

                  # it's both a start tag and a meta tag
                  if ( $token->[0] eq 'S' and $token->[1] eq 'meta' ) {
                      $token->[ $text_index ] =~ s/FLORIDA\.//g;
                  }
                  print NEW $token->[ $text_index ]; #WHAT IS THE ->[ $text_index ] ??
              }
              close NEW;
          }
          else {
              print "Skipping $_\n";
          }
      }
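On the -> questions in the comments above, a small sketch may help. In `HTML::TokeParser->new( $bak )` the arrow calls the class's `new` method, which returns a parser object; in `$token->[0]` the same arrow reaches into an array reference and fetches one element. The tokens below are built by hand to mimic the documented shape of what `get_token` returns for a start tag and for text, so the sketch runs without the module; the attribute values are invented for the example.

```perl
use strict;
use warnings;

# Hand-built tokens in the shape HTML::TokeParser's get_token returns:
#   start tag: [ 'S', $tag, \%attr, \@attrseq, $text ]
#   text:      [ 'T', $text, $is_data ]
my @tokens = (
    [
        'S', 'meta',
        { name => 'state', content => 'FLORIDA.Miami' },
        [ 'name', 'content' ],
        '<meta name="state" content="FLORIDA.Miami">',
    ],
    [ 'T', 'FLORIDA.Keys', '' ],
);

my $out = '';
for my $token (@tokens) {
    # $token is an array reference; ->[N] fetches element N of that array.
    # Text tokens keep their raw text at index 1, start tags at the last index.
    my $text_index = $token->[0] eq 'T' ? 1 : -1;

    # Only start tags named "meta" are edited; plain text is left alone.
    if ( $token->[0] eq 'S' and $token->[1] eq 'meta' ) {
        $token->[$text_index] =~ s/FLORIDA\.//g;
    }
    $out .= $token->[$text_index] . "\n";
}
print $out;
```

Here the meta tag loses its "FLORIDA." prefix while the text token passes through unchanged, which mirrors what the loop in the script writes to the new file.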