Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have a site that's about 20 different HTML pages and has about 50 or so images per page. Problem is, I forgot to add alt tags from the beginning and I don't have the time to go through and add alt tags for all of them.

I want to write a script that will take all .html/.htm/.shtml files in the current folder, open them, and add alt tags to every image that doesn't already have one.

Reading files from the directory is something I can probably manage on my own, but I don't know an efficient way to run through all applicable files and substitute/add a new image ALT tag. The ALT text will come from a text file in the same directory as the script, one entry per line: the first s/// will use the first line of the text file, the second s/// will use the second line, and so on. If it runs out of lines in the text file and still has more to s///, it should start over from the top of the text file.

Any suggestions on how to make a script like this?

Re: Slurping and s/// a group of files from a list
by davido (Cardinal) on Nov 02, 2004 at 19:01 UTC

    I would create an index file in which each image name is mapped to its alt text. 50 images times 20 pages is only 1000 items, so you can pull that index file into a hash for simplicity's sake. In fact, since this is just a single-use script, don't bother with the text file: just put your filename (key) / text (value) pairs into a __DATA__ segment at the bottom of the script and read them into your hash with the <DATA> filehandle and a clever chomp and split.
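    As a rough sketch of the chomp-and-split idea: the sub name read_alt_index and the "filename, whitespace, alt text" line format are assumptions made for illustration, not something the OP specified. It also assumes image filenames contain no spaces.

```perl
use strict;
use warnings;

# Build a hash of image filename => alt text from any filehandle.
# Assumed line format (hypothetical): "filename<whitespace>alt text".
sub read_alt_index {
    my ($fh) = @_;
    my %alt_text;
    while ( my $line = <$fh> ) {
        chomp $line;
        next unless $line =~ /\S/;                    # skip blank lines
        my ( $file, $text ) = split ' ', $line, 2;    # split at the first run of whitespace
        next unless defined $text;                    # skip lines with no alt text
        $alt_text{$file} = $text;
    }
    return %alt_text;
}

# For a single-use script, the index can live in the __DATA__ segment below.
my %alt_text = read_alt_index( \*DATA );
print "$_ => $alt_text{$_}\n" for sort keys %alt_text;

__DATA__
puppy.jpg A sleeping puppy
cat.png A cat on a windowsill
```

    Passing the filehandle in means the same sub works unchanged if the index later moves out of __DATA__ and into a real text file.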

    Then you'll iterate over each html file in the directory. You might consider using something like HTML::TokeParser so that you don't have to roll your own (possibly less robust) regexp solution for finding the <img src... tags. As you locate each tag, match the image name with one of the hash keys, and presto, you'll have the text that needs to be added. You write out a new tempfile with the changes, and rename it over the old html file. Then move on to the next html file.
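    The tempfile-and-rename step might look something like the sketch below. It uses only core modules; rewrite_file and the identity transform are placeholders invented for this example, and the real transform would be whatever tag-rewriting code you settle on. The .bak suffix is likewise just an assumption.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Rewrite one file: save a .bak copy, write the transformed content to a
# temp file next to the original, then rename the temp file over it.
sub rewrite_file {
    my ( $path, $transform ) = @_;

    open my $in, '<', $path or die "Cannot read $path: $!";
    my $html = do { local $/; <$in> };    # slurp the whole file
    close $in;

    open my $bak, '>', "$path.bak" or die "Cannot write $path.bak: $!";
    print {$bak} $html;
    close $bak or die "Cannot close $path.bak: $!";

    my $mode = ( stat $path )[2] & 07777;           # remember original permissions
    my ( $out, $tmp ) = tempfile("$path.XXXXXX");   # temp file beside the original
    print {$out} $transform->($html);
    close $out or die "Cannot close $tmp: $!";
    chmod $mode, $tmp or die "Cannot chmod $tmp: $!";
    rename $tmp, $path or die "Cannot rename $tmp to $path: $!";
    return;
}

# Placeholder transform that returns the HTML unchanged; swap in real logic.
for my $file ( glob '*.html *.htm *.shtml' ) {
    rewrite_file( $file, sub { my ($html) = @_; return $html } );
}
```

    Creating the temp file in the same directory keeps the rename on one filesystem, so the old page is replaced in a single step rather than being half-written if something fails midway.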

    Be sure to keep backups until you've got it going right.


    Dave

      Yes, a proper parser (HTML::TokeParser::Simple is my personal favourite) should make this a pretty straightforward job. For example, assuming a hash (%alt_text) with the relevant 'src' attribute as the key and the text as the value (as davido suggested), something like this would be what you're looking for:
      use HTML::TokeParser::Simple;

      my $p = HTML::TokeParser::Simple->new( $filename );
      while ( my $token = $p->get_token ) {
          # add an alt attribute only to <img> tags that lack one
          if ( $token->is_tag('img') && !defined $token->get_attr('alt') ) {
              $token->set_attr( 'alt', $alt_text{ $token->get_attr('src') } );
          }
          print $token->as_is;
      }
Re: Slurping and s/// a group of files from a list
by bpphillips (Friar) on Nov 02, 2004 at 19:08 UTC
    You can do this from the command line (this started out being fairly straightforward and ended up being pretty convoluted so it's obviously not the best solution as far as clarity goes, maybe not for performance either):
    perl -pi'.orig' -e 'BEGIN {open(FH,"alt_tag_file"); @alltags = <FH>; chomp @alltags;}; @tags = @alltags unless(scalar(@tags)); s/(<img)([^>]+>)/@T = ($1,$2); $T[0].($T[1] =~ m,\balt=, ? "" : q{ alt="}.shift(@tags).q{"}).$T[1]/eg' *.html *.htm *.shtml
    All your original files will be renamed to be something like foo.html.orig (use perl -pi -e if you don't care about backups).

    I might be misunderstanding the OP's desired solution (based on davido's response), but this solution assumes that you don't need to match up image names to alt tags (the alt tags are in the correct order in the alt tag file).
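    For readability, the same in-order substitution could be spelled out as a small script along these lines. This is only a sketch: make_tag_cycler and add_missing_alts are made-up names, the regex is no more robust than the one-liner's (it would be fooled by a '>' inside an attribute value, for instance), and the alt texts are assumed to contain no double quotes.

```perl
use strict;
use warnings;

# Return a closure that hands out alt texts in order and wraps around
# to the first one when the list is exhausted.
sub make_tag_cycler {
    my @tags = @_;
    die "no alt texts supplied\n" unless @tags;
    my $i = 0;
    return sub {
        my $text = $tags[ $i % scalar @tags ];
        $i++;
        return $text;
    };
}

# Add alt="..." to every <img ...> tag that does not already have one.
sub add_missing_alts {
    my ( $html, $next_tag ) = @_;
    $html =~ s{(<img\b)([^>]*>)}{
        my ( $open, $rest ) = ( $1, $2 );    # copy before the inner match runs
        $rest =~ /\balt\s*=/i
            ? $open . $rest
            : $open . ' alt="' . $next_tag->() . '"' . $rest;
    }gei;
    return $html;
}

my $next_tag = make_tag_cycler( 'first', 'second' );
print add_missing_alts(
    qq{<img src="a.gif"><img src="b.gif" alt="kept"><img src="c.gif"><img src="d.gif">\n},
    $next_tag,
);
```

    Because the cycler keeps its own counter, the same closure can be reused across files, so the list wraps around exactly when it runs out rather than only at line boundaries.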
    --Brian

      The OP did imply that the alt tags would be in order in the tag file. I was just making the point that I wouldn't do it that way, but would rather use a hash to map image names to tags. The reason is that if you misorder just one item in the text list, you'll end up screwing up the whole thing. What a mess that could be. Imagine the image of a puppy showing up with the text for a cat! Even worse, imagine every image past that point also being screwed up. ;)

      Your method works fine too... as long as there are no mistakes when creating the image alt text file.


      Dave