Yesterday, a new monk was asking around the chatterbox for a regexp that could match an HTML image tag that doesn't have an alt attribute. That sounds easy to a newbie, but everyone who's ever tried dealing with HTML using regexes knows it's not. The reasons are obvious: one has to deal with the possibility of > and < characters inside quotes, you don't know where a certain attribute is going to appear in a tag, etc. I'm no regex wizard, and even the people on the 'box who are just said to use HTML::*. I was one of these voices. But every time I came back, the same monk was repeating the same questions. I finally messaged that monk with a link to the following code.
#!/usr/bin/perl -w
# program to find img tags w/o alt attributes
use strict;
use HTML::TokeParser;

# build list of HTML files in the same directory
my @files = grep { /[.]htm/i } <*>;

# parse each file
for my $file (@files) {
    my $p = HTML::TokeParser->new($file)
        or die "Can't open $file: $!";

    # move through each HTML token in the file
    while (my $token = $p->get_token) {
        # find IMG start tags (the parser lowercases tag and attribute names)
        next unless $token->[0] eq "S" && $token->[1] eq "img";

        # the attributes are the keys of the hash ref in $token->[2]
        unless (exists $token->[2]{alt}) {
            # if we get here print a message and jump to the next file
            print "$file is missing an alt attribute in an img tag\n";
            last;
        }
    }
}
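
The same check can also be run against a string instead of files on disk. What follows is only a minimal sketch, not the code that was sent to the monk: the helper name imgs_missing_alt and the sample HTML are made up for illustration. It relies on two documented HTML::TokeParser features: new() accepts a reference to a scalar holding the document, and get_tag('img') skips ahead to the next img start tag and returns [$tag, $attr, $attrseq, $text].

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;   # from the HTML-Parser distribution on CPAN

# Hypothetical helper: returns the original text of every <img> start tag
# in $html that carries no alt attribute.
sub imgs_missing_alt {
    my ($html) = @_;
    my $p = HTML::TokeParser->new(\$html)
        or die "Can't parse HTML string";

    my @missing;
    # get_tag('img') jumps to the next <img> start tag, or returns undef at EOF
    while (my $tag = $p->get_tag('img')) {
        my (undef, $attr, undef, $text) = @$tag;
        # attribute names are lowercased by the parser, so 'alt' covers ALT too
        push @missing, $text unless exists $attr->{alt};
    }
    return @missing;
}

# Made-up sample input: only the second tag lacks an alt attribute.
my $html = <<'HTML';
<p><img src="a.png" alt="logo"></p>
<p><img title="a > b" src="b.png"></p>
<p><IMG SRC="c.png" ALT=""></p>
HTML

print "missing alt: $_\n" for imgs_missing_alt($html);
```

Unlike the script above, this reports every offending tag rather than stopping at the first one per file. The quoted > in the second tag and the upper-case third tag are left to the parser to sort out.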

I tested it and it works, it's easy to understand (if you read the HTML::TokeParser docs), and it presents an argument for UTFM over roll-yer-own. I did not get a message back from this monk. My assumption is that he's somewhere else now, asking the same questions about negative lookahead and whatnot.
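
For anyone wondering what the negative lookahead route tends to look like, here is a minimal sketch. The pattern and the sample tags are invented for illustration and are not what was discussed in the chatterbox. The pattern assumes the first > ends the tag, which is exactly the assumption that a quoted > breaks.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Naive pattern (an illustration only): an <img ...> tag with no "alt="
# anywhere before the first ">". It wrongly treats the first ">" as the
# end of the tag.
my $naive = qr/<img\b(?![^>]*\balt\s*=)[^>]*>/i;

# Hypothetical helper: true if the naive pattern claims alt is missing.
sub naive_flags {
    my ($html) = @_;
    return $html =~ $naive ? 1 : 0;
}

my %samples = (
    plain_missing => '<img src="a.png">',
    plain_present => '<img src="a.png" alt="logo">',
    # ">" inside a quoted value: the lookahead stops scanning too early
    quoted_gt     => '<img title="a > b" alt="logo" src="a.png">',
);

for my $name (sort keys %samples) {
    my $verdict = naive_flags($samples{$name}) ? 'flagged' : 'ok';
    print "$name: $verdict\n";
}
```

The first two samples come out right, but the third is flagged even though it has an alt attribute, because the lookahead never sees past the > inside the title value. Patching the pattern to cope with quoting is possible, but by then you are writing a parser.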

My point? I just don't understand the fear associated with using a module. The alternative is much scarier to me. Decode CGI variables? Parse HTML? I'm very busy, and I have other work to do. I thank the Perl gods for CPAN.


In reply to UTFM - Use the Friendly Modules by thunders
