Stegalex has asked for the wisdom of the Perl Monks concerning the following question:

I have just been given the task of making sure that our website is accessible to people with disabilities. I know that there is a long list of things that could make our site inaccessible. However, in the interest of time, I have zeroed in on the two biggest offenders:
- not having an alt attribute within an img tag
- not having an alt attribute within an area tag
What I would like to do is write some code to scan our entire site and identify pages that are non-compliant. Can anyone help out by telling me how to compose a regular expression that would find img tags in an HTML file and then figure out whether there is an alt attribute within the img tag?

Replies are listed 'Best First'.
Re: Scanning web pages for accessibility
by Masem (Monsignor) on Dec 13, 2001 at 06:12 UTC
    You may also consider a pre-developed tool, Bobby, which you can download and use to test your entire site for more than just the ALT attribute problem. Bobby has been cited many times in connection with the Section 508 regulations for government web sites as a way to improve accessibility, and it is a well-trusted tool.

    -----------------------------------------------------
    Dr. Michael K. Neylon - mneylon-pm@masemware.com || "You've left the lens cap of your mind on again, Pinky" - The Brain
    "I can see my house from here!"
    It's not what you know, but knowing how to find it if you don't know that's important

(Ovid) Re: Scanning web pages for accessibility
by Ovid (Cardinal) on Dec 13, 2001 at 06:02 UTC

    The quickest way to do this is to use HTML Tidy. You can download it here. When you attempt to "Tidy" a page, it will give you plenty of hints about accessibility. If you're careful, you could just write a spider for your site and call tidy with backticks.
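
    A minimal sketch of the "call tidy with backticks" idea, not a finished spider. It assumes a tidy binary is on the PATH, that the -q and -e switches behave as in current HTML Tidy releases, and that the warning is worded roughly as shown in the comment. The helper name alt_warnings is made up for illustration.

```perl
use strict;
use warnings;

# Keep only the tidy diagnostics that mention "alt".
# Assumption: tidy words the warning roughly as
#   line 3 column 1 - Warning: <img> lacks "alt" attribute
sub alt_warnings {
    my ($tidy_output) = @_;
    return grep { /\balt\b/i } split /\n/, $tidy_output;
}

# Usage: perl tidy_alt.pl page1.html page2.html ...
for my $file (@ARGV) {
    # -q suppresses the banner, -e reports errors and warnings only.
    # tidy writes its diagnostics to stderr, hence the 2>&1.
    # The file name is interpolated into a shell command, so only
    # pass trusted paths.
    my $report = `tidy -q -e "$file" 2>&1`;
    print "$file: $_\n" for alt_warnings($report);
}
```

    Keeping the filtering in its own sub means the spider part can change (local files, fetched pages) without touching the check.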

    Side note: I used to do some programming for Special Olympics and I'm glad to see that there are others out there who care about this issue :)

    Cheers,
    Ovid

    Join the Perlmonks Setiathome Group or just click on the link and check out our stats.

Re: Scanning web pages for accessibility
by impossiblerobot (Deacon) on Dec 13, 2001 at 06:51 UTC
    Like Ovid and Masem, I'm going to recommend an external tool.
    If you (or your designers) are using Macromedia Dreamweaver for page design, there is a really nice extension for testing usability on the Macromedia site. It is configurable and is capable of some very detailed tests.
    If you decide to go ahead and build something similar in Perl, though, I'd like to see it! :-)

    Impossible Robot
Re: Scanning web pages for accessibility
by strat (Canon) on Dec 13, 2001 at 16:20 UTC
    I think good names for links might also be an important issue, but this might be difficult to check automatically...

    Best regards,
    perl -e "print a|r,p|d=>b|p=>chr 3**2 .7=>t and t"

Re: Scanning web pages for accessibility
by Steve_p (Priest) on Dec 13, 2001 at 20:40 UTC
    Using the modules HTML::TreeBuilder and HTML::Element should make this job rather easy. Here's a quick script that would probably do the job.
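    use strict;
    use warnings;
    use HTML::Element;
    use HTML::TreeBuilder;

    my $root = HTML::TreeBuilder->new;
    $root->parse_file('foo.html'); # source file

    foreach my $img ($root->find_by_tag_name('img')) {
        my $alt = $img->attr('alt');

        # Test with defined() rather than truth: alt="" and alt="0" are present
        if (!defined $alt) {
            my $bad_img = $img->attr('src') || '(no src)';
            print "There is no alt attribute for the image $bad_img\n";
        }
    }

    $root->delete; # free the parse tree when done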
    
    For more info, I would highly suggest reading Sean M. Burke's articles in The Perl Journal #18 and #19 located in the archives at tpj.com.
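
    To cover the whole site and the area elements from the original question, the same check could be wrapped in a directory walk. This is only a sketch under a few assumptions: the pages are static files under a local document root, File::Find (core) does the walking, and the sub names html_files and missing_alt are invented here. HTML::TreeBuilder is loaded on first use so the file listing still works where the CPAN module is not installed.

```perl
use strict;
use warnings;
use File::Find;

# Collect every .html/.htm file below a directory.
sub html_files {
    my ($dir) = @_;
    my @files;
    find(sub { push @files, $File::Find::name if -f $_ && /\.html?$/i }, $dir);
    return sort @files;
}

# Return "tag source" strings for img/area elements with no alt attribute.
sub missing_alt {
    my ($file) = @_;
    require HTML::TreeBuilder;    # CPAN module, loaded on first use
    my $root = HTML::TreeBuilder->new;
    $root->parse_file($file);
    my @bad;
    for my $elem ($root->find_by_tag_name('img', 'area')) {
        next if defined $elem->attr('alt');   # alt="" counts as present
        # img elements carry src, area elements carry href
        my $src = $elem->attr('src') || $elem->attr('href') || '(unknown)';
        push @bad, $elem->tag . " $src";
    }
    $root->delete;                # release the parse tree
    return @bad;
}

# Usage: perl scan_alt.pl /path/to/docroot
for my $dir (@ARGV) {
    for my $file (html_files($dir)) {
        print "$file: no alt attribute on $_\n" for missing_alt($file);
    }
}
```

    Pages that are generated dynamically would have to be fetched over HTTP instead of read from disk, which this sketch does not attempt.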