UTFM - Use the Friendly Modules

by thunders (Priest)
on Mar 08, 2002 at 23:06 UTC ( [id://150458]=perlmeditation )

Yesterday, a new monk was asking around the chatterbox for a regexp that could match an HTML image tag that doesn't have an alt attribute. Sounds easy to a newbie, but everyone who's ever tried dealing with HTML using regexes knows it's not. The reasons are obvious: one has to deal with the possibility of > and < characters in quotes, you don't know where a certain attribute is going to appear in a tag, etc. I'm no regex wizard, and even the people on the 'box who are just said to use HTML::*. I was one of these voices. But every time I came back, the same monk was repeating the same questions. I finally messaged that monk with a link to the following code.
#!/usr/bin/perl -w
#program to find img tags w/o alt attributes
use strict;
use HTML::TokeParser;

#build list of HTML files in the same directory
my @files = <*>;
@files = grep(/[.]htm/i, @files);

#parse each file
for my $file (@files) {
    my $p = HTML::TokeParser->new( $file );
    #move through each html token in the file
    while (my $token = $p->get_token) {
        #find IMG start tags
        if ($token->[0] eq "S" && $token->[1] =~ /img/i) {
            my $alt_count = 0;
            for my $attr (keys %{$token->[2]}) {
                #if alt attribute is found count it
                ++$alt_count if $attr =~ /alt/i;
            }
            if ($alt_count < 1) {
                #if we get here print a message and jump to the next file
                print "$file is missing an alt attribute in an img tag\n";
                last;
            }
        }
    }
}

I tested it and it works, it's easy to understand (if you read the HTML::TokeParser docs), and it presented an argument for UTFM over roll-yer-own. I did not get a message back from this monk. My assumption is that he's somewhere else now, asking the same questions about negative lookahead and whatnot.
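
To make the quoting problem concrete, here is a minimal sketch comparing a naive negative-lookahead regex with HTML::TokeParser. The regex, the helper names (regex_says_missing, parser_says_missing) and the sample tags are illustrative assumptions, not anything the monk actually posted.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical naive regex: an <img ...> tag with no "alt=" before the next ">".
my $naive = qr/<img\b(?![^>]*\balt\s*=)[^>]*>/i;

# True if the regex thinks some img tag lacks an alt attribute.
sub regex_says_missing {
    my ($html) = @_;
    return $html =~ $naive ? 1 : 0;
}

# True if the parser finds an img start tag with no alt attribute.
sub parser_says_missing {
    my ($html) = @_;
    my $p = HTML::TokeParser->new(\$html) or die "cannot parse input";
    while (my $token = $p->get_token) {
        # Start-tag tokens look like ["S", $tag, \%attr, ...]; names are lowercased.
        next unless $token->[0] eq 'S' && $token->[1] eq 'img';
        return 1 unless exists $token->[2]{alt};
    }
    return 0;
}

# Made-up sample tags that trip up the regex in opposite directions.
my @samples = (
    [ 'has alt, ">" inside a quoted value'   => '<img src="a.png" title="x > y" alt="ok">' ],
    [ 'no alt, "alt=" inside a quoted value' => '<img src="b.png" title="alt=none">' ],
);

for my $sample (@samples) {
    my ($label, $html) = @$sample;
    printf "%-40s regex: %d  parser: %d\n",
        $label, regex_says_missing($html), parser_says_missing($html);
}
```

In the first sample the regex stops at the > inside the title value, never sees the alt attribute, and reports a missing alt that is actually there. In the second it mistakes the text alt= inside a quoted value for a real attribute and stays silent. The parser tokenizes the attributes first, so the check becomes a plain hash lookup.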

My point? I just don't understand the fear associated with using a module. The alternative is much scarier to me. Decode CGI variables? Parse HTML? I'm very busy, and I have other work to do. I thank the Perl gods for CPAN.

Replies are listed 'Best First'.
Re: UTFM - Use the Friendly Modules
by gav^ (Curate) on Mar 08, 2002 at 23:22 UTC
    A slightly easier to follow version:
    use HTML::TreeBuilder;
    foreach (<*.html>) {
        my $tree = HTML::TreeBuilder->new_from_file($_);
        if ($tree->look_down('_tag', 'img', sub { !$_[0]->attr('alt') })) {
            print "$_ has a missing alt tag\n";
        }
        $tree->delete;
    }

    gav^

Re: UTFM - Use the Friendly Modules
by mrbbking (Hermit) on Mar 09, 2002 at 02:49 UTC
    I'm one of those people who doesn't always run to CPAN every time I have a problem that I think someone else may have solved before. But I do agree that CPAN is one of the Best Things About Perl, so, after questioning my motives for a while, here's what I came up with:
    1. CPAN is enormous, with much duplication. How do I know which module to choose from a group of similarly-named modules?
    2. Sometimes I like the challenge of solving the problem myself.

    That's it. Maybe with more experience, I'd start to recognize more author names on CPAN and associate them with quality modules - to help me avoid spending too much time on the bad/old/poorly maintained ones. That might help me avoid #1 more frequently.
    I do tend to avoid modules that have not been updated recently, especially if they have low version numbers.

    And - for what I do with Perl - I often have the luxury of playing with the problem on my own for a while. I like this.

    Now, I *do* go to CPAN immediately when I know of a specific module that I think will be helpful. I've done this with Text::Template, CGI (of course), LibXML and others. To figure out what module to try, I come here and play with Super Search. But when I have a problem and don't know where to start to find a module, I'd rather try to figure it out myself first.

    I'm not "afraid" of using a module. I just sometimes don't know where to start, and other times feel up to a challenge.

      I too can think of a few reasons that one would not want to, or would not be able to, use CPAN or its relatives. First off, Windows users don't always have the ability to build CPAN's Unix-centric modules. Also, some users indicate that they are deploying an app or CGI on a shared server where they do not have root. So in my recommendations I try to list either modules that currently ship with Perl, or modules that are trivial to install with make, nmake, cpan, or ppm and that have limited dependencies.

      HTML::TokeParser fits within these constraints, as it came with my Linux and ActiveState Perl builds.
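
Whether a given module already ships with a particular Perl build can be checked before going anywhere near CPAN. The following is a minimal sketch of such a check; the helper name module_version and the list of modules probed are assumptions chosen for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the installed version of a module, 'unknown' if it loads but
# declares no version, or undef if it cannot be loaded at all.
sub module_version {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar.pm
    return undef unless eval { require $file; 1 };
    my $version = $module->VERSION;
    return defined $version ? $version : 'unknown';
}

# Modules to probe (an arbitrary, illustrative list).
for my $module (qw(strict HTML::TokeParser HTML::TreeBuilder)) {
    my $version = module_version($module);
    printf "%-20s %s\n", $module,
        defined $version ? "installed ($version)" : "not installed";
}
```

Run on the target machine (for example the shared server), this shows which recommendations need no installation step at all.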

Re: UTFM - Use the Friendly Modules
by cjf (Parson) on Mar 09, 2002 at 07:52 UTC
    I just don't understand the fear associated with using a module. The alternative is much scarier to me. Decode CGI variables? Parse HTML?

    Scarier to you because you know better by now. Try looking at this from a complete Perl novice's position. They see basically two options:

    1. Use a module. First they have to understand exactly what a module is, then which one to use, then where to get it, then how to download it, then how to install it, then how to use it in their script.
    2. Ask for a regex or, at most, a few short lines of code (their perspective, not mine), insert it in their existing script, done.

    For most tasks (especially param or HTML parsing) using a module is by far the better choice. Convincing someone of this is the hard part. To the uninitiated, your image tag example seems pretty simple; why would they need a module for it?

    Obviously you can't spend an hour discussing the various flaws in a regex with every monk that comes along. If they choose to ignore your advice in the first place that's up to them. They'll either eventually get tired of fixing buggy code and use modules, or they'll continue to write buggy code and won't be any competition in the job market ;-).

      I agree completely. As a former newbie myself, I remember my primary FUD with modules:

      1. How to install? Although documentation on installing modules can be found, it's not always written for the newbie audience. I know that a lot of monks swear by the CPAN module. Yes, it's great when it works... but a frightening experience to a newbie when it doesn't. First, just installing the CPAN module can be an exercise in futility... the looping behavior with excessive output is just enough to make those little hairs on the back of your neck stand on end.

        When you finally get CPAN installed, there's no guarantee it's always going to work. Try installing Net::SSH::Perl using CPAN. Good Luck. Dollars-to-doughnuts it will die on IDEA.pm installation. Just an example, YMMV.

        Personally, I prefer to install my modules by hand. It's almost as easy (perl Makefile.PL && make && make test && make install) and really gives you a better idea of what's going on and where. I like to know what's being installed in my system, if it's conflicting with existing items, and if it has any dependencies (wash, rinse, repeat). Some of this need for control likely stems from my parallel concern with systems security.

      2. Using most modules does not require that you're comfortable with OOP, but it sure suggests it, if you want to get the most out of the module. Yes, that's the whole design and purpose behind CPAN... reusable code. Nevertheless, it's a daunting task to learn OOP just so you can use one specific module.

      That said, I still think that the installation of modules is by far the darker of the two. I think the community would be well served to come up with some good (newbie-ized) introductory documentation to installing Perl modules.

      -fuzzyping
Re: UTFM - Use the Friendly Modules
by bluto (Curate) on Mar 09, 2002 at 00:42 UTC
    Fear generally comes from being unprepared. For example, I tend to use the modules bundled with Perl. I use other modules on CPAN if I've scoured the code or I know that a lot of people use them and I see some bug fix releases along the way. I start getting nervous when a module was last updated in 1997 and the current version is 0.02. I'm unprepared to deal with the consequences of these modules failing on a production machine.

    As far as this monk goes, who knows? I haven't been following the discussion you've mentioned, but FWIW, I've seen this kind of behaviour a lot and my general impression is that these folks barely have their perl "sea legs" and are unprepared to figure out how to install new modules and maintain them. It is sad though when they don't even make an effort to just give it a go. It is sadder still, and rather insulting, when they don't give a reason as to why a module won't work for them. Even a lame excuse (e.g. "My boss is paranoid about code that we didn't write.") is better than silence since you can help correct the notion that "not written here" == bad. Newbies that have the "Right Stuff" will work it through and say "thanks", rather than regurgitate the same question.

    bluto

      It's difficult to say why people would be averse to using modules. It's very easy to point people to a given module, since the formatter for nodes recognizes [cpan://] and handles it so well. The person looking for HTML-parsing help was getting the right answers, and the people on the box were giving him the right answers. Even in SoPW messages, I see people point new users at the most useful CPAN modules, with no indication that the recommendation stuck. Fortunately, there are plenty of users who follow up to their own question to let us know that they listened to, and benefitted from, our aid :-).

      --rjray

(crazyinsomniac) Re: UTFM - Use the Friendly Modules
by crazyinsomniac (Prior) on Mar 09, 2002 at 03:36 UTC
Re: UTFM - Use the Friendly Modules
by demerphq (Chancellor) on Mar 11, 2002 at 13:25 UTC
    Heh. I was there when this happened and found it quite maddening at the time. On reconsideration the source of the trouble seemed to be that the requestor was unaware of the subtle issues in parsing HTML and thought the choice was between two equally successful techniques, writing a regex or using a module.

    The key point here was that he was unaware of the difficulties involved (despite certain venerable monks' examples of the problems involved) and so incorrectly tried to use a simple approach that ultimately was doomed to failure.

    Yves / DeMerphq
    --
    When to use Prototypes?
    Advanced Sorting - GRT - Guttman Rosler Transform
