in reply to Listing all <a>...</a> tags in HTML file

This is exactly one of those situations where you might start panicking at the thought of “regular-expression hell”... until you see the light at the end of the tunnel: CPAN!

If you surf to http://search.cpan.org and type in “HTML::”, at this writing you will be rewarded with 4,408 hits! Well, maybe that's a bit much; if you search for “HTML href parser” instead, you get a mere 104 pre-written packages to choose from.

So much for futzing with “regular-expression hell!” :-)

So your task is no longer to figure out, as though you were the first person on the planet to face it, “how do I do this from scratch?” (Answer: you don't!) Instead, you have a broad collection of high-level widgets to choose from, and your two questions become: “which of these is best for my task?” and “how do I use it?” Quite a difference.

Generally, you want the most specific widget, the one most closely focused on your particular task. CPAN usually offers several such candidates.

Dictum Ne Agas: Do Not Do A Thing Already Done!

Incidentally... when I decide to use a CPAN module for an application-specific purpose, I still like to create a package of my own, specific to my application. This package encapsulates the “what, not how” of whatever my application is actually trying to do. That way the code lives in just one place, and I can clearly document what my application is doing. (Now I can say: “If you want to discover what that is, just perldoc my module. If you want to discover how we're doing it at the moment, read its source.”) If the CPAN module I first chose stops cutting the mustard, I can re-implement just this one package on top of a different CPAN module, while it continues to provide the same services to the rest of my application as the previous version did.

Oops... let me clarify that thought...

My application-specific package uses a CPAN module to do the work (that's the “how”), but all the mumbo-jumbo of actually doing it is encapsulated in a package that is specific to my app. The rest of the app uses my package, and my package in turn uses the CPAN module to actually get the job done.
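A minimal sketch of that arrangement, assuming HTML::LinkExtor (from the HTML-Parser distribution on CPAN) as the module doing the “how”. The package name MyApp::LinkLister and its function hrefs_in are hypothetical, made up for illustration; both packages sit in one file only so the example is self-contained, whereas in practice MyApp::LinkLister would live in its own .pm file.

```perl
use strict;
use warnings;

# Hypothetical application-specific package: callers ask for the "what"
# (the link targets in a page) and never see the "how" (which CPAN parser).
package MyApp::LinkLister;

use HTML::LinkExtor;

# Return the href of every <a> tag in $html, in document order.
sub hrefs_in {
    my ($html) = @_;
    my $extor = HTML::LinkExtor->new;   # no callback, so links are collected internally
    $extor->parse($html);
    $extor->eof;                        # flush any buffered input
    my @hrefs;
    # each entry from links() is [ $tag, $attr_name => $url, ... ]
    for my $link ($extor->links) {
        my ($tag, %attr) = @$link;
        push @hrefs, $attr{href} if $tag eq 'a' && defined $attr{href};
    }
    return @hrefs;
}

package main;

# The rest of the app only knows about MyApp::LinkLister.
my @hrefs = MyApp::LinkLister::hrefs_in(
    '<a href="/one">1</a> <img src="x.png"> <a href="/two">2</a>'
);
print "$_\n" for @hrefs;
```

If HTML::LinkExtor ever stops cutting the mustard, only the body of hrefs_in has to change (for example, to an HTML::TokeParser loop); code that calls MyApp::LinkLister::hrefs_in stays exactly as it is.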

Re^2: Listing all <a>...</a> tags in HTML file
by James2000 (Initiate) on Nov 29, 2007 at 19:04 UTC
    Thanks, all, for the quick responses,

    I think HTML::TokeParser will meet all my requirements (for example, listing the src=... attributes of <img> tags), and I can avoid taking the regular-expression route!

    Thanks again,

    James
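
For the case James describes, here is a minimal sketch using HTML::TokeParser's get_tag and get_trimmed_text methods. It assumes HTML::TokeParser (part of the HTML-Parser distribution on CPAN) is installed; the helper name list_links_and_images and the sample HTML are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;
use HTML::TokeParser;   # from the HTML-Parser distribution on CPAN

# Hypothetical helper: collect every <a href> (with its link text) and
# every <img src> from an HTML string, in document order.
sub list_links_and_images {
    my ($html) = @_;
    my $p = HTML::TokeParser->new(\$html)   # a scalar ref means "parse this string"
        or die "Cannot create parser: $!";
    my @found;
    # get_tag skips ahead to the next start tag named 'a' or 'img'
    while (my $token = $p->get_tag('a', 'img')) {
        my ($tag, $attr) = @{$token}[0, 1];   # tag name, attribute hashref
        if ($tag eq 'a') {
            next unless defined $attr->{href};   # skip <a name="..."> anchors
            # text up to </a>, whitespace collapsed and trimmed
            my $text = $p->get_trimmed_text('/a');
            push @found, [ 'a', $attr->{href}, $text ];
        }
        else {
            push @found, [ 'img', $attr->{src} ] if defined $attr->{src};
        }
    }
    return @found;
}

my $html = <<'HTML';
<p><a href="http://www.perlmonks.org/">PerlMonks</a>
<img src="camel.png" alt="camel">
<a name="top"></a>
<a href="http://search.cpan.org/">CPAN search</a></p>
HTML

for my $item (list_links_and_images($html)) {
    my ($kind, $url, $text) = @$item;
    if ($kind eq 'a') {
        print "link:  $url ($text)\n";
    }
    else {
        print "image: $url\n";
    }
}
```

Because the parser hands back attributes as a hash, there is no need to worry about attribute order, quoting style, or line breaks inside tags, which are exactly the details that make the regular-expression route painful.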

Re^2: Listing all <a>...</a> tags in HTML file
by pc88mxer (Vicar) on Nov 29, 2007 at 22:09 UTC
    sundialsvc4 wrote:
    Dictum Ne Agas:   Do Not Do A Thing Already Done!

    A corollary of the same principle:

    Wait long enough, and someone will write what you need.

    Sorry, although I took four years of it in high school, I can't provide the Latin equivalent.