Why Would HTML::LinkExtor return a hash of attributes?

Cody Pendant has asked for the wisdom of the Perl Monks concerning the following question:

I've been struggling with HTML::LinkExtor for a while now. Thanks to everyone who pointed me in its direction.

You can see the full documentation at http://search.cpan.org/author/GAAS/HTML-Parser-3.26/lib/HTML/LinkExtor.pm if you want to, but I'm just wondering what prompted the author to return the results as an array with a hash as the second item?

Here's the relevant section:

$p->links 
Returns a list of all links found in the document. The
returned values will be anonymous arrays with the 
follwing [sic] elements: 
  [$tag, $attr => $url1, $attr2 => $url2,...]

It's kind of confusing me. For a start, if it's a hash, shouldn't that be

[$tag, {$attr => $url1, $attr2 => $url2,...}]

instead?

And more to the point, I'm racking my not-inconsiderable knowledge of HTML to try and find a situation where a single tag could have two or more attributes which were links.

Apart from anything else, this structure makes scary dereferencing like this necessary:

$p = HTML::LinkExtor->new(\&cb, "http://www.perl.org/");
 sub cb {
     my($tag, %links) = @_;
     print "$tag @{[%links]}\n";
 }

Maybe

@{[%links]}

isn't scary to you but it is to me...
--

($_='jjjuuusssttt annootthheer
     pppeeerrrlll haaaccckkeer')=~y/a-z//s;print;

Re: Why Would HTML::LinkExtor return a hash of attributes? by PodMaster (Abbot) on Aug 19, 2002 at 05:24 UTC

"what prompted the author to return the results as an array with a hash as the second item?"

Where do you get that from?

"It's kind of confusing me. For a start, if it's a hash, shouldn't that be ..."

It's not a hash, where do you get hash from? And why is `print "$tag @{[%links]}\n";` scary to you? More than anything it's kind of silly to me, cause all that sub needs to be is `print "@_\n";`

update: "And more to the point, I'm racking my not-inconsiderable knowledge of HTML to try and find a situation where a single tag could have two or more attributes which were links."

AFAIK, no attributes are ever "links". Duplicate SRC attributes wouldn't be valid HTML, and one of the 2 would be ignored. It's like this, if anyone writing HTML wants anybody to somewhat accurately interpret it, well, he's gotta write valid HTML, right? (right)

____________________________________________________
** The Third rule of perl club is a statement of fact: pod is sexy.

Re: Re: Why Would HTML::LinkExtor return a hash of attributes? by Cody Pendant (Prior) on Aug 19, 2002 at 05:33 UTC

Isn't it a hash? Why does it have "key => val, key => val" if it isn't? Plus, why does the sub grab it as

my($tag, %links) = @_;

if it isn't a hash?

"all that sub needs to be is `print "@_\n"`"

Try it. You get both the "HREF" and the thing it's a link to. I don't want to extract "HREF" 5,000 times, do I?
--

($_='jjjuuusssttt annootthheer pppeeerrrlll haaaccckkeer')=~y/a-z//s;print;

Re: Re: Re: Why Would HTML::LinkExtor return a hash of attributes? by PodMaster (Abbot) on Aug 19, 2002 at 05:44 UTC

`=>` is a fancy comma (,). It allows you to say

print I => AM => NOT => QUOTING => WORDS => TIMES => 5;

which would print

IAMNOTQUOTINGWORDSTIMES5

Using this fancy comma doesn't make a hash. A hash is a "data structure", and `I => AM => NOT => QUOTING => WORDS => TIMES => 5` is a list. Now you can do many things with lists. You can create arrays (also data structures)

my @ARRAY = ( I => AM => NOT => QUOTING => WORDS => TIMES => 5 );

and you can create hashes

my %HASH = ( I => AM => NOT => QUOTING => WORDS => TIMES => 5 );

Do you follow now?

____________________________________________________
** The Third rule of perl club is a statement of fact: pod is sexy.

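To make the point above concrete, here is a minimal sketch showing that one and the same `=>` list can fill either an array or a hash. The variable names and the `a.html` / `b.png` values are made up for illustration and do not come from HTML::LinkExtor.

```perl
use strict;
use warnings;

# The same four-element list, written with the fat comma.
# => only quotes the bareword on its left; it does not build a hash.
my @as_array = (href => 'a.html', src => 'b.png');
my %as_hash  = (href => 'a.html', src => 'b.png');

# The array keeps all four elements in order.
print scalar(@as_array), "\n";       # prints 4

# The hash pairs them up as two keys with two values.
print scalar(keys %as_hash), "\n";   # prints 2
print "$as_hash{href}\n";            # prints a.html
```

Whether the list becomes a hash is decided by what it is assigned to, not by how it is punctuated.
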
Re: Why Would HTML::LinkExtor return a hash of attributes? by Arien (Pilgrim) on Aug 19, 2002 at 05:50 UTC

"I'm racking my not-inconsiderable knowledge of HTML to try and find a situation where a single tag could have two or more attributes which were links."

Well, `<object>` has the attributes `classid`, `codebase`, `data`, and `archive` (a space-separated list of URIs). And even `<img>` could have multiple attributes that link: `src`, `longdesc`, and `usemap`.

-- Arien

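As a rough illustration of why the callback slurps the rest of its arguments into a hash, the sketch below calls a callback by hand with an argument list shaped like the documented `[$tag, $attr => $url1, $attr2 => $url2,...]`. The sub name `describe_links` and the file names are hypothetical, and the hand-written call only stands in for what the parser would pass. It does not use HTML::LinkExtor itself.

```perl
use strict;
use warnings;

# Same shape as the callback in the question:
# the tag name comes first, then attribute/URL pairs.
sub describe_links {
    my ($tag, %links) = @_;
    # Sort the attribute names so the output order is stable.
    return map { "$tag $_ $links{$_}" } sort keys %links;
}

# Hand-built argument list standing in for one <img> tag
# that carries three link attributes.
my @lines = describe_links(
    img => src => 'photo.png', longdesc => 'photo.html', usemap => '#map',
);
print "$_\n" for @lines;
```

With the pairs in a hash, picking out only the URL for a given attribute is a plain lookup such as `$links{src}`, with no need to print the attribute name alongside it.
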
Re: Re: Why Would HTML::LinkExtor return a hash of attributes? by Cody Pendant (Prior) on Aug 19, 2002 at 06:01 UTC

"`<img>` could have multiple attributes that link: `src`, `longdesc`, and `usemap`."

Aha! Now that makes sense. I bet I could have figured it out if I'd thought a bit longer. I'm so lazy.

Thanks for your help, podmaster, but surely the guy is intending it to be used as a hash? One useful attribute of it being a hash would be to clobber incorrect HTML where a link had two HREFs.
--

($_='jjjuuusssttt annootthheer pppeeerrrlll haaaccckkeer')=~y/a-z//s;print;

Re: Re: Re: Why Would HTML::LinkExtor return a hash of attributes? by PodMaster (Abbot) on Aug 19, 2002 at 06:15 UTC

"but surely the guy is intending it to be used as a hash?"

I'm not a mindreader. HTML::LinkExtor is a pretty mature module, and I doubt the interface should/will change. You can certainly try to persuade the guy (perlsonally I'd rather just write HTML::LinkExtractor, which would do all the things you say here but would also extract the link text, the stuff in between `<a ..> </a>` tags).

"One useful attribute of it being a hash would be to clobber incorrect HTML where a link had two HREFs."

You don't have to worry about that (when in doubt, test).

use HTML::LinkExtor;
my $p = new HTML::LinkExtor(
    sub { print "@_\n" },
);
$p->parse( q{
<a href="BUTTER" href="SCOTCH">
<img src="AND" src="PEANUTS">
});
__END__
a href SCOTCH
img src PEANUTS

____________________________________________________
** The Third rule of perl club is a statement of fact: pod is sexy.

Re: Re: Re: Re: Why Would HTML::LinkExtor return a hash of attributes? by Cody Pendant (Prior) on Aug 19, 2002 at 23:52 UTC