in reply to Re: (jeffa) Re: Problems splitting HTML in to hash table
in thread Problems splitting HTML in to hash table
"looks like I'd still have to search for all the href links as it's pulling all the stuff out..."
That's much more trivial to do than you make it sound. Now, i don't know what a 'headline' is, so i am going to assume it is the text between the anchor tags. All you need to do is this:

    # create the parser, etc.
    my %hash;
    while (my $tag = $parser->get_tag('a')) {
        $hash{$parser->get_text} = $tag->[1]->{href};
    }
    for (keys %hash) {
        # keys are the link text, values are the hrefs
        print qq|<a href="$hash{$_}">$_</a>\n|;
    }

Every time you add a key to a hash, a non-unique key will overwrite the one that already exists - i see no good reason to encapsulate this in a subroutine call.
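For what it's worth, here is a minimal sketch of that overwrite behaviour using core Perl only. The @links data is made up for illustration and simply stands in for the [text, href] pairs the parser loop would produce:

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for what the parser loop would yield:
# each pair is [ link text, href ].
my @links = (
    [ 'Perl Monks', 'http://www.perlmonks.org/' ],
    [ 'CPAN',       'http://www.cpan.org/' ],
    [ 'Perl Monks', 'http://perlmonks.org/index.pl' ],  # same text, different URL
);

my %hash;
for my $link (@links) {
    my ($text, $href) = @$link;
    $hash{$text} = $href;    # a repeated key silently replaces the old value
}

# Only two keys survive; 'Perl Monks' holds the href that was stored last.
# keys() returns in no particular order, so sort for stable output.
for my $text (sort keys %hash) {
    print qq|<a href="$hash{$text}">$text</a>\n|;
}
```

Three links go in, two come out, and no uniqueness check is needed anywhere: the hash does it for free.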
If you want unique URLs instead, simply swap $parser->get_text with $tag->[1]->{href} (and the keys with the values in the for loop). If you want to parse the href links even further, then i suggest the URI module:
    use URI;
    use Data::Dumper;
    # etc.

    my @list;
    while (my $tag = $parser->get_tag('a')) {
        my $uri = URI->new($tag->[1]->{href});
        push @list, {
            path  => $uri->path(),
            query => { $uri->query_form() },
            text  => $parser->get_text(),
        };
    }
    print Dumper \@list;

There are soooo many cool modules out there to make your life easier. I personally have more fun writing 'glue code' than 'doing it all by hand'. Doing the latter is a good way to learn, but after that, i say it is better and faster to use the help of the CPAN (and all the wonderful folks who contribute).
"What I'm really stumped about though is why the code I posted was concatenating the values on the matches ...Any ideas on that?"
Nope, sorry. When i see someone doing it the wrong way, instead of trying to understand their logic i try to show them a more right way. It would take far too much energy to do the former, plus a liberal amount of PSI::ESP.
I know this came off as grumpy - but i really do wish you the best in your endeavor. Good luck!
jeffa
L-LL-L--L-LL-L--L-LL-L--
-R--R-RR-R--R-RR-R--R-RR
B--B--B--B--B--B--B--B--
H---H---H---H---H---H---
(the triplet paradiddle with high-hat)