
Sorry, but i didn't ask why you are looping, i asked why are you looping like that? But the point is mu. Read on. ;)

"looks like I'd still have to search for all the href links as it's pulling all the stuff out..."

That's much more trivial to do than you make it sound. Now, i don't know what a 'headline' is, so i am going to assume it is the text between the anchor tags. All you need to do is this:

# create the parser, etc.

my %hash;
while (my $tag = $parser->get_tag('a')) {
   $hash{$parser->get_text} = $tag->[1]->{href};
}

for (keys %hash) {
   # keys are the link text, values are the href
   print qq|<a href="$hash{$_}">$_</a>\n|;
}

Every time you add a key to the hash, a non-unique key will overwrite the one that already exists, so the hash does the de-duplication for you - i see no good reason to encapsulate this in a subroutine call.
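To see the overwrite behavior in isolation, here is a minimal sketch in plain Perl with no parser involved. The link texts and URLs are made up for illustration; they simply stand in for whatever the parser loop would yield.

```perl
use strict;
use warnings;

# Hypothetical (text, href) pairs, standing in for the parser's output.
my @links = (
    [ 'Perl rocks',  '/story/1' ],
    [ 'CPAN update', '/story/2' ],
    [ 'Perl rocks',  '/story/3' ],   # same text as the first pair
);

# Assigning to an existing key replaces its value.
my %hash;
$hash{ $_->[0] } = $_->[1] for @links;

# Only two keys remain; the later 'Perl rocks' entry replaced the earlier one.
for my $text (sort keys %hash) {
    print "$text => $hash{$text}\n";
}
```

Note that the last value wins, so if you would rather keep the first href seen for a given text, something like `$hash{$text} //= $href` (Perl 5.10 or later) is one way to do it.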

If you want unique URLs instead, simply switch $parser->get_text with $tag->[1]->{href} (and the keys with the values in the for loop). If you want to parse the href links even further, then i suggest the URI module:

use URI;
use Data::Dumper;   # provides Dumper, used below

# etc.

my @list;
while (my $tag = $parser->get_tag('a')) {
   my $uri = URI->new($tag->[1]->{href});
   push @list, {
      path  => $uri->path(),
      query => { $uri->query_form() },
      text  => $parser->get_text(),
   };
}
print Dumper \@list;
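For a feel of what URI hands back, here is a small sketch that takes one href apart outside of any parser loop. The href value is hypothetical; only the path and query_form methods used above are exercised.

```perl
use strict;
use warnings;
use URI;

# Hypothetical href value, as it might appear in an anchor tag.
my $uri = URI->new('/cgi-bin/search.pl?q=perl&page=2');

my $path  = $uri->path;         # the part before the '?'
my %query = $uri->query_form;   # key/value pairs from the query string

print "path: $path\n";
print "$_ = $query{$_}\n" for sort keys %query;
```

One caveat: query_form returns a flat list, so stuffing it into a hash (as above) keeps only the last value when a parameter is repeated. If that matters for your links, keep the list as a list instead.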

There are soooo many cool modules out there to make your life easier. I personally have more fun writing 'glue code' than 'doing it all by hand'. Doing the latter is a good way to learn, but after that, i say it is better and faster to use the help of the CPAN (and all the wonderful folks who contribute).

"What I'm really stumped about though is why the code I posted was concatenating the values on the matches ...Any ideas on that?"

Nope, sorry. When i see someone doing it the wrong way, instead of trying to understand their logic i try to show them a more right way. It would take far too much energy to do the former, and a liberal amount of PSI::ESP.

I know this came off as grumpy - but i really do wish you the best in your endeavor. Good luck!

jeffa

L-LL-L--L-LL-L--L-LL-L--
-R--R-RR-R--R-RR-R--R-RR
B--B--B--B--B--B--B--B--
H---H---H---H---H---H---
(the triplet paradiddle with high-hat)

In reply to (jeffa) 3Re: Problems splitting HTML in to hash table by jeffa
in thread Problems splitting HTML in to hash table by Popcorn Dave
