Popcorn Dave has asked for the wisdom of the Perl Monks concerning the following question:
Fellow monks,
I've been beating my head over this one for the last few hours and am still completely stumped!
I'm trying to parse some HTML
```html
<tr align="left" valign="top">
<td align="left" valign="top">
<table CELLPADDING="0" CELLSPACING="0"><tr><td>
<a href="page.cfm?objectid=11933900&method=full&siteid=50144" CLASS="smallteaserpic">Costly false alarms</a><BR>
<font CLASS="headtypea">
A new policy aimed at tackling the huge waste of police time attending false security alarm calls is to be introduced this week <a href="page.cfm?objectid=11933900&method=full&siteid=50144">more</a>
</font>
</td></tr></table>
<p>
<table CELLPADDING="0" CELLSPACING="0"><tr><td>
<a href="page.cfm?objectid=11933890&method=full&siteid=50144" CLASS="smallteaserpic">Mindless yobs terrorise OAP's</a><BR>
```
using the following code:
```perl
for (@list){
    if ($list[$count]=~ m!page.cfm!iog){
        $list[$count] =~ s/<img[^>]*>//iog;
        $list[$count] =~ s/\r\n//iog;
        $list[$count] =~ s!</?t(r|d|able)?[^>]*>!!iog;
        $list[$count]=~ m!(<a.+href.+>)(.+</a>)!iog;
        print "Count is: $count\n";
        &add_it( $1, $2 );
    }
    $count++;
};
```
The `add_it` subroutine checks whether the URL is already present in a hash and, if not, adds it.
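The `add_it` routine itself isn't shown in the post. As a rough sketch of the behaviour described, assuming a file-scoped hash named `%links` keyed by URL (the hash name, and passing a bare URL instead of the whole `<a ...>` tag that `$1` captures, are both assumptions for illustration):

```perl
use strict;
use warnings;

# Hypothetical stand-in for add_it(): the real one is not shown.
# Assumes a file-scoped hash keyed by URL, with the link text as the value.
my %links;

sub add_it {
    my ($url, $text) = @_;
    return unless defined $url;       # nothing captured, nothing to add
    return if exists $links{$url};    # URL already present: skip it
    $links{$url} = $text;
    return 1;
}

add_it('page.cfm?objectid=11933900', 'Costly false alarms');
add_it('page.cfm?objectid=11933900', 'more');    # duplicate URL, ignored
add_it('page.cfm?objectid=11933890', "Mindless yobs terrorise OAP's");

print scalar(keys %links), " unique URLs\n";     # prints "2 unique URLs"
```

Because `exists` is checked before storing, a repeated URL such as the "more" link is skipped, so each key holds the first text seen for it.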
My problem is that everything ends up as one long string in the hash. It's not making multiple hash entries.
I know the technique *should* work, as I've got it working on other HTML pages, but I'm absolutely stuck as to why it isn't separating the data into separate hash keys/elements.
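One likely cause, assuming several links end up in the same element of `@list` (the pasted HTML has two `page.cfm` links per teaser, with the next teaser right after it): `.+` is greedy, so `(<a.+href.+>)(.+</a>)` stretches toward the last `href` and the last `</a>` on the line. The sample line below is made up for illustration; it compares that pattern with a tighter one that keeps `[^>]*` inside the tag and uses a non-greedy `.+?` for the link text:

```perl
use strict;
use warnings;

# Two links on one line, roughly as they might look after the table
# tags have been stripped from one element of @list (sample data).
my $line = '<a href="page.cfm?objectid=1" CLASS="smallteaserpic">First</a><BR>'
         . '<a href="page.cfm?objectid=2" CLASS="smallteaserpic">Second</a><BR>';

# Greedy: each .+ runs as far right as it can, so $1 swallows the whole
# first link plus the opening tag of the second one.
my ($g1, $g2) = $line =~ m!(<a.+href.+>)(.+</a>)!i;

# Tighter: [^>]* cannot leave the tag, and .+? stops at the first </a>.
my ($n1, $n2) = $line =~ m!(<a[^>]*href[^>]*>)(.+?</a>)!i;

print "greedy:     $g1 | $g2\n";
print "non-greedy: $n1 | $n2\n";
```

With the greedy pattern `$2` is only `Second</a>` while `$1` holds everything before it, which would look like one long string once stored; the tighter pattern stops cleanly after the first link.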
I have even tried to get rid of the linefeeds to see if that was perhaps causing the mischief, but to no avail.
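Stripping `\r\n` only helps when a link is split across lines, and it leaves bare `\n` or `\r` endings untouched. A sketch of another approach, assuming the whole page has already been read into one string `$html` (the sample text, `%seen` and `@order` are hypothetical): turn every line ending into a space, then walk all links with `m//g` in a `while` loop, which resumes after each match instead of stopping at the first. For real-world pages a parser module such as HTML::TokeParser or HTML::LinkExtor is usually sturdier than a regex.

```perl
use strict;
use warnings;

# Assumption: the page is already in one string, e.g. read with
#   my $html = do { local $/; <$fh> };
my $html = <<'HTML';
<table CELLPADDING="0" CELLSPACING="0"><tr><td>
<a href="page.cfm?objectid=11933900&method=full&siteid=50144" CLASS="smallteaserpic">Costly false
alarms</a><BR>
<a href="page.cfm?objectid=11933900&method=full&siteid=50144">more</a>
<a href="page.cfm?objectid=11933890&method=full&siteid=50144" CLASS="smallteaserpic">Mindless yobs terrorise OAP's</a><BR>
HTML

# Turn CRLF, LF and bare CR line endings into single spaces so a link
# that wraps across lines is still matched as one unit.
$html =~ s/\r?\n|\r/ /g;

my %seen;     # URL => link text (first occurrence wins)
my @order;    # URLs in the order they were first seen

# In scalar context m//g continues from where the previous match ended,
# so the loop visits every page.cfm link, not just the first.
while ($html =~ m{<a\s[^>]*?href\s*=\s*"([^"]*page\.cfm[^"]*)"[^>]*>(.*?)</a>}gis) {
    my ($url, $text) = ($1, $2);
    next if exists $seen{$url};
    $seen{$url} = $text;
    push @order, $url;
}

print "$_ => $seen{$_}\n" for @order;
```

Each URL lands in its own hash key, and the duplicate "more" link is skipped because its URL is already present.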
Anyone have any ideas?
Thanks in advance!
Some people fall from grace. I prefer a running start...
(jeffa) Re: Problems splitting HTML in to hash table
by jeffa (Bishop) on Jun 11, 2002 at 06:53 UTC
by Popcorn Dave (Abbot) on Jun 11, 2002 at 17:56 UTC
by jeffa (Bishop) on Jun 11, 2002 at 19:20 UTC
by Popcorn Dave (Abbot) on Jun 12, 2002 at 02:37 UTC
Re: Problems splitting HTML in to hash table
by Zaxo (Archbishop) on Jun 11, 2002 at 06:55 UTC