Fellow monks,
I've been beating my head over this one for the last few hours and am still completely stumped!
I'm trying to parse some HTML:
<tr align="left" valign="top">
<td align="left" valign="top">
<table CELLPADDING="0" CELLSPACING="0"><tr><td>
<a href="page.cfm?objectid=11933900&method=full&siteid=50144" CLASS="smallteaserpic">Costly false alarms</a><BR>
<font CLASS="headtypea">
A new policy aimed at tackling the huge waste of police time attending false security alarm calls is to be introduced this week
<a href="page.cfm?objectid=11933900&method=full&siteid=50144">more</a>
</font>
</td></tr></table>
<p>
<table CELLPADDING="0" CELLSPACING="0"><tr><td>
<a href="page.cfm?objectid=11933890&method=full&siteid=50144" CLASS="smallteaserpic">Mindless yobs terrorise OAP's</a><BR>
using the following code:
for (@list){
    if ($list[$count] =~ m!page.cfm!iog){
        $list[$count] =~ s/<img[^>]*>//iog;
        $list[$count] =~ s/\r\n//iog;
        $list[$count] =~ s!</?t(r|d|able)?[^>]*>!!iog;
        $list[$count] =~ m!(<a.+href.+>)(.+</a>)!iog;
        print "Count is: $count\n";
        &add_it( $1, $2 );
    }
    $count++;
};
The add subroutine checks if the url is already present in a hash table, and if not, it adds it.
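For reference, here is a minimal sketch of what add_it does. This is not the real subroutine: the hash name %seen, the return values, and storing the link text as the hash value are all stand-ins I've made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real hash table of links.
my %seen;

# Add the link only if its URL (the opening <a ...> tag) is not already a key.
# Returns 1 if a new entry was added, 0 if nothing was added.
sub add_it {
    my ($url, $text) = @_;
    return 0 unless defined $url;      # the match failed, nothing to add
    return 0 if exists $seen{$url};    # already stored, skip it
    $seen{$url} = $text;
    return 1;
}

add_it('<a href="page.cfm?objectid=1">', 'First</a>');
add_it('<a href="page.cfm?objectid=1">', 'First</a>');   # duplicate, ignored
add_it('<a href="page.cfm?objectid=2">', 'Second</a>');

print scalar(keys %seen), " entries\n";   # prints "2 entries"
```

So with two distinct URLs passed in, I would expect two keys in the hash.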
My problem is that I'm getting one long string stored as a single hash entry. It's not making multiple hash entries, one per link.
I know the technique *should* work as I've got it working on other HTML pages, but I'm absolutely stuck as to why this is not separating the data into separate hash keys/elements.
I have even tried to get rid of the linefeeds to see if that was perhaps causing the mischief, but to no avail.
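Here is a cut-down script that shows the symptom. I'm assuming that both links end up in a single element of @list as one string (I have not confirmed that this is what happens in the real program), and the sample string is shortened from the page above.

```perl
use strict;
use warnings;

# Assumption: both links sit in ONE element of the list, as one string.
my $html = '<a href="page.cfm?objectid=1">First</a><BR> '
         . '<a href="page.cfm?objectid=2">Second</a><BR>';

# Same pattern as in the loop above, applied once to that string.
my ($tag, $text) = $html =~ m!(<a.+href.+>)(.+</a>)!i;

print "tag:  $tag\n";
print "text: $text\n";
```

Running this, $tag holds everything from the first <a through the opening tag of the second link, and $text holds only "Second</a>". That looks like the one long key I am seeing.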
Anyone have any ideas?
Thanks in advance!
Some people fall from grace. I prefer a running start...
In reply to Problems splitting HTML in to hash table by Popcorn Dave