(Hopefully that title makes sense)

Fellow monks,

I've been beating my head over this one for the last few hours and am still completely stumped!

I'm trying to parse some HTML:

    <tr align="left" valign="top"> <td align="left" valign="top"> <table CELLPADDING="0" CELLSPACING="0"><tr><td> <a href="page.cfm?objectid=11933900&method=full&siteid=50144" CLASS="smallteaserpic">Costly false alarms</a><BR> <font CLASS="headtypea"> A new policy aimed at tackling the huge waste of police time attending false security alarm calls is to be introduced this week <a href="page.cfm?objectid=11933900&method=full&siteid=50144">more</a> </font> </td></tr></table> <p> <table CELLPADDING="0" CELLSPACING="0"><tr><td> <a href="page.cfm?objectid=11933890&method=full&siteid=50144" CLASS="smallteaserpic">Mindless yobs terrorise OAP's</a><BR>

using the following code:

    for (@list) {
        if ($list[$count] =~ m!page.cfm!iog) {
            $list[$count] =~ s/<img[^>]*>//iog;
            $list[$count] =~ s/\r\n//iog;
            $list[$count] =~ s!</?t(r|d|able)?[^>]*>!!iog;
            $list[$count] =~ m!(<a.+href.+>)(.+</a>)!iog;
            print "Count is: $count\n";
            &add_it( $1, $2 );
        }
        $count++;
    }
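A likely culprit, offered as a guess and not a confirmed diagnosis, is the capture pattern `(<a.+href.+>)(.+</a>)`. Each `.+` is greedy. When several links sit in one list element, `$1` therefore stretches from the first `<a` all the way to the opening tag of the last link. The self-contained sketch below runs that pattern and a non-greedy variant side by side. The two-link sample string is made up for illustration.

```perl
use strict;
use warnings;

# Illustrative sample: two teaser links on one line, as happens once line breaks are gone.
my $line = '<a href="page.cfm?objectid=1" CLASS="smallteaserpic">First</a><BR>'
         . '<a href="page.cfm?objectid=2" CLASS="smallteaserpic">Second</a>';

# Pattern from the post: each .+ runs as far right as it can, so $1 spans
# from the first <a up to the opening tag of the last link.
my ($greedy_tag, $greedy_text) = $line =~ m!(<a.+href.+>)(.+</a>)!i;

# Non-greedy variant: [^>]* cannot cross a '>' and .*? stops at the first
# </a>, so the match covers exactly one link.
my ($lazy_tag, $lazy_text) = $line =~ m!(<a[^>]*href[^>]*>)(.*?</a>)!i;

print "greedy \$1: $greedy_tag\n";
print "lazy   \$1: $lazy_tag\n";
```

If this is the cause, changing only the capture line in the loop to the `[^>]*` / `.*?` form should already split the links apart.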

The add_it subroutine checks whether the URL is already present in a hash and, if not, adds it.
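The post does not show add_it itself. The version below is a hypothetical stand-in that follows the description: a hash keyed by URL, where a URL is added only if it is not already present. The sketch also assumes the links should be collected with a `while (m//g)` loop, so that every link in a chunk of HTML reaches add_it and not just the first match. The sample HTML is taken from the data in the post.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the post's add_it: %seen maps URL => link text.
my %seen;

sub add_it {
    my ($url, $text) = @_;
    return if exists $seen{$url};   # URL already recorded: keep the first text
    $seen{$url} = $text;
}

my $html = '<a href="page.cfm?objectid=11933900&method=full&siteid=50144" '
         . 'CLASS="smallteaserpic">Costly false alarms</a><BR> '
         . '<a href="page.cfm?objectid=11933900&method=full&siteid=50144">more</a> '
         . '<a href="page.cfm?objectid=11933890&method=full&siteid=50144" '
         . 'CLASS="smallteaserpic">Mindless yobs terrorise OAP\'s</a>';

# In scalar context, m//g resumes after the previous match, so each pass of
# the loop picks up the next link. [^"]* and .*? keep each match to one link.
while ($html =~ m!<a\s[^>]*href="([^"]*)"[^>]*>(.*?)</a>!gis) {
    add_it($1, $2);
}

print "$_ => $seen{$_}\n" for sort keys %seen;
```

The "more" link repeats the first URL, so it is skipped and the hash ends up with two entries. For anything beyond quick scraping, an HTML parser such as HTML::LinkExtor or HTML::TokeParser is usually more robust than regexes.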

My problem is that I end up with one long string in the hash instead of multiple separate entries.

I know the technique *should* work, as I've got it working on other HTML pages, but I'm absolutely stuck as to why this isn't separating the data into separate hash keys/elements.

I have even tried to get rid of the linefeeds to see if that was perhaps causing the mischief, but to no avail.
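Removing linefeeds may actually make a greedy match worse. Without the `/s` modifier, `.` does not match a newline, so line breaks are the only thing that stops `.+` from running across several links. Note also that `s/\r\n//` removes only CR-LF pairs, and a file with bare `\n` endings keeps its newlines. The sketch below therefore uses `\r?\n`. The sample string is made up for illustration.

```perl
use strict;
use warnings;

my $two_lines = qq{<a href="a.cfm">A</a>\n<a href="b.cfm">B</a>};

# Without /s, '.' does not match "\n", so the greedy .+ stays on the first line.
my ($with_nl) = $two_lines =~ m!(<a.+href.+>)(.+</a>)!;

# After stripping the line break, .+ is free to run across both links.
(my $one_line = $two_lines) =~ s/\r?\n//g;
my ($without_nl) = $one_line =~ m!(<a.+href.+>)(.+</a>)!;

print "with newline:    $with_nl\n";
print "without newline: $without_nl\n";
```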

Anyone have any ideas?

Thanks in advance!

Some people fall from grace. I prefer a running start...


In reply to Problems splitting HTML in to hash table by Popcorn Dave
