in reply to Re^4: Parsing HTML into various files
in thread Parsing HTML into various files

It looks to me like the markup of this HTML file is not compatible with that of the file I used to develop this code. Can I take a look at that file?

I suspect that, in contrast to the other file, here you used "th" tags to mark the attribute names, instead of a combination of "td" and "b" tags.

Anyway, if that is indeed the case: don't panic. The parser can probably be tweaked to handle this, possibly by looking at the "colspan" attribute.

Re^6: Parsing HTML into various files
by Lady_Aleena (Priest) on Aug 25, 2010 at 19:05 UTC

    bart, you got it in one. I totally forgot that the files that are online are completely different from my local files, which I have structured quite differently. I will just download the old files, since the new ones are so different. I took a look at the underlying code of the online files about 10 minutes ago, right before I updated my reply to wfsp.

    Update: After getting the right files, I ran your script. It worked almost flawlessly, except that there are nested tables in some of the descriptions that are throwing it off. I am looking at the following lines, thinking that something should be added there to take the nested tables into account: when colspan is "4", ignore the tables within that cell.

    if ($flag == 1) {
        $_ = "";
        my $colspan = $token->get_attr('colspan');
        if ($colspan) {
            push @table, $colspan == 2 ? 'school(s)' : 'description';
        }
    }
    Have a cookie and a very nice day!
    Lady Aleena
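
One possible way to ignore the nested tables is to keep a depth counter for "table" tags and only look at cells while exactly one table is open. The following is a minimal sketch, not the script from this thread: it assumes the tokens come from HTML::TokeParser::Simple (the `get_attr` call above suggests that, but it is a guess), and the `outer_colspans` helper and the sample HTML are made up for illustration.

```perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

# Hypothetical helper: returns the colspan value of every <td> that belongs
# to the outermost table, skipping anything inside nested tables.
sub outer_colspans {
    my ($html) = @_;
    my $parser = HTML::TokeParser::Simple->new(string => $html);
    my $depth  = 0;    # number of <table> elements currently open
    my @colspans;

    while (my $token = $parser->get_token) {
        if ($token->is_start_tag('table')) {
            $depth++;
        }
        elsif ($token->is_end_tag('table')) {
            $depth-- if $depth > 0;
        }
        elsif ($depth == 1 && $token->is_start_tag('td')) {
            # Only cells of the outer table reach this branch.
            my $colspan = $token->get_attr('colspan');
            push @colspans, $colspan if $colspan;
        }
    }
    return @colspans;
}

# Made-up sample: the colspan="4" cell contains a nested table.
my $html = <<'HTML';
<table>
<tr><td colspan="2">Abjuration</td></tr>
<tr><td colspan="4">Text
  <table><tr><td colspan="3">nested</td></tr></table>
</td></tr>
</table>
HTML

print join(', ', outer_colspans($html)), "\n";
```

With this sample, only the two outer cells are reported; the colspan="3" cell in the nested table is skipped because the depth is 2 when it is seen. In the original script, the same `$depth == 1` test could probably be combined with the existing `$flag` check, but that depends on code not shown here.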