Okay, that seems to work; HTML::TreeBuilder seems to be more forgiving.
However, $tree->dump gives a lot of information; luckily as_XML_indented looks more readable again.
Now the next part... extracting the right pieces of information with XPath.
Some pieces will be quite easy, for example the title. Others will come from traversing a <TABLE>:
in the left column there is a data description, like 'Author', and in the right column the name, like 'Wall, L.' (sometimes inside an <a HREF=...>Author Name</a>, which makes it a bit more complicated, because I only want the text).
My guess is to look for a text element in a <td> tag that equals "Author" and then do something with the next sibling?
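A rough sketch of that "find the label cell, then take the next sibling" idea, assuming HTML::TreeBuilder::XPath is installed and that each label sits in its own <td>. The sample markup and the field() helper are made up for illustration; the real page will differ. Since findvalue returns the string value of the matched node, text wrapped in an <a> comes back without the tag.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical sample page; the real markup will look different.
my $html = <<'HTML';
<html><head><title>Programming Perl</title></head>
<body><table>
<tr><td>Author</td><td><a href="/authors/wall">Wall, L.</a></td></tr>
<tr><td>Publisher</td><td>O'Reilly</td></tr>
</table></body></html>
HTML

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse($html);
$tree->eof;

# Hypothetical helper: return the text of the first <td> that follows
# the <td> whose (whitespace-normalized) text equals $label.
# Assumes $label contains no double quotes.
sub field {
    my ($tree, $label) = @_;
    my $value = $tree->findvalue(
        qq{//td[normalize-space(.)="$label"]/following-sibling::td[1]}
    );
    $value =~ s/^\s+|\s+$//g;    # trim leading and trailing whitespace
    return $value;
}

my $title     = $tree->findvalue('//title');
my $author    = field($tree, 'Author');
my $publisher = field($tree, 'Publisher');

print "$title\n$author\n$publisher\n";

$tree->delete;    # free the tree when done
```

HTML::TreeBuilder lowercases tag names while parsing, so the XPath uses td even if the page writes <TD>. If a label can appear in more than one table, the expression would need to be anchored to the right table first.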
In reply to Re^2: extracting data from HTML
by Jurassic Monk
in thread extracting data from HTML
by Jurassic Monk