Since I'm not eager to register on some website just so I can see what on earth the tables you're talking about look like, how about giving a semi-technical description for the rest of us? You said they're not regular HTML tables, so what are they? Are they graphic images? Are they JavaScript entities? Are they PDF files?
And since you've already tried everything you can think of, can you tell us what you've tried thus far, so that we know where you've already invested time? While you're at it, you might also let us know in what way your attempts fell short of meeting the need.
Since I don't know better, I'll suggest that most websites worth their salt will also be lynx-friendly. That being the case, perhaps the easiest way to get at the data from the tables in question is to parse the all-text output from lynx. It's easy to grab the output from lynx -- this assumes, of course, that you're on a Linux/Unix-type system. In this way you can use the robustness of lynx -- a full-fledged text-based browser capable of handling cookies and all sorts of curve-balls -- to intelligently dump the site to text.
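To make that concrete, here's a minimal sketch of the lynx approach. It assumes lynx is installed, and the URL and the two-space column heuristic are illustrative guesses -- the real site apparently needs a login, so treat this as the general idea only:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical URL -- substitute the real listings page.
my $url = 'http://example.com/listings.html';

# -dump renders the page to plain text; -nolist drops the link index.
my $text = qx(lynx -dump -nolist "$url");
die "lynx failed: $?" if $?;

# Crude column split: lynx pads table cells with spaces, so runs of
# two or more spaces usually mark column boundaries.
for my $line ( split /\n/, $text ) {
    my @cells = split /\s{2,}/, $line;
    next unless @cells > 1;          # skip non-tabular lines
    print join( "\t", @cells ), "\n";
}
```

You'd then pipe that tab-separated output into whatever does the real work.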
davido has already covered the problems with your post, but for the general task of extracting data from possibly nested tables, I can recommend mojotoad's HTML::TableExtract. It turns HTML tables into easily accessible arrays.
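A short sketch of the HTML::TableExtract approach (a CPAN module, not core). The header names here are made up -- use the column titles that actually appear in the table you're after:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;    # CPAN module

my $html = do { local $/; <> };    # page source on STDIN or from a file

# Hypothetical headers -- substitute the real column titles.
my $te = HTML::TableExtract->new( headers => [ 'Symbol', 'Price' ] );
$te->parse($html);

for my $table ( $te->tables ) {
    for my $row ( $table->rows ) {
        # Each row is an array ref of cell text, with nesting resolved.
        print join( ', ', map { defined $_ ? $_ : '' } @$row ), "\n";
    }
}
```

Matching on headers rather than table position is the nice part: the extraction keeps working even if the site shuffles its layout.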
Another solution might be Template::Extract, if you already understand the Template::Toolkit syntax and want to convert an HTML page back into a Perl structure.
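For illustration, a minimal Template::Extract sketch -- it runs a TT2-style template "in reverse", capturing text into the [% ... %] slots. The markup and field names (name, price) are invented for the example; you'd crib the real structure from the page source:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Template::Extract;    # CPAN module

my $html = do { local $/; <> };    # fetched page source

# Hypothetical row markup -- adjust to match the actual page.
my $template = <<'END';
[% FOREACH record %]<tr><td>[% name %]</td><td>[% price %]</td></tr>[% END %]
END

my $obj  = Template::Extract->new;
my $data = $obj->extract( $template, $html );

# $data->{record} should now be an array ref of hashes,
# each with 'name' and 'price' keys.
for my $rec ( @{ $data->{record} || [] } ) {
    print "$rec->{name}: $rec->{price}\n";
}
```

This pays off when you want named fields out of repeated markup rather than anonymous table cells.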
perl -MHTTP::Daemon -MHTTP::Response -MLWP::Simple -e ' ; # The
$d = new HTTP::Daemon and fork and getprint $d->url and exit;#spider
($c = $d->accept())->get_request(); $c->send_response( new #in the
HTTP::Response(200,$_,$_,qq(Just another Perl hacker\n))); ' # web
Unfortunately, in the case given here it's impossible to say more without a clue about how the table is built. If it's an image, for instance, it will be very hard to extract data from it using Perl. One thing is certain: you'll have to automate logging in to the site with your credentials before you can reach the listings page, so I'd start playing with the LWP family of modules.
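The login-then-fetch part might look something like this with LWP::UserAgent and a cookie jar. The URL and form field names are pure guesses -- view the site's login form source (the action attribute and input names) and substitute the real ones:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;

my $ua = LWP::UserAgent->new;
# Keep whatever session cookie the login hands back.
$ua->cookie_jar( HTTP::Cookies->new );

# Hypothetical login URL and field names -- check the real form.
my $res = $ua->post(
    'http://example.com/login',
    { username => 'you', password => 'secret' },
);
die 'login failed: ', $res->status_line
    unless $res->is_success or $res->is_redirect;

# Now the session cookie lets us at the protected page.
$res = $ua->get('http://example.com/listings');
die 'fetch failed: ', $res->status_line unless $res->is_success;

my $html = $res->decoded_content;    # feed this to HTML::TableExtract etc.
```

Once $html is in hand, the table-extraction suggestions above apply as if the page were public.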