SteinerKD has asked for the wisdom of the Perl Monks concerning the following question:

As some might know, I'm working on a script to collect gaming stats for squad members from a website. I'm now getting close to something that works (many thanks to wfsp).

What I'm working on now is adapting a sub that wfsp wrote to collect data from specific spots on a web page by feeding the sub coordinates, basically: my $databit = collect_data(x, y, z);

Guess I'm getting tired and can't see the wood for all the trees now, but I just can't figure out the treebuilder function for my use (found a message here about it but all links were dead).

The code I'm adapting:

sub collect_data {
    my ($tab, $row, $cell) = @_;

    # we want table X
    my @tables = $t->look_down(_tag => q{table});
    my $table  = $tables[$tab];

    # Xth row
    my @trs = $table->look_down(_tag => q{tr});
    my $tr  = $trs[$row];

    # Xth column
    my @tds = $tr->look_down(_tag => q{td});
    my $td  = $tds[$cell];

    # get any text
    my $data = $td->as_text;

    return $data;
}

The original sub tested whether a new page existed: it was fed a page number ($t) and returned a new one if it existed. I, however, feed it the 3 positional variables and expect the data at that location to be fed back.

As you can see, the $t is still in there, as I'm unsure what the line should be without it. I've also inserted the supplied coordinates where I believe they should go for the sub to work as intended.

Replies are listed 'Best First'.
Re: using HTML::TreeBuilder to collect data for populating variables
by wfsp (Abbot) on Aug 04, 2010 at 07:20 UTC
    hi SteinerKD,

    First of all a lecture. :-)

    Read the advice you were given in the previous thread. If you don't post code that monks can run, you can end up in hot water and you'll be scolded. You might know what you mean, but you can't expect anyone else to know unless you tell them. A link to the previous thread might have been helpful. (puts his mortarboard and gown back in the cupboard)

    Now, what we need is some idea of what the HTML you'll be parsing looks like. Either a link to a page or, better, a stripped-down example omitting stuff we don't need to know about (style sheets, scripts, formatting attributes, etc.). In fact, having a bare-bones skeleton of the HTML you are working with will help identify the best way to parse it. This is certainly the case during development/testing, and it is the first thing I do. There might even be a Perl module near you that could help. :-) Give an example of the output you need.

Re: using HTML::TreeBuilder to collect data for populating variables
by aquarium (Curate) on Aug 04, 2010 at 01:10 UTC
    I'm sure this code is really clever... however, it is hard to figure out exactly what it does, as the sub uses some globals as well as the parameters passed to it, and also invokes some (undocumented) look_down and as_text subs attached to hash structures.
    from a modular code design and maintainability perspective not great code to work with.
    i suggest you contact the author of the code, rather than putting the named author's code up for review/explanation in this separate post.
    the hardest line to type correctly is: stty erase ^H
      hi aquarium,

      I suggested starting a new thread. The previous thread, "collect data from web pages and insert into mysql", had made some progress and I thought the next step warranted a new thread (we had got down to Re^11). Tackle a broad question in chunks rather than in one go. I still think it was reasonable advice, but perhaps a bit more explanation and some working code would have helped. :-)

      As SteinerKD has indicated, $t is an HTML::TreeBuilder object and should have been passed to the sub. H::TB returns HTML::Element objects, and it is in the HTML::Element documentation that $t->look_down() and $t->as_text are documented.
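
      To make the point about passing $t concrete, here is a minimal sketch of the sub with the tree as its first argument instead of a global. The parameter name $tree, the "or return" guards, and the sample HTML are assumptions added for illustration; they are not from the previous thread.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical variant of collect_data: the tree is passed in as the
# first argument rather than read from a global $t.
sub collect_data {
    my ($tree, $tab, $row, $cell) = @_;

    # Nth table in the document (all indices are zero-based)
    my @tables = $tree->look_down(_tag => 'table');
    my $table  = $tables[$tab] or return;

    # Nth row of that table
    my @trs = $table->look_down(_tag => 'tr');
    my $tr  = $trs[$row] or return;

    # Nth cell of that row
    my @tds = $tr->look_down(_tag => 'td');
    my $td  = $tds[$cell] or return;

    # text content of the cell, markup stripped
    return $td->as_text;
}

# Made-up HTML, for illustration only
my $html = <<'HTML';
<html><body>
<table>
<tr><td>kills</td><td>12</td></tr>
<tr><td>deaths</td><td>3</td></tr>
</table>
</body></html>
HTML

my $t = HTML::TreeBuilder->new_from_content($html);

# first table, second row, second cell
my $databit = collect_data($t, 0, 1, 1);
print defined $databit ? "$databit\n" : "nothing at those coordinates\n";

$t->delete;    # free the tree when done
```

      Note that look_down searches all descendants, so on pages with nested tables the row and cell lists will include rows and cells of the inner tables as well.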

      It's not clever code, but it may well be hard to figure out and not great to work with. With hindsight, it might be more appropriate to use something like HTML::TableExtract. In the previous thread we already had an H::TB object, and that was pressed into service.
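
      For comparison, a rough sketch of what the HTML::TableExtract route might look like. The sample HTML is made up, and the depth/count values simply pick the first top-level table; which table you actually need depends on the real page.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Made-up HTML, for illustration only
my $html = <<'HTML';
<table>
<tr><td>name</td><td>kills</td></tr>
<tr><td>alpha</td><td>12</td></tr>
</table>
HTML

# depth => 0 selects tables that are not nested inside another table;
# count => 0 selects the first table at that depth.
my $te = HTML::TableExtract->new(depth => 0, count => 0);
$te->parse($html);
$te->eof;

# Copy the matched table into a plain grid: $grid[$row][$cell]
my @grid;
for my $ts ($te->tables) {
    for my $row ($ts->rows) {
        push @grid, [ map { defined $_ ? $_ : '' } @$row ];
    }
}

print join(' | ', @$_), "\n" for @grid;
```

      With the rows in a grid like this, a lookup by coordinates is just $grid[$row][$cell].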

      Actually the code is fairly standard. However it does use a module which leverages another module to get some serious work done. The HTML::TreeBuilder documentation specifically refers to the HTML::Element documentation and makes it clear in several places that you are dealing with objects.

      True laziness is hard work

      It uses HTML::TreeBuilder, and it was the original code writer who suggested I make a new topic, as the old one, where the sub comes from, was getting a bit large.


      Sorry for wasting your time, I figured it out.

      The $t variable was rather essential, as it contained the content the TreeBuilder had created and that I wanted to navigate. When I fed the sub the coordinates, it returned exactly what I was expecting!

      Now I need to figure out how to use the same code to grab the variable-length list data from the pages, as that gives a moving target.
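
      One possible approach to the variable-length lists, sketched under assumptions: instead of asking for a single row, loop over every row the table has and collect the cell texts. The helper name collect_rows and the sample HTML are hypothetical, and the real pages may need a different way of picking out the right table.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: return every data row of table number $tab as an
# array ref of cell texts, however many rows the table happens to have.
sub collect_rows {
    my ($tree, $tab) = @_;

    my @tables = $tree->look_down(_tag => 'table');
    my $table  = $tables[$tab] or return;

    my @rows;
    for my $tr ($table->look_down(_tag => 'tr')) {
        my @cells = map { $_->as_text } $tr->look_down(_tag => 'td');
        push @rows, \@cells if @cells;    # skips rows that only contain <th>
    }
    return @rows;
}

# Made-up HTML, for illustration only
my $html = <<'HTML';
<html><body>
<table>
<tr><th>member</th><th>sorties</th></tr>
<tr><td>alpha</td><td>10</td></tr>
<tr><td>bravo</td><td>7</td></tr>
<tr><td>charlie</td><td>4</td></tr>
</table>
</body></html>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);
my @rows = collect_rows($tree, 0);
$tree->delete;

print scalar(@rows), " data rows found\n";
print join(', ', @$_), "\n" for @rows;
```

      Because the loop runs over whatever rows are present, the list can grow or shrink between pages without the coordinates having to change.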