Re^2: PERL HTML::TableExtractor

Thanks for the help! That worked great for printing to the cmd prompt...I have one last question, and then I'm going to try to get it to do the rest myself.

Each of the above lines needs to be put into a database field. In other words, in your example you printed:

my @cells = $r->look_down(
    _tag   => q{td},
    width  => q{48%},
    valign => q{top},
);

Output:
SERVPRO® of Central Alabama II
Wilson, David & Christie
Phone: (205)678-2224
Fax: (205)678-2226
http://www.servpro.com/franchises/enhanced_asp/default.asp?fn=2226

How would I pull out the following information into different variables such as:

$location = SERVPRO® of Central Alabama II
$phone = (205) 678-2224
$fax = (205) 678.2226
$website = http://www.servpro.com/franchises/enhanced_asp/default.asp?fn=2226

Also, I have a few questions about the functions in your code, and I hope you can tell me what they do in plain English so that I can start implementing them. :)

Here is the code you created:

for my $cell (@cells){

    my $bold = $cell->look_down(_tag => q{b});
    print $bold->as_text, qq{\n};

    for my $item ($cell->content_refs_list) {
        next if ref $$item;
        print $$item, qq{\n};
    }
}

My first question is on

for my $item ($cell->content_refs_list)

Here is what I understand. The "for" loop is creating a new value for $item for each item in the list ($cell->content_refs_list), correct? So what is $cell->content_refs_list creating, and how does it know to create a new line at each break in the data? In general, what does the "->" do, and what does "content_refs_list" refer to?

Next you print $$item. Why use two $$ here?

I think I understand the rest, so if you could explain and help me with the points above, I should be good to go! Thanks for the awesome help...OOOMMMM!!! :)

Re^3: PERL HTML::TableExtractor by wfsp (Abbot) on Dec 23, 2008 at 06:56 UTC
HTML::TreeBuilder builds trees of HTML::Elements. The methods we'll be looking at come from there. Keep the docs handy. :-)

The table cells we are interested in look like the following (tidied up):

<td width="48%" valign="top">
  <b>SERVPRO<sup><small>®</small></sup>of Northern Alabama</b>
  <br>
  Wilson, David & Christie
  <br>
  Phone: (205)678-2224
  <br>
  Fax: (205)678-2226
  <br>
  <a href='http://www.servpro.com/'>Visit their web site</a>
</td>

First we get an array of all those cells (an array of H::E objects):

my @cells = $r->look_down(
    _tag   => q{td},
    width  => q{48%},
    valign => q{top},
);

In scalar context `$obj->look_down` returns the first match found; in list context it returns all of them.

For each cell

for my $cell (@cells){

we first look down for the bold element and print out the text within it:

my $bold = $cell->look_down(_tag => q{b});
print $bold->as_text, qq{\n};

We then iterate over the cell's content list:

for my $item ($cell->content_refs_list) {
    next if ref $$item;
    print $$item, qq{\n};
}

`$obj->content_refs_list` is another H::E method which, as you might guess, returns a list of references. Each reference is either a reference to an H::E object (i.e. another ref) or a reference to text. `next if ref $$item;` skips over the other H::E objects (in this case the `<b>`, `<br>` and `<a>` elements), so what is left is a reference to text. Each run of text between the `<br>` tags is its own item in the list, which is why each one prints on its own line. `$$item` dereferences the reference: `$item` is the reference and `$$item` is the thing it points to. In fact this looks very similar to the example in the H::E docs. So go see. :-)

Finally we want to look down for the anchor tag

my $link = $cell->look_down(
    _tag => q{a},
);

and print out the href attribute:

print $link->attr(q{href}), qq{\n\n};

Rather than print out the results you could push them onto an array (say, `@record`) so that `$record[0]` would be the location, `$record[1]` the next line of text, and so on.

You can get the lowdown on the arrow `->` in perlreftut and perlref. We use it here to call an object's method.
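To make the `@record` idea a bit more concrete, here is a rough sketch that runs without HTML::TreeBuilder installed. The `@content` list is a hand-built stand-in for what `$cell->content_refs_list` hands back (plain hashes play the part of the H::E objects), and the variable names and the `Phone:`/`Fax:` patterns are only my assumptions about what you want to store, based on the output you posted.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the HTML::Element objects in one cell.
my $bold = { _tag => 'b' };
my $br   = { _tag => 'br' };
my $link = { _tag => 'a' };

# Mimics $cell->content_refs_list: every item is a reference, pointing
# either at an element (itself a reference) or at a plain text string.
my @content = (
    \$bold, \$br,
    \'Wilson, David & Christie', \$br,
    \'Phone: (205)678-2224',     \$br,
    \'Fax: (205)678-2226',       \$br,
    \$link,
);

# In the real code this first value would come from $bold->as_text.
my @record = ('SERVPRO® of Central Alabama II');

for my $item (@content) {
    next if ref $$item;      # $$item is a reference, so it is an element: skip
    push @record, $$item;    # otherwise $$item is the text itself
}

# In the real code this would come from $link->attr(q{href}).
push @record, 'http://www.servpro.com/franchises/enhanced_asp/default.asp?fn=2226';

# Positions are fixed for the first, second and last entries;
# phone and fax are picked out by their labels.
my ($location, $contact, $website) = @record[0, 1, -1];
my ($phone, $fax);
for my $line (@record) {
    $phone = $1 if $line =~ /^Phone:\s*(.+)/;
    $fax   = $1 if $line =~ /^Fax:\s*(.+)/;
}

print "$_\n" for $location, $contact, $phone, $fax, $website;
```

Matching on the labels rather than on array positions is a deliberate choice here: if a cell happens to be missing its fax line, `$fax` is simply left undefined instead of silently picking up the wrong value.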
Good luck!
