Re: Recommendation on a module for HTML/XML extraction.

To get familiar with HTML::TreeBuilder::XPath, I took some time and came up with the following, which seems to work on at least a single chunk of the file. I'll keep experimenting later, but it looks promising.

#!/usr/bin/perl
use warnings;
use strict;

use Data::Dumper;
use HTML::TreeBuilder::XPath;

my $tree = HTML::TreeBuilder::XPath->new;

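# parse_file returns undef (and sets $!) if the file cannot be opened
$tree->parse_file('txt.html')
    or die "Cannot parse txt.html: $!";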

my @nodes = $tree->findnodes('//*[@class="message reply"]');

for (@nodes){
    my $person 
      = $_->findvalue('span[@class="profile fn"]');
    my $time 
      = $_->findvalue('abbr[@class="time published"]/@title');
    my $msg 
      = $_->findvalue('abbr[@class="time published"]/div[@class="msgbody"]');

    print "$person :: $time :: $msg\n";
}
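
For anyone who wants to try the XPath expressions without the original file, here is a minimal sketch that parses a small inline sample instead. The markup in $html is hypothetical: I am reusing the class names from the script above and inventing the structure and values, so the real page may well nest things differently. It only covers the person and time lookups.

```perl
#!/usr/bin/perl
use warnings;
use strict;

use HTML::TreeBuilder::XPath;

# Hypothetical sample markup: the class names come from the script above,
# but the structure and values are invented for illustration.
my $html = <<'HTML';
<html><body>
<div class="message reply">
  <span class="profile fn">Alice</span>
  <abbr class="time published" title="2010-01-01T12:00:00Z">noon</abbr>
</div>
<div class="message reply">
  <span class="profile fn">Bob</span>
  <abbr class="time published" title="2010-01-01T12:05:00Z">12:05</abbr>
</div>
</body></html>
HTML

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse_content($html);    # parse a string instead of a file

# Collect one hash per matching node so the results can be inspected later.
my @rows;
for my $node ($tree->findnodes('//*[@class="message reply"]')) {
    push @rows, {
        person => $node->findvalue('span[@class="profile fn"]'),
        time   => $node->findvalue('abbr[@class="time published"]/@title'),
    };
}

print "$_->{person} :: $_->{time}\n" for @rows;

$tree->delete;    # free the tree explicitly when done
```

Keeping the results in @rows rather than printing inside the loop makes it easy to dump them with Data::Dumper while experimenting.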

I truly appreciate all the feedback. Once I get something usable, I will likely look deeper into the suggestions by Your Mother.
