john32 has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,


I'm working with the great module Template::Extract to get values out of an unstructured document, but I can't see how to solve this: I need to extract the same kind of information many times, and it doesn't work. See the example:


use strict;
use warnings;
use Template::Extract;
use Data::Dumper;

my $obj = Template::Extract->new;

# Template: one <ul> whose <li> items each yield url, title, rate and comment.
my $template = <<'TU';
<ul>[% FOREACH record %]
<li><A HREF="[% url %]">[% title %]</A>: [% rate %] - [% comment %].
[% ... %]
[% END %]</ul>
TU

# Document: the same two-item <ul> repeated ten times.
my $document = <<'TY';
<ul><li><A HREF="http://slashdot.org">News for nerds.</A>: A+ - nice.
this text is ignored.</li>
<li><A HREF="http://microsoft.com">Where do you want...</A>: Z! - yeah +.
this text is ignored, too.</li></ul>
<ul><li><A HREF="http://slashdot.org">News for nerds.</A>: A+ - nice.
this text is ignored.</li>
<li><A HREF="http://microsoft.com">Where do you want...</A>: Z! - yeah +.
this text is ignored, too.</li></ul>
<ul><li><A HREF="http://slashdot.org">News for nerds.</A>: A+ - nice.
this text is ignored.</li>
<li><A HREF="http://microsoft.com">Where do you want...</A>: Z! - yeah +.
this text is ignored, too.</li></ul>
<ul><li><A HREF="http://slashdot.org">News for nerds.</A>: A+ - nice.
this text is ignored.</li>
<li><A HREF="http://microsoft.com">Where do you want...</A>: Z! - yeah +.
this text is ignored, too.</li></ul>
<ul><li><A HREF="http://slashdot.org">News for nerds.</A>: A+ - nice.
this text is ignored.</li>
<li><A HREF="http://microsoft.com">Where do you want...</A>: Z! - yeah +.
this text is ignored, too.</li></ul>
<ul><li><A HREF="http://slashdot.org">News for nerds.</A>: A+ - nice.
this text is ignored.</li>
<li><A HREF="http://microsoft.com">Where do you want...</A>: Z! - yeah +.
this text is ignored, too.</li></ul>
<ul><li><A HREF="http://slashdot.org">News for nerds.</A>: A+ - nice.
this text is ignored.</li>
<li><A HREF="http://microsoft.com">Where do you want...</A>: Z! - yeah +.
this text is ignored, too.</li></ul>
<ul><li><A HREF="http://slashdot.org">News for nerds.</A>: A+ - nice.
this text is ignored.</li>
<li><A HREF="http://microsoft.com">Where do you want...</A>: Z! - yeah +.
this text is ignored, too.</li></ul>
<ul><li><A HREF="http://slashdot.org">News for nerds.</A>: A+ - nice.
this text is ignored.</li>
<li><A HREF="http://microsoft.com">Where do you want...</A>: Z! - yeah +.
this text is ignored, too.</li></ul>
<ul><li><A HREF="http://slashdot.org">News for nerds.</A>: A+ - nice.
this text is ignored.</li>
<li><A HREF="http://microsoft.com">Where do you want...</A>: Z! - yeah +.
this text is ignored, too.</li></ul>
TY

# Dump the extracted structure to file.txt.
open my $fh, '>', './file.txt' or die "Cannot open ./file.txt: $!";
print {$fh} Dumper( $obj->run( $obj->compile($template), $document ) );
close $fh;

With this script I get only this:

$VAR1 = {
  'record' => [
    {
      'rate' => 'A+',
      'comment' => 'nice',
      'url' => 'http://slashdot.org',
      'title' => 'News for nerds.'
    },
    {
      'rate' => 'Z!',
      'comment' => 'yeah',
      'url' => 'http://microsoft.com',
      'title' => 'Where do you want...'
    }
  ]
};

Why? What can I do to retrieve the rest of the data?

Thanks in advance.

Replies are listed 'Best First'.
Re: How template::extract works?
by Corion (Patriarch) on Sep 17, 2007 at 08:59 UTC

    You will need to look at the code.

    The documentation says "You may set $Template::Extract::DEBUG to a true value to display generated regular expressions.", so maybe that helps you. Otherwise, I guess you will have to read the Template::Extract::Run code to find out why it seems to parse the list only once.

    I like using XPath expressions or CSS selectors to extract data, currently with Web::Scraper, but that module is still in its infancy, so you might want to stay with Template::Extract. Also, because it selects whole elements and attributes, Web::Scraper cannot conveniently split up a single chunk of text into several parts.

      Yes, thanks. In fact I did set DEBUG to a true value, and the generated regexp looks correct. I also read the Run package, but I don't understand it very well, because unfortunately it is not commented and is heavily OO.

      Because of that, I'm posting here in the hope that another monk has had a similar problem.

      Thank you
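
A possible workaround that does not require understanding Template::Extract::Run: the output in the question suggests the template was matched against only one <ul> block, so the document can be cut into its <ul>...</ul> blocks first and the extractor run once per block. This is a minimal sketch in core Perl, not a confirmed explanation of the module's behaviour. The helper names split_lists and extract_all are hypothetical (they are not part of Template::Extract), the extractor is passed in as a callback, and the sketch assumes the lists are not nested.

```perl
use strict;
use warnings;

# Hypothetical helper: return every <ul>...</ul> block in $document, in order.
# Must be called in list context; /g collects all matches, /s lets "." span
# newlines, and ".*?" stops at the first closing </ul>.
sub split_lists {
    my ($document) = @_;
    return $document =~ m{(<ul>.*?</ul>)}gs;
}

# Hypothetical helper: call the extractor callback on each block and pool
# the entries found under the 'record' key into one flat array reference.
sub extract_all {
    my ($extract, $document) = @_;
    my @records;
    for my $block ( split_lists($document) ) {
        my $data = $extract->($block) or next;    # skip blocks that do not match
        push @records, @{ $data->{record} || [] };
    }
    return \@records;
}

# Tiny demonstration of the splitting step on a made-up two-list document.
my $doc    = "<ul><li>a</li></ul>\n<ul><li>b</li></ul>";
my @blocks = split_lists($doc);
print scalar(@blocks), " blocks found\n";
```

With the objects from the question, the call would look like extract_all( sub { $obj->extract( $template, $_[0] ) }, $document ), which should return all records in a single array reference instead of only those from the first list.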