in reply to Extracting the data from a PDF

I did however find pdf2html. It will be a breeze to run it and then extract the data from the html that it produces.

<whine>But! I don't want to...<\whine>

Why not? Assuming you have the HTML in a single scalar:

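    use HTML::TokeParser::Simple;

    # Parse the HTML held in $html (passed by reference)
    my $p = HTML::TokeParser::Simple->new( \$html );
    while ( my $token = $p->get_token ) {
        next unless $token->is_text;    # skip tags, comments, etc.
        print $token->return_text;      # emit only the text content
    }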

There, that wasn't so hard, was it? (note that that example was pretty much cut-n-pasted directly from the POD)
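The snippet above assumes the HTML is already in a single scalar, $html. One common way to get it there from the file pdf2html writes is to "slurp" the whole file by localizing the input record separator $/. This is a minimal sketch; the `slurp_file` name is hypothetical, and so is any filename you pass it, since pdf2html's actual output name depends on how you run it.

```perl
use strict;
use warnings;

# Hypothetical helper: read an entire file into one scalar.
# Localizing $/ (the input record separator) to undef makes
# <$fh> return the whole file instead of one line at a time.
sub slurp_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my $content = do { local $/; <$fh> };
    close $fh;
    return $content;
}
```

Then something like `my $html = slurp_file('output.html');` (hypothetical filename) gives you the scalar that `HTML::TokeParser::Simple->new( \$html )` expects.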

Oh, and you have the slash backwards on that final whine :)

Cheers,
Ovid
