in reply to Extracting the data from a PDF
I did however find pdf2html. It will be a breeze to run it and then extract the data from the html that it produces.
<whine>But! I don't want to...<\whine>
Why not? Assuming you have the HTML in a single scalar:
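    use HTML::TokeParser::Simple;

    # $html holds the HTML produced by pdf2html
    my $p = HTML::TokeParser::Simple->new( \$html );
    while ( my $token = $p->get_token ) {
        next unless $token->is_text;   # skip tags, comments, declarations, etc.
        print $token->as_is;           # text exactly as it appeared in the HTML
    }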
There, that wasn't so hard, was it? (note that that example was pretty much cut-n-pasted directly from the POD)
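If the HTML is still sitting in the file that pdf2html wrote, you first have to get it into that single scalar. Here's a minimal sketch using only core Perl. The slurp_file name is just something I made up for illustration, not part of any module:

```perl
use strict;
use warnings;

# Hypothetical helper: read an entire file (e.g. the HTML written by
# pdf2html) into one scalar so it can be passed to the parser as \$html.
sub slurp_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open '$path': $!";
    local $/;                 # undef record separator: read the whole file at once
    my $content = <$fh>;
    close $fh;
    return defined $content ? $content : '';
}
```

Then it's just my $html = slurp_file('output.html'); (the file name is only an example) before the loop above. That said, the HTML::TokeParser docs (the parent class) also list a filename or a filehandle as valid constructor arguments, so you may be able to skip the slurp entirely.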
Oh, and you have the slash backwards on that final whine :)
Cheers,
Ovid