Re: Extracting the data from a PDF

I did, however, find pdf2html. It will be a breeze to run it and then extract the data from the HTML that it produces.

<whine>But! I don't want to...<\whine>

Why not? Assuming you have the HTML in a single scalar:

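use HTML::TokeParser::Simple;

# $html holds the complete HTML document produced by pdf2html
my $p = HTML::TokeParser::Simple->new( \$html );

# Walk every token and print only the text that sits between the tags
while ( my $token = $p->get_token ) {
    next unless $token->is_text;
    print $token->as_is;    # older releases called this return_text()
}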

There, that wasn't so hard, was it? (Note that the example was pretty much cut-n-pasted directly from the module's POD.)
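If pdf2html leaves its output in a file rather than in a variable, getting it into $html is a quick slurp. Here's a minimal core-Perl sketch; slurp_file and the example filename are my own hypothetical names, not part of pdf2html or HTML::TokeParser::Simple, and it reads the file as raw bytes without doing anything about character encoding.

```perl
use strict;
use warnings;

# Hypothetical helper: read a whole file (e.g. the HTML that pdf2html
# wrote out) into a single scalar.
sub slurp_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Can't open $path: $!";
    local $/;                # undef the record separator so <$fh> returns the whole file
    my $contents = <$fh>;
    close $fh;
    return $contents;
}
```

Then something like `my $html = slurp_file('output.html');` gives the token loop the single scalar it expects.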

Oh, and you have the slash backwards on that final whine :)

Cheers,
Ovid

