Hi
I used to use the following to get the XML structure of a PDF file:
use XML::Simple;    # provides XMLin

# extract the PDF text as XML with mudraw
my $xml  = qx/mudraw -ttt $file/;
my $tree = XMLin($xml, ForceArray => [qw/page block line span char/]);

# going through the structure: page > block > line > span > char
for my $page (@{ $tree->{page} }) {
    for my $block (@{ $page->{block} }) {
        for my $line (@{ $block->{line} }) {
            for my $span (@{ $line->{span} }) {
                # join the characters of this span into one string
                my $string = join '', map { $_->{c} } @{ $span->{char} };
                # ... do my stuff
            }
        }
    }
    $page_index++;
}
I need to drop mudraw now! Is there an alternative way to get $xml? Ideally a pure Perl solution? I couldn't find anything searching the web :(
In reply to PDF alternative to mudrow to get XML structure by Anonymous Monk