in reply to Re: Extracting data from a PDF to a spreadsheet
in thread Extracting data from a PDF to a spreadsheet

Thanks for the guidance. I do have some code, but I didn't figure it would help to see it if I didn't provide the file it's trying to extract data from. But I'll place it here anyway.

while(<STDIN>) {
    @section = split /Class: Invoice/, $_;
    @AdminData = split /\n/, $section[0];
    @BodyTemp = split /Administrative Data:/, $_;
    @Body = split /Reply: click here/, $BodyTemp[0];
    @Splitterhold = split/Payment Detail - Payment ID /, $_;
    foreach $Splitterhold(@Splitterhold) {
        $Splitterhold =~ s/InvoiceDate /Invoice Dateĉ /g;
        $Splitterhold =~ s/Customer ID /CustomerIDĉ /g;
        $Splitterhold =~ s/^Phone /Phoneĉ /g;
        $Splitterhold =~ s/Txn Type Post Day Amount \(USD\)\n/InvoiceDateĉ /g;
        $Splitterhold =~ s/Card Type Card Number Exp Date BIN\n/CreditCardĉ /g;
        $Splitterhold =~ s/Name /Nameĉ /g;
        $Splitterhold =~ s/Address Line 1 /Addressĉ /g;
        $Splitterhold =~ s/City /Cityĉ /g;
        $Splitterhold =~ s/State /Stateĉ /g;
        $Splitterhold =~ s/Email Address /EmailAddressĉ /g;
        $Splitterhold =~ s/Home phone number /Homephonenumberĉ /g;
        $Splitterhold =~ s/Last modified on /Lastmodifiedonĉ /g;
    }
    #@sector = split /Payment Detail -/, $section[1], /administration>/;
    if ($#Splitterhold > 0) {
        for ($x = 0; $x < $#Splitterhold; $x++) {
            @Split = split/\n/, $Splitterhold[$x];
            @parse = split /ĉ/, @Split;
            if ($#parse > 0) {
                $parse[0] =~ s/\W//g;
                $parse[1] =~ s/\-//g;
                @AO{$parse[0]} = $parse[1];
            }
            if ($#parsezero > 0) {
                $parsezero[1]=~ s/\-//g;
                $IV{$parsezero[0]} = $parsezero[1];
                @IVone = push (@IV, @IV);
                print $IV;
            }
        }
    }
    $Body[1] =~ s/^one$/1/gi;
    $Body[1] =~ s/^two$/2/gi;
    $Body[1] =~ s/^three$/3/gi;
    $Body[1] =~ s/^four$/4/gi;
    $Body[1] =~ s/^five$/5/gi;
    $Body[1] =~ s/^six$/6/gi;
    $Body[1] =~ s/^seven$/7/gi;
    $Body[1] =~ s/^eight$/8/gi;
    $Body[1] =~ s/^nine$/9/gi;
    $Body[1] =~ s/^zero$/0/gi;
    $Body[1] =~ s/0ne/1/gi;
    @PostingBody = split/\n/, $Body[1];
    for ($x = 0; $x < $#PostingBody; $x++) {
        $PostingBody[$x] =~ s/\s//gi;
        $PostingBody[$x] =~ s/\W//gi;
        $MC = NULL;
        if ($PostingBody[$x] =~ m/\d{3}.*\d{3}.*\d{4}/) {
            $PostingBody[$x] =~ s/\D//gi;
            $PostingBody[$x] =~ s/\W//g;
            $MC{'Digits'} = $PostingBody[$x];
        }
    }
    @elements=('Digits');
    for($x=0; $x< @elements; $x++) {
        print ($MC{$elements[$x]}."\t\t");
        $MC = "";
    }
    @elements=("PostID","Location","posted","Reply","Postersage","Partner",
               "AdType","PaidAd","AdPrice","Whitelisted","Name","Phone","Email","UserCreated","Settings",
               "Referrer","IP","AdCreated");
    for($x=0; $x< @elements; $x++) {
        print(@AO{$elements[$x]}."\t");
        $AO = "";
    }
    @elements=("Lastmodifiedon", "InvoiceDate", "CreditCard", "Name", "Address",
               "City", "State", "EmailAddress", "Homephonenumber", "CustomerID");
    for($x=0; $x< @elements; $x++) {
        print (@IVone{$elements[$x]}."\t");
        $IV = "";
    }
    {
        print "\n";
    }
}

Re^3: Extracting data from a PDF to a spreadsheet
by runrig (Abbot) on Jun 22, 2011 at 21:27 UTC
    You can supply a bit of data for posting in a self-contained example by using the DATA handle, e.g.:
    while (<DATA>) {
        print "Got: $_";
    }
    __END__
    one
    two
    three
    Try to post the minimum amount of code and data that demonstrates the problem you're having (and fix your closing code tag).
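    As a rough illustration of that advice, a cut-down, self-contained post could look something like the sketch below. The field labels are borrowed from the code posted above. Everything else is an assumption made for the example: the values under __DATA__ are invented placeholders, parse_line is a hypothetical helper, and the "label, whitespace, value" line layout is only a guess at what the extracted PDF text looks like. It needs Perl 5.10 or later for the // operator.

```perl
use strict;
use warnings;

# Field labels taken from the code posted above; the sample values under
# __DATA__ are invented placeholders, not real invoice data.
my @labels   = ('Customer ID', 'Name', 'City', 'State');
my $label_re = join '|', map { quotemeta } @labels;

# Hypothetical helper: split one "Label value" line into label and value.
# Returns an empty list when the line does not start with a known label.
sub parse_line {
    my ($line) = @_;
    return $line =~ /^($label_re)\s+(.*\S)\s*$/ ? ($1, $2) : ();
}

my %record;
while (my $line = <DATA>) {
    chomp $line;
    # Skip anything that is not a recognised "Label value" line.
    my ($label, $value) = parse_line($line) or next;
    $record{$label} = $value;
}

# Print one tab-separated row, in a fixed column order.
print join("\t", map { $record{$_} // '' } @labels), "\n";

__DATA__
Customer ID 12345
Name Jane Doe
City Springfield
State IL
```

    With the sample data, this should print a single row with the four values separated by tabs. Swapping the lines under __DATA__ for a few (anonymised) lines of the real extracted text would show where the parsing goes wrong without anyone needing the PDF.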