eric256 has asked for the wisdom of the Perl Monks concerning the following question:

Okay, I'm just starting to research this problem to see whether it is feasible with the equipment I have in place, so please bear with me.

Problem: We have documents that are in paper form. These arrive in all sorts of ways: faxes, mail, whatever. In the end these paper documents get sorted by which office they should be sent to, and then they are grouped together at the end of the day and sent to the correct office. Right now we use a ScanToEmail machine and email the scan to each office. In the past we faxed it. Both of those solutions have limitations (fax machines don't always work, and emails can get quite large and cumbersome to download).

Proposed Solution: I am considering having them enter each batch into a web form, which would then store the batch in a database with info about where it should go and any comments, and assign it a number. I could then print a cover page containing the same info and a bar code. The pages would all be grouped together behind the cover sheet: one cover sheet per batch, one batch per destination. Then all the batches could be scanned together and emailed to a special email address (via the ScanToEmail machine). A program would monitor that mailbox, download the attachment, and scan it for barcodes. It would then split the document apart by page and assign (or send) the pages to the correct locations.
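
The splitting step in that proposal could be sketched roughly as below. This is only a sketch under assumptions: it takes for granted that some separate barcode reader (not shown, and not something the thread has identified yet) has already produced one value per scanned page, with undef for pages that carry no barcode. The function name split_into_batches and the hash keys are made up for illustration.

```perl
use strict;
use warnings;

# Group scanned pages into batches. Each element of @$pages is the barcode
# value decoded from that page, or undef when no barcode was found.
# The decoding itself is assumed to happen elsewhere (hypothetical step).
sub split_into_batches {
    my ($pages) = @_;
    my (@batches, $current);
    for my $i (0 .. $#$pages) {
        my $code = $pages->[$i];
        if (defined $code) {
            # A barcode marks a cover sheet, which opens a new batch.
            $current = { batch_id => $code, pages => [] };
            push @batches, $current;
        }
        elsif ($current) {
            # Ordinary page: it belongs to the most recent cover sheet.
            push @{ $current->{pages} }, $i + 1;    # 1-based page number
        }
        else {
            warn "page ", $i + 1, " precedes any cover sheet; skipping\n";
        }
    }
    return \@batches;
}

# Example: two cover sheets (pages 1 and 4) in a five-page scan.
my $batches = split_into_batches([ '1001', undef, undef, '1002', undef ]);
printf "batch %s: pages %s\n", $_->{batch_id}, join(',', @{ $_->{pages} })
    for @$batches;
```

The batch number read from the cover sheet would then be the key for looking up the destination and comments in the database.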

The Good:
  1. Minimal amount of work to send each document. Simply filling in a couple of cover page forms.
  2. Allows end users to access individual files in the batch, even though they are sent as a whole. (Send a big file, user receives a set of small ones.)
The Bad:
  1. I don't know anything about scanning PDF files for bar codes

I'm sure there are more good and bad points. I'm just trying to figure out now: is this feasible? Is it more work than it's worth? Has anyone tried something like this? Any help, ideas, questions about my flawed logic, or whatever are appreciated.

Thanks in advance for your time and help!

P.S. I would be doing this in Perl, hence asking it here.

___________
Eric Hodges

Replies are listed 'Best First'.
Re: Digital Document Capture
by sgifford (Prior) on Aug 13, 2003 at 16:25 UTC

    It would probably be easier to encode the information in some way other than a bar code, such as a comment near the top or bottom of the PDF file.

    Bar codes are a useful way to get digital information out of a physical document, but they're generally not the best way to transmit digital information between two computers.

      The PDF is created automatically by the scanner. I never have a chance to encode it with anything. :-(

      The bar code would be printed as a separate process. I was also thinking of just having one barcode per location. Then there would be just a standard cover sheet they could use to send the documents.

      Update: I can get the document scanned as TIFF files, one page per file. I don't know whether TIFF is easier to scan for barcodes.

      ___________
      Eric Hodges

        The first thing I would ask myself if I were in your shoes is why I was introducing an extra PC, some code, and an extra step for the sorting room people, when the system they have at the moment probably works OK as it is.

        Assuming that you have a good answer to this, you may be interested to know that PDF is derived from PostScript. PostScript is a fully fledged programming language that can be written in text format. It also supports comments :)

        To see what I mean, paste the following output into a file called "test.ps", and open it in your favourite PostScript viewer.

        %!PS-Adobe-3.0 EPSF-1.2
        %%Title: (test1.ps)
        %%LanguageLevel: 1
        %%Creator: DTR
        %%CreationDate: Sun Dec 8 18:25:51 2002
        %%For: Console
        %%DocumentMedia: A4 595.27559 841.88976 0 ( ) ( )
        %%Orientation: Portrait
        %%Pages: 1
        %%BoundingBox: 0 0 595 841
        %%EndComments
        %%BeginProlog
        %%BeginResource: PostScript::Simple
        /u {} def
        /STARTDIFFENC { mark } bind def
        /ENDDIFFENC {
        % /NewEnc BaseEnc STARTDIFFENC number or glyphname ... ENDDIFFENC -
          counttomark 2 add -1 roll 256 array copy
          /TempEncode exch def
          % pointer for sequential encodings
          /EncodePointer 0 def
          {
            % Get the bottom object
            counttomark -1 roll
            % Is it a mark?
            dup type dup /marktype eq {
              % End of encoding
              pop pop exit
            } {
              /nametype eq {
                % Insert the name at EncodePointer
                % and increment the pointer.
                TempEncode EncodePointer 3 -1 roll put
                /EncodePointer EncodePointer 1 add def
              } {
                % Set the EncodePointer to the number
                /EncodePointer exch def
              } ifelse
            } ifelse
          } loop
          TempEncode def
        } bind def

        % Define ISO Latin1 encoding if it doesnt exist
        /ISOLatin1Encoding where {
        %  (ISOLatin1 exists!) =
          pop
        } {
          (ISOLatin1 does not exist, creating...) =
          /ISOLatin1Encoding StandardEncoding STARTDIFFENC
            144 /dotlessi /grave /acute /circumflex /tilde
            /macron /breve /dotaccent /dieresis /.notdef /ring
            /cedilla /.notdef /hungarumlaut /ogonek /caron /space
            /exclamdown /cent /sterling /currency /yen /brokenbar
            /section /dieresis /copyright /ordfeminine
            /guillemotleft /logicalnot /hyphen /registered
            /macron /degree /plusminus /twosuperior
            /threesuperior /acute /mu /paragraph /periodcentered
            /cedilla /onesuperior /ordmasculine /guillemotright
            /onequarter /onehalf /threequarters /questiondown
            /Agrave /Aacute /Acircumflex /Atilde /Adieresis
            /Aring /AE /Ccedilla /Egrave /Eacute /Ecircumflex
            /Edieresis /Igrave /Iacute /Icircumflex /Idieresis
            /Eth /Ntilde /Ograve /Oacute /Ocircumflex /Otilde
            /Odieresis /multiply /Oslash /Ugrave /Uacute
            /Ucircumflex /Udieresis /Yacute /Thorn /germandbls
            /agrave /aacute /acircumflex /atilde /adieresis
            /aring /ae /ccedilla /egrave /eacute /ecircumflex
            /edieresis /igrave /iacute /icircumflex /idieresis
            /eth /ntilde /ograve /oacute /ocircumflex /otilde
            /odieresis /divide /oslash /ugrave /uacute
            /ucircumflex /udieresis /yacute /thorn /ydieresis
          ENDDIFFENC
        } ifelse

        % Name: Re-encode Font
        % Description: Creates a new font using the named encoding.
        /REENCODEFONT { % /Newfont NewEncoding /Oldfont
          findfont dup length 4 add dict
          begin
            { % forall
              1 index /FID ne
              2 index /UniqueID ne and
              2 index /XUID ne and
              { def } { pop pop } ifelse
            } forall
            /Encoding exch def
            % defs for DPS
            /BitmapWidths false def
            /ExactSize 0 def
            /InBetweenSize 0 def
            /TransformedChar 0 def
            currentdict
          end
          definefont pop
        } bind def

        % Reencode the std fonts:
        /Courier-iso ISOLatin1Encoding /Courier REENCODEFONT
        /Courier-Bold-iso ISOLatin1Encoding /Courier-Bold REENCODEFONT
        /Courier-BoldOblique-iso ISOLatin1Encoding /Courier-BoldOblique REENCODEFONT
        /Courier-Oblique-iso ISOLatin1Encoding /Courier-Oblique REENCODEFONT
        /Helvetica-iso ISOLatin1Encoding /Helvetica REENCODEFONT
        /Helvetica-Bold-iso ISOLatin1Encoding /Helvetica-Bold REENCODEFONT
        /Helvetica-BoldOblique-iso ISOLatin1Encoding /Helvetica-BoldOblique REENCODEFONT
        /Helvetica-Oblique-iso ISOLatin1Encoding /Helvetica-Oblique REENCODEFONT
        /Times-Roman-iso ISOLatin1Encoding /Times-Roman REENCODEFONT
        /Times-Bold-iso ISOLatin1Encoding /Times-Bold REENCODEFONT
        /Times-BoldItalic-iso ISOLatin1Encoding /Times-BoldItalic REENCODEFONT
        /Times-Italic-iso ISOLatin1Encoding /Times-Italic REENCODEFONT
        /Symbol-iso ISOLatin1Encoding /Symbol REENCODEFONT
        /box {
          newpath 3 copy pop exch 4 copy pop pop
          8 copy pop pop pop pop exch pop exch
          3 copy pop pop exch moveto lineto lineto lineto
          pop pop pop pop closepath
        } bind def
        /circle {newpath 0 360 arc closepath} bind def
        %%EndResource
        %%EndProlog

        % TRY CHANGING SOME OF THESE VALUES TO GET A FEEL
        % FOR WHAT HAPPENS
        0.6 0 0 setrgbcolor %red
        /Arial findfont 20 scalefont setfont
        newpath
        1 u 450 u moveto
        (Hello world!) show stroke
        newpath
        1 u 200 u moveto
        (This \(stuff\) was generated entirely from Perl) show stroke
        0 0.8 0 setrgbcolor
        50 u 50 u 150 u 150 u box stroke
        1 1 0.2 setrgbcolor
        100 u 100 u 50 u circle fill
        0 0 0.8 setrgbcolor
        newpath
        100 u 100 u moveto
        (This is a test) dup stringwidth pop 2 div neg 0 rmoveto show

        % A simple arc
        newpath
        300 300 50 0 90 arc
        closepath
        stroke

        % A arc between 2 lines
        % giving the appearance of a rounded corner
        newpath
        400 400 moveto
        400 410 lineto
        400 420 410 420 10 arct
        420 420 lineto
        stroke

        % An example of a box in a dropout colour
        newpath
        500 500 moveto
        0 20 rlineto
        20 0 rlineto
        0 -20 rlineto
        closepath
        1 1 1 setrgbcolor
        fill
        /Arial findfont 20 scalefont setfont
        510 500 moveto
        0.6 0.6 0.4 setrgbcolor

        0 0.8 0 setrgbcolor
        0 u 180 u 20 u 160 u box stroke
        22 u 180 u 42 u 160 u box stroke
        44 u 180 u 64 u 160 u box stroke
        /Arial findfont 18 scalefont setfont
        0 0 0.8 setrgbcolor
        newpath
        10 u 163 u moveto
        (B) dup stringwidth pop 2 div neg 0 rmoveto show
        newpath
        32 u 163 u moveto
        (O) dup stringwidth pop 2 div neg 0 rmoveto show
        newpath
        54 u 163 u moveto
        (X) dup stringwidth pop 2 div neg 0 rmoveto show
        /Arial findfont 12 scalefont setfont
        0.8 0 0 setrgbcolor
        newpath
        10 u 350 u moveto
        (My address is:) show stroke
        newpath
        10 u 330 u moveto
        (HELLO) show stroke
        %%EOF

        Credit is due to the PostScript::Simple module on CPAN for getting me started with this. Also - disclaimer - I know just enough PostScript to draw the circles, boxes, and text I drew on that page - no more.

        Anyway, that should be enough to get you started :). You should be able to insert a few harmless comments of your own at the top of a PostScript file (you may also be able to do it with PDF), and use these to keep track of where the document should go.
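
The comment idea could look something like the following in Perl. It is a minimal sketch that only works if the file really is a text PostScript file you are allowed to modify. The "%%Routing:" key and both function names are invented for this example and are not a standard DSC comment or an existing module's API.

```perl
use strict;
use warnings;

# Insert a routing comment right after the first line (the %!PS header).
# "%%Routing:" is a made-up key for illustration, not a standard DSC comment.
sub add_routing_comment {
    my ($ps_text, $office) = @_;
    my ($header, $rest) = split /\n/, $ps_text, 2;
    $rest = '' unless defined $rest;
    return "$header\n%%Routing: $office\n$rest";
}

# Return the office named in the routing comment, or undef if there is none.
sub read_routing_comment {
    my ($ps_text) = @_;
    return $ps_text =~ /^%%Routing:[ \t]*(.+?)[ \t]*$/m ? $1 : undef;
}

my $ps     = "%!PS-Adobe-3.0\n%%Pages: 1\n%%EndComments\n";
my $tagged = add_routing_comment($ps, 'Denver office');
print read_routing_comment($tagged), "\n";
```

Because PostScript interpreters ignore comment lines, a tag added this way should not change how the page renders.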

Re: Digital Document Capture
by benn (Vicar) on Aug 13, 2003 at 17:50 UTC
    A search for 'Barcode' on CPAN reveals plenty of modules for *creating* barcodes, but virtually nothing for reading them in again...it presumably is possible, but a lot would depend upon how your scanner creates them - either as a series of vectors or as a bitmap, I would presume. If the latter, you might be able to read in the image with GD and do some pixel-munching, but it sounds like a task-and-a-half.

    I don't know anything about your ScanToEmail machine, but would there be any way of spitting out the actual barcode number as part of the scanning process? Say, put the number into the Subject header or something?

    Cheers,
    Ben.

Re: Digital Document Capture
by waswas-fng (Curate) on Aug 13, 2003 at 18:30 UTC
    Welp, no need for barcodes: use one of the PDF modules to create your cover page with the routing info included and insert it as the front page. On the other side, use a PDF module to get that information out. Bar codes are great as physical indexers because most of the time the barcode scanner reads a short index number or sequence that allows you to pull up related electronic data. They are the wrong tool for a purely digital transaction.

    -Waswas
Re: Digital Document Capture
by derby (Abbot) on Aug 13, 2003 at 20:01 UTC
    I'm confused (nothing new). It appears there is no need for barcoding at all. Instead of a "special email address", you could have "special email addresses", one for each office. Then you would "scantoemail" distinct packages so that each individual email received is also a distinct deliverable. A generic script (a procmail filter) would take the incoming email and file it away in the filesystem based on recipient (top-level directory) and some unique number or datetime (a subdirectory holding the actual contents), then update the db with the filesystem location and other metadata (maybe the scantoemail'ers can put a subject line on the email). Seems pretty basic and rote to me (but boy, I'd hate to be the operator).
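
The filing scheme described above might be sketched like this. It is an illustration under assumptions: the root path, the example address, and the function name delivery_dir are all hypothetical, and the procmail hookup and database update are left out.

```perl
use strict;
use warnings;
use File::Spec;
use POSIX qw(strftime);

# Build the directory an incoming message should be filed under:
# <root>/<recipient local part>/<UTC timestamp>.
sub delivery_dir {
    my ($root, $recipient, $epoch) = @_;
    my ($office) = $recipient =~ /^([^@]+)@/
        or die "not an email address: $recipient\n";
    $office = lc $office;
    $office =~ s/[^a-z0-9_.-]/_/g;    # keep the name filesystem-safe
    $office =~ s/^\.+/_/;             # no leading dots (avoids "." and "..")
    my $stamp = strftime('%Y%m%d-%H%M%S', gmtime $epoch);
    return File::Spec->catdir($root, $office, $stamp);
}

# Hypothetical root directory and address, for illustration only.
print delivery_dir('/var/spool/scans', 'Denver@example.com', time), "\n";
```

A timestamp with one-second resolution can collide if two scans arrive together, so a real filter would probably append a sequence number or the message ID.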

    -derby

Re: Digital Document Capture
by eric256 (Parson) on Aug 13, 2003 at 21:14 UTC

    Thanks for all the replies, but I appear to have confused you all somewhere.

    The documents all start in paper form, not digital. The cover sheet would also need to be in paper form, not digital, so that they could join it with whichever set of documents they want to transfer.

    So I start with all paper, and scan it to email (using an HP 4100mfp). There is no computer attached to the MFP to attach any form of digital document.

    I've been searching and have found some barcode recognition software that I may try, since barcodes still seem to be the only safe way to put the info I need on a piece of paper and have it read by the scanner. Currently they do send each batch to its own email address, but that has its own limitations and downfalls.

    Thanks again!

    ___________
    Eric Hodges
      I'm not sure if this would work for you, and I may be way off base, but how about this.

      Your current system is sort->scan->send, and you are trying to simplify the send step. That appears to be the simplest step. Now, if you really want to get crazy, how about this:

      scan->PDF->Bayesian filters->send

      Here as the paper docs arrive they are immediately scanned as PDF. Then using one of the PDF modules you pull the text out and send it through a series of Bayesian filters to determine where they should go. The PDF file is then attached into an email or loaded into a db based on the characteristics in the file. (The text version is only used for the Bayesian filter, and is discarded as soon as you deal with the PDF.)
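
As a rough idea of what such a filter involves, here is a tiny naive Bayes classifier in plain Perl. It is a sketch only: it assumes usable text has already been extracted from the PDF (which is the hard part for scanned images), the training examples and office names are invented, and the function names train and classify are not from any CPAN module.

```perl
use strict;
use warnings;

# Train a tiny naive Bayes model. $examples is a list of [office, text] pairs.
sub train {
    my ($examples) = @_;
    my (%word_count, %total_words, %doc_count, %vocab);
    for my $ex (@$examples) {
        my ($office, $text) = @$ex;
        $doc_count{$office}++;
        for my $w (grep { length } split /\W+/, lc $text) {
            $word_count{$office}{$w}++;
            $total_words{$office}++;
            $vocab{$w} = 1;
        }
    }
    return {
        wc    => \%word_count,
        tw    => \%total_words,
        dc    => \%doc_count,
        vocab => scalar(keys %vocab),
        docs  => scalar(@$examples),
    };
}

# Return the office with the highest log-probability for $text.
sub classify {
    my ($model, $text) = @_;
    my @words = grep { length } split /\W+/, lc $text;
    my ($best, $best_score);
    for my $office (sort keys %{ $model->{dc} }) {
        my $score = log($model->{dc}{$office} / $model->{docs});    # prior
        my $denom = ($model->{tw}{$office} || 0) + $model->{vocab};
        for my $w (@words) {
            my $n = $model->{wc}{$office}{$w} || 0;
            # Add-one smoothing so unseen words do not zero the score.
            $score += log(($n + 1) / $denom);
        }
        ($best, $best_score) = ($office, $score)
            if !defined $best_score || $score > $best_score;
    }
    return $best;
}

# Made-up training data, for illustration only.
my $model = train([
    [ billing    => 'invoice payment due amount' ],
    [ scheduling => 'appointment reschedule visit calendar' ],
]);
print classify($model, 'payment for invoice'), "\n";
```

In practice the model would need many labelled documents per office before its guesses could be trusted for routing.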

      This reduces the workload of the sorters to just scanning the files.

        Would be great. Unfortunately the sort part of the process includes more than just sorting. Often there are human actions (phone calls, scheduling, etc.) that need to take place, and most of the documents are hardly legible (handwriting, doodled on, faxed many times, etc.) to humans, let alone computers :-(

        ___________
        Eric Hodges