baxy77bax has asked for the wisdom of the Perl Monks concerning the following question:

Hi,
I am having some issues extracting text from an *.eps image. Up until now I was just grepping the show lines (example:
/ArialMT-ISOLatin1 findfont 32 scalefont setfont 0 0 0 setrgbcolor newpath 0 0 moveto (King James) show grestore grestore grestore 0 0 0 setrgbcolor [] 0 setdash 5 setlinewidth 0 setlinejoin 1 setlinecap newpath -1013.087 5437.645 moveto -574.44269 5148.3467 lineto stroke 0 0 0 setrgbcolor [] 0 setdash 5 setlinewidth 0 setlinejoin 1 setlinecap newpath -801.10602 5042.689 moveto -683.66547 4973.3872 lineto stroke 0 0 0 setrgbcolor [] 0 setdash 5 setlinewidth 0 setlinejoin 0 setlinecap newpath -764.50114 5103.5574 moveto -789.24211 5063.1816 -813.3093 5022.3968 -836.69272 4981.22 curveto stroke gsave [0.8480481 -0.52991926 0.52991926 0.8480481 -3204.0386 27.010243] concat gsave [1 0 0 -1 -1554.9214 5600.4102] concat gsave /ArialMT-ISOLatin1 findfont 32 scalefont setfont 0 0 0 setrgbcolor newpath 0 0 moveto (M. L. King) show ...
) to get the information, but then I realized that the names are not in the same order as in the figure: in the figure, M. L. King comes before King James. So, is there a professional tool (Perl module) made for this kind of manipulation?

Thanks

b

Replies are listed 'Best First'.
Re: eps file parser needed
by thargas (Deacon) on Jun 16, 2014 at 17:24 UTC

    First, you don't say what you want to accomplish, so anything anyone answers is, at best, a guess.

    Assuming that you want to get all the text from a PostScript file, your best bet is probably something like ps2ascii. It's not perfect, but nothing automated can be.

    A PostScript "image" is really a program which creates an image when run by something which understands PostScript. As a program, there are many different ways of producing the same output. In other words, extracting all the text, in the order it appears in the printed output, is, in the general case, a guess.
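
    For simple, flat output like the fragment in the question, one way to make that guess is to follow the transformation matrix instead of the source order. The following is only a rough sketch, not a PostScript interpreter: it assumes the text is placed with nothing but gsave, grestore, concat, moveto and show, that strings contain no nested or escaped parentheses, that the matrix is the identity at the start, and that "figure order" means top to bottom, then left to right. The names mat_mul and texts_in_page_order and the sample input are made up for illustration.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Product of two PostScript matrices [a b c d tx ty]: $m is applied first, then $n.
sub mat_mul {
    my ($m, $n) = @_;
    return [
        $m->[0]*$n->[0] + $m->[1]*$n->[2],
        $m->[0]*$n->[1] + $m->[1]*$n->[3],
        $m->[2]*$n->[0] + $m->[3]*$n->[2],
        $m->[2]*$n->[1] + $m->[3]*$n->[3],
        $m->[4]*$n->[0] + $m->[5]*$n->[2] + $n->[4],
        $m->[4]*$n->[1] + $m->[5]*$n->[3] + $n->[5],
    ];
}

# Return every shown string with the page position where it starts,
# sorted top to bottom, then left to right (an assumed reading order).
sub texts_in_page_order {
    my ($ps) = @_;
    my $ctm   = [1, 0, 0, 1, 0, 0];   # assumption: identity matrix at start
    my @point = (0, 0);               # current point, already transformed
    my (@saved, @operands, @found);

    # Tokens: [array], (string), or a run of other non-space characters.
    while ($ps =~ /(\[[^\]]*\]|\([^()]*\)|[^\s\[\]()]+)/g) {
        my $tok = $1;
        if ($tok =~ m{^[\[(/]} or looks_like_number($tok)) {
            push @operands, $tok;               # arrays, strings, names, numbers
        }
        elsif ($tok eq 'gsave') {
            push @saved, [$ctm, [@point]];
        }
        elsif ($tok eq 'grestore') {
            if (my $state = pop @saved) {
                $ctm   = $state->[0];
                @point = @{ $state->[1] };
            }
        }
        elsif ($tok eq 'concat') {
            my $arr = @operands ? pop @operands : '';
            my @m   = ($arr =~ /[^\s\[\]]+/g);
            $ctm = mat_mul(\@m, $ctm) if @m == 6;
            @operands = ();
        }
        elsif ($tok eq 'moveto') {
            if (@operands >= 2) {
                my ($x, $y) = @operands[-2, -1];
                @point = ($ctm->[0]*$x + $ctm->[2]*$y + $ctm->[4],
                          $ctm->[1]*$x + $ctm->[3]*$y + $ctm->[5]);
            }
            @operands = ();
        }
        elsif ($tok eq 'show') {
            my $str = @operands ? $operands[-1] : '';
            push @found, { text => substr($str, 1, -1), x => $point[0], y => $point[1] }
                if $str =~ /^\(/;
            @operands = ();
        }
        else {
            @operands = ();   # any other operator: just discard its operands
        }
    }
    my @sorted = sort { $b->{y} <=> $a->{y} or $a->{x} <=> $b->{x} } @found;
    return @sorted;
}

# Made-up sample: "Lower" comes first in the source but sits lower on the page.
my $sample = <<'PS';
gsave [1 0 0 1 100 200] concat newpath 0 0 moveto (Lower) show grestore
gsave [1 0 0 1 50 700] concat gsave [1 0 0 -1 0 10] concat
newpath 5 5 moveto (Upper) show grestore grestore
PS

printf "%-6s %.1f %.1f\n", @{$_}{qw(text x y)} for texts_in_page_order($sample);
```

    With the made-up sample, "Upper" is listed before "Lower" even though it appears later in the source. Anything beyond these few operators (procedures, translate/scale/rotate, other text operators) would need a real interpreter such as Ghostscript.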

Re: eps file parser needed
by GotToBTru (Prior) on Jun 16, 2014 at 18:40 UTC

    A general search seems to suggest converting to PDF and extracting the text from there.
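
    A minimal sketch of that route from Perl, assuming Ghostscript's ps2pdf and the pdftotext utility (from Poppler or Xpdf) are installed and on the PATH. The helper names conversion_commands and eps_to_text are made up for illustration, and whether pdftotext returns the names in figure order is not guaranteed.

```perl
use strict;
use warnings;

# Build the two external commands for one EPS file (nothing is run here).
sub conversion_commands {
    my ($eps) = @_;
    (my $base = $eps) =~ s/\.eps\z//i;
    return (
        ['ps2pdf',    '-dEPSCrop', $eps,        "$base.pdf"],   # EPS -> PDF
        ['pdftotext', '-layout',   "$base.pdf", "$base.txt"],   # PDF -> text
    );
}

# Run the commands in order and stop at the first failure.
sub eps_to_text {
    my ($eps) = @_;
    for my $cmd (conversion_commands($eps)) {
        system(@$cmd) == 0 or die "'@$cmd' failed: $?\n";
    }
}

eps_to_text($ARGV[0]) if @ARGV;
```

    Saved as a script and given an EPS file name as its argument, this would leave the extracted text in a .txt file next to the input.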

    1 Peter 4:10