in reply to RE: Reading PDF Files?
in thread Reading PDF Files?

I disagree for a number of reasons. First, I think the reason most people release files as PDF documents is that they need precise formatting, which HTML doesn't give. This includes much better print control. Second, PDF doesn't protect text at all; you can select text for copying even in Adobe's PDF viewers. The only viewers out there really designed for such protection are e-book readers. Lastly, I don't think it's the domain of Monks to judge someone's intentions with a project. I'd say that if you don't feel comfortable giving advice to someone, just don't give it. I especially think it's inappropriate to come down condemning someone without any knowledge of how the project will be used. I'd be inclined to think the OP intends to write an engine for searching through PDFs on an intranet, given the insanity of indexing anything more (in Perl, no less). All of the above is just MHO.

RE: RE: RE: Reading PDF Files?
by neshura (Chaplain) on Jun 07, 2000 at 21:54 UTC
    You are probably close to the mark. I don't know if either of us can generalize as to why people release their work in PDF (I should have said, "In my experience/opinion"). I have found that if you copy and paste PDF text, the pasted copy drops a given letter from every word, so reconstructing the text is awfully time-consuming. I do not know if this behavior is universal or an option set when the file is created.
    I -did- mean to imply that the seeker was trying to crack PDFs; I've done the same thing, for (what I believed to be) legitimate reasons -- namely laziness :) not wanting to type in all that damn text. But I suppose you are correct in that without knowing the legality or motive behind the poster's code, it's wrong to wimp out on answering questions.

    (You know the old saw about ass-u-me)
