in reply to Re^2: parse pdf
in thread parse pdf
i'm hoping to go directly from fetching the pdfs off the web to a sql file, which is why i really wanted something that could build a dom-like tree from the structure of the file. so ready-made pdf utilities aren't especially useful to me here.
however, i seem to have missed CAM::PDF when searching for pdf modules. it might be able to get me images of the stuff i can't format. i haven't looked, but i'm assuming the math is in another typeface (that, or i just weed it out with a regex), and then i should be able to get the object or line and have it output as an image. (even though i said i didn't care, MathML would've been nice ;) )
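
roughly what i have in mind for the regex route, as an untested sketch rather than a working solution. the math pattern is a guess, and the `pdf_lines` table and its columns are made up for illustration. the only CAM::PDF calls it relies on are `new`, `numPages` and `getPageText`, and it only loads the module when a file is passed on the command line:

```perl
use strict;
use warnings;

# hypothetical heuristic: treat a line as "math" if it contains characters
# that rarely show up in plain prose. the pattern is a guess, not tuned.
my $MATH_RE = qr/[=^_\\{}<>]|\d\s*[+*\/]\s*\d/;

# split extracted text into prose lines and suspected-math lines
sub split_lines {
    my ($text) = @_;
    my (@prose, @math);
    for my $line (split /\n/, $text) {
        next unless $line =~ /\S/;          # skip blank lines
        if ($line =~ $MATH_RE) { push @math,  $line }
        else                   { push @prose, $line }
    }
    return (\@prose, \@math);
}

# turn one line into an INSERT statement (table and column names are made up)
sub to_sql {
    my ($page, $line) = @_;
    (my $quoted = $line) =~ s/'/''/g;       # double single quotes for sql
    return "INSERT INTO pdf_lines (page, body) VALUES ($page, '$quoted');";
}

# only touch CAM::PDF when a file is actually given on the command line
if (@ARGV) {
    require CAM::PDF;
    my $pdf = CAM::PDF->new($ARGV[0]) or die "cannot open $ARGV[0]\n";
    for my $page (1 .. $pdf->numPages()) {
        my ($prose, $math) = split_lines($pdf->getPageText($page) // '');
        print to_sql($page, $_), "\n" for @$prose;
        warn "page $page: ", scalar(@$math), " line(s) left for image output\n"
            if @$math;
    }
}
```

the suspected-math lines are only counted here. getting them out as images is the part i still have to figure out.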