ndnalibi has asked for the wisdom of the Perl Monks concerning the following question:
Replies are listed 'Best First'.
Re: PDF::API2 processing time
by jethro (Monsignor) on Oct 21, 2008 at 19:15 UTC
You might use profiling to check where the time is really spent. Maybe you are doing something very inefficient. See Profiling your code. You might also post the code of what you are doing. How can we improve code we don't know? There is also a mailing list for PDF::API2 users (check the wiki on the SourceForge page of the project). If PDF::API2 is the culprit, they might have the inside knowledge you need.
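As a rough illustration of the kind of basic timing that can narrow down a slow section before reaching for a full profiler, here is a minimal sketch using the core Time::HiRes module. The `timed` helper, the stage labels, and the stand-in workloads are all hypothetical and not taken from the script under discussion.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

my %elapsed;    # accumulated wall-clock seconds per named stage

# Hypothetical helper: run a code block and add its wall-clock time
# to the running total for the given label.
sub timed {
    my ($label, $code) = @_;
    my $start  = time;
    my @result = $code->();
    $elapsed{$label} += time - $start;
    return wantarray ? @result : $result[0];
}

# Stand-in workload; in a real script these would be the per-record steps.
for my $record (1 .. 3) {
    timed('build text',   sub { join ',', map { $_ * $record } 1 .. 1000 });
    timed('place images', sub { my $n = 0; $n += $_ for 1 .. 50_000; $n });
}

# Print the stages, slowest first.
for my $label (sort { $elapsed{$b} <=> $elapsed{$a} } keys %elapsed) {
    printf "%-14s %.4f s\n", $label, $elapsed{$label};
}
```

If one stage dominates the totals, that is the part worth posting and optimizing. A dedicated profiler such as Devel::NYTProf gives finer detail when a breakdown like this is not enough.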
Re: PDF::API2 processing time
by SilasTheMonk (Chaplain) on Oct 21, 2008 at 19:13 UTC
Re: PDF::API2 processing time
by ndnalibi (Acolyte) on Oct 21, 2008 at 20:00 UTC
by jethro (Monsignor) on Oct 22, 2008 at 02:19 UTC
Is this your normal indentation? Looks horrible. Check out perltidy, a script that can repair this. I would also suggest using
I see you have practically just one really big loop where you process each record. It would have been easy to do some really basic profiling by putting print "Doing xyz now\n" every few lines and observing whether that minute per record is spent in one place or distributed evenly.
Please read How do I post a question effectively?. You could have done the profiling before posting the code, and then you would have known which part of the code is important to post. You could also have provided sample data (a few records) so that other monks could run your code.
I don't have any experience with PDF::API2, but I'm guessing that a lot of the stuff in your loop could be moved before the loop: definitely all the constant strings you create and all the font assignments, and probably also the assignments of $pdb, $jpeg, $png, $tag. Just try it and see if it works. The same could be done with the images of the chars. Your code probably can be changed to:
If this works it could be quite a speedup, since each char has to be read only once from disk instead of every time it is printed.
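A minimal sketch of the caching idea described above, written without PDF::API2 so that it runs on its own (it needs Perl 5.10 or later for the `//=` operator). The `load_image` and `cached_image` helpers, the `chars/` file layout, and the sample records are hypothetical. In a real PDF::API2 script, the expensive load would presumably be a call such as `$pdf->image_png($file)`.

```perl
use strict;
use warnings;

my $loads = 0;    # counts how often the expensive loader actually runs

# Hypothetical stand-in for an expensive image load. In a PDF::API2
# script this is where a call such as $pdf->image_png($file) would go.
sub load_image {
    my ($file) = @_;
    $loads++;
    return { file => $file };    # placeholder for the image object
}

my %image_cache;    # file name => already-loaded image object

# Return the cached object if there is one; load it only on first use.
sub cached_image {
    my ($file) = @_;
    return $image_cache{$file} //= load_image($file);
}

# Hypothetical records: each character maps to one image file.
my @records = ('HELLO', 'HOLE', 'HELL');

for my $record (@records) {
    for my $char (split //, $record) {
        my $img = cached_image("chars/$char.png");
        # ... place $img on the page here ...
    }
}

printf "%d characters placed, %d images loaded\n",
    length(join '', @records), $loads;
```

The hash lives outside the record loop, so a character that appears in many records costs one load in total rather than one load per appearance.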
Re: PDF::API2 processing time
by busunsl (Vicar) on Oct 22, 2008 at 08:43 UTC
PDF::API2 is a very good module for everything PDF, but it is slow. I went for Inline::C and HARU. It is fast but debugging is a PITA. Next version will be C++.
Re: PDF::API2 processing time
by roboticus (Chancellor) on Oct 22, 2008 at 11:19 UTC
Re: PDF::API2 processing time
by ruzam (Curate) on Oct 30, 2008 at 02:02 UTC
I've had some experience with PDF::API2 and formatting images. The PDF::API2 code rips apart the source image, decompressing it pixel by pixel, then scales and reformats it into the appropriate PDF image content using nothing more than pure Perl. It's a slow process, sometimes painfully slow. I thought I was being smart working with large high-resolution image sources to preserve the quality of the final PDF, but that just brought the render process to its knees. The most effective thing you can do to speed up rendering is to reduce the size/quality of your source images in a graphic editor first. Reduce each one to the smallest image size you think you can use without sacrificing the final result. This will make a huge difference. Any images you can generate in advance and include as a pre-generated background (taken out of the loop) would also be a good idea.
Re: PDF::API2 processing time
by ndnalibi (Acolyte) on Oct 22, 2008 at 15:42 UTC
I had to run a 1000-record file, which should be done tomorrow (had to go out for Halloween). Then I have a month to work on this before the Winter Holiday project. Sorry for the ugly code - it looks better on my console. I did already think about adding print statements to help profile my code, and I always do this to debug issues; I just never had a program I needed to profile before. I will do this first in the future before posting. I like the idea of pre-loading the images into memory; I agree that could be a huge savings. Another thing I am thinking of is writing a vdx file instead of one large PDF. vdx includes each image only once in a serialized form, along with PPML, an XML-type markup language that references the images rather than including them as PDFs. I've got 400 pages of documentation to read on this, though, and only recently began understanding the concepts of serializing images. FYI - this script adds variable images, not just text, which is part of the problem. I'll keep you posted on my findings.
Re: PDF::API2 processing time
by ndnalibi (Acolyte) on Oct 29, 2008 at 16:26 UTC
That's awesome!
Re: PDF::API2 processing time
by ndnalibi (Acolyte) on Oct 29, 2008 at 14:35 UTC
I moved the variables outside the loop - my bad, obviously not a very experienced move on my part. "Caching" the images made an incredible difference. Now I'll work on my indentation... FYI - once the images are fully loaded, it takes about 1/2 second per record versus over a minute!
Re: PDF::API2 processing time
by ndnalibi (Acolyte) on Nov 04, 2008 at 15:44 UTC
In most cases of standard screen-readable PDFs I agree with you, but in my case we need high-resolution images for the output; lower res wouldn't work. However, the changes jethro suggested solved all of the problems I was having and saved me from rewriting the whole project using PPML. Apparently, by moving the image declarations and holding the variable images in an array, PDF::API2 builds the PDF with reused image links instead of writing copies of the images over and over. In my case 1000 records will be an average, so the time and size savings were huge.