It's possible that my code isn't doing what I think it is, but I don't think I am opening any input file twice.
If I'm correct in my thinking (based on my interpretation of the module's documentation), what it does is open an input PDF and check how many pages it has. Then it checks whether the output file that is currently open (if there is one) is the one this PDF needs to be added to. If it is, it appends the input to the output and moves on to the next input. If not, it writes the output that is currently open (clearing it from memory and clearing the stream of data that was being built), then opens the correct output file, appends the input to it, and moves on to the next input.
If successive input files belong to the same output file, that output file stays open and keeps having data added to its stream (the term might not be right) until an input file that goes elsewhere is opened, at which point the output is written. So I should only be opening each input file once. If every other input file were supposed to go into a different output file, I would be doing a lot of opening and closing of the output files; realistically, though, large swaths of the data belong in the 1 and 2 page file, with the others being more sporadic.
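To make the flow above concrete, here is a rough sketch of the control logic only. It is not my actual code: it is written in Python, the PDF module is replaced by plain lists, and `target_for`, `merge_in_order`, and the output names are made up for illustration. The routing rule (1 and 2 page documents share one output, everything else goes by page count) is an assumption.

```python
from collections import Counter


def target_for(page_count):
    # Hypothetical routing rule: 1 and 2 page inputs share one output,
    # anything else goes to an output named after its page count.
    return "1-2 pages" if page_count <= 2 else f"{page_count} pages"


def merge_in_order(inputs):
    """inputs is a list of (name, page_count) pairs, processed in order.

    Returns (written, opens):
      written maps each output name to the input names appended to it,
      opens counts how many times each output had to be opened.
    """
    written = {}            # stands in for the output files on disk
    opens = Counter()
    current, pending = None, []   # the output currently open and its stream

    for name, pages in inputs:    # each input is touched exactly once
        target = target_for(pages)
        if target != current:
            # A different output is needed: write the open one, then switch.
            if current is not None:
                written.setdefault(current, []).extend(pending)
            current, pending = target, []
            opens[target] += 1
        pending.append(name)      # append this input to the open output

    if current is not None:       # write whatever is still open at the end
        written.setdefault(current, []).extend(pending)
    return written, opens


inputs = [("a.pdf", 1), ("b.pdf", 2), ("c.pdf", 5), ("d.pdf", 1)]
written, opens = merge_in_order(inputs)
print(written)
print(dict(opens))
```

In this sample, `a.pdf` and `b.pdf` share one open of the 1-2 page output, `c.pdf` forces a switch, and `d.pdf` forces the 1-2 page output to be opened a second time. That reopening is the cost I described for inputs that alternate between outputs.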
I am going to test the hash method you mentioned regardless, but I hope that makes my intentions and code a bit clearer.
In reply to Re^4: Am I on the right track?
by Pharazon
in thread Am I on the right track?
by Pharazon