karlgoethebier is right. Your concept is broken.

You can't concatenate several HTML documents to make one big document. Of course your computer lets you do exactly that, but while the result may be rendered by some generous browser, it is really junk.

Split the big file into the original documents, and pass each document separately to the PDF converter. Splitting should not be that hard, assuming the original documents are reasonably clean:

  1. Open the big file for reading.
  2. Open an output file.
  3. Read a line from the big file.
  4. If the line contains something that looks like the start of an HTML document ("<!DOCTYPE", "<HTML", "<?xml"), write everything up to the match to the current output file, close it, create a new file, and write the match and everything following it to the new file.
  5. Otherwise, write the line to the current output file.
  6. Repeat from step 3 until EOF.
  7. Close the input and output files.
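
The steps above could be sketched roughly like this. It is only a minimal, untuned sketch: the sub name split_documents, the output naming scheme and the exact signature regex are my assumptions, not something from your code. It does not handle the special cases yet, so a document that starts with "<?xml" followed by "<!DOCTYPE" is cut too eagerly, and the first output file stays empty if the big file starts with a signature.

```perl
use strict;
use warnings;

# Anything that looks like the start of an HTML document.
my $signature = qr/<!DOCTYPE|<html|<\?xml/i;

# Split $in_name into "$out_prefix-0001.html", "$out_prefix-0002.html", ...
# Both names and the numbering scheme are illustrative assumptions.
sub split_documents {
    my ($in_name, $out_prefix) = @_;
    my $count = 0;

    # Opens the next numbered output file and returns its handle.
    my $next_output = sub {
        my $name = sprintf '%s-%04d.html', $out_prefix, ++$count;
        open my $fh, '>', $name or die "Can't write $name: $!";
        return $fh;
    };

    open my $in, '<', $in_name or die "Can't read $in_name: $!";  # step 1
    my $out = $next_output->();                                   # step 2

    while (my $line = <$in>) {                                    # steps 3 and 6
        # Cut the line in front of every signature, so one line may
        # contribute to several output files.
        for my $chunk (split /(?=$signature)/, $line) {
            if ($chunk =~ /\A$signature/) {                       # step 4
                close $out or die "Can't close output: $!";
                $out = $next_output->();
            }
            print {$out} $chunk;                                  # steps 4 and 5
        }
    }

    close $out or die "Can't close output: $!";                   # step 7
    close $in;
    return $count;    # number of output files created
}
```

Called as split_documents('big.html', 'part'), with both names being placeholders, it returns the number of files it created. The split on a look-ahead keeps the signature at the start of the following chunk, so nothing from the input is lost.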

You may need to add some special cases:

A simple trick is to assume that the <?xml and <!DOCTYPE declarations are relatively short, but a complete HTML document needs much more data, at least 500 characters (or something like that). So if tell OUTPUT returns a non-negative number less than 500 when matching a signature, don't create a new output file, but continue to write to the old output file. This also avoids an empty first file.
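
A rough sketch of that check, assuming the current output handle is passed in as $out. The sub name starts_new_document and the variable $min_document_size are made up for illustration, and 500 is just the guess from above. Note that tell counts bytes, not characters.

```perl
use strict;
use warnings;

# Assumed threshold: "at least 500 characters (or something like that)".
my $min_document_size = 500;

# Decide whether a signature match should really start a new output file.
# $out is the currently open output filehandle.
sub starts_new_document {
    my ($out) = @_;
    my $written = tell $out;    # bytes written so far, or -1 on error
    # Too little data for a complete document: keep writing to the old file.
    return 0 if $written >= 0 && $written < $min_document_size;
    return 1;
}
```

In the splitting loop, you would close the old file and open a new one only if starts_new_document($out) is true. A freshly opened output file reports position 0, so a signature at the very start of the big file does not leave an empty first file behind.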

Alexander

--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

In reply to Re^3: Split very big string in half by afoken
in thread Split very big string in half by fpscolin
