On the problem as stated, I would tend to prefer substr over a regex. But of course you still have to look for an HTML boundary near the split point.
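To illustrate the substr idea, here is a minimal sketch. The helper name split_at_boundary and the choice of a closing tag as the boundary are my assumptions, not something from your post; use whatever marker really separates your documents.

```perl
use strict;
use warnings;

# Hypothetical helper: cut $html in two just after the last $boundary
# that starts at or before the midpoint of the string.
sub split_at_boundary {
    my ($html, $boundary) = @_;
    my $mid = int(length($html) / 2);

    # rindex searches backwards, starting from position $mid
    my $pos = rindex($html, $boundary, $mid);

    # nothing before the midpoint: take the first boundary after it
    $pos = index($html, $boundary, $mid) if $pos < 0;

    # no boundary at all: return the string unsplit
    return ($html, '') if $pos < 0;

    my $cut = $pos + length $boundary;
    return (substr($html, 0, $cut), substr($html, $cut));
}

my ($first, $second) =
    split_at_boundary('<p>one</p><p>two</p><p>three</p>', '</p>');
print "first:  $first\nsecond: $second\n";
```

Both index and rindex work on plain strings, so no regex engine is involved, which should matter on a very big string.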

However, since you are reading many small HTML files into your string, it should be possible to read the files in a loop, pause when you reach the size limit in order to do the PDF conversion, and then carry on reading the remaining files. Something like this (this is untested pseudo-code, not an actual solution; we don't have enough information for a real solution):

    use File::Slurp;
    # (...)
    my $current_string = "";
    my $current_size   = 0;
    for my $file (@html_files) {
        my $new_file_string = read_file($file);   # File::Slurp function
        my $len = length $new_file_string;
        if ($current_size + $len > $size_limit) {
            # adding this file would exceed the limit: convert what we have,
            # then start a new string with the current file
            convert_to_pdf($current_string);
            $current_string = $new_file_string;
            $current_size   = $len;
        } else {
            $current_string .= $new_file_string;
            $current_size   += $len;
        }
    }
    # convert whatever is left over after the last file
    convert_to_pdf($current_string) if length $current_string;

Again, this is just untested pseudo-code to illustrate the idea, not an actual solution. There are a few edge cases to consider. For example, this is likely to misbehave if a single HTML file is larger than $size_limit. I understand from what you said that this should not happen, but if it does, you might have to accept that the resulting PDF file will be larger than your limit, or maybe raise an exception, whatever is best suited to your actual situation.
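One possible way to handle that edge case is sketched below, on strings that have already been read. The function name batch_by_size is hypothetical, and the choice to give an oversized document a batch of its own (rather than dying) is only one of the options mentioned above.

```perl
use strict;
use warnings;

# Hypothetical helper: group document strings into batches whose total
# length stays within $limit. A single document longer than $limit ends
# up alone in a batch that exceeds the limit.
sub batch_by_size {
    my ($limit, @docs) = @_;
    my @batches;
    my $current = '';
    for my $doc (@docs) {
        # close the current batch only if it is non-empty, so an oversized
        # document never causes an empty batch to be emitted
        if (length($current) && length($current) + length($doc) > $limit) {
            push @batches, $current;
            $current = '';
        }
        $current .= $doc;
    }
    push @batches, $current if length $current;
    return @batches;
}

my @batches = batch_by_size(10, 'aaaa', 'bbbb', 'cccc', 'd' x 12, 'ee');
print scalar(@batches), " batches\n";
```

Each element of @batches would then go to your PDF conversion routine. If an oversized file should be an error instead, replace the length($current) guard with a die when length($doc) > $limit.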

Je suis Charlie.

In reply to Re: Split very big string in half by Laurent_R
in thread Split very big string in half by fpscolin
