in reply to Split very big string in half

On the problem as stated, I would tend to prefer substr over a regex, but of course you still have to look for an HTML boundary.
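To illustrate what I mean, here is a minimal, hedged sketch of that approach: cut near the midpoint with substr, after backing up with rindex to the nearest '>' so the cut does not land inside an HTML tag. The helper name split_near_half is mine, and '>' is only a crude boundary heuristic; a real solution would look for a safer split point (e.g. between block-level elements).

```perl
use strict;
use warnings;

# Split $string roughly in half, backing up to the nearest '>' so we
# don't cut through the middle of an HTML tag. Crude, but shows the idea.
sub split_near_half {
    my ($string) = @_;
    my $mid = int(length($string) / 2);
    my $cut = rindex($string, '>', $mid);   # last '>' at or before midpoint
    $cut = $mid if $cut < 0;                # no boundary found: split at midpoint
    return (substr($string, 0, $cut + 1), substr($string, $cut + 1));
}

my ($first, $second) = split_near_half("<p>one</p><p>two</p><p>three</p>");
```

Note that this can still split between an opening tag and its closing tag; whether that matters depends on how tolerant the PDF converter is.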

However, since you are reading many small HTML files into your string, it should be possible to read the files in a loop, interrupt the process to run the PDF conversion when you reach the size limit, and then resume reading the remaining files. Something like this (untested pseudo-code, not an actual solution; we don't have enough information for a real one):

    use strict;
    use warnings;
    use File::Slurp;

    # (...)

    my $current_string = "";
    my $current_size   = 0;
    for my $file (@html_files) {
        my $new_file_string = read_file($file);   # File::Slurp function
        my $len = length $new_file_string;
        if ($current_size + $len > $size_limit) {
            convert_to_pdf($current_string);      # flush what we have so far
            $current_string = $new_file_string;   # start a new buffer
            $current_size   = $len;
        }
        else {
            $current_string .= $new_file_string;
            $current_size   += $len;
        }
    }
    convert_to_pdf($current_string) if $current_string;
Again, this is just untested pseudo-code to illustrate the idea, not an actual solution. There are a few edge cases to consider: for example, it is likely to fail if a single HTML file is larger than $size_limit (I understand from what you said that this should not happen, but in such a case you might have to accept that the resulting PDF file will be larger than your limit, or perhaps raise an exception, whichever best suits your actual situation).
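That oversized-file edge case could be handled with a small guard before the buffering logic, along these lines. The helper name handle_oversized and the $strict flag are my inventions, and convert_to_pdf is the same placeholder as in the pseudo-code above (stubbed here so the sketch is self-contained):

```perl
use strict;
use warnings;

# Placeholder standing in for the real converter from the sketch above.
sub convert_to_pdf { print "converting ", length($_[0]), " bytes\n" }

# If a single file already exceeds $size_limit, either die (strict mode)
# or flush the current buffer and convert the oversized file on its own,
# accepting that the resulting PDF will be larger than the limit.
sub handle_oversized {
    my ($current_ref, $new_string, $size_limit, $strict) = @_;
    my $len = length $new_string;
    return 0 if $len <= $size_limit;    # not oversized: caller proceeds normally
    die "single file of $len bytes exceeds limit $size_limit\n" if $strict;
    convert_to_pdf($$current_ref) if length $$current_ref;
    convert_to_pdf($new_string);
    $$current_ref = "";                 # buffer was flushed
    return 1;                           # oversized case handled
}
```

In the loop above, you would call this first for each file and skip the normal buffering when it returns true.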

Je suis Charlie.