RFC: Shortening line length in HTML Emails

My team is using TinyMCE in a web-application to create templates for HTML emails. (not my idea)

We've been confronted with strange errors where whitespace was occasionally introduced in the middle of the emails after sending.

This was particularly ugly b/c sometimes HTML tags were broken, like in </sp an>

A closer investigation revealed that, per RFC 5322, lines in emails are not allowed to have more than 1000 characters including the terminating CRLF, i.e. 998 characters of content (only fair), and that TinyMCE sometimes tends to glue HTML code into one "physical" line, especially when

"visual" lines were separated by <br> tags
or when the text was introduced by cut&paste from other applications.
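
To find out whether a given body is affected at all, a quick check along these lines can list the offending physical lines. This is only a minimal sketch: the helper name `overlong_lines` and its default limit of 998 are assumptions made for illustration and are not part of the code further down.

```perl
use strict;
use warnings;

# Hypothetical helper: returns [line_number, length] pairs for every
# physical line longer than $max characters (line ending excluded).
# Note: length() counts characters; encode the text to bytes first
# if the limit has to be checked in octets.
sub overlong_lines {
    my ($text, $max) = @_;
    $max //= 998;                         # assumed limit, excluding CRLF
    my @hits;
    my $n = 0;
    for my $line (split /\r?\n/, $text, -1) {
        $n++;
        push @hits, [$n, length $line] if length($line) > $max;
    }
    return @hits;
}

# One short line followed by one "monster" line
my $sample = "short line\n" . ('x' x 1200) . "<br>more text\n";
printf "line %d has %d characters\n", @$_ for overlong_lines($sample);
```

Running such a check before and after the shortening step would show whether any monster lines survive.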

So I need a pragmatic solution to avoid such "monster" lines after editing an email text.

I came up with the following idea, which should be as safe as possible without starting to parse HTML:

prepend a \n before every <br> tag
if overlong unbroken text-chunks remain, replace the last blank with a \n
return an error to the user if the latter fails

The idea is to change a minimal amount of HTML code in a transparent way.

(I suppose that <pre> tags are not used with monster lines and that neither HTML nor CSS code cares whether a whitespace character is a blank or a line break.)

That's the code I came up with, comments are welcome! :)

use strict;
use warnings;
use Data::Dump qw/pp dd/;


my $body = <<'__HTML__';

<br /><br/><br><break>
asdfghjk rtz ertzuiop rtzuiopu rtzuiop tzuiopu rtghljh
AaaaaaaaaaaaaaABbbbbbbbbbbbbbbB
__HTML__


#pp $body;

my $err = FC012_shorten_lines_mail_body(\$body);

#pp $body;

print $err // '', $body;   # $err is undef when nothing failed



sub FC012_shorten_lines_mail_body {
   my ($body_ref) = @_;


   my $err = undef;
   # callback with closure for error
   my $replace_last_whitespace = sub {
      my ($chunk) = @_;
      # dd "CHUNK: $chunk";
      my $ok = $chunk =~ s/ ([^\s]*)$/\n$1/;

      unless ($ok) {
         my $snip_length = 4;           #  for testing, should be 40
         my $start_chunk = substr ($chunk,0,$snip_length);
         my $end_chunk   = substr ($chunk,-$snip_length,$snip_length);
         $err .= "Failed to shorten chunk >>$start_chunk...$end_chunk<<\n";
      }
      return $chunk;
   };

   # --- prepend all <br>-tags with real linebreak
   $$body_ref =~ s#(<br[ />])#\n$1#g; 

   # --- find all remaining chunks in one line and
   #     replace last whitespace with \n

   my $length = 15;                     # for testing, should be 998

   $$body_ref =~ s/([^\n]{$length})/ $replace_last_whitespace->($1) /ge;

   # --- return potential error message
   return $err ;
}

--->

Failed to shorten chunk >>Aaaa...aaaA<<
Failed to shorten chunk >>Bbbb...bbbb<<


<br />
<br/>
<br><break>
asdfghjk rtz
ertzuiop
rtzuiopu rtzuiop
tzuiopu rtghljh
AaaaaaaaaaaaaaABbbbbbbbbbbbbbbB

Cheers Rolf
_{(addicted to the Perl Programming Language and ☆☆☆☆ :)

Je suis Charlie!}

Re: RFC: Shortening line length in HTML Emails by afoken (Chancellor) on Oct 27, 2016 at 18:44 UTC
Brute force attempt: Encode the HTML as-is in base64. Append that base64 blob to the mail. Indicate in a mail header that the message is base64-encoded. (`Content-Transfer-Encoding: base64` should be sufficient.) Base64 has nice short lines. Any mail client that can handle HTML should be able to handle base64-encoded HTML. You are already sending bloat, so the 33% overhead of base64 won't make it much worse.

Other option: Encode as quoted-printable (`Content-Transfer-Encoding: quoted-printable`), break hard after N characters (typically N=75), and append a "=" at the break (soft break). See also Quoted-printable. I think there are modules on CPAN that en- and decode QP. QP usually does not add as much overhead as Base64, but it requires a little bit more "thinking" than stupidly shifting bits from and to base64.

Alexander
--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
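
Both encodings are available in core Perl modules, so the suggestion can be tried with something like the sketch below. It uses `MIME::Base64` and `MIME::QuotedPrint`; the sample HTML and the variable names are made up for illustration, and the code only produces the encoded body. Building the message and setting the matching `Content-Transfer-Encoding` header is left to whatever mail module is in use.

```perl
use strict;
use warnings;
use Encode            qw(encode);
use List::Util        qw(max);
use MIME::Base64      qw(encode_base64);
use MIME::QuotedPrint qw(encode_qp);

# Made-up sample: a single "monster" line well beyond 998 characters
my $html = '<p>' . join('<br>', ('some pasted text') x 200) . "</p>\n";

# Both encoders work on bytes, so encode the character string first
my $octets = encode('UTF-8', $html);

my $b64 = encode_base64($octets);   # output is broken into lines of at most 76 characters
my $qp  = encode_qp($octets);       # long lines get "=" soft line breaks

for my $pair ([base64 => $b64], ['quoted-printable' => $qp]) {
    my ($name, $encoded) = @$pair;
    my $longest = max map { length $_ } split /\n/, $encoded;
    printf "%-16s longest line: %d characters\n", $name, $longest;
}
```

With quoted-printable the HTML stays mostly readable in the raw message source, which may help when debugging templates; base64 is the simpler choice when readability of the source does not matter.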
Re^2: RFC: Shortening line length in HTML Emails by LanX (Saint) on Oct 27, 2016 at 19:28 UTC
> Encode the HTML as-is in base64.

Uh... That's actually a very good idea. Thanks.

I'll stick with my "solution" for a while, though, because I'd like to have more control over the generated code.

Many colleagues are non-techies and think that if cut and paste doesn't produce reliable results, it's the fault of the programmer. Telling them repeatedly that HTML emails are not meant to work like PDF doesn't help at all, so I have to train the software to annoy them... ;)

Cheers Rolf
Re: RFC: Shortening line length in HTML Emails by Anonymous Monk on Nov 07, 2016 at 22:29 UTC
Here's what I use:

use Text::Format;

my $text = 'some text string of words, possibly having more than 990 characters ...';
my $text_obj = Text::Format->new(
    {
        columns     => 990,
        firstIndent => 0,
        bodyIndent  => 0,
    }
);
$text = $text_obj->format($text);

I've read that Sendmail's default is 990 rather than 1000; I think I ran into the 990 limitation when sending email through my local Postfix installation (which replaces Sendmail), and hence picked 990 as the "columns" parameter to send to Text::Format.

Text::Format splits the string into words, meaning it splits on white space (\s+). A seven-letter word that starts at column 986 after a blank at column 985 wouldn't be printed as 5 characters ending at column 990 with the remaining two characters starting on column 1 of the next line; it would move the word to be the start of the next line. The previous line would end with the word whose last character is in column 984 (assuming for purposes of this example that a single blank space existed between that word and the word that's moved down to start the next line).
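
The word-wise behaviour described above can be tried out on a small scale without installing anything from CPAN. The sketch below swaps in the core module `Text::Wrap` instead of `Text::Format`, so it is an alternative and not the code from this reply; the tiny column width and the sample text are chosen only to make the effect visible.

```perl
use strict;
use warnings;
use Text::Wrap qw(wrap);

# Assumed settings for this sketch:
$Text::Wrap::columns  = 11;          # resulting lines hold at most columns - 1 = 10 characters
$Text::Wrap::huge     = 'overflow';  # never cut a word; an overlong word gets its own line
$Text::Wrap::unexpand = 0;           # do not convert runs of spaces into tabs

my $text    = 'aaaa bbbb cccc dddddddddddddd ee';
my $wrapped = wrap('', '', $text);   # no indent for first or following lines

print "$wrapped\n";
```

A word that does not fit into the remaining space is moved to the start of the next line as a whole. With `'overflow'`, a word longer than the limit is left intact on a line of its own, so for real mail bodies a separate length check would still be needed to catch such cases.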