in reply to Output Format

If you are able to do so, I would suggest reverting to your HTML version of this text and using its markup as clues for formatting the text-only version. At a minimum, you will want the paragraph information, since I expect you do not wish to lump the entire body of text into a single, unbroken paragraph. If that is not possible, you may wish to insert some paragraph markers manually. Paragraphs could be "created" from a set number of sentences per paragraph, or a maximum number of lines per paragraph, but this would produce awkward paragraph breaks, since a computer and a human will not read the text the same way.
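
A minimal Perl sketch of the sentence-count approach, for illustration only. `paragraphize` is a hypothetical helper, not from any module, and its sentence detection is a naive split after `.`, `!` or `?`. Abbreviations such as "e.g." will therefore produce exactly the kind of awkward breaks described above.

```perl
use strict;
use warnings;

# Hypothetical helper: group every $per_para sentences into one paragraph.
# Sentence detection is naive: split on whitespace that follows . ! or ?,
# so "e.g. this" or "Dr. Smith" will be treated as sentence ends.
sub paragraphize {
    my ($text, $per_para) = @_;
    my @sentences = split /(?<=[.!?])\s+/, $text;
    my @paras;
    # splice removes up to $per_para sentences from the front each pass.
    while (my @chunk = splice(@sentences, 0, $per_para)) {
        push @paras, join(' ', @chunk);
    }
    # A blank line between paragraphs marks the break in plain text.
    return join("\n\n", @paras);
}

my $blob = "First sentence. Second one! Third? Fourth. Fifth.";
print paragraphize($blob, 2), "\n";
```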

My next suggestion would be to use LaTeX to format the text, which can then be output to PDF. If you start with the HTML version, there may even be a tool (search online for "HTML to LaTeX") that could do this without needing to reinvent the wheel in Perl. But even from the plain text alone, LaTeX can do wonders with formatting, and it can be set up for virtually any paper size, with custom margins, font sizes, etc.
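
As a rough sketch of the plain-text route, assuming a standard TeX installation that includes the geometry package: the hypothetical `to_latex` sub below escapes LaTeX's special characters and wraps the text in a minimal `article` document. Blank lines in the input already act as paragraph breaks in LaTeX. The `paper`, `margin` and `fontsize` options are assumed names chosen for this example.

```perl
use strict;
use warnings;

# LaTeX special characters and their escaped forms.
my %esc = (
    '\\' => '\textbackslash{}',
    '&'  => '\&',  '%' => '\%',  '$' => '\$',  '#' => '\#',
    '_'  => '\_',  '{' => '\{',  '}' => '\}',
    '~'  => '\textasciitilde{}',
    '^'  => '\textasciicircum{}',
);

# Hypothetical helper: wrap plain text in a minimal LaTeX document.
# Option names (paper, margin, fontsize) are assumptions for this sketch.
sub to_latex {
    my ($text, %opt) = @_;
    my $paper  = $opt{paper}    // 'a4paper';
    my $margin = $opt{margin}   // '2cm';
    my $size   = $opt{fontsize} // '11pt';

    # Escape in a single pass so inserted backslashes are not re-escaped.
    $text =~ s/([\\&%\$#_{}~^])/$esc{$1}/g;

    return <<"END";
\\documentclass[$size]{article}
\\usepackage[$paper,margin=$margin]{geometry}
\\begin{document}
$text
\\end{document}
END
}

print to_latex("Hello & goodbye, 100% done.",
               paper => 'letterpaper', margin => '1in');
```

Save the output as, say, `out.tex` and run `pdflatex out.tex` to get the PDF. The standard `article` class only accepts 10pt, 11pt or 12pt as a font size option.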

Blessings,

~Polyglot~

Re^2: Output Format
by Fletch (Bishop) on Dec 12, 2022 at 14:43 UTC

    Seconding this; throwing away all the markup and then expecting to reproduce the layout (which would normally be derived from said markup) is, to put it bluntly, inane. All you could hope for from the undifferentiated, one-long-line mess is to run it through something like Text::Wrap and hope. If you still had the HTML, you could try something like pandoc and see what it comes up with.
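
For the one-long-line case, a short example with the core Text::Wrap module. `wrap` and `$columns` are its real exports, while the 40-column width and the sample text are arbitrary choices for illustration. It only inserts line breaks at whitespace and cannot restore the paragraph structure that was lost along with the markup.

```perl
use strict;
use warnings;
use Text::Wrap qw(wrap $columns);

# Output lines are kept to at most $columns - 1 characters.
$columns = 40;

# Stand-in for the single unbroken line left after stripping the HTML.
my $one_long_line = 'word ' x 30;

# wrap(INITIAL_INDENT, SUBSEQUENT_INDENT, TEXT): breaks only at whitespace.
print wrap('', '', $one_long_line), "\n";
```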

      Thank you for your suggestions!

Re^2: Output Format
by perlmike (Novice) on Dec 12, 2022 at 22:36 UTC

    You are right. I will go back to the HTML files and go from there. Thank you for the input!