If you can, I would suggest going back to the HTML version of this text and using its clues to help you format the text-only version. At a minimum, you will want paragraph information, as I expect you do not wish to lump the entire body of text into a single, unbroken paragraph. If that is not possible, you may wish to insert some paragraph markers manually. Paragraphs could be "created" based on a set number of sentences per paragraph, or a maximum number of lines per paragraph, but this would produce awkward paragraph breaks, since a computer and a human will not read the text the same way.
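To illustrate the fixed-sentence-count idea, here is a minimal sketch in Python (chosen only for illustration; the same logic ports readily to Perl). The function name `split_into_paragraphs` and the default of four sentences are my own assumptions, and the sentence detection is deliberately naive: it will split wrongly after abbreviations such as "Dr." or "e.g.", which is exactly the kind of awkward break mentioned above.

```python
import re

def split_into_paragraphs(text, sentences_per_paragraph=4):
    """Group sentences into paragraphs of a fixed size (hypothetical helper)."""
    # Naive sentence split: break on whitespace that follows '.', '!' or '?'.
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    sentences = [s for s in sentences if s]

    paragraphs = []
    for i in range(0, len(sentences), sentences_per_paragraph):
        # Rejoin each group of sentences with single spaces.
        paragraphs.append(" ".join(sentences[i:i + sentences_per_paragraph]))

    # A blank line between paragraphs is the usual plain-text marker.
    return "\n\n".join(paragraphs)
```

Calling `split_into_paragraphs(body, 5)` on the full text would return the same words with a blank line after every fifth sentence.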
My next suggestion would be to use some LaTeX to format the text. It could be output to PDF. If you start with the HTML version, there may even be a tool (search for "HTML to LaTeX" online) that could do this without needing to reinvent wheels in Perl. But even from the text alone, LaTeX could do wonders with formatting, and it can be set for virtually any paper size, with custom margins, font sizes, etc.
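As a rough sketch of the text-only route, the Python below wraps plain text in a small LaTeX document. The function names `escape_latex` and `text_to_latex` and the default values are my own assumptions; the `geometry` package and its `a4paper` and `margin` options are standard LaTeX. Blank lines in the input become paragraph breaks in LaTeX automatically, so the text needs paragraph markers before this step.

```python
import re

# Characters that LaTeX treats specially, with their escaped forms.
_LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#", "_": r"\_",
    "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}

def escape_latex(text):
    """Escape LaTeX special characters in a single pass."""
    return re.sub(r'[\\&%$#_{}~^]',
                  lambda m: _LATEX_SPECIALS[m.group(0)], text)

def text_to_latex(text, paper="a4paper", margin="2.5cm", font_size="11pt"):
    """Wrap plain text in a minimal LaTeX article (hypothetical helper)."""
    return "\n".join([
        r"\documentclass[%s]{article}" % font_size,
        # geometry sets the paper size and margins.
        r"\usepackage[%s,margin=%s]{geometry}" % (paper, margin),
        r"\begin{document}",
        escape_latex(text),
        r"\end{document}",
        "",
    ])
```

Writing the result to a file such as `out.tex` and running `pdflatex out.tex` should then produce a PDF, assuming a LaTeX distribution is installed. Note that the standard `article` class only accepts 10pt, 11pt, and 12pt as font size options.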