
Hi sundialsvc4, thanks for your reply! I will look into that and check which paragraph separators the text uses. Perhaps the paragraph separator in this text is not a blank line, and that is why I got the unexpected output. I am not sure about this yet...

Re^3: Extract Paragraph From Text
by locked_user sundialsvc4 (Abbot) on Sep 09, 2015 at 22:11 UTC

    What I would expect is that text such as this might not contain any “end-of-line” character sequences at all.   Instead, the rendering engine would pour the text into the graphic container, line by line, according to the size of the container and the selected font and font size ... both of which presumably could change.   The only trustworthy “end-of-something” marker would be “end of paragraph,” but what might that be?   You cannot know until you look at the bytes.

    In this situation, I would suggest two specific things:

    1. Get the information directly from the original source file, and read it in binary mode.   (In other words, don’t tell Perl to expect record separators of any sort.   All you want Perl to do is read exactly the bytes that are there, exactly as they are.   And you really need to read the entire file at once ... slurp!)
    2. Before writing the code to do that, look at the original source file with a hex editor, as previously discussed, to see what is actually there and what might reasonably be relied upon.
    Don’t attempt to copy and paste the text into Perl source code:   you have no idea what your text editor might actually do to it.   (And anything it might do would only muddy the waters further.)
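
The two steps above can be sketched roughly as follows. This is only an illustration, not a tested solution for your file: the helper names `slurp_raw`, `hex_preview`, and `count_separators` are made up for this example, and the list of candidate separators (CRLF, lone CR, lone LF, and the UTF-8 bytes of U+2029 PARAGRAPH SEPARATOR) is an assumption about what might turn up. The hex editor remains the authority on what is really in the file.

```perl
use strict;
use warnings;

# Read the whole file as raw bytes: no encoding layer, no newline
# translation, and no record separator (hypothetical helper name).
sub slurp_raw {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    local $/;                      # undef => one read returns the entire file
    my $bytes = <$fh>;
    close $fh;
    return defined $bytes ? $bytes : '';
}

# Show the first $len bytes as two-digit hex values, as a quick
# stand-in for a glance in a hex editor.
sub hex_preview {
    my ($bytes, $len) = @_;
    return join ' ',
        map { sprintf('%02X', ord($_)) } split //, substr($bytes, 0, $len);
}

# Count byte sequences that commonly act as line or paragraph breaks.
# The candidate list is an assumption; extend it with whatever the
# hex editor actually shows.
sub count_separators {
    my ($bytes) = @_;
    my %count;
    $count{'CRLF'}           = () = $bytes =~ /\x0D\x0A/g;
    $count{'CR alone'}       = () = $bytes =~ /\x0D(?!\x0A)/g;
    $count{'LF alone'}       = () = $bytes =~ /(?<!\x0D)\x0A/g;
    $count{'U+2029 (UTF-8)'} = () = $bytes =~ /\xE2\x80\xA9/g;
    return \%count;
}

# Usage: perl inspect.pl yourfile   (the script name is arbitrary)
if (@ARGV) {
    my $bytes  = slurp_raw($ARGV[0]);
    my $counts = count_separators($bytes);
    print "First bytes: ", hex_preview($bytes, 32), "\n";
    printf "%-16s %d\n", $_, $counts->{$_} for sort keys %$counts;
}
```

If one of the counts roughly matches the number of paragraphs you expect, that sequence is a reasonable candidate to split on. If all of them come back zero, the separator is something else entirely, and the hex view is where to look next.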

    Perl is an extremely powerful data-extraction tool that can most certainly do whatever you determine needs to be done.   So, please follow up in this thread and tell us what you’ve found.   We’ll then be happy to help you further.