Ionic has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to write a long link in POD syntax that needs (or at least is supposed) to be broken across multiple lines. Doing so, I noticed that perlpod seems to generally replace newlines with space characters (sometimes even multiple ones).

Example:

Refer to the L<foobar section in the Module::Really::Really::Really::Very::Very::Long documentation|
Module::Really::Really::Really::Very::Very::Long/foobar>.

Now, the problem with that formatting is that, e.g., pod2html will error out because it cannot generate a reference to " Module::Really::Really::Really::Very::Very::Long" (mind the preceding whitespace).

I know that this problem can be worked around (mostly) by reformatting it so that the separator is not followed by a newline, but that's not a good general solution (and could be made moot by using an ever longer module name that exceeds the line limit and hence needs to be split up anyway).
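A minimal sketch of that workaround, assuming the line breaks are moved into the link text (where the resulting spaces are harmless) so that the target follows the separator directly:

```pod
Refer to the L<foobar section in the
Module::Really::Really::Really::Very::Very::Long
documentation|Module::Really::Really::Really::Very::Very::Long/foobar>.
```

The last line is still longer than the module name itself, which is exactly why this does not scale to arbitrarily long names.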

Another instantiation of this issue can be seen here:

This is a sentence. It's ending here, but the paragraph continues.
This is another sentence.

When rendering this with pod2man, it will generate one space character between "sentence." and "It's", but two(!) space characters between "continues." and "This". I don't want this to happen. Some people might argue that using a double space character after a punctuation character is sane and actually the way to go, but the argument is pretty much made moot by the introduced inconsistency - the pod2man generator doesn't handle "normal" textual punctuation characters in the same way, so you end up with a pretty wild mess.

Is there any way to actually escape newlines such that perlpod does not generate space characters instead? I've tried playing around with the usual methods, but so far couldn't find any.

Re: Escape newlines in POD / (Selectively) don't generate space characters instead
by haukex (Archbishop) on Nov 08, 2020 at 10:01 UTC
    Doing so, I noticed that perlpod seems to generally replace newlines with space characters (sometimes even multiple ones).

    At least for Pod::Simple, this appears to happen deep within the parser, specifically, here. The preserve_whitespace probably wouldn't help here because it would just preserve the newlines. Unfortunately, it looks to me like the "best" choice is to not break L<>s across multiple lines.
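
    To see what the parser hands on to the formatters for such a link, the parse can be dumped. This is only a sketch: Some::Module is a hypothetical, shortened stand-in for the long module name, and no particular output is claimed here.

    ```perl
    use strict;
    use warnings;
    use Pod::Simple::DumpAsXML;

    # Hypothetical module name, kept short for readability.
    # The link target starts on the line after the "|" separator.
    my $pod = join "\n",
        '=pod',
        '',
        'Refer to the L<foobar section|',
        'Some::Module/foobar>.',
        '',
        '=cut',
        '';

    my $parser = Pod::Simple::DumpAsXML->new;
    $parser->output_string(\my $xml);    # collect the dump in $xml
    $parser->parse_string_document($pod);

    print $xml;    # inspect the attributes on the <L ...> element
    ```

    Comparing the dump with and without the line break should show whether the whitespace ends up in the link target.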

    but two(!) space characters between continues. and This. I don't want this to happen.

    Looking at the output of pod2man, this appears to be an effect of *roff typesetting, see e.g. Sentences and .ss. An argument could be made that Pod::Man should be merging POD paragraphs into one line to avoid the issue you're seeing.

      Oh, those are some excellent points. I probably should have looked at more code than just Pod::Simple::HTML.

      At least for Pod::Simple, this appears to happen deep within the parser, specifically, here.
      Right, and to specify, both Pod::Man and Pod::Simple::HTML naturally use it.

      The preserve_whitespace probably wouldn't help here because it would just preserve the newlines.
      Yes, it does quite exactly the opposite of what I'm looking for - selectively squash whitespace completely. But, oh, look at what does use preserve_whitespace! Just like Pod::Text, really. But even quickly scanning the code reveals some inconsistencies between code and comments in Pod::Text and Pod::Man - although for a completely different parsing aspect. In recap, Pod::Man seems to handle such line-broken L<> tags correctly, but that's almost not surprising since man doesn't have any (real) notion of (hyper-) links, as far as I know.

      To elaborate a bit: I don't even really plan on generating HTML documentation (at least not personally), but the HTML output was a prime candidate for testing my internal and external links within the documentation. podchecker does basic sanity checks on links, but nothing's better than actually seeing your generated links in fully generated documentation. I guess I could have also used pod2texi, but I try to avoid GNU Info because I can never remember how to use it correctly …

      Unfortunately, it looks to me like the "best" choice is to not break L<>s across multiple lines.
      Unfortunately it looks like that. However, I think that this could be classified as a bug or at least a somewhat valid intent to add a new feature for escaping capabilities to Pod::Simple.

      Looking at the output of pod2man, this appears to be an effect of *roff typesetting, see e.g. Sentences and .ss.
      Exactly - and I have been misinterpreting this as the same "space instead of newline" behavior, when in reality it's a special property of the roff language family. From the groff manual, I've learned that it only applies that special rule if the punctuation character is located exactly at the end of the line. Interestingly, I also learned that the groff style guide recommends starting each sentence (and thus also ending each sentence) on its own line. Since Pod::Man (mostly) doesn't change whitespace, this explains why I noticed that behavior at times and didn't at other times.
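
      The rule as described here can be mimicked with a toy model. This is a simplified sketch, not groff's actual algorithm: fill_lines is a hypothetical helper, and the set of trailing characters it tolerates after the punctuation is an assumption.

      ```perl
      use strict;
      use warnings;

      # Toy model: a line break becomes one space, plus one extra space
      # when the line ends in . ! or ? (optionally followed by closing
      # quotes, brackets or an asterisk).
      sub fill_lines {
          my @lines = @_;
          my $out = '';
          for my $i (0 .. $#lines) {
              $out .= $lines[$i];
              next if $i == $#lines;    # nothing to join after the last line
              $out .= $lines[$i] =~ /[.!?]["')\]*]*$/ ? '  ' : ' ';
          }
          return $out;
      }

      print fill_lines(
          "This is a sentence. It's ending here, but the paragraph continues.",
          "This is another sentence.",
      ), "\n";
      ```

      In this model the period in the middle of the first line is left alone, while the one at the end of the line gets the extra space, which matches the inconsistency seen in the pod2man output.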

      An argument could be made that Pod::Man should be merging POD paragraphs into one line to avoid the issue you're seeing.
      Yep, that would help. Either that, or split up sentences on punctuation marks so that each sentence starts/ends on its own line, to adhere to groff's recommendations (in which case every sentence would get a double space by default - which is fine, since then at least it would be consistent), although that would probably quickly get unwieldy. That would require a parser that is sophisticated enough to detect sentences (and, e.g., ignore mid-sentence punctuation characters or those surrounding the special characters given in the groff manual) and that, frankly, is probably too much to ask for. Instead, I could format my POD directly in a groff-recommended way, which generally shouldn't have any negative impact on other POD renderers.
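
      Formatted that way, the second example would look like the following sketch (one sentence per source line, assuming nothing else in the paragraph changes):

      ```pod
      This is a sentence.
      It's ending here, but the paragraph continues.
      This is another sentence.
      ```

      Every sentence boundary now sits at the end of a line, so the roff rule would apply to all of them alike.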


      OT: what's the proper way to quote on PerlMonks? Just copying the text loses all formatting, naturally, so I had to re-add the most important ones. Yes, fetching the data out of the browser would be possible, but then it's processed and not the bare markup any longer.

      OT2: [apc://] seems to be broken since the repository was renamed to "perl5". Given that, it's easier to link to GitHub, which is also one of the documented source code locations.

        However, I think that this could be classified as a bug or at least a somewhat valid intent to add a new feature for escaping capabilities to Pod::Simple.

        I think Pod::Simple is pretty complex (see e.g. Pod::Simple::BlackBox), so I'm not sure whether changes like that are easy to implement and test, especially given the huge amount of POD out in the wild. An argument could also be made that perlpodspec is simply missing a clearer statement of "don't break the link part of L<> across lines"...

        That would require a parser that is sophisticated enough to detect sentences ... and that, frankly, is probably too much to ask for. Instead, I could format my POD directly in a groff-recommended way, which generally shouldn't have any negative impact on other POD renderers.

        Yes, I agree, and if you don't mind formatting your POD in a *roff-friendly way (POD is a pretty old format) then that's probably the best solution. (Update: Personally, I wouldn't put in the effort to do this, because I don't use man to read Perl docs, I always use perldoc, which doesn't do the double space after sentences. But if you don't mind doing this, then the nice thing about it is your docs will look consistent no matter if read via man, perldoc, or HTML.)

        OT: what's the proper way to quote on PerlMonks? Just copying the text loses all formatting

        I often re-add formatting by hand, but you could also access the XML version via the link at the top of the page under the title to get to the original markup.

        [apc://] seems to be broken

        Yeah, that one seems to be outdated. A hyperlink to GitHub is probably best nowadays, hopefully that repository won't change so soon (the renaming of the repo on perl5.git.perl.org unfortunately broke a bunch of my links that I need to go back and fix sometime...).

        since man doesn't have any (real) notion of (hyper-) links, as far as I know.

        It's been almost two decades, but I remember clicking on links in perlpod manpages ... I think using bin/info ...

        It didn't take me long to switch to a web browser for clicking on docs.

Re: Escape newlines in POD / (Selectively) don't generate space characters instead
by davies (Monsignor) on Nov 08, 2020 at 11:42 UTC

      The only point to consider is that it is not core.
      And that's a pretty huge deal-breaker. :/ For instance, my distro (Gentoo) isn't even shipping this conveniently packaged. If the documentation is not for something like a module that is put on CPAN, but internal modules within some wider piece of software and you're targeting a wide range of operating systems, you generally start to avoid non-core modules. Additionally, it looks like Pod::Parser - which Pod::Xhtml depends on for most of the actual content parsing - used to be part of core... until 5.32.0, when it was removed in favor of Pod::Simple.

      Frankly, I would hope that Pod::Simple showed some progress during the past 10 years. But I know, once bitten …

Re: Escape newlines in POD / (Selectively) don't generate space characters instead
by Ionic (Acolyte) on Nov 09, 2020 at 04:27 UTC

    Is there any way to actually escape newlines such that perlpod does not generate space characters instead? I've tried playing around with the usual methods, but so far couldn't find any.
    And then, there was a bit of epiphany.

    One possibility that I did overlook is using Z<> to escape newlines. That's actually bending the rules, since perlpodspec states (emphasis mine):

    This code is unusual in that it should have no content. That is, a processor may complain if it sees Z<potatoes>. Whether or not it complains, the potatoes text should ignored [sic].
    And indeed, it does complain. At least podchecker and pod2man do. pod2html, on the other hand, does not. But even though the first two mention syntax errors, the document still seems to be parsed and rendered correctly.

    Applying this to my first example:

    Refer to the L<foobar section in the Module::Really::Really::Really::Very::Very::Long documentation|Z<
    >Module::Really::Really::Really::Very::Very::Long/foobar>.

    Applying it to the second example is a bit more difficult:

    This is a sentence. It's ending here, but the paragraph continues. Z<
    >This is another sentence.
    Without the added space right before the starting part of Z<, continues. and This would be concatenated together directly.

    Just to be clear: I consider this to be a crappy workaround rather than a real solution. A "real" solution would be adding whitespace escape capabilities to perlpodspec and the numerous POD parsers, which I should probably at least kick off via a bug report.

      Instead of Z<>, you could try Z<< >>. Here are some interim results; I'll leave you to test this with the various pod* programs you've mentioned.

      Here's what I get with Z<> (probably much the same as you're seeing):

      $ cat test_pod_z.pod
      =pod

      Refer to the L<foobar section in the Module::Really::Really::Really::Very::Very::Long documentation|Z< >Module::Really::Really::Really::Very::Very::Long/foobar>.

      =cut
      $ perldoc test_pod_z.pod
      ...
      Refer to the foobar section in the Module::Really::Really::Really::Very::Very::Long documentation.

      POD ERRORS
          Hey! The above document had some coding errors, which are explained below:

          Around line 4:
              A non-empty Z<>

      Now with Z<< >>. The space before >>Module... is important; without it you'll get a different error.

      $ cat test_pod_zz.pod
      =pod

      Refer to the L<foobar section in the Module::Really::Really::Really::Very::Very::Long documentation|Z<< >>Module::Really::Really::Really::Very::Very::Long/foobar>.

      =cut
      $ perldoc test_pod_zz.pod
      ...
      Refer to the foobar section in the Module::Really::Really::Really::Very::Very::Long documentation.

      Just for completeness, and because I had it handy, without the space before >>Module...:

      Refer to the foobar section in the Module::Really::Really::Really::Very::Very::Long documentation

      POD ERRORS
          Hey! The above document had some coding errors, which are explained below:

          Around line 4:
              Unterminated L<Z< ... >> sequence

              A non-empty Z<>

      — Ken

        Instead of Z<>, you could try Z<< >>.
        Of course, a very interesting interpretation of "mandatory whitespace" when using repeated angle brackets. :)
        It does seem to let Pod::Man ignore such errors, because whitespace is ignored, but sadly doesn't seem to be a general solution either.

        Example, generated by Pod::Simple::HTML (I have slightly amended the text to include a missing "the"):

        <p>Refer to the <a>foobar section in the Module::Really::Really::Really::Very::Very::Long documentation</a></p>
        While pod2html didn't throw any error, the link is clearly broken (it lacks a target), and, what isn't immediately apparent, any data after the link is missing, too. I.e., the same HTML output is generated by this input:

        =pod

        Refer to the L<foobar section in the Module::Really::Really::Really::Very::Very::Long documentation|Z<< >>Module::Really::Really::Really::Very::Very::Long/foobar>. asdf fff

        =cut

        Losing data is bad. :/

Re: Escape newlines in POD / (Selectively) don't generate space characters instead
by tobyink (Canon) on Nov 09, 2020 at 17:55 UTC

    Yeah, I noticed the inconsistent spacing after full stops too. I only noticed it a couple of weeks ago.