Re: Escape newlines in POD / (Selectively) don't generate space characters instead
by haukex (Archbishop) on Nov 08, 2020 at 10:01 UTC
> Doing so, I noticed that perlpod seems to generally replace newlines with space characters (sometimes even multiple ones).
At least for Pod::Simple, this appears to happen deep within the parser, specifically, here. The preserve_whitespace probably wouldn't help here because it would just preserve the newlines. Unfortunately, it looks to me like the "best" choice is to not break L<>s across multiple lines.
> but two(!) space characters between "continues." and "This". I don't want this to happen.
Looking at the output of pod2man, this appears to be an effect of *roff typesetting; see e.g. "Sentences" and ".ss" in the groff manual. An argument could be made that Pod::Man should be merging POD paragraphs into one line to avoid the issue you're seeing.
Oh, those are some excellent points. I probably should have looked at more code than just Pod::Simple::HTML.
> At least for Pod::Simple, this appears to happen deep within the parser, specifically, here.
Right, and to be specific, both Pod::Man and Pod::Simple::HTML use it, naturally.
> The preserve_whitespace probably wouldn't help here because it would just preserve the newlines.
Yes, it does pretty much exactly the opposite of what I'm looking for, which is to selectively squash whitespace completely. But, oh, look at what does use preserve_whitespace! Just like Pod::Text, really. Even a quick scan of the code reveals some inconsistencies between code and comments in Pod::Text and Pod::Man, although for a completely different parsing aspect. To recap, Pod::Man seems to handle such line-broken L<> tags correctly, but that's hardly surprising since man doesn't have any (real) notion of (hyper-) links, as far as I know.
To elaborate a bit: I don't even really plan on generating HTML documentation (at least not personally), but the HTML output was a prime candidate for testing the internal and external links within my documentation. podchecker does basic sanity checks on links, but nothing beats actually seeing your links in fully generated documentation. I guess I could also have used pod2texi, but I try to avoid GNU Info because I can never remember how to use it correctly …
> Unfortunately, it looks to me like the "best" choice is to not break L<>s across multiple lines.
Unfortunately it looks like that. However, I think that this could be classified as a bug or at least a somewhat valid intent to add a new feature for escaping capabilities to Pod::Simple.
> Looking at the output of pod2man, this appears to be an effect of *roff typesetting, see e.g. "Sentences" and ".ss".
Exactly, and I had been misinterpreting this as the same "space instead of newline" behavior, when in reality it's a special property of the roff language family. From the groff manual, I've learned that the special rule only applies if the punctuation character is located at exactly the end of an input line. Interestingly, I also learned that the groff style guide recommends starting (and thus also ending) each sentence on its own line. Since Pod::Man (mostly) doesn't change whitespace, this explains why I noticed the behavior at some times and not at others.
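A rough way to find the line ends this rule would hit is to scan the POD source for lines that end in sentence-ending punctuation while the paragraph continues. This is only a sketch: sentence_end_lines is a hypothetical helper, not part of any POD module, and it assumes the groff rule as described above (a ".", "?" or "!" at the very end of an input line, optionally followed by closing characters such as " ' ) ] *). It knows nothing about how Pod::Man escapes text, so treat its output as a hint.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of any POD module): return the 1-based numbers
# of lines in a POD ordinary paragraph that end in sentence-ending punctuation
# while the paragraph continues on the next line. Per the groff rule discussed
# above, *roff adds extra (sentence) space at exactly these line ends.
sub sentence_end_lines {
    my ($paragraph) = @_;
    my @lines = split /\n/, $paragraph;
    my @hits;
    # The last line ends the paragraph anyway, so it is not checked.
    for my $i (0 .. $#lines - 1) {
        # . ? or ! at the very end of the line, optionally followed by
        # closing characters that groff treats as transparent: " ' ) ] *
        push @hits, $i + 1 if $lines[$i] =~ /[.?!]["')\]*]*$/;
    }
    return @hits;
}

my $para = "This is a sentence. It's ending here, but the paragraph continues.\n"
         . "This is another sentence.";
print "line $_ ends with a sentence\n" for sentence_end_lines($para);
```

Each reported line is a spot where either joining the next line onto it, or deliberately keeping one sentence per line, makes the man output predictable.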
> An argument could be made that Pod::Man should be merging POD paragraphs into one line to avoid the issue you're seeing.
Yep, that would help. Either that, or split sentences at punctuation marks so that each sentence starts and ends on its own line, following groff's recommendation (in which case every sentence would get a double space by default, which is fine, since at least it would be consistent), although that would probably get unwieldy quickly. It would require a parser sophisticated enough to detect sentences (e.g., one that ignores mid-sentence punctuation characters and punctuation next to the special characters listed in the groff manual), and that, frankly, is probably too much to ask for. Instead, I could format my POD directly in the groff-recommended way, which generally shouldn't have any negative impact on other POD renderers.
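For simple prose, putting each sentence on its own line can be approximated mechanically before committing the POD. This is a deliberately naive sketch, not a real sentence detector: one_sentence_per_line is a hypothetical helper, and it assumes a sentence ends at ".", "?" or "!" (plus optional closing characters) when a space and an uppercase letter follow. It will wrongly split after abbreviations like "e.g. Foo" and does not parse formatting codes such as L<>, so the result needs a manual review.

```perl
use strict;
use warnings;

# Naive sketch (hypothetical helper): reflow one POD ordinary paragraph so
# that every sentence starts on its own line, as groff recommends.
# Assumption: a sentence ends at . ? or ! (optionally followed by " ' ) ] *)
# when a space and an uppercase letter come next. Abbreviations such as
# "e.g. Foo" will be split wrongly, and formatting codes are not parsed.
sub one_sentence_per_line {
    my ($paragraph) = @_;
    # Collapse line breaks and runs of whitespace into single spaces; POD
    # renderers treat them that way inside ordinary paragraphs anyway.
    (my $flat = $paragraph) =~ s/\s+/ /g;
    $flat =~ s/^ //;
    $flat =~ s/ $//;
    # Replace the space after each sentence end with a line break.
    $flat =~ s/([.?!]["')\]*]*) (?=[A-Z])/$1\n/g;
    return $flat;
}

print one_sentence_per_line(
    "This is a sentence. It's ending here,\nbut the paragraph continues.\n"
  . "This is another sentence."), "\n";
```

With this layout every sentence end sits at the end of an input line, so man output should get the same sentence spacing everywhere instead of only where a line break happened to fall.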
OT: what's the proper way to quote on PerlMonks? Just copying the text naturally loses all formatting, so I had to re-add the most important bits by hand. Fetching the data out of the browser would be possible, but then it's already processed and no longer the bare markup.
OT2: [apc://] seems to be broken since the repository was renamed to "perl5". Given that, it's easier to link to GitHub, which is also one of the documented source code locations.
> However, I think that this could be classified as a bug or at least a somewhat valid intent to add a new feature for escaping capabilities to Pod::Simple.
Pod::Simple is pretty complex (see e.g. Pod::Simple::BlackBox), so I'm not sure changes like that would be easy to implement and test, especially given the huge amount of POD out in the wild. An argument could also be made that perlpodspec is simply missing a clearer statement along the lines of "don't break the link part of L<> across lines"...
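If the rule of thumb is "don't break the link part of L<> across lines", a quick check for it can be scripted. The following is a hypothetical lint (broken_link_targets is an illustrative name, not an existing tool) that only understands the simple L<text|target> form: it does not parse L<< ... >>, nested formatting codes, or links without a "|", so it can miss cases that podchecker and the renderers would see.

```perl
use strict;
use warnings;

# Hypothetical lint: return the targets of L<text|target> codes whose target
# (the part after "|") contains a line break, which renderers turn into a
# space. Simplification: only the single-angle form without nested
# formatting codes is matched; L<< ... >> and L<...C<...>...> are skipped.
sub broken_link_targets {
    my ($pod) = @_;
    my @found;
    # [^<>|]* = link text (may span lines), [^<>]* = target.
    while ($pod =~ /L<([^<>|]*)\|([^<>]*)>/g) {
        my $target = $2;
        push @found, $target if $target =~ /\n/;
    }
    return @found;
}

my $pod = "Refer to the L<foobar section in the\n"
        . "Module::Really::Really::Really::Very::Very::Long documentation|\n"
        . "Module::Really::Really::Really::Very::Very::Long/foobar>.\n";
for my $t (broken_link_targets($pod)) {
    (my $shown = $t) =~ s/\n/\\n/g;    # make the line break visible
    print "line break in link target: $shown\n";
}
```

A line break in the link text is left alone, since it only becomes a space in the displayed text.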
> That would require a parser that is sophisticated enough to detect sentences ... and that, frankly, is probably too much to ask for. Instead, I could format my POD directly in a groff-recommended way, which generally shouldn't have any negative impact on other POD renderers.
Yes, I agree, and if you don't mind formatting your POD in a *roff-friendly way (POD is a pretty old format), then that's probably the best solution. (Update: Personally, I wouldn't put in the effort, because I don't use man to read Perl docs; I always use perldoc, which doesn't add the double space after sentences. But if you don't mind doing it, the nice thing is that your docs will look consistent whether they are read via man, perldoc, or HTML.)
> OT: what's the proper way to quote on PerlMonks? Just copying the text loses all formatting
I often re-add formatting by hand, but you could also open the XML version via the link at the top of the page, under the title, to get the original markup.
> [apc://] seems to be broken
Yeah, that one seems to be outdated. A hyperlink to GitHub is probably best nowadays; hopefully that repository won't move any time soon (the renaming of the repo on perl5.git.perl.org unfortunately broke a bunch of my links that I need to go back and fix sometime...).
> since man doesn't have any (real) notion of (hyper-) links, as far as I know.
It's been almost two decades, but I remember clicking on links in perlpod manpages ... I think using bin/info ...
It didn't take me long to switch to a web browser for clicking through docs.
Re: Escape newlines in POD / (Selectively) don't generate space characters instead
by davies (Monsignor) on Nov 08, 2020 at 11:42 UTC
Re: Escape newlines in POD / (Selectively) don't generate space characters instead
by Ionic (Acolyte) on Nov 09, 2020 at 04:27 UTC
> Is there any way to actually escape newlines such that perlpod does not generate space characters instead? I've tried playing around with the usual methods, but so far couldn't find any.
And then came a bit of an epiphany.
One possibility that I had overlooked is using Z<> to escape newlines. That's actually bending the rules, since perlpodspec states (emphasis mine):
> This code is unusual in that it should have no content. That is, a processor may complain if it sees Z<potatoes>. Whether or not it complains, the potatoes text should ignored [sic].
And indeed, it does complain. At least podchecker and pod2man do. pod2html, on the other hand, does not. But even though the first two mention syntax errors, the document still seems to be parsed and rendered correctly.
Applying this to my first example:
Refer to the L<foobar section in the
Module::Really::Really::Really::Very::Very::Long documentation|Z<
>Module::Really::Really::Really::Very::Very::Long/foobar>.
Applying it to the second example is a bit more difficult:
This is a sentence. It's ending here, but the paragraph continues. Z<
>This is another sentence.
Without the added space right before the opening "Z<", "continues." and "This" would be concatenated directly.
Just to be clear: I consider this to be a crappy workaround rather than a real solution. A "real" solution would be adding whitespace escape capabilities to perlpodspec and the numerous POD parsers, which I should probably at least kick off via a bug report.
Instead of Z<>, you could try Z<< >>.
Here are some interim results; I'll leave it to you to test this with the various pod* programs you've mentioned.
Here's what I get with Z<> (probably much the same as you're seeing):
$ cat test_pod_z.pod
=pod
Refer to the L<foobar section in the
Module::Really::Really::Really::Very::Very::Long documentation|Z<
>Module::Really::Really::Really::Very::Very::Long/foobar>.
=cut
$ perldoc test_pod_z.pod
...
Refer to the foobar section in the
Module::Really::Really::Really::Very::Very::Long documentation.
POD ERRORS
Hey! The above document had some coding errors, which are explained
below:
Around line 4:
A non-empty Z<>
Now with Z<< >>. The space before >>Module... is important; without it you'll get a different error.
$ cat test_pod_zz.pod
=pod
Refer to the L<foobar section in the
Module::Really::Really::Really::Very::Very::Long documentation|Z<<
>>Module::Really::Really::Really::Very::Very::Long/foobar>.
=cut
$ perldoc test_pod_zz.pod
...
Refer to the foobar section in the
Module::Really::Really::Really::Very::Very::Long documentation.
Just for completeness, and because I had it handy, without the space before >>Module...:
Refer to the foobar section in the
Module::Really::Really::Really::Very::Very::Long documentation
POD ERRORS
Hey! The above document had some coding errors, which are explained
below:
Around line 4:
Unterminated L<Z< ... >> sequence
A non-empty Z<>
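To repeat these comparisons without temporary files or the perldoc pager, the POD can be rendered from a string. This sketch uses Pod::Text, the plain-text formatter that ships with perl and subclasses Pod::Simple; render_text is a hypothetical helper name, and the width of 200 is an arbitrary choice so the long module name is not wrapped. Its output should resemble the perldoc output above but is not guaranteed to be identical, and it is no substitute for also checking pod2man, pod2html and podchecker.

```perl
use strict;
use warnings;
use Pod::Text;

# Hypothetical helper: render a POD string to plain text with Pod::Text and
# return the result. output_string and parse_string_document are inherited
# from Pod::Simple. With default settings, a POD ERRORS section generated by
# Pod::Simple should also appear in the returned text.
sub render_text {
    my ($pod) = @_;
    my $parser = Pod::Text->new(width => 200);   # wide, to avoid wrapping
    my $out = '';
    $parser->output_string(\$out);
    $parser->parse_string_document($pod);
    return $out;
}

my $module = 'Module::Really::Really::Really::Very::Very::Long';
my %escape = (
    'Z<>'    => "Z<\n>",     # the single-angle workaround
    'Z<< >>' => "Z<<\n>>",   # the double-angle variant
);
for my $name (sort keys %escape) {
    my $pod = "=pod\n\nRefer to the L<foobar section in the\n"
            . "$module documentation|$escape{$name}$module/foobar>.\n\n=cut\n";
    print "--- $name ---\n", render_text($pod);
}
```

As far as I can tell, Pod::Text prints only the link text, so this cannot reveal a broken link target; for that, the HTML output still has to be inspected.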
<p>Refer to the <a>foobar section in the Module::Really::Really::Really::Very::Very::Long documentation</a></p>
While pod2html didn't report any error, the link is clearly broken (the <a> element has no target), and, less obviously, any data after the link is missing, too. That is, the same HTML output is generated by this input:
=pod
Refer to the L<foobar section in the
Module::Really::Really::Really::Very::Very::Long documentation|Z<<
>>Module::Really::Really::Really::Very::Very::Long/foobar>. asdf
fff
=cut
Losing data is bad. :/
Re: Escape newlines in POD / (Selectively) don't generate space characters instead
by tobyink (Canon) on Nov 09, 2020 at 17:55 UTC
Yeah, I noticed the inconsistent spacing after full stops too. I only noticed it a couple of weeks ago.