Hi monks,

What follows is my very first adventure in the Perl world, so please be merciful.

I wrote the following code to modify a bunch of pandoc-generated, DokuWiki-formatted files in order to convert some expressions to DokuWiki internal links. It went through many iterations and I am now fairly satisfied with the result, except for one mystery I am unable to solve.
It does not crash my code, nor does it emit any warning, but I can't explain the result.

Here's my problem:

In this part of the substitution, .(defined($4) ? $4 =~ tr[\/*][]dr : '.') on line 44, I check whether $4 is defined and, if it is, apply a tr to it to remove italic (//) or bold (**) marks if they are present. If $4 is undefined (meaning there is no dot, comma or semicolon after the last word of the expression), the conditional operator supplies a dot to end the substitution.
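To make that fallback concrete, here is a minimal sketch of just the last term of the replacement, isolated in a helper. The name closing_punctuation is hypothetical and not part of the script, and the sketch assumes Perl 5.14 or later for the /r flag of tr.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the last term of the replacement:
# return the captured punctuation with any / or * removed,
# or a plain dot when nothing was captured (undef).
sub closing_punctuation {
    my ($punct) = @_;
    # tr with /d deletes the listed characters, /r returns the result
    # instead of modifying $punct in place.
    return defined $punct ? $punct =~ tr[/*][]dr : '.';
}

print closing_punctuation(';'),   "\n";   # captured punctuation is kept
print closing_punctuation(undef), "\n";   # fallback dot
```

As the pattern is written, group 4 can only ever hold one of . , ; so the tr appears to have nothing to strip there; it is kept in the sketch only to mirror the original expression.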

So either this data entry:

**Voir :** proton, solution hydrogénée//.//
or that one:
**Voir :** proton, solution hydrogénée.
gives this result:
**Voir :** [[glossaire:entrees:p:proton|proton]], [[glossaire:entrees:s:solution_hydrogenee|solution hydrogénée]].
which is exactly what is needed.
But if there is no final dot or comma after solution hydrogénée:
**Voir :** proton, solution hydrogénée
what I get is:
**Voir :** [[glossaire:entrees:p:proton|proton]], [[glossaire:entrees:s:solution_hydrogenee
|solution hydrogénée
]].
with line breaks after solution_hydrogenee and hydrogénée.
That is acceptable, but I was expecting:
**Voir :** [[glossaire:entrees:p:proton|proton]], [[glossaire:entrees:s:solution_hydrogenee|solution hydrogénée]].
with no line breaks.
I don't understand why or how those line breaks get inserted.
Those breaks don't really matter in practice: DokuWiki syntax is a kind of lightweight markup, so the HTML output is the same with or without them. But they worry me as a sign that I don't fully understand my own code.

I suspect it may have something to do with the /x modifier.
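One way to narrow this down is to look at what the groups actually capture. The sketch below is only a diagnostic, not a fix: it reuses the pattern from the script unchanged, drops the replacement, and reports $2 and $4 with any newline shown as \n. The helper name captured_expressions is made up for illustration.

```perl
use strict;
use warnings;
use utf8;

# Run the script's pattern over one line and collect, for every match,
# the expression group ($2) and the punctuation group ($4).
sub captured_expressions {
    my ($line) = @_;
    my @found;
    while ($line =~ /
        (?:^\*\*Voir\s:\*\*
        |
        \G(?!^)
        (?!\[))
        \K
        (\s?)
        ((\w[\/*]*)
        (?:[^\.,;\n\r]\s?)+)
        [\/*]*([\.,;])?[\/*]*
    /gmx) {
        # Copy the captures before running another regex on them.
        my ($expr, $punct) = ($2, $4);
        $expr =~ s/\n/\\n/g;    # make a captured newline visible
        push @found, [ $expr, $punct ];
    }
    return @found;
}

binmode STDOUT, ':encoding(UTF-8)';
for my $line ("**Voir :** proton, solution hydrogénée.\n",
              "**Voir :** proton, solution hydrogénée\n") {
    for my $pair (captured_expressions($line)) {
        my ($expr, $punct) = @$pair;
        printf "\$2 = <%s>  \$4 = %s\n",
            $expr, defined $punct ? "<$punct>" : 'undef';
    }
}
```

If the pattern behaves as it reads, the last item of the line without final punctuation should come back with a trailing \n in $2, because the \s? inside the repeated group is allowed to match the end-of-line character when no dot, comma or semicolon stops it first. That would point at \s? rather than at /x.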

The actual code follows:


1 #!/usr/bin/env perl
2
3 use 5.36.1;
4 use warnings;
5 use strict;
6 use utf8;
7 use autodie;
8
9 use warnings qw< FATAL utf8 >;
10 use open qw< :std :utf8 >;
11 use charnames qw< :full >;
12 use feature qw< unicode_strings >;
13
14 binmode(STDIN, ":utf8");
15 binmode(STDOUT, ":utf8");
16 binmode(STDERR, ":utf8");
17
18 use Text::Undiacritic qw(undiacritic);
19
20 $^I = ".bak";
21
22 while (<>){
23
24     my $voir = $_;
25
26     $voir =~ s/
27         (?:^\*\*Voir\s:\*\*
28         |
29         \G(?!^)
30         (?!\[))
31         \K
32         (\s?)
33         ((\w[\/*]*)
34         (?:[^\.,;\n\r]\s?)+)
35         [\/*]*([\.,;])?[\/*]*
36     /
37         "$1\[\[glossaire:entrees:"
38         .lc(undiacritic($3))
39         .":"
40         .lc(undiacritic($2 =~ tr[ \/*][_]dr))
41         ."|"
42         .$2 =~ tr[\/*][]dr
43         ."\]\]"
44         .(defined($4) ? $4 =~ tr[\/*][]dr : '.')
45     /gemx;
46
47     print $voir;
48 }

Some context

  1. The code's job is to go through some 2500+ files, some of them several hundred lines long, find lines beginning with **Voir :** and insert DokuWiki links in them.
  2. I use Text::Undiacritic because those text files contain accented characters that I need to replace with their unaccented versions to build file names.
  3. The regex came first; then I searched for the best tool to apply it to many files, and that's how I ended up using Perl.
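For readers without Text::Undiacritic installed, the link-building step described in point 2 can be sketched with core modules only. This is a stand-in, not the script's method: it swaps undiacritic for Unicode::Normalize's NFD plus removal of combining marks, which may differ from Text::Undiacritic on some characters. The names strip_accents and glossary_link are hypothetical.

```perl
use strict;
use warnings;
use utf8;
use Unicode::Normalize qw(NFD);

# Stand-in for undiacritic(): decompose, then drop combining marks.
sub strip_accents {
    my ($text) = @_;
    my $decomposed = NFD($text);
    $decomposed =~ s/\p{Mn}+//g;
    return $decomposed;
}

# Build a DokuWiki link of the form used in the post:
# [[glossaire:entrees:<first letter>:<page id>|<label>]]
sub glossary_link {
    my ($expr) = @_;
    my $label   = $expr =~ tr[/*][]dr;                   # drop italic and bold marks
    my $page    = lc(strip_accents($label =~ tr[ ][_]r)); # spaces to underscores
    my $initial = substr($page, 0, 1);
    return sprintf '[[glossaire:entrees:%s:%s|%s]]', $initial, $page, $label;
}

binmode STDOUT, ':encoding(UTF-8)';
print glossary_link('solution hydrogénée'), "\n";
```

Keeping the label and the page id as separate variables makes it easier to see that only the page id is lower-cased and stripped of accents, while the visible label keeps its original spelling.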

Thanks for reading through!


In reply to Unexpected line breaks in substitution results by paschacroutt
