This expression has been very useful, but I am new to Perl and regex and I would like to fully understand how it works. Would it be possible to ask if it can be broken down step by step. I am afraid I just find the capture groups and newline characters somewhat confusing.

Re^3: Removing empty line(s) with regex in a multiple strings in a variable
by GrandFather (Saint) on Feb 10, 2019 at 20:53 UTC

    A good place to start is with the perlre documentation. It is fairly extensive, but is likely to be much more useful to you in the long run than me decomposing the expression I gave above. Note that there is a very useful "See Also" section at the bottom of the documentation - in fact you might want to check that out first. There is a trick that may help you break the regex down into easier to understand parts though:

    $str =~ s/ (^|\n) [\n\s]* /$1/gx;

    The x switch makes Perl ignore literal white space (including newlines) in the pattern, except inside a bracketed character class such as [\n\s], so we can use white space to break the regex up into units.
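
    Because /x also allows # comments inside the pattern, the same idea can be taken a step further by spreading the regex over several lines. The following is only a sketch: the sample text in $str and the variable names $compact and $spaced are made up for illustration, and the check at the end simply confirms that both spellings of the regex behave the same on that sample.

```perl
use strict;
use warnings;

# Hypothetical sample: a leading blank line, two blank lines in a row,
# and a line that contains only spaces.
my $str = "\nfirst\n\n\nsecond\n   \nthird\n";

# The compact form, applied to a copy of the sample.
(my $compact = $str) =~ s/(^|\n)[\n\s]*/$1/g;

# The same regex under /x, one unit per line, with comments.
(my $spaced = $str) =~ s/
    (^|\n)      # group 1: the start of the string, or one newline
    [\n\s]*     # any run of newlines or other white space after it
/$1/gx;

print $compact eq $spaced ? "same result\n" : "different result\n";
```

    Keep the delimiter character (here /) and the sigils $ and @ out of such comments, since the pattern is still scanned for its closing delimiter and interpolated before the comments are discarded.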

    Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond
Re^3: Removing empty line(s) with regex in a multiple strings in a variable
by haukex (Archbishop) on Feb 10, 2019 at 22:21 UTC
    $str =~ s/(^|\n)[\n\s]*/$1/g;
    Would it be possible to ask if it can be broken down step by step.

    Here's one way to look at this regex: The replacement part is $1, so that means that whatever is matched by the first capture group (^|\n) is kept, while everything else ([\n\s]*) is removed. (^|\n) matches either: (1) ^ means to match at the beginning of the string, so any newlines or whitespace [\n\s]* at the beginning of the string are removed, or (2) \n, which means that any newlines or whitespace [\n\s]* after that newline character are removed, but the first newline character is kept. This is how empty lines, which are usually just a sequence of two newline characters \n\n, are changed into a single newline character, meaning the empty line(s) are removed.
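
    To watch this happen, one option is to walk the matches one at a time and print what would be kept and what would be removed. This is just a sketch under assumptions: the sample string and the names @steps, $kept, $removed and $clean are invented for the demonstration, and a second pair of parentheses has been added around [\n\s]* purely so the removed part can be printed.

```perl
use strict;
use warnings;

# Hypothetical sample: two leading newlines, one empty line,
# and one indented line.
my $str = "\n\nalpha\n\nbeta\n  gamma\n";

my @steps;
# In scalar context, /g resumes after the previous match on each pass.
while ($str =~ /(^|\n)([\n\s]*)/g) {
    my ($kept, $removed) = ($1, $2);
    s/\n/\\n/g for $kept, $removed;    # show newlines as a visible \n
    push @steps, "kept '$kept', removed '$removed'";
}
print "$_\n" for @steps;

# The actual substitution, applied to a copy of the sample.
(my $clean = $str) =~ s/(^|\n)[\n\s]*/$1/g;
print $clean;
```

    The first match is case (1): nothing is kept and the two leading newlines are removed. The others are case (2): one newline is kept each time. Since \s also matches spaces and tabs, the indentation in front of gamma is removed as well, which is worth knowing if leading white space matters in your data.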

    If you're unsure of any of these things, then I can recommend perlretut.

      Thank you both very much. Will crack on with the tutorials and documentation right now!