hilbert has asked for the wisdom of the Perl Monks concerning the following question:

Given the string: 'A B 1 2 3 4 5 6 7 8 9'
I want to put a semicolon between all the digits, replacing the space.

So, I want to get: 'A B 1;2;3;4;5;6;7;8;9'.

I coded in the file s.pl:

#!/usr/bin/perl -w
use strict;
while(<>) {
    my $row = $_;
    $row =~ s/(\d)\s+(\d)/$1;$2/g;
    print $row;
} # while

then:

echo 'A B 1 2 3 4 5 6 7 8 9' | perl s.pl

returns:

A B 1;2 3;4 5;6 7;8 9

which is not what I expected...

Can somebody please tell me how to correct the s/// statement to get the expected result?

Thanks a lot.

Hilbert

Re: Beginner question about search and replace
by moritz (Cardinal) on Sep 16, 2011 at 10:10 UTC

    First let me explain why your solution doesn't work as expected:

    s///g starts each match where the previous match left off, or at the start of the string for the first match.

    The first match finds 1 2 and consumes both digits, so the next match starts right after the 2 and finds 3 4. The space between 2 and 3 is never matched, because the matches of s///g cannot overlap.
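To see where each match begins, one can run the original pattern with m//g in scalar context and print pos() after every match. This is only a small diagnostic sketch; the variable names are illustrative:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $str = 'A B 1 2 3 4 5 6 7 8 9';
my @pairs;

# In scalar context, m//g resumes where the previous match ended;
# pos() reports the offset just past the current match.
while ($str =~ /(\d)\s+(\d)/g) {
    push @pairs, "$1 $2";
    print "matched '$1 $2', next search starts at offset ", pos($str), "\n";
}
# The 9 is never paired: the 8 was already consumed by the "7 8" match.
```

It reports the pairs 1 2, 3 4, 5 6 and 7 8, with the next search starting at offsets 7, 11, 15 and 19. No match ever starts at the 2, 4, 6 or 8, which is why only every other space is replaced.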

    A possible fix is to not match the second digit:

    s/(\d)\s+/$1;/g;

    This produces the output you want. If it's important that no substitution happens after a number but before a letter, you can use

    s/(\d)\s+(?=\d)/$1;/g;

    The (?=\d) looks for a digit, but it doesn't consume it (search for look-ahead in perlre).
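Here is a small sketch comparing the two substitutions. The input '1 2 x 3' is a made-up example with a letter after a digit; the sample line from the question is included as well:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input: a digit followed by a letter.
my $in = '1 2 x 3';

(my $plain = $in) =~ s/(\d)\s+/$1;/g;         # any digit followed by whitespace
(my $ahead = $in) =~ s/(\d)\s+(?=\d)/$1;/g;   # only when a digit comes next

print "$plain\n";    # 1;2;x 3
print "$ahead\n";    # 1;2 x 3

# The sample line from the question (no trailing newline here).
my $sample = 'A B 1 2 3 4 5 6 7 8 9';
(my $fixed = $sample) =~ s/(\d)\s+(?=\d)/$1;/g;
print "$fixed\n";    # A B 1;2;3;4;5;6;7;8;9
```

Because (?=\d) only peeks at the next character, that digit is still available as the start of the following match.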

    Update: Kudos for supplying your code, actual output and expected output. It makes answering your question easy, and can't be taken for granted. Welcome to perlmonks!

      Thanks a lot for your absolutely clear explanation and solution.
      You may also wish to note that if you're reading the data from a file and don't chomp the trailing newline, the two versions will produce different results.

      The first form will replace the newline after the last digit with ';', because newline is part of the whitespace character class (\s). The line then ends in ';' and loses its newline.

      -Greg
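A quick sketch of that difference, assuming a line read with <> that still has its "\n" (simulated here with a literal string):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Simulates a line read with <> before chomp: note the trailing "\n".
my $line = "1 2 3\n";

(my $plain = $line) =~ s/(\d)\s+/$1;/g;         # "\n" counts as \s, so it becomes ';'
(my $ahead = $line) =~ s/(\d)\s+(?=\d)/$1;/g;   # the 3 is not followed by a digit

my $chomped = $line;
chomp $chomped;                                 # remove the newline first
$chomped =~ s/(\d)\s+/$1;/g;

for my $result ($plain, $ahead, $chomped) {
    (my $visible = $result) =~ s/\n/\\n/g;      # show the newline as a literal \n
    print "[$visible]\n";
}
```

This prints [1;2;3;], [1;2;3\n] and [1;2;3]: either chomp the line first or use the look-ahead form.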

Re: Beginner question about search and replace
by choroba (Cardinal) on Sep 16, 2011 at 10:11 UTC
      Thanks a lot!
Re: Beginner question about search and replace
by luis.roca (Deacon) on Sep 16, 2011 at 12:30 UTC

    As moritz explained, you were matching every pair of digits when you wanted every digit with whitespace following it.

    Another way to replace every space following a number with a semicolon could be (untested):

      s/(\d)(?:\s+)/$1;/g

    The \d matches a single digit; for plain ASCII input like this it matches the same characters as [0-9] (on Unicode strings it can also match digits from other scripts unless the /a modifier is used). As you learned in your attempt, the parentheses 'capture' that match and store it so we can use it later as $1. The second part of our pattern looks for at least one whitespace character following that digit, but the (?: ) keeps it from being captured, since we're not planning on using it to build our replacement.
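One way to see what (?: ) changes is to count the groups a match returns in list context. This is a small sketch on a made-up string '1 2':

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = '1 2';

# A list-context match returns one element per capturing group.
my @with_group    = $text =~ /(\d)(\s+)(\d)/;    # three capturing groups
my @without_group = $text =~ /(\d)(?:\s+)(\d)/;  # (?: ) groups but does not capture

print scalar(@with_group), "\n";      # 3
print scalar(@without_group), "\n";   # 2
```

With the capturing version the second digit would be $3 instead of $2, so non-capturing groups also keep the numbering of the groups you do care about stable.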

    Aside from the perldocs on regexes, if you're interested, you might like Mastering Regular Expressions by Jeffrey E.F. Friedl and/or Data Munging with Perl by Dave Cross.

    UPDATE
    Same day, 16 Sep 2011, 02:35:24 PM: changed s/(\d)(?:\s)+/$1;/g to s/(\d)(?:\s+)/$1;/g, following AnomalousMonk's suggestion.


    "...the adversities born of well-placed thoughts should be considered mercies rather than misfortunes." — Don Quixote

      Why the (?: ) around the space character? It runs just fine as below.

      s/(\d)\s+/$1;/g;

        "Why the (?: ) around the space character? It runs just fine as below."

        It does work well and is simpler to explain. I use the (?: ) to show the practice of not capturing matches that won't be used in the replacement. In this specific example memory isn't going to be a problem, because we're only dealing with a single string, but I personally like using it as a way to mark what I do and don't want to work with in the replacement.


        I agree with luis.roca's use of (?:\s)+ in a pedagogic or self-documenting context, as already explained above.

        I would be inclined to quibble with the use of (?:\s)+ versus (?:\s+), especially in a pedagogic example. While these two expressions behave in exactly the same way in all respects AFAIU, the corresponding capturing expressions (\s)+ and (\s+) behave very differently as to the characters captured: (\s)+ repeats the group, so it keeps only the last whitespace character, while (\s+) captures the whole run. In an explanatory example this might, by suggestion or implication, lead to great confusion.
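A short sketch of that difference, using a made-up string with a space, a tab and another space between two digits:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "1 \t 2";    # space, tab, space between the digits

my ($last_only) = $text =~ /\d(\s)+\d/;   # the group repeats; only the last char is kept
my ($whole_run) = $text =~ /\d(\s+)\d/;   # the quantifier is inside; the whole run is kept

printf "%d char(s) captured by (\\s)+\n", length $last_only;   # 1
printf "%d char(s) captured by (\\s+)\n", length $whole_run;   # 3
```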

Re: Beginner question about search and replace
by Anonymous Monk on Sep 16, 2011 at 19:13 UTC

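    The \K escape (Perl 5.10 and later) is an alternative to a look-behind such as (?<=\d): the digit must still match, but \K drops it from the matched text, so only the whitespace is replaced and no $1 is needed: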

    s/\d\K\s+(?=\d)/;/g;
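For comparison, here is a sketch of three equivalent substitutions on the sample line: capturing and re-inserting the digit, a look-behind, and \K:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $sample = 'A B 1 2 3 4 5 6 7 8 9';

(my $capture = $sample) =~ s/(\d)\s+(?=\d)/$1;/g;    # capture the digit and put it back
(my $behind  = $sample) =~ s/(?<=\d)\s+(?=\d)/;/g;   # fixed-width look-behind
(my $keep    = $sample) =~ s/\d\K\s+(?=\d)/;/g;      # \K: keep everything to its left

print "$_\n" for $capture, $behind, $keep;   # each line: A B 1;2;3;4;5;6;7;8;9
```

Perl's look-behind has traditionally been limited to fixed-width patterns, while the part before \K may have variable length (for example \d+\K), which is one reason to prefer \K.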