healingtao has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I need to convert one string into another in Perl. Here are 4 examples that cover all my use cases:

1) From: 'GNRABS 2014-186' To: 'GNRABS14-186'
2) From: 'A10 2013-1' To: 'A1013-1'
3) From: 'CGBAM 2014-HD' To: 'CGBAM14-HD'
4) From: 'FHMS K032' To: 'FHMS-K032'

Description:

1) The first part (before the space) can be any number of characters, and I need to use it as is.
2) The second part starts after the space and is the year, which I need to shorten from 4 characters to 2 (as in the first 3 examples), or skip if it doesn't exist (as in example 4).
3) The third part is whatever remains (including the dash, as in the first 3 examples), but there must always be a dash, even if it doesn't exist in the original string (as in example 4).

Can you recommend what code I need to use for this type of conversion from one string to another?

Re: reg expression question
by Athanasius (Archbishop) on Jan 28, 2015 at 13:36 UTC

    Hello healingtao, and welcome to the Monastery!

    To begin, I would split the string on its first sequence of one or more whitespace characters. Only then would I apply a regular expression to the right-hand side of the split:

    #! perl
    use strict;
    use warnings;
    use Test::More;

    my %data = (
        'GNRABS 2014-186' => 'GNRABS14-186',
        'A10 2013-1'      => 'A1013-1',
        'CGBAM 2014-HD'   => 'CGBAM14-HD',
        'FHMS K032'       => 'FHMS-K032',
    );

    is(rewrite($_), $data{$_}) for keys %data;
    done_testing();

    sub rewrite
    {
        my ($string) = @_;
        my ($left, $right) = split /\s+/, $string, 2;

        if (my @m = $right =~ /\d{2}(\d{2})(.*)/)
        {
            $right = $m[0] . ($m[1] =~ /^-/ ? '' : '-') . $m[1];
        }
        else
        {
            $right = '-' . $right unless $right =~ /^-/;
        }

        return $left . $right;
    }
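    The third argument to split is what limits the split to the first run of whitespace. A small sketch of that behaviour, using an input with two spaces borrowed from a follow-up further down this thread (the variable names are only for illustration):

```perl
use strict;
use warnings;

# With a limit of 2, split stops after the first run of whitespace,
# so any later spaces stay inside the second field.
my ($left, $right) = split /\s+/, 'DRSVA 1994 K-2', 2;
print "left='$left' right='$right'\n";

# Without the limit, the same string is broken into three fields.
my @fields = split /\s+/, 'DRSVA 1994 K-2';
print scalar(@fields), " fields\n";
```

    Keeping everything after the first space in one piece means the regex only ever has to look at the right-hand side.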

    See also perlretut.

    Hope that helps,

    Update: Improved wording.

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

      Thanks very much to all of you who answered. You guys are awesome. I chose Athanasius's answer, and everything I anticipated yesterday worked with this solution, but a few more use cases have come up. Athanasius, while I'm still learning, do you mind updating the code for the following additional cases:

      1) Input: AB AL0024
         Output with code provided: AB-
         Desired output: AB-AL0024 (if the year is missing, is it possible to add a dash?)
      2) Input: DRSVA 1994 K-2
         Output with code provided: DRSVA94- K-2
         Desired output: DRSVA94-K-2 (spaces are not allowed in the output)
      3) Input: PUN VALEY B
         Output with code provided: PUN-VALEY B
         Desired output: PUN-VALEYB (spaces are not allowed in the output)
      4) Input: TIBET 2015
         Output with code provided: TIBET15-
         Desired output: TIBET15 (is it possible to avoid the dash if nothing follows it?)

      The original cases were critical. These are special cases which will appear rarely, if at all, but I wanted to code for them just in case. Let me know if this is possible. Thanks

        These special cases are fairly easy to accommodate:

        1. To prevent 4 consecutive digits from being wrongly identified as a year, specify that the digits occur at the start of the right-hand string:

          if (my @m = $right =~ /^\d{2}(\d{2})(.*)/)
          #                      ^ Add this

          Within a regex, the special character ^ means “match at the start of the string” (or at the start of each line when the /m modifier is used).

        2. To remove spaces, use the substitution operator (with the /g modifier for global replacement):

          $right =~ s/\s+//g;

        3. As for 2: the same substitution removes the space in this case too.

        4. To prevent the string from ending in a dash, use the substitution operator again:

          $right =~ s/-$//;

          $ is another special regex character: it means “match at the end of the string” (or just before a trailing newline; with the /m modifier, at the end of each line).

        See “Metacharacters” in perlre.
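        To see what the two anchors from points 1 and 4 change, here is a minimal sketch run on values from the cases above. The variable names are hypothetical and not part of the original solution:

```perl
use strict;
use warnings;

# Unanchored: four digits anywhere will match, so 'AL0024' looks as if it had a year.
my $unanchored = ('AL0024' =~ /\d{2}(\d{2})/) ? 1 : 0;

# Anchored with ^: the four digits must begin the string, so 'AL0024' is rejected.
my $anchored = ('AL0024' =~ /^\d{2}(\d{2})/) ? 1 : 0;

# Anchored with $: only a dash at the very end of the string is removed.
(my $trimmed = '15-')    =~ s/-$//;
(my $kept    = '14-186') =~ s/-$//;

print "unanchored=$unanchored anchored=$anchored trimmed=$trimmed kept=$kept\n";
```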

        Note the value of using a test-driven approach: I was able to add the 4 new input/output pairs to %data, make changes to get the new test cases to pass, and know that these modifications did not invalidate the original solution (because the 4 original test cases still pass).
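        For reference, here is one way the pieces might fit together, with the 4 new input/output pairs added to %data. The changes themselves are the ones listed above; where the two substitutions are placed inside rewrite is an assumption on my part, so treat this as a sketch and not as the definitive version:

```perl
#! perl
use strict;
use warnings;
use Test::More;

# The four original cases plus the four follow-up cases.
my %data = (
    'GNRABS 2014-186' => 'GNRABS14-186',
    'A10 2013-1'      => 'A1013-1',
    'CGBAM 2014-HD'   => 'CGBAM14-HD',
    'FHMS K032'       => 'FHMS-K032',
    'AB AL0024'       => 'AB-AL0024',
    'DRSVA 1994 K-2'  => 'DRSVA94-K-2',
    'PUN VALEY B'     => 'PUN-VALEYB',
    'TIBET 2015'      => 'TIBET15',
);

is(rewrite($_), $data{$_}, $_) for sort keys %data;
done_testing();

sub rewrite
{
    my ($string) = @_;
    my ($left, $right) = split /\s+/, $string, 2;

    # Cases 2 and 3: no spaces allowed in the output
    # (assumed placement: before the year check).
    $right =~ s/\s+//g;

    # Case 1: only treat 4 digits as a year when they start the string.
    if (my @m = $right =~ /^\d{2}(\d{2})(.*)/)
    {
        $right = $m[0] . ($m[1] =~ /^-/ ? '' : '-') . $m[1];
    }
    else
    {
        $right = '-' . $right unless $right =~ /^-/;
    }

    # Case 4: no trailing dash (assumed placement: last step).
    $right =~ s/-$//;

    return $left . $right;
}
```

        This sketch assumes every input contains at least one space; a string without one would leave $right undefined and trigger warnings.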

        Hope that helps,

        Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Re: reg expression question
by Anonymous Monk on Jan 28, 2015 at 07:48 UTC
Re: reg expression question
by Sathishkumar (Scribe) on Jan 28, 2015 at 07:58 UTC
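    Please find the code below:

    # Drop the space and the first two digits of the year, joining the two parts directly
    $variable =~ s/([^<]*?)\s+\d{2}(\d{2}[^<]*?)$/$1$2/igs;
    # Any whitespace that is left (no year present) becomes a dash
    $variable =~ s/\s+/-/igs;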