in reply to reg expression question

Hello healingtao, and welcome to the Monastery!

To begin, I would split the string on its first sequence of one or more whitespace characters. Only then would I apply a regular expression to the right-hand side of the split:

    #! perl
    use strict;
    use warnings;
    use Test::More;

    my %data = (
        'GNRABS 2014-186' => 'GNRABS14-186',
        'A10 2013-1'      => 'A1013-1',
        'CGBAM 2014-HD'   => 'CGBAM14-HD',
        'FHMS K032'       => 'FHMS-K032',
    );

    is(rewrite($_), $data{$_}) for keys %data;
    done_testing();

    sub rewrite {
        my ($string) = @_;
        my ($left, $right) = split /\s+/, $string, 2;

        if (my @m = $right =~ /\d{2}(\d{2})(.*)/) {
            $right = $m[0] . ($m[1] =~ /^-/ ? '' : '-') . $m[1];
        }
        else {
            $right = '-' . $right unless $right =~ /^-/;
        }

        return $left . $right;
    }
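To see what the split step produces on its own, here is a minimal sketch. The input string is hypothetical (it is not one of the original test cases) and was chosen only to show that the limit of 2 keeps any later whitespace inside the right-hand side:

```perl
use strict;
use warnings;

# Hypothetical input: a run of three spaces, then one more space later on.
my $input = 'ABC   2014 X';

# Split on the first run of whitespace only. The limit of 2 means at most
# two fields are returned, so the second space stays in $right.
my ($left, $right) = split /\s+/, $input, 2;

print "left:  '$left'\n";
print "right: '$right'\n";
```

Any whitespace left in the right-hand side has to be dealt with separately, since split stops after the first separator.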

See also perlretut.

Hope that helps,

Update: Improved wording.

Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Re^2: reg expression question
by healingtao (Novice) on Jan 29, 2015 at 07:14 UTC
    Thanks very much to all of you who answered. You guys are awesome. I chose Athanasius's answer and everything I anticipated yesterday worked with this solution, but there are a few more use cases that came up. Athanasius, while I'm still learning, do you mind updating the code for the following additional cases:

    1) Input: AB AL0024
       Output with code provided: AB-
       Desired output: AB-AL0024 (if the year is missing, then is it possible to add a dash?)

    2) Input: DRSVA 1994 K-2
       Output with code provided: DRSVA94- K-2
       Desired output: DRSVA94-K-2 (spaces are not allowed in the output)

    3) Input: PUN VALEY B
       Output with code provided: PUN-VALEY B
       Desired output: PUN-VALEYB (spaces are not allowed in the output)

    4) Input: TIBET 2015
       Output with code provided: TIBET15-
       Desired output: TIBET15 (is it possible to avoid dashes if nothing follows it?)

    The original cases were critical and these are special cases which will appear rarely if at all, but I just wanted to code for them just in case. Let me know if this is possible. Thanks

      These special cases are fairly easy to accommodate:

      1. To prevent 4 consecutive digits from being wrongly identified as a year, specify that the digits occur at the start of the right-hand string:

        if (my @m = $right =~ /^\d{2}(\d{2})(.*)/)
        #                      ^ Add this

        Within a regex, the special character ^ means “match at the start of the string” (or at the start of each line when the /m modifier is in effect).

      2. To remove spaces, use the substitution operator (with the /g modifier for global replacement):

        $right =~ s/\s+//g;

      3. As for 2: the same substitution removes the space in this case too.

      4. To prevent the string from ending in a dash, use the substitution operator again:

        $right =~ s/-$//;

        $ is another special regex character: it means “match at the end of the string” (or just before a final newline; with the /m modifier, at the end of each line).

      See “Metacharacters” in perlre.
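      As a rough illustration of what the two anchors change, the sketch below runs the unanchored and anchored patterns against the right-hand side from case 1, and the trailing-dash substitution against two sample strings. The variable names are made up for this example only:

```perl
use strict;
use warnings;

# Without ^ the four digits may sit anywhere, so 'AL0024' is read as a year.
my ($unanchored) = 'AL0024' =~ /\d{2}(\d{2})/;

# With ^ the digits must open the string, so the match fails and the
# list assignment leaves the variable undefined.
my ($anchored) = 'AL0024' =~ /^\d{2}(\d{2})/;

# With $ only a dash at the very end is removed; inner dashes survive.
(my $trimmed = '15-')    =~ s/-$//;
(my $kept    = '94-K-2') =~ s/-$//;

print 'unanchored: ', $unanchored, "\n";
print 'anchored:   ', (defined $anchored ? $anchored : '(no match)'), "\n";
print "trimmed:    $trimmed\n";
print "kept:       $kept\n";
```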

      Note the value of using a test-driven approach: I was able to add the 4 new input/output pairs to %data, make changes to get the new test cases to pass, and know that these modifications did not invalidate the original solution (because the 4 original test cases still pass).
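      A sketch of how the full test script might look with all four changes folded in. Where the two substitutions go (just before the return, spaces first and then the trailing dash) is an assumption, as are the test labels and the sorted key order; the steps above do not prescribe them:

```perl
#! perl
use strict;
use warnings;
use Test::More;

my %data = (
    # Original cases
    'GNRABS 2014-186' => 'GNRABS14-186',
    'A10 2013-1'      => 'A1013-1',
    'CGBAM 2014-HD'   => 'CGBAM14-HD',
    'FHMS K032'       => 'FHMS-K032',
    # Special cases from the follow-up
    'AB AL0024'       => 'AB-AL0024',
    'DRSVA 1994 K-2'  => 'DRSVA94-K-2',
    'PUN VALEY B'     => 'PUN-VALEYB',
    'TIBET 2015'      => 'TIBET15',
);

# Label each test with its input so a failure is easy to trace.
is(rewrite($_), $data{$_}, $_) for sort keys %data;
done_testing();

sub rewrite {
    my ($string) = @_;
    my ($left, $right) = split /\s+/, $string, 2;

    # Case 1: a year counts only if it opens the right-hand side.
    if (my @m = $right =~ /^\d{2}(\d{2})(.*)/) {
        $right = $m[0] . ($m[1] =~ /^-/ ? '' : '-') . $m[1];
    }
    else {
        $right = '-' . $right unless $right =~ /^-/;
    }

    $right =~ s/\s+//g;    # Cases 2 and 3: no spaces in the output
    $right =~ s/-$//;      # Case 4: no trailing dash

    return $left . $right;
}
```

      Run as an ordinary script, this should report all eight tests as passing, which is the check that the new rules have not disturbed the original four cases.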

      Hope that helps,

      Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,