in reply to split versus =~

G'day russlo,

Welcome to the Monastery.

"My question is: why?"

As others have pointed out, without any data, we can't really answer that. Here are a few possible reasons (non-exhaustive list):

For future reference, please provide a "Short, Self-Contained, Correct Example" and follow the guidelines in "How do I post a question effectively?".

"Additionally: what can I do to provide the correct splitting that we're looking for here?"

Comment your regex in full. By forcing yourself to document exactly what your regex does, you will more easily spot logic errors and typos. By writing your regex as I've done in the code below, it's very easy to make changes (e.g. at some future point perhaps Z can be negative or the string becomes "W-X-Y-Z"); fiddling around inside a regex which is jammed into a single string with no whitespace is highly error-prone.
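To illustrate the kind of future change mentioned above, here is a minimal sketch that assumes Z is later allowed to be negative. The name `$re_neg_z` and the sample strings are mine, purely for illustration; the only functional change from a fully commented X-Y-Z regex is one added `-?` in the Z capture.

```perl
use strict;
use warnings;

# Hypothetical variant for illustration: Z may now be negative.
my $re_neg_z = qr{(?x:
    ^               # start of string
    (               # start capture X
        -?          # optional leading minus
        \d+         # 1 or more digits
    )               # end capture X
    -               # required hyphen
    (               # start capture Y
        \d+         # 1 or more digits
    )               # end capture Y
    -               # required hyphen
    (               # start capture Z
        -?          # optional leading minus (the only added line)
        \d+         # 1 or more digits
    )               # end capture Z
    $               # end of string
)};

for my $str ('1-2-3', '1-2--3', '1--2-3') {
    if (my ($X, $Y, $Z) = $str =~ $re_neg_z) {
        print "'$str' => X=$X Y=$Y Z=$Z\n";
    }
    else {
        print "'$str' => no match\n";
    }
}
```

Because each element sits on its own commented line, the change is a one-line addition that is easy to review; a negative Y is still rejected.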

As others have already suggested, write a test script. In the code below, I added an "expect failure" test, mostly to show you what that outputs. I also noted you mentioned a problem with '-2-3-4'; to be honest, I didn't follow what the problem was, but I added it for testing anyway. Add more tests if you encounter problem input that isn't handled by the regex; you may also need to alter the regex itself if it doesn't cover all eventualities.

Note that with the way I've written the code, you can just add to @tests without needing to change any other part of the code.

You should also provide some validation and error reporting. What should happen if the input doesn't match the regex: an on-screen warning, a logfile entry, or killing the script?
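As a rough sketch of one of those options (an on-screen warning), the following assumes a hypothetical helper named `parse_xyz`; the name, the message text and the compact regex are my own illustration, not part of your code. Swap the `warn` for `die` or a logfile write to suit your needs.

```perl
use strict;
use warnings;

# Compact form of an X-Y-Z regex, used here only to keep the sketch short.
my $re = qr{(?x: ^ ( -? \d+ ) - ( \d+ ) - ( \d+ ) $ )};

# Hypothetical helper: returns (X, Y, Z) on success; on failure it
# warns about the bad input and returns an empty list.
sub parse_xyz {
    my ($str) = @_;

    if (my @parts = $str =~ $re) {
        return @parts;
    }

    warn "Cannot parse '$str': expected X-Y-Z\n";
    return;
}

for my $str ('-1-2-3', '1--2-3') {
    # An empty list from parse_xyz() makes the assignment false,
    # so unparseable input is skipped after the warning.
    my ($X, $Y, $Z) = parse_xyz($str)
        or next;

    print "X=$X Y=$Y Z=$Z\n";
}
```

Returning an empty list on failure lets the caller decide how severe a bad record is, rather than hard-coding that decision inside the parser.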

Here's my test script:

    #!/usr/bin/env perl

    use strict;
    use warnings;

    use constant {
        STR => 0,
        EXP => 1,
    };

    use Test::More;

    my @tests = (
        ['1-2-3', '123'],
        ['-1-2-3', '-123'],
        ['1--2-3', ''],
        ['1-2--3', ''],
        ['1--2--3', ''],
        ['-1--2-3', ''],
        ['-1-2--3', ''],
        ['-1--2--3', ''],
        ['1-2-', ''],
        ['-1-2-', ''],
        ['garbage', ''],
        ['expect', 'failure'],
        ['-2-3-4', '-234'],
    );

    plan tests => 0+@tests;

    my $re = qr{(?x:
        ^               # start of string
        (               # start capture X
            -?          # optional leading minus
            \d+         # 1 or more digits
        )               # end capture X
        -               # required hyphen
        (               # start capture Y
            \d+         # 1 or more digits
        )               # end capture Y
        -               # required hyphen
        (               # start capture Z
            \d+         # 1 or more digits
        )               # end capture Z
        $               # end of string
    )};

    for my $test (@tests) {
        my ($X, $Y, $Z, $got) = ('') x 4;

        if (($X, $Y, $Z) = $test->[STR] =~ $re) {
            $got = "$X$Y$Z";
        }

        ok($got eq $test->[EXP],
            "Testing '$test->[STR]' is "
            . (length $test->[EXP] ? 'GOOD' : 'BAD')
        );
    }

And here's the output:

    1..13
    ok 1 - Testing '1-2-3' is GOOD
    ok 2 - Testing '-1-2-3' is GOOD
    ok 3 - Testing '1--2-3' is BAD
    ok 4 - Testing '1-2--3' is BAD
    ok 5 - Testing '1--2--3' is BAD
    ok 6 - Testing '-1--2-3' is BAD
    ok 7 - Testing '-1-2--3' is BAD
    ok 8 - Testing '-1--2--3' is BAD
    ok 9 - Testing '1-2-' is BAD
    ok 10 - Testing '-1-2-' is BAD
    ok 11 - Testing 'garbage' is BAD
    not ok 12 - Testing 'expect' is GOOD
    #   Failed test 'Testing 'expect' is GOOD'
    #   at ./pm_11148100_re_parse.pl line 55.
    ok 13 - Testing '-2-3-4' is GOOD
    # Looks like you failed 1 test of 13.

— Ken