G'day russlo,

Welcome to the Monastery.

"My question is: why?"

As others have pointed out, without any data, we can't really answer that. Here are a few possible reasons (non-exhaustive list):

For future reference, please provide a "Short, Self-Contained, Correct Example" and follow the guidelines in "How do I post a question effectively?".

"Additionally: what can I do to provide the correct splitting that we're looking for here?"

Comment your regex in full. Forcing yourself to document exactly what your regex does makes logic errors and typos much easier to spot. Written as I've done in the code below, the regex is also easy to change (e.g. if, at some future point, Z can be negative or the string becomes "W-X-Y-Z"); fiddling around inside a regex that's jammed into a single string with no whitespace is highly error-prone.
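To illustrate how small such a change is, here's a minimal sketch that assumes a hypothetical future requirement: Z may also be negative. Compared with the regex in the test script, the only difference is one extra `-?` line in the Z capture. The `parse_xyz` sub is a made-up helper for illustration only.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

# Same extended regex as in the test script, except that capture Z
# now also accepts an optional leading minus (hypothetical requirement).
my $re = qr{(?x:
    ^           # start of string
    (           # start capture X
        -?      # optional leading minus
        \d+     # 1 or more digits
    )           # end capture X
    -           # required hyphen
    (           # start capture Y
        \d+     # 1 or more digits
    )           # end capture Y
    -           # required hyphen
    (           # start capture Z
        -?      # optional leading minus: the only new line
        \d+     # 1 or more digits
    )           # end capture Z
    $           # end of string
)};

# Hypothetical helper: returns the concatenated captures, or '' on no match.
sub parse_xyz {
    my ($str) = @_;
    my ($X, $Y, $Z) = $str =~ $re or return '';
    return "$X$Y$Z";
}

print parse_xyz($_), "\n" for '1-2-3', '1-2--3', '1--2-3';
```

Run as-is, it prints "123", "12-3" and an empty line (the last input still fails because Y may not be negative). If you did make a change like this, the matching @tests entries (e.g. ['1-2--3', '']) would need their expected values updated too.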

As others have already suggested, write a test script. In the code below, I added an "expect failure" test, mostly to show you what a failure looks like in the output. I also noticed you mentioned a problem with '-2-3-4'; to be honest, I didn't follow what the problem was, but I added it to the tests anyway. Add more tests whenever you encounter problem input that the regex doesn't handle; you may also need to alter the regex itself if it doesn't cover all eventualities.

Note that, because the test plan is computed from the size of @tests, you can just add entries to @tests without needing to change any other part of the code.

You should also provide some validation and error reporting. What should happen if the input doesn't match the regex: an on-screen warning, a logfile entry, or killing the script?
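Here's one possible approach, sketched under the assumption that bad input should normally produce an on-screen warning and be skipped, while a hypothetical `$STRICT` flag turns that into a fatal error. The flag, the `parse_line` sub and the sample inputs are all made up for illustration; a logfile entry could be written at the same point instead of (or as well as) the warning.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

# Hypothetical switch: true => die on bad input, false => warn and skip.
my $STRICT = 0;

my $re = qr{(?x:
    ^ ( -? \d+ )    # X: optional leading minus, 1 or more digits
    - ( \d+ )       # required hyphen, then Y: 1 or more digits
    - ( \d+ )       # required hyphen, then Z: 1 or more digits
    $               # end of string
)};

# Returns the concatenated captures, or undef if the input is invalid.
sub parse_line {
    my ($str) = @_;

    if (my ($X, $Y, $Z) = $str =~ $re) {
        return "$X$Y$Z";
    }

    my $msg = "Invalid input '$str': expected format 'X-Y-Z'\n";
    die $msg if $STRICT;    # kill the script
    warn $msg;              # or just report it on STDERR
    return;
}

for my $input ('1-2-3', 'garbage', '-1-2-3') {
    my $result = parse_line($input);
    next unless defined $result;
    print "$input => $result\n";
}
```

With `$STRICT` left at 0, 'garbage' is reported on STDERR and skipped, and the other two inputs are printed. Which behaviour is right really depends on how the script is used; a batch job might prefer logging and carrying on, whereas an interactive tool might prefer to die immediately.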

Here's my test script:

#!/usr/bin/env perl

use strict;
use warnings;

use constant {
    STR => 0,
    EXP => 1,
};

use Test::More;

my @tests = (
    ['1-2-3', '123'],
    ['-1-2-3', '-123'],
    ['1--2-3', ''],
    ['1-2--3', ''],
    ['1--2--3', ''],
    ['-1--2-3', ''],
    ['-1-2--3', ''],
    ['-1--2--3', ''],
    ['1-2-', ''],
    ['-1-2-', ''],
    ['garbage', ''],
    ['expect', 'failure'],
    ['-2-3-4', '-234'],
);

plan tests => 0+@tests;

my $re = qr{(?x:
    ^           # start of string
    (           # start capture X
        -?      # optional leading minus
        \d+     # 1 or more digits
    )           # end capture X
    -           # required hyphen
    (           # start capture Y
        \d+     # 1 or more digits
    )           # end capture Y
    -           # required hyphen
    (           # start capture Z
        \d+     # 1 or more digits
    )           # end capture Z
    $           # end of string
)};

for my $test (@tests) {
    my ($X, $Y, $Z, $got) = ('') x 4;

    if (($X, $Y, $Z) = $test->[STR] =~ $re) {
        $got = "$X$Y$Z";
    }

    ok($got eq $test->[EXP],
        "Testing '$test->[STR]' is "
        . (length $test->[EXP] ? 'GOOD' : 'BAD')
    );
}

And here's the output:

1..13
ok 1 - Testing '1-2-3' is GOOD
ok 2 - Testing '-1-2-3' is GOOD
ok 3 - Testing '1--2-3' is BAD
ok 4 - Testing '1-2--3' is BAD
ok 5 - Testing '1--2--3' is BAD
ok 6 - Testing '-1--2-3' is BAD
ok 7 - Testing '-1-2--3' is BAD
ok 8 - Testing '-1--2--3' is BAD
ok 9 - Testing '1-2-' is BAD
ok 10 - Testing '-1-2-' is BAD
ok 11 - Testing 'garbage' is BAD
not ok 12 - Testing 'expect' is GOOD
#   Failed test 'Testing 'expect' is GOOD'
#   at ./pm_11148100_re_parse.pl line 55.
ok 13 - Testing '-2-3-4' is GOOD
# Looks like you failed 1 test of 13.

— Ken


In reply to Re: split versus =~ by kcott
in thread split versus =~ by russlo
