L8on has asked for the wisdom of the Perl Monks concerning the following question:

Hello all, I have a slight issue with my script using split and a regex pattern. I need to split up file names into separate pieces and when I run split using a regex, the first item is blank. I've looked through the docs and a couple of my PERL books and it's obvious that I'm overlooking something simple. Here's what I've done:
    $f='453445-5.bmp';
    ($base, $dash, $copy, $ext) = split /^(\d+|\w+)(-)(\d+)(\.bmp)/, $f;
$base is always blank
$dash = 453445
$copy = -
$ext = 5
if I put a 5th var in there, then it will contain '.bmp'

How should this be to get the values loaded into the right vars? Also, what am I doing wrong to cause the first var to be blank?

(I don't know if I'm mixing something up, because this regex works exactly as I expect with the s/// operator.)

Any wisdom/insight would be greatly appreciated.

Thanx, L8on

Replies are listed 'Best First'.
Re: Question about split
by dragonchild (Archbishop) on Dec 10, 2004 at 14:35 UTC
    You're using captures with split. You probably meant to say:
    $f='453445-5.bmp';
    ($base, $dash, $copy, $ext) = $f =~ /^(\d+|\w+)(-)(\d+)(\.bmp)/;

    The regex you give to split is the delimiter. Split, AFAIK, returns whatever comes before the first delimiter as the first field, even when that is empty. Your regex matched the entire string, so you got a blank field before the delimiter, followed by the captured pieces of the delimiter itself.
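
    To see what split actually hands back here, this is a minimal sketch using the original string and pattern from the question. The array name @parts is only for illustration.

```perl
use strict;
use warnings;

my $f = '453445-5.bmp';

# The pattern matches the whole string, so the only real "field" is the
# empty string in front of the match. Because the pattern has capture
# groups, split also returns each captured piece of the delimiter.
my @parts = split /^(\d+|\w+)(-)(\d+)(\.bmp)/, $f;

print scalar(@parts), " elements\n";
print join('|', map { "'$_'" } @parts), "\n";
# prints:
# 5 elements
# ''|'453445'|'-'|'5'|'.bmp'
```

    That lines up with what the question reports: the first variable is blank and every other value lands one slot to the right.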

    Being right, does not endow the right to be rude; politeness costs nothing.
    Being unknowing, is not the same as being stupid.
    Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
    Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

Re: Question about split
by davorg (Chancellor) on Dec 10, 2004 at 14:38 UTC

    The first argument to split defines what you want to _split_ on. This implies that there will be an element that you want to keep _before_ the first match. In your case, that will always be an empty string as nothing comes before that in your data.

    You don't actually want to use split for this. You should just use a match operator instead.

    $f='453445-5.bmp';
    ($base, $dash, $copy, $ext) = $f =~ /^(\d+|\w+)(-)(\d+)(\.bmp)/;

    Use m// when you know what you want to keep. Use split when you know what you want to throw away.
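
    A small sketch of that rule of thumb, side by side. The simplified patterns and the @pieces array are assumptions made for the illustration, not the regex from the question.

```perl
use strict;
use warnings;

my $f = '453445-5.bmp';

# m// in list context: describe what you want to keep and get the
# captures back in order.
my ($base, $copy, $ext) = $f =~ /^(\w+)-(\d+)(\.bmp)$/;
print "m//:   base=$base copy=$copy ext=$ext\n";

# split: describe what you want to throw away (here a dash or a dot)
# and get back whatever lies between the separators.
my @pieces = split /[-.]/, $f;
print "split: @pieces\n";
```

    Note that split drops the separators, so the extension comes back as 'bmp' without its dot, while the match keeps exactly what was captured.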

    Oh, and if you have PERL books, then I recommend getting rid of them and buying Perl books instead :)

    --
    <http://www.dave.org.uk>

    "The first rule of Perl club is you do not talk about Perl club."
    -- Chip Salzenberg

      Thanx everyone! This is exactly what I was looking for. I was thinking ( for no good reason ) that the =~ operator would change the $f variable and it wouldn't stay intact for use later and that's why I didn't try it.
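
      For reference, a minimal sketch of the difference: binding a match with =~ only reads the variable, while a substitution is what modifies it. The copy $g is a hypothetical name used just for this illustration.

```perl
use strict;
use warnings;

my $f = '453445-5.bmp';

# A match only inspects $f; the captures are copies of parts of it.
my ($base, $dash, $copy, $ext) = $f =~ /^(\d+)(-)(\d+)(\.bmp)/;
print "after m//:  $f\n";

# A substitution changes the variable it is bound to, so work on a copy
# if the original has to stay intact.
my $g = $f;
$g =~ s/-\d+//;
print "after s///: f=$f g=$g\n";
```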

      I knew it was something on my end.

      Oh, and if you have PERL books, then I recommend getting rid of them and buying Perl books instead :)

      --I just realized that you're correct, I've always thought that it was all caps and never paid attention. :)

      thanx again,
      L8on
Re: Question about split
by ccn (Vicar) on Dec 10, 2004 at 14:35 UTC

    Read perldoc -f split, in particular what it says about parentheses inside the regexp and about the first field.

    You can get what you want without a split:

    ($base, $dash, $copy, $ext) = $f =~ /^(\d+|\w+)(-)(\d+)(\.bmp)/;
Re: Question about split
by revdiablo (Prior) on Dec 10, 2004 at 19:32 UTC

    Others have shown you how to solve the problem without using split, but I thought it might be nice to show how to solve it with split. I have made a number of assumptions that may not be correct, but I wanted to show you how to do this, just for your reference. (I also used a core module, File::Basename, that is very useful.)

    use File::Basename;
    my ($ext, $dash) = ('.bmp', '-'); # assumptions
    my $f = '453445-5.bmp';
    my ($base, $copy) = split /$dash/, basename($f, $ext);
    print "$base$dash$copy$ext\n"
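
    If the extension is not known in advance, a variation on the same idea is to let fileparse from File::Basename pick it off. This is only a sketch: the suffix pattern and the variable names are assumptions, and it still assumes exactly one dash in the name.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

my $f = '453445-5.bmp';

# fileparse returns (name, directory, suffix); this suffix pattern
# matches the final dot plus whatever follows it.
my ($name, $dir, $ext) = fileparse($f, qr/\.[^.]*/);

# What is left is "base-copy", so the dash is the part to throw away.
my ($base, $copy) = split /-/, $name;

print "base=$base copy=$copy ext=$ext\n";
```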
Re: Question about split
by graff (Chancellor) on Dec 11, 2004 at 16:35 UTC
    One more little nit-pick: saying \d+|\w+ in a Perl regex is sort of like saying "apples or fruit", because \w matches any alpha-numeric character or underscore. \w is equivalent to [0-9A-Za-z_] (though it'll actually match a lot more than that, if you feed it non-English text data in utf8 strings).
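
    A quick way to check this is to run both forms over a few sample names and compare the captures. The sample strings and the @different array are made up for the illustration, and the patterns are anchored at both ends so the rest of the regex pins down where the group must stop (in the original regex the following dash does that job).

```perl
use strict;
use warnings;

my @different;

for my $name ('453445', 'abc_7', '123abc', 'x') {
    # The alternation from the thread versus plain \w+.
    my ($alt)   = $name =~ /^(\d+|\w+)$/;
    my ($plain) = $name =~ /^(\w+)$/;

    push @different, $name unless $alt eq $plain;
    print "$name: alt='$alt' plain='$plain'\n";
}

print @different ? "differs for: @different\n" : "no differences\n";
```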
      It's not nit-picking at all, I obviously overlooked that way back when I learned it and have been writing crap like that all this time. Thanx for catching that. ( I always wondered why everyone snickered at me so much when I handed them my code :)

      Thanx again,
      L8on