split question

by Anonymous Monk
on Sep 08, 2001 at 02:54 UTC ( #111066=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have a line of input that looks something like:

RPC, rpc #001b, (1987)

I'm attempting to strip the year from inside the parentheses. The code is:

($remainder, $working) =~ split(/\s\(/, $inLine);

where $inLine is the above-referenced line. I was expecting to see the following after the split:

$remainder = RPC, rpc #001b,
$working = 1987)

However, $remainder contains the entirety of $inLine and $working is empty. What am I missing?

Replies are listed 'Best First'.
Re: split question
by blakem (Monsignor) on Sep 08, 2001 at 03:01 UTC
    Change =~ to = and you'll get the results you expect, though a regex is probably a better approach for this one anyway.
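For the record, a minimal sketch of that one-character fix, assuming the same sample line as in the question and nothing else changed:

```perl
use strict;
use warnings;

my $inLine = 'RPC, rpc #001b, (1987)';

# Plain list assignment: split runs in list context and its
# fields are copied into the two variables.
my ($remainder, $working) = split /\s\(/, $inLine;

print "remainder = '$remainder'\n";   # remainder = 'RPC, rpc #001b,'
print "working   = '$working'\n";     # working   = '1987)'
```

Note that the closing parenthesis stays on $working, exactly as the original poster expected; a regex capture avoids that leftover.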


      So just what is the meaning of what he wrote, putting split, rather than m, s, tr, or y, after the =~ ?
        This is just a "for the record" post, as the answer has obviously already been given, but what the heck. I did a little experimenting with wantarray() in subs (because I didn't know this beforehand), and assuming that builtin functions work the same way, what actually happens goes something like this. Whenever a line of code like:
        @list =~ sub_or_function(); # or $scalar =~ sub_or_function();
        appears, the sub is called in scalar context (at least it was in the simple sub I wrote, which just printed out the context based on the value of wantarray()). So for split(), the return value would be 2, since split in scalar context returns the number of fields the string was split into. So essentially, the line comes down to this:
        ($remainder, $working) =~ 2;
        which won't affect either variable, since =~ is only a binding operator and does not assign a value. That is why your code left the values intact instead of changing them. Also, that line does not run cleanly with warnings on: I get "Use of implicit split to @_ at line.." when I run the code with warnings enabled. Of course, this is only my interpretation of the results that I received, and could very well be wrong. ;)
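The experiment described above can be sketched roughly like this. The sub name fake_split and the sample value of $working are made up for illustration; the sub simply records the context it was called in and returns 2, standing in for what split would return in scalar context here:

```perl
use strict;
use warnings;

my $seen_context;

# Hypothetical stand-in for split: records the calling context and
# returns 2, the field count split would report for the sample line.
sub fake_split {
    $seen_context = wantarray          ? 'list'
                  : defined(wantarray) ? 'scalar'
                  :                      'void';
    return 2;
}

my $working = 'previous value 2';

# The right-hand side is evaluated first, and its result is then
# used as a pattern at run time. Nothing is assigned to $working.
my $matched = ($working =~ fake_split()) ? 'yes' : 'no';

print "context: $seen_context\n";
print "matched: $matched\n";
print "working: $working\n";
```

If the reasoning above holds, this reports scalar context, the match succeeds only because the string happens to contain a 2, and $working comes out unchanged.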
Re: split question (boo)
by boo_radley (Parson) on Sep 08, 2001 at 03:04 UTC
    split may not be so hot for this, especially behind an =~ operator...
    I'd suggest using a regex for this, something similar to:
    my $inLine = "RPC, rpc #001b, (1987)";
    $inLine =~ /\((\d+)\)/;
    print $1;

    or, if you really wanna use split,

    my $inLine = "RPC, rpc #001b, (1987)";
    ($itm, $date) = split /[()]/, $inLine;
    print $date;

    but I'd shy away from that, personally.

      Or if he really wants the "remainder":
      my $inLine = "RPC, rpc #001b, (1987)";
      $inLine =~ /([^(]+)\((\d+)\)/;
      my ($remainder, $date) = ($1, $2);
      ... assuming $inLine is the only line of data.
Re: split question
by chipmunk (Parson) on Sep 08, 2001 at 18:36 UTC
    I agree with the earlier responses that a regex is a better solution than split here. However, I noticed that all the suggested regex solutions access $1 et al. without making sure the regex match succeeded. A failed match does not reset the special regex variables -- they keep their values from the previous successful match!

    Here's one way to check the success of the match. Of course, you can change the structure, but the basic idea is to only access $1 et al. if the match succeeds.

    my $inLine = "RPC, rpc #001b, (1987)";
    my $year;
    if ($inLine =~ /\((\d+)\)/) {
        $year = $1;
    }
    else {
        die "No year found in '$inLine'.\n";
    }
    print "Found year '$year' in '$inLine'.\n";


    Here's a demonstration of a bug caused by not checking the success of the match.
    my $inLine = "RPC, rpc #001b, 1987";
    $inLine =~ /(#\d+[a-z])/;
    my $rpcNum = $1;
    print "Found rpc num '$rpcNum' in '$inLine'.\n";
    $inLine =~ /\((\d+)\)/;
    my $year = $1;
    print "Found year '$year' in '$inLine'.\n";
    This produces the following output.
    Found rpc num '#001b' in 'RPC, rpc #001b, 1987'.
    Found year '#001b' in 'RPC, rpc #001b, 1987'.
      Using the short-circuiting "and" operator is another way to deal with this.

      #!/usr/bin/perl
      # Prints 'First: A', as you expect
      'abcd' =~ /(a)/;
      print "First: ", uc $1, "\n";
      # Also prints 'Second: A', though you expect it not to
      'efgh' =~ /(a)/;
      print "Second: ", uc $1, "\n";
      #====
      # Prints 'Third: A' - again, as you expect
      'abcd' =~ /(a)/ and print "Third: ", uc $1, "\n";
      # Doesn't print anything - what you want
      'efgh' =~ /(a)/ and print "Fourth: ", uc $1, "\n";
      A slightly more readable version might be:
      my $inLine = "RPC, rpc #001b, (1987)";
      my ($year) = $inLine =~ /\((\d+)\)/;
      die "No year found in '$inLine'\n" unless defined $year;
      The reason for checking definedness rather than truth is that a year of zero is false in Perl and would be wrongly rejected. (You might want to die on year 0 ... I dunno.)
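A small sketch of that year-zero case, using a made-up input line (not from the original question) whose year is 0:

```perl
use strict;
use warnings;

# Hypothetical input whose year is zero.
my $inLine = 'RPC, rpc #000a, (0)';

# In list context the match returns its captures, or an empty
# list on failure, which would leave $year undefined.
my ($year) = $inLine =~ /\((\d+)\)/;

# "0" is a defined value but a false one.
my $truth_check   = $year          ? 'accepted' : 'rejected';
my $defined_check = defined($year) ? 'accepted' : 'rejected';

print "truth test:   $truth_check\n";     # rejected
print "defined test: $defined_check\n";   # accepted
```

Whether year 0 should count as valid data is a separate question; the point is only that a plain truth test cannot tell it apart from a failed match.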

      We are the carpenters and bricklayers of the Information Age.

      Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.

Re: split question
by Amoe (Friar) on Sep 08, 2001 at 16:21 UTC
    Isn't this code more the approach you're after?
    use strict;
    use warnings;
    my $inLine = 'RPC, rpc #001b, (1987)';
    $inLine =~ s/\((\d+)\)$//;
    my $working = $1;
    print $working;
    Sorry for the probably appalling regex, I am only a scribe :)
