ovedpo15 has asked for the wisdom of the Perl Monks concerning the following question:
Consider the following string: "a,b,c,d,5"
The format of the string is like this: "substr,substr,substr,...,value"
I use regex to check the string:
    my ($value) = ($row =~ /.*,(.*)/); # gets value after the last comma
    if (looks_like_number($value)) {
        ($row =~ s/,[^,]*$//); # gets substring before the last comma
        # DO STUFF ...
    }
    # DO STUFF ...
It works fine but it doesn't look very good.
I can't understand why, in my ($value) = ($row =~ /.*,(.*)/); I need the brackets around the scalar, but in ($row =~ s/,[^,]*$//); I don't.
In other words, why is there a syntax difference between the following two lines:
    my ($value) = ($row =~ /.*,(.*)/);
    my ($val) = ($row =~ s/,[^,]*$//);
Testing: my $row = "a,b,c,d,15";
Output of first line: 15
Output of second line: 1 (why not a,b,c,d?)
How can I do it in the same way?
Replies are listed 'Best First'.
Re: difference in regex
by haukex (Archbishop) on May 29, 2018 at 14:03 UTC
You will find the answer to your question in "Regexp Quote-Like Operators" in perlop - basically, different regex operations have different return values in different contexts. See also perlretut for a tutorial.
In this table, "true" and "false" refer to Perl's notion of Truth and Falsehood. Remember not to rely on any of the capture variables like $1, $2, etc. unless the match succeeds!
In my $foo = "bar"=~/a/;, the right-hand side of the assignment ("bar"=~/a/) is in scalar context. In my ($foo) = "bar"=~/a/; or my @foo = "bar"=~/a/;, the right-hand side is in list context. That's why, in your example, you need those parens in ($value): because you want the matching operation to return the contents of the capture group.
Note that your expressions can be slightly simplified; not all the parens you showed are needed:
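A minimal sketch of the context difference, assuming the sample row from the question. The variable names $matched, $copy and $count are only illustrative and do not come from the original post. It shows the same match in scalar and in list context, written without the extra parens, plus the return value of s///:

```perl
use strict;
use warnings;

my $row = "a,b,c,d,15";

# Scalar context: m// only reports whether the match succeeded.
my $matched = $row =~ /.*,(.*)/;

# List context: m// returns the contents of the capture groups.
my ($value) = $row =~ /.*,(.*)/;

# s/// modifies its target in place and returns the number of substitutions,
# so work on a copy to keep $row intact.
my $copy  = $row;
my $count = $copy =~ s/,[^,]*$//;

print "matched=$matched value=$value count=$count copy=$copy\n";
```

Because =~ binds more tightly than =, the parens around the right-hand side are optional; only the parens around ($value) change the behaviour.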
A few additional comments on your code:
Update: Added s///r to the table and added a few more doc links. A few other edits and updates. 2019-02-16: Added "Return Value on Failure" column to table, and a few other small updates. 2019-08-17: Updated the link to "Truth and Falsehood".
by ovedpo15 (Pilgrim) on May 29, 2018 at 14:11 UTC
As I mentioned in one of the posts in this thread, I would like to split it somehow into two scalars. I can use my ($a,$b) = ($row=~ /(.*),(.*)/); but if $row doesn't have commas it won't work. How do I always put a string into $path? For example:
if "abc" it will be $path = "abc" and $value is undefined.
if "abc,5" it will be $path = "abc" and $value = 5
if "a,b,c,5" it will be $path = "a,b,c" and $value = 5
by haukex (Archbishop) on May 29, 2018 at 14:30 UTC
Although personally I'd still use a conditional, of course it's possible to do it all in one regex. One way is by making the comma optional by putting a ? on a group, in this case I'm using a non-capturing (?:...) group, and I had to make the first part of the regex non-greedy so that it doesn't swallow an existing comma:
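A sketch of what such a regex could look like. The exact pattern and the helper name split_row are assumptions, not the code from the original post: a non-greedy first group followed by an optional non-capturing group that holds the comma and the value.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (path, value); value is undef without a comma.
sub split_row {
    my ($row) = @_;
    # (.*?) is non-greedy so it cannot swallow the last comma;
    # (?:,([^,]*))? makes the ",value" part optional.
    return $row =~ /^(.*?)(?:,([^,]*))?$/;
}

for my $row ("abc", "abc,5", "a,b,c,5") {
    my ($path, $value) = split_row($row);
    printf "%-8s path=%s value=%s\n", $row, $path, $value // 'undef';
}
```

The defined-or operator // used for printing needs Perl 5.10 or later.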
Update: An alternative that says a little more explicitly: either match a string with no commas in it, or, if there are commas, I want to match the thing after the last one: /^ (?| ([^,]*) | (.*) , ([^,]*) ) $/x
Update 2: And it turns out this regex is much faster than the above! (try using it in this benchmark)
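To see the branch-reset regex in action, here is a small sketch. The wrapper name split_row_br and the sample rows are illustrative assumptions. Inside (?| ... ) both alternatives number their groups from 1, so the first capture is always the path and the second is the value, if any:

```perl
use strict;
use warnings;

# Hypothetical wrapper around the branch-reset regex (needs Perl 5.10+).
sub split_row_br {
    my ($row) = @_;
    return $row =~ /^ (?| ([^,]*) | (.*) , ([^,]*) ) $/x;
}

for my $row ("abc", "abc,5", "a,b,c,5") {
    my ($path, $value) = split_row_br($row);
    printf "%-8s path=%s value=%s\n", $row, $path, $value // 'undef';
}
```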
Re: difference in regex
by Athanasius (Archbishop) on May 29, 2018 at 13:42 UTC
Hello ovedpo15,
In Perl, a function’s return value(s) may be different depending on the context in which the function is called. (Whether they are or not depends on the internal details of the function itself.) The statement
    my $value = $row =~ /.*,(.*)/;
calls the regex operator m// in scalar context, so it returns true if the match succeeds and false if it fails. But in the statement
    my ($value) = $row =~ /.*,(.*)/;
the parentheses around $value put the call to m// into list context and a list of the matches is returned.
By contrast, the substitution operator s/// returns the number of substitutions made regardless of the calling context. But you can change this behaviour by adding an /r modifier to the substitution. This creates a copy of the string (in this case $row), applies the substitutions (if any) to the copy, and returns that copy. E.g.
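A short sketch of the /r modifier, assuming the sample row from the question (the /r modifier needs Perl 5.14 or later):

```perl
use strict;
use warnings;

my $row = "a,b,c,d,15";

# With /r, s/// leaves $row untouched and returns the modified copy.
my $val = $row =~ s/,[^,]*$//r;

print "val=$val row=$row\n";   # prints "val=a,b,c,d row=a,b,c,d,15"
```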
See the sections m/PATTERN/msixpodualngc and s/PATTERN/REPLACEMENT/msixpodualngcer in perlop#Regexp-Quote-Like-Operators. Hope that helps,
by ovedpo15 (Pilgrim) on May 29, 2018 at 13:58 UTC
I tried to use the following regex to split the string:
    my($path,$value) = ($row =~ /(.*),(.*)/);
But if there are no commas it won't work. Which regex should I use in order to always put the string into $path, so that I only need to check whether $value is defined? For example:
if "abc" it will be $path = "abc" and $value is undefined.
if "abc,5" it will be $path = "abc" and $value = 5
if "a,b,c,5" it will be $path = "a,b,c" and $value = 5
The algo I would like to implement: As I see it the steps are:
1. if the string has commas:
1.a. get the last comma and check if the last substring is a number - if so put it in hash like this: $hash{$path} = $value;
1.b. if the substring after the last comma isn't a number - $hash{$path} = 1;
2. if the string has no commas: $hash{$string} = 1;
How can I implement this?
by Eily (Monsignor) on May 29, 2018 at 14:11 UTC
Because ($path, $value) is a list, you get the list of submatches (list context). But if you use the same match as the condition of an if statement, the if expects a boolean, so the operation will return true if something matches, and false otherwise (boolean context). And you can still access the left and right part as $1 and $2. So you can do:
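A sketch of that idea; the helper name path_and_value is an assumption made for illustration. The match is used as the condition of an if, and $1 and $2 are read only when it succeeded:

```perl
use strict;
use warnings;

# Hypothetical helper: returns (path, value); value is undef without a comma.
sub path_and_value {
    my ($row) = @_;
    if ($row =~ /(.*),(.*)/) {
        # The match succeeded, so $1 and $2 are safe to use here.
        my ($path, $value) = ($1, $2);
        return ($path, $value);
    }
    # No comma: the whole string is the path.
    return ($row, undef);
}

for my $row ("abc", "abc,5", "a,b,c,5") {
    my ($path, $value) = path_and_value($row);
    printf "%-8s path=%s value=%s\n", $row, $path, $value // 'undef';
}
```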
by haukex (Archbishop) on May 29, 2018 at 14:20 UTC
The algo I would like to implement :
Although a good start, point 1.b. is unclear: in this case, do you want the whole string stored in $path, or just the part up until the last comma? For now I'm assuming the latter. Anyway, while there may always be "nicer" ways to write things in Perl (Update: and you haven't specified what you meant with "it doesn't look very good"), sometimes a good starting point is a direct translation:
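One possible direct translation of the steps, as a sketch. The sample rows are made up for illustration, and for point 1.b. it stores only the part up to the last comma, as assumed above:

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

my %hash;
for my $row ("xyz", "abc,5", "a,b,c,5", "d,e,f") {
    if ($row =~ /,/) {                        # 1. the string has commas
        my ($path, $value) = $row =~ /^(.*),([^,]*)$/;
        if (looks_like_number($value)) {      # 1.a. last substring is a number
            $hash{$path} = $value;
        }
        else {                                # 1.b. last substring is not a number
            $hash{$path} = 1;
        }
    }
    else {                                    # 2. the string has no commas
        $hash{$row} = 1;
    }
}

print "$_ => $hash{$_}\n" for sort keys %hash;
```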
Of course there's lots of potential for shortening that, e.g. by combining it with my example code from here. Update: A really simple shortening:
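One way such a shortening might look. This is a sketch and not necessarily the code from the original post: the comma test and the capture are folded into a single match, and the number check becomes a ternary. The sample rows are made up for illustration.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

my %hash;
for my $row ("xyz", "abc,5", "a,b,c,5", "d,e,f") {
    if ($row =~ /^(.*),([^,]*)$/) {
        # $1 is everything before the last comma, $2 everything after it.
        $hash{$1} = looks_like_number($2) ? $2 : 1;
    }
    else {
        $hash{$row} = 1;
    }
}

print "$_ => $hash{$_}\n" for sort keys %hash;
```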
Re: difference in regex
by haj (Vicar) on May 29, 2018 at 13:58 UTC
A regular expression can tell you two things: whether there's a match at all, and what some parts in the match are. You are using it in different ways, on two levels:
$row =~ /.*,(.*)/ is a pattern match. It returns whether $row contains the pattern. If you have parentheses in the regex (and you have), then the part of the match within the parentheses is captured - and if you evaluate the pattern match in list context, these captures will be returned as a list. By writing my ($value) you create a list context, therefore you get whatever matched after the last comma.
$row =~ s/,[^,]*$// is a substitution s/pattern/replacement/. Substitutions change the variable they operate upon, and they return the number of substitutions made, regardless of context. Hence the 1 in the second line: one substitution. You get the substring before the last comma in the variable $row by deleting the last comma and whatever follows it.
If you want the second example to behave like the first, add a capture, and replace the substitution by a match, like this:
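A sketch of that change, assuming the sample row from the question. The capture now wraps the part before the last comma, and the match in list context returns it:

```perl
use strict;
use warnings;

my $row = "a,b,c,d,15";

# Match instead of substitute: capture everything before the last comma.
my ($val) = $row =~ /(.*),[^,]*$/;

print "val=$val\n";   # prints "val=a,b,c,d"
```

Unlike the substitution, this leaves $row unchanged.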
Re: difference in regex
by wjw (Priest) on May 29, 2018 at 13:57 UTC
I sometimes go to the following to remind myself of how pattern matching works. It is usually enough to jar loose something in my cluttered memory to get me going.
Cluttered Memory Shaker
...the majority is always wrong, and always the last to know about it...
A solution is nothing more than a clearly stated problem...
Re: difference in regex
by Veltro (Hermit) on May 29, 2018 at 13:15 UTC
Re: difference in regex
by haukex (Archbishop) on May 30, 2018 at 13:47 UTC
Re: difference in regex
by anonymized user 468275 (Curate) on May 30, 2018 at 10:46 UTC
(untested)
Re: difference in regex
by sundialsvc4 (Abbot) on May 29, 2018 at 14:27 UTC
(After dropping up-votes on every single comment in this thread up to now ...) If what you literally want to do is to “split the string by commas and take the last piece,” what I would probably have done is to first split the string on a comma, then pop the last entry off the resulting array. This will work whether-or-not there is actually a comma in the string, since in that case the array will contain only one entry. I would prefer this approach because it represents a literal interpretation of how you originally described your objective, and because it’s how I am accustomed to see this sort of thing being done most of the time. (The split function has many useful features – read the doc page in its entirety.)
by jwkrahn (Abbot) on May 29, 2018 at 18:26 UTC
what I would probably have done is to first split the string on a comma, then pop the last entry off the resulting array.
As in:
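A sketch of the split-then-pop approach, assuming the sample row from the question; the array name @fields is illustrative:

```perl
use strict;
use warnings;

my $row = "a,b,c,d,15";

# Split on commas, then take the last element off the array.
my @fields = split /,/, $row;
my $value  = pop @fields;

print "value=$value rest=@fields\n";   # prints "value=15 rest=a b c d"
```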
But you don't really need an array to do that because you can get the last value directly from the list that split returns:
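A sketch of the list-slice form, assuming the sample row from the question. The index -1 selects the last element of the list that split returns:

```perl
use strict;
use warnings;

my $row = "a,b,c,d,15";

# Slice the list returned by split; no intermediate array is needed.
my $value = (split /,/, $row)[-1];

print "value=$value\n";   # prints "value=15"
```

Note that split drops trailing empty fields by default, so a row ending in a comma yields the last non-empty field instead.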
by haukex (Archbishop) on May 30, 2018 at 13:32 UTC
I'm all for TIMTOWTDI, but I wondered about performance. And it turns out that while a split version is faster for short strings, performance suffers a lot the more commas there are in the string:
Update: This version using rindex beats both split and the regex.
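A sketch of how an rindex-based version might look. The helper name last_field is an assumption and this is not the benchmarked code from the original post. rindex finds the position of the last comma and substr cuts the string there:

```perl
use strict;
use warnings;

# Hypothetical helper: returns (path, value); value is undef without a comma.
sub last_field {
    my ($row) = @_;
    my $pos = rindex($row, ',');          # -1 when there is no comma
    return ($row, undef) if $pos < 0;
    return (substr($row, 0, $pos), substr($row, $pos + 1));
}

for my $row ("abc", "abc,5", "a,b,c,5") {
    my ($path, $value) = last_field($row);
    printf "%-8s path=%s value=%s\n", $row, $path, $value // 'undef';
}
```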