Re: Regex result being defined when it shouldn't be(?)
by haukex (Archbishop) on Nov 14, 2017 at 15:36 UTC
I haven't fully evaluated or tested your code, but a couple of comments and, if I understood correctly, the answer to your question:
P.p.s: After thinking about why I would've been using the quantifiers outside vs inside, separate from maybe capturing only one repetition of a group, I figured it out:
Alternations. If you wanted a word among multiple choices but only 0-1 times, you have a couple of choices:
(this|that|third_thing)?
((this)?|(that)?|(third_thing)?)
The first one is pretty clear, I want 0 or 1 of any of those words. It will return undef if I have 0.
The second one, I don't even trust it. I think I could match all 3 if they happen in a row. Additionally, there's probably 4 capture groups created as a result.
A quick search on if I had used 'alternation' properly: https://docstore.mik.ua/orelly/perl4/prog/ch05_08.htm
"When you apply the ? to a subpattern that captures into a numbered variable, that variable will be undefined if there's no string to go there. If you used an empty alternative, it would still be false, but would be a defined null string instead."
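To see the difference that quote describes, here is a minimal sketch. The test string and the variable names are made up for illustration; only the two pattern shapes come from the discussion above.

```perl
use strict;
use warnings;

my $text = 'string';

# Quantified group: the group does not participate in the match, so it is undef.
my ($quantified) = $text =~ /(5)?string/;

# Empty alternative: the group participates and captures "", so it is defined.
my ($alternative) = $text =~ /(5|)string/;

print defined $quantified  ? "quantified: defined\n"  : "quantified: undef\n";
print defined $alternative ? "alternative: defined ('$alternative')\n" : "alternative: undef\n";
# prints:
# quantified: undef
# alternative: defined ('')
```

Both patterns match, but only the empty alternative leaves a defined (empty) capture behind.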
The second one, I don't even trust it. I think I could match all 3 if they happen in a row.
No, it's fine, it reads like so: Match one of the three choices: "this" or "", "that" or "", or "third_thing" or "". Just like in your first example, the parentheses and alternation operator make sure that it will match only one of the three choices at that place in the regex.
Additionally, there's probably 4 capture groups created as a result.
Correct, but you can use non-capturing (?: ) parens to avoid that, i.e. ((?:this)?|(?:that)?|(?:third_thing)?) would make it have only one capturing group, like your first example. <update> And AnomalousMonk made an excellent point about (?| ) here. </update>
I'd recommend a read of perlrequick, perlretut, and perlre for all of these features and the ones I mentioned earlier. Also, for playing around with regexes and testing out what they do, see my post here.
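If you want to check the group counts yourself, something like the following sketch should do. The ^ and $ anchors and the input 'that' are my additions, so that the regex engine has to pick the alternative that consumes the whole word instead of settling for an empty match.

```perl
use strict;
use warnings;

my $text = 'that';

# In list context a successful match returns one element per capture group.
my @plain  = $text =~ /^(this|that|third_thing)?$/;
my @nested = $text =~ /^((this)?|(that)?|(third_thing)?)$/;
my @noncap = $text =~ /^((?:this)?|(?:that)?|(?:third_thing)?)$/;

printf "%d %d %d\n", scalar @plain, scalar @nested, scalar @noncap;
# prints: 1 4 1
```

In the nested version the outer group and the third group both hold 'that', while the groups for the alternatives that did not match stay undef.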
c:\@Work\Perl\monks>perl -wMstrict -MData::Dump -le
"my $s = 'apathetic';
;;
my @captures = $s =~ m{ (pat) | (te) | (rn) }xms;
dd \@captures;
;;
@captures = $s =~ m{ (?| (pat) | (te) | (rn)) }xms;
dd \@captures;
"
["pat", undef, undef]
["pat"]
See Extended Patterns in perlre.
Give a man a fish: <%-{-{-{-<
I'm not doing something if it is defined, I'm doing it if it's NOT defined.
An annoyance to me (I come from a C background) is a variable failing to be defined does NOT return 0 or a 'FALSE' definition, it returns "". For safety reasons and explicitness, I program in the explicit results of tests, i.e. defined $var eq "" or defined $var ne "". Using simply 'defined $var' and '! defined $var' isn't as clear about what Perl is doing internally.
If I do print "$3" from a match on 'var = 10' I do not get the same as print "". Regex DO NOT return "" on failing to match, they return undef. After further testing, it appears the difference is where the quantifier comes in:
use strict;
use warnings;
my $string = "string";
if( $string =~ m/([5]?)string/ ){
    print "? inside group: $1\n"; #prints fine
}
if( $string =~ m/([5])?string/ ){
    print "? outside group: $1\n"; #Use of uninitialized value $1 in concatenation (.) or string...
}
exit 0;
P.s. the reason I'm doing this manually is because I'm making it as portable as possible and sensible to me. I'm running Perl on Windows 7/8/10, modern Linux, a Debian 2.6.32, etc. Production environment with too many distributions, internal/external network, all that jazz. I already had an issue where a CPAN module I would've liked had some Linux-only make commands.
An annoyance to me (I come from a C background) is a variable failing to be defined does NOT return 0 or a 'FALSE' definition, it returns "".
Actually, that's not exactly what is going on. Perl has a special "false" value that is 0 when used in numeric context and "" in a string context, so in Perl if (boolean) and if (!boolean) are actually "explicit" tests for truth and falsehood for functions that return "true" and "false" values (this applies to just about every builtin, of course there are some rare special cases). Have a look at Truth and Falsehood. Once you get used to this, I hope you'll find if (!defined(...)) (or any of its variants like if (not defined(...)) or unless (defined(...))) more natural. At least personally, I was initially confused when I read if ( defined($x = $1) eq "" ), and I thought you might accidentally be misapplying an idiom like if ( (my $x = $1) eq "foo" ) (which does the assignment and then the comparison).
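A small sketch of that special false value, in case it helps. The variable names are hypothetical, and I'm assuming nothing beyond core Perl:

```perl
use strict;
use warnings;

my $undefined;                      # never assigned, so it is undef
my $result = defined $undefined;    # Perl's special false value, not undef

print "string context:  '", $result, "'\n";      # ''
print "numeric context: ", $result + 0, "\n";    # 0, and no "isn't numeric" warning
print "is defined:      ", (defined $result ? "yes" : "no"), "\n";   # yes
```

So the result of a failed defined test is itself a defined value; it just happens to look like "" as a string and 0 as a number.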
If I do print "$3" from a match on 'var = 10' I do not get the same as print "". Regex DO NOT return "" on failing to match, they return undef. After further testing, it appears the difference is where the quantifier comes in:
Right, which is why I left your $3, that is (])?, out of my explanation, and explicitly referred to your $1 (([@%\$]?)), which you were asking about :-)
... portable ... I already had an issue where a CPAN module I would've liked had some Linux-only make commands.
According to CPAN Testers, Config::Perl runs on Linux, MSWin32, Cygwin, Darwin (Mac OS X), and various *BSD, and from Perl versions 5.8.1 thru 5.26.1.
Update 2019-08-17: Updated the link to "Truth and Falsehood".
defined $var or equivalently defined($var) will return the integer 1 (which is a TRUE value; 1 is also TRUE in C, so this shouldn't confuse you) if the variable is defined. It will return a FALSE value (see haukex's answer) if the variable is undefined. You then take that value, either 1 or the FALSE value, and stringify it. The integer 1 stringifies into "1". The FALSE value stringifies into "". If you don't want FALSE to become "", don't stringify. (The eq operator is forcing the stringification on both its arguments.)
If you really just want a boolean that decides whether the $var is defined or not, just use the truthiness of the result of defined $var -- that is explicitly the boolean test for whether the $var is defined, and the defined $var and !defined $var syntax are explicitly saying "variable is defined" and "variable is not defined". This is similar to C: if you define a function int is_five(int x) { return (x==5); }, then the return values of is_five(var) and !is_five(var) are explicit ways of testing whether or not the variable is 5. From your claim, in C, I would have to write is_five(var)==1 to verify that var is 5, and is_five(var)==0 to verify that var is not 5, which I vehemently disagree with: that notation obfuscates what C is doing rather than clarifying what it's doing internally. Just trust that Perl will do the right thing with boolean expressions in a boolean context, just like you trust that C does the right thing with boolean results in a boolean context.
If it's the lack of parentheses that is confusing you, then use the parentheses.
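As a rough illustration (the @hits array and its labels are only there for the demo), the plain boolean forms and the stringified comparison all agree for an undefined variable:

```perl
use strict;
use warnings;

my $var;    # deliberately left undefined

my @hits;
push @hits, 'not'    if !defined $var;          # plain boolean test
push @hits, 'unless' unless defined($var);      # same test, with parentheses
push @hits, 'eq'     if defined($var) eq "";    # stringified comparison

print "@hits\n";
# prints: not unless eq
```

All three branches fire, so the eq "" comparison buys nothing over the plain boolean test.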
Aside: Urgh... I did one last refresh before hitting create, and saw that haukex beat me by a minute or two again. :-(. I went to all the trouble of writing this up, so I'll hit create anyway.
update: I was wrong: defined($var) doesn't return undef or 1; it returns the special value, as haukex said.
c:> perl -le "print defined($x)//'<undef>'; print defined($x)||'untrue'"
untrue
c:>
Re: Regex result being defined when it shouldn't be(?)
by choroba (Cardinal) on Nov 14, 2017 at 16:42 UTC
There are modules on CPAN that can help you build a parser. For example, you can use Marpa::R2 in the following way:
#!/usr/bin/perl
use warnings;
use strict;
use Marpa::R2;
my $dsl = << '__DSL__';
lexeme default = latm => 1
:default ::= action => ::first
Config ::= Assignment Config action => merge
| Assignment action => create_config
Assignment ::= Var (space equals space) Value (space) action => assign
Value ::= number
| String
String ::= (quote) Quoteds (quote)
Quoteds ::= Quoted Quoteds action => concat
| Quoted
Quoted ::= nonquote
| quotedquote action => quote
Var ::= Name
|| Array
Array ::= atsign Name Index action => name_index
Name ::= alpha alnum action => concat
Index ::= (leftsquare) number (rightsquare)
space ~ [\s]*
alnum ~ [\w]+
alpha ~ [[:alpha:]]
atsign ~ '@'
equals ~ '='
leftsquare ~ '['
nonquote ~ [^']
number ~ [\d]+
quotedquote ~ '\'[']
quote ~ [']
rightsquare ~ ']'
__DSL__
sub concat { $_[1] . $_[2] }
sub name_index { [ $_[2], $_[3] ] }
sub quote { "'" }
sub assign { [ ref $_[1] ? @{ $_[1] } : $_[1], $_[2] ] }
sub merge {
my %config = %{ $_[-1] };
    (2 == @{ $_[1] } ? $config{ $_[1][0] } : $config{ $_[1][0] }[ $_[1][1] ])
        //= $_[1][-1];
return \%config
}
sub create_config {
my %config;
$config{ $_[1][0] }
= @{ $_[1] } == 2 ? $_[1][1]
: do {
my $ar = [];
$ar->[ $_[1][1] ] = $_[1][2];
$ar
};
\%config
}
my $grammar = 'Marpa::R2::Scanless::G'->new({source => \$dsl});
my $input = do { local $/; <DATA> };
my %config = %${ $grammar->parse(\$input, 'main') };
use Data::Dumper; print Dumper \%config
__DATA__
@arr[2] = 3
str = 'xyz'
@arr[2] = 5
str = 'abc\'d'
@arr[1] = '#+#:'
Output:
$VAR1 = {
'str' => 'abc\'d',
'arr' => [
undef,
'#+#:',
'5'
]
};
($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord
}map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,