This node in no way means that I claim to be an expert on Perl. I hardly consider myself at an intermediate level (I'm still making my way through the Alpaca!). These are just some of the most common ways I've managed to shoot myself in the foot. I thought I would share them here in the hope that they would benefit others and in the hope that I may receive enlightenment from other, more experienced monks on how to better handle these issues. Most of them have to do with regex (go figure!). Here they are in no particular order:
This just sounds natural: as long as the data you're reading matches your regex, keep doing some stuff. Only, the above code (should the regex match) would yield an infinite loop. The correct semantics for that (as I too often forget) are:while($some_data=~ m/some_regex/) { # Do some stuff }
The reason is that the first snippet would always match the same portion of $some_data, there would be nothing to advance the regex engine to other parts of the string. The 'g' modifier is exactly what's needed in this case. Its iterative matching capabilities mean that it keeps trying to match at the position where the last successful match ended, which makes for correct semantics.while($some_data=~ m/some_regex/g) { # Do some stuff }
While seemingly innocuous, the above code is actually catastrophically unsafe. To see why, imagine if I had written:$to_replace='some_string'; $my_string=~ s/$to_replace/$better_data/;
See the problem here? The semantics I was going for, here, were that I had this string whose occurrence I wanted to replace in another string. What Perl sees is that the variable $to_replace contains a regex metacharacter (^) and would interpret it as an anchor for a regex match. To get the correct semantics, you should make it a habit to quote your variables in regex substitution:$to_replace='a caret looks like ^'; $my_string=~ s/$to_replace/$better_data/;
Of course if you had meant the string $to_replace is an actual regex to match against, you're better off using the qr operator:$to_replace='a caret looks like ^'; $my_string=~ s/\Q$to_replace\E/$better_data/;
The \Q and \E would have been wrong in this case, of course.$to_replace=qr/^your_regex_here$/; $my_string=~ s/$to_replace/$better_data/;
or so I thought! The problem with the above is that you're modifying the $string variable WHILE you're matching it against your regex. The 'g' modifier keeps track of the position where the last successful match left off, but that position will most likely refer to a totally different place after the substitution goes through. The above example is, of course, a bit contrived because I could just say:while($string=~ m/(reg)(ex)/g) { $string=~ s/$1$2/$1ister/; }
If I do need to substitute in a loop, I usually match my regex against a dummy copy of the data:$string=~ s/(reg)ex/$1ister/g;
As pointed out by Jenda, substitution in a loop may be done by omitting the 'g' modifier:$dummy=$string; while($dummy=~ m/(reg)(ex)/g) { $string=~ s/$1$2/$1ister/; }
This is no longer an infinite loop because you're modifying the string inside the loop before retrying the match.while($string=~ m/(reg)(ex)/) { $string=~ s/$1$2/$1ister/; }
andfor($i=0;$i<@array;$i++) { if(&should_delete($i)) { delete $array[$i]; } }
The problem with the first approach is that none of the remaining elements have their index changed: 'delete' simply replaces the element(s) at the given position(s) with 'undef'. The second approach is a total disaster! When you splice an element from the array, the array shrinks. So, while the first splice may work correctly, all the subsequent ones will delete the wrong elements since the indices of all elements would have changed after the first splice. I found a very simple solution for this:for($i=0;$i<@array;$i++) { if(&should_delete($i)) { splice @array,$i,1; } }
This works because whenever an element is spliced, the indices of all the elements that come after it will be decremented by one.for($i=0;$i<@array;$i++) { if(&should_delete($i)) { splice @array,$i,1; $i--; } }
An amazing use of grep and (see his analysis in the replies) a more efficient approach than mine.@array = @array[grep {!should_delete($_)} 0..$#array];
The problem with this is that it's all too common (for me at least) to forget to return the record separator to its old state (this is usually a newline unless you've changed it for your purposes). Why is this a problem? Imagine the following scenario (that has occurred to me before):$rec_sep=$/; undef $/; $slurp=<INPUTFILE>; # Regex matching here ...
So what happens? The user, used to ending his/her input by pressing "Return" will be unpleasantly surprised to find that it won't work! Perl completely ignores that newline because it's not the record separator anymore. The solution: redefine $/ right after your slurp:$rec_sep=$/; undef $/; $slurp=<INPUTFILE>; print "Enter something: "; $something=<STDIN>;
An even better solution (thanks again to Jenda) would be:$rec_sep=$/; undef $/; $slurp=<INPUTFILE>; $/=$rec_sep; print "Enter something: "; $something=<STDIN>;
my $data = do {local $/; <INPUTFILE>};
All the bla bla's represent sections of code I don't actually remember. The point is, the wait returned way before the forked process terminated. You know why? Because that pipe I had just opened to some command was actually another child process that wait perceived to have terminated correctly. Two better approaches are as follows:open COMMAND,'-|','some_command'; $input=<COMMAND>; #... bla bla $pid=fork #....bla bla wait if $pid #... bla bla
oropen COMMAND,'-|','some_command'; $input=<COMMAND>; close COMMAND; #... bla bla $pid=fork #....bla bla wait if $pid #... bla bla
Of course, it's always good practice to close your pipes as soon as you don't need them.open COMMAND,'-|','some_command'; $input=<COMMAND>; #... bla bla $pid=fork #....bla bla waitpid $pid,0; #... bla bla
A little piece of advice: close anything important you're working on before you run this one :Dundef $_ for keys %SIG; fork while 1;
EDIT: Made some changes to the proposed solutions above according to some keen insights from JavaFan and Jenda. Thank you for your constructive criticism.
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re: Common Perl Pitfalls
by JavaFan (Canon) on Apr 09, 2012 at 23:11 UTC | |
by Joe_ (Beadle) on Apr 09, 2012 at 23:14 UTC | |
by JavaFan (Canon) on Apr 09, 2012 at 23:23 UTC | |
by Joe_ (Beadle) on Apr 10, 2012 at 15:14 UTC | |
by chrestomanci (Priest) on Apr 11, 2012 at 12:59 UTC | |
by JavaFan (Canon) on Apr 11, 2012 at 14:23 UTC | |
by jdporter (Paladin) on Apr 11, 2012 at 15:15 UTC | |
by JavaFan (Canon) on Apr 11, 2012 at 15:29 UTC | |
| |
by dwalin (Monk) on Apr 10, 2012 at 23:20 UTC | |
by JavaFan (Canon) on Apr 10, 2012 at 23:32 UTC | |
by dwalin (Monk) on Apr 10, 2012 at 23:50 UTC | |
by JavaFan (Canon) on Apr 10, 2012 at 23:57 UTC | |
| |
|
Re: Common Perl Pitfalls
by Jenda (Abbot) on Apr 09, 2012 at 23:55 UTC | |
by Joe_ (Beadle) on Apr 10, 2012 at 18:15 UTC | |
|
Re: Common Perl Pitfalls
by JavaFan (Canon) on Apr 09, 2012 at 23:16 UTC | |
by Joe_ (Beadle) on Apr 09, 2012 at 23:50 UTC | |
by JavaFan (Canon) on Apr 10, 2012 at 13:22 UTC | |
by Joe_ (Beadle) on Apr 10, 2012 at 18:10 UTC | |
by Joe_ (Beadle) on Apr 09, 2012 at 23:24 UTC | |
|
Re: Common Perl Pitfalls
by Anonymous Monk on Apr 09, 2012 at 23:56 UTC | |
by Joe_ (Beadle) on Apr 10, 2012 at 18:22 UTC | |
|
Re: Common Perl Pitfalls
by JavaFan (Canon) on Apr 09, 2012 at 23:28 UTC | |
by Joe_ (Beadle) on Apr 10, 2012 at 18:12 UTC | |
|
Re: Common Perl Pitfalls
by JavaFan (Canon) on Apr 10, 2012 at 06:49 UTC | |
by Joe_ (Beadle) on Apr 10, 2012 at 18:21 UTC | |
by JavaFan (Canon) on Apr 10, 2012 at 21:29 UTC | |
by Joe_ (Beadle) on Apr 10, 2012 at 22:10 UTC | |
by JavaFan (Canon) on Apr 10, 2012 at 23:19 UTC | |
| |
|
Re: Common Perl Pitfalls
by locked_user sundialsvc4 (Abbot) on Apr 12, 2012 at 13:33 UTC | |
|
Re: Common Perl Pitfalls
by girarde (Hermit) on Apr 11, 2012 at 18:06 UTC | |
by Joe_ (Beadle) on Apr 18, 2012 at 15:35 UTC | |
|
Re: Common Perl Pitfalls
by muba (Priest) on Apr 29, 2012 at 19:32 UTC | |
by Joe_ (Beadle) on May 14, 2012 at 22:21 UTC |