Re: substitution in regular expression
by AnomalousMonk (Archbishop) on Apr 23, 2014 at 19:59 UTC
c:\@Work\Perl\monks>perl -wMstrict -le
"my $s = 'ABCDEF';
;;
my @triplets = $s =~ m{ (?= (...)) }xmsg;
printf qq{'$_' } for @triplets;
"
'ABC' 'BCD' 'CDE' 'DEF'
If you want to simultaneously do substitutions to change the match string so that it ends up as 'DEF' or 'EF', that's trickier (at least, it's tricky to do with a single substitution operation), but I'm assuming substitution is just an artifact of the potential approach you happened to come up with, i.e., it's an XY Problem. Please advise on this point.
Update: See Re^3: substitution in regular expression for a string-modifying s/// solution.
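For anyone who wants the lookahead approach above as a script rather than a Windows one-liner, here is a minimal sketch. The sub name overlapping_triplets is hypothetical (it is not from the thread); the regex is the same one used above.

```perl
use strict;
use warnings;

# Hypothetical helper (name not from the thread): return every
# overlapping 3-character window of a string.
sub overlapping_triplets {
    my ($s) = @_;
    # The lookahead (?= ... ) consumes no characters, so each overall
    # match is zero-length and the engine then moves on one character.
    # Every attempt still captures the next three characters into $1.
    # In list context, m//g returns the list of all captures.
    return $s =~ m{ (?= (...) ) }xmsg;
}

my @triplets = overlapping_triplets('ABCDEF');
print join(' ', @triplets), "\n";    # ABC BCD CDE DEF
```

Note that the string itself is left untouched here; only the captures are collected.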
Thanks for the help, it's ok to modify the string. I would like to do it in the way I described if it's possible.
c:\@Work\Perl\monks>perl -wMstrict -le
"my $s = 'ABCDEF';
;;
my @triplets;
$s =~ s{ (?= (...)) . }{ push @triplets, $1; ''; }xmsge;
;;
print qq{'$s'};
printf qq{'$_' } for @triplets;
"
'EF'
'ABC' 'BCD' 'CDE' 'DEF'
Update: Here's another s/// solution. I'm not sure I like it so much: I'm always suspicious, perhaps without cause, of code embedded in a regex. In addition, the @triplets array must be a package global (ideally local-ized) due to a bug in lexical management that wasn't fixed until Perl version 5.16 or 5.18 (I think — I don't have access to these versions and I'm too lazy to check the on-line docs).
c:\@Work\Perl\monks>perl -wMstrict -le
"my $s = 'ABCDEF';
;;
local our @triplets;
$s =~ s{ (?= (...) (?{ push @triplets, $^N })) . }''xmsg;
;;
print qq{'$s'};
printf qq{'$_' } for @triplets;
"
'EF'
'ABC' 'BCD' 'CDE' 'DEF'
Re: substitution in regular expression
by Anonymous Monk on Apr 23, 2014 at 19:54 UTC
You've placed code (push(@dic,$1);) inside the replacement part of the regular expression.
Each three-letter code will be replaced by the string "push(@dic,$1);" instead of the code being executed,
and because of that the string will never get shorter.
Even though you could get the code to execute by adding the /e modifier on the regex, it still wouldn't do what you want (since the replacement value would be the return value of the push call), and so it's better to just move that code outside the regular expression.
Since you're matching three letters with your regular expression, and you want to replace those with the last two of those three letters, it's easier to just write it that way:
while(length($line)>2){
    $line =~ s/([A-Z]([A-Z]{2}))/$2/;
    push(@dic, $1);
}
I'm sure other monks will have (TI)MTOWTDI and more elegant solutions, but the above gets what you want with only a few changes.
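To illustrate the point about /e above, here is a small sketch. It assumes a pattern like ([A-Z]{3}), since the original question's regex is not shown in this thread; the variable names follow the snippet above.

```perl
use strict;
use warnings;

my $line = 'ABCDEF';
my @dic;

# With /e the replacement is evaluated as Perl code and its return value
# becomes the replacement text. push returns the new number of elements
# in the array, so the matched letters are replaced by that count.
$line =~ s/([A-Z]{3})/push(@dic, $1)/e;

print "line: $line\n";    # line: 1DEF
print "dic:  @dic\n";     # dic:  ABC
```

So the push does run, but the string ends up with a stray number in it, which is why moving the push outside the substitution is the simpler fix.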
It seems you've edited your node to remove the while loop you originally had. Please don't do that without marking your updates: it confuses things, and now monks won't know which version of your question to answer.
"... it doesn't seem to work"
In what way? Do you get an error, or are you seeing unexpected results? Because it works for me:
use Data::Dumper;
print Dumper([build_dictionnary()]);
sub build_dictionnary{
    my $line="ABCDEF";
    my @dic;
    while(length($line)>2){
        $line =~ s/([A-Z]([A-Z]{2}))/$2/;
        push(@dic, $1);
    }
    return @dic;
}
# Output (whitespace compressed):
# $VAR1 = [ 'ABC', 'BCD', 'CDE', 'DEF' ];
"I would like to know the way to do it without the while loop."
Why?
Re: substitution in regular expression
by Laurent_R (Canon) on Apr 23, 2014 at 21:37 UTC
A regex might not be the best tool here. And don't be misled about whether or not you are using a loop. When you use the s///g operator (i.e. with the g modifier), you are in effect running an implicit loop, even if it does not look like one. Likewise, when you use the grep or map function, it may look as if you are not looping over the source list or array, but you are still doing an implicit loop (and the explicit loop of a for/foreach solution might often be slightly quicker).
All this is to introduce a rather concise solution with an explicit loop, in the following Perl one-liner:
$ perl -le 'my $s = "ABCDEF"; print substr $s, $_, 3 for 0..length($s)-3;'
ABC
BCD
CDE
DEF
I did not check, but it is likely to be faster than any regex on large input. Check it and tell your teacher about your findings on the various solutions; you might get an A+.
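If you want to check the speed claim yourself, something along these lines could be a starting point. It is only a sketch using the core Benchmark module: the sub names by_substr and by_regex and the test string are made up for illustration, and the regex candidate is the lookahead match posted earlier in this thread. No timings are claimed here; run it on your own data.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test input: the alphabet repeated 400 times.
my $s = join '', ('A' .. 'Z') x 400;

# Explicit loop over start offsets, as in the one-liner above.
sub by_substr {
    my ($str) = @_;
    my @out;
    push @out, substr($str, $_, 3) for 0 .. length($str) - 3;
    return @out;
}

# Lookahead capture, as in the m//g solution earlier in the thread.
sub by_regex {
    my ($str) = @_;
    return $str =~ m{ (?= (...) ) }xmsg;
}

# Run each candidate for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    substr_loop => sub { my @t = by_substr($s) },
    regex       => sub { my @t = by_regex($s) },
});
```

Both subs return the same list for the same input, so the comparison is like for like.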
Thanks for the additional idea and explanations. Good to see you have a sense of humor as well ;)
Re: substitution in regular expression
by trizen (Hermit) on Apr 23, 2014 at 22:15 UTC
One-line solution:
"ABCDEF" =~ /([A-Z]{3})(?{print "$1\n"})(?!)/;
Thanks for the idea, I'll write it down. Just one thing, could you explain the:
(?{print "$1\n"})(?!)
I don't understand the question mark before the print block. Also, why the (?!) at the end? I noticed that removing it only prints ABC, but I don't understand why.
Thank you
Short explanation: (?{...}) executes arbitrary Perl code inside a regular expression, and (?!) forces the regex engine to fail and backtrack, so it retries the match from the next starting position (last_pos + 1). When it matches ABC, it prints it, fails, backtracks, and starts matching the next three letters from B, giving us BCD. The process repeats until the engine's starting position reaches the end of the string.
I know, I'm really bad at explaining things to humans, but, fortunately, Athanasius explained this better once.
Please see: Re: RegEx + vs. {1,}
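To see the effect of (?!) side by side, here is a small sketch that collects the captures into arrays instead of printing them. The array names are made up for illustration, and package variables are used on the assumption that lexicals inside (?{...}) may misbehave on older perls, as mentioned earlier in this thread.

```perl
use strict;
use warnings;

# Package variables, to stay compatible with older perls where lexicals
# inside (?{ ... }) blocks were unreliable.
our (@with_fail, @without_fail);

# (?!) can never match, so every attempt fails right after the code block
# has run; the engine then retries from the next starting position.
'ABCDEF' =~ /([A-Z]{3})(?{ push @with_fail, $1 })(?!)/;

# Without (?!), the first attempt succeeds and matching stops there.
'ABCDEF' =~ /([A-Z]{3})(?{ push @without_fail, $1 })/;

print "with (?!):    @with_fail\n";       # with (?!):    ABC BCD CDE DEF
print "without (?!): @without_fail\n";    # without (?!): ABC
```

The overall match with (?!) always fails, so only the side effects of the code block are useful.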