Re: Regex stored in a scalar
by Laurent_R (Canon) on Aug 21, 2015 at 19:26 UTC
s/\),\(/\)\n\(/
is a substitution operator, not a regex. Only the part between the first two slashes is a regex; the rest is not.
So if you want to make substitutions, you probably want to read two inputs from the user or from the command line: the search pattern and the substitution (not tested).
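my $regex = <STDIN>;    # the search pattern, e.g. \),\(
chomp $regex;
my $subst = <STDIN>;    # the replacement text
chomp $subst;
while (<INPUTFILE>) {   # INPUTFILE is assumed to be opened already
    s/$regex/$subst/gi; # acts on $_, the current line
    print;
}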
And for a big file, you might want to add the /o modifier; it may be faster (but that may depend on your version of Perl).
Update: Oh, BTW, if you're used to doing it in vi, you might consider sed. You should feel at home.
Update 2: I crossed out the first part of my answer, as it was, to say the least, very incomplete, as AnomalousMonk kindly pointed out. Only the second part was really relevant to the OP's problem.
It would be better to quote the pattern with qr// rather than using /o. Like this:
$regex = qr/$regex/;
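A minimal sketch of the qr// suggestion, assuming the pattern and replacement have already been read and chomped (they are hard-coded below in place of <STDIN>); apply_subst is a hypothetical helper name. The pattern is compiled once, before the loop, so /o isn't needed:

```perl
use strict;
use warnings;

# Hypothetical helper: apply one pre-compiled substitution to every line.
sub apply_subst {
    my ($pattern, $subst, @lines) = @_;
    my $re = qr/$pattern/i;       # compiled once; /i as in the OP's s///gi
    s/$re/$subst/g for @lines;    # $subst is inserted as plain text
    return @lines;
}

# In the real script these strings would come from <STDIN>.
my @out = apply_subst('\),\(', ')|(', '(abc),(def)', 'no match here');
print "$_\n" for @out;
```

Note that the replacement is interpolated as plain text, so a \n typed by the user stays a backslash followed by an n.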
Yes, you're probably right, ++. This was just a quick additional note about speed, not much to do with the OP's question.
c:\@Work\Perl>perl -wMstrict -le
"my $regex = '(?xms) ((.) \2{2,})';
;;
for my $s (qw(aeiou aeeiou aeiiiou aeioooou)) {
print qq{match: captured '$1'} if $s =~ $regex;
}
"
match: captured 'iii'
match: captured 'oooo'
... but that won't work ...
The first comment I made is actually rather trivial in the face of your second point; the substitution s/\),\(/\)\n\(/ is, indeed, a substitution and not a regex — and bang, the whole endeavor hits a brick wall.
Give a man a fish: <%-{-{-{-<
Yes, AnomalousMonk, you are right ++, and I actually knew that what I was saying was not quite right. I wasn't thinking of what you said in your comments; I was thinking that you can have a regex in the form:
if (m{pattern}) { # ...
or with many other delimiter pairs. I really wanted to get that point out of the way quickly to get to the real issue, the substitution not being a regex, so I was a bit negligent in the way I wrote that first part.
You're absolutely right; the first part of my comment deserved correction.
sed would likely be wicked fast for this, too.
It depends on the quality of the sed implementation (and also on the Perl version). I have seen cases where Perl was 2 to 5 times faster than either sed or awk (I don't remember for sure which vendor's), although this was more than 10 years ago. In more recent tests I made (albeit on a rather old OS), there was no significant difference, and it would also depend on the complexity of the processing being applied.
I think that, in general, tests are required to decide the best way to go (if it matters at all, e.g. if your files to be processed are really so large that it will make a significant difference for you).
I do like your sed suggestion. I have a snippet I've used for 20 years to process one file or many, and it's fast and light on system load. I've changed tens of thousands of files on a server in mere moments (after extensive testing, of course, to avoid having to restore from backup!).
This command will find and replace the string 'old' with 'new' in all files with an .htm/.html extension, recursively from the directory where you run it. Be careful, there's no undo! Use your regex as usual. Hopefully someone will find this snippet useful; I sure have, thousands of times! The -print0 and xargs -0 pair keeps file names containing spaces intact.
find . -name '*.htm*' -type f -print0 | xargs -0 sed -i 's/old/new/g'
True, it won't work this way, but this quick test under the Perl debugger shows that it might be feasible with a slight syntax tweak and one further processing step. Here, I am passing the replacement string with \n to my debugger session:
$ perl -de 42 foo\\n
Loading DB routines from perl5db.pl version 1.33
Editor support available.
Enter h or `h h' for help, or `man perldebug' for more help.
main::(-e:1): 42
DB<1> $c = shift;
DB<2> x $c
0 'foo\\n'
DB<3> $c =~ s|\\n|\n|;
DB<4> x $c
0 'foo
'
DB<5>
So, this seems to work, although it might not be the most elegant construct. Of course, it also assumes that you know beforehand, when writing your program, that some \n sequences in the input may need to be turned into real newlines.
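The same extra step can also be done without the debugger. A minimal sketch, assuming the user types the replacement exactly as in the OP's s/// (e.g. \)\n\() and that only \n and \t need to become special characters; unescape is a hypothetical helper name, and no eval is involved:

```perl
use strict;
use warnings;

# Hypothetical helper: process backslash escapes typed by the user.
# \n and \t become newline and tab; any other escaped character
# (\), \(, \\) simply becomes that character itself.
sub unescape {
    my ($s) = @_;
    my %special = (n => "\n", t => "\t");
    $s =~ s/\\(.)/exists $special{$1} ? $special{$1} : $1/ge;
    return $s;
}

my $pattern = '\),\(';              # typed by the user, used as a regex
my $subst   = unescape('\)\n\(');   # typed by the user, as in the OP's s///
my $line    = '(abc),(def),(ghi)';
$line =~ s/$pattern/$subst/g;
print "$line\n";                    # (abc), (def) and (ghi) on separate lines
```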
Re: Regex stored in a scalar
by Anonymous Monk on Aug 21, 2015 at 19:36 UTC
If the only thing you're doing inside the loop is one or two regex substitutions and, the way you describe it, the scripts sound like throwaways, you may want to look at the -e, -p and maybe also -i switches in perlrun, i.e. write one-liners:
$ cat foo.txt
one
two
three
$ perl -wMstrict -pe 's/^t(?!h)/th/; s/(.)\1/$1/g' foo.txt > bar.txt
$ cat bar.txt
one
thwo
thre
$ perl -wMstrict -pe 's/th/ph/g' -i.bak bar.txt
$ cat bar.txt
one
phwo
phre
$ cat bar.txt.bak
one
thwo
thre
Re: Regex stored in a scalar
by BillKSmith (Monsignor) on Aug 22, 2015 at 03:43 UTC
You can eval your substitution, stored as a string, so that it acts on the default variable $_.
use strict;
use warnings;
my $string = '(abc),(def),(ghi)';
my $substitution = 's/\),\(/\)\n\(/gi';
$_ = $string;
eval "$substitution";
print;
OUTPUT:
(abc)
(def)
(ghi)
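One point worth adding: a string eval that fails to compile (for example, a typo in the user's s///) does not die; it returns undef and sets $@. A minimal sketch that checks for that, assuming the substitution arrives as a complete s/// string; run_subst is a hypothetical name, and the usual warning that string eval will execute arbitrary code still applies:

```perl
use strict;
use warnings;

# Hypothetical helper: run a user-supplied s/// (given as a string) on $text.
# WARNING: string eval executes any Perl code the string contains.
sub run_subst {
    my ($code, $text) = @_;
    local $_ = $text;               # the s/// acts on $_
    my $ok = eval "$code; 1";       # "; 1" makes success true even with no match
    die "Bad substitution '$code': $@" unless $ok;
    return $_;
}

print run_subst('s/\),\(/\)\n\(/gi', '(abc),(def),(ghi)'), "\n";
```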
Since my original reply, I have discovered that my concept of "build the command and evaluate it" can be generalized to meet your original requirement.
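use strict;
use warnings;
my $filein = shift @ARGV;    # input file name, taken from the command line
my $regex=<STDIN>; #Entering s/\),\(/\)\n\(/gi
chomp $regex;
open (INPUTFILE, "< $filein") or die "Cannot open $filein: $!";
while (<INPUTFILE>) {
    my $line=$_;
    #$line =~ $regex;           # original attempt: treats the whole string as a pattern
    eval "\$line =~ $regex";    # build the command as a string, then run it
    die $@ if $@;               # report a malformed s/// instead of silently ignoring it
    print $line;
};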
Re: Regex stored in a scalar
by atcroft (Abbot) on Aug 22, 2015 at 06:15 UTC
I wanted to do something similar to this recently, but with the left- and right-hand patterns stored in a database. The problem I ran into, however, was when I tried to use capture variables, such as in the following (contrived) example:
Any suggestions?
#! perl
use strict;
use warnings;
my $c = q{asdfghjk};
my @regex =
(
{ lh => q{(gh)}, rh => q{__$1__}, },
{ lh => q{(h_)}, rh => q{_h!$1!}, },
);
print q{Original: }, $c, "\n";
for my $i (0 .. $#regex)
{
if ($c =~ /$regex[$i]{lh}/)
{
my $s = $1;
my $d = $regex[$i]{rh};
$d =~ s/\$1/$s/;
$c =~ s/$regex[$i]{lh}/$d/;
}
}
print q{Final: }, $c, "\n";
Output:
17:37 >perl 1352_SoPW.pl
Original: asdfghjk
Final: asdf__g_h!h_!_jk
17:39 >
This is far from elegant, and I keep thinking there must be a simpler way involving s///ee — but I haven’t found it.
Anyway, hope that helps,
c:\@Work\Perl>perl -wMstrict -le
"my $c = q{asdfghjk};
print qq{ original: '$c'};
;;
my @regex = (
{ lh => q{(gh)}, rh => q{__$1__}, },
{ lh => q{(h_)}, rh => q{_h!$1!}, },
);
;;
for my $hr_s (@regex) {
$c =~ s[ (?-x)$hr_s->{lh}]{ qq{qq{$hr_s->{rh}}} }xmsgee;
print qq{intermediate: '$c'};
}
;;
print qq{ final: '$c'};
"
original: 'asdfghjk'
intermediate: 'asdf__gh__jk'
intermediate: 'asdf__g_h!h_!_jk'
final: 'asdf__g_h!h_!_jk'
Since s///ee involves a string eval (the second /e evals the string produced by the first), AnonyMonk's warning/advice here still holds. See Re: Evaluating $1 construct in literal replacement expression and associated nodes for more discussion.
$_ = 'foo';
$left = '(.)(.)';
$right = '$1$2$2$1';
s{$left}{"qq{$right}"}ee;
print "$_\n";
s{$left}{eval "qq{$right}"}e;
print "$_\n";
__END__
foofo
foofofo
The first /e turns the double-quoted "qq{$right}" into the string qq{$1$2$2$1}.
The second /e evaluates qq{$1$2$2$1} at the correct time (after the match, so $1 and $2 are set) and substitutes the result into the original string.
String eval is eval, so arbitrary code could be executed.
So, to make it safer, instead of eval, use some form of String::Interpolate/String::Interpolate::RE.
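If pulling in a module is not an option, the $1-style references can also be expanded by hand, with no string eval at all. A minimal sketch, assuming Perl 5.26 or later (for the @{^CAPTURE} array) and that the right-hand side only needs $1, $2, ... references; safe_subst and expand_template are hypothetical names:

```perl
use strict;
use warnings;

# Hypothetical helper: replace $1, $2, ... in $template with the given captures.
sub expand_template {
    my ($template, @caps) = @_;
    $template =~ s/\$([1-9][0-9]*)/defined $caps[$1 - 1] ? $caps[$1 - 1] : ''/ge;
    return $template;
}

# Hypothetical helper: apply one left-hand/right-hand pair from the database.
# A single /e runs ordinary compiled code; nothing from $rh is ever eval'd.
sub safe_subst {
    my ($text, $lh, $rh) = @_;
    $text =~ s{$lh}{
        my @caps = @{^CAPTURE};     # copy this match's captures first
        expand_template($rh, @caps);
    }ge;
    return $text;
}

my $c = 'asdfghjk';
for my $pair ([ '(gh)', '__$1__' ], [ '(h_)', '_h!$1!' ]) {
    $c = safe_subst($c, @$pair);
}
print "$c\n";    # asdf__g_h!h_!_jk
```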
Thanks Anonymous Monk and AnomalousMonk,
So, the technique is to doubly double-stringify the RHS before doubly evaluating it! It's analogous to the trick of using @{ [...] } to interpolate a function-returned list into a string.
I like String::Interpolate (the module, not its documentation!):
#! perl
use strict;
use warnings;
use String::Interpolate qw( interpolate );
my $c = q{asdfghjk};
my @regex =
(
{ lh => q{(gh)}, rh => q{__$1__}, },
{ lh => q{(h_)}, rh => q{_h!$1!}, },
);
print q{Original: }, $c, "\n";
for my $i (0 .. $#regex)
{
$c =~ s/ $regex[$i]{lh} / interpolate($regex[$i]{rh}) /ex;
}
print q{Final: }, $c, "\n";
Output:
13:37 >perl 1352_SoPW.pl
Original: asdfghjk
Final: asdf__g_h!h_!_jk
13:37 >
Cheers,
Re: Regex stored in a scalar
by 1nickt (Canon) on Aug 21, 2015 at 18:57 UTC
$line =~ /$regex/;
The way forward always starts with a minimal test.
Re: Regex stored in a scalar
by james28909 (Deacon) on Aug 21, 2015 at 19:43 UTC
EDIT: Updated script.
This works for me:
use strict;
use warnings;
print "Enter left side for search: ";
my $LeftSide = <STDIN>;
print "Enter right side replacement: ";
my $RightSide = <STDIN>;
chomp($LeftSide);
chomp($RightSide);
while(<DATA>){
print if s/$LeftSide/$RightSide/g;
#print "Replaced \"$LeftSide\" with \"$RightSide\" at: line $.: $_" if ($_ =~ s/$LeftSide/$RightSide/g);
}
__DATA__
hello
this. is line 2
line, 3 ),(
this is test for line 4 )
testing line ,5
now testing line (6)
I ran it with: script.pl \),\( \)\n\(
Output:
C:\Users\James\Desktop>test.pl \),\( \)\n\(
Replaced "\),\(" with "\)\n\(" at: line 3 "line, 3 \)\n\("
Posting some example input would be helpful :)
Hmm, I was just about to say something to the effect that your script was not very useful, but as I hit the reply button, I saw your edited version. This is indeed much more to the point.
I tested it, then noticed that he was indeed trying to search and replace, so I did a ninja edit; I was also about to change the ARGVs to <STDIN>s and chomp them as well.
__DATA__
(123),(456),(789)
(abc),(def),(ghi)
The result should be:
(123)
(456)
(789)
(abc)
(def)
(ghi)
poj
I was able to get that output by changing the script like this:
print $_ if s/$LeftSide/$RightSide/eegi;
Then enter the left side as just a comma (,) without quotes, and the right side as "\n" WITH the quotes.
EDIT: With the little snippet above, you can also enter \),\( as the left side and "\)\n\(" as the right side. From what I understand, the snippet above is the same as doing:
print $_ if s/$LeftSide/eval $RightSide/egi;
So I also suggest reading PerlDoc: eval.
Here's another, more hackish way to get the job done, haha:
Create your main script like so, with placeholder keywords in the s///:
# main.pl
# all this is just a template that creates "run.pl"
use strict;
use warnings;
while(<DATA>){
print $_ if s/search_here/replace_here/gi;
}
__DATA__
(123),(456),(789)
(abc),(def),(ghi)
Then the following script will replace the keywords "search_here" and "replace_here" in the script above with whatever you input, and write the result to "run.pl"!
# prepare_run.pl
use strict;
use warnings;
open my $file, '<', 'main.pl' or die "Cannot open main.pl: $!"; # your original script, whose keywords we will replace
open my $run, '>', 'run.pl' or die "Cannot create run.pl: $!";  # newly created script that we will execute below
print "Enter left side of s///: ";
chomp(my $LeftSide = <STDIN>);
print "Enter right side of s///: ";
chomp(my $RightSide = <STDIN>);
while(my $line = <$file>){
    # copy lines containing neither keyword unchanged
    print $run $line if $line !~ /search_here/ && $line !~ /replace_here/;
    # fill in both keywords on the s/// line
    print $run $line if $line =~ s/(.*)search_here(.*)/$1$LeftSide$2/ &&
        $line =~ s/(.*)replace_here(.*)/$1$RightSide$2/;
}
close($file);
close($run);
system($^X, 'run.pl'); # run the generated script with the same perl that runs this one
Here is the script that the above will create and run:
# run.pl, will be created after running "prepare_run.pl" while using "main.pl" as a template.
use strict;
use warnings;
while(<DATA>){
print $_ if s/\),\(/\)\n\(/gi;
}
__DATA__
(123),(456),(789)
(abc),(def),(ghi)
Save all of the above, then just run "prepare_run.pl": it will copy the lines of main.pl, replacing the keywords with the regex you enter on STDIN, write it all to run.pl and execute it. You can enter \),\( and \)\n\( on STDIN as normal, without any quotes.
Here is the output:
(123)
(456)
(789)
(abc)
(def)
(ghi)
Re: Regex stored in a scalar
by anonymized user 468275 (Curate) on Aug 24, 2015 at 14:08 UTC
Feeding the regex in via STDIN seems a bit clunky. What about using command-line options, e.g. -s "regex" -r "replacement"? See the Getopt modules (such as Getopt::Long or Getopt::Std) for a wealth of option parsers to load in their arguments.
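A minimal sketch of that idea with the core Getopt::Long module, assuming the options are named -s (search) and -r (replacement) as suggested above; the option names and the parse_opts helper are illustrative, not an established interface. The argument list is hard-coded here so the sketch runs on its own:

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: parse "-s PATTERN -r REPLACEMENT [files...]".
sub parse_opts {
    my @args = @_;
    my ($search, $replace);
    GetOptionsFromArray(\@args, 's=s' => \$search, 'r=s' => \$replace)
        or die "usage: $0 -s PATTERN -r REPLACEMENT [files...]\n";
    die "both -s and -r are required\n"
        unless defined $search && defined $replace;
    return ($search, $replace, @args);    # what is left are the file names
}

# In a real script this would be parse_opts(@ARGV), followed by the usual
# while (<>) { s/$re/$replace/g; print } loop over the file names.
my ($search, $replace, @files) =
    parse_opts('-s', '\),\(', '-r', ')|(', 'input.txt');
my $re   = qr/$search/;
my $line = '(abc),(def)';
$line =~ s/$re/$replace/g;
print "$line\n";            # (abc)|(def)
print "files: @files\n";    # files: input.txt
```

Quoting the options on the shell command line (e.g. -s '\),\(') keeps the shell from interpreting the parentheses.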
Re: Regex stored in a scalar
by OtakuGenX (Initiate) on Aug 25, 2015 at 17:08 UTC
OK, I ended up going with s/$search/$replace/g. I guess I had hoped to do it the other way simply for ease.
Thank you ALL, your comments are AWESOME and helped a TON!!!