
I notice you have this:

```perl
while( <$infh> ) {
    s{&}{&amp;}g;    ## In some case does not match as intended
    s{&amp;amp;}{&amp;}g;
    ...
}
```

presumably because, when the input line already contains &amp;, the first substitution changes it to &amp;amp;, so the second substitution is needed to change it back again! Better to replace these two substitutions with a single substitution using a negative look-ahead assertion (?!...). Proof-of-concept:

```
14:25 >perl -wE "my @s = ('Fred & Wilma', 'Barney &amp; Betty'); for (@s) { s{&(?!amp;)}{&amp;}g }; say for @s;"
Fred &amp; Wilma
Barney &amp; Betty

14:25 >
```

See “Look-Around Assertions” in perlre#Extended-Patterns.
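The same negative look-ahead can be widened so that other references already present in the XML also survive. A minimal sketch, assuming that only the five predefined XML entities and numeric character references (`&#169;`, `&#xA9;`) should count as "already escaped"; `escape_bare_ampersands` is a hypothetical helper name, not part of the script in this thread:

```perl
use strict;
use warnings;

# Hypothetical helper: escape every '&' that does not already start a
# reference. ASSUMPTION: "already escaped" means one of the five predefined
# XML entities or a decimal/hex numeric character reference.
sub escape_bare_ampersands {
    my( $text ) = @_;    # work on a copy, leave the caller's string alone
    $text =~ s{&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);)}{&amp;}g;
    return $text;
}

for my $s ( 'Fred & Wilma', 'Barney &amp; Betty', 'x &lt; y & z', '&#169; 2014 &#xA9;' ) {
    print escape_bare_ampersands( $s ), "\n";
}
# prints:
# Fred &amp; Wilma
# Barney &amp; Betty
# x &lt; y &amp; z
# &#169; 2014 &#xA9;
```

Because the look-ahead only inspects the text after the `&` without consuming it, a single pass is enough and there is no second substitution to undo double escaping.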

Hope that helps,

Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Re^25: search and replace strings in different files in a directory
by PitifulProgrammer (Acolyte) on Sep 10, 2014 at 11:02 UTC

    Dear Athanasius

    Thanks a mil for posting your regular expression.

    It is quite funny: the line that caught your interest is no longer part of the code, so I must have posted this particular version by accident.

    Still, this will surely help resolve some issues to come.

    Your help is much appreciated.

    Thanks a mil again. I will bookmark the extended regex patterns; I am sure I will be needing them soonish.

    Kind regards

    C.

      Dear Monks

      As promised last time, I ran the code on new files that needed to be checked at work.

      Using the previous comments, examples, and the lovely testing script one of you provided, I found out that some of the directories had not been processed because of Unicode characters (mostly umlauts) and whitespace in the actual file names.

      I searched previous posts in the forum and on the web for answers. One suggestion was to use the Encode module on the file names I am reading from the text file.

      I took a look at the module, but I am a bit at a loss as to how to use it in the subroutines and modules that are already in place. I assume that the GetPaths sub needs some editing, provided I am on the right track.

      It would be grand if you guys could give me a hint on:
      a) whether Encode is the right module for reading file names packed with umlauts and whitespace, and
      b) if not, any other solution to the issue.
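A minimal sketch of how Encode could fit in, under assumptions: the script runs on Windows, where perl's built-in file functions typically pass byte strings to the "ANSI" API, so a decoded path containing umlauts has to be encoded into the ANSI code page before use. The code page `cp1252` and the helper name `to_fs_bytes` are assumptions for illustration. It also assumes the lines coming back from `:via(File::BOM)` in GetPaths are already decoded character strings, so only the `encode` step is needed:

```perl
use strict;
use warnings;
use Encode qw( encode );

# Hypothetical helper (not part of the posted script): turn a decoded
# character-string path into the byte string that perl's built-in file
# functions hand to the Windows "ANSI" API.
# ASSUMPTION: the ANSI code page is cp1252 (Western European); pass the
# real code page as the second argument if it differs.
sub to_fs_bytes {
    my( $char_path, $fs_encoding ) = @_;
    $fs_encoding //= 'cp1252';
    # FB_CROAK: die on characters the code page cannot represent instead of
    # silently turning them into '?'.
    # LEAVE_SRC: do not let the CHECK argument consume $char_path.
    return encode( $fs_encoding, $char_path,
                   Encode::FB_CROAK() | Encode::LEAVE_SRC() );
}

# "M\x{FC}ller" is "Müller" as a character string, i.e. what a decoded
# line from the paths file would contain.
my $bytes = to_fs_bytes( "C:\\dev\\M\x{FC}ller Ordner" );
printf "%d bytes\n", length $bytes;    # prints "20 bytes": one byte per character in cp1252
```

In GetPaths this would amount to something like `return map { to_fs_bytes( $_ ) } @paths;`. Instead of hard-coding the code page, the CPAN module Encode::Locale provides a `locale_fs` encoding name for the current system. Characters outside the ANSI code page cannot be reached this way at all; that would need a Unicode-aware module such as Win32::LongPath. Spaces inside a name need no special treatment as long as no shell is involved, but note that `s/\s+$//` only trims trailing whitespace.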

      Thanks a mil in advance

      Kind regards

      C
```perl
#!/usr/bin/perl --
use 5.014;
use strict;
use warnings;
use Path::Tiny qw/ path /;
use POSIX();
use autodie qw/ close /;
use File::BOM;
use Carp::Always;
use Data::Dump qw/ dd /;
use Encode qw(encode decode);

Main( @ARGV );
exit( 0 );

sub Main {
    #my( $infile_paths ) = @_; #if run via
    my( $infile_paths ) = 'C:\dev\test_paths.txt';
    chomp $infile_paths;
    my @paths = GetPaths( $infile_paths );
    for my $path ( @paths ) {
        RetrieveAndBackupXML( $path );
    }
    return @paths;
} ## end sub Main

sub GetPaths {
    use File::BOM;
    ## my @paths = path( shift )->lines_utf8;
    my @paths = path( shift )->lines( { binmode => ":via(File::BOM)" } );
    s/\s+$// for @paths;    # "chomp"
    return @paths;
} ## end sub GetPaths

sub RetrieveAndBackupXML {
    my( $directory ) = shift;    ## same as shift @_
    my $date = POSIX::strftime( '%Y-%m-%d', localtime );    # suffix for the backup file, e.g. 2014-08-01
    my $bak  = "$date.bak";
    my @xml_files = path( $directory )->children( qr/\.xml$/ );
    for my $file ( @xml_files ) {
        Replace( $file, "$file-$bak" );
    }
} ## end sub RetrieveAndBackupXML

# Fix XML entities and create a copy of the original file before editing
sub Replace {
    my( $in, $bak ) = @_;
    path( $in )->copy( $bak );    # create a copy of $in with the ending(s) specified in $bak
    my $infh  = path( $bak )->openr_raw;
    my $outfh = path( $in )->openrw_raw;
    while( <$infh> ) {
        s{&}{&amp;}g;    ## In some cases does not match as intended
        s{\s>\s}{&gt;}g;
        s{\s<\s}{&lt;}g;
        print $outfh $_;
    }
    close $infh;
    close $outfh;
} ## end sub Replace
```
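The substitutions inside Replace are easier to check if they live in a small function that works on one string, so they can be tested without touching any files. A minimal sketch; `fix_entities` is a hypothetical name, and two changes relative to the posted loop are assumptions about what is wanted: the `&` line uses the `(?!amp;)` look-ahead from earlier in the thread, and look-behind/look-ahead keep the spaces around ` > ` and ` < ` (the posted `s{\s>\s}{&gt;}g` consumes them, turning `a > b` into `a&gt;b`):

```perl
use strict;
use warnings;

# Hypothetical helper: the entity fixes from Replace(), applied to one line.
# ASSUMPTIONS: an existing '&amp;' must stay as-is, and the whitespace around
# a bare '>' or '<' should be preserved rather than swallowed.
sub fix_entities {
    my( $line ) = @_;
    $line =~ s{&(?!amp;)}{&amp;}g;        # escape '&' unless it already starts '&amp;'
    $line =~ s{(?<=\s)>(?=\s)}{&gt;}g;    # '>' with whitespace on both sides
    $line =~ s{(?<=\s)<(?=\s)}{&lt;}g;    # '<' with whitespace on both sides
    return $line;
}

print fix_entities( "a > b & c &amp; d < e\n" );    # prints: a &gt; b &amp; c &amp; d &lt; e
print fix_entities( "<tag>x</tag>\n" );             # markup is left alone: <tag>x</tag>
```

Inside Replace the loop body would then shrink to `print $outfh fix_entities( $_ );`. The `&` substitution runs first so that the `&` in the freshly inserted `&gt;` and `&lt;` is never escaped again.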