in reply to Re^22: search and replace strings in different files in a directory
in thread search and replace strings in different files in a directory

Dear all

This is the final (slightly anonymised) version of my code, which works for me as intended.

#!/usr/bin/perl --
use 5.014;
use strict;
use warnings;
use Path::Tiny qw/ path /;
use POSIX();
use autodie qw/ close /;
use File::BOM;
use Carp::Always;
use Data::Dump qw/ dd /;

Main( @ARGV );
exit( 0 );

sub Main {
    #my( $infile_paths ) = @_; #if run via
    my( $infile_paths ) = 'C:\dev\test_paths.txt';
    chomp $infile_paths;
    my @paths = GetPaths( $infile_paths );
    for my $path ( @paths ){
        RetrieveAndBackupXML( $path );
    }
    return @paths;
} ## end sub Main

sub GetPaths {
    use File::BOM;
    ## my @paths = path( shift )->lines_utf8;
    my @paths = path( shift )->lines( { binmode => ":via(File::BOM)" } );
    s/\s+$// for @paths; # "chomp"
    return @paths;
} ## end sub GetPaths

sub RetrieveAndBackupXML {
    my( $directory ) = shift; ## same as shift @_
    ##
    my $date = POSIX::strftime( '%Y-%m-%d', localtime ); #suffix for the backup-file, e.g. 2014-08-01
    my $bak = "$date.bak";
    my @xml_files = path( $directory )->children( qr/\.xml$/ );
    for my $file ( @xml_files ) {
        Replace( $file, "$file-$bak" );
    }
} ## end sub RetrieveAndBackupXML

# Fix xml entities and create a copy of the original file before editing
sub Replace {
    my( $in, $bak ) = @_;
    path( $in )->copy( $bak ); #create a copy of $in with the ending(s) specified in $bak
    my $infh = path( $bak )->openr_raw;
    my $outfh = path( $in )->openrw_raw;
    while( <$infh> ) {
        s{&}{&amp;}g; ## In some case does not match as intended
        s{&amp;amp;}{&amp;}g;
        s{\s>\s}{&gt;}g;
        s{\s<\s}{&lt;}g;
        print $outfh $_;
    }
    close $infh;
    close $outfh;
} ## end sub Replace

Re^24: search and replace strings in different files in a directory
by Athanasius (Archbishop) on Sep 10, 2014 at 04:35 UTC

    I notice you have this:

    while( <$infh> ) {
        s{&}{&amp;}g; ## In some case does not match as intended
        s{&amp;amp;}{&amp;}g;
        ...
    }

    presumably because, when the input line already contains &amp;, the first substitution changes it to &amp;amp;, so the second substitution is needed to change it back again! Better to replace these two substitutions with a single substitution using a negative look-ahead assertion (?!...). Proof-of-concept:

    14:25 >perl -wE "my @s = ('Fred & Wilma', 'Barney &amp; Betty'); for (@s) { s{&(?!amp;)}{&amp;}g }; say for @s;"
    Fred &amp; Wilma
    Barney &amp; Betty
    14:25 >

    See “Look-Around Assertions” in perlre#Extended-Patterns.
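A side note on the look-ahead above: &(?!amp;) only protects &amp;, so a line that already contains &lt; or &#233; would still be rewritten to &amp;lt; or &amp;#233;. Below is a minimal sketch of a broader pattern. The helper name escape_bare_ampersands is made up for illustration, and the pattern only checks that something looks like an entity or character reference, not that the entity is actually defined.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): escape only those ampersands
# that do not already start something shaped like an entity reference
# (&name;), a decimal reference (&#233;) or a hex reference (&#xE9;).
sub escape_bare_ampersands {
    my ( $text ) = @_;
    $text =~ s{&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)}{&amp;}g;
    return $text;
}

my @samples = (
    'Fred & Wilma',
    'Barney &amp; Betty',
    'a &lt; b',
    'caf&#233; & bar',
    'AT&T',
);
print escape_bare_ampersands( $_ ), "\n" for @samples;
```

Text such as "R&D;" would be left alone by this pattern because it has the shape of an entity, so it is a heuristic rather than a substitute for a real XML parser.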

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

      Dear Athanasius

      Thanks a mil for posting your regular expression.

      Funnily enough, the line that caught your interest is no longer part of the code; I must have posted this particular version by accident.

      However, this will surely resolve some issues to come.

      Your help is much appreciated.

      Thanks a mil again. I will bookmark the extended regex patterns; I am sure I will need them soon.

      Kind regards

      C.

        Dear Monks

        As promised last time, I ran the code on new files that needed to be checked at work.

        Thanks to the previous comments, the examples and the lovely testing script one of you provided, I found that some of the directories had not been touched because of Unicode characters (mostly umlauts) and whitespace in the actual file names.

        I went through previous posts in the forum and on the web looking for answers. One suggestion was to use the Encode module for the file names I am reading from the text file.

        I took a look at the module, but I am a bit at a loss as to how to fit it into the subroutines and modules that are already in use. I assume that the GetPaths sub needs some editing, provided I am on the right track.

        It would be grand if you could give me a hint on:
        a) whether Encode is the right module for reading umlaut-and-whitespace-packed file names
        b) if not, any other solution to the issue

        Thanks a mil in advance

        Kind regards

        C
        #!/usr/bin/perl --
        use 5.014;
        use strict;
        use warnings;
        use Path::Tiny qw/ path /;
        use POSIX();
        use autodie qw/ close /;
        use File::BOM;
        use Carp::Always;
        use Data::Dump qw/ dd /;
        use Encode qw(encode decode);

        Main( @ARGV );
        exit( 0 );

        sub Main {
            #my( $infile_paths ) = @_; #if run via
            my( $infile_paths ) = 'C:\dev\test_paths.txt';
            chomp $infile_paths;
            my @paths = GetPaths( $infile_paths );
            for my $path ( @paths ){
                RetrieveAndBackupXML( $path );
            }
            return @paths;
        } ## end sub Main

        sub GetPaths {
            use File::BOM;
            ## my @paths = path( shift )->lines_utf8;
            my @paths = path( shift )->lines( { binmode => ":via(File::BOM)" } );
            s/\s+$// for @paths; # "chomp"
            return @paths;
        } ## end sub GetPaths

        sub RetrieveAndBackupXML {
            my( $directory ) = shift; ## same as shift @_
            ##
            my $date = POSIX::strftime( '%Y-%m-%d', localtime ); #suffix for the backup-file, e.g. 2014-08-01
            my $bak = "$date.bak";
            my @xml_files = path( $directory )->children( qr/\.xml$/ );
            for my $file ( @xml_files ) {
                Replace( $file, "$file-$bak" );
            }
        } ## end sub RetrieveAndBackupXML

        # Fix xml entities and create a copy of the original file before editing
        sub Replace {
            my( $in, $bak ) = @_;
            path( $in )->copy( $bak ); #create a copy of $in with the ending(s) specified in $bak
            my $infh = path( $bak )->openr_raw;
            my $outfh = path( $in )->openrw_raw;
            while( <$infh> ) {
                s{&}{&amp;}g; ## In some case does not match as intended
                s{\s>\s}{&gt;}g;
                s{\s<\s}{&lt;}g;
                print $outfh $_;
            }
            close $infh;
            close $outfh;
        } ## end sub Replace
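On question a): Encode can do the conversion, but which encoding the file system expects depends on the platform. Here is a minimal sketch. It assumes the path list is UTF-8 and that Perl's file functions on this Windows machine expect cp1252 bytes; both are assumptions, so check the actual code page. The helper name path_bytes_for_fs is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: turn one raw UTF-8 line from the path list into the
# byte string handed to the file system. The target encoding is passed in
# because 'cp1252' below is only an assumption about the machine.
sub path_bytes_for_fs {
    my ( $raw_line, $fs_encoding ) = @_;
    my $chars = decode( 'UTF-8', $raw_line );  # bytes -> characters
    $chars =~ s/^\x{FEFF}//;                   # drop a leading BOM, if present
    $chars =~ s/\s+\z//;                       # strip trailing whitespace only
    return encode( $fs_encoding, $chars );     # characters -> file-system bytes
}

# "C:\dev\Müller Daten" as it would sit in a UTF-8 text file, with CRLF
my $raw   = "C:\\dev\\M\xC3\xBCller Daten\r\n";
my $bytes = path_bytes_for_fs( $raw, 'cp1252' );

# Show the result as hex so the single-byte umlaut is visible
print join( ' ', map { sprintf '%02X', ord } split //, $bytes ), "\n";
```

Whitespace inside a path is left untouched here; only the trailing line ending is removed. If the lines in GetPaths already arrive as decoded characters (which the File::BOM layer is meant to provide), only the encode step would be needed there.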