Dear Athanasius

Thanks a mil for posting your regular expression.

It is quite funny: the line that caught your interest was no longer part of the code, so I must have posted that particular version by accident.

However, this will surely resolve some issues to come.

Your help is much appreciated.

Thanks a mil again. I will bookmark the extended regex patterns, as I am sure I will be needing them soonish.

Kind regards

C.

Re^26: search and replace strings in different files in a directory
by PitifulProgrammer (Acolyte) on Sep 12, 2014 at 10:43 UTC

    Dear Monks

    As promised last time, I ran the code on new files that needed to be checked at work.

    Thanks to the previous comments, the examples and the lovely testing script one of you provided, I found out that some of the directories had not been touched because of Unicode characters (mostly umlauts) and whitespace in the actual file names.

    I went through previous posts in the forum and searched the web for answers. One suggestion was to use the Encode module for the file names I am reading from the text file.

    I took a look at the module, but I am a bit at a loss as to how to use it in the subroutines and modules that are already in place. I assume that the GetPaths sub needs some editing, provided I am on the right track.

    It would be grand if you could give me a hint on a) whether Encode is the right module for reading file names packed with umlauts and whitespace, and b) if not, what other solution there is to the issue.
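A minimal sketch of what the Encode approach from question a) could look like, using core modules only. The helper names `decode_path_line` and `path_to_native_bytes` are hypothetical, and the choice of `cp1252` as the file system encoding is an assumption for a Western European Windows setup, not something confirmed in this thread. The idea is to decode each line of the path list from UTF-8 into characters, strip only the BOM and trailing whitespace (so spaces inside a name survive), and encode again just before handing the name to the file system.

```perl
use strict;
use warnings;
use Encode qw( decode encode );

# Hypothetical helper: turn the raw UTF-8 bytes of one line of the path list
# into a Perl character string, dropping a leading BOM and trailing whitespace.
sub decode_path_line {
    my ($raw_bytes) = @_;
    my $chars = decode( 'UTF-8', $raw_bytes );
    $chars =~ s/^\x{FEFF}//;    # strip a byte order mark, if present
    $chars =~ s/\s+$//;         # trailing whitespace and line ending only
    return $chars;
}

# Hypothetical helper: encode the character string into the bytes the
# file system calls expect. The 'cp1252' default is an assumption.
sub path_to_native_bytes {
    my ( $chars, $fs_encoding ) = @_;
    $fs_encoding //= 'cp1252';
    return encode( $fs_encoding, $chars );
}

# "C:\dev\Müller Daten" as it might sit in a UTF-8 text file with a BOM
my $line  = "\xEF\xBB\xBFC:\\dev\\M\xC3\xBCller Daten\r\n";
my $chars = decode_path_line($line);
my $bytes = path_to_native_bytes($chars);
printf "%d characters, %d bytes\n", length $chars, length $bytes;
```

Whether the re-encoding step is needed at all depends on how the directory is opened afterwards, so this is a starting point for experiments, not a drop-in change to GetPaths.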

    Thanks a mil in advance

    Kind regards

    C
    #!/usr/bin/perl --
    use 5.014;
    use strict;
    use warnings;
    use Path::Tiny qw/ path /;
    use POSIX();
    use autodie qw/ close /;
    use File::BOM;
    use Carp::Always;
    use Data::Dump qw/ dd /;
    use Encode qw(encode decode);

    Main( @ARGV );
    exit( 0 );

    sub Main {
        #my( $infile_paths ) = @_; #if run via
        my( $infile_paths ) = 'C:\dev\test_paths.txt';
        chomp $infile_paths;

        my @paths = GetPaths( $infile_paths );

        for my $path ( @paths ){
            RetrieveAndBackupXML( $path );
        }
        return @paths;
    } ## end sub Main

    sub GetPaths {
        use File::BOM;
        ## my @paths = path( shift )->lines_utf8;
        my @paths = path( shift )->lines( { binmode => ":via(File::BOM)" } );
        s/\s+$// for @paths; # "chomp"
        return @paths;
    } ## end sub GetPaths

    sub RetrieveAndBackupXML {
        my( $directory ) = shift; ## same as shift @_ ##
        my $date = POSIX::strftime( '%Y-%m-%d', localtime ); #suffix for the backup-file, e.g. 2014-08-01
        my $bak = "$date.bak";
        my @xml_files = path( $directory )->children( qr/\.xml$/ );

        for my $file ( @xml_files ) {
            Replace( $file, "$file-$bak" );
        }
    } ## end sub RetrieveAndBackupXML

    # Fix xml entities and create a copy of the original file before editing
    sub Replace {
        my( $in, $bak ) = @_;
        path( $in )->copy( $bak ); #create a copy of $in with the ending(s) specified in $bak

        my $infh  = path( $bak )->openr_raw;
        my $outfh = path( $in )->openrw_raw;

        while( <$infh> ) {
            s{&}{&amp;}g; ## In some case does not match as intended
            s{\s>\s}{&gt;}g;
            s{\s<\s}{&lt;}g;
            print $outfh $_;
        }
        close $infh;
        close $outfh;
    } ## end sub Replace
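The comment on `s{&}{&amp;}g` in Replace notes that it does not always match as intended. One possible cause, which is an assumption here and not confirmed in the thread, is that text already containing entities gets escaped a second time. A hedged sketch of a pattern that skips existing entities follows; `escape_bare_ampersands` is a hypothetical name, and this is not necessarily the pattern posted earlier in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: escape bare ampersands but leave anything that
# already looks like an entity (&amp; &#228; &#xE4; ...) untouched.
sub escape_bare_ampersands {
    my ($text) = @_;
    # Negative lookahead: only match '&' when it is NOT followed by a
    # named, decimal or hexadecimal entity body ending in ';'.
    $text =~ s{&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)}{&amp;}g;
    return $text;
}

print escape_bare_ampersands("Tom & Jerry &amp; friends &#228;"), "\n";
```

Running the substitution twice on the same text gives the same result as running it once, which is what makes it safe to rerun the script on files that were already fixed.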

        Dear Anonymous Monk(s)

        I recently went back to the script and was trying to implement the changes you kindly provided. Unfortunately, I did not succeed in resolving the issue with umlauts.

        I think this is mostly due to the fact that I do not understand what the snippet you referred to really does and how it can be integrated into the present script.

        This is the snippet from the other post.

        #!/usr/bin/perl --
        BEGIN {
            if ( eval { require Win32; 1 } ) {
                require ex::override;
                require Win32::Unicode::Native;
                ex::override->import(
                    GLOBAL_stat  => sub (;*) { &Win32::Unicode::Native::stat },
                    GLOBAL_lstat => sub (;*) { &Win32::Unicode::Native::stat },
                    map({
                        my $name = $_;
                        my $prototype = prototype("CORE::$name");
                        "GLOBAL_$name" => eval "sub($prototype){&Win32::Unicode::Native::$name}";
                    } qw/
                        chdir link mkdir open readlink rename rmdir symlink unlink utime
                        closedir opendir readdir
                    /, )
                );
            }
        }

        use Path::Tiny qw/ path /;

        for my $drive ( @drives ){
            my @picdirs = grep /^\d{4}_\d{2}_\d{2}$/, eval { path( $drive, $dir )->children };
            if( @picdirs ){
                ...;
            }
        }

        I can see in the sample code from the pathnames_under_windows_8 post that Path::Tiny is called, but I have been unable to decipher the code above or work out how it should affect Path::Tiny.

        Moreover, I was puzzled when you mentioned that Path::Tiny had already done the decoding for me. If so, why are there still problems with umlauts? Is that OS-related?

        I am asking because I figure that understanding the underlying problem might give me an insight into how to resolve this. I know I am drifting into a more or less theoretical discussion, but it might help.
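One way to look at the underlying problem is that decoding the contents of the path list and naming a file on disk are two separate steps. The sketch below only illustrates that the same name has different byte representations depending on the encoding. The sample name, the `hex_dump` helper and the choice of `cp1252` are assumptions for illustration, and whether this mismatch is the actual cause on the machine in question is not confirmed here.

```perl
use strict;
use warnings;
use Encode qw( encode );

my $name = "M\x{FC}ller.xml";    # "Müller.xml" as a character string

my $utf8_bytes   = encode( 'UTF-8',  $name );
my $cp1252_bytes = encode( 'cp1252', $name );

# Render a byte string as space-separated hex pairs for inspection.
sub hex_dump {
    my ($bytes) = @_;
    return join ' ', map { sprintf( '%02X', ord $_ ) } split //, $bytes;
}

print "UTF-8 : ", hex_dump($utf8_bytes),   "\n";
print "cp1252: ", hex_dump($cp1252_bytes), "\n";
```

If the bytes handed to the operating system are in one form while the directory entry is stored in another, a lookup by name can fail even though the text file itself was read correctly.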

        Your insight and comments are much appreciated. Sorry in advance for all the rookie questions (well there is so much to learn and it is fun).

        Looking forward to your reply.

        Kind regards

        C.

        PS: I guess it helps to re-post the current version of the replace script I am talking about.

        use 5.014;
        use strict;
        use warnings;
        use Path::Tiny qw/ path /;
        use POSIX();
        use autodie qw/ close /;
        use File::BOM;
        use Carp::Always;
        use Data::Dump qw/ dd /;
        use Encode qw(encode decode);

        Main( @ARGV );
        exit( 0 );

        sub Main {
            #my( $infile_paths ) = @_; #if run via
            my( $infile_paths ) = 'C:\dev\test_path_names.txt';
            chomp $infile_paths;

            my @paths = GetPaths( $infile_paths );

            for my $path ( @paths ){
                RetrieveAndBackupXML( $path );
            }
            return @paths;
        } ## end sub Main

        sub GetPaths {
            use File::BOM;
            ## my @paths = path( shift )->lines_utf8;
            my @paths = path( shift )->lines( { binmode => ":via(File::BOM)" } );
            s/\s+$// for @paths; # "chomp"
            return @paths;
        } ## end sub GetPaths

        sub RetrieveAndBackupXML {
            my( $directory ) = shift; ## same as shift @_ ##
            my $date = POSIX::strftime( '%Y-%m-%d', localtime ); #suffix for the backup-file, e.g. 2014-08-01
            my $bak = "$date.bak";
            my @xml_files = path( $directory )->children( qr/\.xml$/ );

            for my $file ( @xml_files ) {
                Replace( $file, "$file-$bak" );
            }
        } ## end sub RetrieveAndBackupXML

        # Fix xml entities and create a copy of the original file before editing
        sub Replace {
            my( $in, $bak ) = @_;
            path( $in )->copy( $bak ); #create a copy of $in with the ending(s) specified in $bak

            my $infh  = path( $bak )->openr_raw;
            my $outfh = path( $in )->openrw_raw;

            while( <$infh> ) {
                s{&}{&amp;}g; ## In some case does not match as intended
                s{\s>\s}{&gt;}g;
                s{\s<\s}{&lt;}g;
                print $outfh $_;
            }
            close $infh;
            close $outfh;
        } ## end sub Replace