
Dear Monks

As promised last time, I ran the code on the new files that needed to be checked at work.

Given the previous comments, the examples and the lovely testing script one of you provided, I found out that some of the directories had not been touched because of Unicode characters (mostly umlauts) and whitespace in the actual file names.

I went through previous posts in the forum and searched the web for answers. One suggestion was to use the Encode module for the file names I am reading from the text file.

I took a look at the module, but I am a bit at a loss as to how to use it in the subroutines that are already in place. I assume that the GetPaths sub needs some editing, provided I am on the right track.

It would be grand if you guys could give me a hint on a) whether Encode is the right module for reading file names packed with umlauts and whitespace, and b) if not, what other solution there is to the issue.
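
To make the question concrete, this is roughly how I imagine the Encode approach would look. It is only a minimal sketch and has not been run against the real directories. It assumes the lines arrive as raw UTF-8 bytes and that the file functions on my Windows box expect cp1252 (both are assumptions on my side), and path_bytes_for_fs is a made-up helper name, not something from the existing script.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: convert one raw line from the path list into the
# byte string handed to the file functions.
# ASSUMPTION: the list file is UTF-8 and $fs_encoding (e.g. 'cp1252')
# matches the code page the system uses for file names.
sub path_bytes_for_fs {
    my ( $raw_line, $fs_encoding ) = @_;
    my $chars = decode( 'UTF-8', $raw_line );    # bytes -> characters
    $chars =~ s/^\x{FEFF}//;                      # drop a leading BOM, if any
    $chars =~ s/\s+\z//;                          # strip trailing newline/whitespace only
    return encode( $fs_encoding, $chars );        # characters -> file system bytes
}

# Example line: "Bücher und Hefte" with the umlaut as UTF-8 bytes.
my $raw     = "C:\\dev\\B\xC3\xBCcher und Hefte\r\n";
my $fs_path = path_bytes_for_fs( $raw, 'cp1252' );
my $state   = ( -d $fs_path ) ? 'found' : 'not found';
print "directory $state\n";
```

Whitespace inside the name is left alone here; only the trailing newline and blanks are removed, the same way the s/\s+$// in GetPaths does it.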

Thanks a mil in advance

Kind regards

#!/usr/bin/perl --
use 5.014;
use strict;
use warnings;
use Path::Tiny qw/ path /;
use POSIX();
use autodie qw/ close /;
use File::BOM;
use Carp::Always; 
use Data::Dump qw/ dd /;
use Encode qw(encode decode);


Main( @ARGV );
exit( 0 );

sub Main {

    #my( $infile_paths ) = @_; #if run via
    my( $infile_paths ) = 'C:\dev\test_paths.txt';
    chomp $infile_paths;
    my @paths = GetPaths( $infile_paths );
    for my $path ( @paths ){
        RetrieveAndBackupXML( $path );
    }
    return @paths;
} 

## end sub Main

sub GetPaths  {
##    my @paths = path( shift )->lines_utf8;
    my @paths = path( shift )->lines( { binmode => ":via(File::BOM)" } );
    s/\s+$// for @paths; # "chomp"
    return @paths;
} 

## end sub GetPaths 

sub RetrieveAndBackupXML {
    my( $directory ) = shift; ## same as shift @_ ##  
    my $date      = POSIX::strftime( '%Y-%m-%d', localtime );  # suffix for the backup file, e.g. 2014-08-01
    my $bak       = "$date.bak"; 
    my @xml_files = path( $directory )->children( qr/\.xml$/ );
    for my $file ( @xml_files ) {
        Replace( $file, "$file-$bak" ); 
    }
} 

## end sub RetrieveAndBackupXML

# Fix xml entities and create a copy of the original file before editing
sub Replace {
    my( $in, $bak ) = @_;
    
    path( $in )->copy( $bak ); # create a copy of $in with the ending(s) specified in $bak
    
    my $infh  = path( $bak )->openr_raw;
    my $outfh = path( $in )->openrw_raw;
    
    while( <$infh> ) {
        s{&}{&amp;}g; ## In some case does not match as intended
        s{\s>\s}{&gt;}g;
        s{\s<\s}{&lt;}g;
        print $outfh $_;
    }
  
    close $infh;
    close $outfh;
}

 ## end sub Replace

In reply to Re^26: search and replace strings in different files in a directory by PitifulProgrammer
in thread search and replace strings in different files in a directory by PitifulProgrammer
