james28909 has asked for the wisdom of the Perl Monks concerning the following question:

I want to ask something and hope that someone can shed some light in my direction. I started writing a script that at the time seemed like it was going to be a breeze, lol. But the deeper I got into the script, the more complex I saw that it got. I have two arrays. Each element of the first array holds the MD5, the directory path and the filename of one file from the first directory tree (all subdirectories included, without . and ..). The second array holds the same information for every file in the second directory tree.

What I am trying to do is compare both directories like this:
    for my $data0(@array0){
        my ($md50, $filepath0, $filename0)= split (/\s+, $data0);
        for my $data1(@array1){
            my ($md51, $filepath1, $filename1) = split (/\s+/, $data1);
            if ($md50 !~ $md51 && $filename0 =~ $filename1){
                make_path($filepath0);
                copy("$filepath0$filename0, $filepath1);
            }
            else{
                #do nothing
            }
        }
    }
Now to the question. I am trying to compare each file in the directories and subdirectories of the two arguments, but I only want to copy the ones that do NOT match, and only into that particular directory. Both arguments have different directory trees as well, and I think that is the kicker and why it is confusing me so badly :l I cannot for the life of me figure this out, lol. I can get it to copy files, but either all files get copied into all directories, or none do. It would be optimal if I could preserve the file path as well...
What am I doing wrong? I've used strict, warnings and diagnostics, but they are not reporting any errors at all, so I am out of options: the script runs and completes, but it does not copy the files.

Re: File copy based on conditions in two arrays
by Athanasius (Archbishop) on Oct 09, 2014 at 03:49 UTC

    Hello james28909,

    First, the code snippet provided is incomplete, and contains at least one, and probably two, syntax errors. Second, to understand what you are trying to do, I think we will need to see sample input data, together with the desired output. In the meantime I will make one observation: this line:

    if ($md50 !~ $md51 && $filename0 =~ $filename1){

    is unlikely to work correctly across all cases. For example, if $md50 is '123456' and $md51 is '12345', the first condition will evaluate to false (because the match succeeds); but (I’m guessing) the logic of your code requires it to be true. Much simpler to use ne and eq here in place of !~ and =~, respectively:

    if ($md50 ne $md51 && $filename0 eq $filename1){
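    A quick way to see the difference is to run both forms side by side. In this small Perl sketch the values are made up; it shows that with =~ and !~ the right-hand string is compiled as a pattern, so a shorter string, or one containing a metacharacter such as '.', can "match" a different value, while ne and eq compare whole strings:

```perl
use strict;
use warnings;

# Hypothetical sample values, chosen only to show the pitfall.
my $md50 = '123456';
my $md51 = '12345';

# Regex match: $md51 is used as a pattern, and '12345' occurs inside '123456',
# so the match succeeds and !~ is false even though the strings differ.
my $differs_regex = ( $md50 !~ $md51 ) ? 1 : 0;

# String comparison: ne compares the two whole strings.
my $differs_string = ( $md50 ne $md51 ) ? 1 : 0;

# Filenames have the same problem: '.' in a pattern matches any character.
my $name_regex  = ( 'fileXtxt' =~ 'file.txt' ) ? 1 : 0;    # true: '.' matched 'X'
my $name_string = ( 'fileXtxt' eq 'file.txt' ) ? 1 : 0;    # false

print "regex says different: $differs_regex\n";     # 0
print "ne says different:    $differs_string\n";    # 1
```

If a pattern match is really wanted, quoting the variable with \Q...\E avoids the metacharacter problem, but for exact comparisons eq and ne are the right tools.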

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

      OK, I will clean the code up some and post it. But like I say, it completes fine, and it will even copy files while preserving the original directories under a new path, e.g. new_path/$filepath0/$filename0. The problem comes in when I try to copy $filename0 to $filepath1. Also, $filepath1 has the same directory structure, but the base folder name is a little different. I want to copy $filename0 into $filepath1/. It just doesn't make any sense; I've tried everything I can imagine, lol. I'll get this code cleaned up and posted, and I'll try to post a zip file with everything in it as well, so you can just unzip it and run the script.
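Since both trees share the same layout and differ only in the base folder name, one way to find where a file from the first tree belongs in the second is to strip the first root and attach the second. A minimal sketch using the core File::Spec module; the helper name rebase_path and the sample paths are hypothetical, and the printed result assumes Unix-style separators:

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: given a file path under $from_root, return the
# corresponding path under $to_root (same relative layout, different base).
sub rebase_path {
    my ( $path, $from_root, $to_root ) = @_;

    # abs2rel only manipulates path strings; the files need not exist.
    my $relative = File::Spec->abs2rel( $path, $from_root );
    return File::Spec->catfile( $to_root, $relative );
}

my $target = rebase_path( '455dex/sub/dir/file.txt', '455dex', '455cex' );
print "$target\n";    # 455cex/sub/dir/file.txt on Unix-like systems
```

The directory part of the result (for example via File::Basename's dirname) is what would be passed to make_path before copying.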
Re: File copy based on conditions in two arrays
by james28909 (Deacon) on Oct 09, 2014 at 08:20 UTC
    EDIT: Never mind. Some information has been given to me about this, and I will absolutely need a list file with the absolute filenames and paths of the files that need to be copied. You can, however, look over the code and easily use it to compare two directories and get their MD5s and so on. I was copying whole directories while preserving file paths as well :) But if you feel like you know why it wasn't at least copying from one path to the other, please fill me in, because I would love to know, haha.

    Anyway, sorry to bother you.

    OK girls and boys, here is the script. Take it easy on me though, I'm still green, lol.

    I want to copy from 455dex to 455cex, OR from 455dex to a temp folder. The main purpose of this script is to scan the cex directory and copy the dex files into it if they are different. I say again :) copy from 455dex to 455cex, based on the differences in structure.
    use strict;
    use warnings;
    use diagnostics;
    use File::Copy::Recursive qw(fcopy rcopy);
    use File::Path qw(make_path remove_tree);
    use Digest::MD5;
    use Time::HiRes qw( time );

    my $start = time();
    my $dir1  = $ARGV[0];
    my $dir2  = $ARGV[1];

    if ( not defined $dir1 ) {
        print "\nUsage: rexscan.pl [folder]\n";
        exit(0);
    }
    elsif ( not defined $dir2 ) {
        $dir2 = '';
    }

    remove_tree( "CEX", "DEX" );

    my @first_arg     = ();
    my @second_arg    = ();
    my @dex_md5_array = ();
    my @cex_md5_array = ();
    my $countera      = 0;
    my $counterb      = 0;

    first_arg($dir1);
    second_arg($dir2);
    get_files();

    sub first_arg {
        my ($dir) = @_;
        my ($dh);
        if ( !opendir( $dh, $dir ) ) {
            return;
        }
        while ( my $file = readdir($dh) ) {
            next if ( -d $file );
            my $path = "$dir/$file";
            if ( -d $path ) {
                first_arg("$path");
            }
            else {
                push( @first_arg, "$dir/$file" );
            }
        }
    }

    sub second_arg {
        my ($dir) = @_;
        my ($dh);
        if ( !opendir( $dh, $dir ) ) {
            return;
        }
        while ( my $file = readdir($dh) ) {
            next if ( -d $file );
            my $path = "$dir/$file";
            if ( -d $path ) {
                second_arg("$path");
            }
            else {
                push( @second_arg, "$dir/$file" );
            }
        }
    }

    sub get_files {
        for my $element1 (@first_arg) {
            next if ( -d $element1 );
            $element1 =~ m#^(.*?)([^/]*)$#;
            my ( $dex_directory, $dex_temp_file) = ( $1, $2 );
            my $dex = "$dex_directory$dex_temp_file";
            open (my $dex_file, '<', $dex)|| die "line 90 $!";
            my $dex_md5 = Digest::MD5->new->addfile($dex_file)->hexdigest;
            push @dex_md5_array, "$dex_md5 $dex_directory $dex_temp_file\n";
        }
        for my $element2 (@second_arg) {
            next if -d $element2;
            $element2 =~ m#^(.*?)([^/]*)$#;
            my ( $cex_directory, $cex_temp_file ) = ( $1, $2 );
            my $cex = "$cex_directory$cex_temp_file";
            open (my $cex_file, '<', $cex) || die "line 101";
            my $cex_md5 = Digest::MD5->new->addfile($cex_file)->hexdigest;
            push @cex_md5_array, "$cex_md5 $cex_directory $cex_temp_file";
        }
        for my $cex_md5_array (@cex_md5_array) {
            my ( $md5c, $cpath, $cfile ) = split /\s+/, ($cex_md5_array);    #cex md5 filepath and file stored here
            chomp ( $md5c, $cpath, $cfile );
            for my $dex_md5_array (@dex_md5_array) {
                my ( $md5d, $dpath, $dfile ) = split (/\s+/,$dex_md5_array);    #dex md5 filepath and file stored here
                chomp ( $md5d, $dpath, $dfile );
                if ($md5c ne $md5d && $cfile eq $dfile){
                    make_path("DEX");
                    fcopy("$dpath$dfile", "DEX/$cpath");
                    #or use this to copy directly to the other directory.
                    #fcopy("$dpath$dfile", "$cpath");
                }
                else {
                    print '';
                }
            }
        }
    }

    my $end_run = time();
    my $end     = time();
    my $runtime = sprintf( "%.5f", $end - $start );
    print "This script took $runtime seconds to execute\n";
    And I want to copy from the 455dex directory to the 455cex directory, OR copy the DEX directory to a temp folder while preserving the cex file path <.<
    I don't see why it can't be done as a simple file copy, or why it has to be so hard for me, but what started off as a simple project turned into a pretty hefty task. Hopefully someone sees something I don't.

      Hi, I have some advice for you:
      1. When declaring an empty array, do not assign () to it; that is wasted code (an array declared with my is ALWAYS defined, and empty, at creation time).
      2. Use exceptions. Your functions (which, by the way, aren't commented much as to what they expect or what they produce) simply return if something goes wrong. Try using
        die "Error message for the user"
        instead.
      3. Check arguments BEFORE going into functions, not after those functions failed. It makes more sense to check ahead that you have what is needed (and display your usage message if that's not the case) than try anyway, and then react if it failed.
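      4. In your first_arg and second_arg functions, your "next if -d $file" doesn't seem to make much sense to me. $file is only the bare name returned by readdir, so -d tests it against the current working directory rather than against $dir. Do you really want to skip the files or directories whose names match subdirectories of your current directory?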
      5. Avoid using the same name for different things, it gets confusing. Having an array and a sub share the same name works fine with Perl, but you'll have more chances of making a mistake yourself. Naming the array @first_arg, and the sub get_first_args, for instance, would be cleaner.
      6. Your first_arg and second_arg functions do exactly the same thing, except that they fill different arrays. This is what references are for: either pass a reference to the array you want to fill as a parameter, or declare an array IN your sub and return a reference to it (and work with that afterwards). It is much cleaner. To get a reference to something, just put a '\' in front of it; for instance, \@first_arg is a reference to @first_arg. To get the thing back from its reference, prefix the reference with the type's sigil; for instance, if $ref holds a reference to an array, then @$ref is the array. Or @{$ref}, if you prefer (or have multiple levels of references).
      7. When using exceptions (die), you do not need to print the line number yourself: Perl adds it automatically if your message does not end with "\n", and it is always up to date (for instance, there are more than 11 lines between your "line 90" and "line 101" messages...).
      8. Use hashes, to regroup things that belong together. In your @dex_md5_array, you put strings regrouping three elements: The MD5, the directory, and the file name. You should, instead, put these in a hash, and put a reference to this hash in the array. For instance, you could write
        push @myArray, {'md5' => $md5, 'dir' => $dir, 'file' => $fileName};
        And to retrieve something from the hash reference, just write, for instance ${$hashRef}{'md5'}, or whatever. This way, you do not have to parse them again later. Oh, and chomp them when you put them in, not when you use them (it's much better to have clean values, than having to clean them each time you want to use them).
      9. Your element1 and element2 loops are the same, with only the source and destination arrays changing. Make a sub, and use references, you'll be glad you did.
      10. When comparing the MD5 of the files you wish to copy...
        • Do you really wish to compare MD5s first, when what you really want is to look for files with the same name?
        • Can different files have the same base name but be in different directories, on the same "side"? And if so, do you really want the last file with that name to overwrite ALL the files with the same name in the destination directories?
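A sketch of how points 2, 4, 6, 7, 8 and 9 could fit together: one recursive sub replaces both first_arg and second_arg, skips . and .. by name, fills whichever array it is handed by reference, and stores each file as a hash reference. The sub name collect_files and the hash keys (taken from the example in point 8) are illustrative choices, not part of the original script:

```perl
use strict;
use warnings;
use Digest::MD5;

# Walk $dir recursively and push one hash reference per file onto the array
# that $files_ref points to (point 6: the caller passes the array by reference).
sub collect_files {
    my ( $dir, $files_ref ) = @_;

    # Point 2: die instead of silently returning. No trailing "\n", so Perl
    # appends the script name and line number itself (point 7).
    opendir( my $dh, $dir ) or die "Cannot open directory '$dir': $!";
    my @entries = readdir($dh);
    closedir($dh);

    for my $name (@entries) {
        # Point 4: skip . and .. by name; "-d $name" would be tested against
        # the current working directory, not against $dir.
        next if $name eq '.' or $name eq '..';

        my $path = "$dir/$name";
        if ( -d $path ) {
            collect_files( $path, $files_ref );
            next;
        }

        open( my $fh, '<', $path ) or die "Cannot read '$path': $!";
        binmode($fh);
        my $md5 = Digest::MD5->new->addfile($fh)->hexdigest;
        close($fh);

        # Point 8: keep the three related values together in a hash.
        push @{$files_ref}, { md5 => $md5, dir => $dir, file => $name };
    }
    return;
}

# Point 9: one sub serves both trees; only the target array differs.
if ( @ARGV == 2 ) {
    my ( @dex_files, @cex_files );
    collect_files( $ARGV[0], \@dex_files );
    collect_files( $ARGV[1], \@cex_files );
    print "$_->{md5}  $_->{dir}/$_->{file}\n" for @dex_files, @cex_files;
}
```

Because nothing is joined into a string, there is nothing to split or chomp later, and filenames containing spaces survive intact.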
        Thank you for your reply :) I was unaware that assigning () to empty arrays is a no-no; I will correct that in all future scripts I write. I did have die throughout the code, but I cleaned it up a little before I posted it, and even with die I was getting no errors at all. Even with strict, warnings and diagnostics I was getting no errors. And trust me when I say that I did check every argument before going into any function: before I cleaned up the code I probably had 40 or 50 print $variable lines, just to be absolutely sure of what was in each one.


        When it goes through the first two subroutines, it gets each filename and directory path and pushes them onto the @first_arg and @second_arg arrays. Then, if you look at the first two for loops in the get_files sub, each one takes every item in its array, splits it into directory and filename, and joins them back together so it can get the MD5 of that file. Once it has the MD5, it pushes the MD5, file path and filename to a different array (@dex_md5_array and @cex_md5_array, which are used for comparing the directories).
        Then, when it gets to the second set of for loops in the get_files sub (where it loops through the arrays that hold the MD5s), it splits each item into the MD5, file path and filename, and then compares them.
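The path splitting described here can be checked in isolation. A small sketch (the sample path and MD5 string are made up) showing that the script's regex and the core File::Basename function fileparse split a path like this one the same way, and why re-splitting a space-joined record on /\s+/ is fragile:

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

my $element = '455dex/sub/dir/file.txt';    # hypothetical sample path

# The regex from the script: (.*?) is non-greedy, but ([^/]*)$ cannot contain
# a slash, so the first group ends up holding everything through the last '/'.
my ( $dir_re, $file_re ) = $element =~ m#^(.*?)([^/]*)$#;

# fileparse returns (filename, directory); the directory keeps its trailing '/'.
my ( $file_fp, $dir_fp ) = fileparse($element);

print "$dir_re | $file_re\n";    # 455dex/sub/dir/ | file.txt
print "$dir_fp | $file_fp\n";    # 455dex/sub/dir/ | file.txt

# Joining MD5, directory and filename with spaces and splitting on /\s+/
# breaks as soon as a name contains a space: this yields 4 fields, not 3.
my $joined = 'd41d8cd98f00b204e9800998ecf8427e 455dex/sub/ my file.txt';
my @fields = split /\s+/, $joined;
print scalar(@fields), " fields\n";    # 4 fields
```

The two splitters differ for a bare name with no slash: the regex gives an empty directory, while fileparse gives './'.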

        What this script is trying to accomplish is comparing two directories and copying missing files from the first directory to the second. In other words, "imaging" the first directory, except for files that already exist in the second directory given as argument two.
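The goal described above (copy files from the first tree that are missing from, or different in, the second tree, keeping their subdirectories) can be sketched with core modules only. This is one possible approach, not the original script: instead of comparing every file against every other file, each tree is indexed in a hash keyed by the path relative to its root, so 455dex/sub/a.txt and 455cex/sub/a.txt share the key sub/a.txt. The sub names md5_index and sync_tree are hypothetical, and the sketch assumes Unix-style paths and that copying straight into the second tree is what is wanted:

```perl
use strict;
use warnings;
use File::Find;
use File::Spec;
use File::Copy qw(copy);
use File::Path qw(make_path);
use File::Basename qw(dirname);
use Digest::MD5;

# Return a hash reference mapping "path relative to $root" => MD5 hex digest.
sub md5_index {
    my ($root) = @_;
    my %index;
    find(
        {
            no_chdir => 1,    # keep $File::Find::name usable as a full path
            wanted   => sub {
                return unless -f $File::Find::name;
                open( my $fh, '<', $File::Find::name )
                  or die "Cannot read '$File::Find::name': $!";
                binmode($fh);
                my $rel = File::Spec->abs2rel( $File::Find::name, $root );
                $index{$rel} = Digest::MD5->new->addfile($fh)->hexdigest;
                close($fh);
            },
        },
        $root
    );
    return \%index;
}

# Copy every file from $src_root that is missing from $dst_root, or present
# there with a different MD5, to the same relative location under $dst_root.
sub sync_tree {
    my ( $src_root, $dst_root ) = @_;
    my $src = md5_index($src_root);
    my $dst = md5_index($dst_root);
    my @copied;
    for my $rel ( sort keys %$src ) {
        next if exists $dst->{$rel} && $dst->{$rel} eq $src->{$rel};
        my $from = File::Spec->catfile( $src_root, $rel );
        my $to   = File::Spec->catfile( $dst_root, $rel );
        make_path( dirname($to) );    # create missing subdirectories first
        copy( $from, $to ) or die "Copy '$from' -> '$to' failed: $!";
        push @copied, $rel;
    }
    return @copied;
}

if ( @ARGV == 2 ) {
    print "copied $_\n" for sync_tree(@ARGV);
}
```

Run as, for example, perl sync_tree.pl 455dex 455cex (the script name is hypothetical). Keying by relative path also sidesteps the second question in point 10, since two files with the same base name in different subdirectories get different keys.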

        Sorry about not posting an in-depth explanation of this earlier, but hopefully this explanation helps.

        Also, here are the files for you to try it out. The archive includes both directories and the script, so just unzip it and run the script.

        I can print the complete directory/file/MD5 listing of both arguments to a file, so the script works great except for the comparison inside the last two for loops at the bottom, and I am unsure what conditions to give it to make it copy the right way. Maybe a difference in directory names is causing it to fail, though that seems unlikely. I just need it to work like this: if the MD5 doesn't match and the file doesn't match, copy that file from the arg-one directory to the arg-two directory, or even from the arg-one directory to a temp directory.
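The copy condition is easier to state once each file is looked up by its path relative to its tree root, rather than matched against every file in a nested loop. Note that the nested loop in the script can never detect a file that is missing from the second tree: its if only fires for pairs where both files exist. A minimal sketch of the decision step alone, with made-up relative paths and shortened fake digests; the sub name classify is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: compare two "relative path => md5" maps and put every
# file from the first map into exactly one of three buckets.
sub classify {
    my ( $src, $dst ) = @_;
    my %result = ( missing => [], changed => [], same => [] );
    for my $rel ( sort keys %$src ) {
        if ( !exists $dst->{$rel} ) {
            push @{ $result{missing} }, $rel;    # not in the second tree at all
        }
        elsif ( $dst->{$rel} ne $src->{$rel} ) {
            push @{ $result{changed} }, $rel;    # same place, different content
        }
        else {
            push @{ $result{same} }, $rel;       # identical, nothing to do
        }
    }
    return \%result;
}

# Made-up digests, shortened for readability.
my %dex = ( 'a.txt' => 'aaa', 'sub/b.txt' => 'bbb', 'c.txt' => 'ccc' );
my %cex = ( 'a.txt' => 'aaa', 'c.txt' => 'xxx' );

my $plan = classify( \%dex, \%cex );
print "missing: @{ $plan->{missing} }\n";    # missing: sub/b.txt
print "changed: @{ $plan->{changed} }\n";    # changed: c.txt
```

Only the missing and changed lists would need copying; the same list can be ignored.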

        I have read your comment and I see what you mean. In the next script I write, I will take into consideration the points you made about using refs and chomping before I push to an array. The hash part really confused me for some reason, haha, but the way you explained how to use hashes actually seemed pretty straightforward. Thank you for taking the time, and hopefully my explanation of this script is informative.

        Also, with a small modification to the if statement at the bottom of this script, you can back up a complete directory with all its subdirectories and files. Put this in place of the condition and it will scan the first argument (though you still have to give two arguments) and back it up to a "backup" folder.
        my $x = 0;    # always 0, so the condition below is always true
        if ( $x == 0 ) {
            if ( -e "backup/$dpath/$dfile" ) {
                print "done";
                exit;
            }
            fcopy( "$dpath$dfile", "backup/$dpath" );
        }
        Like I said, because of the way the script is set up, you still need to punch in two arguments, or else it will just run through the script very quickly. The first argument is the one that gets backed up.