in reply to File copy based on conditions in two arrays

EDIT: Never mind. I've since been told that I will absolutely need a list file with the absolute filenames and paths of the files that need to be copied. You can still look over the code, though, and easily use it to compare two directories and get MD5s and whatever. I was copying whole directories while preserving file paths as well :) But if you feel like you know why it wasn't at least copying from one path to the other, please fill me in, because I would love to know, haha.

Anyway, sorry to bother.

OK girls and boys, here is the script. Take it easy on me though, I'm still green lol.

I want to copy from 455dex to 455cex, OR from 455dex to a temp folder. The main purpose of this script is to scan the cex directory and copy the dex files to it if they are different. I say again :) copy from 455dex to 455cex, based on differences in structure.

use strict;
use warnings;
use diagnostics;
use File::Copy::Recursive qw(fcopy rcopy);
use File::Path qw(make_path remove_tree);
use Digest::MD5;
use Time::HiRes qw( time );

my $start = time();
my $dir1 = $ARGV[0];
my $dir2 = $ARGV[1];

if ( not defined $dir1 ) {
    print "\nUsage: rexscan.pl [folder]\n";
    exit(0);
}
elsif ( not defined $dir2 ) {
    $dir2 = '';
}

remove_tree( "CEX", "DEX" );

my @first_arg = ();
my @second_arg = ();
my @dex_md5_array = ();
my @cex_md5_array = ();
my $countera = 0;
my $counterb = 0;

first_arg($dir1);
second_arg($dir2);
get_files();

sub first_arg {
    my ($dir) = @_;
    my ($dh);
    if ( !opendir( $dh, $dir ) ) {
        return;
    }
    while ( my $file = readdir($dh) ) {
        next if ( -d $file );
        my $path = "$dir/$file";
        if ( -d $path ) {
            first_arg("$path");
        }
        else {
            push( @first_arg, "$dir/$file" );
        }
    }
}

sub second_arg {
    my ($dir) = @_;
    my ($dh);
    if ( !opendir( $dh, $dir ) ) {
        return;
    }
    while ( my $file = readdir($dh) ) {
        next if ( -d $file );
        my $path = "$dir/$file";
        if ( -d $path ) {
            second_arg("$path");
        }
        else {
            push( @second_arg, "$dir/$file" );
        }
    }
}

sub get_files {
    for my $element1 (@first_arg) {
        next if ( -d $element1 );
        $element1 =~ m#^(.*?)([^/]*)$#;
        my ( $dex_directory, $dex_temp_file) = ( $1, $2 );
        my $dex = "$dex_directory$dex_temp_file";
        open (my $dex_file, '<', $dex)|| die "line 90 $!";
        my $dex_md5 = Digest::MD5->new->addfile($dex_file)->hexdigest;
        push @dex_md5_array, "$dex_md5 $dex_directory $dex_temp_file\n";
    }
    for my $element2 (@second_arg) {
        next if -d $element2;
        $element2 =~ m#^(.*?)([^/]*)$#;
        my ( $cex_directory, $cex_temp_file ) = ( $1, $2 );
        my $cex = "$cex_directory$cex_temp_file";
        open (my $cex_file, '<', $cex) || die "line 101";
        my $cex_md5 = Digest::MD5->new->addfile($cex_file)->hexdigest;
        push @cex_md5_array, "$cex_md5 $cex_directory $cex_temp_file";
    }
    for my $cex_md5_array (@cex_md5_array) {
        my ( $md5c, $cpath, $cfile ) = split /\s+/, ($cex_md5_array); #cex md5 filepath and file stored here
        chomp ( $md5c, $cpath, $cfile );
        for my $dex_md5_array (@dex_md5_array) {
            my ( $md5d, $dpath, $dfile ) = split (/\s+/,$dex_md5_array); #dex md5 filepath and file stored here
            chomp ( $md5d, $dpath, $dfile );
            if ($md5c ne $md5d && $cfile eq $dfile){
                make_path("DEX");
                fcopy("$dpath$dfile", "DEX/$cpath");
                #or use this to copy directly to the other directory.
                #fcopy("$dpath$dfile", "$cpath");
            }
            else {
                print '';
            }
        }
    }
}

my $end_run = time();
my $end = time();
my $runtime = sprintf( "%.5f", $end - $start );
print "This script took $runtime seconds to execute\n";

And I want to copy from the 455dex directory to the 455cex directory, OR copy the DEX directory to a temp folder while preserving the cex file path <.<
I don't see why it can't be accomplished as a simple file copy process, or why it has to be so hard for me, but what started off as a simple project turned into a pretty hefty task. Hopefully someone sees something I don't.

Replies are listed 'Best First'.
Re^2: File copy based on conditions in two arrays
by kzwix (Sexton) on Oct 09, 2014 at 11:11 UTC
    Hi, I have some advice for you:
    1. When defining an empty list, do not assign () to it, it is a waste of code (as a list is ALWAYS defined, but empty, at creation time)
    2. Use exceptions. Your functions (which aren't commented much, as to what they expect, or what they produce, by the way) simply return if something goes wrong. Try using
      die "Error message for the user"
      instead.
    3. Check arguments BEFORE going into functions, not after those functions failed. It makes more sense to check ahead that you have what is needed (and display your usage message if that's not the case) than try anyway, and then react if it failed.
    4. In your first_arg and second_arg functions, your "next if -d $file" doesn't seem to make much sense, to me. Do you really want to skip the file or directory names matching subdirectories of your current directory ?
    5. Avoid using the same name for different things, it gets confusing. Having an array and a sub share the same name works fine with Perl, but you'll have more chances of making a mistake yourself. Naming the array @first_arg, and the sub get_first_args, for instance, would be cleaner.
    6. Your first_arg and second_arg functions do exactly the same thing, except they fill different lists. This is what references are for, you know: Either give the reference to the array you want to fill as a parameter, or define an array IN your sub, then return its reference (and work with it after). It is much cleaner. To get a ref of something, just put a '\' in front of it. For instance, \@first_arg is a ref to @first_arg. To get the thing back from its ref, just prefix the ref value with the type: for instance, if $ref holds the ref to an array, then @$ref is the array. Or @{$ref}, if you prefer (or have multiple reference levels).
    7. When using exceptions (die), you do not need to print the line number yourself, it is automatically added by Perl if you do not add "\n" to your message (and more up to date, for instance, there are more than 11 lines between your "line 90" and "line 101" messages...)
    8. Use hashes, to regroup things that belong together. In your @dex_md5_array, you put strings regrouping three elements: The MD5, the directory, and the file name. You should, instead, put these in a hash, and put a reference to this hash in the array. For instance, you could write
      push @myArray, {'md5' => $md5, 'dir' => $dir, 'file' => $fileName};
      And to retrieve something from the hash reference, just write, for instance ${$hashRef}{'md5'}, or whatever. This way, you do not have to parse them again later. Oh, and chomp them when you put them in, not when you use them (it's much better to have clean values, than having to clean them each time you want to use them).
    9. Your element1 and element2 loops are the same, with only the source and destination arrays changing. Make a sub, and use references, you'll be glad you did.
    10. When comparing the MD5 of the files you wish to copy...
      • Do you really wish to compare MD5 first, when you're really wanting to look for files having the same name ?
      • Can different files have the same base name, but be in different directories, on the same "side" ? And, if so, do you really wish for the last file with that name to overwrite ALL the files with the same name in the destination directories ?
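    Points 2, 6, 7, 8 and 9 above could be combined roughly as follows. This is only a minimal sketch, not the original script: the sub name collect_files and the record keys md5, dir and file are made up for illustration, and it uses core modules only. One sub serves both directory trees, returns an array reference, stores each file as a hash reference, and dies with a message instead of returning silently.

```perl
use strict;
use warnings;
use Digest::MD5;

# Hypothetical helper (not part of the original script): walk $dir
# recursively and return a reference to an array of
# { md5 => ..., dir => ..., file => ... } records.
sub collect_files {
    my ($dir) = @_;
    die "collect_files() expected a directory name"
        unless defined $dir && length $dir;

    # No "\n" at the end of the message, so Perl appends file and line number.
    opendir( my $dh, $dir ) or die "Cannot open directory '$dir': $!";
    my @names = grep { $_ ne '.' && $_ ne '..' } readdir($dh);
    closedir($dh);

    my @records;
    for my $name ( sort @names ) {
        my $path = "$dir/$name";
        if ( -d $path ) {
            # Recurse and append the records found below this subdirectory.
            push @records, @{ collect_files($path) };
        }
        else {
            open( my $fh, '<', $path ) or die "Cannot open '$path': $!";
            binmode($fh);
            my $md5 = Digest::MD5->new->addfile($fh)->hexdigest;
            close($fh);
            # Store clean values once; nothing to split or chomp later.
            push @records, { md5 => $md5, dir => $dir, file => $name };
        }
    }
    return \@records;
}

# Example use: print one line per file found under the first argument.
if (@ARGV) {
    for my $rec ( @{ collect_files( $ARGV[0] ) } ) {
        print "$rec->{md5}  $rec->{dir}/$rec->{file}\n";
    }
}
```

    Calling collect_files('455dex') and collect_files('455cex') would then give two lists built by the same code, and each field is read back with $rec->{md5} (or ${$rec}{'md5'}).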
      Thank you for your reply :) I was unaware that declaring empty arrays like that is a no-no. I will correct that mistake in all future scripts I write. I did have die throughout the code, but I cleaned it up a little bit before I posted it, and even with die I was getting no errors at all. Even using strict, warnings and diagnostics I was getting no errors at all. And trust me when I say that I did check every argument before going into any function. Before I cleaned up the code I probably had 40 or 50 lines of print $variable just to be absolutely sure of what was in it.


      When it goes through the first subroutines, it gets the filename and directory path and pushes them to the "first_arg" and "second_arg" arrays. Then, if you look at the first two for loops in the "get_files" sub, it takes each item in the first and second arg arrays and joins the filename back together so it can get an MD5 of said file in its respective array. Once it gets the MD5, it pushes all that info (MD5, file path and filename) to a different array (@cex_md5_array and @dex_md5_array, which are used for comparing the directories).
      Then, when it gets to the second set of for loops in sub "get_files" (where it is looping through the arrays that have MD5s in them), it takes each item in the array and splits it into the MD5, file path and filename, and then compares them.

      What this script is trying to accomplish is comparing two directories and copying missing files from the first directory to the second. Or in other words, "imaging" the first directory unless the file exists in the second directory given in arg two.

      Sorry about not posting an in-depth explanation of this, but hopefully this explanation helps.

      Also, here are the files for you to try it out. The archive includes both directories and the script, so just unzip it and run the script.

      I can print the complete directory/file/MD5 list of both args to a file, so the script works great except for the comparing function inside the last two for loops at the bottom, and I am unsure what conditions to feed it to make it copy the correct way. Maybe it's a directory name difference causing it to fail, though that seems unlikely. I just need it to copy from one directory to the other: if the MD5 doesn't match and the files don't match, then copy that file from the arg one directory to the arg two directory, or even from the arg one directory to a temp directory.

      I have read your comment and I see what you mean. In the next script I write I will take into consideration the points you have made about using refs and chomping before I push to an array. The hash really confused me for some reason, haha, but the way you explained how to use them seemed pretty straightforward actually. Thank you for taking the time, and hopefully my explanation of this script is informative.

      Also, with a little modification to the if statement at the bottom of this script, you can back up a complete directory with its sub paths and files. Put this as the condition and it will scan the first arg (though you still have to input two args) and back it up to a "backup" folder.
      my $x = 0;
      if($x == 0){
          if (-e "backup/$dpath/$dfile"){
              print "done";
              exit;
          }
          fcopy("$dpath$dfile", "backup/$dpath");
      Like I said, because of the way the script is set up, you still need to punch in two args or else it will just run through the script very quickly. The first arg will be the one that gets backed up.
        • Well, for assigning () to a newly declared array, it is not a "no no", it is merely something I advise against. It would be exactly like writing "my $foo = undef;", that is, assigning exactly what is already there. Hence, a waste of time, and space.
        • About your parameter checking, what I meant is that the code where you display "usage: ..." and exit is AFTER you call the functions working with your parameters. You should put it before, and merely check that you have parameters, or display usage and exit.

          I also advise you to put solid parameter checking in your functions, when performance is not an issue. It is much better to have a function die with an explicit message like "ERROR: myFunctionName() expected 3 parameters" if some expected parameters are undefined, for instance, than have it behave unexpectedly, in a silent way.

        • What I was getting at is that you separate the directory name, and the file name, in your algorithm, and then check only the file name. Hence, you may overwrite your files several times.

          For instance, you have directory A, containing directories A1 and A2, each of those having a file named 'a'. Then, in directory B, you have a file named 'a'. Well, now you iterate:
          First, you would work for A/A1/a, compare it with B/a, and, depending on the MD5, replace or keep B/a.
          Then, you would work for A/A2/a, compare it with B/a, and, depending on the MD5, replace or keep B/a.

          Somehow, I doubt that was what you were wanting to do :)
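
          One way to avoid that is to compare files by their path relative to each root directory, not by base name alone. The following is only a sketch under that assumption: md5_by_relpath and copy_changed are hypothetical names, and it uses the core modules File::Find, File::Copy and File::Path in place of File::Copy::Recursive. It also shows the kind of explicit parameter checking mentioned above.

```perl
use strict;
use warnings;
use Digest::MD5;
use File::Basename qw(dirname);
use File::Copy qw(copy);
use File::Find qw(find);
use File::Path qw(make_path);
use File::Spec;

# Hypothetical helper: map each file's path relative to $root to its MD5.
sub md5_by_relpath {
    my ($root) = @_;
    die "md5_by_relpath() expected an existing directory"
        unless defined $root && -d $root;

    my %md5_of;
    find(
        {
            no_chdir => 1,
            wanted   => sub {
                return unless -f $File::Find::name;
                open( my $fh, '<', $File::Find::name )
                    or die "Cannot open '$File::Find::name': $!";
                binmode($fh);
                my $rel = File::Spec->abs2rel( $File::Find::name, $root );
                $md5_of{$rel} = Digest::MD5->new->addfile($fh)->hexdigest;
                close($fh);
            },
        },
        $root
    );
    return \%md5_of;
}

# Hypothetical helper: copy every file under $src that is missing from $dst,
# or whose MD5 differs, into $out, keeping its relative path.
# Returns a reference to the sorted list of relative paths copied.
sub copy_changed {
    my ( $src, $dst, $out ) = @_;
    die "copy_changed() expected 3 parameters"
        if grep { !defined $_ } ( $src, $dst, $out );

    my $src_md5 = md5_by_relpath($src);
    my $dst_md5 = -d $dst ? md5_by_relpath($dst) : {};

    my @copied;
    for my $rel ( sort keys %$src_md5 ) {
        # Same relative path and same MD5: nothing to do.
        next if defined $dst_md5->{$rel} && $dst_md5->{$rel} eq $src_md5->{$rel};
        my $target = File::Spec->catfile( $out, $rel );
        make_path( dirname($target) );
        copy( File::Spec->catfile( $src, $rel ), $target )
            or die "Copy of '$rel' failed: $!";
        push @copied, $rel;
    }
    return \@copied;
}
```

          With the directory names from this thread, copy_changed('455dex', '455cex', 'DEX') would collect the differing files under DEX, and passing '455cex' as the third parameter would update that tree in place. A/A1/a and A/A2/a are then only ever compared with B/A1/a and B/A2/a respectively.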