c4jjm has asked for the wisdom of the Perl Monks concerning the following question:

I am currently trying to collect every line of certain files within subdirectories and output them all into a single large file. This works, but it re-appends ALL of the files every time I add a new subdirectory.

What I'd like:

When the script runs, it should check whether it has already taken the data out of each file. If it has, it skips that file and moves on to the next; when it finds a file that has not yet been appended, it appends that file's contents to the existing large file.

What I have tried:

I have the script generating a second file containing every file path it appends. The next time it runs, it is supposed to open that file, see that a file path is already in there, and skip appending that data to the large file again. The problem is that it either appends everything again every time, or it appends all data the first time and nothing (even new files) the second time.

I attempted all of the styles of array checking found in the "How to find out if X is an element in an array?" Q&A, where the current file path is the element and the array is filled by reading the file paths out of the second file, but none of them worked.

What I think the problem is:

The array is filled, but my if statement is not functioning correctly. It needs to check whether the new file path is the same as ANY path in the second file (my array); if it is, the data is not appended again. If, after checking ALL elements of the array, the file path does not match, then the data is appended and the new file path is added to the second file.

if (grep { -f $_ and $_ eq $FilePath } @DirsArray) {
}
else {
    open (FILE, $FilePath) or die "Cannot open file: $!";
    printf("\n" . $FilePath . "\n");
    while ( $line = <FILE> ) {
        push(@outLines, $line);    # to be sent to appended large file
    }
    close FILE;
    print (LASTFILES $FilePath . "\n");    # prints to second file
}
Any ideas why it's not matching the file path against the array (also, I tried pretty much all the ways listed on that Q&A page, not just this version)?

Thanks,

Josh

Re: Append new files only.
by roboticus (Chancellor) on Nov 10, 2011 at 00:33 UTC

    c4jjm:

    Are you aware that you can use tar? It already has the ability to only append newer files to the archive. It won't be the same file format, but it might be able to help you out.
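
    If you went that route, a minimal sketch (the archive name and directory list below are only placeholders) could be as simple as shelling out to tar's update mode, which adds only files newer than the copies already in the archive:

        # hypothetical sketch: have tar append only new/newer files
        use strict;
        use warnings;

        my $archive = 'collected.tar';     # placeholder archive name
        my @dirs    = glob 'data/*';       # placeholder subdirectories

        system( 'tar', '-uf', $archive, @dirs ) == 0
            or die "tar -u failed: $?";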

    Anyway, for your coding problem, it looks like you just need to chomp your lines as you read them; otherwise the end-of-line character at the end of the filename will cause a mismatch. After all, "foo" doesn't equal "foo\n", which is the comparison it looks like your program is making.
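
    For example, if the second file is read back in with something along these lines (the handle and file names here are just placeholders for whatever you actually use), chomping each line before it goes into the array makes the later eq comparison behave:

        # hypothetical sketch: load the previously-appended paths, stripping newlines
        open my $last, '<', 'lastfiles.txt' or die "Cannot open lastfiles.txt: $!";
        my @DirsArray;
        while ( my $path = <$last> ) {
            chomp $path;               # "foo\n" becomes "foo"
            push @DirsArray, $path;
        }
        close $last;

    With the newlines stripped, your grep { $_ eq $FilePath } test should match as expected.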

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

      THANK YOU ROBOTICUS!

      It's always the little stuff that makes all the difference... Everything works great now with chomp.

      Also, I'm curious: I have not used tar, and the only tar I am familiar with is a Linux utility, so my guess is that it only works on Linux? (This particular script needs to run on Windows.)

        c4jjm:

        While it's a *nix program, there are versions for Windows too. I just mentioned it because it's the one I'm most familiar with; there are other programs that will do the same thing. I'd imagine that since WinZIP can read tar files, it probably has the ability to just add new/modified files to the ZIP file without reprocessing the entire thing. (Just guessing, no real knowledge of it.)

        ...roboticus

        When your only tool is a hammer, all problems look like your thumb.

Re: Append new files only.
by jwkrahn (Abbot) on Nov 10, 2011 at 11:54 UTC
    if (grep { -f $_ and $_ eq $FilePath } @DirsArray){} else{

    Is usually written as:

    unless ( grep { -f $_ and $_ eq $FilePath } @DirsArray ) {


    printf("\n" . $FilePath . "\n");

    Is better written as:

    print "\n" . $FilePath . "\n";


    while ( $line = <FILE> ) {
        push(@outLines, $line);    # to be sent to appended large file
    }

    No need for a loop there, you could just do:

    push @outLines, <FILE>; #to be sent to appended large file

      jwkrahn, Thanks for the suggestions.

      The if statement actually does work in both branches; I just posted the relevant portion. I did try removing the loop, though, and it turns out the loop is faster than the straight push. I tested with 9 files (~165k lines total):

      push @outLines, <FILE>; Completed in ~0.9 seconds

      And with the while loop it completed in ~0.4 seconds.
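
      For anyone who wants to try the comparison on their own data, a minimal sketch with the core Benchmark module (the file name below is just a placeholder) might look like:

          # hypothetical sketch: compare while-loop push vs. list-context push
          use strict;
          use warnings;
          use Benchmark qw(cmpthese);

          my $file = 'large_input.txt';    # placeholder input file

          cmpthese( 50, {
              while_push => sub {
                  open my $fh, '<', $file or die "Cannot open $file: $!";
                  my @lines;
                  while ( my $line = <$fh> ) {
                      push @lines, $line;
                  }
                  close $fh;
              },
              list_push => sub {
                  open my $fh, '<', $file or die "Cannot open $file: $!";
                  my @lines;
                  push @lines, <$fh>;
                  close $fh;
              },
          } );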