kurt2439 has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to figure out whether there is a matching folder under a different part of the directory tree, and then whether there is a .pdf file in that folder. The "ALLFILES" list contains PDF files (a resource fork from OS X and the "real" file). I have not cleaned up ALLFILES to include just one copy of each file, but you can see below that I get different results for each, even though the data I am stripping from the file paths is the same. Am I not understanding something about the glob function in Perl? I very rarely use Perl or program, so forgive me.

#!/usr/bin/perl
use File::Glob qw(:globally :nocase);

my $PROGDIR = "/home/jchase/espresso/";
my $ALLFILES = "$PROGDIR"."ALL.txt";
my $INELIGIBLE = "$PROGDIR"."INELIGIBLE.txt";
my $INELIGIBLE_NO_FOLDER = "$PROGDIR"."INELIGIBLE_NO_FOLDER.txt";
my $INELIGIBLE_NO_PDF = "$PROGDIR"."INELIGIBLE_NO_PDF.txt";
my $ELIGIBLE = "$PROGDIR"."ELIGIBLE.txt";
my $DEBUG = "$PROGDIR"."DEBUG.txt";

#Open the eligible file list
open(FILE, "$ALLFILES") or die("Error reading $ALLFILES.");
#Open this file for debug information
open(DEBUG, ">>$DEBUG") or die("Error reading #DEBUG.");

#Loop through the eligible file list and determine if there is a
#matching folder in the INTERIORS archive folder structure
while ( ! eof(FILE) ) {
    $FULLNAME = readline( *FILE );
    print DEBUG "Working with $FULLNAME";
    #Split the array on the folder levels via "/"
    @FULLARRAY = split(/\//,$FULLNAME);
    #Reconstruct the folder name while substituting COVERS archive
    #into INTERIORS archive. This is to find out if that folder exists
    $INTERIORNAME = "/".$FULLARRAY[1]."/".$FULLARRAY[2]."/"."INTERIORS archive"."/".$FULLARRAY[4]."/".$FULLARRAY[5]."/";
    print DEBUG "Looking for $INTERIORNAME\n";
    if ( ! -d "$INTERIORNAME" ) {
        open(LOG,">>$INELIGIBLE_NO_FOLDER") or die("Error reading $INELIGIBLE_NO_FOLDER");
        print DEBUG "Did not find folder! Printing to Ineligible_no_folder\n\n";
        print LOG "$FULLNAME"."\n";
        close LOG;
    } else {
        print DEBUG "Found the folder, is there a PDF?\n";
        #Time to find out if there is a PDF in the folder
        if ( ! -e <"$INTERIORNAME"*.pdf> ) {
            open(NOPDF,">>$INELIGIBLE_NO_PDF") or die ("Error reading $INELIGIBLE_NO_PDF");
            print DEBUG "There is no PDF in here! Printing to Ineligible_no_pdf\n\n";
            print NOPDF "$INTERIORNAME"."\n";
            close NOPDF;
        } else {
            open(ELIG,">>$ELIGIBLE") or die ("Error reading $ELIGIBLE");
            print DEBUG "FOUND A PDF! Printing to ELIGIBLE!\n\n";
            print ELIG "$INTERIORNAME"."\n";
            close ELIG;
        }
    }
}
close(FILE);

Here is some DEBUG output that shows why I am confused. The ._*.pdf entry gives the correct result, but the "real" file evaluation never finds a PDF. The folder check always works correctly:

Working with /local/Macintosh/COVERS archive/A-E/Alchemists Mediums and Magicians-PB/._Alchemists Mediums Magicians.pdf

Looking for /local/Macintosh/INTERIORS archive/A-E/Alchemists Mediums and Magicians-PB/

Found the folder, is there a PDF?

FOUND A PDF! Printing to ELIGIBLE!

Working with /local/Macintosh/COVERS archive/A-E/Alchemists Mediums and Magicians-PB/Alchemists Mediums Magicians.pdf

Looking for /local/Macintosh/INTERIORS archive/A-E/Alchemists Mediums and Magicians-PB/

Found the folder, is there a PDF?

There is no PDF in here! Printing to Ineligible_no_pdf

Replies are listed 'Best First'.
Re: trouble with glob
by Marshall (Canon) on Dec 03, 2010 at 14:37 UTC
    I am having a bit of trouble understanding what you want, but it appears that you want a report based upon a subset of all .pdf files underneath some directory.

    The code below will find all files ending in .pdf in any directory in $base_dir or any directory beneath $base_dir. @pdf_files will contain the full path name to each .pdf file. From this you can find out if each file in ALLFILES exists or not.

    If you want a list of all directories that do not contain .pdf files, the code is a bit different. I'm not sure what you mean by eligible or ineligible. Could you show a more concise version of desired output?

    #!/usr/bin/perl -w
    use strict;
    use File::Find;

    my $base_dir = "C:/Temp";
    my @pdf_files;

    find(\&collect_pdf_files, $base_dir);
    print "$_\n" for @pdf_files;

    sub collect_pdf_files {
        return unless (-f);          # only real files, not directories
        return unless (/\.pdf$/);    # continue only if name ends in .pdf
        push (@pdf_files, $File::Find::name);
    }
    #prints: C:/Temp/something.pdf
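
    For the other case mentioned above (directories that do not contain any .pdf file), one possible variation is sketched here. The temporary tree, the folder names and the helper name note_entry are made up for illustration, and the paths assume a Unix-like system; this is not necessarily the code the author had in mind.

```perl
use strict;
use warnings;
use File::Find;
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Hypothetical tree: one folder with a PDF, one without.
my $base_dir = tempdir( CLEANUP => 1 );
make_path( "$base_dir/with_pdf", "$base_dir/without_pdf" );
open( my $fh, '>', "$base_dir/with_pdf/book.pdf" )
    or die "Cannot create test file: $!";
close $fh;

my ( %all_dirs, %has_pdf );
find( \&note_entry, $base_dir );

sub note_entry {
    if ( -d $_ ) {
        $all_dirs{$File::Find::name} = 1;    # remember every directory visited
        return;
    }
    # Mark the containing directory when a regular file ends in .pdf.
    $has_pdf{$File::Find::dir} = 1 if -f $_ && /\.pdf$/i;
}

# Directories that were visited but never marked as holding a PDF.
my @dirs_without_pdf = sort grep { !$has_pdf{$_} } keys %all_dirs;
print "$_\n" for @dirs_without_pdf;
```

    Note that the top-level directory itself is reported too when no PDF sits directly inside it.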
Re: trouble with glob
by kcott (Archbishop) on Dec 03, 2010 at 14:32 UTC

    I'm not sufficiently knowledgeable about Macintosh resource forks to help you directly with that aspect of your question; however, here are some tips that may be useful:

    • You're opening a number of files in append mode but using a die() message with "Error reading ...". Try adding $! to the message to get the real reason (or at least a better reason) why open() failed. Also consider using the (preferred) 3-argument form of open(). The documentation for open has examples of both of these.
    • You're building quite a few pathnames manually. Consider using File::Spec for these tasks: the functions catfile() and catdir() look like the most useful candidates here. If nothing else, it will eliminate all the  . "/" . parts you're currently typing in.
    • Add use strict; and use warnings; to the top of your code (immediately after #!/usr/bin/perl). You can get more informative messages by also adding use diagnostics;.
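
    A minimal sketch of the three tips together, assuming a scratch directory in place of the original $PROGDIR; the folder components passed to catdir() are only illustrative:

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical scratch directory standing in for the poster's $PROGDIR.
my $progdir = tempdir( CLEANUP => 1 );

# catfile() joins directory and file name with the platform's separator.
my $eligible = File::Spec->catfile( $progdir, 'ELIGIBLE.txt' );

# catdir() does the same for a directory path.
my $interior = File::Spec->catdir( $progdir, 'INTERIORS archive', 'A-E' );

# 3-argument open with a lexical filehandle; $! says why open() failed.
open( my $log, '>>', $eligible )
    or die "Cannot open $eligible for appending: $!";
print {$log} "$interior\n";
close $log or die "Cannot close $eligible: $!";
```

    The message now names the operation that was actually attempted (appending) and carries the system error text.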

    -- Ken

Re: trouble with glob
by Gulliver (Monk) on Dec 03, 2010 at 15:07 UTC

    The problem is that each time you do this

    -e <"$INTERIORNAME"*.pdf>

    it returns a different filename and only a single filename. You need to assign that glob to an array and then grep for the pdf extension with a regex.

    By the way according to the 3rd edition of Programming Perl from 2000 this is referred to as the old way to do it. They recommend something like this:

    @files = glob("*.pdf");
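
    The scalar-context behaviour described above can be reproduced with a small sketch. It assumes a temporary folder holding exactly one PDF and a temporary path without whitespace; the file name is made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical folder holding exactly one PDF
# (assumes the temporary path contains no whitespace).
my $dir = tempdir( CLEANUP => 1 );
open( my $fh, '>', "$dir/interior.pdf" ) or die "Cannot create test file: $!";
close $fh;

my @seen;
for my $pass ( 1 .. 4 ) {
    # Same glob operator in scalar context: one match per call, then
    # undef once the list is exhausted, and only then does it start over.
    my $match = glob("$dir/*.pdf");
    push @seen, defined $match ? 'found' : 'none';
}
print "@seen\n";

# List context returns every match at once, so the count is reliable.
my @pdfs = glob("$dir/*.pdf");
print scalar(@pdfs), " PDF(s)\n";
```

    With a single PDF in the folder, the scalar calls alternate between a match and undef, which is consistent with the DEBUG output in the question: the first line of the list finds a PDF and the next one does not.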
      Actually just check the size of @files, no grep needed.

      But whenever I get my results returned by glob like this, they are broken up into different array elements by the spaces in the directories. So for instance:

      @SEARCH_FOR_TIFS = glob("$FOLDERNAME*.tif");
      print $SEARCH_FOR_TIFS[0];

      Returns:

      Element 0: /local/Macintosh/COVERS
      Element 1: archive/A-E/Eye
      Element 2: of
      Element 3:
      Element 4:

      And part of the data disappears (element 3 and 4 should not be empty if it was splitting on the white space).

      What am I doing wrong here? I looked through File::Glob but don't see how to change glob's behavior. I see that the arguments to glob are interpreted as separate search patterns when they are separated by white space, but I did not think that applied here. This wasn't a problem when I was storing the glob result in a scalar, but that obviously wasn't the right strategy either, since I was only getting one of the results.

        I tried a simplified version of what you had before and it works every time on my XP laptop. I found in Programming Perl that it is supposed to return false (only once) when it gets to the end of the matching files and then start over. At least you now have an array that shows what the glob is doing. I couldn't get the glob to work with $FOLDERNAME in the pattern, but with the path spelled out it worked fine. The book mentioned something about only one level of variable interpolation, but I didn't quite get it.
Re: trouble with glob
by kurt2439 (Sexton) on Dec 03, 2010 at 18:16 UTC

    What I'm trying to do isn't particularly logical -- I have a list of files (generated outside the script), and for each one I need to determine whether there is a matching folder name under a different directory tree -- if there is, then I need to find out whether that folder has a pdf in it. If all those things are true, then the original file is "eligible" for conversion (to be done later). If not, I want to know what needs to be done for that file to become eligible later (hence the ineligible logging).
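
    The decision just described could be sketched as a single function. The name classify, its three return values and the temporary folder layout are assumptions for illustration, and the paths assume "/" as separator; the logging to the separate files is left out.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Glob qw(bsd_glob);
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Classify one path from the list (hypothetical helper).
# Returns 'no_folder', 'no_pdf' or 'eligible'.
sub classify {
    my ($cover_path) = @_;

    # Swap the archive name to get the matching INTERIORS folder.
    ( my $interior_dir = dirname($cover_path) )
        =~ s{/COVERS archive/}{/INTERIORS archive/};
    return 'no_folder' unless -d $interior_dir;

    # List context: all matches at once. bsd_glob() keeps spaces in the
    # path intact, and "*" does not match names starting with a dot,
    # so "._" resource-fork entries are not counted.
    my @pdfs = bsd_glob("$interior_dir/*.[Pp][Dd][Ff]");
    return @pdfs ? 'eligible' : 'no_pdf';
}

# Hypothetical tree to exercise the three outcomes.
my $root = tempdir( CLEANUP => 1 );
make_path( "$root/INTERIORS archive/A-E/With PDF",
           "$root/INTERIORS archive/A-E/Empty" );
open( my $fh, '>', "$root/INTERIORS archive/A-E/With PDF/interior.pdf" )
    or die "Cannot create test file: $!";
close $fh;

for my $title ( 'With PDF', 'Empty', 'Missing' ) {
    my $status = classify("$root/COVERS archive/A-E/$title/cover.pdf");
    print "$title: $status\n";
}
```

    Because the check depends only on the folder, the "._" copy and the real copy of a cover file get the same answer.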

    Thanks for the help -- I'm too new to perl to know what the right suggestion is so let me experiment with all your suggestions and get back later

Re: trouble with glob
by Gulliver (Monk) on Apr 27, 2011 at 22:34 UTC

    This is a little late but I remembered this node when I saw the following in the File::Glob documentation. It looks like bsd_glob() would solve the whitespace issue here.

    Since v5.6.0, Perl's CORE::glob() is implemented in terms of bsd_glob(). Note that they don't share the same prototype--CORE::glob() only accepts a single argument. Due to historical reasons, CORE::glob() will also split its argument on whitespace, treating it as multiple patterns, whereas bsd_glob() considers them as one pattern.
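
    A small sketch of the difference, assuming a temporary folder whose name contains spaces (the folder and file names are made up, and the paths assume "/" as separator):

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Hypothetical folder whose name contains spaces, like "COVERS archive".
my $base   = tempdir( CLEANUP => 1 );
my $folder = "$base/COVERS archive/Some Title-PB";
make_path($folder);
open( my $fh, '>', "$folder/cover.tif" ) or die "Cannot create test file: $!";
close $fh;

# bsd_glob() treats the whole string as one pattern, spaces included.
my @tifs = bsd_glob("$folder/*.tif");
print "bsd_glob: $_\n" for @tifs;

# Core glob() splits the same string on whitespace into several
# patterns, so path fragments come back instead of the real file.
my @pieces = glob("$folder/*.tif");
print "glob:     $_\n" for @pieces;
```

    The fragments printed by the core glob() call correspond to the broken-up elements reported earlier in the thread.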