koolgirl has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks, :))

It's been quite a while since I've humiliated myself on SoPW, so I figured I'd better get to work. Basically, what I'm trying to do is this: I've written a utility for myself (mainly for extra practice with Perl, since I've just recently finished the llama book, but the need for the utility is there as well) that reads through the directory in which I hold all my programming files, matches against seven different regexes with different criteria, then creates an index file with a list of links to all files matching each separate keyword.

Of course at first I received nothing but gads of error reports since creation (about 4 days ago), due to which I've written, re-written and written the code one more time. I've FINALLY got it free of all error reports, as a matter of fact it sends nothing at all back, just the next command prompt. However, to my amazement, the file still hadn't been created...? I've seen lots of crazy things during my ride on the llama, but never have I seen an open file statement that didn't send back an error report when prompted to do so, but didn't open the file either.

So, after that, I went in and started writing de-bug print statements, and then my mind was really blown to bits. Let me show you:

This is the code:
    #!usr/bin/perl
    use strict;
    use warnings;

    my $file;
    my $path = "/mnt/data/Programming/";
    my $i = 0;
    my @regex = qw( file split opendir push sort sub % STDIN);
    my @lines;
    my $prev_line;
    my $curr_line;

    # This program looks through a directory, reads through each file contained
    # within, and sorts them based on several different pattern matching criteria,
    # then sends links to each file containing a match for each keyword, to an index
    # page.

    opendir (PRO, "/mnt/data/Programming") || die $!;

    foreach $file (readdir (PRO)) {
        open (IN, "$file") || die $!;
        while (<IN>) {
            #print $_; # de-bug print statement 1
            push(@lines, $_);
            #print $lines[31];# de-bug print statement 2
            if ($lines[$i] =~ /$regex[$i]/) {
                if ($i >= 1) {
                    $prev_line = $lines[$i - 1];
                } # end if
                $curr_line = $lines[$i];
                open (OUT, ">/mnt/data/Programming/links") || die $!;
                print OUT " < a HREF=\"$path . $file\">$regex[$i]</a>";
                print OUT $file . "\n";
                print OUT $prev_line . "\n";
                print OUT $curr_line . "\n";
                $i++;
                close(OUT);
            } # end if
        } # end while
        close(IN);
    } # end foreach

This:

* zero or more /x ign. wh.space ?: * zero or more /x ign. wh.space ?: * zero or more /x ign. wh.space ?: * zero or more /x ign. wh.space ?: * zero or more /x ign. wh.space ?: * zero or more /x ign. wh.space ?: * zero or more /x ign. wh.space ?: * zero or more /x ign. wh.space ?: * zero or more /x ign. wh.space ?: * zero or more /x ign. wh.space ?: * zero or more /x ign. wh.space ?: * zero or more /x ign. wh.space ?: * zero or more /x ign. wh.space ?: * zero or more /x ign. wh.space ?: * zero or more /x ign. wh.space ?: * zero or more /x ign. wh.space ?: * zero or more /x ign. wh.space
is what I get from the two de-bug print statements you see within my code (or something confusingly similar). Now is there something obviously normal about this strange output that I, in my primitive state of being, haven't come upon just yet, or are there gremlins behind my command prompt?

Any help would be greatly appreciated and needed. As usual, I thank you very much for your attention and time, hopefully one day soon during my new ride on the big bad camel, I'll be answering questions instead of asking them ;)

koolgirl

UPDATE:

Thank you all so much for the insightful replies. I did learn a few lessons: it's not a good idea to parse a directory which is also your working directory (however, I did learn that Perl was bad a#$ enough to parse itself!!), and my structure needs a bit of cleaning up, because the loops, the method of storing the lines read and the open statements could all have been structured much more efficiently. I am currently in the process of de-bugging and re-structuring it, and I have of course downloaded all the code examples given to help me along in this process. With my busy life, if I seem to take a while to update, de-bug, reply, etc., please forgive me; it's not bad manners, just lack of time. In closing, and this is the most important part of this update, a lot of the replies (gwadej, ig, lsh, and roboticus) really did help to correct my thinking about code (not just this particular de-bug), which is an awesome opportunity for my skills to advance, and I thank you dearly for that! :))

Replies are listed 'Best First'.
Re: File not opening, but no error report
by gwadej (Chaplain) on Mar 25, 2009 at 20:37 UTC

    From a quick look, I don't see what is generating your output, but I do see a few things that will definitely slow you down.

    1. You are reading filenames from the directory "/mnt/data/Programming", but opening those files in the current directory. If you are not in "/mnt/data/Programming", this will not open the files you intend.

    2. You are modifying the directory you are reading by creating a file there. That is probably going to cause some confusion.

    3. Since you are opening your output file with '>', you would overwrite it each time instead of accumulating output in that file. If you want to reopen each time, use '>>' instead.

    4. You seem to be processing data one line at a time and also putting those lines in an array. I would probably only do one or the other.
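
    Point 3 is easy to check in isolation. The following is only a minimal sketch: the helper name write_three and the scratch file names are made up for the demo, and it writes to a temporary directory rather than to /mnt/data/Programming. It reopens a file three times with each mode and counts what survives.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Write three records to $file, reopening it with $mode each time,
# and return the lines the file contains afterwards.
# (write_three is a hypothetical helper for this demo.)
sub write_three {
    my ($file, $mode) = @_;
    for my $n (1 .. 3) {
        open(my $out, $mode, $file) or die "Can't open '$file': $!";
        print {$out} "record $n\n";
        close($out) or die "Can't close '$file': $!";
    }
    open(my $in, '<', $file) or die "Can't read '$file': $!";
    my @lines = <$in>;
    close($in);
    return @lines;
}

my $tmp = tempdir(CLEANUP => 1);    # scratch directory, removed at exit

my @truncated = write_three("$tmp/truncate.txt", '>');    # truncates on every open
my @appended  = write_three("$tmp/append.txt",   '>>');   # appends on every open

print "'>'  left ", scalar(@truncated), " line(s)\n";
print "'>>' left ", scalar(@appended),  " line(s)\n";
```

    With '>' only the last record is left, which is why the links file in the original script could never hold more than one match.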

    I would probably get the list of files all at once and then process them from an array. For instance,

        opendir ( my $dir, "/mnt/data/Programming") or die "Can't open directory: $!\n";
        my @files = readdir( $dir );
        closedir( $dir );

    You can now process the files independently. The file open should really contain the path as well:

        foreach my $file ( grep { -f "$path/$_" } @files ) {
            open( my $fh, '<', "$path/$file" ) or die "Unable to open '$file': $!\n";
            while( <$fh> ) {
                # process each line here.
            }
        }

    It's a good idea to check the names you get from readdir() to make sure they are files. You can't read directories as files (in general). Notice that I used lexical file and directory handles. Although not critical in this case, it will definitely save troubleshooting nasty problems in the future. Start on good habits early.

    Update: Corrected file test to use path as shmem observed.

    G. Wade

      Yes, but...

      The file open should really contain the path as well:
      foreach my $file ( grep { -f $_ } @files )

      ...the file test should really contain the path as well ;-)

      foreach my $file ( grep { -f "$path/$_" } @files )
Re: File not opening, but no error report
by zwon (Abbot) on Mar 25, 2009 at 20:25 UTC
    if ($lines[$i] =~ /$regex[$i]/)

    If $lines[0] doesn't match /file/, the body of the if never executes and $i is never incremented, so every pass through the loop matches $lines[0] against /file/ again.
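
    To make the stuck index concrete, here is a small sketch. The sub names indexed_matches and all_matches and the three sample lines are invented for illustration; the first sub mimics the original test, the second shows one possible alternative that checks every line against every keyword.

```perl
use strict;
use warnings;

my @regex = qw(file split opendir);    # a shortened keyword list for the demo

# Mimic the original test: line $i is compared with pattern $i, and $i
# only advances when that comparison succeeds.
sub indexed_matches {
    my @lines = @_;
    my ($i, $hits) = (0, 0);
    for my $pass (0 .. $#lines) {      # one pass per line read
        last if $i > $#regex;
        my $re = $regex[$i];
        if ($lines[$i] =~ /$re/) {
            $hits++;
            $i++;
        }
    }
    return $hits;
}

# Alternative: test every line against every keyword and return the
# keywords that matched, in the order they were found.
sub all_matches {
    my @lines = @_;
    my @found;
    for my $line (@lines) {
        for my $re (@regex) {
            push @found, $re if $line =~ /\Q$re\E/;
        }
    }
    return @found;
}

my @sample = (
    'use strict;',
    'my @parts = split /,/, $row;',
    q{opendir(my $dh, '.') or die;},
);

print "indexed test found ", indexed_matches(@sample), " match(es)\n";
print "nested loops found: ", join(', ', all_matches(@sample)), "\n";
```

    Because the first sample line contains no "file", the indexed version never gets past index 0, while the nested loops still find split and opendir.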

    Update: typo fixed

Re: File not opening, but no error report
by runrig (Abbot) on Mar 25, 2009 at 20:32 UTC
    open (IN, "$file") || die $!
    You'll want to change that to:
    open (IN, "$path/$file") || die $!;
    readdir does not return the directory part of the filename, so you need to either add the directory path to the file, or chdir to the directory before opening the file.
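
    A quick way to see this for yourself is sketched below. It assumes nothing about your real directory: it builds a scratch directory with File::Temp, and the file name readdir_demo_sample.pl and the sub plain_files are made up for the demo.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Scratch directory standing in for /mnt/data/Programming (removed at exit).
my $dir = tempdir(CLEANUP => 1);

# Create one sample file; the name is invented for this demo.
my $sample = File::Spec->catfile($dir, 'readdir_demo_sample.pl');
open(my $out, '>', $sample) or die "Can't create '$sample': $!";
print {$out} "print 1;\n";
close($out) or die "Can't close '$sample': $!";

# Return the plain files in $where as full paths.
sub plain_files {
    my ($where) = @_;
    opendir(my $dh, $where) or die "Can't open directory '$where': $!";
    my @names = readdir($dh);          # bare names, including '.' and '..'
    closedir($dh);
    return grep { -f $_ } map { File::Spec->catfile($where, $_) } @names;
}

# What readdir itself hands back, minus the two dot entries.
opendir(my $dh, $dir) or die "Can't open directory '$dir': $!";
my @bare = grep { $_ ne '.' && $_ ne '..' } readdir($dh);
closedir($dh);

my @full = plain_files($dir);

print "readdir gave:  @bare\n";
print "full path is:  @full\n";

# The full path opens no matter what the current directory is.
open(my $in, '<', $full[0]) or die "Can't open '$full[0]': $!";
my $first_line = <$in>;
close($in);
print "first line:    $first_line";
```

    File::Spec->catfile is used here instead of plain "$path/$file" only to keep the path joining portable; either works on a Unix-style system.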

    Update: but if you're executing the program from that directory anyway, then that's not your problem.

Re: File not opening, but no error report
by roboticus (Chancellor) on Mar 26, 2009 at 13:01 UTC
    koolgirl:

    Just a minor note: You're opening & closing the output file handle repeatedly, which can be pretty slow if you must do it often. One way you could speed things up is to open the output file handle only once (if required) and close it only once at the end. Something like:

        # First declare your file handle
        my $OUT;
        ...
        foreach ... {
            while ... {
                if ... {
                    # Error found, so...
                    # open the file if not open yet
                    if (!defined $OUT) {
                        open $OUT, '>', "$path/$file" or die "Uggabug! $!";
                    }
                    # and log the error
                    print $OUT "Yer error message\n";
                }
            }
        }
        # finally, close it if you've opened it
        close($OUT) if defined $OUT;
    ...roboticus
Re: File not opening, but no error report
by ig (Vicar) on Mar 26, 2009 at 01:28 UTC
    Now is there something obviously normal about this strange output...?

    If your first debug print statement is uncommented you should see every line of every file you read but if only the second is uncommented you may well see something like the sample you provided.

    You have two loops: an outer loop for each file in the directory and an inner loop for each line in the current file. On each iteration of the inner loop you push the current line onto @lines, so @lines eventually contains all the lines from all the files. Initially $lines[31] is undefined, so your second debug print statement will print nothing (but you should get a warning like "Use of uninitialized value in print at ..."). Once you have read 32 lines it has a value, and that value never changes, so from then on your second debug print shows the same string each time through the loop.
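
    The effect can be replayed without touching any real files. In this sketch the sub watch_line_31 and the two pretend files (a.pl with 40 lines, b.pl with 5) are assumptions made up for the demo; the loops and the push are the same shape as in the original script.

```perl
use strict;
use warnings;

# Replay the nested loops with generated lines and record what
# $lines[31] holds after each push.
sub watch_line_31 {
    my @files = @_;                    # each entry: [ name, line count ]
    my (@lines, @seen);
    for my $file (@files) {
        my ($name, $count) = @$file;
        for my $n (1 .. $count) {
            push @lines, "$name line $n";
            my $value = $lines[31];    # undef until 32 lines have been read
            push @seen, $value;
        }
    }
    return @seen;
}

my @seen     = watch_line_31([ 'a.pl', 40 ], [ 'b.pl', 5 ]);
my @defined  = grep { defined $_ } @seen;
my %distinct = map { $_ => 1 } @defined;

printf "%d iterations, %d with a value, %d distinct value(s): %s\n",
    scalar @seen, scalar @defined, scalar keys %distinct,
    join(', ', sort keys %distinct);
```

    Every defined value is the 32nd line of the first file, which matches the wall of identical output in the question.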

    Following is an example you might consider. It has a few advantages and some disadvantages:

    • It slurps each file into a scalar which is reused on each iteration of the loop so it doesn't accumulate all the input lines.
    • It uses a foreach loop to iterate over the list of search strings rather than an index into the array.
    • It uses an RE to capture matched text: consecutive lines matching the given string and one preceding and one trailing line (like a context grep with one line of context). This RE is difficult to understand. This makes the loop tight/short but ultimately may be harder to understand and maintain. An alternative is a simple RE with additional code to concatenate matching and context lines.
    • It organizes the results by matched string and by file.
    • It uses a template to produce a complete HTML page - with HTML escaping of the file names and matched text.
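
    The example itself is not reproduced here, so what follows is not ig's code, only a rough sketch of the approach the bullets describe. Every sub name in it (escape_html, context_matches, slurp, build_index, render_html) is hypothetical, the HTML is built by plain string concatenation with hand-rolled escaping rather than a template module, and the demo file is created in a scratch directory.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Escape the characters that are special in HTML text and attributes.
sub escape_html {
    my ($text) = @_;
    my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');
    $text =~ s/([&<>"])/$entity{$1}/g;
    return $text;
}

# Return each run of consecutive lines containing $string, together with
# one line of context before and after where available.
sub context_matches {
    my ($text, $string) = @_;
    $text .= "\n" unless $text =~ /\n\z/;    # make sure the last line is terminated
    my $q = quotemeta $string;
    my @chunks;
    while ($text =~ /((?:^.*\n)?(?:^.*$q.*\n)+(?:^.*\n)?)/mg) {
        push @chunks, $1;
    }
    return @chunks;
}

# Slurp one file into a scalar.
sub slurp {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Can't open '$path': $!";
    local $/;                                # no record separator: read everything
    my $text = <$fh>;
    close($fh);
    return defined $text ? $text : '';
}

# Build { string => { file => [ chunks ] } } for the given files.
sub build_index {
    my ($strings, $files) = @_;
    my %index;
    for my $file (@$files) {
        my $text = slurp($file);             # reused scalar, nothing accumulates
        for my $string (@$strings) {
            my @chunks = context_matches($text, $string);
            $index{$string}{$file} = \@chunks if @chunks;
        }
    }
    return \%index;
}

# Render the index as a small, complete HTML page.
sub render_html {
    my ($index) = @_;
    my $html = "<html><head><title>Index</title></head><body>\n";
    for my $string (sort keys %$index) {
        $html .= '<h2>' . escape_html($string) . "</h2>\n";
        for my $file (sort keys %{ $index->{$string} }) {
            my $name = escape_html($file);
            $html .= qq{<p><a href="$name">$name</a></p>\n};
            $html .= '<pre>' . escape_html($_) . "</pre>\n"
                for @{ $index->{$string}{$file} };
        }
    }
    return $html . "</body></html>\n";
}

# Demo: one made-up source file in a scratch directory.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/demo.pl";
open(my $out, '>', $file) or die "Can't create '$file': $!";
print {$out} "use strict;\n", 'my @parts = split /,/, "a<b";', "\n", "print 1;\n", "exit;\n";
close($out) or die "Can't close '$file': $!";

my $index = build_index([qw(split opendir)], [$file]);
my $page  = render_html($index);
print $page;
```

    One known quirk of this regex: the trailing context line is consumed by the match, so if the very next line after it also matches, that later chunk starts without its preceding context line.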
Re: File not opening, but no error report
by Ish (Acolyte) on Mar 26, 2009 at 02:39 UTC
    You could add the path using map:
        my $DIR = "./";
        opendir ( my $dir, $DIR) or die "Can't open directory $DIR: $!\n";
        my @files = map { $DIR.$_ } readdir( $dir );
        closedir( $dir );
        foreach my $file ( grep { -f $_ } @files ) {
            print "\$file is $file\n";
        }