perllearner007 has asked for the wisdom of the Perl Monks concerning the following question:

Hey PerlMonks, I have this script and I know how to run it on a single file and get the output in .txt format. But I am having problems reading a whole folder of files. I read up on opendir and readdir, but I am not sure whether the code I am using is right. I get no errors and it runs, but my output .txt file is blank. Can someone tell me how to read the whole folder, have Perl open each file, parse the columns I need line by line, and write all the results to a single .txt file at the end? Thanks in advance.
#!usr/bin/perl -w
use strict;

my $directory = "/Users/My_folder/electrical_records";
opendir (DIR, $directory) or die $!;
open (my $out, ">electrical_RESULT.txt");
my @files = grep {/_*_.txt/} readdir DIR;
foreach my $file (@files) {
    while (my $file = readdir(DIR)) {
        next if /^\s$/;    # skip blank lines
        my ($meter_read, $energy_consumption) = (split /\s+/)[4,7];
        # energy consumption must meet min, max criteria
        if ($energy_consumption =~ /^[^0-1-.]/ or ( $energy_consumption < 60 or $energy_consumption > 120)) {
            print $out " $meter_read \ $energy_consumption \ n";
        }
    }
    closedir(DIR);
}

Re: read the whole folder files
by kennethk (Abbot) on Apr 09, 2012 at 18:50 UTC

    So close. opendir/readdir is the equivalent of ls or dir: it just lists the directory contents. Assuming you want to read the contents of the files, you need to open each one too, like this:

    foreach my $file (@files) {
        next unless -f "$directory/$file";
        open my $fh, '<', "$directory/$file" or die "Open failed on $file: $!";
        while (<$fh>) {
            next if /^\s*$/;    # skip blank (or whitespace-only) lines
            # ... split and filter $_ as before ...
        }
        close $fh;
    }

    Also note the use of -f to check that you're dealing with an ordinary file before opening it, and the "or die" that tests whether the open actually succeeded.
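Putting that fix together with the rest of the original loop, a minimal end-to-end sketch might look like the following. It assumes, as the original code does, whitespace-separated columns with the meter reading at index 4 and the consumption at index 7, and it keeps only the numeric 60..120 range test from the original condition: non-numeric values (such as a header row) are skipped rather than printed. The helper name filter_directory and the command-line usage are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read every regular *.txt file in $dir and write the rows
# whose consumption lies outside [$min, $max] to the already-open handle $out.
sub filter_directory {
    my ($dir, $out, $min, $max) = @_;

    opendir my $dh, $dir or die "Cannot open directory $dir: $!";
    # readdir returns bare names, so prefix $dir before the -f test
    my @files = sort grep { /\.txt$/ && -f "$dir/$_" } readdir $dh;
    closedir $dh;

    my $count = 0;
    for my $file (@files) {
        open my $fh, '<', "$dir/$file" or die "Open failed on $file: $!";
        while (my $line = <$fh>) {
            next if $line =~ /^\s*$/;    # skip blank lines
            # split ' ' ignores leading whitespace; indices 4 and 7 are assumed
            my ($meter_read, $energy) = (split ' ', $line)[4, 7];
            # skip short lines and non-numeric values such as a header row
            next unless defined $energy && $energy =~ /^-?\d+(?:\.\d+)?$/;
            if ($energy < $min || $energy > $max) {
                print {$out} "$meter_read\t$energy\n";    # real tab and newline
                $count++;
            }
        }
        close $fh;
    }
    return $count;
}

# Usage: perl filter.pl /Users/My_folder/electrical_records electrical_RESULT.txt
if (@ARGV == 2) {
    my ($dir, $outfile) = @ARGV;
    open my $out, '>', $outfile or die "Cannot write $outfile: $!";
    filter_directory($dir, $out, 60, 120);
    close $out or die "Close failed on $outfile: $!";
}
```

All files share one output handle that is opened once, before the loop, so every matching row ends up in the single result file.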

    #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

Re: read the whole folder files
by halfcountplus (Hermit) on Apr 09, 2012 at 20:23 UTC
    In the grep, you have a regex with a rather strange-looking pattern.

    Ditto. Maybe you've confused shell globbing with regular expressions? Let's break this down:

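    _*_.txt

    "_*_" means zero or more "_" characters followed by one more "_", i.e. at least one underscore. Then the unescaped "." matches any single character, and since nothing anchors the pattern, it can match anywhere in the file name.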

    If you want to match _one_.txt, _two_.txt, etc., you should use:

    _.+?_\.txt

    This means: a "_", followed by one or more of any character, matched non-greedily (the ? after the + makes it non-greedy; note that ? has other meanings in regexps depending on context), then a "_", then a literal "." (escaped with \ because a bare . means "any character", as above), then "txt". Non-greedy matching is important when you are looking for any number of anything followed by something in particular.

    If you haven't read it yet, see perlretut.
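To see the difference concretely, here is a small sketch that runs the original pattern, the suggested one, and an end-anchored variant over a few file names. The names are made-up test data, not taken from the original post.

```perl
use strict;
use warnings;

my @names = ('_one_.txt', 'report.txt', 'a_btxt', 'x__.txt', '_two_.txt.bak');

# Original: one or more "_", then ANY character, then "txt", anywhere in the name
my @orig = grep { /_*_.txt/ } @names;

# Suggested: "_", something (non-greedy), "_", then a literal ".txt"
my @fixed = grep { /_.+?_\.txt/ } @names;

# Same, but anchored so the name must END in "_.txt"
my @anchored = grep { /_.+?_\.txt$/ } @names;

print "original: @orig\n";
print "fixed:    @fixed\n";
print "anchored: @anchored\n";
```

With these names, the original pattern also accepts a_btxt and x__.txt, the suggested one rejects those two, and only the anchored version rejects _two_.txt.bak.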

      I like your explanation.

      However, it should be noted that there are limits to "greediness": a regex will be as greedy as it can while still allowing the rest of the regex to match. In this case, anchoring the regex to the end of the string makes a difference.
      With /_.*_\.txt$/, the characters "gobbled up" by the .* can never include the "_.txt" at the end of the string, because the rest of the pattern still has to match there.
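A small sketch of that effect, using a hypothetical file name that contains "_.txt" twice so that greedy and non-greedy matching can disagree:

```perl
use strict;
use warnings;

my $name = '_a_.txt_b_.txt';    # hypothetical name with two "_.txt" pieces

# Unanchored: greedy .* grabs as much as it can, non-greedy .+? as little as it can
my ($greedy) = $name =~ /_(.*)_\.txt/;
my ($lazy)   = $name =~ /_(.+?)_\.txt/;

# Anchored with $: both must leave the final "_.txt" for the end of the string
my ($greedy_anchored) = $name =~ /_(.*)_\.txt$/;
my ($lazy_anchored)   = $name =~ /_(.+?)_\.txt$/;

print "greedy:          $greedy\n";
print "lazy:            $lazy\n";
print "greedy anchored: $greedy_anchored\n";
print "lazy anchored:   $lazy_anchored\n";
```

Unanchored, the greedy capture runs to the last "_.txt" (a_.txt_b) while the non-greedy one stops at the first (a); once the pattern is anchored, both have to reach the final "_.txt", so both capture a_.txt_b.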

      Having said that, I am confused by the OP's updated comment, because what we were talking about was how to get the file names, not how to parse the file contents, which is a different question!

Re: read the whole folder files
by Marshall (Canon) on Apr 09, 2012 at 19:38 UTC
    In the grep, you have a regex with a rather strange-looking pattern:
    my @files = grep {/_*_.txt/} readdir DIR;
    Perhaps:
    my @files = grep {/\.txt$/} readdir DIR;
    is all that you need? The $ anchors the regex to the end of the string. An underscore "_" right before a .txt ending would be unusual. Without escaping the "." as "\.", the dot means any character, which doesn't look like what you want.

    When debugging, print @files to make sure that you are getting the files that you want.
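Following that debugging advice, a minimal sketch that shows exactly what the /\.txt$/ filter keeps. The helper name list_txt_files is hypothetical; the names are printed inside brackets so that stray whitespace becomes visible.

```perl
use strict;
use warnings;

# Hypothetical helper: return the sorted *.txt names found in $dir
sub list_txt_files {
    my ($dir) = @_;
    opendir my $dh, $dir or die "Cannot open directory $dir: $!";
    my @files = grep { /\.txt$/ } readdir $dh;
    closedir $dh;
    return sort @files;
}

my $dir   = shift(@ARGV) // '.';
my @files = list_txt_files($dir);

# One name per line, bracketed, followed by a count
print STDERR "found: [$_]\n" for @files;
print STDERR scalar(@files), " file(s) matched\n";
```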

Re: read the whole folder files
by perllearner007 (Acolyte) on Apr 09, 2012 at 20:38 UTC
    Hello Perl Monks, thank you so much... both your suggestions resolved the issue. However, the output file comes out in the format shown below:
    meter_read energy_consumption n 00_1 34 n 00_2 53 n 00_3 121 n ... ... meter_read energy_consumption n 00_146 33 n .... ...
    Isn't it a bit strange? When I read a single file, the results are sorted into columns like this:
    meter_read energy_consumption
    00_146 33
    00_1 34
    but here they aren't. Also, the "n" (i.e. the \n newline) isn't supposed to show up, is it?
      One way (of many) to parse the data would be like this:
      #!/usr/bin/perl -w
      use strict;
      use Data::Dumper;    # this is a core module
                           # no "installation" is required

      my $line = "meter_read energy_consumption n 00_1 34 n 00_2 53 n 00_3 121\n";

      #
      # this extracts number pairs and puts them into a hash
      #
      my %hash = $line =~ m/([\d_]+)\s+(\d+)/g;
      print Dumper \%hash;

      __END__
      $VAR1 = {
                '00_2' => '53',
                '00_3' => '121',
                '00_1' => '34'
              };
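As for the stray "n": in a Perl double-quoted string, a backslash followed by a space is just an escaped space. The original " $meter_read \ $energy_consumption \ n" therefore produces spaces and a literal n, with no newline at all, which is why everything ended up on one line. A minimal sketch comparing that string with one that uses real tab and newline escapes (the sample values come from the output quoted above):

```perl
use strict;
use warnings;

# Sample values taken from the output shown in this thread
my ($meter_read, $energy_consumption) = ('00_1', 34);

# Original: "\ " is only an escaped space, so no newline is produced
my $broken = " $meter_read \ $energy_consumption \ n";

# Intended: a real tab between the columns and a real newline at the end
my $fixed = "$meter_read\t$energy_consumption\n";

print $broken, "|\n";    # the "|" marks where the broken string ends
print $fixed;
```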
Re: read the whole folder files
by perllearner007 (Acolyte) on Apr 09, 2012 at 20:43 UTC
    Never mind! I fixed it! Thank you so much!

      findrule /Users/My_folder/electrical_records -file -name '_*_.txt' -maxdepth 1

      use File::Find::Rule;

      ## GLOB
      my @files = find( file => maxdepth => 1, name => '_*_.txt', in => $directory );

      ## REGEX
      my @files = find( file => maxdepth => 1, name => qr/_.+?_\.txt$/, in => $directory );

      ## verbose
      my @files = File::Find::Rule->file()
                                  ->maxdepth(1)    # do not recurse
                                  ->name( '_*_.txt', )
                                  ->in( $directory );
      ...
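If installing File::Find::Rule is not an option, the core File::Glob module can apply the same shell-style pattern. A minimal sketch, assuming the files sit directly in one directory (bsd_glob does not recurse, much like maxdepth 1) and that only regular files are wanted; the helper name glob_txt_files is hypothetical:

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);

# Hypothetical helper: shell-style "_*_.txt" means "_", anything, "_", then ".txt"
sub glob_txt_files {
    my ($dir) = @_;
    # bsd_glob, unlike plain glob(), does not split the pattern on spaces;
    # the returned names include the "$dir/" prefix
    return grep { -f } bsd_glob("$dir/_*_.txt");
}

my $directory = shift(@ARGV) // '.';
print "$_\n" for glob_txt_files($directory);
```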
Re: read the whole folder files
by perllearner007 (Acolyte) on Apr 10, 2012 at 16:24 UTC
    It was running fine yesterday, but today I keep getting a warning, "use of uninitialised value ... in pattern match". Can anyone spot the mistake below?
    if ($energy_consumption =~ /^[^0-1-.]/ or ( $energy_consumption < 60 or $energy_consumption > 120))

      Well, the warning says there is an uninitialized value in a pattern match, so $energy_consumption must be undefined when you get there. That value is assigned by your split, so the split isn't producing at least 8 values (index 7 is the eighth field). I suspect that your input file is not formatted as you expect. A quick way to find the offending lines would be to add the block

      if (not defined $energy_consumption) {
          warn "Split missed, $file: $_";
          next;
      }

      after the split, and see what comes out. My guess is that you are working with tab-delimited files, and there are some empty values.
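To illustrate that guess: split /\s+/ treats a run of tabs as a single separator, so an empty tab-separated field vanishes and the later fields shift left. A small sketch comparing three ways of splitting a hypothetical tab-delimited record whose third column is empty:

```perl
use strict;
use warnings;

# Hypothetical tab-delimited record: 3rd column empty, plus a trailing tab
my $rec = "a\tb\t\td\te\tf\tg\th\t";

my @ws  = split /\s+/, $rec;        # runs of whitespace collapse: empty field vanishes
my @tab = split /\t/,  $rec;        # one field per tab; trailing empty fields dropped
my @all = split /\t/,  $rec, -1;    # negative limit keeps trailing empty fields too

for my $case ([ '/\s+/' => \@ws ], [ '/\t/' => \@tab ], [ '/\t/, -1' => \@all ]) {
    my ($label, $fields) = @$case;
    printf "split %-9s: %d fields, index 7 is %s\n",
        $label, scalar @$fields, $fields->[7] // 'undef';
}
```

With split /\s+/ this record has only seven fields, so index 7 is undefined, which is exactly the situation that triggers the "uninitialized value" warning; splitting on /\t/ keeps the empty column in place.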


        Hello, thank you for your time. I added the block after the split, but it keeps warning me. I checked the output of the tab-delimited file as well, and there doesn't seem to be a problem there. It just outputs something like the following:
        split missed, myresultfile.txt: 61283067 61283865 798 0.57412
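As displayed, the line in that warning holds only four whitespace-separated values, so index 7 cannot exist for it. To list every such line in a file, here is a small sketch; the helper name short_lines is hypothetical, and it assumes whitespace-separated columns as in the original split.

```perl
use strict;
use warnings;

# Hypothetical helper: return [line number, field count] for every non-blank
# line that has fewer than $need whitespace-separated fields.
sub short_lines {
    my ($fh, $need) = @_;
    my @short;
    while (my $line = <$fh>) {
        next if $line =~ /^\s*$/;         # ignore blank lines
        my @fields = split ' ', $line;    # like split /\s+/, minus a leading empty field
        push @short, [ $., scalar @fields ] if @fields < $need;
    }
    return @short;
}

# Usage: perl short_lines.pl somefile.txt   (index 7 needs 8 fields)
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    printf "line %d: only %d fields\n", @$_ for short_lines($fh, 8);
}
```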