in reply to Re: Read files not subdirectories
in thread Read files not subdirectories

Could I please ask another question? After adding "next unless -f $file", the program runs but does nothing after that line. As a test, I inserted a simple print statement immediately after the "next unless" statement and got no output. If I uncomment the "next if" statement and omit the "next unless", the print statement works, but the program crashes when it reaches the write statement. In short, it seems that the "next unless" filters out every file. Does that make sense?

```perl
#! /usr/bin/perl -w
use strict;
use warnings;
use lib "c:/strawberry/perl/site/lib";
use HTML::Strip;

my $hs = HTML::Strip->new();
my $write_dir = 'G:\research\sec filings 10k and 10Q\data\filing docs\1993\Clean';
my $files_dir = 'C:\Dwimperl\Perl\1993';

opendir (my $dir_handle, $files_dir) || die "failed to open '$files_dir' <$!>";
while (my $file = readdir($dir_handle) ) {
    next unless -f $file;
    #next if $file eq '.' or $file eq '..';
    open my $file_handle, "/dwimperl/perl/1993/$file" or die "failed to open '$file' <$!>";
    while (my $line = <$file>) {
        my $clean_text = $hs->parse( ' ' );
        print $write_dir "$file\n";
        $hs->eof;
    }
}
close();
closedir $dir_handle;
```

Re^3: Read files not subdirectories
by parv (Parson) on Jan 30, 2015 at 03:17 UTC

    Consult a beginner-level Perl book ("Beginner Perl", for example) to understand the difference between a file name and a file handle, and the currently selected file handle for print and its various forms.

```perl
...
my $write_dir = 'G:\research\sec filings 10k and 10Q\data\filing docs\1993\Clean';
...
opendir (my $dir_handle, $files_dir) || die "failed to open '$files_dir' <$!>";
while (my $file = readdir($dir_handle) ) {
    ...
    open my $file_handle, "/dwimperl/perl/1993/$file" or die "failed to open '$file' <$!>";
    while (my $line = <$file>) {
```

    Actually use the file handle, not a file path, to read a line.
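    A minimal sketch of reading through the handle that open returns, rather than through the name held in $file. The helper name read_lines is hypothetical and not from the original post; it simply collects the lines so the idea can be checked in isolation.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines of $path, read through a file handle.
sub read_lines {
    my ($path) = @_;
    open my $file_handle, '<', $path
        or die "failed to open '$path' <$!>";
    my @lines;
    # Read from the handle ($file_handle), not from the path string.
    while (my $line = <$file_handle>) {
        chomp $line;
        push @lines, $line;
    }
    close $file_handle;
    return @lines;
}
```

    With the OP's variables, the inner loop would become while (my $line = <$file_handle>) { ... }, and the HTML to strip would presumably be passed as $hs->parse($line) rather than $hs->parse(' ').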

```perl
...
print $write_dir "$file\n";
...
```

    The directory path is a string, not a file handle. If no file handle by that name is open, print will fail. To write to a specific file, open it in write mode and use the print FILEHANDLE LIST syntax; see print.
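    A minimal sketch of writing into a file inside a directory such as $write_dir, assuming one output file per input file is wanted. The helper write_lines and its arguments are hypothetical; File::Spec->catfile joins the directory and file name, and the output is written with print {HANDLE} LIST, with no comma after the handle.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: write @lines to a file named $name inside $dir.
# $dir is only a directory path (a string), so a file inside it is opened.
sub write_lines {
    my ($dir, $name, @lines) = @_;
    my $out_path = File::Spec->catfile($dir, $name);
    open my $out_handle, '>', $out_path
        or die "failed to open '$out_path' for writing <$!>";
    # print FILEHANDLE LIST: the handle goes in braces, no comma after it.
    print {$out_handle} "$_\n" for @lines;
    close $out_handle or die "failed to close '$out_path' <$!>";
    return $out_path;
}
```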

    To copy or move files, see File::Copy.
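    A minimal File::Copy sketch, assuming the goal is to copy an input file into an output directory under its original name. copy_into_dir is a hypothetical helper; copy returns false on failure and sets $!, hence the or die.

```perl
use strict;
use warnings;
use File::Copy qw(copy);
use File::Basename qw(basename);
use File::Spec;

# Hypothetical helper: copy one file into $dst_dir, keeping its base name.
sub copy_into_dir {
    my ($src_path, $dst_dir) = @_;
    my $dst_path = File::Spec->catfile($dst_dir, basename($src_path));
    copy($src_path, $dst_path)
        or die "copy '$src_path' -> '$dst_path' failed <$!>";
    return $dst_path;
}
```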

Re^3: Read files not subdirectories
by wrkrbeee (Scribe) on Jan 30, 2015 at 03:24 UTC
    Thank you! I apologize for the inconvenience.

      You are welcome. I was not inconvenienced by pointing out the errors. Actually, the OP's reply may not have been meant for me, as it was posted as a reply to the OP's own post. Then again, that may just be a case of not being familiar with PerlMonks.

Re^3: Read files not subdirectories
by locked_user sundialsvc4 (Abbot) on Jan 30, 2015 at 15:15 UTC

    On many systems, doing something to a file ... even just opening it ... can interfere with a directory scan, causing it to end prematurely, to list the same file more than once, and so on. (This would be true no matter which high-level language, e.g. Perl, was used to do it.)

    Therefore, I suggest that you first retrieve the entire list of files into an in-memory list ... which you can easily do in Perl by calling readdir in list context. Then iterate through the in-memory list you have just retrieved, checking whether each entry is or isn't a directory, and so on. Finish retrieving the list for whichever directory you are currently “in” ... then process the list.
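    A sketch of the approach described above. It also bears on the OP's symptom: readdir returns bare entry names, not paths, so -f $file is tested relative to the current working directory rather than $files_dir, and unless the script happens to run from that directory every test fails (perlfunc's readdir entry warns that you must prepend the directory before file-testing the results). The helper name list_plain_files is hypothetical.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: read the whole listing first (readdir in list
# context), then filter the in-memory list down to plain files.
sub list_plain_files {
    my ($dir) = @_;
    opendir my $dh, $dir or die "failed to open '$dir' <$!>";
    my @names = readdir $dh;    # list context: every entry at once
    closedir $dh;
    # readdir returns bare names, so prepend $dir before the -f test;
    # '.', '..' and subdirectories fail -f and are dropped.
    my @paths = map { File::Spec->catfile($dir, $_) } @names;
    my @files = grep { -f $_ } @paths;
    return sort @files;
}
```

    The returned paths can then be opened directly, with no further need to rebuild "/dwimperl/perl/1993/$file" by hand.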

    Of course, “file finding” is such a common requirement that there are many CPAN modules like File::Find. If you need to “take a walk through a directory tree,” there are plenty of tour guides . . .
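    A minimal File::Find sketch, assuming the goal is to collect every plain file under a top directory, recursively. find_plain_files is a hypothetical name; the no_chdir option makes $_ hold the full path (the same value as $File::Find::name), so -f tests the right file.

```perl
use strict;
use warnings;
use File::Find qw(find);

# Hypothetical helper: walk $top recursively and collect plain files.
sub find_plain_files {
    my ($top) = @_;
    my @found;
    find(
        {
            # With no_chdir => 1, find() does not chdir into each
            # directory and $_ holds the full path of the entry.
            wanted   => sub { push @found, $File::Find::name if -f $_ },
            no_chdir => 1,
        },
        $top,
    );
    return sort @found;
}
```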