in reply to Re^2: Read files not subdirectories
in thread Read files not subdirectories

Is your script located in the same folder as the HTML files? If not, prepend the directory to each file name to get the full path, like this:

#!perl
use strict;
use warnings;

my $files_dir = 'C:\Dwimperl\Perl\1993';
opendir (my $dir_handle, $files_dir);
while (my $filename = readdir($dir_handle)){
    next unless -f $files_dir.'/'.$filename;
    print "$filename\n";
}
poj
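
To see why the directory has to be prepended: readdir returns bare entry names (including '.' and '..'), and -f resolves a bare name against the current working directory, not against the directory being read. A minimal sketch of that behaviour, assuming a throwaway temp directory and made-up names (a.html, b.html, sub) in place of the real 1993 folder:

```perl
#!perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical layout for illustration: two files and one subdirectory.
my $files_dir = tempdir(CLEANUP => 1);
for my $name ('a.html', 'b.html') {
    open my $fh, '>', File::Spec->catfile($files_dir, $name)
        or die "cannot create '$name': $!";
    close $fh;
}
mkdir File::Spec->catdir($files_dir, 'sub') or die "cannot mkdir: $!";

opendir(my $dir_handle, $files_dir) or die "cannot opendir '$files_dir': $!";
my @entries = readdir($dir_handle);   # bare names only, no directory part
closedir $dir_handle;

# -f on the full path keeps plain files and skips directories.
my @files = sort grep { -f File::Spec->catfile($files_dir, $_) } @entries;

# -f on the bare name is tested relative to the current directory instead,
# so the result depends on where the script happens to be run from.
my @bare = grep { -f $_ } @entries;

print "full path: @files\n";
print "bare name: @bare\n";
```

File::Spec->catfile is used here instead of concatenating with '/'; both approaches work on Windows, and catfile simply picks the separator for the current platform.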

Re^4: Read files not subdirectories
by poj (Abbot) on Jan 30, 2015 at 18:05 UTC

    I'm guessing you want to process each line and write it out (untested)

    #!perl
    use strict;
    use warnings;
    use HTML::Strip;

    my $hs = HTML::Strip->new();
    my $files_dir = 'C:\Dwimperl\Perl';
    my $write_dir = 'G:\research\sec filings 10k and 10Q\data\filing docs\1993\Clean';

    opendir (my $dir_handle, $files_dir);
    while (my $filename = readdir($dir_handle)){
        next unless -f $files_dir.'/'.$filename;
        print "Processing $filename\n";

        # one input file in, one same-named output file out
        open my $fh_in, '<', $files_dir.'/'.$filename
            or die "failed to open '$filename' for read";
        open my $fh_out, '>', $write_dir.'/'.$filename
            or die "failed to open '$filename' for write";

        my $count=0;
        while (my $line = <$fh_in>) {
            my $clean_text = $hs->parse($line);
            print $fh_out "$clean_text\n";
            ++$count;
        }
        $hs->eof;   # reset the parser before the next file
        print "$count lines read from $filename\n";
    }
    poj
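
The read, clean, write loop above can be tried without the real drives or HTML::Strip installed. This is only a sketch under stated assumptions: strip_tags is a hypothetical stand-in for $hs->parse (a crude regex, not a real HTML parser), clean_file is a helper name invented here, and the data lives in temp directories rather than in $files_dir and $write_dir from the post.

```perl
#!perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical stand-in for $hs->parse($line): a crude regex, NOT a real
# HTML parser, used only so the sketch runs with core modules.
sub strip_tags {
    my ($line) = @_;
    $line =~ s/<[^>]*>//g;
    return $line;
}

# Copy one file line by line through strip_tags; returns the line count.
sub clean_file {
    my ($in_path, $out_path) = @_;
    open my $fh_in,  '<', $in_path  or die "failed to open '$in_path' for read: $!";
    open my $fh_out, '>', $out_path or die "failed to open '$out_path' for write: $!";
    my $count = 0;
    while (my $line = <$fh_in>) {
        print {$fh_out} strip_tags($line);
        ++$count;
    }
    close $fh_in;
    close $fh_out or die "failed to close '$out_path': $!";
    return $count;
}

# Assumed demo data in temp directories instead of the real drives.
my $files_dir = tempdir(CLEANUP => 1);
my $write_dir = tempdir(CLEANUP => 1);
my $in_path  = File::Spec->catfile($files_dir, 'sample.html');
my $out_path = File::Spec->catfile($write_dir, 'sample.html');

open my $fh, '>', $in_path or die "cannot create '$in_path': $!";
print {$fh} "<p>Hello <b>world</b></p>\n<br>\nplain\n";
close $fh;

my $count = clean_file($in_path, $out_path);
print "$count lines read from sample.html\n";
```

Including $! in the die messages and checking the close on the output handle are additions for easier diagnosis; they are not part of the script as posted.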
Re^4: Read files not subdirectories
by wrkrbeee (Scribe) on Jan 30, 2015 at 17:44 UTC
    Hi poj, your script will print the file names. Where are we going here?
Re^4: Read files not subdirectories
by wrkrbeee (Scribe) on Jan 30, 2015 at 18:05 UTC

    Hi poj, corrected a couple of stupid things on my part (e.g., ensuring my portable hard drive is available/plugged in, and actually opening the output file for output). Now gives me a "failed to open" for the output file at line 12. Here is the revised code. I apologize for the hassle.

    #! /usr/bin/perl -w

    use strict;
    use warnings;
    use lib "c:/strawberry/perl/site/lib";
    use HTML::Strip;

    my $hs = HTML::Strip->new();

    #Where I will store the end results;
    my $write_dir = 'F:\research\sec filings 10k and 10Q\data\filing docs\1993\Clean';
    open (my $outfile_hand, '>', $write_dir) || die "failed to open '$write_dir' <$!>";

    #Where the files with the HTML tags are located;
    my $files_dir = 'C:\Dwimperl\Perl';#\1993';

    #Open the directory where the target files with HTML tags are located;
    #Why am I doing this? Stores file names in a directory handle?
    opendir (my $dir_handle, $files_dir) || die "failed to open '$files_dir' <$!>";

    #Loop through each entry/file in the directory;
    #What is readdir doing here? It's not really reading anything;
    #Is it simply advancing us to the next entry?;
    #Seems like the real READ occurs via the OPEN statement below;
    while (my $file = readdir($dir_handle) ) {
        next unless -f $file;
        #next if $file eq '.' or $file eq '..';

        #Open the current file so I can strip the HTML tags ??? ;
        open my $file_handle, '<', $file or die "failed to open '$file' <$!>";

        #Read the current file one line at a time??;
        while (my $line = <$file_handle>) {
            ########The WHILE statement above must return FALSE cuz the program never makes it here;
            #Strip the HTML tags??;
            my $clean_text = $hs->parse( ' ' );
            #Save the clean (no HTML tags) text file in a new file/location??;
            print $outfile_hand "$file\n";
            $hs->eof;
        }
    }
    close();
    closedir $dir_handle;

      You are trying to open a file handle on a directory:

      open (my $outfile_hand, '>', $write_dir

      Try

      open (my $outfile_hand, '>', $write_dir.'/clean.txt'
      poj
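
The difference between the two open calls can be checked in isolation. A minimal sketch, assuming a temp directory stands in for $write_dir; the exact text of $! depends on the operating system, so no particular message is shown:

```perl
#!perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Assumed stand-in for $write_dir; any existing directory behaves the same.
my $write_dir = tempdir(CLEANUP => 1);

# Opening the directory itself for writing fails; $! holds the reason.
my $dir_ok    = open(my $dir_fh, '>', $write_dir);
my $dir_error = $dir_ok ? '' : "$!";
print "open on directory: ", ($dir_ok ? "ok" : "failed <$dir_error>"), "\n";

# Opening a file inside the directory works.
my $out_path = File::Spec->catfile($write_dir, 'clean.txt');
my $file_ok  = open(my $outfile_hand, '>', $out_path);
print "open on $out_path: ", ($file_ok ? "ok" : "failed <$!>"), "\n";
close $outfile_hand if $file_ok;
```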
Re^4: Read files not subdirectories
by wrkrbeee (Scribe) on Jan 30, 2015 at 18:22 UTC
    We're close: it writes the files to the output location, but the files are empty (size 0 KB). Ideas?
Re^4: Read files not subdirectories
by wrkrbeee (Scribe) on Jan 30, 2015 at 18:27 UTC
    Works! Very grateful for your time and patience with me. You're the best!