in reply to Re: Read files not subdirectories
in thread Read files not subdirectories

Could I ask another question, please? The code below runs, but fails to write/save the HTML-stripped text files. With a simple print statement, I've determined that the "second" WHILE statement must return FALSE, as the program never makes it this far. I am grateful for any insight!

#! /usr/bin/perl -w
use strict;
use warnings;
use lib "c:/strawberry/perl/site/lib";
use HTML::Strip;

my $hs = HTML::Strip->new();

#Where I will store the end results;
my $write_dir = 'G:\research\sec filings 10k and 10Q\data\filing docs\1993\Clean';

#Where the files with the HTML tags are located;
my $files_dir = 'C:\Dwimperl\Perl\1993';

#Open the directory where the target files with HTML tags are located;
#Why am I doing this? Stores file names in a directory handle?
opendir (my $dir_handle, $files_dir) || die "failed to open '$files_dir' <$!>";

#Loop through each entry/file in the directory;
#What is readdir doing here? It's not really reading anything;
#Is it simply advancing us to the next entry?;
#Seems like the real READ occurs via the OPEN statement below;
while (my $file = readdir($dir_handle) ) {
    next unless -f $file;
    #next if $file eq '.' or $file eq '..';

    #Open the current file so I can strip the HTML tags ??? ;
    open my $file_handle, '<', $file or die "failed to open '$file' <$!>";

    #Read the current file one line at a time??;
    while (my $line = <$file_handle>) {
        ########The WHILE statement above must return FALSE cuz the program never makes it here;

        #Strip the HTML tags??;
        my $clean_text = $hs->parse( ' ' );

        #Save the clean (no HTML tags) text file in a new file/location??;
        print $write_dir "$file\n";
        $hs->eof;
    }
}
close();
closedir $dir_handle;

Re^3: Read files not subdirectories
by poj (Abbot) on Jan 30, 2015 at 17:31 UTC

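    Is your script located in the same folder as the HTML files? If not, readdir returns bare file names, so -f $file is checked against the current working directory and will normally fail. Add the directory to get the full path, like this: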

    #!perl
    use strict;
    use warnings;

    my $files_dir = 'C:\Dwimperl\Perl\1993';
    opendir (my $dir_handle, $files_dir);
    while (my $filename = readdir($dir_handle)){
        next unless -f $files_dir.'/'.$filename;
        print "$filename\n";
    }
    poj
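
For anyone who wants to see the difference for themselves, here is a minimal sketch that needs only core modules. The temporary directory and the sample file name are made up for the demonstration and stand in for $files_dir and its contents; File::Spec->catfile is used instead of joining with '/' by hand, which is a choice of mine rather than anything poj's reply requires.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical scratch directory standing in for $files_dir.
my $files_dir   = tempdir(CLEANUP => 1);
my $sample_name = "sample_$$.htm";    # made-up file name for the demo

# Create one sample file inside the scratch directory.
my $sample_path = File::Spec->catfile($files_dir, $sample_name);
open my $fh, '>', $sample_path or die "cannot create '$sample_path': $!";
print {$fh} "<p>hello</p>\n";
close $fh or die "cannot close '$sample_path': $!";

opendir(my $dh, $files_dir) or die "failed to open '$files_dir': $!";
my (@bare_hits, @full_hits);
while (defined(my $name = readdir $dh)) {
    # readdir returns only the entry name, so this test is resolved
    # against the current working directory, not $files_dir.
    push @bare_hits, $name if -f $name;

    # Prepending the directory makes the test look in the right place.
    my $path = File::Spec->catfile($files_dir, $name);
    push @full_hits, $name if -f $path;
}
closedir $dh;

print "matched with bare name: ", scalar(@bare_hits), "\n";
print "matched with full path: ", scalar(@full_hits), "\n";
```

Unless the script happens to run from inside the directory being read, the bare-name test skips every entry, which would explain why the inner while loop is never reached.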

      I'm guessing you want to process each line and write it out (untested)

      #!perl
      use strict;
      use warnings;
      use HTML::Strip;

      my $hs = HTML::Strip->new();
      my $files_dir = 'C:\Dwimperl\Perl';
      my $write_dir = 'G:\research\sec filings 10k and 10Q\data\filing docs\1993\Clean';

      opendir (my $dir_handle, $files_dir);
      while (my $filename = readdir($dir_handle)){
          # readdir gives bare names, so build the full path before testing
          next unless -f $files_dir.'/'.$filename;
          print "Processing $filename\n";

          open my $fh_in, '<', $files_dir.'/'.$filename
              or die "failed to open '$filename' for read";
          open my $fh_out, '>', $write_dir.'/'.$filename
              or die "failed to open '$filename' for write";

          my $count=0;
          while (my $line = <$fh_in>) {
              # strip the tags from the line just read and write it out
              my $clean_text = $hs->parse($line);
              print $fh_out "$clean_text\n";
              ++$count;
          }
          $hs->eof;
          close $fh_in;
          close $fh_out;
          print "$count lines read from $filename\n";
      }
      closedir $dir_handle;
      poj
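
A self-contained variant of the read/strip/write loop, for trying the pattern without HTML::Strip installed. Everything here beyond the loop structure is an assumption for illustration: strip_tags is a hypothetical, crude regex stand-in (it only copes with tags that open and close on the same line, and is no substitute for HTML::Strip), and the two temporary directories stand in for $files_dir and $write_dir.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical stand-in for HTML::Strip->parse: removes tags that start
# and end on the same line. For illustration only.
sub strip_tags {
    my ($line) = @_;
    $line =~ s/<[^>]*>//g;
    return $line;
}

# Scratch directories standing in for $files_dir and $write_dir.
my $files_dir = tempdir(CLEANUP => 1);
my $write_dir = tempdir(CLEANUP => 1);

# Create one sample input file.
my $sample = File::Spec->catfile($files_dir, 'sample.htm');
open my $fh_sample, '>', $sample or die "cannot create '$sample': $!";
print {$fh_sample} "<p>first</p>\n<b>second</b>\n";
close $fh_sample or die "cannot close '$sample': $!";

opendir(my $dh, $files_dir) or die "failed to open '$files_dir': $!";
while (defined(my $filename = readdir $dh)) {
    my $in_path = File::Spec->catfile($files_dir, $filename);
    next unless -f $in_path;    # skips '.', '..' and subdirectories

    my $out_path = File::Spec->catfile($write_dir, $filename);
    open my $fh_in,  '<', $in_path  or die "failed to open '$in_path' for read: $!";
    open my $fh_out, '>', $out_path or die "failed to open '$out_path' for write: $!";

    while (my $line = <$fh_in>) {
        # $line still carries its newline, so none is added here.
        print {$fh_out} strip_tags($line);
    }

    close $fh_in;
    # Checking close on the output handle reports write errors
    # that only surface when the buffer is flushed.
    close $fh_out or die "cannot close '$out_path': $!";
}
closedir $dh;
```

Including $! in the die messages shows the operating system's reason for a failure, which tends to shorten this kind of debugging.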
      Works! Very grateful for your time and patience with me. You're the best!
      Hi poj, your script will print the file names. Where are we going here?

      Hi poj, corrected a couple of stupid things on my part (e.g., ensuring my portable hard drive is available/plugged in, and actually opening the output file for output). Now gives me a "failed to open" for the output file at line 12. Here is the revised code. I apologize for the hassle.

      #! /usr/bin/perl -w
      use strict;
      use warnings;
      use lib "c:/strawberry/perl/site/lib";
      use HTML::Strip;

      my $hs = HTML::Strip->new();

      #Where I will store the end results;
      my $write_dir = 'F:\research\sec filings 10k and 10Q\data\filing docs\1993\Clean';

      open (my $outfile_hand, '>', $write_dir) || die "failed to open '$write_dir' <$!>";

      #Where the files with the HTML tags are located;
      my $files_dir = 'C:\Dwimperl\Perl';#\1993';

      #Open the directory where the target files with HTML tags are located;
      #Why am I doing this? Stores file names in a directory handle?
      opendir (my $dir_handle, $files_dir) || die "failed to open '$files_dir' <$!>";

      #Loop through each entry/file in the directory;
      #What is readdir doing here? It's not really reading anything;
      #Is it simply advancing us to the next entry?;
      #Seems like the real READ occurs via the OPEN statement below;
      while (my $file = readdir($dir_handle) ) {
          next unless -f $file;
          #next if $file eq '.' or $file eq '..';

          #Open the current file so I can strip the HTML tags ??? ;
          open my $file_handle, '<', $file or die "failed to open '$file' <$!>";

          #Read the current file one line at a time??;
          while (my $line = <$file_handle>) {
              ########The WHILE statement above must return FALSE cuz the program never makes it here;

              #Strip the HTML tags??;
              my $clean_text = $hs->parse( ' ' );

              #Save the clean (no HTML tags) text file in a new file/location??;
              print $outfile_hand "$file\n";
              $hs->eof;
          }
      }
      close();
      closedir $dir_handle;

        You are trying to open a file handle on a directory:

        open (my $outfile_hand, '>', $write_dir

        Try

        open (my $outfile_hand, '>', $write_dir.'/clean.txt'
        poj
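
A small sketch of why the original open fails and the suggested one works, using only core modules. The temporary directory stands in for $write_dir, and 'clean.txt' is just the file name from the suggestion above; the exact error text in $! differs between operating systems, so none is shown.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical scratch directory standing in for $write_dir.
my $write_dir = tempdir(CLEANUP => 1);

# Opening the directory itself for writing fails: open needs a file path.
my $opened_dir = open(my $bad_fh, '>', $write_dir);
print "open on the directory failed: $!\n" unless $opened_dir;

# Appending a file name to the directory gives open something it can create.
my $out_path = File::Spec->catfile($write_dir, 'clean.txt');
open(my $out_fh, '>', $out_path) or die "failed to open '$out_path': $!";
print {$out_fh} "some text\n";
close $out_fh or die "cannot close '$out_path': $!";

print "wrote ", -s $out_path, " bytes to clean.txt\n";
```

Note that a single output file like this collects the text from every input file; writing one output file per input needs the open inside the readdir loop, with the input file's name appended to the directory.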
      We're close: it writes the files to the output location, but the files are empty (size 0 KB). Any ideas?