Phoebus2000 has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I am trying to process all text files in a directory through Lingua::EN::Fathom and write the results to a spreadsheet.

I have adapted the code into what I thought was right but I have a very odd error. All files are read, but only the files I have already processed individually (as a test) get their text read for readability. The code sees all files but only outputs for files that have already been read, yet if I process individually it can read the text and will then register it next time I do a directory scan.

Any ideas what I need to change?

Thanks,
#! /usr/local/bin/perl
# use warnings;
use Path::Class;
use autodie; # die if problem reading or writing a file
use Lingua::EN::Fathom;
use Excel::Writer::XLSX;

my $workbook = Excel::Writer::XLSX->new('output.xls');
my $worksheet = $workbook->add_worksheet();
my $file_handle = $worksheet->write();

my $text = new Lingua::EN::Fathom;

opendir(DH, "./new_text");
my @files = readdir(DH);
closedir(DH);

foreach my $file (@files) {
    $text->analyse_file($file);
    $accumulate = 1;
    $text->analyse_block($text_string,$accumulate);

    $num_chars = $text->num_chars;
    $num_words = $text->num_words;
    $percent_complex_words = $text->percent_complex_words;
    $num_sentences = $text->num_sentences;
    $num_text_lines = $text->num_text_lines;
    $num_blank_lines = $text->num_blank_lines;
    $num_paragraphs = $text->num_paragraphs;
    $syllables_per_word = $text->syllables_per_word;
    $words_per_sentence = $text->words_per_sentence;

    %words = $text->unique_words;
    foreach $word ( sort keys %words ) {
        print("$words{$word} :$word\n");
    }

    $fog = $text->fog;
    $flesch = $text->flesch;
    $kincaid = $text->kincaid;
    print($text->report);

    # Create a format for the book
    my $header = $workbook->add_format();
    $header->set_bold();
    my $percent_style = $workbook->add_format();
    $percent_style->set_num_format('%');

    # Add the line to the file
    foreach my $search ($text) {
        $worksheet->write(0, 0, "Filename",$header);
        $worksheet->write($count, 0, $file);
        $worksheet->write(0, 1, "Fog index",$header);
        $worksheet->write($count, 1, $fog);
        $worksheet->write(0, 2, "Flesch index",$header);
        $worksheet->write($count, 2, $flesch);
        $worksheet->write(0, 3, "Flesch-Kincaid index",$header);
        $worksheet->write($count, 3, $kincaid);
        $worksheet->write(0, 4, "Number of characters",$header);
        $worksheet->write($count, 4, $num_chars);
        $worksheet->write(0, 5, "Number of words",$header);
        $worksheet->write($count, 5, $num_words);
        $worksheet->write(0, 6, "Percent complex words",$header);
        $worksheet->write($count, 6, $percent_complex_words);
        $worksheet->write(0, 7, "Number of sentences",$header);
        $worksheet->write($count, 7, $num_sentences);
        $worksheet->write(0, 8, "Number of text lines",$header);
        $worksheet->write($count, 8, $num_text_lines);
        $worksheet->write(0, 9, "Number of blank lines",$header);
        $worksheet->write($count, 9, $num_blank_lines);
        $worksheet->write(0, 10, "Number of paragraphs",$header);
        $worksheet->write($count, 10, $num_paragraphs);
        $worksheet->write(0, 11, "Number of syllables per word",$header);
        $worksheet->write($count, 11, $syllables_per_word);
        $worksheet->write(0, 12, "Number of words per sentence",$header);
        $worksheet->write($count, 12, $words_per_sentence);
        $count++;
    }
}

$workbook->close();

Replies are listed 'Best First'.
Re: Scanning a directory's files for readability
by Cristoforo (Curate) on Jan 18, 2014 at 19:35 UTC
    One problem I can see is that you need to prepend the directory to each file. readdir returns just the filename and not the complete path.

    $text->analyse_file("./new_text/$file");
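
To see why the directory has to be prepended, here is a minimal sketch (not the OP's script). It assumes a throwaway temporary directory standing in for ./new_text and a made-up filename, and it only checks whether each name returned by readdir can be found with and without the directory in front of it.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical stand-in for ./new_text: a temporary directory with one file.
my $dir = tempdir( CLEANUP => 1 );
open my $out, '>', "$dir/sample_for_readdir_demo.txt"
    or die "Cannot create sample file: $!";
print {$out} "Some text.\n";
close $out;

opendir my $dh, $dir or die "Cannot open $dir: $!";
my @names = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
closedir $dh;

# Record whether each name is found on its own and with the directory prepended.
my @results;
for my $name (@names) {
    my $path = "$dir/$name";
    my %seen = (
        name       => $name,
        bare_found => ( -e $name ? 1 : 0 ),    # looked up in the current directory
        path_found => ( -e $path ? 1 : 0 ),    # looked up inside $dir
    );
    push @results, \%seen;
    print "$name: bare=$seen{bare_found} with_dir=$seen{path_found}\n";
}
```

The bare name is only found if a file of the same name happens to sit in the current working directory, which would match the symptom in the question: files that had been processed individually were presumably also present there.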

      Thank you monks for your swift revelation of wisdom. I ended up using Cristoforo's solution and it worked.
Re: Scanning a directory's files for readability
by kcott (Archbishop) on Jan 19, 2014 at 09:01 UTC

    G'day Phoebus2000,

    Welcome to the monastery.

    "I am trying to process all text files in a directory"

    [Note: this extends what ++Cristoforo posted.]

    After "my @files = readdir(DH);", @files contains all filenames found in the directory being read. This will include all directories (at least the current ('.') and parent ('..') directories), symbolic links, named pipes and so on.

    Try something along these lines (untested) to get what you want:

    my $dir = './new_text';
    opendir my $dh, $dir;
    my @files = grep { -f } map { "$dir/$_" } readdir $dh;
    closedir $dh;

    '-f' is described, along with other file test operators, in -X.

    Consider File::Spec for a more portable solution than "$dir/$_".
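
A rough sketch of the File::Spec variant, for illustration only: it builds a temporary directory with two made-up files and a subdirectory (all hypothetical names), then uses File::Spec->catfile in place of "$dir/$_" before filtering with -f.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical test directory: two plain files and one subdirectory.
my $dir = tempdir( CLEANUP => 1 );
for my $name (qw(a.txt b.txt)) {
    my $path = File::Spec->catfile( $dir, $name );
    open my $out, '>', $path or die "Cannot create $path: $!";
    close $out;
}
mkdir( File::Spec->catdir( $dir, 'subdir' ) ) or die "Cannot create subdir: $!";

opendir my $dh, $dir or die "Cannot open $dir: $!";
# catfile joins the parts with the separator appropriate for the platform;
# -f then keeps plain files only, dropping '.', '..' and 'subdir'.
my @files = grep { -f $_ }
            map  { File::Spec->catfile( $dir, $_ ) }
            readdir $dh;
closedir $dh;

@files = sort @files;
print "$_\n" for @files;
```

Each element of @files is then a path that can be handed directly to something like analyse_file, with no further need to know which directory it came from.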

    -- Ken

      Found some funny results when testing the -f operator versus ! -d on my Windows 7 system.
      C:\Old_Data\perlp>perl -wE "opendir DIR, 'CSV';say for grep -f, readdir DIR"
      dat.txt
      DBD_CSV.pl
      my_db.txt
      my_db_2.txt
      o33.csv
      o33.txt
      o44.txt
      o55.csv
      text_csv.pl
      zip_codes.csv

      C:\Old_Data\perlp>perl -wE "opendir DIR, 'CSV';say for grep ! -d, readdir DIR"
      csvfile.pl
      dat.txt
      DBD_CSV.pl
      dbd_csv_2tables.pl
      mytable
      my_db.txt
      my_db_2.txt
      name_cols_csv.pl
      o33.csv
      o33.txt
      o44.txt
      o55.csv
      placeholder.pl
      pmonk814937.pl
      selectall_hashref.pl
      test1.txt
      test_class_csv.pl
      text_csv.pl
      using_DBD_CSV.pl
      zip_codes.csv

      C:\Old_Data\perlp>
      -f doesn't get all the files that ! -d does.
        readdir returns dat.txt, because it finds CSV/dat.txt. -f, on the other hand, runs on dat.txt only, i.e. ./dat.txt. ! -d is true for files that exist in CSV, but do not exist in the current directory.
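
The difference can be reproduced with a small sketch along these lines. It assumes a temporary directory with two made-up filenames that do not exist in the current working directory, and compares the two tests on bare names against -f on the full path.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical directory whose files do not exist in the current directory.
my $dir = tempdir( CLEANUP => 1 );
for my $name (qw(only_in_demo_dir_1.txt only_in_demo_dir_2.txt)) {
    open my $out, '>', "$dir/$name" or die "Cannot create $dir/$name: $!";
    close $out;
}

opendir my $dh, $dir or die "Cannot open $dir: $!";
my @names = readdir $dh;    # bare names, including '.' and '..'
closedir $dh;

# Bare names are resolved against the current directory, not $dir.
my @bare_f     = grep { -f $_ } @names;         # nothing by that name here
my @bare_not_d = grep { !-d $_ } @names;        # "does not exist" also means "not a directory"
my @full_f     = grep { -f "$dir/$_" } @names;  # resolved inside $dir

printf "bare -f: %d, bare ! -d: %d, full-path -f: %d\n",
    scalar @bare_f, scalar @bare_not_d, scalar @full_f;
```

Only '.' and '..' are rejected by ! -d on bare names, because those do exist as directories relative to the current directory; every other name passes whether or not it exists there.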
        لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

        I see that's been answered by ++choroba.

        The "map { "$dir/$_" }", in my suggested solution, provided "grep { -f }" with the pathname (as opposed to just a filename).

        I might also point out that while '! -d' will be true for the '-f' files, it may also be true for any '-l', '-S', '-b' or '-c' files.

        -- Ken

Re: Scanning a directory's files for readability (Path::Class)
by Anonymous Monk on Jan 18, 2014 at 23:32 UTC
    You have Path::Class so you shouldn't even be thinking about readdir -- readdir is the devil

    my @files = dir( $dirpath )->absolute->children();