wrkrbeee has asked for the wisdom of the Perl Monks concerning the following question:

Hi Perl Monks, I have a program (below) that crashes when it attempts to open a file. Specifically, the program dies at: open(INPUT, "$inddirect/company$qtr$yr.idx") || die "file for company$qtr$yr.idx: $!"; Although I am a novice, it seems obvious that Perl is unable to find this file in the specified location. However, I can see the file in this location. Is there something else that I am overlooking? I am grateful for any insight here. BTW, I inherited the program, and I recognize that it is terribly inefficient. Thank you for your time and patience!

use Tie::File;
use Fcntl;

#First year you want downloaded files for for:
my $startyear=2014;
#Last year you want files for:
my $endyear=2014;
#First qtr you want files for (usually 1):
my $startqtr=1;
#Last year you want files for (usually 4):
my $endqtr=4;
#The directory you want your index files to be stored in.
my $inddirect="/Volumes/EDGAR1/Edgar/full-index";
#The directory you are going to download filings to
my $direct="/Volumes/EDGAR1/Edgar/Edgar2/10K_10Q";
#The file that will contain the filings you want to download.
my $outfile="/Volumes/EDGAR1/Edgar/sizefiles1.txt";
my $formget1='(10-K )';
my $formget2='(10-K405 )';
my $formget3='(10KSB )';
my $formget4='(10-KSB )';
my $formget5='(10KSB40 )';
my $formget6='(10-KT )';
my $formget7='(10KT405 )';
my $formgetq1='(10-Q )';
my $formgetq2='(10QSB )';
my $formgetq3='(10-QSB )';
my $formgetq4='(10-QT )';
#if using windows, set to "\\" - if mac (or unix), set to "/";
my $slash='/';

#loop through all the index years you specfied
for($yr=$startyear;$yr<=$endyear;$yr++)
{
    #loop through all the index quarters you specified
    if($yr<$endyear){$eqtr=4}else{$eqtr=$endqtr}
    for($qtr=$startqtr;$qtr<=$eqtr;$qtr++)
    {
        #Open the index file
        open(INPUT, "$inddirect/company$qtr$yr.idx") || die "file for company$qtr$yr.idx: $!";

        #Open the file you want to write to. The first time through
        #the file is opened to "replace" the existing file.
        #After that, it is opened to append ">>".
        if ($yr==$startyear && $qtr==$startqtr)
            {$outfiler=">$outfile";}
        else{$outfiler=">>$outfile";}
        open(OUTPUT, "$outfiler") || die "file for 2006 1: $!";
        $count=1;
        while ($line=<INPUT>)
        {
            #ignore the first 10 lines because they only contain header information
            if ($.<11) {next};
            $form_type=substr($line,62,12);
            my $cik=substr($line,74,10);
            $file_date=substr($line,86,10);
            $file_date=~s/\-//g;
            my $fullfilename=trim(substr($line,98,43));
            if ($form_type=~/^$formget1(?!\/)/)
            {
                print OUTPUT "$fullfilename\n" ;
                $count++;
            }
            elsif ($form_type=~/^$formget2(?!\/)/)
            {
                print OUTPUT "$fullfilename\n" ;
                $count++;
            }
            elsif ($form_type=~/^$formget3(?!\/)/)
            {
                print OUTPUT "$fullfilename\n" ;
                $count++;
            }
            elsif ($form_type=~/^$formget4(?!\/)/)
            {
                print OUTPUT "$fullfilename\n" ;
                $count++;
            }
            elsif ($form_type=~/^$formget5(?!\/)/)
            {
                print OUTPUT "$fullfilename\n" ;
                $count++;
            }
            elsif ($form_type=~/^$formget6(?!\/)/)
            {
                print OUTPUT "$fullfilename\n" ;
                $count++;
            }
            elsif ($form_type=~/^$formget7(?!\/)/)
            {
                print OUTPUT "$fullfilename\n" ;
                $count++;
            }
            elsif ($form_type=~/^$formgetq1(?!\/)/)
            {
                print OUTPUT "$fullfilename\n" ;
                $count++;
            }
            elsif ($form_type=~/^$formgetq2(?!\/)/)
            {
                print OUTPUT "$fullfilename\n" ;
                $count++;
            }
            elsif ($form_type=~/^$formgetq3(?!\/)/)
            {
                print OUTPUT "$fullfilename\n" ;
                $count++;
            }
            elsif ($form_type=~/^$formgetq4(?!\/)/)
            {
                print OUTPUT "$fullfilename\n" ;
                $count++;
            }
            #if ($count>10){last;}
        #end of the while loop <INPUT>
        }
        close(INPUT);
        close(OUTPUT);

        # check to see if directory exists. If not, create it.
        unless(-d "$direct$slash$yr"){
            mkdir("$direct$slash$yr") or die;
        }

        #Open the directory and get put the names of all files into the array @old
        opendir(DIR,"$direct$slash$yr")||die "Can't open directory";
        @Old=readdir(DIR);

        tie(@New1,Tie::File,"$outfile", mode=> O_RDWR)
            or die "Cannot tie file BOO: $!n";
        %seen=();
        #defines an array called @aonly.
        @aonly=();
        foreach $item(@Old){$seen{$item}=1}
        foreach $item(@New1){
            $item=~/(edgar\/data\/.*\/)(.*\.txt)/;
            unless($seen{$item}){
                push(@aonly,$item);
            }
        }

        ftpsignin();
        foreach $filetoget(@aonly)
        {
            # $filetoget=trim($filetoget);
            $fullfile="/$filetoget";
            $fonly=$filetoget;
            #Don't forget to put your directory in here.
            $fonly=~s/.*\/(.*)/$direct$slash$yr$slash$1/;
            #$ftp->get("$fullfile", "$fonly")
            #or warn "can't get file",ftpsignin(),next; # "cannot get file",$ftp->message, next;
        }
        $ftp->quit;
    #end of qtr loop
    }
#end of year loop
}

sub ftpsignin {
    use Net::FTP;
    $ftp = Net::FTP->new("ftp.sec.gov", Debug => 0, Passive => 1)
        or die "Cannot connect to some.host.name: $@";
    $ftp->login("anonymous",'-anonymous@')
        or next; #die "Cannot login ", $ftp->message;
    $ftp->binary(); # set binary mode
}

sub trim {
    my $new_phrase;
    my $phrase = shift(@_);
    $phrase =~ s/^\s+//;
    $phrase =~ s/\s+$//;
    $new_phrase = "$phrase";
    return "$new_phrase";
}

Replies are listed 'Best First'.
Re: file not found with OPEN
by choroba (Cardinal) on Jan 06, 2015 at 17:53 UTC
    What error do you get?

    Moreover, it's better to use the same path both in open and die to avoid confusion.

    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
      Error is "no such file or directory" ... he says completely embarrassed.
Re: file not found with OPEN
by Laurent_R (Canon) on Jan 06, 2015 at 18:09 UTC
    Just some plain vanilla checks. Check that the path is really, really correct. Same for the file name. Check the file's owner and permissions. Can you open the file with a text editor?
      A text editor such as Notepad will not open the file, which I suspect is due to the IDX nature of the file (i.e., index file).
      Question: using "Windows Explorer", I see that the "file opens with Excel." Could that be the culprit?
        You would probably have problems reading it if it were an Excel file, but the error message says "file not found". BTW, Windows will say "Excel file" for any file having the CSV extension, regardless of the content.
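The path, file name, and permission checks suggested in this sub-thread can be sketched with Perl's file test operators. This is only an illustration: `diagnose_path` is a hypothetical helper name that does not appear in the original program, and the path passed to it is the one the script would build for the first quarter of 2014.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);

# Hypothetical helper (not part of the original program): report the
# first thing that looks wrong with a path, checking the directory,
# then the file itself, then read permission.
sub diagnose_path {
    my ($path) = @_;
    my $dir = dirname($path);
    return "directory '$dir' does not exist" unless -d $dir;
    return "file '$path' does not exist"     unless -e $path;
    return "file '$path' is not readable"    unless -r $path;
    return "ok";
}

print diagnose_path('/Volumes/EDGAR1/Edgar/full-index/company12014.idx'), "\n";
```

Running this from the same directory and account as the failing script shows whether the directory or the file is what Perl cannot see.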
Re: file not found with OPEN
by nlwhittle (Beadle) on Jan 06, 2015 at 18:13 UTC

    According to your code, the first file it should open is:
    /Volumes/EDGAR1/Edgar/full-index/company12014.idx
    Does this file exist?

    --Nick
      Yes, I see the file via Windows Explorer.

        I set up a directory and file structure to match what your code is looking for and ran the portion of it that tries to open the files. It seems to work. I'm using Cygwin so the directory structure is UNIX-like.

        I think you are assuming something incorrect about where your file(s) are located. Could you maybe paste in the complete path to your files?

        --Nick

        Your error message should contain the file name that it's dying on, what is that file name?

        --Nick

        Try adding "C:" to the front of your filepaths (i.e. "C:/Volumes/EDGAR/..etc..")

        --Nick
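One plausible reason the drive letter matters (an assumption, since the thread never says which drive the script was started from): on Windows, a path that begins with a slash but carries no drive letter is resolved against the current drive of the running process. The sketch below uses `File::Spec->rel2abs` to print what a path resolves to; `describe_path` is a hypothetical helper name introduced here for illustration.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper (not part of the original program): show what a
# path resolves to from the current working directory, and whether the
# resolved file exists.
sub describe_path {
    my ($path) = @_;
    my $resolved = File::Spec->rel2abs($path);
    my $status   = -e $resolved ? 'exists' : 'not found';
    return "'$path' resolves to '$resolved' ($status)";
}

print describe_path('/Volumes/EDGAR1/Edgar/full-index/company12014.idx'), "\n";
```

If the resolved path starts with a different drive than the one Windows Explorer shows for the file, spelling out the drive in `$inddirect` is the simplest fix.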
Re: file not found with OPEN (refactor)
by Anonymous Monk on Jan 06, 2015 at 20:57 UTC
    Some tips, use Path::Tiny so you don't have to readdir among other things, like this
    use Path::Tiny qw/ path /;
    if( not path( $direct, $yr )->exists ) {
        path( $direct, $yr )->mkpath;
    }
    @Old = path( $direct, $yr )->children;

    Path::Tiny dies when mkpath, children, or openr fails, so no explicit "or die" checks are needed.

    my $INPUT = path( $inddirect, "company$qtr$yr.idx" )->openr;
    my $OUTPUT;
    if( $yr == $startyear && $qtr == $startqtr ) {
        $OUTPUT = path( $outfile )->openw; ## clobber
    } else {
        $OUTPUT = path( $outfile )->opena; ## append
    }

    Also, no need for that giant if/else block, use an array

    my @FormGetRes = (
        '(10-K )', '(10-K405 )', '(10KSB )', '(10-KSB )',
        '(10KSB40 )', '(10-KT )', '(10KT405 )',
        '(10-Q )', '(10QSB )', '(10-QSB )', '(10-QT )',
    );
    @FormGetRes = map { qr{^$_(?!/)}s } @FormGetRes;

    RELOOP: for my $re ( @FormGetRes ) {
        if( $form_type =~ $re ) {
            print OUTPUT "$fullfilename\n";
            $count++;
            last RELOOP;
        }
    }
      Another tip, this one about subroutines: they should return values rather than work on global variables, meaning

      my $ftp = ftpsignin();

      sub ftpsignin {
          use Net::FTP;
          my $ftp = Net::FTP->new( "ftp.sec.gov", Debug => 0, Passive => 1 )
            or die "Cannot connect to some.host.name: $@";
          $ftp->login( "anonymous", '-anonymous@' )
            or next; #die "Cannot login ", $ftp->message;
          $ftp->binary(); # set binary mode
          return $ftp; ### RETURN VALUE
      } ## end sub ftpsignin

      Why? "Coping with Scoping" explains.
        Thank you for the help!!! I am grateful. On another note, I thought I had thanked everyone for their help with this issue. However, I somehow missed thanking RLAURENT, and think I ruffled his feathers. But I wanna make it right. How do I reply to his last message?

      Another tip is to format/annotate your code with perltidy:

      perltidy -olq -csc -csci=10 -cscl="sub : BEGIN END if while for " -otr -opr -ce -nibc -i=4 -pt=0 "-nsak=*"

      it produces output like

          #end of qtr loop
          } ## end for( $qtr = $startqtr ;...)
      #end of year loop
      } ## end for( $yr = $startyear ;...)
      } ## end sub ftpsignin

      While these comments can help, if you keep your loops/subs relatively small, you won't rely on them as much :)

      I commonly use

      ## perltidy -olq -csc -csci=3 -cscl="sub : BEGIN END " -otr -opr -ce -nibc -i=4 -pt=0 "-nsak=*"
      ## perltidy -olq -csc -csci=10 -cscl="sub : BEGIN END if " -otr -opr -ce -nibc -i=4 -pt=0 "-nsak=*"
      ## perltidy -olq -csc -csci=10 -cscl="sub : BEGIN END if while " -otr -opr -ce -nibc -i=4 -pt=0 "-nsak=*"
      ## perltidy -olq -csc -csci=10 -cscl="sub : BEGIN END if while for " -otr -opr -ce -nibc -i=4 -pt=0 "-nsak=*"

      Another tip, make more subs, give them good names, give variables good names, ... maybe BeeWork( $indexStore, $filingsStore, $filingsFile, $startyear, $endyear, $startqtr, $endqtr  ); ??? :D

      If you make subs your loops can shrink

      ## yearloop {
      ## quarterloop {
          ...
          BeeIO( $infile, $outfile , \@FormGetRes );
          #~ my $downloads = path( $filingsStore, $yr );
          my $dlyear = path( $direct, $yr);
          my @aonly = BeeAonly( $dlyear, $outfile );
          BeeFtpGet( \@aonly , $dlyear );
      }
      }

      Why subs? If there is a problem in BeeAonly, you only have to debug BeeAonly; you don't have to debug all of BeeWork.
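To make the point about small subs concrete, here is a rough sketch of what a BeeAonly-style sub could look like. The signature is an assumption and is simpler than the `BeeAonly( $dlyear, $outfile )` call sketched above: this version takes two array references (names already downloaded, names wanted) so that it can be exercised without touching the disk or the FTP server.

```perl
use strict;
use warnings;

# Hypothetical, simplified version of the BeeAonly sub named above.
# Takes two array references: names already present and names wanted.
# Returns the wanted names that are not yet present, in their
# original order.
sub BeeAonly {
    my ($have, $want) = @_;
    my %seen = map { $_ => 1 } @$have;
    return grep { !$seen{$_} } @$want;
}

my @missing = BeeAonly( [ 'a.txt', 'b.txt' ], [ 'b.txt', 'c.txt' ] );
print "@missing\n";    # prints "c.txt"
```

Because the sub depends only on its arguments, a two-line test like the one above is enough to check it in isolation.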

Re: file not found with OPEN
by locked_user sundialsvc4 (Abbot) on Jan 06, 2015 at 18:58 UTC

    Try putting the file name into another string, then printing that string to the STDERR log file.   Let the program tell you what it is doing.

    my $debugging = 1;
    my $filename = "$inddirect/company$qtr$yr.idx";
    print STDERR "Opening '$filename'\n" if ($debugging);
    open(INPUT, $filename) || die "can't open '$filename': $!";
    print STDERR "'$filename' opened successfully\n" if ($debugging);

      I added "C:" prior to the directory spec which solved the problem. I am grateful for your time and patience! Apologize for wasting your time!!