harangzsolt33 has asked for the wisdom of the Perl Monks concerning the following question:

This may be a very 'outdated' question, but I wrote a simple Perl program that works fine with TinyPerl 5.8 on Windows but displays an error when I run it with the old DOS perl 5.004_02. The error is:

D:\DOS\PERL>perl K:\desktop\istext.pl .\
Use of uninitialized value at K:\desktop\istext.pl line 191.
Bad symbol for filehandle at K:\desktop\istext.pl line 191.

It seems that the old perl interpreter has an issue with opendir($DIR, $PATH). If I try to initialize the $DIR variable with $DIR = 0; then it stops working when the first directory is encountered. If I try opendir(DIR, $PATH) instead, it works as long as I don't go into a sub-directory; then it loses track. So, how did people overcome this issue before newer versions of Perl existed?

Here is my script that uses the opendir() function:

#!/usr/bin/perl -w
use strict;
use warnings;

my $RECURSIVE = 0;
my $MSWIN = index($^O, 'MSWin') >= 0 ? 1 : 0;
my $SKIP = '.ZIP '; # File extensions to ignore (GLOBAL)

if (@ARGV == 0) # Print usage
{
  print "\n";
  print '=' x 80;
  print "ISTEXT.PL shows you what percentage of a file is binary data vs plain text.\nAnd it also shows whether the file is a DOS text file, Linux text file, OSX,\nmixed, or \"undetermined.\"";
  print "\n\n(\"Undetermined\" means that the file contains no line breaks at all. A DOS text\nfile contains CR-LF pairs. Linux text files have LF characters only. OSX text\nfiles contain CR characters only. And \"mixed\" means that an unequal number of\nCR and LF characters are both present.)\n\n";
  print 'Files with DOS line breaks will be represented with $$ signs. Files with Linux';
  print "\nline breaks will be represented with /////. OSX files will be represented with\nletter \"O\" and mixed lines will be x x x. Binary data will be shown as bbbbbb.\nExample:\n\n";
  print " 4KB |/////////////| /home/you/Desktop/notes.txt <--- Linux text file\n";
  print " 1KB |ttttttttttttt| /home/you/Desktop/url.txt <--- text with no line breaks\n";
  print " 43KB |x x x x x x x| /home/you/Desktop/list.nfo <--- mixed text file\n";
  print ' 2KB |$$$$$$$$$$|bbb /home/you/Desktop/IE.lnk <--- This means the file';
  print "\n\thappens to have equal number of CR and LF chars, (so we assume that\n\t it is most likely a DOS text file), while the letter b's indicate\n\t\tthat it also contains about 25% binary data.\n";
  print "\nUSAGE: perl istext.pl <directory or filename>\n\nUsing a directory name, you must end the dir with a forward slash or backslash.\nEnd the path with two slashes to run a recursive scan: perl isText.pl /etc//\n\n";
  exit;
}

my $P = join(' ', @ARGV);

# Testing a file or a directory?
if (EndsWithSlash($P))
{
  # The following file extensions will be skipped:
  $SKIP .= '.ZIP .LNK .MID .PDF .MP3 .MOV .M4A .MP4 .AVI .VOB .WMV .WMA .WAV .3GP .MPG .FLV .MKV .SWF .WEBM .OGG .MP2 .AAC .AC3 .TS .DOC .DOCX .RAR .GZ .TGZ .CAB .EXE .COM .DLL .PNG .JPG .BMP .GIF .ICO .PCX .DRV .MSI .CAT .OCX .SYS .CPL .CPX .NLS .CHM .TTF .FON .MST .AX .TSP .DB .MBX .ANI .JAR .GADGET ';
  $RECURSIVE = EndsWithSlash($P);
  CheckDIR($P);
  exit 0;
}
else
{
  exit CatchFile($P); # It's a file!
}

#################################################################
#
# This function scans a string looking for special characters
# and determines what percentage of the string is plain text
# and also tries to determine the text format.
#
# Returns an integer whose lower 8 bits is the percentage (0-100).
# Bit 9 will be set if any LF characters were found.
# Bit 10 will be set if any CR characters were found.
# Bit 11 will be set if there are equal number of CR and LF
# characters in the string. These can be interpreted as follows:
#
# 000 = Format is undetermined.
# 001 = LINUX string (LF only)
# 010 = OSX string (CR only)
# 011 = MIXED format
# 111 = DOS text (CR-LF pairs)
#
# Usage: INTEGER = isText(STRING)
#
sub isText
{
  defined $_[0] or return 100;
  my $L = length($_[0]);
  $L or return 100;

  # We will simply count the number of plain text characters
  # and the number of CR and LF characters in the string.

  my $TOTAL = $L; # Total length of string
  my $C;
  my $TX = 0; # Number of plain text characters
  my $CR = 0; # Number of 0D characters
  my $LF = 0; # Number of 0A characters

  while ($L--)
  {
    $C = vec($_[0], $L, 8);
    next if ($C > 126);
    if ($C > 31 || $C == 9) { $TX++; next; }
    $LF++ if ($C == 10);
    $CR++ if ($C == 13);
  }

  # Now, we will try to determine what type of string
  # we're dealing with. There are 5 possibilities:
  # LINUX, DOS, OSX, MIXED, or "undetermined."
  #
  # Explanation of formats:
  # * OSX files contain CR characters as line break.
  # * Linux text files contain LF characters as line break.
  # * DOS text files contain an equal number of CR and LF
  #   characters in pairs.
  # * "MIXED" means that the string contains an unequal number of
  #   both CR and LF characters, so this may be a binary string.
  # * "Undeteremined" means that the string does not contain
  #   any line break characters at all, so it could be either
  #   a DOS text or Linux text or anything.

  $C = $LF ? 0x100 : 0; # We use $C to store the string format.
  $C |= 0x200 if ($CR);
  $C |= 0x400 if ($CR == $LF);

  # The percentage is stored in the lower 7 bits,
  # and the format is stored in bits 9-11.
  return $C | int(($TX+$LF+$CR) / $TOTAL * 100);
}

#################################################################
#
# This function is automatically called by CheckDIR() every time
# a file is found. This function gets the full name of the file.
# Returns the percentage as an integer (0-100). The value "100"
# means that 100% of the file is plain text with no binary
# characters in it at all.
#
# Usage: INTEGER = CatchFile(FULLNAME) <-- Called by CheckDIR()
#
sub CatchFile
{
  my $F = _FileName(\@_); # Remove unsafe characters from file name

  # Check file extension to see if we should skip this file
  if (length($F) > 4)
  {
    my $EXT = rindex($F, '.');
    if ($EXT >= 0)
    {
      $EXT = uc(substr($F, $EXT, length($F))) . ' ';
      return -1 if (index($SKIP, $EXT) >= 0); # SKIP FILE
    }
  }

  -e $F or return 0; # File exists?

  # To save time, we're only going to process small files...
  my $FILE_SIZE = -s $F;
  if ($FILE_SIZE > 9999999)
  {
    # Print file size in Megabytes
    printf("%dMB\t too big ", int(($FILE_SIZE+999999) / 1000000) );
  }
  else
  {
    # Print file size in Kilobytes
    printf("%dKB\t", int(($FILE_SIZE+999) / 1000) );

    # ANALYZE FILE CONTENTS...
    my $S = ReadFile($F);
    my $FORMAT = isText($S);
    my $PERCENT = $FORMAT & 255;
    $FORMAT >>= 8;
    $S = '';
    if ($FORMAT == 1) { $S = '/' x 13; } # LINUX
    if ($FORMAT == 2) { $S = 'O' x 13; } # OSX
    if ($FORMAT == 3) { $S = 'x ' x 7; } # MIXED
    if ($FORMAT == 4) { $S = 't' x 13; } # undetermined
    if ($FORMAT == 7) { $S = '$' x 13; } # DOS
    my $VISUAL = int($PERCENT * .13);
    print '|', substr($S, 0, $VISUAL), '|', 'b' x (13-$VISUAL);
  }
  print " $F\n";
}

#################################################################
#
# This function reads the contents of a folder and calls
# CatchFile() for each file that was found.
# Usage: CheckDIR(PATH)
#
sub CheckDIR
{
  my $PATH = defined $_[0] ? $_[0] : '';
  length($PATH) or return;

  # Change / to \ on Windows computers
  if ($MSWIN) { $PATH =~ tr#/#\\#; }

  # print "Reading directory: $PATH\n";

  # Make sure that PATH ends with a backslash or forward slash
  if (index("/\\", substr($PATH, length($PATH)-1, 1)) < 0)
  {
    $PATH .= ($MSWIN ? "\\" : '/');
  }

  my $DIR;
  my $FULLNAME;
  opendir($DIR, $PATH) or return;

  my $NAME = 1;
  while ($NAME)
  {
    $NAME = readdir($DIR);
    defined $NAME or last;
    $FULLNAME = "$PATH$NAME";
    if (-d($FULLNAME))
    {
      # Check into subdirectory if RECURSIVE == 1
      # Skip directory if its name starts with "."
      if ($RECURSIVE)
      {
        CheckDIR($FULLNAME) unless (vec($NAME, 0, 8) == 46);
      }
      next;
    }
    CatchFile($FULLNAME);
  }
  closedir($DIR);
}

#################################################################
#
# Checks if the argument string ends with a forward slash or
# backslash, and if it does, then removes it and returns 1,
# or returns 0 if no slash was found at the end of the string.
# Usage: INTEGER = EndsWithSlash(STRING)
#
sub EndsWithSlash
{
  my $P = defined $_[0] ? $_[0] : '';
  length($P) or return 0;
  index("\\/", substr($P, length($P)-1, 1)) >= 0 or return 0;
  chop $_[0];
  return 1;
}

#################################################################
# Usage: STRING = _FileName(\@_) - Removes the first argument from @_
# just like shift() does and returns a file name. This function does
# not check syntax, but it does remove some illegal characters (<>|*?)
# from the name that obviously should not occur in a file name. If the
# file name doesn't contain any valid characters, then returns an empty
# string.
sub _FileName
{
  @_ or return '';
  my $N = shift;
  $N = shift(@$N);
  defined $N or return '';
  length($N) or return '';
  my $c;
  my $j = 0;
  my $V = 0;
  for (my $i = 0; $i < length($N); $i++)
  {
    $c = vec($N, $i, 8);
    next if ($c == 63 || $c == 42 || $c < 32);
    last if ($c == 60 || $c == 62 || $c == 124);
    if ($c > 32) { $V = $j + 1; }
    if ($V) { $i == $j or vec($N, $j, 8) = $c; $j++; }
  }
  return substr($N, 0, $V);
}

# Usage: STRING = ReadFile(FILE_NAME, [BYTES_TO_READ, [START]]) - Reads
# an entire file in binary mode and returns the contents in one string.
# A second argument may be provided to read only a certain number of
# bytes. And a third argument may be provided to set the file pointer
# to a certain address before reading.
sub ReadFile
{
  my $F = _FileName(\@_);
  length($F) or return '';
  my $L = @_ ? shift : 99999999;
  defined $L or return '';
  my $A = @_ ? shift : 0;
  defined $A or $A = 0;
  -f $F or return '';
  -s $F or return '';
  my $B;
  open FH, "<$F" or return '';
  binmode FH;
  if ($A) { sysseek(FH, $A, 0); }
  sysread FH, $B, $L;
  close FH;
  defined $B or return '';
  return $B;
}
#################################################################

Re: script fails with perl 5.004_02
by rjt (Curate) on Oct 27, 2019 at 07:44 UTC

    opendir($DIR, ...) doesn't work because lexical filehandles weren't available in perl 5.004 (autovivifying a handle into an undefined scalar arrived with perl 5.6). opendir(DIR, ...) doesn't work either: your CheckDIR subroutine is recursive, so every call shares the same global DIR handle, and opening a subdirectory clobbers the handle the caller is still reading from.
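
    To see the shared-handle problem in isolation, here is a minimal sketch. The name walk_shared is made up for illustration, and "/" is assumed to work as a path separator. On a modern perl, a top directory with two subdirectories should report only two opened directories instead of three, because the outer loop stops after the first recursive call; I have not checked the exact behaviour on 5.004.

```perl
#!/usr/bin/perl
use strict;

# Hypothetical demo: recursive walk that uses the one global dirhandle DIR.
# Returns the number of directories it managed to open.
sub walk_shared {
    my $path = shift;
    opendir(DIR, $path) or return 0;
    my $opened = 1;    # count this directory
    my $name;
    while (defined($name = readdir(DIR))) {
        next if $name eq '.' || $name eq '..';
        my $full = "$path/$name";
        # The recursive call reopens DIR for the subdirectory and closes it
        # again, so the readdir in this loop has nothing left to read from.
        $opened += walk_shared($full) if -d $full;
    }
    closedir(DIR);
    return $opened;
}

print walk_shared($ARGV[0]), "\n" if @ARGV;
```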

    You can get around this by reading the entire directory (i.e., get the full list of filenames in the current directory) before your recursion step.
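
    A rough sketch of that approach, not taken from the original script: list_dir and walk are hypothetical names, and "/" is assumed to be an acceptable separator. The handle is drained and closed inside list_dir, so the bareword DIR is never open while the recursion runs. It sticks to constructs that should also parse on older perls, though that is an assumption I have not verified on 5.004.

```perl
#!/usr/bin/perl
use strict;

# Hypothetical helper: return the names in $path, without '.' and '..'.
# The handle is opened, read completely, and closed before returning.
sub list_dir {
    my $path = shift;
    opendir(DIR, $path) or return ();
    my @names = grep { $_ ne '.' && $_ ne '..' } readdir(DIR);
    closedir(DIR);
    return @names;
}

# Recursive walk; calls the code ref for every entry that is not a directory.
sub walk {
    my ($path, $callback) = @_;
    foreach my $name (list_dir($path)) {
        my $full = "$path/$name";
        if (-d $full) { walk($full, $callback); }
        else          { &$callback($full); }
    }
}

walk($ARGV[0], sub { print "$_[0]\n" }) if @ARGV;
```

    In the posted script, the callback would be CatchFile.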

    On a side note, your example is way, way longer than it needs to be. Boil down your example into the smallest bit of code that reproduces your problem. In so doing, you might have noticed that it only happened when you tried to do it recursively.

      Oh, thank you! And yes, I will condense this code..
Re: script fails with perl 5.004_02
by jwkrahn (Abbot) on Oct 27, 2019 at 21:39 UTC
    It seems that the old perl interpreter has an issue with opendir($DIR, $PATH). If I try to initialize the $DIR variable with $DIR = 0; then it stops working when the first directory is encountered. If I try opendir(DIR, $PATH) instead, it works as long as I don't go into a sub-directory; then it loses track. So, how did people overcome this issue before newer versions of Perl existed?
    my $DIR;
    my $FULLNAME;
    opendir($DIR, $PATH) or return;

    Back then you had to use local with a typeglob:

    local *DIR;
    my $FULLNAME;
    opendir(DIR, $PATH) or return;
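
    For context, a small self-contained sketch of how local *DIR behaves in a recursive routine. The sub count_files is an invented example, not part of the posted script, and "/" is assumed as the separator. Each call gets its own temporary DIR glob, so the caller's handle is intact when the recursion returns, but every level still keeps its directory open until then.

```perl
#!/usr/bin/perl
use strict;

# Hypothetical example: count the plain entries below $path.
sub count_files {
    my $path = shift;
    local *DIR;    # private DIR glob for this call; the old one is restored on return
    opendir(DIR, $path) or return 0;
    my $count = 0;
    my $name;
    while (defined($name = readdir(DIR))) {
        next if $name eq '.' || $name eq '..';
        my $full = "$path/$name";
        if (-d $full) { $count += count_files($full); }
        else          { $count++; }
    }
    closedir(DIR);
    return $count;
}

print count_files($ARGV[0]), "\n" if @ARGV;
```
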
      Back then you had to use local with a typeglob:

      But either way, recursing directories while keeping their handles open will eventually run out of file handles.

      Demo, repeatedly opening the current directory just to use up all available handles:

      #!/usr/bin/perl

      use strict;
      use warnings;

      sub openHandle
      {
          opendir my $dir,'.' or die "Could not opendir: $!";
          return $dir;
      }

      my @a;
      while (1) {
          push @a,openHandle();
          print "Handles in use: ",0+@a,"\n";
      }

      Running on Linux 64 bit, using perl 5.22.2:

      // previous lines removed
      Handles in use: 1015
      Handles in use: 1016
      Handles in use: 1017
      Handles in use: 1018
      Handles in use: 1019
      Handles in use: 1020
      Handles in use: 1021
      Could not opendir: Too many open files at handles.pl line 8.

      Add three handles for STDIN, STDOUT, STDERR, and you get a total of 1024 handles. Note that no other files, directories, or network connections are currently open.
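
      On POSIX-like systems the limit can also be queried directly instead of being found by exhaustion. This is only a sketch: open_max is a made-up wrapper, and it assumes the platform's POSIX module provides sysconf and _SC_OPEN_MAX, which may not hold on Windows or DOS ports.

```perl
#!/usr/bin/perl
use strict;
use POSIX qw(sysconf _SC_OPEN_MAX);

# Hypothetical wrapper: returns the per-process limit on open descriptors,
# or undef if the system does not report one.
sub open_max {
    my $limit = sysconf(_SC_OPEN_MAX);
    return $limit;
}

my $limit = open_max();
if (defined $limit) { print "Open file limit: $limit\n"; }
else                { print "Open file limit: unknown\n"; }
```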

      Running on Windows 7 64 bit, Strawberry Perl 5.14.2 64 bit:

      // previous lines removed
      Handles in use: 27781
      Handles in use: 27782
      Handles in use: 27783
      Handles in use: 27784
      Handles in use: 27785
      Terminating on signal SIGINT(2)
      H:\tmp>

      It takes several seconds to clean up the mess after pressing Ctrl-C.

      According to MS Technet, Windows can open about 16.7 million handles.

      MSDOS, in its default configuration, had a file handle limit of just 8, including STDIN, STDOUT, STDERR, plus two extra default handles for a communication port and a printer port. This left software with just three handles. Using the FILES directive in config.sys, you could increase that to 255 handles; a typical value was 20 or 30.

      If you want to play it safe, close handles as soon as possible. For recursing directories, read the entire content, close the handle, and only then process the content. Of course, this may need more memory.
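
      One way to sketch this without any recursion is a work list of pending directories. The name walk_iterative and the callback interface are assumptions for illustration, and "/" is assumed as the separator. Only one directory handle is open at any moment; the extra memory goes into the name list of the current directory plus the queue of directories still to visit.

```perl
#!/usr/bin/perl
use strict;

# Hypothetical sketch: walk a tree with an explicit queue instead of recursion.
sub walk_iterative {
    my ($top, $callback) = @_;
    my @pending = ($top);              # directories still to be read
    while (@pending) {
        my $path = shift @pending;
        opendir(DIR, $path) or next;
        my @names = readdir(DIR);
        closedir(DIR);                 # handle released before any processing
        foreach my $name (@names) {
            next if $name eq '.' || $name eq '..';
            my $full = "$path/$name";
            if (-d $full) { push @pending, $full; }
            else          { &$callback($full); }
        }
    }
}

walk_iterative($ARGV[0], sub { print "$_[0]\n" }) if @ARGV;
```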

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)