harangzsolt33 has asked for the wisdom of the Perl Monks concerning the following question:
```
D:\DOS\PERL>perl K:\desktop\istext.pl .\
Use of uninitialized value at K:\desktop\istext.pl line 191.
Bad symbol for filehandle at K:\desktop\istext.pl line 191.
```
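The "Bad symbol for filehandle" error is consistent with the fact that opendir() only began autovivifying an undefined scalar into a new handle in Perl 5.6.0. On 5.004_02 the scalar presumably has to hold a glob (or a glob reference) before the call. Below is a minimal sketch of that fix. It assumes the core Symbol module and its gensym() function are present in that installation. list_dir() is a hypothetical helper name, not part of the script:

```perl
use strict;
use Symbol;    # provides gensym(): a reference to a fresh anonymous glob

# Hypothetical helper: returns the entry names of one directory,
# skipping "." and "..". Returns an empty list if opendir() fails.
sub list_dir
{
  my $path = shift;
  my $DIR = gensym;    # pre-5.6 substitute for handle autovivification
  opendir($DIR, $path) or return ();
  my @names = grep { $_ ne '.' && $_ ne '..' } readdir($DIR);
  closedir($DIR);
  return @names;
}
```

In CheckDIR(), the equivalent change would be `my $DIR = gensym;` in place of `my $DIR;` (plus `use Symbol;` at the top). Each recursive call then owns a separate handle.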
It seems that the old Perl 5.004_02 interpreter has a problem with opendir($DIR, $PATH). If I initialize the variable with $DIR = 0, it stops working as soon as the first subdirectory is encountered. If I use opendir(DIR, $PATH) instead, it works as long as I don't descend into a subdirectory; after that it loses track. So how did people get around this before newer versions of Perl existed?
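With the bareword form, DIR is a single package-global handle. The recursive CheckDIR() call reopens it on the subdirectory and then closes it. When control returns, the outer loop's readdir() has nothing left to read. The traditional pre-5.6 remedy is to localize the glob at the top of the recursive function, so each call gets its own DIR and the caller's handle is restored on return. The sketch below is a minimal version of that idea. walk_dir() and its callback argument are hypothetical names rather than part of the script, and paths are joined with "/" for simplicity:

```perl
use strict;

# Hypothetical recursive walker: calls &$callback(FULLNAME) for every
# non-directory entry below $path. Uses only a bareword handle, so no
# handle autovivification is required.
sub walk_dir
{
  my ($path, $callback) = @_;
  local *DIR;    # fresh DIR glob for this call; caller's DIR comes back on return
  opendir(DIR, $path) or return;
  my $name;
  while (defined($name = readdir(DIR)))
  {
    next if $name eq '.' || $name eq '..';
    my $full = "$path/$name";
    if (-d $full) { walk_dir($full, $callback); }    # inner call cannot clobber our DIR
    else          { &$callback($full); }
  }
  closedir(DIR);
}
```

Another common workaround is to read all names into an array with readdir() and call closedir() before recursing, so that only one directory handle is open at any time.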
Here is my script that uses the opendir() function:
```perl
#!/usr/bin/perl -w
use strict;
use warnings;

my $RECURSIVE = 0;
my $MSWIN = index($^O, 'MSWin') >= 0 ? 1 : 0;
my $SKIP = '.ZIP ';  # File extensions to ignore (GLOBAL)

if (@ARGV == 0)  # Print usage
{
  print "\n";
  print '=' x 80;
  print "ISTEXT.PL shows you what percentage of a file is binary data vs plain text.\nAnd it also shows whether the file is a DOS text file, Linux text file, OSX,\nmixed, or \"undetermined.\"";
  print "\n\n(\"Undetermined\" means that the file contains no line breaks at all. A DOS text\nfile contains CR-LF pairs. Linux text files have LF characters only. OSX text\nfiles contain CR characters only. And \"mixed\" means that an unequal number of\nCR and LF characters are both present.)\n\n";
  print 'Files with DOS line breaks will be represented with $$ signs. Files with Linux';
  print "\nline breaks will be represented with /////. OSX files will be represented with\nletter \"O\" and mixed lines will be x x x. Binary data will be shown as bbbbbb.\nExample:\n\n";
  print " 4KB |/////////////| /home/you/Desktop/notes.txt <--- Linux text file\n";
  print " 1KB |ttttttttttttt| /home/you/Desktop/url.txt <--- text with no line breaks\n";
  print " 43KB |x x x x x x x| /home/you/Desktop/list.nfo <--- mixed text file\n";
  print ' 2KB |$$$$$$$$$$|bbb /home/you/Desktop/IE.lnk <--- This means the file';
  print "\n\thappens to have equal number of CR and LF chars, (so we assume that\n\t it is most likely a DOS text file), while the letter b's indicate\n\t\tthat it also contains about 25% binary data.\n";
  print "\nUSAGE: perl istext.pl <directory or filename>\n\nUsing a directory name, you must end the dir with a forward slash or backslash.\nEnd the path with two slashes to run a recursive scan: perl isText.pl /etc//\n\n";
  exit;
}

my $P = join(' ', @ARGV);

# Testing a file or a directory?
if (EndsWithSlash($P))
{
  # The following file extensions will be skipped:
  $SKIP .= '.ZIP .LNK .MID .PDF .MP3 .MOV .M4A .MP4 .AVI .VOB .WMV .WMA .WAV .3GP '
         . '.MPG .FLV .MKV .SWF .WEBM .OGG .MP2 .AAC .AC3 .TS .DOC .DOCX .RAR .GZ '
         . '.TGZ .CAB .EXE .COM .DLL .PNG .JPG .BMP .GIF .ICO .PCX .DRV .MSI .CAT '
         . '.OCX .SYS .CPL .CPX .NLS .CHM .TTF .FON .MST .AX .TSP .DB .MBX .ANI '
         . '.JAR .GADGET ';
  $RECURSIVE = EndsWithSlash($P);
  CheckDIR($P);
  exit 0;
}
else
{
  exit CatchFile($P);  # It's a file!
}

#################################################################
#
# This function scans a string looking for special characters
# and determines what percentage of the string is plain text
# and also tries to determine the text format.
#
# Returns an integer whose lower 8 bits hold the percentage (0-100).
# Bit 9 (0x100) will be set if any LF characters were found.
# Bit 10 (0x200) will be set if any CR characters were found.
# Bit 11 (0x400) will be set if there are an equal number of CR
# and LF characters in the string. These can be interpreted as
# follows (bits 11-10-9):
#
#   100 = Format is undetermined (no CR or LF at all, so 0 == 0)
#   001 = LINUX string (LF only)
#   010 = OSX string (CR only)
#   011 = MIXED format
#   111 = DOS text (CR-LF pairs)
#
# Usage: INTEGER = isText(STRING)
#
sub isText
{
  defined $_[0] or return 100;
  my $L = length($_[0]);
  $L or return 100;

  # We will simply count the number of plain text characters
  # and the number of CR and LF characters in the string.
  my $TOTAL = $L;  # Total length of string
  my $C;
  my $TX = 0;      # Number of plain text characters
  my $CR = 0;      # Number of 0D characters
  my $LF = 0;      # Number of 0A characters
  while ($L--)
  {
    $C = vec($_[0], $L, 8);
    next if ($C > 126);
    if ($C > 31 || $C == 9) { $TX++; next; }
    $LF++ if ($C == 10);
    $CR++ if ($C == 13);
  }

  # Now, we will try to determine what type of string
  # we're dealing with. There are 5 possibilities:
  # LINUX, DOS, OSX, MIXED, or "undetermined."
  #
  # Explanation of formats:
  #  * OSX files contain CR characters as line breaks.
  #  * Linux text files contain LF characters as line breaks.
  #  * DOS text files contain an equal number of CR and LF
  #    characters in pairs.
  #  * "MIXED" means that the string contains an unequal number of
  #    both CR and LF characters, so this may be a binary string.
  #  * "Undetermined" means that the string does not contain
  #    any line break characters at all, so it could be either
  #    a DOS text or Linux text or anything.
  $C = $LF ? 0x100 : 0;  # We use $C to store the string format.
  $C |= 0x200 if ($CR);
  $C |= 0x400 if ($CR == $LF);

  # The percentage is stored in the lower 8 bits,
  # and the format is stored in bits 9-11.
  return $C | int(($TX+$LF+$CR) / $TOTAL * 100);
}

#################################################################
#
# This function is automatically called by CheckDIR() every time
# a file is found. This function gets the full name of the file.
# Returns the percentage as an integer (0-100). The value "100"
# means that 100% of the file is plain text with no binary
# characters in it at all.
#
# Usage: INTEGER = CatchFile(FULLNAME)  <-- Called by CheckDIR()
#
sub CatchFile
{
  my $F = _FileName(\@_);  # Remove unsafe characters from file name

  # Check file extension to see if we should skip this file
  if (length($F) > 4)
  {
    my $EXT = rindex($F, '.');
    if ($EXT >= 0)
    {
      $EXT = uc(substr($F, $EXT, length($F))) . ' ';
      return -1 if (index($SKIP, $EXT) >= 0);  # SKIP FILE
    }
  }

  -e $F or return 0;  # File exists?

  # To save time, we're only going to process small files...
  my $FILE_SIZE = -s $F;
  if ($FILE_SIZE > 9999999)
  {
    # Print file size in Megabytes
    printf("%dMB\t too big ", int(($FILE_SIZE+999999) / 1000000) );
  }
  else
  {
    # Print file size in Kilobytes
    printf("%dKB\t", int(($FILE_SIZE+999) / 1000) );

    # ANALYZE FILE CONTENTS...
    my $S = ReadFile($F);
    my $FORMAT = isText($S);
    my $PERCENT = $FORMAT & 255;
    $FORMAT >>= 8;
    $S = '';
    if ($FORMAT == 1) { $S = '/' x 13; }  # LINUX
    if ($FORMAT == 2) { $S = 'O' x 13; }  # OSX
    if ($FORMAT == 3) { $S = 'x ' x 7; }  # MIXED
    if ($FORMAT == 4) { $S = 't' x 13; }  # undetermined
    if ($FORMAT == 7) { $S = '$' x 13; }  # DOS
    my $VISUAL = int($PERCENT * .13);
    print '|', substr($S, 0, $VISUAL), '|', 'b' x (13-$VISUAL);
  }
  print " $F\n";
}

#################################################################
#
# This function reads the contents of a folder and calls
# CatchFile() for each file that was found.
# Usage: CheckDIR(PATH)
#
sub CheckDIR
{
  my $PATH = defined $_[0] ? $_[0] : '';
  length($PATH) or return;

  # Change / to \ on Windows computers
  if ($MSWIN) { $PATH =~ tr#/#\\#; }

  # print "Reading directory: $PATH\n";

  # Make sure that PATH ends with a backslash or forward slash
  if (index("/\\", substr($PATH, length($PATH)-1, 1)) < 0)
  {
    $PATH .= ($MSWIN ? "\\" : '/');
  }

  my $DIR;
  my $FULLNAME;
  opendir($DIR, $PATH) or return;  # fails on Perl 5.004_02 while $DIR is undefined
  my $NAME = 1;
  while ($NAME)
  {
    $NAME = readdir($DIR);
    defined $NAME or last;
    $FULLNAME = "$PATH$NAME";
    if (-d($FULLNAME))
    {
      # Check into subdirectory if RECURSIVE == 1
      # Skip directory if its name starts with "."
      if ($RECURSIVE)
      {
        CheckDIR($FULLNAME) unless (vec($NAME, 0, 8) == 46);
      }
      next;
    }
    CatchFile($FULLNAME);
  }
  closedir($DIR);
}

#################################################################
#
# Checks if the argument string ends with a forward slash or
# backslash, and if it does, then removes it and returns 1,
# or returns 0 if no slash was found at the end of the string.
# Usage: INTEGER = EndsWithSlash(STRING)
#
sub EndsWithSlash
{
  my $P = defined $_[0] ? $_[0] : '';
  length($P) or return 0;
  index("\\/", substr($P, length($P)-1, 1)) >= 0 or return 0;
  chop $_[0];
  return 1;
}

#################################################################
# Usage: STRING = _FileName(\@_) - Removes the first argument from @_
# just like shift() does and returns a file name. This function does
# not check syntax, but it does remove some illegal characters (<>|*?)
# from the name that obviously should not occur in a file name. If the
# file name doesn't contain any valid characters, then returns an
# empty string.
sub _FileName
{
  @_ or return '';
  my $N = shift;
  $N = shift(@$N);
  defined $N or return '';
  length($N) or return '';
  my $c;
  my $j = 0;
  my $V += 0;
  for (my $i = 0; $i < length($N); $i++)
  {
    $c = vec($N, $i, 8);
    next if ($c == 63 || $c == 42 || $c < 32);
    last if ($c == 60 || $c == 62 || $c == 124);
    if ($c > 32) { $V = $j + 1; }
    if ($V) { $i == $j or vec($N, $j, 8) = $c; $j++; }
  }
  return substr($N, 0, $V);
}

# Usage: STRING = ReadFile(FILE_NAME, [BYTES_TO_READ, [START]]) - Reads
# an entire file in binary mode and returns the contents in one string.
# A second argument may be provided to read only a certain number of
# bytes. And a third argument may be provided to set the file pointer
# to a certain address before reading.
sub ReadFile
{
  my $F = _FileName(\@_);
  length($F) or return '';
  my $L += @_ ? shift : 99999999;
  defined $L or return '';
  my $A = @_ ? shift : 0;
  defined $A or $A = 0;
  -f $F or return '';
  -s $F or return '';
  my $B;
  open FH, "<$F" or return '';
  binmode FH;
  if ($A) { sysseek(FH, $A, 0); }
  sysread FH, $B, $L;
  close FH;
  defined $B or return '';
  return $B;
}
#################################################################
```
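isText() packs two results into one integer. For example, in the string "ab\r\n" all four characters count toward the text percentage (100). An LF is present (0x100), a CR is present (0x200), and the two counts are equal (0x400). The return value is therefore 0x700 | 100 = 1892, and CatchFile() recovers 100 with & 255 and format 7 (DOS) with >> 8. Below is a small decoding sketch. decode_istext() and %FORMAT_NAME are hypothetical names, not part of the script:

```perl
use strict;

# Hypothetical decoder for isText()'s return value:
#   lower 8 bits = percentage of text characters (0-100)
#   0x100 = LF seen, 0x200 = CR seen, 0x400 = CR count equals LF count
my %FORMAT_NAME = (
  1 => 'LINUX',         # LF only
  2 => 'OSX',           # CR only
  3 => 'MIXED',         # both present, unequal counts
  4 => 'undetermined',  # no CR or LF at all (0 == 0 sets 0x400)
  7 => 'DOS',           # both present, equal counts
);

sub decode_istext
{
  my $value   = shift;
  my $percent = $value & 255;         # same mask CatchFile() uses
  my $format  = ($value >> 8) & 7;    # the three format bits
  return ($percent, $FORMAT_NAME{$format});
}
```

As a further check, "a\n\0\0" has one text character, one LF and two NUL bytes. That gives int(2 / 4 * 100) = 50 percent, and only 0x100 is set, so decode_istext(306) returns (50, 'LINUX').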
Re: script fails with perl 5.004_02
by rjt (Curate) on Oct 27, 2019 at 07:44 UTC
by harangzsolt33 (Deacon) on Oct 27, 2019 at 14:17 UTC

Re: script fails with perl 5.004_02
by jwkrahn (Abbot) on Oct 27, 2019 at 21:39 UTC
by afoken (Chancellor) on Oct 28, 2019 at 05:55 UTC