fadingjava has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I am reading text from all the files in a directory one by one and then doing something with it. After manipulation I write the new data to a text file with the same name and the extension .txt. But I get empty files as a result. I did some trial and error to find where the problem was, and it turns out that nothing is being read from the text file, so there is nothing to write back into the new file. I can't figure out why this is happening. Can you help me find my fault? Here's the code:

#!c:\perl\perl.exe

# open directory and get all filenames in one array
my $path = "c:/perl/dvd_files/";
opendir(LOCAT, $path) or die "Couldn't open folder, $!\n";
my @folder = grep !/^\.\.?$/, readdir(LOCAT);
closedir (LOCAT);

# foreach file open it and strip pattern and rewrite a text file
foreach my $file (@folder) {
    my $full_path = $path.$file;
    #print "$full_path\n";
    open SWORD, "< $full_path" or die "file could not be opened:$!";
    print "here\n";
    $full_path =~ s/sub/txt/gi;
    $full_path =~ s/srt/txt/gi;
    $full_path =~ s/txt/txt/gi;
    open WRITE, "> $full_path" or die "file could not be written:$!";
    while (<SWORD>){
        print "$_\n";
        $_=~ s/[\{0-9\}\{0-9\}]//g;
        print "here";
        print WRITE $_;
    }
    close (SWORD);
    close (WRITE);
}

Re: reading text from afile
by ysth (Canon) on Sep 28, 2004 at 06:58 UTC
    Try skipping .txt files. If you are processing c:/perl/dvd_files/foo.txt you will open it for reading, and then turn right around and open it for writing (which truncates anything that was there).

    If you need to do the substitution on .txt files also, open a different output file and then rename it to where you want it, or read in the entire file before opening it for writing.
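    A minimal sketch of the "different output file, then rename" approach, assuming the goal is still to delete digits and braces from each line. The helper name strip_in_place is made up for illustration, and the tr/// stands in for the original s///g.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use File::Basename qw(dirname);

# Hypothetical helper: rewrite $file by way of a temporary file, so the
# input is never truncated while it is still being read.
sub strip_in_place {
    my ($file) = @_;

    open my $in, '<', $file or die "Cannot read '$file': $!";

    # Create the temporary file next to the original so that the rename
    # stays on the same filesystem.
    my ($out, $tmp) = tempfile(DIR => dirname($file));

    while (my $line = <$in>) {
        $line =~ tr/0-9{}//d;    # delete digits and braces
        print {$out} $line;
    }

    # Close both handles before renaming; win32 in particular is likely
    # to refuse to replace a file that is still open.
    close $in;
    close $out or die "Cannot finish writing '$tmp': $!";

    rename $tmp, $file or die "Cannot rename '$tmp' to '$file': $!";
    return;
}
```

    Calling strip_in_place("c:/perl/dvd_files/foo.txt") would then replace foo.txt only after the cleaned copy has been written completely.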

      Thanks a lot. That was exactly the problem. It was working fine on other extensions but not on .txt files.
Re: reading text from afile
by Zaxo (Archbishop) on Sep 28, 2004 at 07:49 UTC

    On unix it would seem that you are truncating the text files by opening them to write before you have read them, a result of opening them by the same name. I don't recall how that works on win32, but I suspect it is the same.

    You can fix that by either using a temporary file name and renaming it after the copy is done, or else by opening in '+<' mode, which opens to read and write, without truncation.

    Here's a rewrite using the non-truncating r/w open. I prefer it to the temporary file because of race conditions which may occur with temporary files. I'll use glob to simplify grabbing the file listing.

    #!/usr/bin/perl
    use warnings;
    use strict;
    use Fcntl ':flock';

    my $path = 'c:/perl/dvd_files/';
    # only doing the problem files - those with unchanged names
    my @files = glob "$path*.txt";
    for (@files) {
        local $/;
        open my $fh, '+<', $_ or warn($!), next;
        flock $fh, LOCK_EX;
        binmode $fh;
        my $text = <$fh>;
    Your following substitution looks fishy. I can't tell what you really want to do, but what it does can be done by tr///,
        $text =~ tr/0-9{}//d;
        seek $fh, 0, 0;
        truncate $fh, 0;
        print $fh $text;
    }
    The code for the other files is similar, but there is not the issue of clobbering the file you are reading from. There is an issue of clobbering a text file if a .sub or .srt file has the same basename as a text file.

    On reflection, I wonder if you wouldn't be better off creating the rewritten files in a brand-new subdirectory and not clobbering anything. That may be necessary, anyway, since I can't test on win32 and am not sure that flock, seek, or truncate will work on your system.
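    A rough sketch of the separate-subdirectory idea, under a few assumptions: the helper name convert_to_subdir and the subdirectory name passed to it are invented here, and the output name is built by appending ".txt" to the whole original name, so that foo.sub and foo.txt cannot collide.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: write a stripped copy of every plain file in
# $src_dir into the subdirectory $out_name, leaving the originals alone.
sub convert_to_subdir {
    my ($src_dir, $out_name) = @_;

    my $out_dir = File::Spec->catdir($src_dir, $out_name);
    unless (-d $out_dir) {
        mkdir $out_dir or die "Cannot create '$out_dir': $!";
    }

    opendir my $dh, $src_dir or die "Cannot open '$src_dir': $!";
    # The -f test skips ".", ".." and the output directory itself.
    my @names = grep { -f File::Spec->catfile($src_dir, $_) } readdir $dh;
    closedir $dh;

    for my $name (@names) {
        my $in_path  = File::Spec->catfile($src_dir, $name);
        # Appending ".txt" to the full name keeps foo.sub and foo.txt apart.
        my $out_path = File::Spec->catfile($out_dir, "$name.txt");

        open my $in,  '<', $in_path  or die "Cannot read '$in_path': $!";
        open my $out, '>', $out_path or die "Cannot write '$out_path': $!";
        while (my $line = <$in>) {
            $line =~ tr/0-9{}//d;    # delete digits and braces
            print {$out} $line;
        }
        close $in;
        close $out or die "Cannot finish '$out_path': $!";
    }
    return;
}
```

    Something like convert_to_subdir('c:/perl/dvd_files', 'converted') would leave every input untouched, which also sidesteps the question of whether flock, seek, and truncate behave on win32.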

    After Compline,
    Zaxo

      Thanks, Zaxo. I agree with you that I overdid some things. Your code seems to work much better for me. I have another query, though. The code I am posting now is taken from a site for stripping HTML. It does the same thing my other code was doing: it generates empty .txt files, and it gives me a "Use of uninitialised variable" error for $plain_text. Can anybody tell me why?

      #!c:/perl/perl.exe
      use strict;
      use warnings;
      use CGI ':standard';
      use HTML::Parser;

      my $plain_text ;
      my $p = HTML::Parser->new(text_h => [\&text_rtn, 'text']);
      my $path = "c:/perl/htmlfiles";
      opendir(LOCAT, $path) or die "Couldn't open folder, $!\n";
      my @folder = grep !/^\.\.?$/, readdir(LOCAT);
      closedir (LOCAT);

      # Do a loop for each file in the folder. This gets the filename also.
      foreach my $file (@folder) {
          my $full_path = $path.$file;
          print "Reading '$full_path'\n";
          $p->parse_file($full_path);
          $full_path =~ s/html/txt/gi;
          $full_path =~ s/htm/txt/gi;
          print "Writing '$full_path'\n";
          open(WRITE,">$full_path") or die("Cannot create file!");
          print WRITE $plain_text;
          close(WRITE);
      }

      sub text_rtn {
          foreach (@_) {
              $plain_text .= "$_\n";
          }
      }
Re: reading text from afile
by bobf (Monsignor) on Sep 28, 2004 at 08:07 UTC

    You might want to reconsider the substitutions that operate on $full_path. As they are now, ANY occurrence of 'sub' or 'srt' in $full_path will be replaced with 'txt', not just the file extension (which I assume is what you are trying to change). If you want to replace only the extension, you could do something like:

    $full_path =~ s/sub$/txt/i;

    Alternatively, you could use the File::Basename and/or File::Spec modules to perform the path/filename operations for you (i.e., split off the suffix and create a path/filename for the new file).
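    As an illustration of the File::Basename route, here is a small sketch built on fileparse. The helper name txt_name_for is invented for the example, and the suffix list (.sub, .srt, .txt) is taken from the extensions in the original code.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical helper: return the .txt name that corresponds to a
# .sub, .srt or .txt file, changing the extension and nothing else.
sub txt_name_for {
    my ($full_path) = @_;

    # fileparse matches the suffix pattern against the end of the
    # filename only, so "sub" elsewhere in the path is left alone.
    my ($base, $dir, $suffix) = fileparse($full_path, qr/\.(?:sub|srt|txt)/i);

    return $dir . $base . '.txt';
}

print txt_name_for('c:/perl/dvd_files/subtitles.sub'), "\n";
# prints c:/perl/dvd_files/subtitles.txt
```

    With the original s/sub/txt/gi the same input would have come out as c:/perl/dvd_files/txttitles.txt.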

Re: reading text from afile
by thor (Priest) on Sep 28, 2004 at 11:55 UTC
    In the interest of TIMTOWTDI, a one-liner that does almost what you're looking for.
    perl -pi.bak -e 's/[\{0-9\}\{0-9\}]//g;'
    This will edit your files "in place", saving off the originals with a ".bak" suffix.
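    The one-liner still needs the file names after it on the command line. On win32 the shell will most likely want double quotes around the -e code and will not expand *.txt itself, so a script-level equivalent may be easier there. The sketch below sets $^I (the variable behind -i) by hand; the helper name strip_files_in_place is made up, and tr/// again stands in for the s///g.

```perl
use strict;
use warnings;

# Hypothetical helper: edit the given files in place, keeping a ".bak"
# copy of each original, much as "perl -pi.bak" would.
sub strip_files_in_place {
    my @files = @_;
    return unless @files;    # an empty @ARGV would make <> read STDIN

    # Localising $^I and @ARGV switches on in-place editing for the
    # duration of this sub only.
    local ($^I, @ARGV) = ('.bak', @files);

    while (my $line = <>) {
        $line =~ tr/0-9{}//d;    # delete digits and braces
        print $line;             # goes to the file currently being edited
    }
    return;
}
```

    A call such as strip_files_in_place(glob 'c:/perl/dvd_files/*.txt') lets Perl do the wildcard expansion instead of the shell.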

    thor

    Feel the white light, the light within
    Be your own disciple, fan the sparks of will
    For all of us waiting, your kingdom will come