aliroshan has asked for the wisdom of the Perl Monks concerning the following question:

    # --------- Load necessary Modules
    use HTML::Parse;
    use HTML::FormatText;

    # --------- Open File for Reading
    print "File to Read : ";
    $ifile=<>;
    chomp($ifile);
    $file = "$ifile" . ".txt" ;
    print $file;
    open (FILE, "$file") or die "Can't open $file: $!\n";
    select((select(FILE), $/ = undef)[0]);
    my $contents = <FILE>;
    close(FILE);

    # -------- Rip HTML Tags
    $plain_text = HTML::FormatText->new->format(parse_html($contents));
    print $plain_text;

    # -------- Writing to Files
    print "File to Write to : ";
    $file = <>;
    chomp($file);
    $file = $file . ".txt";
    open(DAT,">$file") || die("Cannot Open File");
    print DAT "$plain_text";
    close(DAT);

Replies are listed 'Best First'.
Re: Write Large data
by FunkyMonk (Bishop) on May 01, 2009 at 19:04 UTC
    The original question was (courtesy of http://corion.net/perlmonks/761354.xml):

    I am trying to write some data (30 KB) to a text file, but the script gets stuck and stays that way for a long time until I manually close it. What is the problem, and what is the solution for this situation? Thanks.
Re: Write Large data
by ww (Archbishop) on May 01, 2009 at 23:39 UTC

    Update: I forgot to make it explicit: I believe your misuse of select caused the hang, as the other tweaks are quite minor.

    The following works to strip HTML tags (tested on W2k, but your version of Win32 is irrelevant to the problem). Is that the desired outcome?

        #!C:/perl/bin
        use strict;
        use warnings;
        # 761354

        use HTML::Parse;
        use HTML::FormatText;

        my $contents;                       # global, deliberately

        print "\n\t File to Read: ";
        my $ifile=<>;
        chomp($ifile);
        # my $file = "$ifile" . ".txt" ;    # Note 1
        my $file="$ifile";
        print "Printing name of input file:\n";
        print "\t" . $ifile . "\n";
        print "Done printing input file name\n\n";

        readfile($ifile);

        sub readfile {                      # Note 2
            open (FILE, "<$file") || die "Can't open $file: $!\n";
            # select((select(FILE), $/ = undef)[0]);    # Note 3
            local $/ = undef;               # Note 4
            $contents = <FILE>;
            close(FILE);
            print "\$contents is:\n";       # Since you're doing this you could
            print $contents;                # simply redirect output to a file...
            print "\n\t Done printing contents to screen\n\n";  # ...but anyway...
            return $contents;
        }

        # -------- Rip HTML Tags
        my $plain_text = HTML::FormatText->new->format(parse_html($contents));
        print $plain_text;

        print "\n\t File to Write: ";
        my $writefile=<>;
        chomp($writefile);
        # $file = $file . ".txt";           # Note 5
        open (DAT, ">$writefile") || die "Cannot Open File $!\n";
        print DAT "$plain_text";
        close(DAT);

    Note 1: It makes very little sense to auto-append ".txt" to a source file that, judging from the modules used, will be ".htm" or ".html" or similar. Moreover, appending ".txt" to, say, "foo.htm" should immediately trigger the die (unless for some reason "foo.htm.txt" exists).

    Note 2: I put the read in a sub so I could localize $/ (at Note 4) to slurp the entire file. This won't work with a file that overextends your RAM, but best practice for webmasters is to avoid huge web pages, so slurping shouldn't be an issue, and it certainly will not be an issue with a 30 KB file such as you mentioned in the OP (and then improperly removed; use strikeout if you feel you must remove something when editing a post, and mark updates as such).
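
    A minimal sketch of the localized slurp described in Note 2, under a few assumptions that are not in the thread: the sub name slurp_file is made up for illustration, a lexical filehandle and three-argument open replace the bareword FILE, and a temporary file stands in for the real HTML input.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Slurp a whole file; $/ is undefined only while this sub runs.
sub slurp_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Can't open $path: $!\n";
    local $/ = undef;          # restored automatically when the sub returns
    my $contents = <$fh>;
    close($fh);
    return $contents;
}

# Demo input: a throwaway file with a generated name (an assumption).
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "<p>line one</p>\n<p>line two</p>\n";
close($tmp_fh);

my $contents = slurp_file($tmp_name);
print length($contents), " characters read\n";
print "input separator is back to newline\n" if defined $/ && $/ eq "\n";
```

    Because the change to $/ is scoped to the sub, a later line-at-a-time read (such as a filename prompt) still stops at the first newline.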

    Note 3: Use of select makes no sense here in the case you've described. From perldoc -f select:

    select FILEHANDLE
    select  Returns the currently selected filehandle. Sets the current
            default filehandle for output, if FILEHANDLE is supplied. This
            has two effects: first, a "write" or a "print" without a
            filehandle will default to this FILEHANDLE. Second, references
            to variables related to output will refer to this output
            channel. ....
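
    To see what the select line from the original post leaves behind, here is a small sketch. The in-memory filehandles and their contents are assumptions standing in for the input file and for keyboard input; only the select statement itself comes from the thread. As the excerpt above says, select only concerns the default output handle, while the assignment to $/ inside it is an ordinary global assignment that stays in force.

```perl
use strict;
use warnings;

# In-memory handles stand in for the input file and for the keyboard.
open(my $file_in,  '<', \"<b>some html</b>\n")  or die "open failed: $!\n";
open(my $keyboard, '<', \"out\nmore typing\n") or die "open failed: $!\n";

# The statement from the original post, with a lexical handle.
# select() switches the default OUTPUT handle and then switches it back;
# the assignment to $/ is global and remains afterwards.
select((select($file_in), $/ = undef)[0]);

my $contents = <$file_in>;     # slurps the file, as intended
my $answer   = <$keyboard>;    # also slurps: reads to end of input, not one line

print defined $/ ? "separator defined\n" : "separator still undef\n";
print "answer holds ", ($answer =~ tr/\n//), " lines\n";
```

    With a real keyboard on STDIN, that second read would not return at the first Enter; it would wait for end-of-input, which may well be what looked like a hang at the "File to Write to" prompt.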
    

    Note 5: Appending the ".txt" suffix might have some value, but since the user is asked for a complete filename for the read, one might expect that user to provide a complete name (path/to/writedir/filename.something) when presented with a similar prompt.

    Note also the use of strict and warnings, which can be very helpful in many cases, though they would not have diagnosed your problem here.

    And, for good measure, a couple style notes:

    • Comments such as those I've removed are -- by and large -- unhelpful, since the code itself is utterly transparent. I left the "Rip HTML tags" only because one could argue that it provides information to another reader about what your modules do.
    • It's often helpful when using a CLI to offset prompts (as done here with newlines and tabs) so that they stand out.
    • I may have missed some (perltidy would not) inconsistent formatting, such as spacing (or lack thereof) between the variable and the assignment operator and between the assignment operator and the value. Your code will be far more readable if you adopt a consistent style. The same applies to indentation here (seen only in the sub, which would be better placed at the beginning or end of the script, rather than inline as I have done).
    • Be consistent in your file handling. Note the variant ways you show in the OP.
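
    One possible consistent style for the write side, offered as a sketch rather than the only right answer: three-argument open, a lexical filehandle, and "or die" with $! on every step. The output path and the text written are placeholders invented for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical output location and text, for illustration only.
my $dir        = tempdir(CLEANUP => 1);
my $out_name   = "$dir/plain.txt";
my $plain_text = "Some formatted text\n" x 3;

open(my $out, '>', $out_name) or die "Can't open $out_name for writing: $!\n";
print {$out} $plain_text      or die "Can't write to $out_name: $!\n";
close($out)                   or die "Can't close $out_name: $!\n";

# Quick check that the file exists and how large it is.
my $size = -s $out_name;
print "wrote $size bytes to $out_name\n";
```

    Checking the result of close as well as open means a failed write is reported instead of passing silently.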
Re: Write Large data
by superfrink (Curate) on May 01, 2009 at 18:19 UTC
    Can you post the code? We can't really debug it if we don't know what it looks like.

    PS: Since you're new here, you should have a look at How (Not) To Ask A Question.

    Update: Other details, like the operating system, filesystem (e.g. NFS), etc., could be helpful too. Does the file get created? Does it have some but not all of the data?
      The file is not created. Win XP SP3 (NTFS).