Update: I forgot to make this explicit: I believe your misuse of select caused the hang; the other tweaks are quite minor.
The following works to strip HTML tags (tested on W2k, but your version of Win32 is irrelevant to the problem). Is that the desired outcome?
#!C:/perl/bin
use strict;
use warnings;
# 761354
use HTML::Parse;
use HTML::FormatText;
my $contents; # global, deliberately
print "\n\t File to Read: ";
my $ifile=<>;
chomp($ifile);
# my $file = "$ifile" . ".txt" ; # Note 1
my $file="$ifile";
print "Printing name of input file:\n";
print "\t" . $ifile . "\n";
print "Done printing input file name\n\n";
readfile($ifile);
sub readfile { # Note 2
open (FILE, "<$file") || die "Can't open $file: $!\n";
# select((select(FILE), $/ = undef)[0]); # Note 3
local $/ = undef; # Note 4
$contents = <FILE>;
close(FILE);
print "\$contents is:\n"; # Since you're doing this you could
print $contents; # simply redirect output to a file...
print "\n\t Done printing contents to screen\n\n"; # ...but anyway...
return $contents;
}
# -------- Rip HTML Tags
my $plain_text = HTML::FormatText->new->format(parse_html($contents));
print $plain_text;
print "\n\t File to Write: ";
my $writefile=<>;
chomp($writefile);
# $file = $file . ".txt"; # Note 5
open (DAT, ">$writefile") || die "Cannot open $writefile: $!\n";
print DAT $plain_text;
close(DAT); # close the filehandle, not the filename
Note 1: It makes very little sense to auto-append ".txt" to the name of a source file that, judging from the modules used, will be ".htm" or ".html" or similar. Moreover, appending ".txt" to, say, "foo.htm" should immediately trigger the die (unless for some reason "foo.htm.txt" exists).
Note 2: I put the read in a sub so I could localize $/ (at Note 4) and slurp the entire file. This won't work with a file that exceeds your RAM, but best practice for webmasters is to avoid huge webpages, so slurping shouldn't be an issue, and it certainly will not be an issue with a 30KB file such as you mentioned in the OP (and then improperly removed; use strikeout if you feel you must remove something when editing a post, and mark updates as such).
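For what it's worth, here is a minimal sketch of the same slurp with the localization kept entirely inside the sub. The helper name slurp_file and the throwaway temp file are my own inventions for illustration, not part of your script; the point is only that $/ is undefined while the sub runs and is back to its default afterwards.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# slurp_file is a hypothetical helper name, not part of the original script.
sub slurp_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Can't open $path: $!\n";
    local $/;                 # undef only until the sub returns, then restored
    my $contents = <$fh>;     # with $/ undefined, one read returns the whole file
    close($fh);
    return $contents;
}

# Demonstration on a throwaway file so the sketch runs on its own.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "<p>line one</p>\n<p>line two</p>\n";
close($tmp_fh);

my $html = slurp_file($tmp_name);
print length($html), " characters read\n";
print "record separator restored\n" if defined $/ && $/ eq "\n";
```

Because the local is scoped to the sub, any later line-oriented read (such as the second filename prompt) still sees the normal newline separator.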
Note 3: Using select makes no sense in the case you've described. From perldoc -f select:
select FILEHANDLE
select Returns the currently selected filehandle. Sets the current
default filehandle for output, if FILEHANDLE is supplied. This
has two effects: first, a "write" or a "print" without a
filehandle will default to this FILEHANDLE. Second, references
to variables related to output will refer to this output
channel. ....
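To make that concrete, here is a small sketch of what select actually does. The in-memory filehandle and the variable names are assumptions of mine, used only so the example runs by itself. The second half shows one plausible mechanism for the hang (my reading, not something I verified against your original code): $/ is a single global rather than a per-filehandle variable, so assigning to it inside the select idiom without local leaves it undefined for every later read, and a later <> will then wait for end-of-file instead of a newline.

```perl
use strict;
use warnings;

# In-memory filehandle standing in for a real output file (illustration only).
my $buffer = '';
open(my $out, '>', \$buffer) or die "Can't open in-memory handle: $!\n";

my $previous = select($out);   # select returns the previously selected handle
print "goes to \$out\n";       # no filehandle given, so this lands in $buffer
select($previous);             # restore the old default (normally STDOUT)
close($out);

# The commented-out idiom assigns to $/ with no "local", so the change is global.
my $saved_rs = $/;
select((select(STDOUT), $/ = undef)[0]);
my $rs_is_undef = defined($/) ? 0 : 1;
$/ = $saved_rs;                # put the record separator back by hand

print "buffer: $buffer";
print "\$/ left undefined by the idiom: $rs_is_undef\n";
```

The select((select(FH), ...)[0]) shape is meant for per-handle output variables such as $|; it does nothing useful for $/.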
Note 5: Appending the ".txt" suffix might have some value, but since the user is asked for a complete filename at the read prompt, one might expect that user to provide a complete name (path/to/writedir/filename.something) when presented with a similar prompt.
Note also the use of strict and warnings, which can be very helpful in many cases, though they would not have diagnosed your problem here.
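If you did want to offer a default output name rather than ask for one, a sketch along these lines would swap the extension instead of stacking a second one on top. The sub name txt_name_for is hypothetical, and the rule (replace a trailing .htm or .html, otherwise append .txt) is just one reasonable assumption about what you'd want.

```perl
use strict;
use warnings;

# txt_name_for is a hypothetical helper, not part of the original script.
sub txt_name_for {
    my ($in) = @_;
    my $out = $in;
    # Replace a trailing .htm/.html (any case); otherwise just append .txt.
    $out .= '.txt' unless $out =~ s/\.html?\z/.txt/i;
    return $out;
}

print txt_name_for($_), "\n" for ('foo.htm', 'dir/page.HTML', 'notes');
```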
And, for good measure, a couple style notes:
- Comments such as those I've removed are, by and large, unhelpful, since the code itself is utterly transparent. I left the "Rip HTML Tags" comment only because one could argue that it tells another reader what your modules do.
- It's often helpful when using a CLI to offset prompts (as done here with newlines and tabs) so that they stand out.
- I may have missed some inconsistent formatting (perltidy would not), such as spacing (or lack thereof) between the variable and the assignment operator and between the assignment operator and the value. Your code will be far more readable if you adopt a consistent style. The same applies to indentation (needed here only in the sub, which would be better placed at the beginning or end of the script rather than inline as I have done).
- Be consistent in your filehandling. Note the variant ways you show in the OP.
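As an illustration of that last point, here is one consistent shape you could adopt for every open in the script: three-argument open, a lexical filehandle, and an error message that names the file and includes $!. This is a sketch of one common convention, not the only acceptable one, and the temp file is there purely so the example is self-contained.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Throwaway file for illustration; in the real script this would be the user's filename.
my ($tmp_fh, $name) = tempfile(UNLINK => 1);
close($tmp_fh);

# Writing: three-argument open, lexical handle, "or die" naming the file and $!.
open(my $out, '>', $name) or die "Can't open $name for writing: $!\n";
print {$out} "plain text\n";
close($out) or die "Can't close $name: $!\n";

# Reading: exactly the same shape, only the mode changes.
open(my $in, '<', $name) or die "Can't open $name for reading: $!\n";
my $first_line = <$in>;
close($in);

print $first_line;
```

With lexical handles, a slip such as closing the filename instead of the handle is harder to make, because the handle is the only thing in scope that close will accept.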