Re: How can I download HTML and save it as txt?
by jeffa (Bishop) on Aug 30, 2005 at 21:22 UTC
|
You can always use your browser of choice -- they have a 'Save As' and you can choose 'As Text'. Something tells me this is not sufficient for you, however. The Perl Cookbook has a recipe devoted to converting HTML to ASCII. This is straight from the first edition, Recipe 20.5:
use HTML::FormatText;
use HTML::Parse;
$html = parse_htmlfile($filename);
$formatter = HTML::FormatText->new(leftmargin => 0, rightmargin => 50);
$ascii = $formatter->format($html);
| [reply] [d/l] |
Re: How can I download HTML and save it as txt?
by ikegami (Patriarch) on Aug 30, 2005 at 21:23 UTC
|
I have problems understanding your question, but at least one of the following modules should help you.
Any of LWP::Simple, LWP::UserAgent and WWW::Mechanize will help you download a web page.
As for converting the HTML to text, HTML::FormatText and possibly HTML::FormatText::WithLinks should be of interest.
Update: I see others have already posted answers. InfiniteSilence posted an example of downloading a web page and saving it as HTML in a file with the extension .txt. jeffa posted an example of converting HTML to text. Pick and choose what you want.
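To show how the download and conversion modules named above fit together, here is a minimal sketch, not a definitive implementation. It assumes LWP::UserAgent for the download and HTML::TreeBuilder plus HTML::FormatText for the conversion. The sub names html_to_text and save_page_as_text, the 30-second timeout, the 72-column right margin and the UTF-8 output encoding are my own choices for illustration, not anything from the original question.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTML::TreeBuilder;
use HTML::FormatText;

# Convert an HTML string to formatted plain text.
# The margins are arbitrary choices for this sketch.
sub html_to_text {
    my ($html) = @_;
    my $tree      = HTML::TreeBuilder->new_from_content($html);
    my $formatter = HTML::FormatText->new(leftmargin => 0, rightmargin => 72);
    my $text      = $formatter->format($tree);
    $tree->delete;    # release the parse tree
    return $text;
}

# Download $url and write its text rendering to $file; dies on any failure.
sub save_page_as_text {
    my ($url, $file) = @_;
    my $ua       = LWP::UserAgent->new(timeout => 30);
    my $response = $ua->get($url);
    die "GET $url failed: ", $response->status_line, "\n"
        unless $response->is_success;

    # Assumption: UTF-8 is an acceptable encoding for the output file.
    open my $out, '>:encoding(UTF-8)', $file
        or die "Cannot write $file: $!\n";
    print {$out} html_to_text($response->decoded_content);
    close $out or die "Cannot close $file: $!\n";
    return;
}

# Runs only when a URL and an output file name are given on the command line.
save_page_as_text(@ARGV) if @ARGV == 2;
```

Saved under a name of your choosing (say page2txt.pl, a made-up name), it would be run as: perl page2txt.pl http://myfoo.com myfile.txt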
| [reply] [d/l] |
Re: How can I download HTML and save it as txt?
by trammell (Priest) on Aug 30, 2005 at 21:40 UTC
|
% lynx -dump www.google.com > google.txt
| [reply] [d/l] |
Re: How can I download HTML and save it as txt?
by InfiniteSilence (Curate) on Aug 30, 2005 at 21:19 UTC
|
perl -e "use LWP::Simple; getprint('http://myfoo.com')" >> myfile.txt
Celebrate Intellectual Diversity
| [reply] [d/l] |
|
perl -MLWP::Simple -e"getstore('http://myfoo.com', 'myfile.txt')"
Not sure what tassex really wants, though. Do you (tassex) want to store the HTML in a .txt file (like above), or do you want to strip the HTML and save the text?
--
b10m
All code is usually tested, but rarely trusted.
| [reply] [d/l] |
Re: How can I download HTML and save it as txt?
by jdporter (Paladin) on Aug 31, 2005 at 04:48 UTC
|
Here's one way I can think of:
use LWP::Simple;
use HTML::TreeBuilder;
use IO::File;
IO::File
    ->new( "> $file" )
    ->print(
        HTML::TreeBuilder
            ->new_from_content( get $url )
            ->as_text
    );
Short and sweet. But lacking any kind of error handling. :-(
| [reply] [d/l] |
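For anyone who wants the error handling that the chained version above leaves out, here is one possible sketch using the same LWP::Simple and HTML::TreeBuilder calls. The sub name page_text_to_file and the UTF-8 output layer are assumptions made for illustration. Note that LWP::Simple's get only returns undef on failure, so the error message cannot say why the fetch failed.

```perl
use strict;
use warnings;
use LWP::Simple qw(get);
use HTML::TreeBuilder;

# Fetch $url, strip the markup, and write the bare text to $file.
# Dies with a message if the download or the file write fails.
sub page_text_to_file {
    my ($url, $file) = @_;

    my $html = get($url);
    die "Could not fetch $url\n" unless defined $html;

    my $tree = HTML::TreeBuilder->new_from_content($html);
    my $text = $tree->as_text;    # concatenated text nodes, no layout
    $tree->delete;                # release the parse tree

    # Assumption: UTF-8 is an acceptable encoding for the output file.
    open my $out, '>:encoding(UTF-8)', $file
        or die "Cannot write $file: $!\n";
    print {$out} $text, "\n";
    close $out or die "Cannot close $file: $!\n";
    return;
}

# Runs only when a URL and an output file name are given on the command line.
page_text_to_file(@ARGV) if @ARGV == 2;
```

Unlike HTML::FormatText, as_text does no wrapping or paragraph layout, so the output is one long run of text.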
Re: How can I download HTML and save it as txt?
by tassex (Initiate) on Aug 30, 2005 at 22:06 UTC
|
Cheers to ALL of you!!
More than one way to do so.. :)
| [reply] |
Re: How can I download HTML and save it as txt?
by CountZero (Bishop) on Aug 31, 2005 at 05:58 UTC
|
To let you in on a big secret: HTML files are already text! You will be hard pressed to save them as anything other than text.
CountZero "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law
| [reply] |
Re: How can I download HTML and save it as txt?
by chanio (Priest) on Aug 30, 2005 at 21:27 UTC
|
If you don't want to save it as text from your browser, you could do it like this...
Create a new printer, but choose a plain text printer and add the option to save to a file instead of printing. Then, when you choose Print in any app, you will be prompted for the name and location of your text file. And that's it! It writes the page as if you had an old plain-text matrix printer (without any graphics)...
But it is better to have Firefox and its incredible extensions (Copy to...).
| [reply] |
Re: How can I download HTML and save it as txt?
by Anonymous Monk on Aug 31, 2005 at 07:03 UTC
|
lwp-request -m get -o text http://myfoo.com >myfile.txt
| [reply] [d/l] |