The Elite Noob has asked for the wisdom of the Perl Monks concerning the following question:
Hi. I'm writing a script that reads links from a text file and visits each link only once. For each link it fetches the page, gets the HTML and saves it, then strips off all HTML formatting. After that it should delete all text except a certain bit. So far I have got it to remove the formatting, but not to delete the unwanted text. Here is the code. It may require CPAN modules.
#!/usr/bin/perl -w
# article.pl

# Including modules
use strict;
use LWP::Simple;
use HTML::Parser;
use HTML::Parse;
use HTML::Strip;
use HTML::FormatText;
use IO::Handle;

# Post to log for success or failure
print "\n";
print "------------------------------------\n";
print "| |\n";
print "| Welcome the The Elite Noob's |\n";
print "| Article Bot! |\n";
print "| |\n";
print "------------------------------------\n";
print "\n";
print "Remember. Put all links in a file called 'links.txt'\n";
print "Links should be One Per Line. Once you have file\n";
print "Just click press enter any key to begin!\n";
print "\n:<Press Enter>:\n";
{my $tempIn = <STDIN>}

# Redirecting input from file 'links.txt'
open fINPUT, '<', "links.txt" or die $!;
open TEXT, '>', "main.txt" or die $!;
STDIN->fdopen( \*fINPUT, 'r' ) or die $!;

my @links;
chomp(@links = <>); # Get all input and remove \n if there
my $filenum = 1;
my $currentLink;
close fINPUT;

foreach(@links){
    $currentLink = get("$_");
    $currentLink = tagEdit($currentLink);
    print TEXT "$currentLink"; # saves Text File that is being edited.
}

# Removing HTML in tag.
sub tagEdit {
    my $edited = $_[0];
    my $hs = HTML::Strip->new();
    my $cleanText = $hs->parse($edited);
    $hs->eof;
    return $cleanText;
}
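The "each link only once" requirement is not something the script above enforces yet. One possible way to handle it, as a rough sketch only: read the links from a lexical filehandle and skip repeats with a hash. The helper name `read_unique_links` and the example.com URLs are made up for illustration, and the in-memory filehandle merely stands in for links.txt.

```perl
use strict;
use warnings;

# Hypothetical helper: read one link per line from an open filehandle,
# skipping blank lines and repeated links while keeping the original order.
sub read_unique_links {
    my ($fh) = @_;
    my (%seen, @links);
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        next if $line eq '';        # ignore empty lines
        push @links, $line unless $seen{$line}++;
    }
    return @links;
}

# Demo: an in-memory filehandle stands in for links.txt (assumed data).
my $data = "http://example.com/a\nhttp://example.com/b\n\nhttp://example.com/a\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @links = read_unique_links($fh);
close $fh;

print "$_\n" for @links;
```

With a real file, the same helper could be given a handle from `open my $fh, '<', 'links.txt'`, which would also avoid redirecting STDIN.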
Now, this is the sample text file. It has to be called links.txt. Put this link in the text file:
http://www.imreportcard.com/products/the-elevation-group

I'm trying to get the HTML. I need to strip it so it only says the following:
Product Description

The Elevation Group is an training system developed by Mike Dillard that will give you access to the investment strategies used by some of the richest investors in the industry. You will be able to secure free access to a 90 minute presentation on how to use investments to profit during economic chaos. This presentation will cover a wide variety of investment topics and give you a good grounding in what you need to do to succeed.

Detailed Overview

With The Elevation Group presentation you will learn how you can become rich during economic turmoil even if you don't happen to have a lot of money to invest. You will get insight into how the economy is going to be in the future so you can protect the money you have amassed. You will learn how Mike Dillard has increased his personal wealth many times and also how you can do this for yourself. You will finally understand why investing in the stock market at this time is a very bad idea and learn what you actually should be investing in. You will no longer have to worry about IRA's or any other traditional retirement investment strategies - you will learn how the rich do their investments to grow their money and retire 100% tax free.

You will get an understanding of the top five challenges that are currently facing the United States economy and learn how to protect yourself from them. You will also find out why you should invest in gold and silver at this time - you will be amazed at the prices they will be going for in the near future. It doesn't matter if you are currently 'poor' - you will learn how to build your wealth and provide your children with everything they need and desire. To learn all of this all you need to do is head over to the website and put in your email address to get your free access to this powerful training.

Reputation

Mike Dillard has built his wealth from the ground up. His newsletter has helped thousands of people of people be successful - with over one million subscribers. He has built a strong reputation that has been backed up by his real-life success. If you are wanting to get on top of this economy and if you want to learn how investing can make you successful then you will want to take the time to take a look at this seminar.

Domain "Whois"

The Whois information for a website lists the owner and their contact information. The Whois information for "The Elevation Group" is public which is generally a good thing. This indicates the owner of this site has nothing to hide.
The script creates a text file that contains all of the text left after the tags are removed. Basically, I need to discard everything that is not between "Product Description" and "0 of 0 people found this review helpful".
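For what it's worth, here is a minimal sketch of one way to keep only the text between two markers, using a non-greedy regex on the already stripped text. The helper name `extract_between` and the sample string are invented for illustration; the two marker strings are the ones quoted above, and the sketch assumes both appear literally in the stripped text.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text from $start up to (but not including)
# $end, or undef if the text is missing or either marker is not found.
sub extract_between {
    my ($text, $start, $end) = @_;
    return unless defined $text;
    # \Q...\E treats the markers as literal strings; /s lets "." match newlines.
    if ($text =~ /(\Q$start\E.*?)\Q$end\E/s) {
        return $1;
    }
    return;
}

# Made-up sample standing in for the output of tagEdit().
my $sample = "Site menu Product Description Some review text. "
           . "0 of 0 people found this review helpful Footer";

my $wanted = extract_between(
    $sample,
    'Product Description',
    '0 of 0 people found this review helpful',
);

print defined $wanted ? "$wanted\n" : "markers not found\n";
```

If the "0 of 0" count differs from page to page, the end marker would probably need to be a pattern rather than a fixed string.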
Replies are listed 'Best First'.
Re: Formating a HTML document to show certain text.
by wind (Priest) on Mar 26, 2011 at 22:51 UTC
by The Elite Noob (Sexton) on Mar 26, 2011 at 23:27 UTC
Re: Formating a HTML document to show certain text.
by Anonymous Monk on Mar 26, 2011 at 22:57 UTC
by Anonymous Monk on Mar 28, 2011 at 06:52 UTC