Hi. I'm writing a script that gets links from a text file. The script then goes through each link only once: it goes to the page of the link, gets the HTML, and saves it. Then it strips off all HTML formatting. After that, it deletes all text except a certain bit. So far I've got it removing the formatting, but not deleting the unwanted text. Here is the code. You may need to install some modules from CPAN.

```perl
#!/usr/bin/perl
# article.pl
use strict;
use warnings;
use LWP::Simple;    # provides get()
use HTML::Strip;    # removes HTML tags from a string

# Welcome banner
print "\n";
print "------------------------------------\n";
print "|                                  |\n";
print "|   Welcome to The Elite Noob's    |\n";
print "|          Article Bot!            |\n";
print "|                                  |\n";
print "------------------------------------\n";
print "\n";
print "Remember: put all links in a file called 'links.txt',\n";
print "one link per line. Once you have the file,\n";
print "just press Enter to begin!\n";
print "\n:<Press Enter>:\n";
<STDIN>;    # wait for the user to press Enter

# Read the links, one per line, and remove the trailing newlines
open my $in, '<', 'links.txt' or die "Cannot open links.txt: $!";
chomp(my @links = <$in>);
close $in;

# All tag-free text is written to main.txt
open my $out, '>', 'main.txt' or die "Cannot open main.txt: $!";

foreach my $link (@links) {
    next unless length $link;    # skip blank lines
    my $html = get($link);       # returns undef if the fetch fails
    unless (defined $html) {
        warn "Could not fetch $link\n";
        next;
    }
    print {$out} tagEdit($html); # save the text with tags removed
}
close $out;

# Remove all HTML tags from a string and return the plain text.
sub tagEdit {
    my ($html) = @_;
    my $hs = HTML::Strip->new();
    my $cleanText = $hs->parse($html);
    $hs->eof;
    return $cleanText;
}
```
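The loop above processes every line of links.txt, so if the same URL is listed twice it is fetched twice. One way to make sure each link is visited only once is to filter the list before the loop. This is a minimal sketch: `unique_links` is a hypothetical helper name, and it assumes that trimming surrounding whitespace (including a stray `\r` from a file edited on Windows) is enough to treat two lines as the same link.

```perl
use strict;
use warnings;

# Hypothetical helper: return the links in their original order with
# surrounding whitespace trimmed, blank lines dropped, and repeated
# entries removed, so each page is fetched only once.
sub unique_links {
    my %seen;    # URLs already returned
    return grep { length($_) && !$seen{$_}++ }
           map  { my $l = $_; $l =~ s/^\s+|\s+$//g; $l } @_;
}
```

The `foreach` loop could then iterate over `unique_links(@links)` instead of `@links`. A hash keyed on the URL keeps each duplicate check cheap and preserves the order of the file.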

Here is the sample input file. It has to be called links.txt; put this link in it:

http://www.imreportcard.com/products/the-elevation-group

The script gets the HTML of that page. I need to strip it so that only the following text is left:

Product Description
The Elevation Group is an training system developed by Mike Dillard that will give you access to the investment strategies used by some of the richest investors in the industry. You will be able to secure free access to a 90 minute presentation on how to use investments to profit during economic chaos. This presentation will cover a wide variety of investment topics and give you a good grounding in what you need to do to succeed.

Detailed Overview
With The Elevation Group presentation you will learn how you can become rich during economic turmoil even if you don't happen to have a lot of money to invest. You will get insight into how the economy is going to be in the future so you can protect the money you have amassed. You will learn how Mike Dillard has increased his personal wealth many times and also how you can do this for yourself. You will finally understand why investing in the stock market at this time is a very bad idea and learn what you actually should be investing in. You will no longer have to worry about IRA's or any other traditional retirement investment strategies - you will learn how the rich do their investments to grow their money and retire 100% tax free.

You will get an understanding of the top five challenges that are currently facing the United States economy and learn how to protect yourself from them. You will also find out why you should invest in gold and silver at this time - you will be amazed at the prices they will be going for in the near future. It doesn't matter if you are currently 'poor' - you will learn how to build your wealth and provide your children with everything they need and desire. To learn all of this all you need to do is head over to the website and put in your email address to get your free access to this powerful training.

Reputation
Mike Dillard has built his wealth from the ground up. His newsletter has helped thousands of people of people be successful - with over one million subscribers. He has built a strong reputation that has been backed up by his real-life success. If you are wanting to get on top of this economy and if you want to learn how investing can make you successful then you will want to take the time to take a look at this seminar.

Domain "Whois"
The Whois information for a website lists the owner and their contact information. The Whois information for "The Elevation Group" is public which is generally a good thing. This indicates the owner of this site has nothing to hide.

The script creates a text file, main.txt, that contains all of the page's text after the tags are removed. Basically, I need to discard everything that is not between "Product Description" and "0 of 0 people found this review helpful".
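Once HTML::Strip has run, the part to keep is plain text between two fixed markers, so one approach is to find both markers with Perl's built-in `index` and cut out the slice with `substr`. This is a sketch, not a tested solution for that site: `extract_between` is a hypothetical helper, and it assumes each marker appears verbatim in the stripped text (same spacing and capitalization) and that the first occurrence of each is the one you want. If the stripped text has extra whitespace inside a marker, collapsing whitespace first with `s/\s+/ /g` may help.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text from $start up to (but not
# including) $end. The start marker itself is kept, matching the
# desired output above. Returns undef if either marker is missing.
# Assumes the markers appear verbatim in the stripped text.
sub extract_between {
    my ($text, $start, $end) = @_;
    my $from = index($text, $start);
    return if $from < 0;                  # start marker not found
    # Search for the end marker only after the start marker
    my $to = index($text, $end, $from + length $start);
    return if $to < 0;                    # end marker not found
    return substr($text, $from, $to - $from);
}
```

In the `foreach` loop it would be called on the string returned by `tagEdit`, with "Product Description" and "0 of 0 people found this review helpful" as the markers, and its result printed instead of the whole page; an undef result means the page did not contain both markers and can be skipped or logged. A non-greedy regex with `\Q...\E` around the markers would also work, but `index` avoids any pattern escaping.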


In reply to Formating a HTML document to show certain text. by The Elite Noob