
You are attempting to web-scrape 30,000 pages from a single commercial site. If you have not obtained permission to do so, this could be considered abusive behavior. That is especially true because 25% of your requests would be for non-existent pages (there are only about 22,500 pages, with gaps in the numbering), and you seem to have no plan for caching the pages, which means 30,000 page requests every time you test your program.

Before you pursue this further, please see the Download Page for the National Vulnerability Database.

NVD/CVE XML Data Files:
(All up-to-date as of today!)
 3.8MB nvdcve-2007.xml
10.9MB nvdcve-2006.xml
 6.8MB nvdcve-2005.xml
 4.3MB nvdcve-2004.xml
 1.9MB nvdcve-2003.xml
 7.7MB nvdcve-2002.xml     vulnerabilities prior to and including 2002
 0.2MB nvdcve-recent.xml   all recently published vulnerabilities
 0.2MB nvdcve-modified.xml all recently published
                           and recently updated vulnerabilities

If these files contain the data you need, then this is a *much* better way to proceed.

Whether you use the HTML pages or the recommended XML files, you should download them as a separate step from your Perl code. You can do the downloading with a second Perl program using LWP, or with a dedicated download tool such as `wget` or cURL (my favorite on both Linux and Win32). Only once your source data is on disk should you tackle the parsing. Let us know if you need help with that parsing.
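If you do stay with the HTML pages, here is a minimal sketch of the "download once, then work from disk" idea in Perl. The subroutine name `fetch_cached`, the cache layout (one file per URL, named by the MD5 of the URL), the user-agent string, and the one-second delay are all illustrative assumptions, not anything from your original code; it assumes LWP::UserAgent is installed. Pages that return 404 are remembered with a `.404` marker file, so the gaps in the numbering are requested only once instead of on every test run.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;
use File::Path qw(make_path);
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: return the body for $url, touching the network only
# when no local copy exists yet in $cache_dir. Returns undef for a page
# known (or just found) to be missing.
sub fetch_cached {
    my ($url, $cache_dir) = @_;
    make_path($cache_dir) unless -d $cache_dir;

    # One cache file per URL; the MD5 of the URL is a safe file name.
    my $file = File::Spec->catfile($cache_dir, md5_hex($url) . '.html');
    my $miss = "$file.404";

    # Cache hit: read the whole file from disk, no request sent.
    if (-e $file) {
        open my $in, '<:raw', $file or die "Cannot read $file: $!";
        local $/;    # slurp mode
        my $body = <$in>;
        close $in;
        return $body;
    }

    # Known gap in the numbering: do not ask the server again.
    return undef if -e $miss;

    # Not cached yet: fetch once, then pause to spare the server.
    require LWP::UserAgent;
    my $ua  = LWP::UserAgent->new(agent => 'my-scraper/0.1', timeout => 30);
    my $res = $ua->get($url);
    sleep 1;

    unless ($res->is_success) {
        # Record only genuine 404s; transient errors are retried next run.
        if ($res->code == 404) {
            open my $m, '>', $miss or die "Cannot write $miss: $!";
            close $m;
        }
        return undef;
    }

    open my $out, '>:raw', $file or die "Cannot write $file: $!";
    print {$out} $res->content;
    close $out;
    return $res->content;
}
```

Calling `fetch_cached($url, 'cache')` inside your page loop means only the first run goes over the network; delete the cache directory when you really want fresh copies. For the XML files, LWP::Simple's `mirror($url, $file)` plays a similar role, re-downloading a file only when the server's copy is newer than yours.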

In reply to Re: Extract table info and create txt file by Util
in thread Extract table info and create txt file by TomBombadil
