Hi Perl Monks, I am scraping data from web pages, and I only need, say, the first 30 lines of each page. I've used Perl's "split" function to try to read each file line by line, without much success: I do get the desired output, but only at the expense of reading the entire file. I need your help tweaking the code below (relevant loop only) so that it reads only the first 30 lines. I'd be grateful for any insight you may have, including tips or suggestions for improving the code. Thank you! Rick
$file_count = 0;
foreach $filetoget (@aonly)
{
    $fullfile = "$base_url/$filetoget";
    my $line_count = 0;
    # Split the downloaded page on newlines so each pass sees one line.
    # get() returns undef on failure, so fall back to an empty string.
    for my $line (split /\n/, get($fullfile) // '')
    {
        if ($line =~ m/^\s*CENTRAL\s*INDEX\s*KEY:\s*(\d*)/m) { $cik = $1; }
        if ($line =~ m/^\s*FORM\s*TYPE:\s*(.*$)/m) { $form_type = $1; }
        if ($line =~ m/^\s*CONFORMED\s*PERIOD\s*OF\s*REPORT:\s*(\d*)/m) { $report_date = $1; }
        if ($line =~ m/^\s*FILED\s*AS\s*OF\s*DATE:\s*(\d*)/m) { $file_date = $1; }
        if ($line =~ m/^\s*COMPANY\s*CONFORMED\s*NAME:\s*(.*$)/m) { $name = $1; }
        $line_count++;
        print "$cik, $form_type, $report_date, $file_date, $name\n";
        print "$line_count $line\n";
    }
    ### Now write the results to file
    # Open the output file in append mode
    open my $FH_OUT, '>>', $write_dir or die "Can't open file $write_dir: $!";
    # Save/write results as one pipe-delimited record (trailing '|' kept)
    print $FH_OUT join('|', $cik, $form_type, $report_date, $file_date, $name), "|\n";
    #close $FH_IN or die "unable to close $filename";
    # Update file counter
    ++$file_count;
    print "$file_count\n";
    print "$line_count lines read from $fullfile\n";
    #closedir($dir_handle);
    close($FH_OUT);
}
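For the "first 30 lines only" part, here is a minimal sketch rather than a definitive answer. It assumes the get() in the loop is LWP::Simple's, which always returns the whole body. The names parse_header, fetch_head and %FIELD_RE are hypothetical helpers invented for this illustration, and the byte cap passed to fetch_head is a guess you would tune to your pages. parse_header uses split's LIMIT argument so that only the first N lines are examined. fetch_head uses the max_size option of LWP::UserAgent so that the download itself is cut short.

```perl
use strict;
use warnings;

# Patterns copied from the loop above; the hash itself is a hypothetical helper.
my %FIELD_RE = (
    cik         => qr/^\s*CENTRAL\s*INDEX\s*KEY:\s*(\d*)/,
    form_type   => qr/^\s*FORM\s*TYPE:\s*(.*)$/,
    report_date => qr/^\s*CONFORMED\s*PERIOD\s*OF\s*REPORT:\s*(\d*)/,
    file_date   => qr/^\s*FILED\s*AS\s*OF\s*DATE:\s*(\d*)/,
    name        => qr/^\s*COMPANY\s*CONFORMED\s*NAME:\s*(.*)$/,
);

# Hypothetical helper: examine at most $max_lines lines of $text and
# return a hash ref of whichever fields were found.
sub parse_header {
    my ($text, $max_lines) = @_;
    $text = '' unless defined $text;

    # LIMIT makes split stop early; the final element holds the unsplit rest.
    my @lines = split /\r?\n/, $text, $max_lines + 1;
    $#lines = $max_lines - 1 if @lines > $max_lines;   # drop the remainder

    my %found;
    for my $line (@lines) {
        for my $key (keys %FIELD_RE) {
            $found{$key} = $1 if $line =~ $FIELD_RE{$key};
        }
    }
    return \%found;
}

# Hypothetical helper: download at most $max_bytes of $url.
# LWP::UserAgent is loaded lazily so parse_header works without it.
sub fetch_head {
    my ($url, $max_bytes) = @_;
    require LWP::UserAgent;
    my $ua  = LWP::UserAgent->new(max_size => $max_bytes);
    my $res = $ua->get($url);
    return $res->is_success ? $res->decoded_content : undef;
}
```

Inside the loop this might be used as my $f = parse_header(fetch_head($fullfile, 4096), 30); and the fields then read from $f->{cik}, $f->{form_type} and so on. If the byte cap cuts the page off before line 30, fewer lines are scanned, so the cap should comfortably exceed the size of the header you expect.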