Hi Perl Monks, I am scraping data from web pages, where I only need say, the first 30 lines. I've used Perl's "split" function to "attempt" to read the file line-by-line, although I'm not overly successful. As is, I am able to obtain the desired output, albeit at the expense of reading the entire file. Hence, I need your assistance to tweak the code below (relevant loop only) such that I read only the first 30 lines. I am grateful for any insight you may have, including tips/suggestions for improving the code. Thank you! Rick
$file_count=0; foreach $filetoget(@aonly) { $fullfile="$base_url/$filetoget"; my $line_count=0; for my $line (split qr/\'\n'/, get($fullfile)) { if($line=~m/^\s*CENTRAL\s*INDEX\s*KEY:\s*(\d*)/m){$cik=$1;} if($line=~m/^\s*FORM\s*TYPE:\s*(.*$)/m){$form_type=$1;} if($line=~m/^\s*CONFORMED\s*PERIOD\s*OF\s*REPORT:\s*(\d*)/m){$ +report_date=$1;} if($line=~m/^\s*FILED\s*AS\s*OF\s*DATE:\s*(\d*)/m){$file_date= +$1;} if($line=~m/^\s*COMPANY\s*CONFORMED\s*NAME:\s*(.*$)/m){$name=$ +1;} $line_count++; print "$cik, $form_type, $report_date, $file_date, $name\n"; print "$line_count,' ', $line,' '\n"; } ### Now write the results to file!; #Open the output file; open my $FH_OUT, '>>',$write_dir or die "Can't open file $write_dir"; #Save/write results/output; $,='|'; print $FH_OUT "$cik$,$form_type$,$report_date$,$file_date$,$name$,\n"; #close $FH_IN or die "unable to close $filename"; #Update file counter; ++$file_count; print "$file_count\n"; print "$line_count lines read from $fullfile\n"; #closedir($dir_handle); close($FH_OUT); }

In reply to Split file, first 30 lines only by wrkrbeee

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.