
Thanks hippo, I understand the last function, but the key is WHERE to insert it. I tried it immediately after updating the line counter, but it still reads the whole file. Sorry to be so inept.

Re^3: Split file, first 30 lines only
by huck (Prior) on Feb 27, 2017 at 23:11 UTC

    but still reads whole file.
    That's what get($fullfile) does: it reads the whole file at once.

    And I think hippo meant

    last if $line_count > 29;
    Put it right after the $line_count++; statement. You may read the whole file via get, but you then process only the beginning.
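To make the read-versus-process distinction concrete, here is a minimal sketch. It assumes the page body is already sitting in a scalar (as it would be after get); the $content data below is a made-up stand-in, not the real filing. The whole string is in memory, but the loop stops after 30 lines.

```perl
use strict;
use warnings;

# Stand-in for the page body that get($fullfile) would return (hypothetical data).
my $content = join '', map { "line $_\n" } 1 .. 100;

# Open a read-only filehandle on the in-memory string.
open my $fh, '<', \$content or die "Cannot open in-memory handle: $!";

my $line_count = 0;
my @kept;
while (my $line = <$fh>) {
    chomp $line;
    push @kept, $line;          # "processing" is just collecting the line here
    $line_count++;
    last if $line_count > 29;   # stop once 30 lines have been processed
}
close $fh;

print scalar(@kept), " lines processed, last one: $kept[-1]\n";
```

The download cost is unchanged; only the work done per line is capped.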

      Thanks huck, I now understand the difference between read and process. Any thoughts on how I could approach the idea of "reading" the web page line by line? Thanks for your patience.

        Let's see if I can keep everyone happy here

        ...
        my $line_count = 0;
        open(my $fh, '<', $fullfile) or die 'something';
        while (my $line = <$fh>) {
            last if $line_count > 29;   ############## needed to add this dayum
            chomp $line;                ############## needed to add this
            if ($line =~ m/^\s*CENTRAL\s*INDEX\s*KEY:\s*(\d*)/m)               { $cik = $1; }
            if ($line =~ m/^\s*FORM\s*TYPE:\s*(.*$)/m)                         { $form_type = $1; }
            if ($line =~ m/^\s*CONFORMED\s*PERIOD\s*OF\s*REPORT:\s*(\d*)/m)    { $report_date = $1; }
            if ($line =~ m/^\s*FILED\s*AS\s*OF\s*DATE:\s*(\d*)/m)              { $file_date = $1; }
            if ($line =~ m/^\s*COMPANY\s*CONFORMED\s*NAME:\s*(.*$)/m)          { $name = $1; }
            $line_count++;
            print "$cik, $form_type, $report_date, $file_date, $name\n";
            print "$line_count,' ', $line,' '\n";
        }
        close $fh or die 'something';
        ...
        Off the top of my head, and untested. YMMV.

Re^3: Split file, first 30 lines only
by hippo (Archbishop) on Feb 28, 2017 at 08:58 UTC

    Apologies. huck is correct, it's obviously the $line_count variable which should be tested rather than $line in this instance.

    This will only stop it processing the whole file. If you don't want to download the whole file then that's a different matter entirely and would require use of a technique such as HTTP Ranges.

      Hi Hippo, your answer above states "If you don't want to download the whole file then that's a different matter entirely and would require use of a technique such as HTTP Ranges." I've tried to Google HTTP ranges but had no luck. Any ideas where I can get a sense of what you have in mind? Nothing else seems to work (I'm just trying to nab the first few lines from web pages). Thanks!

        Ranges are documented in section 14.35 of the HTTP/1.1 RFC (RFC 2616). They allow an HTTP client to request only part (or parts) of the resource which would ordinarily be retrieved in full (or in server-chosen chunks) from the server.
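As a rough illustration of what a range request looks like on the wire: the client sends a Range header such as "bytes=0-4095", and a server that honours it answers with status 206 and a Content-Range header such as "bytes 0-4095/1048576" (a server that ignores the header just sends the full body with status 200). The two helper names below, range_header and parse_content_range, are made up for this sketch and come from no module.

```perl
use strict;
use warnings;

# Build the value of a Range request header for an inclusive byte span.
sub range_header {
    my ($first, $last) = @_;
    return "bytes=$first-$last";
}

# Parse a Content-Range response header such as "bytes 0-4095/1048576".
# Returns (first, last, total); total is undef when the server sends "*".
# Returns an empty list if the value is missing or not in that form.
sub parse_content_range {
    my ($value) = @_;
    return unless defined($value)
        && $value =~ m{^bytes\s+(\d+)-(\d+)/(\d+|\*)$};
    return ($1, $2, $3 eq '*' ? undef : $3);
}

print range_header(0, 4095), "\n";
my ($first, $last, $total) = parse_content_range('bytes 0-4095/1048576');
print "got bytes $first to $last of $total\n";
```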

        The RFC only mandates byte-count ranges, so you should use those instead of lines in order to be portable. However, if you are after the first 30 lines of a 50,000 line response, just pick a byte range large enough that you will likely retrieve at least your 30 lines. If fewer lines are returned, you can issue subsequent requests until you have all the data you require.
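The "keep asking until you have 30 lines" idea could be sketched roughly as follows. This is untested against any real server and makes several assumptions: it uses the core HTTP::Tiny module rather than the get() from the thread, the names first_lines and http_range_fetcher are invented for the sketch, and the URL in the final comment is a placeholder. The demo at the bottom runs against an in-memory string so it needs no network.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Collect the first $want lines by asking $fetch for successive byte ranges.
# $fetch->($first, $last) returns the bytes in that span ('' or undef at the end).
sub first_lines {
    my ($fetch, $want, $chunk_size) = @_;
    my ($buffer, $offset) = ('', 0);
    while (1) {
        my $chunk = $fetch->($offset, $offset + $chunk_size - 1);
        last unless defined($chunk) && length($chunk);
        $buffer .= $chunk;
        $offset += length($chunk);
        my $newlines = () = $buffer =~ /\n/g;     # count complete lines so far
        last if $newlines >= $want;
        last if length($chunk) < $chunk_size;     # short chunk: end of resource
    }
    my @lines = split /\n/, $buffer;
    splice @lines, $want if @lines > $want;       # drop extra or partial lines
    return @lines;
}

# Returns a fetcher that issues real Range requests against $url.
sub http_range_fetcher {
    my ($url) = @_;
    my $http = HTTP::Tiny->new;
    return sub {
        my ($first, $last) = @_;
        my $res = $http->get($url, { headers => { Range => "bytes=$first-$last" } });
        return $res->{content} if $res->{status} == 206;                 # range honoured
        return $res->{content} if $res->{status} == 200 && $first == 0;  # range ignored
        return '';   # 416 (past the end), errors, or a repeated full body
    };
}

# Demo with a fake fetcher over made-up data.
my $data = join '', map { "row $_\n" } 1 .. 1000;
my $fake = sub {
    my ($first, $last) = @_;
    return '' if $first >= length($data);
    return substr($data, $first, $last - $first + 1);
};
my @lines = first_lines($fake, 30, 64);
print scalar(@lines), " lines, ending with '$lines[-1]'\n";

# Real use would look something like this (placeholder URL):
# my @head = first_lines(http_range_fetcher('http://example.com/big.txt'), 30, 4096);
```

Passing the fetcher in as a code reference keeps the line-collecting logic separate from the HTTP call, so it can be tried out offline first. Lines are split on "\n" only, so a page with CRLF line endings would keep a trailing "\r" on each line.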