Re^2: Split file, first 30 lines only

Re^3: Split file, first 30 lines only by huck (Prior) on Feb 27, 2017 at 23:11 UTC
But that still reads the whole file. That's what `get($fullfile)` does: it reads the whole file at once. I think @hippo meant `last if $line_count > 29;`. Put it right after `$line_count++;`. You may read the whole file via `get`, but then only process the beginning.
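To illustrate the read/process distinction, here is a minimal sketch that assumes the page has already been downloaded into a string. The sub name `head_lines` and the `$limit` parameter are made up for this example and do not come from the original script; with `$limit` set to 30 the test is equivalent to `last if $line_count > 29;`.

```perl
use strict;
use warnings;

# Hypothetical helper: walk an already-downloaded string line by line
# and stop once $limit lines have been processed.
sub head_lines {
    my ($content, $limit) = @_;
    my @kept;
    my $line_count = 0;

    # Open a read handle on the in-memory string (no file on disk needed).
    open(my $fh, '<', \$content) or die "Cannot open in-memory handle: $!";
    while (my $line = <$fh>) {
        chomp $line;
        push @kept, $line;                 # "processing" is just collecting here
        $line_count++;
        last if $line_count >= $limit;     # same as "> 29" when $limit is 30
    }
    close $fh;
    return @kept;
}
```

The whole string is still in memory; only the loop stops early.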
Re^4: Split file, first 30 lines only by wrkrbeee (Scribe) on Feb 27, 2017 at 23:16 UTC
Thanks huck, I now understand the difference between read and process. Any thoughts on how I could approach the idea of "reading" the web page line by line? Thanks for your patience.
Re^5: Split file, first 30 lines only by huck (Prior) on Feb 27, 2017 at 23:24 UTC
Let's see if I can keep everyone happy here:

```perl
# ...
my $line_count = 0;
open(my $fh, '<', $fullfile) or die 'something';
while (my $line = <$fh>) {
    last if $line_count > 29;    # needed to add this
    chomp $line;                 # needed to add this
    if ($line =~ m/^\s*CENTRAL\s*INDEX\s*KEY:\s*(\d*)/m)               { $cik         = $1; }
    if ($line =~ m/^\s*FORM\s*TYPE:\s*(.*$)/m)                         { $form_type   = $1; }
    if ($line =~ m/^\s*CONFORMED\s*PERIOD\s*OF\s*REPORT:\s*(\d*)/m)    { $report_date = $1; }
    if ($line =~ m/^\s*FILED\s*AS\s*OF\s*DATE:\s*(\d*)/m)              { $file_date   = $1; }
    if ($line =~ m/^\s*COMPANY\s*CONFORMED\s*NAME:\s*(.*$)/m)          { $name        = $1; }
    $line_count++;
    print "$cik, $form_type, $report_date, $file_date, $name\n";
    print "$line_count,' ', $line,' '\n";
}
close $fh or die 'something';
# ...
```

Off the top of my head, and untested. YMMV.
Re^6: Split file, first 30 lines only by wrkrbeee (Scribe) on Feb 27, 2017 at 23:25 UTC
Re^7: Split file, first 30 lines only by huck (Prior) on Feb 27, 2017 at 23:35 UTC
Some notes below your chosen depth have not been shown here
Re^3: Split file, first 30 lines only by hippo (Archbishop) on Feb 28, 2017 at 08:58 UTC
Apologies. huck is correct: it's obviously the `$line_count` variable which should be tested rather than `$line` in this instance. This will only stop it processing the whole file. If you don't want to download the whole file then that's a different matter entirely and would require use of a technique such as HTTP Ranges.
Re^4: Split file, first 30 lines only by wrkrbeee (Scribe) on Mar 01, 2017 at 20:01 UTC
Hi Hippo, your answer above states "If you don't want to download the whole file then that's a different matter entirely and would require use of a technique such as HTTP Ranges." I've tried to Google HTTP ranges but had no luck. Any ideas where I could get a sense of what you have in mind? Nothing else seems to work (I'm just trying to nab the first few lines from web pages). Thanks!
Re^5: Split file, first 30 lines only (HTTP Ranges) by hippo (Archbishop) on Mar 02, 2017 at 09:38 UTC
Ranges are documented in section 14.35 of the HTTP RFC (RFC 2616). They allow an HTTP client to request only part (or parts) of the resource which would ordinarily be retrieved in full (or in server-chosen chunks) from the server. The RFC only mandates byte-count ranges, so you should use those instead of lines in order to be portable. However, if you are after the first 30 lines of a 50,000-line response, then just pick a byte range large enough that you will likely retrieve at least your 30 lines. If fewer lines are returned, you can issue subsequent requests until you have all the data you require.
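The approach described above could be sketched roughly as follows. This is untested against any real server and makes several assumptions: it uses the core module HTTP::Tiny rather than whatever the original script used for `get`, the sub names (`range_header`, `take_lines`, `first_lines`) are invented for illustration, and the chunk size is only a guess that would need tuning. A server is free to ignore the `Range` header and answer with a full 200 response, so the sketch handles that case too.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Build a byte-range header value, e.g. "bytes=0-4095".
sub range_header {
    my ($offset, $chunk) = @_;
    return sprintf 'bytes=%d-%d', $offset, $offset + $chunk - 1;
}

# Keep at most $want lines from a block of text.
sub take_lines {
    my ($text, $want) = @_;
    my @lines = split /\r?\n/, $text;
    $#lines = $want - 1 if @lines > $want;
    return @lines;
}

# Request successive byte ranges until at least $want complete lines arrived.
sub first_lines {
    my ($url, $want, $chunk) = @_;
    my $http   = HTTP::Tiny->new;
    my $buffer = '';
    my $offset = 0;
    while (1) {
        my $res = $http->get($url,
            { headers => { Range => range_header($offset, $chunk) } });
        last if $res->{status} == 416;    # asked past the end of the resource
        die "Request failed: $res->{status} $res->{reason}\n"
            unless $res->{success};
        $buffer .= $res->{content};
        last if $res->{status} != 206;              # 200: Range ignored, got everything
        last if ($buffer =~ tr/\n//) >= $want;      # enough complete lines
        last if length($res->{content}) < $chunk;   # short chunk: end of resource
        $offset += $chunk;
    }
    return take_lines($buffer, $want);
}
```

A call might look like `my @lines = first_lines($url, 30, 4096);`, where `$url` is the page you are after. Counting newlines rather than split fields means a line cut in half at a chunk boundary is never mistaken for a complete one.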
Re^6: Split file, first 30 lines only (HTTP Ranges and :read_size_hint) by Discipulus (Canon) on Mar 02, 2017 at 10:28 UTC
Re^7: Split file, first 30 lines only (HTTP Ranges and :read_size_hint) by hippo (Archbishop) on Aug 23, 2017 at 10:46 UTC
Re^7: Split file, first 30 lines only (HTTP Ranges and :read_size_hint) by wrkrbeee (Scribe) on Mar 02, 2017 at 16:07 UTC
Re^6: Split file, first 30 lines only (HTTP Ranges) by wrkrbeee (Scribe) on Mar 02, 2017 at 16:07 UTC