PerlMonks  

Re^2: Split web page, first 30 lines only -- :content_cb trick

by wrkrbeee (Scribe)
on Feb 28, 2017 at 21:18 UTC ( [id://1183198]=note )


in reply to Re: Split web page, first 30 lines only -- :content_cb trick
in thread Split file, first 30 lines only

Hi Discipulus, Used your code suggestion (see below). I'm guessing that the $response variable will contain the 30 lines I'm looking for. As is, $response is empty. Any ideas?? Thanks so much! Rick
    use strict;
    use warnings;
    use Tie::File;
    use Fcntl;
    use LWP::UserAgent;
    use File::Slurp;

    my @lines;

    #Transfer URLS to a string variable;
    my $file = "G:/Research/SEC filings 10K and 10Q/Data/sizefiles1.txt";

    #Now fill @pages array with contents of sizefile1.txt ... how?
    open (FH, "< $file") or die "Can't open $file for read: $!";
    my @pages = <FH>;
    close FH or die "Cannot close $file: $!";

    #connect variable used with GET??
    my $ua = LWP::UserAgent-> new;

    #Initialize line counter;
    my $read_lines=1;

    #Primary loop through URLs ;
    foreach my $url (@pages) {
        my $response = $ua->get($url,':content_cb'=>\&head_only);
        print $response->content;
    }

    #Subroutine for primary loop;
    sub head_only {
        my ($data,$response,$protocol) = @_;
        my @lines = split "\n", $data;
        foreach my $line (@lines) {
            if ($read_lines ==31) {
                #reset line count'
                $read_lines = 1;
                print +("=" x 70), "\n"; #what is this?
                #die inside callback interrupt;
                die;
            }
            else {
                #print "line $read_lines: $line\n";
            }
        }
    }

Re^3: Split web page, first 30 lines only -- :content_cb trick
by Athanasius (Archbishop) on Mar 01, 2017 at 09:48 UTC

    Hello wrkrbeee,

    I think Discipulus provided this sample code to demonstrate a useful approach which you can adapt to your particular needs. If you want to process the read-in lines in the calling code (your “Primary loop”) rather than in the callback function, then you need to store the lines in a shared variable rather than print them in sub head_only. There is an additional complication: the last line read from the current chunk of data may not be complete, so you need to check for a trailing newline and handle its absence appropriately.
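    A minimal sketch of that idea, not the code from the original reply: the helper name make_line_collector is hypothetical, and the chunks are fed in by hand so the example runs without a network. Incomplete trailing text is held back in $partial until the newline that completes it arrives.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a callback plus a reference to the collected lines.
sub make_line_collector {
    my ($max_lines) = @_;
    my @lines;          # shared with the caller through the returned reference
    my $partial = '';   # incomplete last line carried over between chunks

    my $callback = sub {
        my ($data) = @_;    # LWP would also pass the response and protocol objects
        $partial .= $data;
        # Consume only complete, newline-terminated lines.
        while ($partial =~ s/\A([^\n]*)\n//) {
            push @lines, $1;
            die "line limit reached\n" if @lines >= $max_lines;
        }
        return;
    };
    return ($callback, \@lines);
}

# Simulated chunked delivery; "second" and "fifth" are split across chunks.
my ($cb, $lines) = make_line_collector(3);
eval { $cb->($_) for "first\nsec", "ond\nthird\nfourth\nfif", "th\n"; 1 };
print "$_\n" for @$lines;
```

    With LWP the same callback could presumably be passed as $ua->get($url, ':content_cb' => $cb), and the calling loop would then read @$lines after each request. A final line that never receives its newline stays in $partial and is not collected in this sketch.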

    print +("=" x 70), "\n"; #what is this?

    The x operator creates a string of 70 equals characters concatenated together:

    ======================================================================

    — see perlop#Multiplicative-Operators. The plus sign is there to prevent the Perl parser from thinking that the parentheses contain the entire argument list to the print function — see print.
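    A small sketch of both points; the variable name $rule is only for illustration.

```perl
use strict;
use warnings;

# Repetition operator: 70 copies of "=" joined into one string.
my $rule = "=" x 70;
print length($rule), "\n";

# The unary plus stops print from treating ( ... ) as its whole argument list,
# so the "\n" is still part of what gets printed.
print +("=" x 70), "\n";

# Equivalent spelling with explicit outer parentheses.
print(("=" x 70), "\n");
```

    Without the plus sign, print ("=" x 70), "\n"; would be parsed as a call to print with only the parenthesised string as its argument, so the newline would not be output.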

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

      Thank you Athanasius! Appreciate your time and effort!
Re^3: Split web page, first 30 lines only -- :content_cb trick and populate $response object
by Discipulus (Canon) on Mar 01, 2017 at 11:51 UTC
    Well, you got a good answer from esteemed brother Athanasius, and you are right: in my code my $response = $ua->get($url, ... could have been simply $ua->get($url, ... because the 30 lines are printed in the callback.

    Anyway, $response is not empty: if you dump it (I use Data::Dump's dd function) you'll see it is full of stuff, except for the _content field.

    So it is $response->content that is empty, not $response itself.

    The docs say that the callback receives three arguments: a chunk of data, a reference to the response object, and a reference to the protocol object.

    So you get a handy reference to the response object, and I guess you can use it to populate its _content field. If you modify the else part of the head_only sub like this:

    else {
        $$resp{_content} .= "$line\n";
        # print "line $read_lines: $line\n"
    }

    You can now print $response->content; and get only the 30 lines. Fun, no? Thanks for letting me investigate such a useful feature.
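    A hedged variant of the same idea, offered as a sketch rather than as the code above: it appends through HTTP::Message's public add_content method instead of writing to the private _content hash field. It assumes the HTTP::Response module is installed, and it calls the callback by hand on a made-up 40-line chunk so that it runs offline.

```perl
use strict;
use warnings;
use HTTP::Response;

my $read_lines = 0;

# Same argument order that LWP uses for :content_cb callbacks.
sub head_only {
    my ($data, $response, $protocol) = @_;
    for my $line (split /\n/, $data) {
        die "enough\n" if $read_lines >= 30;   # abort once 30 lines are stored
        $response->add_content("$line\n");     # public alternative to $$response{_content} .= ...
        $read_lines++;
    }
}

# Offline stand-in for $ua->get($url, ':content_cb' => \&head_only).
my $response = HTTP::Response->new(200);
my $chunk = join '', map { "line $_\n" } 1 .. 40;
eval { head_only($chunk, $response, undef); 1 };

my @kept = split /\n/, $response->content;
print scalar(@kept), " lines kept\n";
```

    In a real loop over several URLs, $read_lines would need resetting before each request. This sketch also ignores the partial-line problem, so a line cut at a chunk boundary would be stored as two lines.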

    L*

    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
      Thank you Discipulus! I appreciate your help very much!
