
Re^2: Split web page, first 30 lines only -- :content_cb trick

by wrkrbeee (Scribe)
on Feb 28, 2017 at 21:18 UTC (#1183198)

in reply to Re: Split web page, first 30 lines only -- :content_cb trick
in thread Split file, first 30 lines only

Hi Discipulus, Used your code suggestion (see below). I'm guessing that the $response variable will contain the 30 lines I'm looking for. As is, $response is empty. Any ideas?? Thanks so much! Rick
    use strict;
    use warnings;
    use Tie::File;
    use Fcntl;
    use LWP::UserAgent;
    use File::Slurp;

    my @lines;

    # Transfer URLS to a string variable
    my $file = "G:/Research/SEC filings 10K and 10Q/Data/sizefiles1.txt";

    # Now fill @pages array with contents of sizefile1.txt ... how?
    open (FH, "< $file") or die "Can't open $file for read: $!";
    my @pages = <FH>;
    close FH or die "Cannot close $file: $!";

    # connect variable used with GET??
    my $ua = LWP::UserAgent->new;

    # Initialize line counter
    my $read_lines = 1;

    # Primary loop through URLs
    foreach my $url (@pages) {
        my $response = $ua->get($url, ':content_cb' => \&head_only);
        print $response->content;
    }

    # Subroutine for primary loop
    sub head_only {
        my ($data, $response, $protocol) = @_;
        my @lines = split "\n", $data;
        foreach my $line (@lines) {
            if ($read_lines == 31) {
                # reset line count
                $read_lines = 1;
                print +("=" x 70), "\n";    # what is this?
                # die inside callback interrupt
                die;
            }
            else {
                #print "line $read_lines: $line\n";
            }
        }
    }

Re^3: Split web page, first 30 lines only -- :content_cb trick
by Athanasius (Archbishop) on Mar 01, 2017 at 09:48 UTC

    Hello wrkrbeee,

    I think Discipulus provided this sample code to demonstrate a useful approach which you can adapt to your particular needs. If you want to process the read-in lines in the calling code (your “Primary loop”) rather than in the callback function, then you need to store the lines in a shared variable rather than print them in sub head_only. There is an additional complication: the last line read from the current chunk of data may not be complete, so you need to check for a trailing newline and handle its absence appropriately, for example by carrying the incomplete tail over to the next chunk.
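One way to sketch the shared-variable idea, under stated assumptions: make_line_collector is a hypothetical helper (not from the thread or from LWP) that returns a callback plus a reference to the array it fills. The callback keeps any unterminated tail in $partial until the next chunk arrives. The chunks are simulated here so the snippet runs without a network connection.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a callback suitable for ':content_cb'
# plus a reference to the array where complete lines are collected.
sub make_line_collector {
    my ($max_lines) = @_;
    my @lines;           # shared with the caller through the returned reference
    my $partial = '';    # unterminated last line, carried into the next chunk

    my $callback = sub {
        my ($data, $response, $protocol) = @_;
        $partial .= $data;
        # Consume complete lines only (those that end in a newline).
        while ($partial =~ s/\A([^\n]*)\n//) {
            push @lines, $1;
            # Dying inside the callback aborts the request, not the program.
            die "line limit reached\n" if @lines >= $max_lines;
        }
        return;
    };
    return ($callback, \@lines);
}

# Simulate two chunks where the first one ends in the middle of a line.
my ($cb, $lines) = make_line_collector(3);
eval { $cb->($_) for "one\ntw", "o\nthree\nfour\n"; 1 };
print "$_\n" for @$lines;
```

With a real request the callback would be passed as $ua->get($url, ':content_cb' => $cb), and the primary loop would then read @$lines. Note that a final line without a trailing newline stays in $partial and is not collected in this sketch.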

    print +("=" x 70), "\n"; #what is this?

    The x operator creates a string of 70 equals characters concatenated together; see perlop#Multiplicative-Operators. The plus sign is there to prevent the Perl parser from thinking that the parentheses contain the entire argument list to the print function; see print.
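A small illustration of both points, as a sketch rather than anything taken from the original post (the variable name $rule is made up for the example):

```perl
use strict;
use warnings;

# Repetition operator: seventy '=' characters in one string.
my $rule = "=" x 70;
print length($rule), "\n";

# With the leading '+', the parentheses are just an expression,
# so "\n" is still part of print's argument list.
print +("-" x 10), "\n";

# Without the '+', Perl would parse print("-" x 10) as the complete call
# and the trailing "\n" would not be printed:
# print ("-" x 10), "\n";
```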

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

      Thank you Athanasius! Appreciate your time and effort!
Re^3: Split web page, first 30 lines only -- :content_cb trick and populate $response object
by Discipulus (Abbot) on Mar 01, 2017 at 11:51 UTC
    Well, you got a good answer from esteemed brother Athanasius, and you are right: in my code my $response = $ua->get($url, ... could have been simply $ua->get($url, ... because the 30 lines are printed in the callback.

    Anyway, $response is not empty: if you dump it (I use Data::Dump's dd function) you'll see it is completely full of stuff, except for the _content field.

    So it is $response->content that is empty, not $response itself.

    The docs say that the callback receives three arguments: a chunk of data, a reference to the response object, and a reference to the protocol object.

    So you get a handy reference to the response object, and I guess you can use it to populate its _content field. If you modify the else part of the head_only sub like this:

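    else {
        # $resp is the response object passed as the callback's second
        # argument (named $response in the sub above)
        $$resp{_content} .= "$line\n";
        # print "line $read_lines: $line\n";
        $read_lines++;    # keep counting so the 30-line limit is reached
    }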

    You can now print $response->content; and get the 30 lines only. Fun, no? Thanks for letting me investigate such a useful feature.
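A variation on the same idea, offered as a hedged sketch: instead of writing to the internal _content hash key, it uses the public add_content method that HTTP::Response inherits from HTTP::Message. The callback is exercised here against a locally built response object and a made-up chunk of 40 lines, not a live request, so its behaviour inside a real $ua->get call is an assumption. Like the code in the thread, it ignores lines split across chunks.

```perl
use strict;
use warnings;
use HTTP::Response;    # part of the HTTP::Message distribution that LWP depends on

my $read_lines = 0;

sub head_only {
    my ($data, $response, $protocol) = @_;
    foreach my $line (split /\n/, $data) {
        # Dying inside the callback aborts the request, not the program.
        die "got 30 lines\n" if $read_lines == 30;
        $response->add_content("$line\n");    # public alternative to $$response{_content}
        $read_lines++;
    }
    return;
}

# Stand-in for what LWP does: hand the callback a chunk and a response object.
my $response = HTTP::Response->new(200, 'OK');
my $chunk    = join '', map { "row $_\n" } 1 .. 40;
eval { head_only($chunk, $response, undef); 1 };

my @kept = split /\n/, $response->content;
print scalar(@kept), " lines kept\n";
```

In a real script the callback would be registered as $ua->get($url, ':content_cb' => \&head_only), and $read_lines would need to be reset before each URL.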


    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
      Thank you Discipulus! I appreciate your help very much!
