in reply to Using a Fetchrow with LWP

Try this.

    use strict;
    use warnings;
    use DBD::SQLite;
    use HTTP::Tiny;

    # Test DB setup
    {
        my $dbh = DBI->connect('dbi:SQLite:dbname=/tmp/testdb','','');
        $dbh->do('CREATE TABLE linktable (url TEXT NOT NULL, file VARCHAR(255) NOT NULL)');
        $dbh->do('INSERT INTO linktable VALUES ("https://www.sec.gov/Archives/edgar/data/1897245/0001493152-23-024253.txt", "/tmp/edgar/0001493152-23-024253")');
    }
    #--------------------------------------------#

    my $dbh = DBI->connect('dbi:SQLite:dbname=/tmp/testdb','','');
    my $sql = 'SELECT url, file FROM linktable';
    my $sth = $dbh->prepare($sql);
    $sth->execute or die "" . $dbh->errstr;

    my $ua = HTTP::Tiny->new( default_headers => {
        USER_AGENT => 'COMPANY email@email.com',
    });

    while ( my $row = $sth->fetchrow_arrayref ) {
        my ($url, $file) = ($row->[0], $row->[1]);
        my $resp = $ua->mirror($url, $file);
        if ( $resp->{success} ) {
            print "OK\n";
        }
        else {
            print "Failure: $resp->{status}, $resp->{reason}\n";
        }
    }
    __END__

Hope this helps


The way forward always starts with a minimal test.

Re^2: Using a Fetchrow with LWP
by justin423 (Scribe) on Jul 19, 2023 at 02:03 UTC
    Getting there...

    Now I'm getting a 403 error because it isn't sending the headers that sec.gov accepts.

    I'll update that part of the code.
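
One likely contributor to the 403, offered as a sketch rather than a confirmed diagnosis: as far as its documentation indicates, HTTP::Tiny does not turn underscores in `default_headers` keys into dashes, so a key spelled `USER_AGENT` goes out as its own header while the real `User-Agent` stays at the module default. HTTP::Tiny's `agent` attribute is the documented way to set that header. The contact string below is the placeholder from the example above.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Placeholder contact string taken from the example above; replace with your own.
my $contact = 'COMPANY email@email.com';

# 'agent' sets the User-Agent header. A default_headers key named
# USER_AGENT would not do that in HTTP::Tiny.
my $ua = HTTP::Tiny->new( agent => $contact );

# Nothing is fetched here; this only shows what the object will send.
print "User-Agent will be: ", $ua->agent, "\n";
```

Passing the same `$ua` to the `mirror` loop should then send the declared user agent with every request.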

Re^2: Using a Fetchrow with LWP
by justin423 (Scribe) on Jul 19, 2023 at 02:16 UTC

    I should have included that part of the code.

    Up above, there is this

        use Parent:HTTP::Message;
        $mess=HTTP::Message->new();
        $mess=encode(gzip,deflate);

    It is just printing "failure , " with no error message now.

Re^2: Using a Fetchrow with LWP
by justin423 (Scribe) on Jul 19, 2023 at 02:54 UTC
    so this works:
        use File::Fetch;
        use LWP::UserAgent ();
        use DBI;
        use parent 'HTTP::Message';
        $mess = HTTP::Message->new();
        $mess->encode(gzip,deflate);
        $filename='/temp/edgar/workfile.txt';
        unlink ($filename);
        $url='https://www.sec.gov/Archives/edgar/data/1869467/0000919574-23-004048.txt';
        my $ua = LWP::UserAgent->new(timeout => 10);
        $ua->default_header('Accept-Encoding' =>$mess = HTTP::Message->new());
        $ua->default_header( USER_AGENT =>'COMPANY admin@example.com' );
        print "Now downloading the file...\n";
        my $res = $ua->mirror( $url, $filename );
    but this doesn't...
        use LWP::UserAgent ();
        use DBI;
        use parent 'HTTP::Message';
        $mess = HTTP::Message->new();
        $mess->encode(gzip,deflate);
        my $ua = LWP::UserAgent->new(timeout=>10);
        $mess = HTTP::Message->new();
        $mess->encode(gzip,deflate);
        $ua->default_header('Accept Encoding'=>$mess=HTTP::Message->new());
        $ua->default_header( USER_AGENT =>'COMPANY admin@example.com' );
        my $SQL = "select url,filename from linktable";
        my $sth = $dbh->prepare($SQL) or die "Prepare".$dbh->errstr;
        $sth->execute() or die "".$dbh->errstr;
        while (my $row = $sth->fetchrow_arrayref) {
            my ($url,$filename)= ($row->[0],$row->[1]);
            print "\n$row[0] $row[1]\n";
            my $resp = $ua->mirror( $url,$filename);
            if ( $resp->{success} ) {
                print "OK\n";
            }
            else {
                print "Failure: $resp->{status}, $resp->{reason}\n";
            }
        }

      "but this doesn't..."


      No strict, no warnings, no creation of a database handle object. It's almost as though you've ignored everything 1nickt has provided in this thread...
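
A possible reason the LWP version prints an empty failure line, sketched under the assumption that `mirror` is returning an HTTP::Response object: LWP::UserAgent hands back an object with `is_success`, `code`, and `message` methods, whereas `$resp->{success}`, `$resp->{status}`, and `$resp->{reason}` are the hash keys HTTP::Tiny uses. The response below is built by hand so nothing is fetched, and `describe` is a hypothetical helper name.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical helper: format the outcome of an LWP request.
sub describe {
    my ($resp) = @_;
    return $resp->is_success
        ? 'OK'
        : sprintf 'Failure: %s, %s', $resp->code, $resp->message;
}

# Hand-built response standing in for what $ua->mirror would return.
my $forbidden = HTTP::Response->new( 403, 'Forbidden' );
print describe($forbidden), "\n";
```

In the loop, `describe( $ua->mirror($url, $filename) )` would replace the hash lookups.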

        The error I am getting is 403 Forbidden.

        So the issue is not with the database-related code. It is with the HTTP::Tiny or LWP settings.

        Here is the output with use warnings on:

            Unquoted string "gzip" may clash with future reserved word at testdown.pl line 9.
            Unquoted string "deflate" may clash with future reserved word at testdown.pl line 9.
            Unquoted string "gzip" may clash with future reserved word at testdown.pl line 30.
            Unquoted string "deflate" may clash with future reserved word at testdown.pl line 30.
            Unquoted string "gzip" may clash with future reserved word at testdown.pl line 33.
            Unquoted string "deflate" may clash with future reserved word at testdown.pl line 33.
            Failure: 403, Forbidden
            Failure: 403, Forbidden
            Failure: 403, Forbidden
            Failure: 403, Forbidden
            Failure: 403, Forbidden

        This is what the Securities and Exchange Commission says about downloading from them:

            Fair access

            Current max request rate: 10 requests/second.

            To ensure everyone has equitable access to SEC EDGAR content, please use efficient scripting. Download only what you need and please moderate requests to minimize server load.

            SEC reserves the right to limit request rates to preserve fair access for all users. See our Internet Security Policy for our current rate request limit.

            The SEC does not allow botnets or automated tools to crawl the site. Any request that has been identified as part of a botnet or an automated tool outside of the acceptable policy will be managed to ensure fair access for all users.

            Please declare your user agent in request headers:

            Sample Declared Bot Request Headers:

            User-Agent: Sample Company Name AdminContact@<sample company domain>.com
            Accept-Encoding: gzip, deflate
            Host: www.sec.gov
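
The policy quoted above asks for a declared user agent and a modest request rate. A minimal sketch of both, assuming HTTP::Tiny and a fixed pause between requests: the 0.2 second interval, the contact string, the `example.invalid` URLs, and the `fetch_paced` helper are all illustrative choices, not anything sec.gov prescribes. A stub callback stands in for the download so that running the sketch sends nothing.

```perl
use strict;
use warnings;
use Time::HiRes qw(sleep time);
use HTTP::Tiny;

# Assumed pacing: 0.2 s between requests is about 5 per second,
# below the 10 requests/second quoted above.
my $min_interval = 0.2;

# Placeholder contact string; replace with your own company and address.
my $ua = HTTP::Tiny->new( agent => 'Sample Company Name AdminContact@example.com' );

# Hypothetical helper: call $fetch on each [url, file] pair, pausing between calls.
sub fetch_paced {
    my ($pairs, $fetch) = @_;
    my @results;
    my $last = 0;
    for my $pair (@$pairs) {
        my $wait = $min_interval - ( time() - $last );
        sleep($wait) if $wait > 0;
        $last = time();
        push @results, $fetch->(@$pair);
    }
    return @results;
}

# Real use would pass: sub { $ua->mirror(@_) }
my @pairs = (
    [ 'https://example.invalid/a.txt', '/tmp/a.txt' ],
    [ 'https://example.invalid/b.txt', '/tmp/b.txt' ],
);
my $start = time();
my @out = fetch_paced( \@pairs, sub { my ($url, $file) = @_; return "$url -> $file" } );
printf "%d items in %.2f s\n", scalar @out, time() - $start;
```

The rows from `fetchrow_arrayref` could be collected into `@pairs` first, or the same pause could be placed directly inside the `while` loop.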