in reply to Combining Ffile:fetch with MySQL

I am not sure what the naming convention is. Is "uniquedata" something that changes on a per-month basis? And if so, can you predict with certainty what it will be next month?

Could you take a couple of example documents and show what the full URL to these documents is for, say, June 2022 and July 2022?

Are you saving the .pdf doc as a "BLOB" inside the SQLite DB or are you just saving a directory path on your local machine?

I have a similar web application. My app runs once per hour and looks around on part of a particular website for any new links. If it finds one, it "clicks" on it to see whether there is anything interesting there. If so, the interesting data is saved in an SQLite DB. In any event, I save the URL in the DB so that I don't go there again. If nothing changed on the website, it figures that out very efficiently. This thing has been running every hour for the past 6 years, so it is possible for apps like this to work out very well. I use WWW::Mechanize, but you don't seem to need the sophistication of making sense of a web page? Or do you?
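
For what it's worth, the "save the URL so I don't go there again" part is just a small lookup table. Here is a minimal, untested sketch of that pattern; the database file name and the seen_urls table/column names are made up for illustration:

#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Hypothetical names: crawler.sqlite and seen_urls are only for illustration.
my $dbh = DBI->connect("dbi:SQLite:dbname=./crawler.sqlite", "", "",
                       { RaiseError => 1 });
$dbh->do("CREATE TABLE IF NOT EXISTS seen_urls (url TEXT PRIMARY KEY, seen TEXT)");

for my $url (@ARGV) {
    # Have we been here before?
    my ($seen) = $dbh->selectrow_array(
        "SELECT 1 FROM seen_urls WHERE url = ?", undef, $url);
    next if $seen;

    # ... fetch the page and decide whether it is interesting ...

    # Remember the URL so the next hourly run skips it.
    $dbh->do("INSERT INTO seen_urls (url, seen) VALUES (?, datetime('now'))",
             undef, $url);
}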

Update:
Could you show your table schema?
I would be thinking along the lines of the following columns (a rough code sketch follows the list):
URL text (where the data came from)
Version Datetime (this is actually just text (not numeric) yyyy-mm-dd hh:mm:ss)
- you can omit the time, and I think also the dd, if not available
- leading zeroes are mandatory because this column must be in ASCII sort order
Downloaded Datetime (optional but often handy to know)
title text (Name of the document)
pdf blob (actual pdf file)
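
A rough, untested sketch of that layout as a SQLite table created through DBI; the table name documents and the file name docs.sqlite are just placeholders:

use strict;
use warnings;
use DBI;

my $dbh = DBI->connect("dbi:SQLite:dbname=./docs.sqlite", "", "",
                       { RaiseError => 1 });

# Column names are my guesses; adjust to taste.
my $create = q{
    CREATE TABLE IF NOT EXISTS documents (
        url        TEXT,  -- where the data came from
        version    TEXT,  -- 'yyyy-mm-dd hh:mm:ss' as text; leading zeroes keep ASCII sort order
        downloaded TEXT,  -- when we fetched it (optional but often handy)
        title      TEXT,  -- name of the document
        pdf        BLOB   -- the actual pdf file
    )
};
$dbh->do($create);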

Re^2: Combining Ffile:fetch with MySQL
by justin423 (Scribe) on Jul 18, 2022 at 05:57 UTC
    The uniquedata is a standard industry identifier that only occurs in that link. I get a download of all the identifiers and the dates the files are published. Because each file is a PDF, I want to download them into a folder for review later (the links are only temporary and valid for 60 days after month end, and then the PDFs are removed from the website) rather than save the blob in the database, though I could do that. The table schema is one table: Identifer CHAR(9), publish date CHAR(8), LINK CHAR(255). Publish date is YYYYMMDD, and link is just concat('staticlink/','identifier','/staticlink') as link. I just need to get the links from the database into Perl for Perl to download them.
      Ok, now I understand better. You are using an existing DB, not making one? You need to use the Perl DBI (Database Interface). There are links to the FAQs and some tutorials; you can see some Monk links here: https://www.perlmonks.org/?node=Tutorials#Database-Programming.

      A wild guess and some untested example code is below.
      Instead of printing the link, you would use your code to download that doc and then do whatever you are going to do with it.

      Update: I see that you are using MySQL instead of SQLite. Brain cramp on my part. But the code is essentially the same; the only real difference is how you get the DB connection. With MySQL the driver is DBD::mysql, so the DSN starts with dbi:mysql instead of dbi:SQLite, and you need an account name and password (for SQLite these are null strings). See the docs referenced above for a connection example with username/password.
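
      For MySQL the connect line would look something like this; the database name, host, user and password here are placeholders:

      use strict;
      use warnings;
      use DBI;

      # Placeholders: substitute your own database, host, account and password.
      my $dbh = DBI->connect(
          "dbi:mysql:database=yourdb;host=localhost",
          "youruser", "yourpassword",
          { RaiseError => 1 },
      ) or die "Couldn't connect to database: " . DBI->errstr;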

      #!/usr/bin/perl
      use warnings;
      use strict;
      use Data::Dumper;
      use DBI qw(:sql_types);

      my $dbfile = "./YourDbName.sqlite";
      my $dbh = DBI->connect("dbi:SQLite:dbname=$dbfile", "", "",
                             { RaiseError => 1 })
          or die "Couldn't connect to database: " . DBI->errstr;

      # Adjust the table and column names to match your actual schema.
      my $get_links_4date_sql =
          "SELECT Identifer, LINK FROM YourTableName WHERE publish_date = ?";
      my $get_links_4date = $dbh->prepare($get_links_4date_sql);

      my $date = "20220701";
      $get_links_4date->execute($date);

      my $array_ref = $get_links_4date->fetchall_arrayref;
      foreach my $row_ref (@$array_ref) {
          my ($id, $link) = @$row_ref;
          print "$id \t $link\n";
      }
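
      To actually download rather than print, one option is File::Fetch (per the thread title); here is a minimal, untested sketch of what would replace the print inside the loop, with a made-up URL standing in for your real links:

      use strict;
      use warnings;
      use File::Fetch;

      # $link would come from the database loop above; hard-coded here only for illustration.
      my $link = 'https://example.com/staticlink/IDENTIFIER/staticlink.pdf';

      my $ff    = File::Fetch->new(uri => $link);
      my $where = $ff->fetch(to => './pdfs');   # save into ./pdfs (create the directory first if needed)
      if ($where) {
          print "Saved $link to $where\n";
      } else {
          warn "Could not fetch $link: " . $ff->error . "\n";
      }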
      I can't anticipate exactly what kind of SQL you need. The SELECT above just gets docs for a particular date, although it sounds like what you actually need is: for each unique id, download the latest version of the document. The SQL for that is of course more involved but doable; a rough sketch follows.
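
      For instance, something along these lines (untested; it reuses $dbh and the placeholder names from the code above, and assumes publish_date sorts correctly as text, e.g. YYYYMMDD):

      # Newest row per identifier: join each row against the max publish_date for its id.
      my $latest_sql = q{
          SELECT t.Identifer, t.LINK, t.publish_date
          FROM   YourTableName t
          JOIN  (SELECT Identifer, MAX(publish_date) AS publish_date
                 FROM   YourTableName
                 GROUP  BY Identifer) newest
            ON  t.Identifer    = newest.Identifer
           AND  t.publish_date = newest.publish_date
      };
      my $latest = $dbh->prepare($latest_sql);
      $latest->execute;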

      Hope this gets you further along your way.