hrholmer has asked for the wisdom of the Perl Monks concerning the following question:

I've got a site or three where I need to track incoming URLs that have been seen before, and what data was presented for each URL -- the URLs are specific to certain users/groups/what-have-you, and those users need to be able to link back to a given URL and see roughly 20 variables that are the same as the last time they visited -- or, when they send their friends the links, the friends see the same thing instead of a jumble of rotating content. Anyhow, a simple CGI script and a Perl .pm run it; I don't use most of it, but I do use modules that let me put out rotating content.

That's fine for a first visit to a URL by anybody -- just serve rotating content -- but once a URL has been visited for the very first time, I want to immediately record its unique URL structure and store 12 numbers along with the key part of that structure in a database, and I'm thinking a CSV is good enough since I don't want to mess with SQL. The sites, by the way, are plain old, out-of-date HTML, and the server is dedicated, new, and very fast, and I'm about the only user on it, so: 1) plenty of horsepower (40-core, 128 GB DDR4) and ample fast storage (NVMe drives in RAID 1), with cPanel/WHM on CentOS 7.4 (newest), kept current with KernelCare, running CloudLinux with CageFS and mod_lsapi; and 2) I'm running Perl 5, and from WHM I can easily add ANY module CPAN offers, so I can be very flexible if somebody ends up telling me to use, say, Text::CSV in a certain flavor.

So out-of-the-box "total" solutions are great too. On to the problem at hand. To keep the database small, I hacked one of my modules that rotates content so it hands back to the server an extra value in addition to the content it serves when a URL is called. The module creates a second variable, let's call it $consistent_lineX (where X identifies the module it came from -- yes, one module per piece of rotated content displayed...), and that variable holds the line number of the data that was served out of a .txt file to the viewer when the URL was first called. (The module also puts out the content itself, which gets displayed in a template.) Now that I know the line number where the data came from, I can use it when the same URL is called again: the software goes back to the .txt storage file, pulls whatever is on that line number, and re-displays it -- so the user has a consistent experience.
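
For illustration only -- this isn't my real module code, and the sub and file names here are made up -- each rotating module now does something with roughly this shape:

    # Hypothetical sketch: pick a random line from a module's .txt file and
    # return both the content (for the template) and the line number (for the cache).
    sub serve_rotating_content {
        my ($txt_file) = @_;                    # e.g. 'module1.txt' (made-up name)
        open my $fh, '<', $txt_file or die "open $txt_file: $!";
        my @lines = <$fh>;
        close $fh;
        my $line_no = int rand @lines;          # 0-based line number of what we served
        chomp( my $content = $lines[$line_no] );
        return ($content, $line_no);
    }

    my ($content, $consistent_line1) = serve_rotating_content('module1.txt');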

By hacking some existing code I'm able to get the called URL into a variable, and I also have 12 more variables, $consistent_line1 through $consistent_line12, holding the line numbers of what was served out of EACH of these modules' .txt files the last time the URL was seen. I've already figured out how to handle a repeat visit to a URL: I open my .csv, find the line that matches on the URL string (first field), take the line numbers from that record, go to each module's .txt file, pull out the data that was previously served the last time that URL was called, and put it into the variables that feed the template. The user sees the same 12 things, and all I had to store was line numbers, not the whole big batch of data. That's the point: to keep the cache small.
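
Freehanding it -- the names are placeholders and the delimiter in the split is a stand-in for whatever my real one is -- the repeat-visit lookup is roughly:

    # Sketch of the repeat-visit lookup (placeholder names, made-up delimiter).
    my $url        = $ENV{REQUEST_URI} // '';
    my @parsed_url = split /\?/, $url, 2;      # '?' stands in for the real delimiter
    my $key        = $parsed_url[1];           # back half of the URL = the cache key

    my @line_numbers;
    open my $cache_fh, '<', 'cache.csv' or die "open cache.csv: $!";
    while ( my $record = <$cache_fh> ) {
        chomp $record;
        my @fields = split /,/, $record;       # 13 fields: key + 12 line numbers
        if ( $fields[0] eq $key ) {
            @line_numbers = @fields[1 .. 12];
            last;
        }
    }
    close $cache_fh;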

And it is small: 13 fields per line, with the first field being the part of the URL to match on, to see if it's been served out before, and the remaining 12 fields of each record being the line numbers presented by modules 1 through 12, in order. I've got the matching figured out and the retrieval figured out and all that for when a URL is repeated. BUT, my problem is new URLs. The first time a URL is seen, it's parsed (split) in two, and the back half -- the second of the two array elements, which is where I've got it now, call it $parsed_url[1] -- is the key.
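
So a cache record looks something like this (the line numbers are made up):

    some-url-key,14,3,27,9,41,2,18,33,7,25,11,6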

So, when the value of $parsed_url[1] doesn't match the first field of any record in the cache, I know the URL hasn't been seen before. BUT I want the user to have a consistent experience next time, so I need to append a new record, where the first field is that back half of the parsed URL (the value now held in $parsed_url[1]), and the other 12 fields of this appended (new) record need to come from the $consistent_line1, $consistent_line2, $consistent_line3, etc. variables, in order, up to $consistent_line12.

So I can't figure out the best, neatest way to get those pushed? printed? what? into (>>) that .csv file as a new record. I've read a lot of "solutions" to other (not terribly similar) problems that use Text::CSV or one of its variants, and I'm happy to install ANY Perl module on my system that will let me do this efficiently.

I've got to append a new record to a .csv file (it's not too late to change that to TDV or whatever -- just nothing complicated, and no SQL) and fill its 13 fields with half of a split variable plus 12 more values held in variables, in order.

I started playing with the long way around: transfer the value of that second array element to a temporary variable, then maybe concatenate, then push, or try to get all 13 values into one array and push it in with something like push(@temporary, join($whatever, @array), "\n"), and then foreach (@temporary) { print $cache_fh $_ } and pray it appended. Yes, I know that's not the right syntax -- syntax isn't my question, this is bigger picture. I can get the syntax right once I have a solution route, and I'm SO LOST looking for one. That's just to give you an idea of the different "solutions" I'm seeing -- that particular one takes a lot of fiddling around to get all the field values into a single array, and then a lot of code to push/print/cajole them eventually into my cache.csv as an appended record without overwriting it or worse.
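
To be concrete, here's the shape I think I'm after -- the variables are the ones described above, shown here with made-up values so the snippet stands on its own; whether this is the right route is exactly what I'm asking:

    # The URL key and the 12 line numbers, as described above (values made up here).
    my @parsed_url = ('front-half', 'some-url-key');
    my ($consistent_line1, $consistent_line2,  $consistent_line3,  $consistent_line4,
        $consistent_line5, $consistent_line6,  $consistent_line7,  $consistent_line8,
        $consistent_line9, $consistent_line10, $consistent_line11, $consistent_line12)
        = (14, 3, 27, 9, 41, 2, 18, 33, 7, 25, 11, 6);

    # Assemble the 13 fields in order, then append one line to the cache.
    my @record = (
        $parsed_url[1],
        $consistent_line1, $consistent_line2,  $consistent_line3,  $consistent_line4,
        $consistent_line5, $consistent_line6,  $consistent_line7,  $consistent_line8,
        $consistent_line9, $consistent_line10, $consistent_line11, $consistent_line12,
    );
    open my $cache_fh, '>>', 'cache.csv' or die "append cache.csv: $!";
    print {$cache_fh} join(',', @record), "\n";
    close $cache_fh;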

1. Yes, I'm lost. But I know I'm lost, and that's the first step. I need advice on the shortest route between my newbie stupidity here and appending those records for first-time-seen URLs over there.

2. Yes, I got the rest of it working -- I know you don't believe that... Since I can't append records yet, I tested the rest by filling the cache.csv file with made-up records and running the script, and it does indeed find a matching first field when compared with the same portion of the incoming URL -- something like while (<QUOTEFILE>) { $used_url_line = $_ if /(some version of $parsed_url[1])/ } -- and pulls out the full line content from the cache on a first-field match. I chomp it to ditch the newline, parse it into 13 array fields, and then do 12 separate operations on 12 separate files (because at that point I only have the line number where the data lives, not the data itself): open each module's text file, go to the $selectedline for that module, use a pipe split to grab from the newline back, and then use the [0] element of each of those arrays (the part with the actual data in it) to convey that data to the actual variables that feed each of the 12 fields in the templates. I'm sort of freehanding what I did here at 27 hours awake when I show code, but you get the point... the rest of it DOES work.
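
Per module, that retrieval step looks roughly like this -- again freehanded, with placeholder file and variable names, and assuming the cached line numbers are 0-based:

    # Pull one specific line number out of one module's .txt file (sketch).
    my $selectedline = 14;                           # line number cached for this module
    my $wanted;
    open my $txt_fh, '<', 'module1.txt' or die "open module1.txt: $!";
    while ( my $line = <$txt_fh> ) {
        if ( $. - 1 == $selectedline ) {             # $. is the current (1-based) line number
            chomp $line;
            $wanted = ( split /\|/, $line )[0];      # pipe split; keep the data field
            last;
        }
    }
    close $txt_fh;
    # $wanted now feeds the template variable for this module.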

3. And it's not even slow -- surely somewhat messy, but I'm learning -- and now, NOW, I'm just stuck on appending the new record to the .csv: filling 13 fields with the values of 13 variables, the first of which (the one that needs to populate the first field) holds its value in the second half of its array... I know this is more reading than you ever wanted to do, and I'll take all the slams in the world, but I'm having fun... I'm just stuck on appending a 13-variable CSV record from 12.5 files... Now I'll shut up and take my medicine in the form of your laughter, and hopefully help.

A reminder that I can and will install ANY Perl module on my server, from the thousands at CPAN, for a shortcut. Oh yeah, and I don't use file locking -- oops, forgot that -- and sometimes the bots come raiding and ignore my robots.txt instructions to go SLOWLY... Yeah, so there's that... No, I'm not high... maybe just tired from cobbling together this little mess here...
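
(If locking turns out to be part of the answer to the bot problem, I'm guessing the append would need an exclusive flock on the handle -- again just a sketch, with a made-up record:)

    use Fcntl qw(:flock);

    # Append under an exclusive lock so concurrent hits can't interleave records.
    my @record = ('some-url-key', 14, 3, 27, 9, 41, 2, 18, 33, 7, 25, 11, 6);
    open my $cache_fh, '>>', 'cache.csv' or die "append cache.csv: $!";
    flock $cache_fh, LOCK_EX or die "flock cache.csv: $!";
    print {$cache_fh} join(',', @record), "\n";
    close $cache_fh;                                 # closing the handle releases the lock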

If you're still reading this, my THANKS for your patience in this alone. You are truly serene if you made it to here.


Replies are listed 'Best First'.
Re: Appending single record to CSV (or TDV, not too late to switch) - filling 13 fields from 13 files, one of which is split into array of 2 and I just need half of it...
by Tux (Canon) on Jan 22, 2018 at 12:16 UTC

    Currently the de-facto CSV parsing (and generating) modules on CPAN are Text::CSV_XS and Text::CSV. The latter will use the first if installed. The XS version is up to 100 times as fast. YMMV.

    Be prepared to read a lot of documentation if you've never worked with it. It starts very simple, but hairy CSV files can make you need a lot of options.
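
    A minimal append-one-row sketch (the file name and field values here are placeholders):

    use Text::CSV_XS;

    # Append a single record; Text::CSV_XS handles quoting and escaping for you.
    my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 1, eol => "\n" });
    my @record = ('some-url-key', 14, 3, 27);        # placeholder key + line numbers
    open my $fh, '>>', 'cache.csv' or die "cache.csv: $!";
    $csv->print($fh, \@record);
    close $fh;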


    Enjoy, Have FUN! H.Merijn
Re: Appending single record to CSV (or TDV, not too late to switch) - filling 13 fields from 13 files, one of which is split into array of 2 and I just need half of it...
by poj (Abbot) on Jan 22, 2018 at 19:55 UTC
    and I'm thinking a CSV is good enough since I don't want to mess with SQL

    Why not use both :) A simple demo for you

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBD::CSV;

    # create db handle
    my $dbh = DBI->connect ("dbi:CSV:", undef, undef, {
      f_ext      => ".csv/r",
      RaiseError => 1,
      }) or die "Cannot connect: $DBI::errstr";

    # create csv table if not exists
    my $table = 'mycache';
    unless (-e $table.'.csv'){
      $dbh->do ("CREATE TABLE $table (
        uid      char(100),
        line1    char(100),
        line2    char(100),
        line3    char(100),
        datetime char(20)
        ) ");
    }

    while (1) {
      # show records
      my $ar = $dbh->selectall_arrayref("SELECT uid FROM $table ORDER BY uid");
      print "$table.csv contains\n";
      print " $_->[0]\n" for @$ar;
      print "\n";

      # input new or existing record
      print "Input unique id (q to quit,c to clear) > ";
      chomp( my $input = <STDIN> );
      exit if lc $input eq 'q';

      # clear
      if (lc $input eq 'c'){
        $dbh->do("DELETE FROM $table");
        next;
      }

      # check existing or new
      my @f = $dbh->selectrow_array("
        SELECT * FROM $table WHERE uid = ?",undef,$input);
      if (@f){
        print "--\n $input EXISTS : @f \n--\n";
      } else {
        my @f = ($input,'init1','init2','init3',scalar localtime);
        print "--\n $input NEW RECORD : @f \n--\n";
        $dbh->do("INSERT INTO $table VALUES (?,?,?,?,?)",undef,@f);
      }
    }
    poj
Re: Appending single record to CSV (or TDV, not too late to switch) - filling 13 fields from 13 files, one of which is split into array of 2 and I just need half of it...
by kcott (Archbishop) on Jan 23, 2018 at 08:25 UTC

    G'day hrholmer,

    Welcome to the Monastery.

    [This post was difficult to read, which you acknowledge. For future reference, avoid prosaic descriptions and the chatty style you've adopted. Choose concise, succinct statements to describe your problem; use short dot points instead of drawn-out paragraphs; put code in blocks and use pseudocode if you don't know what syntax you need.]

    The following script, pm_1207659_csv_append.pl, provides some techniques which you may find useful (assuming I've got the basic idea of what you're trying to accomplish).

    #!/usr/bin/env perl -l

    use strict;
    use warnings;
    use autodie;

    use Text::CSV;
    use Inline::Files;

    my $csv = Text::CSV::->new();
    my $csv_file = 'pm_1207659_out.csv';
    my %seen;

    open my $csv_fh, '<', $csv_file;
    while (my $row = $csv->getline($csv_fh)) {
        $seen{$row->[0]} = 1;
    }
    close $csv_fh;

    my @in_fhs = (\*FILE1, \*FILE2, \*FILE3);

    open my $out_fh, '>>', $csv_file;
    while (1) {
        my @data = map scalar readline $_, @in_fhs;
        last unless defined $data[0];
        chomp @data;
        my $key = (split /_/, $data[0])[1];
        next if $seen{$key}++;
        $csv->print($out_fh, [$key, @data[1 .. $#in_fhs]]);
    }
    close $out_fh;

    __FILE1__
    A_B
    B_B
    C_A
    B_C
    C_D
    __FILE2__
    F2-1
    F2-2
    F2-3
    F2-4
    F2-5
    __FILE3__
    F3-1
    F3-2
    F3-3
    F3-4
    F3-5

    Notes:

    • Let Perl do your I/O checking with the autodie pragma.
    • Use Text::CSV as ++Tux has already discussed. As the name suggests, comma-separated is the default; use the sep_char attribute if you want something different (e.g. "\t" for tabs).
    • I've only used 3 (not 13) files. Inline::Files is for demonstration purposes. You'll need to actually open real files.
    • I've created 'pm_1207659_out.csv' with some initial data to show handling of existing duplicates. The new data also contains duplicates, which are handled using the same %seen hash. As stated above, this is intended to demonstrate a technique: you'll need to adapt it to your needs.
    • You should also note that I've made gross assumptions about the data. For instance, all files have the same number of records; and all records contain only valid data. You'll need to add appropriate validation and checking for any production code.

    Here's a sample run with before and after data:

    $ cat pm_1207659_out.csv
    X,Y,Z
    C,what,ever
    some,thing,else
    $ pm_1207659_csv_append.pl
    $ cat pm_1207659_out.csv
    X,Y,Z
    C,what,ever
    some,thing,else
    B,F2-1,F3-1
    A,F2-3,F3-3
    D,F2-5,F3-5

    You should be able to follow this through and see how duplicate data (existing and new) is handled.

    — Ken