Sandy_Bio_Perl has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks, I thought I had stumbled across a great module that would let me apply SQL queries to a spreadsheet saved in CSV format. The queries work well when the cells of the CSV file contain text, but not when they contain numbers, producing the error message "Use of uninitialized value in concatenation (.) or string at"

I have attached the relevant bits of my code below. Any help would be hugely appreciated. Thank you

use strict;
use warnings;
use DBI;

# Connect to the database (the directory containing our csv file(s))
my $dbh = DBI->connect ("dbi:CSV:", undef, undef, {
    f_dir      => ".",
    f_ext      => ".csv",
    f_encoding => "utf-8",
    RaiseError => 1,
}) or die $DBI::errstr;

# Output the name and contact field from each row
my $sth = $dbh->prepare (qq(select * from newCsv.csv));
$sth->execute;
while (my $row = $sth->fetchrow_hashref) {
    # this works well: these cells contain only text
    print "row = $row->{sid} \tgender = $row->{gender} \t\tgenotype = $row->{name} \n";
    # produces 'Use of uninitialized value in concatenation (.)...'; tGroup contains numbers
    print "test group = $row->{tGroup} \n";
}
$sth->finish();
$dbh->disconnect();

Replies are listed 'Best First'.
Re: dbi::CSV works with text, but not numbers
by Marshall (Canon) on May 20, 2016 at 00:00 UTC
    I've also used dbi:CSV before and it is way cool. I think stevieb's comments are completely right and that this will solve your problem.

    At some point, you may find that the CSV DB doesn't fit your needs in terms of features or performance. If that happens, I would recommend SQLite. And also check out SQLite Manager for Firefox. SQLite does a lot of what MySQL can do, but without the admin hassle. In a multi-process environment, SQLite requires that an exclusive lock be acquired to write to the DB, so write performance is not the same; for reads, though, this thing works fantastically! For complex operations, I found that you can dynamically vary the amount of memory that SQLite uses. That kind of fine tuning can have a big performance impact; see the sketch below.

    Update: The reason that SQLite is so cool is that it uses a simple file for its work (a rough comparison to DBD::CSV), so it does not require a server to be running (along with the admin hassles that come with one).
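    To put a number on the memory tuning mentioned above, here is a minimal sketch, assuming DBD::SQLite is installed; the database name and cache size are made-up values, not from this thread:

        use strict;
        use warnings;
        use DBI;

        # Open (or create) a SQLite database file; no server process is needed.
        my $dbh = DBI->connect ("dbi:SQLite:dbname=example.db", "", "",
            { RaiseError => 1 });

        # Vary the page cache on the fly: a negative value is in KiB,
        # a positive value is in pages. -64000 asks for roughly 64 MB.
        $dbh->do ("PRAGMA cache_size = -64000");

        $dbh->disconnect();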

      Thank you Marshall. I'll give SQLite a try

        I also highly recommend the Firefox add-on from my original post. That thing is amazingly good! I just installed it on this Windows laptop that I'm using... Since Mozilla uses .sqlite files, you will be able to select, under its directories, say, cookies.sqlite.

        Under structure, it will tell you how that table was created:

        CREATE TABLE moz_cookies (
            id INTEGER PRIMARY KEY,
            baseDomain TEXT,
            originAttributes TEXT NOT NULL DEFAULT '',
            name TEXT,
            value TEXT,
            host TEXT,
            path TEXT,
            expiry INTEGER,
            lastAccessed INTEGER,
            creationTime INTEGER,
            isSecure INTEGER,
            isHttpOnly INTEGER,
            appId INTEGER DEFAULT 0,
            inBrowserElement INTEGER DEFAULT 0,
            CONSTRAINT moz_uniqueid UNIQUE (name, host, path, originAttributes)
        )
        Anyway, if you have Firefox, you already have some SQLite DBs on your machine.
Re: dbi::CSV works with text, but not numbers
by stevieb (Canon) on May 19, 2016 at 22:37 UTC

    Welcome to the Monastery, Sandy_Bio_Perl!

    update: I found the issue. When $row is populated from the DB, all of the column names get lower-cased. So if you change tGroup to tgroup when accessing it in $row, that should fix the problem; see the line below. /update
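    Applied to the loop in the question, the fix is just the lower-cased hash key:

        print "test group = $row->{tgroup} \n";   # 'tgroup', not 'tGroup'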

    What I'd do is dump out the entire row using Data::Dumper. Add a use Data::Dumper; near the top of the script, and then:

    while (my $row = $sth->fetchrow_hashref) {
        print Dumper $row;
        next;
    }

    ...and check to ensure that the structure contains the proper keys and values. Note that the keys in $row are case-sensitive. If your DB has a lot of rows, you may want to replace next with last, especially if you can trigger the warning on the first iteration.

      col_names
      skip_first_row

          By default DBD::CSV assumes that column names are stored in the first row of the CSV file and sanitizes them (see "raw_header" below). If this is not the case, you can supply an array ref of table names with the col_names attribute. In that case the attribute skip_first_row will be set to FALSE.

          If you supply an empty array ref, the driver will read the first row for you, count the number of columns and create column names like "col0", "col1", ...

          Note that column names that match reserved SQL words will cause unwanted and sometimes confusing errors. If your CSV has headers that match reserved words, you will require these two attributes.

      ...

      raw_header

          Due to the SQL standard, field names cannot contain special characters like a dot (".") or a space (" ") unless the column names are quoted. Following the approach of mdb_tools, all these tokens are translated to an underscore ("_") when reading the first line of the CSV file, so all field names are 'sanitized'. If you do not want this to happen, set "raw_header" to a true value and the entries in the first line of the CSV data will be used verbatim for column headers and field names. DBD::CSV cannot guarantee that any part in the toolchain will work if field names have those characters, and the chances are high that the SQL statements will fail.
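      As a minimal sketch of supplying your own column names, assuming the per-table csv_tables attribute interface; the table name "mydata" and its column list are made up for illustration:

          use strict;
          use warnings;
          use DBI;

          my $dbh = DBI->connect ("dbi:CSV:", undef, undef, {
              f_dir      => ".",
              f_ext      => ".csv",
              RaiseError => 1,
          }) or die $DBI::errstr;

          # Name the columns ourselves, e.g. for a header-less mydata.csv;
          # skip_first_row is then FALSE, so the first line is read as data.
          $dbh->{csv_tables}{mydata}{col_names} =
              [qw( sid name genotype gender tgroup )];

          my $sth = $dbh->prepare ("select sid, name from mydata");
          $sth->execute;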

      I'll think of a clearer way to document what this sanitation actually means.


      Enjoy, Have FUN! H.Merijn

        Thank you for your help. Very much appreciated

      Thank you stevieb. It works wonderfully now. And thank you and the rest of the Abbots for helping the novice community

Re: dbi::CSV works with text, but not numbers
by NetWallah (Canon) on May 20, 2016 at 00:16 UTC
    I completely agree with Marshall (++) on the beauty and performance of SQLite.

    I also have another suggestion: post a small sample of your csv file to this site, sufficient to demonstrate the problem. I'm sure our XP-hungry population will be glad to QA test and debug this for you.

            This is not an optical illusion, it just looks like one.

      Thank you, NetWallah. Not sure how to upload my csv file (it has many rows), but I pasted a few rows below.

      sid name genotype gender tGroup DNABl DNATw2 DNATw4 DNATw8 DNATw12 DNATw16 DNATw24 DNATw48 DNATw8-Bl eAgAbScr eAgABBl eAgAbTw12 eAgAbTw24 eAgAbTw48 ALTScr ALTBl ALTTw2 ALTTw4 ALTTw8 ALTTw12 ALTTw16 ALTTw24 ALTTw48 coreTcellBl coreTcellTw4 coreTcellTw8 coreTcellTw12 coreTcellTw16 coreTcellTw24 coreTcellTw48 surfaceTcellBl surfaceTcellTw4 surfaceTcellTw8 surfaceTcellTw12 surfaceTcellTw16 surfaceTcellTw24 surfaceTcellTw48 HLA_A_1 HLA_A_2 HLA_B_1 HLA_B_2 HLA_DRB1_1 HLA_DRB1_2 HLA_DRB3_1 HLA_DRB3_2 HLA_DRB4_1 HLA_DRB4_2 HLA_DRB5_1 HLA_DRB5_2 HLA_X_1 HLA_X_2
      1 AUC-001 C Female 3 7.8 3.85 3.47 2.83 2.18 2.69 2.18 2.176 -4.97 + / - + / - + / - + / - + / - 617 69 47 35 28 34 24 25 22 55 xx 130 36.66666667 5 23.33333333 0 50 xx 201.6666667 0 0 0 0 *0201/0207/0209/0215N/0218/0220/0224/0229/0230/0231/0233/0234/0242/0243N *0301/04 *0702/05/06/07/21/22/23/26/30/33 *4001/14/22N/23/33/42/43/45 *1101/04/12/15/24/27/28/29/32/33/35/37/38/39/43 *1501/02/04/05/06/07/08/09/11/13 *0201/02/03/06/10/11/12/14/15/17/18 *0101/05/09/11/12/13
      2 AUC-002 B Male 3 11.13 10.51 7.69 5.95 6.74 5.66 5.38 5.152 -5.18 + / - + / - + / - + / - + / - 82 61 72 55 73 66 95 87 70 31.66666667 65 xx 3.333333333 5 0 65 78.33333333 56.66666667 xx 0 0 0 25 *0201/0204/0207/0209/0215N/0217/0218/0220/0224/0229/0230/0231/0232N/0233/0239/0242/0243N *0203 *4601/02 *4001/4010/4022N *1401/26/32/38/39 *1405
      3 AUC-003 C Male 3 9.03 7.7 7.51 5.67 4.74 4.3 3.34 2.176 -3.36 + / - + / - + / - + / - + / - 87 102 83 102 107 89 57 47 26 33.33333333 46.66666667 18.33333333 15 75 15 0 33.33333333 40 30 0 0 0 5 *3101/02/05 *2402/2403/2409N/2411N/2415/2420/2421/2425/2426/2427 *1538 *4002/29 *0801/03/14/16 *1101/12/15/19/24/27/28/29/39
      4 AUC-004 B Male 1 7.23 3.43 2.66 2.18 2.18 2.18 2.18 2.176 -5.05 - / + - / + - / + - / + - / + 71 335 241 200 66 51 46 31 28 107 157 18 xx 168 45 10 120 168 37 xx 0 0 0 *1101/02/03/05 *1101/02/03 *5502/07 *1518 *0801/03/14/16 *1201/02/06