in reply to Re^2: Invert a hash... not a FAQ (I hope)
in thread Invert a hash... not a FAQ (I hope)

The alternate design that I would use is to dump the dataset into a database and then query the database multiple times.

At first that may feel restrictive. But with some experience you'll likely find that the SQL for a report is a lot shorter and less error-prone than attempting to produce that same dataset yourself in code.


Re^4: Invert a hash... not a FAQ (I hope)
by djacobow (Initiate) on Jan 22, 2009 at 02:01 UTC

    Yeah, a database is an obvious solution that I am resisting out of pure stubbornness (and because I am dreading the learning curve for DBI, SQL, etc.).

    I've also conjectured that the "db that is the filesystem", indexed by date in my case, will be faster than a "real" database, particularly considering that *usually* I can get all the information I need for a given subset of days into memory at once without a problem.

    I'll be sad, though, if after all the trouble DBI ends up being slower.

      The learning curve is less than you probably think. Assuming postgres, here is a query for the highest price during hour 6 over the last 5 days.
      use DBI;

      my $dbh = DBI->connect(
          "dbi:Pg:host=$host;database=$database",
          $user, $password,
          { AutoCommit => 0, RaiseError => 1, },
      ) or die "Can't connect: $DBI::errstr";

      my $data = $dbh->selectall_arrayref(qq{
          SELECT MAX(price) as max_price
          FROM data_log
          WHERE price_date > now()::date - 5
            AND to_char(price_date, 'HH24') = '06'
      }) or die "Cannot prepare: $DBI::errstr";

      print $data->[0][0];
      As you see, the DBI API is not that complex. Basic selects are not that hard. Inserts aren't that bad either. The real trickiness will be figuring out the date-time handling in your database, but you can just look at an appropriate manual until you get familiar with it. Oh right, and you have to create the table and indexes. But you only have to figure that out once.
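
      As a rough sketch of that one-time setup, assuming the data_log table and the price/price_date columns from the query above (adjust names and types to fit your data):

      # One-time setup, reusing the $dbh from the snippet above. AutoCommit is
      # off, so remember to commit. Table, column and index names are assumptions.
      $dbh->do(qq{
          CREATE TABLE data_log (
              price_date  timestamp NOT NULL,
              price       numeric   NOT NULL
          )
      });
      $dbh->do(q{CREATE INDEX data_log_price_date_idx ON data_log (price_date)});
      $dbh->commit;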

      Two pieces of not-so-obvious advice:

      1. Batch the import. If the database you decide on has batch import tools that can handle your format, use them. Otherwise turn off AutoCommit when creating the DBI object and commit only once every thousand (ten thousand? ... that depends) records. This will speed the import up quite a bit; there is a sketch of the idea below this list.

      2. Make sure you define indexes on your tables. Not too few and not too many. If the database you choose lets you see the "estimated execution plan" of the query generating the report, use that and make sure it doesn't use "table scans" on tables you only need a few rows from, etc. Don't be afraid to play with this a bit: create an index, see what it does to the estimated execution plan and estimated cost, and see how long the query runs (an EXPLAIN example also follows below).
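
      A minimal sketch of the batched-commit idea from point 1, assuming the data_log table from the earlier query and a tab-separated input file (the file name and column layout here are just placeholders):

      # Insert rows and commit in batches instead of once per row.
      # Assumes $dbh was created with AutoCommit => 0 and RaiseError => 1, as above.
      my $sth = $dbh->prepare(
          'INSERT INTO data_log (price_date, price) VALUES (?, ?)'
      );
      my $count = 0;
      open my $fh, '<', 'prices.tsv' or die "Can't open prices.tsv: $!";
      while (my $line = <$fh>) {
          chomp $line;
          my ($date, $price) = split /\t/, $line;
          $sth->execute($date, $price);
          $dbh->commit unless ++$count % 1000;   # commit every 1,000 rows
      }
      $dbh->commit;                              # commit the final partial batch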
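
      For point 2, one quick way to look at the estimated plan without leaving Perl is to run EXPLAIN through the same handle (a sketch against the same assumed data_log table; in postgres a full table scan shows up as "Seq Scan" and an index lookup as "Index Scan"):

      # Ask postgres for the estimated plan of the report query and print it,
      # reusing the $dbh from above.
      my $plan = $dbh->selectall_arrayref(qq{
          EXPLAIN
          SELECT MAX(price) as max_price
          FROM data_log
          WHERE price_date > now()::date - 5
            AND to_char(price_date, 'HH24') = '06'
      });
      print "$_->[0]\n" for @$plan;   # look for "Index Scan" vs "Seq Scan"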

      You may spend a lot of time on this at first, but as management starts inventing more and more reports that they'd like, it will pay off.

        Depending on the size of the data load, if a loader tool isn't available and you're adding significantly more rows than already exist in the table (e.g., if you truncate and replace when you do your load), the indexing may slow down the import.

        If you have this sort of situation, it may be better to drop the indexes, do the data load, and then recreate the indexes. In some cases, if you don't trust the data being loaded, you may need to retain your unique indexes to verify that you don't have duplicated records.
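
        A rough sketch of that drop-load-recreate pattern, reusing the assumed data_log table and index name from above:

        # Drop the index, bulk-load, then rebuild the index and refresh the
        # planner's statistics. Index/table names are assumptions; load_rows()
        # is a hypothetical stand-in for your bulk load (e.g. the batched insert above).
        $dbh->do(q{DROP INDEX IF EXISTS data_log_price_date_idx});
        load_rows($dbh);
        $dbh->do(q{CREATE INDEX data_log_price_date_idx ON data_log (price_date)});
        $dbh->do(q{ANALYZE data_log});
        $dbh->commit;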