Combine data in table

DrAxeman has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Combine data in table by davidrw (Prior) on Aug 07, 2005 at 17:40 UTC
It's just a single SQL statement, which should be easy since you have it in DBD::CSV already.. `SELECT colIP, SUM(colPartitionSize) as size, SUM(colPartitionFree) as free FROM table_name GROUP BY colIP` [download] Otherwise, to do it in perl, you would loop over the rows, hashing on the IP. Something like: `my $rows = ...; # ref to AoH my %ips; foreach my $row (@$rows){ $ips{ $row->{colIP} }->{ colPartitionSize } += $row->{colPartitionSi +ze}; $ips{ $row->{colIP} }->{ colPartitionFree } += $row->{colPartitionFr +ee}; } print Dumper \%ips;` [download]	[reply] [d/l] [select]
Re^2: Combine data in table by DrAxeman (Scribe) on Aug 07, 2005 at 18:02 UTC
This is what I did: `# Connect to the database my $dbh = DBI->connect("DBI:CSV:f_dir=.;csv_eol=\n;"); # Associate our csv file with the table name 'results' $dbh->{'csv_tables'}->{'results'} = { 'file' => "file.csv"}; my $row = "IP,Partition,PartitionSize,PartitionFree"; my %ips; foreach my $row (@$row){ $ips{ $row->{colIP} }->{ colPartitionSize } += $row->{colPartitionSi +ze}; $ips{ $row->{colIP} }->{ colPartitionFree } += $row->{colPartitionFr +ee}; } print Dumper \%ips;` [download] But I am getting: `Can't use string ("IP,Partition,PartitionSize,Parti") as an ARRAY ref +while "strict refs" in use at ./psfile.pl line 39.` [download]	[reply] [d/l] [select]
Re^3: Combine data in table by davidrw (Prior) on Aug 07, 2005 at 18:09 UTC
I would recommend using the "SELECT ... GROUP BY" SQL statement -- it's very conscise and exactly what you want, and leverages the fact you're using DBD/CSV .. As for your code in this post, though, you have `my $row = "IP,Partition,PartitionSize,PartitionFree";` which just sets some string as the value for `$row` .. you need it to be all the data as an array of hashrefs .. `my $rows = $dbh->selectall_arrayref("SELECT * FROM your_table", {Slice +=>{}}, ); my %ips; foreach my $row (@$rows){ $ips{ $row->{colIP} }->{ colPartitionSize } += $row->{colPartitionSi +ze}; $ips{ $row->{colIP} }->{ colPartitionFree } += $row->{colPartitionFr +ee}; } print Dumper \%ips;` [download] (Note that this is inefficient, as well -- would be better to use a `$sth` and fetch one row at a time instead of sucking in all the rows at once)	[reply] [d/l] [select]
Re^4: Combine data in table by DrAxeman (Scribe) on Aug 07, 2005 at 19:10 UTC
Re^5: Combine data in table by davidrw (Prior) on Aug 07, 2005 at 19:30 UTC
Some notes below your chosen depth have not been shown here
Re^4: Combine data in table by DrAxeman (Scribe) on Aug 07, 2005 at 23:25 UTC
Re^5: Combine data in table by davidrw (Prior) on Aug 08, 2005 at 01:19 UTC
Re: Combine data in table by jZed (Prior) on Aug 07, 2005 at 18:22 UTC
The solution is even easier than the other (good) suggestions in this thread. Use the new CREATE TABLE AS SELECT syntax to combine the three operations (remove a column, sum the columns, create a new table with the sums) into a single statement. This is a complete script that does exactly that, it creates a new CSV file called "partition_new.csv" with exactly the result set you specified: #!/usr/bin/perl -w use strict; use DBI; my $dbh=DBI->connect("dbi:CSV(RaiseError=1):csv_eol=\n"); $dbh->{csv_tables}->{partition}->{file}='partition.csv'; $dbh->{csv_tables}->{partition_new}->{file}='partition_new.csv'; $dbh->do("DROP TABLE IF EXISTS partition_new"); $dbh->do(" CREATE TABLE partition_new AS SELECT colIP , SUM(colPartitionSize) AS colPartitionSize , SUM(colPartitionFree) AS colPartitionFree FROM partition GROUP BY colIP "); print `cat partition_new.csv`; [download]	[reply] [d/l]
Re: Combine data in table by davido (Cardinal) on Aug 07, 2005 at 17:03 UTC
Is the dataset so large that you need a relational database's SQL to manage it, or could you simply use Text::CSV to translate the CSV file into an easily manipulated datastructure? Dave	[reply]
Re^2: Combine data in table by jZed (Prior) on Aug 07, 2005 at 18:34 UTC
I'd suggest that the size of the dataset is not the deciding factor bewtween using something like Text::CSV_XS and DBD::CSV, although it could be a factor in deciding between DBD::CSV and more robust DBMSs like SQLite or PostgreSQL or MySQL. IMNSHO, the deciding factors in choosing between a Perl parsing approach and a database approach would be the comfort level of the programmer with SQL and the kind and complexities of operations. Sure, parsing modules and parsing without modules can delete, and insert, and update, and sum, and average, and modify column structure, etc. But when one finds oneself doing many of those operations, a database approach may be called for.	[reply]
Re^2: Combine data in table by DrAxeman (Scribe) on Aug 07, 2005 at 17:28 UTC
I've been using DBI and TEXT::CSV for my work. The files are not that large.	[reply]
Re: Combine data in table by AReed (Pilgrim) on Aug 07, 2005 at 18:04 UTC
One possible way in perl: use strict; use warnings; my %ips; # Deal with the column headers my $cols = <>; chomp ($cols); my @cols = split(',',$cols); print "$cols[0],$cols[2],$cols[3]\n"; while (<>) { my ($address, undef, $size, $free) = split(','); $ips{$address}{'size'} += $size; $ips{$address}{'free'} += $free; } # Note that the order of the lines in the output may not reflect # the order of the addresses in the input file. for my $address(sort keys %ips) { print "$address,$ips{$address}{'size'},$ips{$address}{'free'}\n"; } [download]	[reply] [d/l]


Perl: the Markov chain saw
	PerlMonks