Re^2: Process and combine two CSV files into one

I'm just not getting this join right. If I do a

my @row = $dbh->selectall_arrayref("SELECT IP, ServerName, Domain, DaysUptime, OS, RAM, OSSP, InstallDate, CPUSpeed, CPUCount, CPUType FROM hosts LEFT JOIN info ON hosts.IP = info.IP");

My 2 tables get joined. The only issue is that I get a warning:

Execution ERROR: Ambiguous column name 'IP' called from /usr/lib/perl5/vendor_perl/5.8.6/i586-linux-thread-multi/DBI.pm at 1557.

But if I try and change a column name, every field returned contains the "IP" value, regardless of what the true value should be.

Also, when I try to join a 3rd table it takes a long time. The 2 tables are done in less than 15 seconds. When I add the 3rd table it's just over 5 minutes.

my @row = $dbh->selectall_arrayref("SELECT IP, ServerName, Domain, DaysUptime, OS, RAM, OSSP, InstallDate, CPUSpeed, CPUCount, CPUType, PartitionFree FROM hosts, disks LEFT JOIN info ON hosts.IP = info.IP AND hosts.IP = disks.IP");

In the end, I want ALL data from hosts, and the relevant data from the other tables.
Once I have these 3 tables combined, I'd like to print them out in a CSV-type format. How do I do that?

Re^3: Process and combine two CSV files into one
by anonymized user 468275 (Curate) on Aug 15, 2005 at 08:02 UTC

Changing the column name (all other things being equal) simply forced the engine to interpret the name as a literal, which is why every field came back with the same value. You want to undo that change.
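For the ambiguity warning itself, the usual SQL remedy is to qualify the shared column with its table name rather than rename it. The following is only a minimal sketch: it assumes DBD::SQLite and an in-memory database as a stand-in for the real data source, and the cut-down tables and sample rows are hypothetical. Whether the driver used in the question accepts table-qualified names in the select list has not been tested here. Note also that selectall_arrayref returns a reference to an array of rows, so it is stored in a scalar.

```perl
use strict;
use warnings;
use DBI;

# Assumption: in-memory SQLite stands in for the real data source.
my $dbh = DBI->connect("dbi:SQLite:dbname=:memory:", "", "",
                       { RaiseError => 1 });

# Hypothetical, cut-down versions of the hosts and info tables.
$dbh->do("CREATE TABLE hosts (IP TEXT, ServerName TEXT)");
$dbh->do("CREATE TABLE info  (IP TEXT, OS TEXT)");
$dbh->do("INSERT INTO hosts VALUES ('10.0.0.1', 'alpha')");
$dbh->do("INSERT INTO hosts VALUES ('10.0.0.2', 'beta')");
$dbh->do("INSERT INTO info  VALUES ('10.0.0.1', 'Linux')");

# hosts.IP names the table explicitly, so IP is no longer ambiguous.
my $rows = $dbh->selectall_arrayref(
    "SELECT hosts.IP, hosts.ServerName, info.OS
       FROM hosts LEFT JOIN info ON hosts.IP = info.IP
      ORDER BY hosts.IP"
);

# Each element of @$rows is itself an array reference holding one row.
# Hosts without a matching info row have undef in the OS column.
for my $row (@$rows) {
    print join(",", map { defined $_ ? $_ : "" } @$row), "\n";
}
```

The LEFT JOIN keeps every row from hosts, which matches the stated goal of wanting all data from hosts plus whatever matches in the other tables.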

It is normal for three tables to take much longer than two to join. The performance strategy for joining n tables, where n > 2, is as follows:

- Join the first two tables placing the required columns in a temporary table.

- Then join the third table with the temporary table, putting the results in a second temporary table, and drop the first temporary table.

- Continue this iterative process, joining the latest temporary results table with the next real table. When only the last real table remains, join it with the last temporary table and read the final results directly instead of storing them in another temporary table. This way no more than two tables are physically joined at once, even though any number of tables have been logically joined.

- If this query is intended to be re-used, it should be placed in a stored procedure, not inside Perl code, to avoid unnecessary communication overhead during execution, especially now that it has been split up. For this reason, and for performance as well as other considerations, any re-used process of more than one SQL statement should ideally be placed inside a stored procedure.
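The staged approach in the list above can be sketched with DBI roughly as follows. This is an illustration under stated assumptions, not a tested solution for the original data: it assumes DBD::SQLite with an in-memory database, a driver that supports CREATE TEMPORARY TABLE ... AS SELECT, and hypothetical cut-down tables and sample rows. The temporary table name hosts_info is made up for the example. With three tables only one temporary table is needed, because the last join is read directly.

```perl
use strict;
use warnings;
use DBI;

# Assumption: in-memory SQLite stands in for the real data source.
my $dbh = DBI->connect("dbi:SQLite:dbname=:memory:", "", "",
                       { RaiseError => 1 });

# Hypothetical, cut-down versions of the three tables.
$dbh->do("CREATE TABLE hosts (IP TEXT, ServerName TEXT)");
$dbh->do("CREATE TABLE info  (IP TEXT, OS TEXT)");
$dbh->do("CREATE TABLE disks (IP TEXT, PartitionFree TEXT)");
$dbh->do("INSERT INTO hosts VALUES ('10.0.0.1', 'alpha')");
$dbh->do("INSERT INTO hosts VALUES ('10.0.0.2', 'beta')");
$dbh->do("INSERT INTO info  VALUES ('10.0.0.1', 'Linux')");
$dbh->do("INSERT INTO disks VALUES ('10.0.0.1', '12GB')");

# Step 1: join the first two tables into a temporary table,
# keeping only the required columns.
$dbh->do(
    "CREATE TEMPORARY TABLE hosts_info AS
     SELECT hosts.IP AS IP, hosts.ServerName AS ServerName, info.OS AS OS
       FROM hosts LEFT JOIN info ON hosts.IP = info.IP"
);

# Step 2: disks is the last real table, so join it with the temporary
# table and fetch the final results directly.
my $joined = $dbh->selectall_arrayref(
    "SELECT hosts_info.IP, hosts_info.ServerName, hosts_info.OS,
            disks.PartitionFree
       FROM hosts_info LEFT JOIN disks ON hosts_info.IP = disks.IP
      ORDER BY hosts_info.IP"
);

# Clean up the intermediate table once the results are in memory.
$dbh->do("DROP TABLE hosts_info");

for my $row (@$joined) {
    print join(",", map { defined $_ ? $_ : "" } @$row), "\n";
}
```

Each statement here joins exactly two tables, and both joins are LEFT JOINs anchored on the hosts side, so hosts that have no info or disks rows still appear in the output with empty fields.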

Hope this helps!

-S

One world, one people
