in reply to Array of Hash interface to an Array of Array?

If you are dealing with that much data, you obviously have some means of storing it - most likely a database.... or something that can be viewed as a database - there are several modules that let you view a file as a database - I have heard of Berkleydb, but have not used that myself.

I suggest that you leverage the database methods, particularly DBI to manage accessing and storing this information. This will make your code easier to read, and a lot more scalable.

  • Comment on Re: Array of Hash interface to an Array of Array?

Replies are listed 'Best First'.
Re: Re: Array of Hash interface to an Array of Array?
by jaa (Friar) on Sep 01, 2003 at 14:18 UTC
    Thanks for the suggestion - I am familiar with databases, stored procs, set-relational theory/practice and various flavours of SQL, and the DBI interface.

    Unfortunately the DP in this instance cannot be expressed in SQL - without going into detail, it is complex and iterative.

    Regards,

    Jeff

    PS My favourite SQL shortcoming is:

    select name, sum(value) as value from mytable group by name order by abs(value) desc

    I only know of a single SQL engine that will perform this query correctly

      My favourite SQL shortcoming is:

      OK, at the risk of making a fool of myself, I'll bite.

      I believe that that statement is syntactically incorrect. If it works in one engine, I would say it is broken. (Which one are we talking about, by the way)?

      When you say sum(value) as value, in your order by clause, you can't use value (i.e. the right hand side of the selected item clause) as the aggregator. You must use sum(value) (the left hand side of the selected item clause).

      I just whipped up a sample table in Postgresql. It chokes on your example, but renders the correct results on mine. Consider:

      select * from foo; name | value ------+------- foo | -10 foo | -6 foo | 12 foo | 34 bar | 6 bar | 23 bar | 76 rat | -6 rat | -60 (9 rows)

      and the result:

      select name, sum(value) as value from foo group by name order by abs(s +um(value)) desc; name | value ------+------- bar | 105 rat | -66 foo | 30 (3 rows)

      Looks good to me. Now, it may well be that you can use what I term "the right hand side" in group by, but I think I'll continue using the left hand side. Even if it is clunky and redundantly verbose, at least I know what's going on.

      And I have been known to write Perl scripts that generate selected aggregates paired up with corresponding group by clauses just to ensure that it's all correct and avoid breaking the DRY principle.

        8-)

        I like this SQL trap because I believe that it is natural to want to sort descending by an absolute aggregated value.

        The real issue is that the alias of the result column being the same as the name of an input column causes some SQL engines to deliver incorrect results without alerting the programmer to the hidden problem.

        Many SQL engines attempt to eliminate the sort phase by assembling the data in the correct order whilst examining / aggregating input rows. In most engines, a GROUP BY will automatically order a result set, but when present, the ORDER BY will take precedence. When the ORDER BY contains a function - e.g ABS() the function is called with the current input row value - not the aggregated resultset value which has not yet been finalised.

        Often, the results are only SOMETIMES incorrect - the nastiest form of bug. I tracked down an instance of this rare beetle, that was only seen on a handful of occasions in two years of processing.

        When I raised it with ****, the initial response was 'Why would you want to anyway?', followed by redirection to details of the SQL spec which indeed does not mention this facility.

        The following SQL at least complains if the SQL engine doesn't support sorting on an absolute aggregated value:

        select name, sum(value) as sumval from mytable group by name order by abs(sumval) desc

        I prefer named resultset columns rather than

        order by abs(2) desc

        I haven't worked with Postgresql to know if it does the right thing - which 'value' column was summed, absed and sorted during the query? Changing the alias would show us if it is really doing the Right Thing - if so, hoorah! A second DB that caters to common sense!

        FWIW the DWIM DB engine that works in this case is probably long defunct - I haven't seen it for years - it was called FoxPro, and I think it was captured by the Big Bad Wolf.