in reply to Comparing CSV files

Well, have a look at my CSV table diff utility. In regard to the other answers so far - davido has suggested the basic modules, they all three can use alternate separators, not just commas. And dragonchild - diff wouldn't do too good of a job if the files had been created as database dumps or with DBD::CSV because even where they had the same rows, they might be in different places in the file depending on what had or hadn't been deleted, inserted, updated, etc.

Replies are listed 'Best First'.
Re^2: Comparing CSV files
by pg (Canon) on Oct 15, 2004 at 05:36 UTC

    DBD::CSV is a good choice, but it only supports very basic SQL statement, otherwise he can see what has been deleted just by saying:

    select foo, bar from table1 minus select foo, bar from table2

    And see what has been added by saying:

    select foo, bar from table2 minus select foo, bar from table1

    Not just minus is not supported, even sub-query is not well supported, so he has to do more coding than he should.

      Fascinating - I've never seen the minus keyword for SQL yet. Up to day, I've always used left outer joins to find out the missing parts between two sets of keys:

      -- in table2 but not in table1: SELECT r.id FROM table2 AS r LEFT JOIN table1 AS l ON r.id = l.id WHERE r.system = ? and l.id IS NULL and r.id IS NOT NULL -- in table1 but not in table2: SELECT l.id FROM table2 AS l LEFT JOIN table1 AS r ON l.id = r.id WHERE l.system = ? and l.id IS NOT NULL and r.id IS NULL SQL

      SQLite doesn't seem to document the minus keyword, so I'll stay with my (left) outer join solution, as SQLite doesn't support the full outer join. Of course, I haven't checked whether DBD::CSV supports outer joins, and I guess it doesn't jZed tells me it does, so these two SQL statements will be put to more and more use. How convenient if your whole business logic can be expressed in two SQL statements.

        Yes MINUS does look inteteresting. AFAIK it isn't available for MySQL, PostgresSQL, SQLite, or mentioned in ANSI SQL.

        As for DBD::CSV, pg is correct that it lacks subselects, but it does support NATURAL, INNER, LEFT, RIGHT, and FULL OUTER joins of two tables and would work with the examples Corion shows.