in reply to Re: Ignore Relation already exists error from DBI
in thread Ignore Relation already exists error from DBI

Thanks, guys, for the ideas. I will just have to get the regular expression correct.

I am dealing with NetFlow data, which records flows through a router. I am just grouping the flow data from each day into a separate table until it is a week old, and then I will aggregate the data some more.

Does this still sound like a bad design?

Re: Re: Re: Ignore Relation already exists error from DBI
by Tomte (Priest) on Feb 19, 2003 at 12:55 UTC
    Does this still sound like a bad design?

    To my ears, it does :-)
    The date itself makes for a perfect column, if you think about it.
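
    A sketch of what that could look like (the column names and types here are invented for illustration, as are the connection details):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI;

    # DSN and credentials are placeholders -- substitute your own.
    my $dbh = DBI->connect('dbi:Pg:dbname=netflow', 'user', 'password',
                           { RaiseError => 1 }) or die DBI->errstr;

    # One table for all flows; each row carries its own date.
    $dbh->do(q{
        CREATE TABLE netflow_log (
            flow_date  date,
            src_addr   inet,
            dst_addr   inet,
            bytes      bigint
        )
    });

    $dbh->disconnect;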

    regards,
    tomte


      There is a timestamp entry for each record; it is just that I have to call the table something :-))

      Can you help with my regular expression? I was having difficulty, so I have tried both suggestions, and this is what I'm getting back:

      DBD::Pg::st execute failed: ERROR: Relation '20030219' already exists at ./dbasetest.pl line 21.
      Database handle destroyed without explicit disconnect.
      Database handle destroyed without explicit disconnect.

      I have tried various bits and pieces, including:

      return if $error =~ /Relation\s.0-9*.\salready\sexists/;

      Thanks again.

        Well, I guess cisco_netflow_log would be a more accurate name then... :-)
        You sounded as if you would create a new table each day, and that would be a thing to think over.

        #!/usr/bin/perl

        my $test = "Relation '20030219' already exists";
        print "Result: ", $test =~ /Relation\s'[0-9]*'\salready\sexists/, "\n";

        __END__
        Result: 1
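
        And wired into the actual DBI call, it might look something like this (a sketch only: the DSN, credentials, and column definitions are placeholders, not taken from your setup):

        #!/usr/bin/perl
        use strict;
        use warnings;
        use DBI;

        # Connection details are placeholders -- substitute your own.
        my $dbh = DBI->connect('dbi:Pg:dbname=netflow', 'user', 'password',
                               { RaiseError => 1 }) or die DBI->errstr;

        my $table = '20030219';    # a per-day table name, as in your error message

        # With RaiseError on, a failed do() dies; trap it in eval and
        # rethrow anything that isn't the "already exists" complaint.
        # (The column definitions are invented for illustration.)
        eval {
            $dbh->do(qq{CREATE TABLE "$table" (ts timestamp, bytes integer)});
        };
        if (my $error = $@) {
            die $error unless $error =~ /Relation\s'[0-9]*'\salready\sexists/;
        }

        $dbh->disconnect;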
        HTH,

        regards,
        tomte


Re: Re: Re: Ignore Relation already exists error from DBI
by dws (Chancellor) on Feb 19, 2003 at 17:19 UTC
    I am just grouping the flow data from each day into a separate table until it is a week old and then I will aggregate the data some more. ... Does this still sound like a bad design?

    Yes. By keeping data in separate tables, you're making more work for yourself. Databases are good at aggregating information, and it's easiest if all of the data is in a single table.

    If I were doing this, I'd add a (non-key) field to hold the date, and would keep all of the data in a single table. Then, I could do something like

    SELECT sum(somefield) FROM table WHERE datefield BETWEEN ? AND ?
    to do aggregate calculations. Or
    SELECT datefield, sum(somefield) FROM table WHERE datefield BETWEEN ? AND ? GROUP BY datefield
    if I needed the aggregates to apply to distinct dates.

    Flushing old data is easy.

    DELETE FROM table WHERE datefield < ?
    These are all static queries. By keeping all of the data in a single table, I don't have to construct new queries to plug in a dynamic table name. And I'm letting the database do the aggregate calculations, rather than pulling all of the data over the wire to the client and doing it there.

    If your particular calculations are more complicated, and can't be done in SQL, you may still be able to do part of the aggregation on the server side, reducing the amount of data you have to pull across the wire into the client.
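
    In DBI terms, those static queries can be prepared once with placeholders and reused; a minimal sketch (the table name and connection details are invented, and the column names follow the snippets above):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI;

    # DSN and credentials are placeholders.
    my $dbh = DBI->connect('dbi:Pg:dbname=netflow', 'user', 'password',
                           { RaiseError => 1 }) or die DBI->errstr;

    # The SQL never changes; only the bound dates do.
    my $sum_sth   = $dbh->prepare(
        'SELECT sum(somefield) FROM flows WHERE datefield BETWEEN ? AND ?');
    my $purge_sth = $dbh->prepare(
        'DELETE FROM flows WHERE datefield < ?');

    # Aggregate a week's worth of data on the server side.
    $sum_sth->execute('2003-02-12', '2003-02-19');
    my ($total) = $sum_sth->fetchrow_array;
    print "Weekly total: ", (defined $total ? $total : 0), "\n";

    # Flush anything older than a week.
    $purge_sth->execute('2003-02-12');

    $dbh->disconnect;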

      Hi all,

      This has been great; thanks, all, for the input.

      At the moment, even pushing only about 150Gb a month, we are getting about 14,000 flows per 5-minute period during the day. I just assumed that if I kept 5-minute data for a week, even managing to aggregate somewhat, I would still end up with tables with millions of records in them (at that peak rate, 288 x 14,000 is roughly 4 million rows a day, or close to 30 million a week).

      Having looked around the Postgres site, maybe I'm being foolish :-) I will go with the one table and see how I go.

      Thanks, all, for the advice; it has made my life a lot easier.