in reply to redundancy Checker

First ask yourself, "What constitutes redundant data?"


Perl is Huffman encoded by design.

Re^2: redundancy Checker
by Anonymous Monk on Jul 27, 2005 at 02:01 UTC
    For example, one user enters a list of data for a school. Then another user enters another list of data for the same school, then another user, and so on. How do I detect the redundant data? Thanks.

      How do you determine redundant schools?

      I had to import data from a system that might have had 'University of Louisville Speed School' as 'UL', 'U of L', 'U Louisville', 'Univ. Louisville', 'Speed School', etc.
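One way to handle variants like these is to map every known spelling to a single canonical name before comparing anything. This is a minimal sketch, assuming a hand-built alias table; `%canonical` and `canonical_school` are hypothetical names, and the real alias list would have to be compiled from the actual data.

```perl
use strict;
use warnings;

my $CANON = 'University of Louisville Speed School';

# Hypothetical alias table: each known spelling (lowercased, single-spaced)
# maps to one canonical name. The real list has to be built by hand.
my %canonical;
$canonical{$_} = $CANON for (
    'university of louisville speed school',
    'ul',
    'u of l',
    'u louisville',
    'univ. louisville',
    'speed school',
);

# Lowercase, trim and collapse whitespace, then look the name up.
# Names with no known alias are returned unchanged.
sub canonical_school {
    my ($name) = @_;
    my $key = lc $name;
    $key =~ s/^\s+|\s+$//g;
    $key =~ s/\s+/ /g;
    return exists $canonical{$key} ? $canonical{$key} : $name;
}

print canonical_school('  U of L '), "\n";   # prints University of Louisville Speed School
```

Once every name goes through the same function, the exact-match techniques below start working on data that originally didn't match.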

      If you're looking for exact string duplicates, it's fairly easy to do in SQL. Assuming we're looking for duplicated combinations of field1 and field2:

      SELECT COUNT(*) AS duplicates, field1, field2
      FROM some_table
      GROUP BY field1, field2
      HAVING COUNT(*) > 1

      Then you know which records to bother looking at, rather than having to go through the whole table.
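If the rows are already in Perl (for example, fetched with DBI), the same check can be done with a hash that counts each (field1, field2) pair. This is a minimal sketch, assuming each row is an array ref with field1 and field2 in the first two columns; `duplicate_groups` is a hypothetical name.

```perl
use strict;
use warnings;

# Count how often each (field1, field2) pair occurs and return the pairs seen
# more than once, like GROUP BY field1, field2 HAVING COUNT(*) > 1.
# Each row is an array ref; field1 and field2 are assumed to be columns 0 and 1.
sub duplicate_groups {
    my @rows = @_;
    my %count;
    for my $row (@rows) {
        # "\x1F" (ASCII unit separator) is assumed never to appear in the data.
        my $key = join "\x1F", @{$row}[0, 1];
        $count{$key}++;
    }
    # Return [field1, field2, count] for every repeated pair, in sorted order.
    return map  { [ split(/\x1F/, $_, -1), $count{$_} ] }
           grep { $count{$_} > 1 }
           sort keys %count;
}
```

For large tables, letting the database do the GROUP BY is usually cheaper than pulling every row into Perl first.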

      When the second (redundant) data is entered, you should check whether the database already has an entry for it.

      At that point you either throw away the redundant data or replace/edit the existing database entry.
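A minimal sketch of that insert-time check, using a Perl hash as a stand-in for the database. `add_record` and `%stored` are hypothetical, fields are compared case-insensitively (an assumption), and this version takes the "throw away" branch; the "replace/edit" branch would overwrite the existing entry instead. In a real database, a UNIQUE index on the identifying columns lets the database itself reject the second insert.

```perl
use strict;
use warnings;

# Stand-in for the database table: identifying key => stored record.
my %stored;

# Hypothetical helper: store a record unless an equivalent one already exists.
# Fields are compared case-insensitively (an assumption; adjust to taste).
# Returns 1 if the record was added, 0 if it was redundant and discarded.
sub add_record {
    my @fields = @_;
    my $key = join "\x1F", map { lc $_ } @fields;
    return 0 if exists $stored{$key};
    $stored{$key} = [@fields];
    return 1;
}
```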

      Perhaps you need to show us the sort of code you have currently and explain where the problem is?



        You still have to define "redundant". If you properly normalize addresses (something almost no one does), then each street should have one and only one entry in the database. However, five guys with the first name of "John" should probably not have that abstracted away into a single entry. Just because the data looks the same does not mean that it's the same thing.

        Further, and this is a heresy that would horrify many database purists, there are times when DBAs deliberately leave data denormalized for performance reasons (though this should not be done until you've exhausted other ways of fixing the problem).

        We may be able to be more specific if you can describe at a higher level the problem you're trying to solve.

        Cheers,
        Ovid

        New address of my CGI Course.

        Well, I haven't coded anything yet for the redundancy-checker part. I am still planning how best to do it. An array?

        I've done the data-input part, but that's just a simple SQL insert, and all the data are placed in the database.

        For example:

        data1 | data2 | data3 | data4  | data5
        big   | small | large | medium | good
        extra | size  | bad   | small  | nice
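For whole-row duplicates in a table like this, a hash is usually a better fit than an array: join each row into a string key, and a repeat is found with one lookup rather than a scan of every earlier row. A minimal sketch, assuming the rows have already been fetched into Perl; the third row is a made-up repeat of the first, added only for illustration.

```perl
use strict;
use warnings;

# Rows as they might come back from the five-column table above.
# The third row is a hypothetical repeat of the first.
my @rows = (
    [qw(big   small large medium good)],
    [qw(extra size  bad   small  nice)],
    [qw(big   small large medium good)],
);

# The joined row is the hash key; $seen{$key}++ is false the first time
# a key appears and true on every later repeat.
my %seen;
my (@unique, @redundant);
for my $row (@rows) {
    my $key = join "\x1F", @$row;
    if ($seen{$key}++) {
        push @redundant, $row;
    } else {
        push @unique, $row;
    }
}

printf "%d unique, %d redundant\n", scalar @unique, scalar @redundant;
```

It prints `2 unique, 1 redundant`. To treat only some columns as identifying, build the key from those columns instead of the whole row.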