For example, one user enters a list of data for a school. Then another user enters another list of data for that same school, then another user, and so on.
So how do I detect the redundant data?
TQ
How do you determine redundant schools?
I had to import data from a system that might have had the 'University of Louisville Speed School' as 'UL', 'U of L', 'U Louisville', 'Univ. Louisville', 'Speed School', etc.
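A rough sketch of how name variants like these could be folded into one canonical name before any comparison. The alias table, `normalize` and `canonical_name` are hypothetical names made up for illustration, and Python is used only to keep the example short. A real list of variants would have to be built from the actual data.

```python
import re

CANONICAL = "University of Louisville Speed School"

# Hypothetical alias table: keys are normalized variants, values are the canonical name.
ALIASES = {
    "ul": CANONICAL,
    "u of l": CANONICAL,
    "u louisville": CANONICAL,
    "univ louisville": CANONICAL,
    "speed school": CANONICAL,
}

def normalize(name):
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())
    return " ".join(cleaned.split())

def canonical_name(name):
    """Return the canonical name for a known variant, else the trimmed input."""
    return ALIASES.get(normalize(name), name.strip())
```

With this in place, `canonical_name("Univ. Louisville")` and `canonical_name("U of L")` both come back as the full school name, so an exact-match duplicate check has a chance of catching them. Names that are not in the table pass through unchanged.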
If you're looking for exact string duplicates, it's fairly easy to do in SQL. Assuming we're looking for duplicated entries of field1, field2:
-- report every (field1, field2) combination stored more than once
SELECT COUNT(*) AS duplicates,
       field1,
       field2
FROM some_table
GROUP BY field1, field2
HAVING COUNT(*) > 1
Then you know which records to bother looking at, rather than having to go through the whole table.
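To try the grouping query without touching a real database, something like the following could work. It is only a sketch: it uses Python's built-in sqlite3 module with an in-memory database, and the sample rows are invented. It filters on `COUNT(*) > 1`, so any combination stored twice or more is reported.

```python
import sqlite3

def find_duplicates(conn):
    """Return (count, field1, field2) for every combination stored more than once."""
    query = """
        SELECT COUNT(*) AS duplicates, field1, field2
        FROM some_table
        GROUP BY field1, field2
        HAVING COUNT(*) > 1
        ORDER BY field1, field2
    """
    return conn.execute(query).fetchall()

# Throwaway in-memory database with invented sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (field1 TEXT, field2 TEXT)")
conn.executemany(
    "INSERT INTO some_table VALUES (?, ?)",
    [("big", "small"), ("big", "small"), ("extra", "size")],
)
dupes = find_duplicates(conn)
```

Here `dupes` holds one row for the ("big", "small") pair with a count of 2, and the ("extra", "size") pair is left out because it occurs only once.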
When the second (redundant) set of data is entered, you should notice that the database already holds an entry for it.
At that point you either throw away the redundant data or replace/edit the existing database entry.
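A minimal sketch of that check-at-entry idea, assuming a made-up `school_data` table in which the pair (school, item) decides whether two entries are "the same". The table, its columns and `add_entry` are hypothetical, and Python's sqlite3 module is used only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the UNIQUE constraint states what counts as the same entry
# and acts as a safety net if two users insert at the same moment.
conn.execute(
    "CREATE TABLE school_data ("
    " school TEXT, item TEXT, value TEXT,"
    " UNIQUE (school, item))"
)

def add_entry(conn, school, item, value, replace=False):
    """Insert a row; if the key already exists, skip it or overwrite it."""
    existing = conn.execute(
        "SELECT value FROM school_data WHERE school = ? AND item = ?",
        (school, item),
    ).fetchone()
    if existing is None:
        conn.execute(
            "INSERT INTO school_data VALUES (?, ?, ?)", (school, item, value)
        )
        return "inserted"
    if replace:
        conn.execute(
            "UPDATE school_data SET value = ? WHERE school = ? AND item = ?",
            (value, school, item),
        )
        return "replaced"
    return "skipped"

first = add_entry(conn, "Speed School", "size", "big")
second = add_entry(conn, "Speed School", "size", "small")
third = add_entry(conn, "Speed School", "size", "small", replace=True)
```

The three calls return "inserted", "skipped" and "replaced" in that order. Which columns belong in the unique key is exactly the "what is redundant?" question, so that part has to come from whoever knows the data.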
Perhaps you need to show us the sort of code you have currently and explain where the problem is?
Perl is Huffman encoded by design.
You still have to define "redundant". If you properly normalize addresses (something almost no one does), then each street should have one and only one entry in the database. However, five guys with the first name of "John" should probably not have that abstracted away into a single entry. Just because the data looks the same does not mean that it's the same thing.
Further, and this is a heresy that many database purists would be horrified by, there are times that DBAs will deliberately leave data denormalized for performance reasons (though this should not be done until you've gone down other avenues of correcting the problem).
We may be able to be more specific if you can describe at a higher level the problem you're trying to solve.
Well, I haven't coded anything yet for the redundancy checker part. I am still planning how best to do it. Array?
I've done the data input part, but that's just a simple SQL insert, and all the data is placed in the database
i.e.
data1 | data2 | data3 | data4  | data5
big   | small | large | medium | good
extra | size  | bad   | small  | nice
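On the "Array?" question: if the rows are already in memory, a lookup structure keyed on the whole row (a hash in Perl, a set in Python) tends to be simpler than scanning an array for each new row. The sketch below assumes that two rows are redundant only when all five columns match exactly. That is an assumption, not something the thread has settled, and `split_duplicates` is a hypothetical helper name.

```python
def split_duplicates(rows):
    """Separate rows into first occurrences and exact repeats."""
    seen = set()
    unique, duplicates = [], []
    for row in rows:
        key = tuple(row)  # the whole row is the key; narrow this if fewer columns matter
        if key in seen:
            duplicates.append(key)
        else:
            seen.add(key)
            unique.append(key)
    return unique, duplicates

rows = [
    ("big", "small", "large", "medium", "good"),
    ("extra", "size", "bad", "small", "nice"),
    ("big", "small", "large", "medium", "good"),
]
unique, duplicates = split_duplicates(rows)
```

With these three rows, `unique` keeps the first two and `duplicates` holds the repeated first row. For data that lives only in the database, letting SQL do the grouping is probably less work than pulling everything into memory.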