Check For Dupes In FLat DB Before Adding

lisaw has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Check For Dupes In FLat DB Before Adding by northwind (Hermit) on Dec 14, 2005 at 18:13 UTC
So what exactly is the file format of your flat file DB? As in, is it "everything on one line", "everything in a set number of lines", or "variable records"? As for reading in a file:

    open(FILE_HANDLE, "<", $filename) || die "ACK, GASP: $error_message\n";
    my @data = <FILE_HANDLE>;
    chomp(@data);
    close FILE_HANDLE;

will slurp the entire flat file DB in and remove the newline character(s) from each line (depending on the size of the file, slurping entire files may not be a Good Idea™).
Re: Check For Dupes In FLat DB Before Adding by northwind (Hermit) on Dec 14, 2005 at 18:39 UTC
Reading between the lines of the OP, let's say the flat file has variable-length records with one value per line, and an empty line separates records. Then this code should work:

    open(FILE_HANDLE, "<", $filename) || die "ACK, GASP: $error_message: $!\n";
    my @data = <FILE_HANDLE>;
    chomp(@data);
    close FILE_HANDLE;

    my %db_hash;
    my $unique_id = 0;
    foreach (@data) {
        if (m/^\s*$/) {                   # An empty line ends the current record
            $unique_id++ if exists $db_hash{$unique_id};
            next;
        }
        s/^\s+//;                         # Remove excess whitespace
        s/\s+$//;
        next unless m/^(.*?)=(.*)$/;      # Grab key/value pairs
        $db_hash{$unique_id}{$1} = $2;
    }

    open(NEW_FILE, ">", $new_filename) || die "ACK, GASP: $error_message: $!\n";
    foreach my $id (sort { $a <=> $b } keys %db_hash) {
        if (not defined $db_hash{$id}{email}) {
            $db_hash{$id}{email} = $some_value;
        }
        foreach (sort keys %{ $db_hash{$id} }) {
            print NEW_FILE "$_=$db_hash{$id}{$_}\n";
        }
        print NEW_FILE "\n";              # Empty line between records
    }
    close NEW_FILE;

(Note: Code is untested! Use at your own risk. Also, there are a huge number of assumptions made in the above code because the OP was more than a little vague...)
Re: Check For Dupes In FLat DB Before Adding by nedals (Deacon) on Dec 15, 2005 at 02:52 UTC
If the file is not too big....

    my $found = 0;
    open(FH, "$filename") || die "Cannot open $filename for reading: $!\n";
    while (<FH>) {
        chomp;
        my ($email, $name) = split('=', $_);
        if ($email eq $inputted_email) {
            $found = 1;
            break;    ## exit loop;
        }
    }
    close FH;
    if (!$found) {
        ## Append to file
        open(FH, "<<", "$filename") || die "Cannot open $filename to append: $!\n";
        print FH "$inputted_email=$inputted_name\n";
        close FH;
    }
Re^2: Check For Dupes In FLat DB Before Adding by chas (Priest) on Dec 15, 2005 at 05:49 UTC
Do you mean "last" rather than "break"? Also, I don't think "<<" is a valid open mode. (Update/comment: If the record is found, why not "exit" rather than just break out of the loop? And otherwise just fall through to the append sub.)
Re^3: Check For Dupes In FLat DB Before Adding by nedals (Deacon) on Dec 15, 2005 at 06:24 UTC
> Do you mean "last" rather than "break"? Also, I don't think "<<" is a valid open mode.

You are right. Too much C recently. As to the open mode, it should read...

    ## Append to file
    open(FH, ">>", "$filename") || die "Cannot open $filename to append: $!\n";

> If the record is found, why not "exit" rather than just break out of the loop?

Because I don't know what needs to be done next.
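Putting the two corrections from this subthread together (`last` instead of `break`, and `>>` as the append mode), the check-then-append idea might look like the sketch below. The `email=name` line layout follows the code earlier in this subthread; the sub name `add_unless_dupe` and the sample addresses are made up for illustration, and there is no file locking, so it assumes a single writer.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: append "email=name" to $filename unless the email
# is already present. Returns 1 if the record was added, 0 if it was a dupe.
sub add_unless_dupe {
    my ($filename, $email, $name) = @_;

    my $found = 0;
    if (-e $filename) {
        open(my $in, '<', $filename)
            or die "Cannot open $filename for reading: $!\n";
        while (my $line = <$in>) {
            chomp $line;
            my ($existing) = split /=/, $line, 2;
            if (defined $existing && $existing eq $email) {
                $found = 1;
                last;    # Perl's loop exit; "break" is the C spelling
            }
        }
        close $in;
    }
    return 0 if $found;

    # ">>" opens for appending, creating the file if it does not exist
    open(my $out, '>>', $filename)
        or die "Cannot open $filename to append: $!\n";
    print {$out} "$email=$name\n";
    close $out or die "Cannot close $filename: $!\n";
    return 1;
}

# Usage on a throwaway file
my (undef, $demo_file) = tempfile(UNLINK => 1);
print add_unless_dupe($demo_file, 'lisa@example.com', 'Lisa'),   "\n";  # prints 1
print add_unless_dupe($demo_file, 'lisa@example.com', 'Lisa W'), "\n";  # prints 0
```

Returning a flag instead of calling `exit` leaves the "what needs to be done next" decision to the caller.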
Re^4: Check For Dupes In FLat DB Before Adding by chas (Priest) on Dec 15, 2005 at 13:17 UTC
Re^5: Check For Dupes In FLat DB Before Adding by pileofrogs (Priest) on Dec 15, 2005 at 18:20 UTC
Re: Check For Dupes In FLat DB Before Adding by chas (Priest) on Dec 14, 2005 at 18:23 UTC
I would likely open the file for reading and search, using a while loop and the match operator, for the 'email' record. If that exists then exit. Otherwise, seek to the beginning of the file, open a new file and write the contents of the existing file to the new file, adding the desired record at some point. How this is done would depend on the format of your file. (If it consists of lots of lines of text, then you can use the match operator to decide when to insert the desired record.) Finally, I'd rename the old file (with a .bak suffix perhaps) and the newly written file to "people.dat" or whatever. (Opening the file in append mode will only allow you to add the new record at the end, but if that's what you want then you might do that. You won't be able to add a line in the middle of the file, though.) (Update: Of course, do some torture tests with your code to make sure it works as desired before putting it into production.)
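The search, copy, and rename steps described here could be sketched roughly as follows. Several details are assumptions, not something the OP confirmed: one `email=name` record per line, lines kept in sorted order (which is what decides where the new record goes), a `.new` suffix for the scratch file, and the sub name `insert_via_rewrite`.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: rewrite $file with "$email=$name" added, keeping the
# old version as "$file.bak". Returns 0 without touching anything if the
# email is already present, 1 after a successful rewrite.
sub insert_via_rewrite {
    my ($file, $email, $name) = @_;

    open(my $in, '<', $file) or die "Cannot read $file: $!\n";

    # Pass 1: search for the email with the match operator.
    while (my $line = <$in>) {
        if ($line =~ /^\Q$email\E=/) {
            close $in;
            return 0;
        }
    }

    # Pass 2: rewind and copy everything to a new file, adding the record.
    seek($in, 0, 0) or die "Cannot rewind $file: $!\n";
    my $new = "$file.new";
    open(my $out, '>', $new) or die "Cannot write $new: $!\n";
    my $added = 0;
    while (my $line = <$in>) {
        # Assumed placement rule: insert before the first line that sorts later.
        if (!$added && $line gt "$email=") {
            print {$out} "$email=$name\n";
            $added = 1;
        }
        print {$out} $line;
    }
    print {$out} "$email=$name\n" unless $added;    # belongs at the end
    close $in;
    close $out or die "Cannot close $new: $!\n";

    # Keep the old file as a backup, then move the new one into place.
    rename($file, "$file.bak") or die "Cannot back up $file: $!\n";
    rename($new, $file)        or die "Cannot install $new: $!\n";
    return 1;
}

# Usage in a scratch directory that is removed when the script exits
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/people.dat";
open(my $seed, '>', $file) or die "Cannot seed $file: $!\n";
print {$seed} "$_\n" for 'amy@example.com=Amy', 'dan@example.com=Dan';
close $seed;

print insert_via_rewrite($file, 'bob@example.com', 'Bob'),    "\n";  # prints 1
print insert_via_rewrite($file, 'bob@example.com', 'Robert'), "\n";  # prints 0
```

If the new record may simply go at the end, the append-mode route mentioned in the parenthetical is shorter and avoids the rewrite entirely.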
Re: Check For Dupes In FLat DB Before Adding by pileofrogs (Priest) on Dec 14, 2005 at 19:25 UTC
Is there only one 'email' record per file and you're working on multiple files, or are there multiple records in one file that might or might not contain an email field?
Re^2: Check For Dupes In FLat DB Before Adding by lisaw (Beadle) on Dec 14, 2005 at 20:33 UTC
Hi! There are multiple records in one file, each with an email address and a name. I just need to check whether there are duplicate emails.
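A check for duplicate emails only, ignoring the names, could be sketched with a hash that counts each address. The one-record-per-line `email=name` layout is borrowed from the code in the other replies and is an assumption about the actual file; `find_duplicate_emails` is a made-up name.

```perl
use strict;
use warnings;

# Hypothetical helper: given lines of the assumed form "email=name",
# return the emails that occur more than once, in first-seen order.
sub find_duplicate_emails {
    my @lines = @_;
    my (%count, @order);
    for my $line (@lines) {
        chomp(my $copy = $line);
        my ($email) = split /=/, $copy, 2;
        next unless defined $email && length $email;    # skip blank lines
        push @order, $email unless $count{$email}++;    # remember first sighting
    }
    return grep { $count{$_} > 1 } @order;
}

my @dupes = find_duplicate_emails(
    'lisa@example.com=Lisa',
    'bob@example.com=Bob',
    'lisa@example.com=Lisa W',
);
print "$_\n" for @dupes;    # prints lisa@example.com
```

The comparison is exact, so `Lisa@example.com` and `lisa@example.com` would count as different addresses unless the keys are lowercased first.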
Re^3: Check For Dupes In FLat DB Before Adding by pileofrogs (Priest) on Dec 14, 2005 at 21:14 UTC
Okay, to further refine my understanding of your question: You say you're trying to eliminate duplicate emails. You also say you're trying to add email in places where there isn't one before. Those sound like two different things. When you say duplicate emails, do you mean that no 2 records should have the same email? Or do you mean that no record should have 2 emails? When you're adding an email to records that don't already have one, are you adding the same email over and over (causing duplicates) or do you have a list of emails to add? Maybe it would be easiest if you posted a chunk of your flat file DB as a 'before' and 'after', with the 'after' section manually fixed. -Pileofrogs
Re^4: Check For Dupes In FLat DB Before Adding by lisaw (Beadle) on Dec 14, 2005 at 21:48 UTC