Re: Check For Dupes In FLat DB Before Adding
by northwind (Hermit) on Dec 14, 2005 at 18:13 UTC
So what exactly is the file format of your flat file DB? As in, is it "everything on one line", or "everything in a set number of lines", or "variable records"?
In regards to reading in a file:
open(FILE_HANDLE, "<", $filename) || die "ACK, GASP: $error_message\n";
my @data = <FILE_HANDLE>;
chomp(@data);
close FILE_HANDLE;
will slurp the entire flat file DB in and remove the trailing newline from each line (depending on the size of the file, slurping entire files may not be a Good Idea(TM)).
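If the file is large enough that slurping is a concern, reading one line at a time is the usual alternative. A minimal sketch, assuming one line is a meaningful unit in the file; the helper name for_each_line and its callback argument are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: call $callback once per chomped line of $filename,
# so only one line is held in memory at a time.
sub for_each_line {
    my ($filename, $callback) = @_;
    open(my $fh, "<", $filename) || die "Cannot open $filename for reading: $!\n";
    while (my $line = <$fh>) {
        chomp($line);
        $callback->($line);
    }
    close($fh);
    return;
}
```

For example, for_each_line($filename, sub { $count++ if $_[0] =~ /\S/ }) would count the non-blank lines without building an array of them.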
Re: Check For Dupes In FLat DB Before Adding
by northwind (Hermit) on Dec 14, 2005 at 18:39 UTC
Reading between the lines of the OP, let's say the flat file has variable-length records with one value per line, and an empty line separates records. Then this code should work:
open(FILE_HANDLE, "<", $filename) || die "ACK, GASP: $error_message: $!\n";
my @data = <FILE_HANDLE>;
chomp(@data);
close FILE_HANDLE;
my %db_hash;
my $unique_id = 0;
foreach (@data)
{
    if(m/^\s*$/)                   # An empty line ends the current record
    {
        $unique_id++ if(exists $db_hash{$unique_id});
        next;
    }
    s/^\s+//;                      # Remove excess whitespace
    s/\s+$//;
    next unless(m/^(.*?)=(.*)$/);  # Grab key/value pairs
    $db_hash{$unique_id}{$1} = $2; # Add the pair to the current record
}
open(NEW_FILE, ">", $new_filename) || die "ACK, GASP: $error_message: $!\n";
foreach my $id (sort { $a <=> $b } keys %db_hash)
{
    if(not defined $db_hash{$id}{email})
    {
        $db_hash{$id}{email} = $some_value;
    }
    foreach (sort keys %{ $db_hash{$id} })
    {
        print NEW_FILE "$_=$db_hash{$id}{$_}\n";
    }
    print NEW_FILE "\n";           # Empty line separates records
}
close NEW_FILE;
(Note: Code is untested! Use at your own risk. Also, there are a huge number of assumptions made in the above code because the OP was more than a little vague...)
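Under the same guess about the format (key=value lines, records separated by truly empty lines), Perl's paragraph mode can do the record splitting. This is only a sketch: the helper name read_records is invented here, and a separator line containing spaces or tabs would not count as empty in paragraph mode.

```perl
use strict;
use warnings;

# Hypothetical helper: parse blank-line-separated records of key=value
# lines into a list of hash references, one per record.
sub read_records {
    my ($filename) = @_;
    open(my $fh, "<", $filename) || die "Cannot open $filename for reading: $!\n";
    local $/ = "";    # paragraph mode: each read returns one whole record
    my @records;
    while (my $chunk = <$fh>) {
        my %record;
        foreach my $line (split /\n/, $chunk) {
            $line =~ s/^\s+|\s+$//g;                # trim surrounding whitespace
            next unless $line =~ /^([^=]+)=(.*)$/;  # skip lines without key=value
            $record{$1} = $2;
        }
        push @records, \%record if %record;         # ignore records with no pairs
    }
    close($fh);
    return @records;
}
```

Each element of the returned list is then a hash reference, so a missing field can be tested with something like exists $record->{email}.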
Re: Check For Dupes In FLat DB Before Adding
by nedals (Deacon) on Dec 15, 2005 at 02:52 UTC
If the file is not too big....
my $found = 0;
open(FH, "$filename") || die "Cannot open $filename for reading: $!\n";
while (<FH>) {
chomp;
my ($email,$name) = split('=',$_);
if ($email eq $inputted_email) {
$found = 1;
break; ## exit loop;
}
}
close FH;
if (!$found) {
## Append to file
open(FH, "<<", "$filename") || die "Cannot open $filename to append: $!\n";
print FH "$inputted_email=$inputted_name\n";
close FH;
}
Do you mean "last" rather than "break"? Also, I don't think "<<" is a valid open mode.
(Update/comment: If the record is found, why not "exit" rather than just break out of the loop? And otherwise just fall through to the append sub.)
Do you mean "last" rather than "break"? Also, I don't think "<<" is a valid open mode.
You are right. Too much C recently. As to the open mode, it should read...
## Append to file
open(FH, ">>", "$filename") || die "Cannot open $filename to append: $!\n";
If the record is found, why not "exit" rather than just break out of the loop?
Because I don't know what needs to be done next.
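Putting the corrections from this subthread together ("last"-style early exit and the ">>" append mode), the check-then-append idea might look like the sketch below. It assumes one "email=name" record per line; the sub name add_unless_duplicate and the case-insensitive comparison are assumptions of mine, not something the OP specified.

```perl
use strict;
use warnings;

# Hypothetical helper: append "email=name" to $filename unless a line with
# the same email is already present. Returns 1 if added, 0 if duplicate.
sub add_unless_duplicate {
    my ($filename, $email, $name) = @_;
    if (-e $filename) {
        open(my $in, "<", $filename) || die "Cannot open $filename for reading: $!\n";
        while (my $line = <$in>) {
            chomp($line);
            my ($existing) = split(/=/, $line, 2);
            next unless defined $existing;      # blank line
            # Assumption: emails are compared case-insensitively
            if (lc($existing) eq lc($email)) {
                close($in);
                return 0;
            }
        }
        close($in);
    }
    open(my $out, ">>", $filename) || die "Cannot open $filename to append: $!\n";
    print $out "$email=$name\n";
    close($out) || die "Cannot close $filename: $!\n";
    return 1;
}
```

Returning a status instead of calling exit leaves the caller free to decide what happens next. If several processes can write to the file at once, the check and the append would also need to be protected with flock, which this sketch does not do.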
Re: Check For Dupes In FLat DB Before Adding
by chas (Priest) on Dec 14, 2005 at 18:23 UTC
I would likely open the file for *reading* and search, using a while loop and the match operator, for the 'email' record. If that exists then exit. Otherwise, seek to the beginning of the file, open a new file and write the contents of the existing file to the new file, adding the desired record at some point. How this is done would depend on the format of your file. (If it consists of lots of lines of text, then you can use the match operator to decide when to insert the desired record.) Finally, I'd rename the old file (with a .bak suffix perhaps) and the newly written file to "people.dat" or whatever.
(Opening the file in append mode will only allow you to add the new record at the end, but if that's what you want then you might do that. You won't be able to add a line in the middle of the file, though.)
(Update: Of course, do some torture tests with your code to make sure it works as desired before putting it into production.)
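The copy-and-rename approach described above could be sketched roughly as follows. The sub name insert_record, the ".new" temporary suffix and the $after_re parameter are all made up for illustration, and the code assumes the existing file ends with a newline.

```perl
use strict;
use warnings;

# Hypothetical helper: copy $filename to "$filename.new", writing $new_line
# right after the first line matching $after_re (or at the end if nothing
# matches), then keep the original as "$filename.bak".
sub insert_record {
    my ($filename, $new_line, $after_re) = @_;
    my $tmp = "$filename.new";
    open(my $in,  "<", $filename) || die "Cannot open $filename for reading: $!\n";
    open(my $out, ">", $tmp)      || die "Cannot open $tmp for writing: $!\n";
    my $inserted = 0;
    while (my $line = <$in>) {
        print $out $line;
        if (!$inserted && $line =~ $after_re) {
            print $out "$new_line\n";
            $inserted = 1;
        }
    }
    print $out "$new_line\n" unless $inserted;   # fall back to appending
    close($in);
    close($out) || die "Cannot close $tmp: $!\n";
    rename($filename, "$filename.bak") || die "Cannot back up $filename: $!\n";
    rename($tmp, $filename)            || die "Cannot rename $tmp: $!\n";
    return;
}
```

A call such as insert_record("people.dat", "email=someone\@example.com", qr/^name=/) would then leave the previous contents in people.dat.bak.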
Re: Check For Dupes In FLat DB Before Adding
by pileofrogs (Priest) on Dec 14, 2005 at 19:25 UTC
Is there only one 'email' record per file and you're working on multiple files, or are there multiple records in one file that might or might not contain an email field?
Hi! There are multiple records in one file, each with an email address and a name. I just need to check whether there are duplicate emails.
Okay, to further refine my understanding of your question:
You say you're trying to eliminate duplicate emails. You also say you're trying to add email in places where there isn't one before. Those sound like two different things.
When you say duplicate emails, do you mean that no 2 records should have the same email? Or do you mean that no record should have 2 emails?
When you're adding an email to records that don't already have one, are you adding the same email over and over (causing duplicates) or do you have a list of emails to add?
Maybe it would be easiest if you posted a chunk of your flat file DB as a 'before' and 'after', with the 'after' section manually fixed.
-Pileofrogs
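If the answer turns out to be "no two records should have the same email", a counting hash is the usual way to find the offenders. The following is just a sketch under the guess that each record is a single "email=name" line; the sub name duplicate_emails and the case-insensitive comparison are assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper: return the emails that occur more than once in
# $filename, assuming one "email=name" record per line.
sub duplicate_emails {
    my ($filename) = @_;
    open(my $fh, "<", $filename) || die "Cannot open $filename for reading: $!\n";
    my %count;
    while (my $line = <$fh>) {
        chomp($line);
        next if $line =~ /^\s*$/;              # skip blank lines
        my ($email) = split(/=/, $line, 2);
        $count{ lc $email }++;                 # assumption: case-insensitive
    }
    close($fh);
    my @dups = grep { $count{$_} > 1 } keys %count;
    @dups = sort @dups;
    return @dups;
}
```

An empty return list would mean every email in the file is unique.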