Hello PerlMonks,
I'm trying to remove lines with a duplicate ID number from a file, but things don't seem to be working right. A sample data line looks like this:
lat="37.4192" lng="-122.0574" United States ID No: 1123631397
And here is the code:
open (FILE, "<:utf8", "input.txt");
my @lines = <FILE>;
my @uniq = ();
my @purge = ();
my %seen = ();
foreach $line (@lines) {
    my $id = $line =~ m/ID No: (\d+)/;
    if ($seen{$id}++){
        push (@uniq, $line);
        $new_uniq++;
    }
    else {
        push (@purge, $line);
    }
}
open (MYFILE, ">:utf8", "data.txt");
print MYFILE @uniq;
open (PURGE, ">>:utf8", "purge.txt");
print PURGE @purge;
After the first run, the data in data.txt appears to have the correct result. However, I was expecting purge.txt to contain all the lines that were removed as duplicates, but it only contains one line. Subsequent runs _always_ remove the first line of data from input.txt and place it in purge.txt. Could someone please point out my error? I'm trying to teach myself Perl, but for some reason I can't get a grasp on what's going wrong here.
Thanks!
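For comparison, here is a minimal sketch of how the loop might be written. Two things in the posted code seem to explain the symptom. First, `my $id = $line =~ m/.../` evaluates the match in scalar context, so $id holds the success flag (1) rather than the captured digits, and every matching line then shares the same key in %seen. Second, the branches are the wrong way round: a true `$seen{$id}++` means the ID has already been seen, so that line is the duplicate. The helper name split_by_id, the decision to keep lines that have no ID, and the second sample ID are assumptions made for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns two array
# references, the first-seen lines and the lines whose ID was a repeat.
sub split_by_id {
    my @lines = @_;
    my (%seen, @uniq, @purge);

    for my $line (@lines) {
        # The parentheses around $id force list context, so $id receives
        # the captured digits instead of the match's true/false result.
        my ($id) = $line =~ /ID No: (\d+)/;

        if (!defined $id) {
            # Assumption: a line without an ID is kept rather than purged.
            push @uniq, $line;
        }
        elsif ($seen{$id}++) {
            # The counter was already non-zero, so this ID is a repeat.
            push @purge, $line;
        }
        else {
            push @uniq, $line;
        }
    }
    return (\@uniq, \@purge);
}

# Sample data: the first line is from the post; ID 42 is made up.
my @sample = (
    qq{lat="37.4192" lng="-122.0574" United States ID No: 1123631397\n},
    qq{lat="37.4192" lng="-122.0574" United States ID No: 42\n},
    qq{lat="37.4192" lng="-122.0574" United States ID No: 1123631397\n},
);

my ($uniq, $purge) = split_by_id(@sample);
print "kept: ", scalar @$uniq, ", purged: ", scalar @$purge, "\n";
# prints "kept: 2, purged: 1"
```

The file handling can stay as it was, printing @$uniq to data.txt and @$purge to purge.txt. Note that purge.txt is opened with ">>", so it keeps growing across runs; checking the return value of each open (for example with `or die "...: $!"`) would also make a failed open visible.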