in reply to Very basic question while reading a file line by line
"Very basic question ..."
Unfortunately, the question itself is too basic. You have omitted information which, if provided, would have resulted in a better answer for you.
Your input appears to be a tab-separated CSV file, but you didn't say so; I've assumed that it is.
You've said nothing about the encoding of your data. I've used "UTF-8" for both input and output; you may need something else.
Your data seems very simplistic. Is what you posted truly representative of your real data?
I added an extra record to your posted input:
$ cat test_in.csv
id	name
123	john
34	john
567	john
11	peter
899	peter
87	helen
961	Anonymous Monk
Read as plain text with no format defined (and as far as a webpage can show it), that last record appears to have three fields; parsed as tab-separated CSV, it has only two columns, just like all of the other records. Here's the CSV format revealed ('^I' represents a tab; '$' represents a newline):
$ cat -vet test_in.csv
id^Iname$
123^Ijohn$
34^Ijohn$
567^Ijohn$
11^Ipeter$
899^Ipeter$
87^Ihelen$
961^IAnonymous Monk$
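To make the two-versus-three field count concrete, here is a minimal sketch in core Perl only. The variable names are mine, chosen for illustration, and the record is the extra one added to the input above.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# The extra record, with a literal tab between the two columns.
my $record = "961\tAnonymous Monk";

# Splitting on any whitespace treats the space inside the name as a separator.
my @by_whitespace = split ' ', $record;

# Splitting on tabs only keeps the name intact as a single field.
my @by_tab = split /\t/, $record;

printf "whitespace split: %d fields\n", scalar @by_whitespace;
printf "tab split:        %d fields\n", scalar @by_tab;
```

This only illustrates the ambiguity; a bare split on tabs is not a CSV parser and will not cope with quoted fields or embedded newlines.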
Parsing CSV files has many gotchas. Don't try writing your own code to deal with all of these: Text::CSV has already done so; its use is highly recommended. Note that if you, or your users, have Text::CSV_XS installed, it will run faster (without requiring any change to the "use Text::CSV;" statement).
The code for performing the filtering is fairly straightforward. One note: it uses a named constant (NAME) for the column index instead of a bare number.
[Aside: Just this week, working with some legacy code, I came across this sort of thing: "$aref->[25]". I was not happy about having to go back several screenfuls and start counting; then check for changes to that count (e.g. via unshift()).]
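As a rough illustration of why named indices help, here is a small sketch. The two-column layout (ID, then NAME) and the sample row are assumptions for the example, not anything taken from the legacy code mentioned in the aside.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical column layout for illustration: id first, name second.
use constant {
    ID   => 0,
    NAME => 1,
};

my $row = [ 961, 'Anonymous Monk' ];

# Named indices document themselves at the point of use;
# if the column order changes, only the constants need updating.
printf "id:   %s\n", $row->[ID];
printf "name: %s\n", $row->[NAME];
```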
#!/usr/bin/env perl

use strict;
use warnings;
use autodie;

use constant NAME => 1;

my $infile = 'test_in.csv';
my $outfile = 'test_out.csv';

use Text::CSV;

my %seen;

{
    my $csv = Text::CSV::->new({
        binary => 1,
        sep_char => "\t",
        quote_char => undef,
    });

    open my $fh_in, '<:encoding(UTF-8)', $infile;
    open my $fh_out, '>:encoding(UTF-8)', $outfile;

    (undef) = scalar <$fh_in>; # skip & discard header record

    while (my $row = $csv->getline($fh_in)) {
        $csv->say($fh_out, $row) unless $seen{$row->[NAME]}++;
    }
}
Running that gives:
$ cat test_out.csv
123	john
11	peter
87	helen
961	Anonymous Monk
Revealing CSV format:
$ cat -vet test_out.csv
123^Ijohn$
11^Ipeter$
87^Ihelen$
961^IAnonymous Monk$
— Ken