Re: File Manipulation - Need Advise!

Re^2: File Manipulation - Need Advise! by bart (Canon) on Jan 03, 2008 at 18:07 UTC
A worked-out version of Old Gray Bear's idea:

```perl
my %data;
my $header = <>;    # first line
while (<>) {
    my ($key) = split /\t/;
    $data{$key} = $_;    # a later line with the same key replaces the earlier one
}
# output:
print $header;
foreach my $key (sort keys %data) {
    print $data{$key};
}
```

To use it as is, call the script with "file2.txt" as a parameter on the command line, and redirect the script's STDOUT to "file1.txt":

```
perl thescript.pl file2.txt >file1.txt
```
Re^3: File Manipulation - Need Advise! by nashkab (Novice) on Jan 03, 2008 at 18:23 UTC
The output in file1.txt is the following:

```
COMPUTER DISTRIBUTION_ID STATUS
30F-WKS `1781183799.xxx11' IC---
30F-WKS `1781183799.xxxx1' IC---
ADM34A3F9 `1781183799.41455' IC---
```

I want:

```
COMPUTER DISTRIBUTION_ID STATUS
30F-WKS `1781183799.xxx11' IC---
ADM34A3F9 `1781183799.41455' IC---
```
Re^4: File Manipulation - Need Advise! by bart (Canon) on Jan 03, 2008 at 18:32 UTC
Like someone said in the Chatterbox: your data may not be separated by tabs. In that case, the whole record (line) would be treated as the id, so no two lines would ever share a key. Replace `split /\t/` in my code with `split /\s+/`. If it still doesn't work, add the following code at the end to inspect what's in the hash and see what makes it fail:

```perl
use Data::Dumper;
print Dumper \%data;
```
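For what it's worth, here is a minimal sketch of the whitespace-split variant, under two assumptions that the thread does not confirm: that the columns are separated by runs of spaces or tabs, and that the record to keep is the first one seen for each computer (which is what the desired output above shows). The sub name `first_per_key` is made up for this example, and the sample lines are the ones posted above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): takes the header line followed
# by the data lines, and returns the header plus the first line seen for each
# key. The key is the first whitespace-delimited field; input order is kept.
sub first_per_key {
    my ($header, @records) = @_;
    return unless defined $header;      # empty input: nothing to return
    my %seen;
    my @kept = ($header);
    for my $line (@records) {
        my ($key) = split ' ', $line;   # split ' ' also ignores leading whitespace
        next unless defined $key;       # skip blank lines
        push @kept, $line unless $seen{$key}++;
    }
    return @kept;
}

# Sample data copied from the post above (single spaces assumed).
my @lines = split /^/, <<'END';
COMPUTER DISTRIBUTION_ID STATUS
30F-WKS `1781183799.xxx11' IC---
30F-WKS `1781183799.xxxx1' IC---
ADM34A3F9 `1781183799.41455' IC---
END

print first_per_key(@lines);
```

Unlike the sorted-hash version, this keeps the first record per key rather than the last, and it prints the records in their original order instead of sorted by key.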
Re^5: File Manipulation - Need Advise! by nashkab (Novice) on Jan 03, 2008 at 18:55 UTC
Re^6: File Manipulation - Need Advise! by blue_cowdawg (Monsignor) on Jan 03, 2008 at 19:03 UTC
Re^2: File Manipulation - Need Advise! by WoodyWeaver (Monk) on Jan 04, 2008 at 22:50 UTC
> Whenever you want the unique members of a data-set, think about using a hash

When you want the pairwise unique members of a serial set, think about a state variable.

If you need uniqueness across an entire set, there is no question that hashes are the most useful tool. The problem, though, is that you then have to store all the keys. It is not uncommon to want to dedup when the duplicates come in successive runs (think Unix's 'uniq'). That's when this second class comes into play.

Set a state variable, and read one line at a time. You may have to keep around the previous line or two to compute your state. You may have to do some work at the end to clean up stored lines.

```perl
my $thisKey;
my $lastLine = <>;
my $lastKey  = '';    # first line is header, so always print
while (<>) {
    if (/(.*?)\t.*/) {
        $thisKey = $1;    # everything before the first tab
    }
    else {
        warn "bad data: $_ had no tab\n";
    }
    if ($thisKey ne $lastKey) {
        print $lastLine;    # key changed: flush the last line of the previous run
    }
    $lastLine = $_;
    $lastKey  = $thisKey;
}
print $lastLine;
```

This is a big win when you have millions and millions of entries to sift through, because only the previous line and key are held in memory.
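The code above prints the last line of each run. If the first line of each run is wanted instead (as in the desired output earlier in the thread), the same state-variable idea gets a little simpler, since no line has to be held back. This is only a sketch: `make_run_filter` is a hypothetical name, the key is assumed to be the first whitespace-delimited field, and the sample lines are the ones the original poster gave.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns a closure that remembers
# only the previous key. The closure returns 1 for the first line of each run
# of equal keys and 0 for the rest, so memory use stays constant.
sub make_run_filter {
    my $last_key;
    return sub {
        my ($line) = @_;
        my ($key) = split ' ', $line;   # first whitespace-delimited field
        return 0 unless defined $key;   # drop blank lines
        my $is_new = !defined $last_key || $key ne $last_key;
        $last_key = $key;
        return $is_new ? 1 : 0;
    };
}

# Sample data copied from the thread (single spaces assumed).
my @input = split /^/, <<'END';
COMPUTER DISTRIBUTION_ID STATUS
30F-WKS `1781183799.xxx11' IC---
30F-WKS `1781183799.xxxx1' IC---
ADM34A3F9 `1781183799.41455' IC---
END

my $header = shift @input;              # the header is always kept
my $keep   = make_run_filter();
my @output = ($header, grep { $keep->($_) } @input);
print @output;
```

As with 'uniq', a key that shows up again after a different key starts a new run and is printed again, so this only matches the hash approach when equal keys are adjacent in the input.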