subset extraction from master file

tux242 has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: subset extraction from master file by broquaint (Abbot) on Nov 18, 2003 at 16:01 UTC
Something like this should do it:

```perl
open(my $master_fh => "master.file") or die "ack: $!";
open(my $subset_fh => "subset.file") or die "ack: $!";
my %subset = map { chomp; split ':', $_, 2 } <$subset_fh>;

open(my $result_fh => ">", "result.file") or die "ack: $!";
while(<$master_fh>) {
    my($key, $rest) = split ':', $_, 2;
    print {$result_fh} join ':', $key, $subset{$key}, $rest
        if exists $subset{$key};
}
```

While not suitable for large subset files (the whole subset file is read into a hash in memory), this should create a new file with the subset and its related contents from the master file in a colon-separated list.

HTH

_________
broquaint
Re: subset extraction from master file by davido (Cardinal) on Nov 18, 2003 at 16:03 UTC
One way to do it without file slurping:

```perl
use strict;
use warnings;

open MASTER, "file1"    or die "can't open master file: $!\n";
open SUBSET, "file2"    or die "can't open subset file: $!\n";
open OUT,    ">outfile" or die "can't open output file: $!\n";

while ( my $sline = <SUBSET> ) {
    chomp $sline;
    my ( $subkey, $subvals ) = split /:/, $sline, 2;
    my ( $mkey, $mvals );
    # Read forward in the master file until the matching key turns up.
    while ( my $mline = <MASTER> ) {
        chomp $mline;
        ( $mkey, $mvals ) = split /:/, $mline, 2;
        last if $mkey eq $subkey;
    }
    print OUT join( ":", $subkey, $subvals, $mvals ), "\n";
}

close SUBSET;
close MASTER;
close OUT or die "couldn't close outfile: $!\n";
```

The above snippet assumes that the master file and the subset file are in the same order, and that every key in the subset has an equal key in the master set (which is pretty much what you said). If the master file and the subset file are not in the same order, this method would need to be reworked, since synchronization is critical for its success: the master file is only ever read forward, so a key that has already been passed is never found again.

Dave

"If I had my life to live over again, I'd be a plumber." -- Albert Einstein
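As an illustration of that synchronization concern, here is a minimal sketch of a variant that reports a subset key missing from the master file instead of reading the master to its end. It assumes something stronger than "same order": both files sorted by key in string order. The name `merge_sorted` and the sample data are made up for the example and are not from the posts above.

```perl
use strict;
use warnings;

# Hypothetical helper: streams both inputs once, never slurping either file.
# Assumption: both handles yield "key:rest" lines sorted by key (string order).
sub merge_sorted {
    my ( $sub_fh, $master_fh, $out_fh ) = @_;
    my $mline = <$master_fh>;    # current, not yet consumed master line

    while ( defined( my $sline = <$sub_fh> ) ) {
        chomp $sline;
        my ( $subkey, $subvals ) = split /:/, $sline, 2;
        my ( $mkey, $mvals );

        # Advance the master until its key is no longer before the subset key.
        while ( defined $mline ) {
            chomp( my $copy = $mline );
            ( $mkey, $mvals ) = split /:/, $copy, 2;
            last if $mkey ge $subkey;
            $mline = <$master_fh>;
        }
        last unless defined $mline;    # master exhausted

        if ( $mkey eq $subkey ) {
            print {$out_fh} join( ":", $subkey, $subvals, $mvals ), "\n";
        }
        else {
            warn "no master entry for key '$subkey'\n";
        }
    }
}

# Usage with in-memory filehandles standing in for the real files.
my $subset_data = "key1:more 1\nkey3:more 3\n";
my $master_data = "key1:other 1\nkey2:other 2\nkey3:other 3\n";
open my $sub_fh,    '<', \$subset_data or die "can't open subset: $!\n";
open my $master_fh, '<', \$master_data or die "can't open master: $!\n";
merge_sorted( $sub_fh, $master_fh, \*STDOUT );
# prints:
# key1:more 1:other 1
# key3:more 3:other 3
```

If the files are merely in the same order but not sorted, the `ge` comparison does not hold and the original `eq`-only loop is the safer choice.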
Re: Re: subset extraction from master file by tux242 (Acolyte) on Nov 18, 2003 at 16:30 UTC
This gives me an error, David, as if it does not understand what it is trying to open. Below is the error:

```
./23file.pm: open: not found
./23file.pm: open: not found
./23file.pm: open: not found
./23file.pm: syntax error at line 4: `)' unexpected
```

Any clues? Thanks. Here is the code I am using:

```perl
open MASTER, "file3" or die "cant't open master file: $!\n";
open SUBSET, "file2" or die "can't open subset file: $!\n";
open OUT, ">outfile" or die "can't open output file: $!\n";
while ( my $sline = <SUBSET> ) {
    chomp $sline;
    my ( $subkey, $subvals ) = split /:/, $sline, 2;
    my ( $mkey, $mvals );
    while ( my $mline = <MASTER> ) {
        chomp $mline;
        ( $mkey, $mvals ) = split /:/, $mline, 2;
        last if $mkey eq $subkey;
    }
    print OUT join ":", $subkey, $subvals, $mvals;
}
close SUBSET;
close MASTER;
close OUT or die "couldn't close outfile: $!\n";
```
Re: Re: Re: subset extraction from master file by davido (Cardinal) on Nov 18, 2003 at 16:34 UTC
Sounds like you're executing the script as though it were a shell script rather than as a Perl script. Did you add a shebang line? Also, see my minor update to the "print" line.

Dave

"If I had my life to live over again, I'd be a plumber." -- Albert Einstein
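For reference, a minimal sketch of what the top of such a script might look like. The interpreter path `/usr/bin/perl` is an assumption and differs between systems, and the print statement is only a stand-in for the real program body.

```perl
#!/usr/bin/perl
# The shebang above has to be the very first line of the file. Without it,
# running ./23file.pm hands the file to the shell, which then tries to run
# "open" as a command and reports "open: not found".
use strict;
use warnings;

# Stand-in body: $] holds the version of the running perl interpreter.
printf "running under perl %s\n", $];
```

Running the file explicitly as `perl 23file.pm` sidesteps the question, since perl is then invoked directly whether or not a shebang line is present.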
Re: subset extraction from master file by Anonymous Monk on Nov 18, 2003 at 16:14 UTC
Yet another way:

```
$ cat master
key1 other 1 stuff
key2 other 2 stuff
key3 other 3 stuff
$ cat subset
key1 more 1 stuff
key3 more 3 stuff
$ cat ex.pl
#!/usr/bin/perl -w
use strict;

my %hash;
while(<>){
    chomp;
    my($key, $rest) = split " ",$_,2;
    $hash{$key} = $rest and next if 1 .. eof;
    $hash{$key} .= " $rest" if exists $hash{$key};
}
for (sort keys %hash){
    print "$_ $hash{$_}\n";
}
__END__
$ perl ex.pl subset master
key1 more 1 stuff other 1 stuff
key3 more 3 stuff other 3 stuff
```
Re: subset extraction from master file by duff (Parson) on Nov 18, 2003 at 18:38 UTC
You know ... there is more than one way to do it ... including not using perl!

`join -t: master other > newmaster`

Type "man join" at your unix shell prompt. :-)

Also, I interpreted "i need all of the files from the subset matched up with the same keys in the master file and the rest of the master listing concatenated onto the end of the subset keys listing" to mean that you wanted the lines from the master that weren't in the other to be appended to the end of the master, but your example didn't show this at all. If you wanted the unpaired lines to show up in the output (not at the end), you could do this:

`join -a1 -t: master other > newmaster`

Hope this helps!

duff
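For anyone who would rather stay in Perl, here is a rough sketch of what the `-a1` variant does: paired lines are joined, and master lines without a partner are passed through unchanged. The helper name `left_join` and the sample data are invented for the example. Unlike `join`, which expects both inputs sorted on the join field, this version does not care about order, at the cost of holding the other file in memory.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of any module: takes two array refs of
# chomped "key:rest" lines and returns the joined lines in master order.
sub left_join {
    my ( $master, $other ) = @_;

    # Index the other file by key.
    my %extra;
    for my $line (@$other) {
        my ( $key, $rest ) = split /:/, $line, 2;
        $extra{$key} = $rest;
    }

    my @out;
    for my $line (@$master) {
        my ($key) = split /:/, $line, 2;
        # Paired lines get the extra fields appended; unpaired lines pass through.
        push @out, defined $extra{$key} ? "$line:$extra{$key}" : $line;
    }
    return @out;
}

print "$_\n" for left_join(
    [ 'key1:other 1', 'key2:other 2', 'key3:other 3' ],
    [ 'key1:more 1',  'key3:more 3' ],
);
# prints:
# key1:other 1:more 1
# key2:other 2
# key3:other 3:more 3
```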
Re: subset extraction from master file by ptkdb (Monk) on Nov 18, 2003 at 16:11 UTC
So your desired threshold of someone doing your thinking/experimenting/debugging has not yet been crossed to your satisfaction? (Mentioned in response to the remark that "previous suggestions didn't work".)

Just tossing some pseudo code out there:

```perl
use strict;
use warnings;

# Pseudo code: @KEYS, @OTHERKEYS, @ALLKEYS, the KEYFIELD name and the
# filehandles $f, $master and $newmaster are placeholders to fill in.
my (%h, %h2, $f, $master, $newmaster, %db);

# Index every record of the first file by its key field.
while ( <$f> ) {
    @h{@KEYS} = split;
    $db{ $h{KEYFIELD} } = { %h };
}

# Walk the master file, pulling in the other fields where the key matches.
while ( <$master> ) {
    @h2{@KEYS} = split;
    @h2{@OTHERKEYS} = @{ $db{ $h2{KEYFIELD} } }{@OTHERKEYS}
        if exists $db{ $h2{KEYFIELD} };
    print $newmaster join( ":", @h2{@ALLKEYS} ), "\n";
}

# move new master on top of old master
```