PerlMonks  

subset extraction from master file

by tux242 (Acolyte)
on Nov 18, 2003 at 15:48 UTC (id://308008)

tux242 has asked for the wisdom of the Perl Monks concerning the following question:

I am going to try this one last time, at the risk of being flogged or drawn and quartered, only because the previous suggestions never worked. Here is what I am trying to do: I have two colon-delimited, "/etc/passwd"-like files, each with a unique key at the beginning of every line. The second file is a subset of the first (master) file. I need all of the lines from the subset matched up with the same keys in the master file, and the rest of the master listing concatenated onto the end of the subset keys' listing; see below:

File 1 - master list
key1 other stuff
key2 other stuff
key3 other stuff
...

File 2 - subset listing
key1 more stuff
key3 more stuff
...

Desired result: File3 - final listing - subset listing with matching keys from master listing concatenated at end
key1 more stuff other stuff
key3 more stuff other stuff

Thanks in advance

Re: subset extraction from master file
by broquaint (Abbot) on Nov 18, 2003 at 16:01 UTC
    Something like this should do it
    open(my $master_fh => "master.file") or die "ack: $!";
    open(my $subset_fh => "subset.file") or die "ack: $!";
    # read the whole subset into a hash: key => rest of the line
    my %subset = map { chomp; split ':', $_, 2 } <$subset_fh>;
    open(my $result_fh => ">", "result.file") or die "ack: $!";
    while(<$master_fh>) {
        my($key, $rest) = split ':', $_, 2;
        # key, subset fields, master fields ($rest keeps the master line's newline)
        print {$result_fh} join ':', $key, $subset{$key}, $rest
            if exists $subset{$key};
    }
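    Because it reads the entire subset file into a hash, this isn't suitable for very large subset files, but it should create a new file in which each matching key is followed by its subset fields and then its fields from the master file, as a colon-separated list, in master-file order.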
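The same hash-lookup idea can be wrapped in a subroutine and tried on in-memory data before pointing it at real files. This is a minimal sketch rather than broquaint's exact code; the sub name join_on_key and the sample records are hypothetical, and it assumes the lines are already chomped and each contains at least one colon. As in the code above, only the subset is held in memory, and output follows master-file order.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hash-lookup join: load the subset into a hash keyed on the first
# colon-separated field, then walk the master lines and emit matches.
# Memory use grows with the size of the subset, not the master.
sub join_on_key {
    my ($master_lines, $subset_lines) = @_;    # array refs of chomped lines
    my %subset;
    for my $line (@$subset_lines) {
        my ($key, $rest) = split /:/, $line, 2;
        $subset{$key} = $rest;
    }
    my @out;
    for my $line (@$master_lines) {
        my ($key, $rest) = split /:/, $line, 2;
        # key, then the subset's fields, then the master's fields
        push @out, join(':', $key, $subset{$key}, $rest) if exists $subset{$key};
    }
    return @out;
}

# Hypothetical sample data
my @master = ('key1:other 1', 'key2:other 2', 'key3:other 3');
my @subset = ('key1:more 1', 'key3:more 3');
print "$_\n" for join_on_key(\@master, \@subset);
```

Running it prints the two matched records, key1:more 1:other 1 and key3:more 3:other 3; key2 is dropped because it is not in the subset.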
    HTH

    _________
    broquaint

Re: subset extraction from master file
by davido (Cardinal) on Nov 18, 2003 at 16:03 UTC
    One way to do it without file slurping:

    use strict;
    use warnings;

    open MASTER, "file1" or die "can't open master file: $!\n";
    open SUBSET, "file2" or die "can't open subset file: $!\n";
    open OUT, ">outfile" or die "can't open output file: $!\n";

    while ( my $sline = <SUBSET> ) {
        chomp $sline;
        my ( $subkey, $subvals ) = split /:/, $sline, 2;
        my ( $mkey, $mvals );
        # advance through MASTER until the matching key turns up
        while ( my $mline = <MASTER> ) {
            chomp $mline;
            ( $mkey, $mvals ) = split /:/, $mline, 2;
            last if $mkey eq $subkey;
        }
        print OUT join( ":", $subkey, $subvals, $mvals ), "\n";
    }

    close SUBSET;
    close MASTER;
    close OUT or die "couldn't close outfile: $!\n";

    The above snippet makes the assumption that the master file and the subset file are in the same order, and that every key in subset has an equal key in the master set (which is pretty much what you said). If the master file and the subset file are not in the same order, this method would need to be reworked, since synchronization is critical for its success.
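If both files are sorted by key, the synchronized walk can be extended so that a subset key missing from the master is skipped instead of consuming the rest of the master file: advance through the master only while its key sorts before the current subset key. This is a minimal sketch under that assumption (keys sorted in byte order, which is what Perl's lt compares, and unique within each file), still reading one line at a time; the sub name merge_sorted and the sample data are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Merge two key-sorted, colon-delimited streams one line at a time.
# Assumes both inputs are sorted by key in byte order (what Perl's lt uses)
# and that keys are unique within each file.
sub merge_sorted {
    my ($master_fh, $subset_fh, $out_fh) = @_;
    my ($mkey, $mvals);

    # Read the next master record into $mkey/$mvals; returns false at end of file.
    my $read_master = sub {
        my $line = <$master_fh>;
        return 0 unless defined $line;
        chomp $line;
        ($mkey, $mvals) = split /:/, $line, 2;
        return 1;
    };

    my $more = $read_master->();
    while (my $sline = <$subset_fh>) {
        chomp $sline;
        my ($skey, $svals) = split /:/, $sline, 2;

        # Skip master records whose key sorts before the subset key.
        $more = $read_master->() while $more && $mkey lt $skey;
        last unless $more;    # master exhausted: nothing left to match

        # A subset key missing from the master is simply skipped.
        print {$out_fh} join(':', $skey, $svals, $mvals), "\n" if $mkey eq $skey;
    }
}

# Demo on in-memory "files" (hypothetical data); key2a has no master record.
my $master = "key1:other 1\nkey2:other 2\nkey3:other 3\n";
my $subset = "key1:more 1\nkey2a:more 2a\nkey3:more 3\n";
my $result = '';
open my $m_fh,   '<', \$master or die "can't open master: $!";
open my $s_fh,   '<', \$subset or die "can't open subset: $!";
open my $out_fh, '>', \$result or die "can't open output: $!";
merge_sorted($m_fh, $s_fh, $out_fh);
close $out_fh;
print $result;
```

The sample subset includes key2a, which has no master record; it is skipped, and the walk still finds key3 afterwards.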


    Dave


    "If I had my life to live over again, I'd be a plumber." -- Albert Einstein

      This gives me an error, David, as if it does not understand what is being opened. Below is the error:

      ./23file.pm: open: not found
      ./23file.pm: open: not found
      ./23file.pm: open: not found
      ./23file.pm: syntax error at line 4: `)' unexpected

      Any clues? Thanks.

      Here is the code I am using:

      open MASTER, "file3" or die "can't open master file: $!\n";
      open SUBSET, "file2" or die "can't open subset file: $!\n";
      open OUT, ">outfile" or die "can't open output file: $!\n";
      while ( my $sline = <SUBSET> ) {
          chomp $sline;
          my ( $subkey, $subvals ) = split /:/, $sline, 2;
          my ( $mkey, $mvals );
          while ( my $mline = <MASTER> ) {
              chomp $mline;
              ( $mkey, $mvals ) = split /:/, $mline, 2;
              last if $mkey eq $subkey;
          }
          print OUT join ":", $subkey, $subvals, $mvals;
      }
      close SUBSET;
      close MASTER;
      close OUT or die "couldn't close outfile: $!\n";
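        Sounds like you're executing the script as though it were a shell script rather than as a Perl script; the "open: not found" messages come from the shell trying to run open as a command. Did you add a shebang line (such as #!/usr/bin/perl) as the first line of the file?

        Also, see my minor update to the "print" line: it now appends "\n" to each record, because every input line is chomped before it is split.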


        Dave


        "If I had my life to live over again, I'd be a plumber." -- Albert Einstein
Re: subset extraction from master file
by Anonymous Monk on Nov 18, 2003 at 16:14 UTC

    Yet another way:

    $ cat master
    key1 other 1 stuff
    key2 other 2 stuff
    key3 other 3 stuff
    $ cat subset
    key1 more 1 stuff
    key3 more 3 stuff
    $ cat ex.pl
    #!/usr/bin/perl -w
    use strict;
    my %hash;
    while(<>){
        chomp;
        my($key, $rest) = split " ",$_,2;
        # 1 .. eof is true for every line of the first file (the subset),
        # because $. is not reset between files read via <>
        if (1 .. eof) { $hash{$key} = $rest; next }
        $hash{$key} .= " $rest" if exists $hash{$key};
    }
    for (sort keys %hash){
        print "$_ $hash{$_}\n";
    }
    __END__
    $ perl ex.pl subset master
    key1 more 1 stuff other 1 stuff
    key3 more 3 stuff other 3 stuff
Re: subset extraction from master file
by duff (Parson) on Nov 18, 2003 at 18:38 UTC
    You know, there is more than one way to do it, including not using Perl!

    join -t: master other > newmaster

    Type "man join" at your Unix shell prompt. :-) Note that join expects both files to be sorted on the join field; sort -t: -k1,1 will do that.

    Also, I interpreted

    i need all of the files from the subset matched up with the same keys in the master file and the rest of the master listing concatenated onto the end of the subset keys listing
    to mean that you wanted the lines from the master that weren't in the other file to be appended to the end of the new master, but your example didn't show this at all. If you want the unpaired lines to show up in the output (interleaved in key order rather than at the end), you could do this:

    join -a1 -t: master other > newmaster
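For comparison, here is a Perl sketch of roughly what join -a1 -t: master other produces: every master line in master order, with the other file's fields appended when the key matches, and unpaired master lines passed through unchanged. Unlike join, this hash-based version doesn't need sorted input, but it does hold the second file in memory. The sub name join_keep_unpaired and the sample records are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Mimic `join -a1 -t: master other`: every master line, in master order,
# with the other file's fields appended when the key matches.
# Unlike join(1), input need not be sorted; the other file is held in memory.
sub join_keep_unpaired {
    my ($master_lines, $other_lines) = @_;    # array refs of chomped lines
    my %other;
    for my $line (@$other_lines) {
        my ($key, $rest) = split /:/, $line, 2;
        $other{$key} = $rest;
    }
    my @out;
    for my $line (@$master_lines) {
        my ($key) = split /:/, $line, 2;
        # paired: key, master fields, other fields; unpaired: master line as-is
        push @out, exists($other{$key}) ? join(':', $line, $other{$key}) : $line;
    }
    return @out;
}

# Hypothetical sample data
my @master = ('key1:other 1', 'key2:other 2', 'key3:other 3');
my @other  = ('key1:more 1', 'key3:more 3');
print "$_\n" for join_keep_unpaired(\@master, \@other);
```

This prints key1:other 1:more 1, then key2:other 2 unchanged, then key3:other 3:more 3.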

    Hope this helps!

Re: subset extraction from master file
by ptkdb (Monk) on Nov 18, 2003 at 16:11 UTC
    So your desired threshold of someone doing your thinking/experimenting/debugging has not yet been crossed to your satisfaction?
    Mentioned in response to the remark that "previous suggestions didn't work".

    Just tossing some pseudo code out there:

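    use strict; use warnings;
    # Placeholder field layout and file names (hypothetical): adjust to your data.
    my @KEYS      = qw(key f1 f2);   # names for the colon-separated fields
    my @OTHERKEYS = qw(f1);          # fields taken from the subset record
    my @ALLKEYS   = @KEYS;           # fields written out, in order
    my $KEYFIELD  = 'key';
    open my $f,         '<', 'subset'    or die "subset: $!";
    open my $master,    '<', 'master'    or die "master: $!";
    open my $newmaster, '>', 'newmaster' or die "newmaster: $!";
    my (%h, %h2, %db);
    while ( <$f> ) {
        chomp;
        @h{@KEYS} = split /:/, $_, -1;
        $db{ $h{$KEYFIELD} } = { %h };    # copy of the subset record
    }
    while ( <$master> ) {
        chomp;
        @h2{@KEYS} = split /:/, $_, -1;
        # overwrite master fields with the subset's when the key matches
        @h2{@OTHERKEYS} = @{ $db{ $h2{$KEYFIELD} } }{@OTHERKEYS}
            if exists $db{ $h2{$KEYFIELD} };
        print $newmaster join( ":", @h2{@ALLKEYS} ), "\n";
    }
    close $newmaster or die "newmaster: $!";
    # move new master on top of old master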
Node Type: perlquestion [id://308008]
Approved by Itatsumaki