snowy has asked for the wisdom of the Perl Monks concerning the following question:

Hi there

I have two files. The first, "signal", contains a list of all possible molecules, for example:

5-HT 5-HT1BR serotonin MAP-kinase MAPK MAP-kinase-kinase
The second, "orderednames", is tab delimited and contains some of the molecules along with their other names:
5-HTt serotonin MAP-kinase MAPK ERK

I would like to print all the molecules in "signal" that are not in the tab-delimited file "orderednames".

At the moment my code prints a message whenever a token from the tab-delimited file matches a molecule in the signal file, which is sort of the opposite of what I want.

use strict;
use warnings;

my $tok;
my @tokens;
my %hash;

open INFILE, "<signal.txt" or die "can't open file";
while (<INFILE>) {
    chomp;
    $hash{$_} = undef;
}
close INFILE;

open DATA, "<orderednames.txt" or die "can't open sentences";
while (<DATA>) {
    @tokens = split(/\t+/, $_);
    foreach $tok (@tokens) {
        if (exists($hash{$tok})) {
            print " match $tok \n";
            last;
        }
        else {
            print "$tok not match \n";
        }
    }
}
close DATA;
exit;

Thanks for your help in advance

Replies are listed 'Best First'.
Re: searching and printing what is wanted
by DamnDirtyApe (Curate) on Jul 04, 2002 at 22:41 UTC

    I realize this can be done very nicely with hashes, but every time I hear, "in this but not in this," I think set operations, and then I think Set::Scalar.

    #! perl
    use strict ;
    use warnings ;
    use Set::Scalar ;

    # Build a set from the signal file, one molecule per line.
    my $signal = Set::Scalar->new ;
    open INFILE, "<signal.txt" or die "can't open file" ;
    my @signal_list = <INFILE> ;
    close INFILE ;
    chomp @signal_list ;
    $signal->insert( @signal_list ) ;

    # Build a second set from every tab-separated name in orderednames.
    my $orderednames = Set::Scalar->new ;
    open DATA, "<orderednames.txt" or die "can't open sentences" ;
    while ( <DATA> ) {
        chomp ;    # otherwise the last name on each line keeps its newline
        my @tokens = split( /\t+/, $_ ) ;
        foreach my $token ( @tokens ) {
            $orderednames->insert( $token ) ;
        }
    }
    close DATA;

    print "In signal:\n" ;
    print $signal, "\n\n" ;
    print "In orderednames:\n" ;
    print $orderednames, "\n\n" ;
    print "In signal, but not in orderednames:\n" ;
    print $signal - $orderednames, "\n\n" ;
    exit;

    _______________
    D a m n D i r t y A p e
Re: searching and printing what is wanted
by ehdonhon (Curate) on Jul 04, 2002 at 22:23 UTC

    You seem to be on the right track. Here's an idea:

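    while ( <DATA> ) {
        chomp;    # strip the newline so the last token on the line can match
        @tokens = split(/\t+/, $_ );
        foreach $tok (@tokens) {
            delete $hash{$tok};
        }
    }

    # Whatever is left in %hash never appeared in orderednames.
    foreach $tok ( keys( %hash ) ) {
        print "$tok\n";
    }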
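
    The delete-from-hash idea can be tried without any files. What follows is only a sketch: the sample lines are invented to mirror the examples in the question, and leftover_names is a made-up helper name, not something from the thread.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for signal.txt (one name per line)
# and orderednames.txt (tab-delimited names on each line).
my @signal_lines  = map { "$_\n" } qw(5-HT 5-HT1BR serotonin MAP-kinase MAPK MAP-kinase-kinase);
my @ordered_lines = ("5-HTt\tserotonin\n", "MAP-kinase\tMAPK\tERK\n");

# Return, in sorted order, the names from $signal that never appear in $ordered.
# (leftover_names is an illustrative helper, not part of the original posts.)
sub leftover_names {
    my ($signal, $ordered) = @_;

    # Start with every signal name as a hash key.
    my %hash;
    for my $line (@$signal) {
        chomp(my $name = $line);
        $hash{$name} = undef;
    }

    # Delete each name that shows up in the tab-delimited lines.
    for my $line (@$ordered) {
        chomp(my $copy = $line);
        delete @hash{ split /\t+/, $copy };
    }

    return sort keys %hash;
}

print "$_\n" for leftover_names(\@signal_lines, \@ordered_lines);
```

    With this sample data the script prints 5-HT, 5-HT1BR and MAP-kinase-kinase, one per line. The chomp on the tab-delimited lines matters: without it the last name on each line still carries its newline, so "serotonin\n" would not delete the key "serotonin".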
Re: searching and printing what is wanted
by DamnDirtyApe (Curate) on Jul 05, 2002 at 06:52 UTC

    I've been humming and hawing about 179544 in my head all day. It just felt clumsy. So, after watching some excellent commentary in the CB, I was inspired to rewrite this using map and grep. I gotta say I am much happier with this.

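    #! /usr/bin/perl
    use strict;
    use warnings;

    # Read the signal names, one per line.
    open SIG, 'signal.txt' or die "can't open signal.txt: $!" ;
    my @sig = <SIG> ;
    close SIG ;
    chomp @sig ;

    open ORD, 'orderednames.txt' or die "can't open orderednames.txt: $!" ;
    my @ord = <ORD> ;
    close ORD ;

    # A bare split breaks each line on whitespace, which also drops the newline.
    my @names = map { split } @ord ;
    my %names_hash ;
    $names_hash{$_} = 1 foreach @names ;

    # Keep only the signal names that were never seen in orderednames.
    my @results = grep { !$names_hash{$_} } @sig ;
    print join( ':', @results ), "\n" ;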

    Update: Fixed closing of wrong file handle. Thanks theorbtwo!

    Update 2: Yep, it keeps getting shorter (Thanks again theorbtwo.)

    #! /usr/bin/perl
    use strict;
    use warnings;

    open SIG, 'signal.txt' or die "can't open signal.txt: $!" ;
    chomp( my @sig = <SIG> ) ;
    close SIG ;

    open ORD, 'orderednames.txt' or die "can't open orderednames.txt: $!" ;
    my %names_hash ;
    $names_hash{$_} = 1 foreach map { split } <ORD> ;
    close ORD ;

    # Keep only the signal names that were never seen in orderednames.
    my @results = grep { !$names_hash{$_} } @sig ;
    print join( ':', @results ), "\n" ;

    _______________
    D a m n D i r t y A p e

      I like this one a lot better, but there are a couple of things I'd change. (Constructive criticism, I hope.)

      You're using temporaries that you don't need. For example, you could have just done

      open ORD, 'orderednames.txt';
      my @name = map {split} <ORD>;
      close ORD;
      (Also, you close SIG instead of ORD in that stanza. I'd use my'd filehandles and {}s to manage their scopes instead, but that's mostly my own superstition and not good style.)

      There's a little trick I just learned the other day that replaces the foreach when converting an array to a hash: my %hash; @hash{@array} = (1)x@array; (Or whatever you want for the values; @array itself might work nicely. But don't forget the ()s, otherwise you'll get one elem with a string of 1s, and a bunch with undef.)
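
      To see why the ()s matter in the hash-slice trick, here is a small sketch. The array contents are made-up sample names, and %oops is just an illustrative name for the version that goes wrong.

```perl
use strict;
use warnings;

# Made-up sample names for illustration.
my @array = qw(serotonin MAPK ERK);

# With parentheses, x repeats the list: (1, 1, 1), one value per key.
my %hash;
@hash{@array} = (1) x @array;

# Without parentheses, x repeats the string: the right-hand side is the
# single value "111", so only the first key gets it and the rest are undef.
my %oops;
@oops{@array} = 1 x @array;

for my $key (@array) {
    printf "%-10s with parens: %s   without: %s\n",
        $key, $hash{$key},
        defined $oops{$key} ? $oops{$key} : 'undef';
}
```

      Either way all three keys exist afterwards, so a test with exists would pass for both hashes; the difference only shows up when you test the values, as grep { !$names_hash{$_} } does.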


      We are using here a powerful strategy of synthesis: wishful thinking. -- The Wizard Book

Re: searching and printing what is wanted
by Abigail-II (Bishop) on Jul 05, 2002 at 09:53 UTC
    my %hash = map {$_ => 1} map {chomp; split /\t/} `cat orderednames`;
    print map {"$_\n"} grep {!$hash {$_}} map {chomp; $_} `cat signal`;