Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, I have produced a piece of code that even by my standards should be easy to get working - however it doesn't. Here is the code:
@random_header = (
    '>143B_HUMAN (P31946) 14-3-3 protein beta/alpha (Protein kinase C inhibitor protein-1) (KCIP-1) (Protein 1054',
    '>AAAT_HUMAN (Q15758) Neutral amino acid transporter B(0) (ATB(0)) (Sodium-dependent neutral amino acid transporter type 2) (RD114/simian type D retrovirus receptor) (Baboon M7 virus receptor)'
);

@all_records = (
'>143B_HUMAN (P31946) 14-3-3 protein beta/alpha (Protein kinase C inhibitor protein-1) (KCIP-1) (Protein 1054)
TMDKSELVQKAKLAEQAERYDDMAAAMKAVTEQGHELSNEERNLLSVAYKNVVGARRSSW
RVISSIEQKTERNEKKQQMGKEYREKIEAELQDICNDVLELLDKYLIPNATQPESKVFYL
KMKGDYFRYLSEVASGDNKQTTVSNSQQAYQEAFEISKKEMQPTHPIRLGLALNFSVFYY
EILNSPEKACSLAKTAFDEAIAELDTLNEESYKDSTLIMQLLRDNLTLWTSENQGDEGDA
GEGEN',
'>AAAT_HUMAN (Q15758) Neutral amino acid transporter B(0) (ATB(0)) (Sodium-dependent neutral amino acid transporter type 2) (RD114/simian type D retrovirus receptor) (Baboon M7 virus receptor)
MVADPPRDSKGLAAAEPTANGGLALASIEDQGAAAGGYCGSRDQVRRCLRANLLVLLTVV
AVVAGVALGLGVSGAGGALALGPERLSAFVFPGELLLRLLRMIILPLVVCSLIGGAASLD
PGALGRLGAWALLFFLVTTLLASALGVGLALALQPGAASAAINASVGAAGSAENAPSKEV
LDSFLDLARNIFPSNLVSAAFRSYSTTYEERNITGTRVKVPVGQEVEGMNILGLVVFAIV
FGVALRKLGPEGELLIRFFNSFNEATMVLVSWIMWYAPVGIMFLVAGKIVEMEDVGLLFA
RLGKYILCCLLGHAIHGLLVLPLIYFLFTRKNPYRFLWGIVTPLATAFGTSSSSATLPLM
MKCVEENNGVAKHISRFILPIGATVNMDGAALFQCVAAVFIAQLSQQSLDFVKIITILVT
ATASSVGAAGIPAGGVLTLAIILEAVNLPVDHISLILAVDWLVDRSCTVLNVEGDALGAG
LLQNYVDRTESRSTEPELIQVKSELPLDPLPVPTEEGNPLLKHYRGPAGDATVASEKESV
M',
'>143E_HUMAN (P42655) 14-3-3 protein epsilon (Mitochondrial import stimulation factor L subunit) (Protein kinase C inhibitor protein-1) (KCIP-1) (14-3-3E)
MDDREDLVYQAKLAEQAERYDEMVESMKKVAGMDVELTVEERNLLSVAYKNVIGARRASW
RIISSIEQKEENKGGEDKLKMIREYRQMVETELKLICCDILDVLDKHLIPAANTGESKVF
YYKMKGDYHRYLAEFATGNDRKEAAENSLVAYKAASDIAMTELPPTHPIRLGLALNFSVF
YYEILNSPDRACRLAKAAFDDAIAELDTLSEESYKDSTLIMQLLRDNLTLWTSDMQGDGE
EQNKEALQDVEDENQ'
);

foreach $header (@random_header) {
    foreach $element (@all_records) {
        if ($element =~ /$header/) {
            push (@random_record, $element);
        }
    }
}
The code is supposed to pick out the elements in the second array that contain the information in the first. I am unsure whether the problem is in the code or in the data. I have messed around with the pattern match and have tried removing the whitespace from the data, with no luck. I am probably missing something really obvious, so any help would be appreciated. Thanks

20021119 Edit by Corion: Changed title to be more descriptive

Replies are listed 'Best First'.
Re: ehh!
by nothingmuch (Priest) on Nov 18, 2002 at 20:12 UTC
    Since the strings in your headers array contain regexp metacharacters, like parens, they are interpreted as regexp syntax instead of being matched literally. You don't really want that. Either use quotemeta on each of the strings when constructing the header array, which puts a \ in front of every character that would do something special in the regexp so that it is matched literally (you can also write /\Q$header/), or replace the regexp match with
    push (@random_record, $element) if (index($element,$header) != -1);
    The index function does exact substring matching. It will suit your requirements a bit better....
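
    To make the difference concrete, here is a minimal sketch comparing the three approaches. The header and record are hypothetical, shortened stand-ins for the FASTA data in the question, not the poster's real strings.

```perl
use strict;
use warnings;

# Hypothetical, shortened data standing in for the FASTA records above.
my $header = '>AAAT_HUMAN (Q15758) transporter B(0)';
my $record = ">AAAT_HUMAN (Q15758) transporter B(0)\nMVADPPRDSK";

# Unescaped: the parens act as capture groups, so the pattern looks for
# "Q15758" and "B0" with no literal parentheses and does not match.
my $plain = $record =~ /$header/ ? 1 : 0;

# \Q...\E (same effect as quotemeta) escapes every metacharacter.
my $quoted = $record =~ /\Q$header\E/ ? 1 : 0;

# index() does a literal substring search; -1 means "not found".
my $found = index($record, $header) != -1 ? 1 : 0;

print "plain=$plain quoted=$quoted index=$found\n";
```

    Only the unescaped pattern fails here; the \Q form and index() both treat the parentheses as ordinary characters.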

    -nuffin
    zz zZ Z Z #!perl
Re: Searching one array with elements from another (was: ehh!)
by jdporter (Paladin) on Nov 18, 2002 at 21:37 UTC
    Since you want exact substring matches, index() will work, as the previous comment said. However, since it appears that the substrings to be found always occur at the beginning, another way you could go is with substr. I.e.
        if ( substr( $_, 0, length($h) ) eq $h )
    Putting that into a grep, you'd get something like this:
    # we put this in a sub so we can short-circuit via return.
    # pass headers to search for;
    # analyzes $_
    sub matches_a_header {
        for my $h ( @_ ) {
            substr( $_, 0, length($h) ) eq $h and return 1;
        }
        0
    }
    my @matching_records = grep { matches_a_header( @headers_to_find ) } @records;
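
    For anyone who wants to try the prefix test end to end, here is a self-contained sketch of the same idea. The headers and records are hypothetical, shortened examples, and the names @headers_to_find and @records are only placeholders for the poster's two arrays.

```perl
use strict;
use warnings;

# Hypothetical, shortened headers and records (not the poster's real data).
my @headers_to_find = ('>143B_HUMAN (P31946)', '>AAAT_HUMAN (Q15758)');
my @records = (
    ">143B_HUMAN (P31946) 14-3-3 protein beta/alpha\nTMDKSELVQK",
    ">AAAT_HUMAN (Q15758) Neutral amino acid transporter B(0)\nMVADPPRDSK",
    ">143E_HUMAN (P42655) 14-3-3 protein epsilon\nMDDREDLVYQ",
);

# Returns true if the record in $_ starts with any of the given headers.
sub matches_a_header {
    for my $h (@_) {
        return 1 if substr($_, 0, length $h) eq $h;
    }
    return 0;
}

# grep sets $_ to each record in turn; the sub reads it directly.
my @matching_records = grep { matches_a_header(@headers_to_find) } @records;

print scalar(@matching_records), " of ", scalar(@records), " records matched\n";
```

    Because the comparison uses eq on a substring, no regexp is involved and the parentheses in the headers need no escaping.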

    jdporter
    ...porque es dificil estar guapo y blanco.