
Hi graff, thanks for the reply. But all I want to process now is the SEQUENCE, that is, the string (a continuous stretch of letters) that comes right after the ACCESSIONS line. When I grep it, store it in a separate array, and print the array inside the loop, I get output something like this:
MGDVEKGKKIFIMKCSQCHTVEMGDVEKGGKHKTGPNLHGLFGRKTGQAPGYSYTAANKNKGIIWGEDTL + +MEYLENPKKYIP MGDVEKGKKIFIMKCSQCHTVEMGDVEKGGKHKTGPNLHGLFGRKTGQAPGYSY +TAANKNKGIIWGEDTL +MEYLENPKKYIP MGDVEKGKKIFIMKCSQCHTVEMGDVEKGGKHKTGPNL +HGLFGRKTGQAPGYSYTAANKNKGIIWGEDTL +MEYLENPKKYIP MGDVEKGKKIFIMKCSQCHTVE +MGDVEKGGKHKTGPNLHGLFGRKTGQAPGYSYTAANKNKGIIWGEDTL +MEYLENPKKYIP MGDVEK +GKKIFIMKCSQCHTVEMGDVEKGGKHKTGPNLHGLFGRKTGQAPGYSYTAANKNKGIIWGEDTL +MEY +LENPKKYIP GDVEKGKKIFIMKCSQCHTVEKGSSSKHKSSSTGPNLHGLFGRKTGQAPGYSYTAANKN +KGIIWGED GDVEKGKKIFIMKCSQCHTVEKGSSSKHKSSSTGPNLHGLFGRKTGQAPGYSYTAANKNK +GIIWGED GDVEKGKKIFIMKCSQCHTVEKGSSSKHKSSSTGPNLHGLFGRKTGQAPGYSYTAANKNKG +IIWGED GDVEKGKKIFIMKCSQCHTVEKGSSSKHKSSSTGPNLHGLFGRKTGQAPGYSYTAANKNKGI +IWGED GDVEKGKKIFIMKCSQCHTVEKGSSSKHKSSSTGPNLHGLFGRKTGQAPGYSYTAANKNKGII +WGED GDVEKGKKIFIMKCSQCHTVEKGGSSSSKHKTGPNLHGLFGAAAAAAAARKTGQAPGYSYTAAN +KSSSSN +KGITWGEDTLMEYLENPKKYIPGTKMIFVGIKKKEE GDVEKGKKIFIMKCSQCHTVEKGG +SSSSKHKTGPNLHGLFGAAAAAAAARKTGQAPGYSYTAANKSSSSN +KGITWGEDTLMEYLENPKKYI +PGTKMIFVGIKKKEE GDVEKGKKIFIMKCSQCHTVEKGGSSSSKHKTGPNLHGLFGAAAAAAAARKTG +QAPGYSYTAANKSSSSN +KGITWGEDTLMEYLENPKKYIPGTKMIFVGIKKKEE GDVEKGKKIFIMK +CSQCHTVEKGGSSSSKHKTGPNLHGLFGAAAAAAAARKTGQAPGYSYTAANKSSSSN +KGITWGEDTL +MEYLENPKKYIPGTKMIFVGIKKKEE GDVEKGKKIFIMKCSQCHTVEKGGSSSSKHKTGPNLHGLFGA +AAAAAAARKTGQAPGYSYTAANKSSSSN +KGITWGEDTLMEYLENPKKYIPGTKMIFVGIKKKEE GD +VFKGKRIFIMKCSQCHTVESSSSKGGKHKTGPNLHGLFGSSSSSSSSSSR
When I print the array outside the loop, it either prints only the last string, or it removes the repeated letters from each string and prints a result like this:
MGDVEKIFCSQHTPNLRAYW MGDVEKIFCSQHTPNLRAYW MGDVEKIFCSQHTPNLRAYW MGDVEKI +FCSQHTPNLRAYW MGDVEKIFCSQHTPNLRAYW GDVEKIFMCSQHTPNLRAYW GDVEKIFMCSQHT +PNLRAYW GDVEKIFMCSQHTPNLRAYW GDVEKIFMCSQHTPNLRAYW GDVEKIFMCSQHTPNLRAY +W GDVEKIFMCSQHTPNLARYW GDVEKIFMCSQHTPNLARYW GDVEKIFMCSQHTPNLARYW GDVE +KIFMCSQHTPNLARYW GDVEKIFMCSQHTPNLARYW GDVFKRIMCSQHTEPNL
But the result I need is:
MGDVEKGKKIFIMKCSQCHTVEMGDVEKGGKHKTGPNLHGLFGRKTGQAPGYSYTAANKNKGIIWGEDTL + +MEYLENPKKYIP GDVEKGKKIFIMKCSQCHTVEKGSSSKHKSSSTGPNLHGLFGRKTGQAPGYSYT +AANKNKGIIWGED GDVEKGKKIFIMKCSQCHTVEKGGSSSSKHKTGPNLHGLFGAAAAAAAARKTGQA +PGYSYTAANKSSSSN +KGITWGEDTLMEYLENPKKYIPGTKMIFVGIKKKEE GDVFKGKRIFIMKCS +QCHTVESSSSKGGKHKTGPNLHGLFGSSSSSSSSSSR
And I need all of this as 4 elements in the same array. I hope you got it.

Re^3: how to remove duplicate strings?
by graff (Chancellor) on Oct 30, 2006 at 06:39 UTC
    Well no, I'm not sure that I got it. What is clear is that you did not satisfy GrandFather's request in the first reply, as I hoped you would.

    So let me make another guess at what you really want. How about this:

    my @arr = ();
    while (<PIR>) {
        chomp;
        if ( /^ENTRY/ ) {
            $entry = $_;
        }
        elsif ( /^(TITLE)\s+(\S.*)/ ) {
            $title = "$1\n\t $2";
        }
        elsif ( /^(ORGANISM)\s+(\S.*)/ ) {
            $org = "$1\n\t $2";
        }
        elsif ( /^ACCESSIONS/ ) {
            $acc = $_;
        }
        else {
            push @arr, $_;
        }
    }
    print "@arr\n";
    Now, I would assume there should be more code than that, if you really need to do things with $acc, $entry, $org and $title. If you really just want to output an array with those long strings as the elements of the array, the code could be a lot simpler.

    If there's a chance that one of those long strings might appear more than once in the data file, use those long strings as hash keys instead of array values:

    # simplified version: ignore header stuff:
    my %hash;
    while (<PIR>) {
        chomp;
        $hash{$_} = undef unless /^(?:ENTRY|TITLE|ORGANISM|ACCESSIONS)\s/;
    }
    # parenthesize join so that "\n" is printed after the keys, not joined in as one more element
    print join( " ", keys %hash ), "\n";
    Using a hash like that might be a good idea for other reasons: maybe you would want the header values to be associated with each long string. (Hint: some people refer to hashes as "associative arrays".) If so, assign the header strings as the hash value.
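    To make that hint concrete, here is a minimal sketch, not tested against your real file. The sample records, the `collect_sequences` name and the accession-style IDs are all made up for illustration, and it assumes every sequence sits on a single line after its headers. Each distinct sequence becomes a hash key, the value is the list of ENTRY IDs it was seen under, and a separate array remembers first-seen order, because `keys` returns hash keys in no particular order.

```perl
use strict;
use warnings;

# Hypothetical sample data, loosely shaped like the records discussed above.
my $sample = <<'END';
ENTRY      A00001
ACCESSIONS A00001
MGDVEKGKKIFIMKCSQCHTVE
ENTRY      A00002
ACCESSIONS A00002
GDVEKGKKIFIMKCSQCHTVEKG
ENTRY      A00003
ACCESSIONS A00003
MGDVEKGKKIFIMKCSQCHTVE
END

# Returns (\%entries_for, \@order):
#   %entries_for maps each distinct sequence line to the ENTRY IDs it appeared under,
#   @order lists the distinct sequences in first-seen order.
sub collect_sequences {
    my ($text) = @_;
    my ( %entries_for, @order, $entry );

    # Read from an in-memory string so the sketch is self-contained.
    open my $fh, '<', \$text or die "cannot read sample: $!";
    while ( my $line = <$fh> ) {
        chomp $line;
        next unless length $line;
        if ( $line =~ /^ENTRY\s+(\S+)/ ) {
            $entry = $1;                     # remember the current header
        }
        elsif ( $line =~ /^(?:ENTRY|TITLE|ORGANISM|ACCESSIONS)\b/ ) {
            next;                            # other header lines are ignored here
        }
        else {
            # First time we see this sequence: record its position.
            push @order, $line unless exists $entries_for{$line};
            push @{ $entries_for{$line} }, $entry;
        }
    }
    close $fh;
    return ( \%entries_for, \@order );
}

my ( $entries_for, $order ) = collect_sequences($sample);
for my $seq (@$order) {
    print "$seq <= @{ $entries_for->{$seq} }\n";
}
```

    With the sample above, the duplicated sequence is stored once and carries both of its entry IDs, so nothing is lost by de-duplicating.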
      Hey graff, thank you, you understood my problem correctly. I tried writing the code the way you suggested and got the answer. The problem I am facing now is that I have to save each element of that array into a new array and split it into characters. To make it clear, the program now looks like this:
      open (PIR,'/home/guest/sampir.txt');
      my @arr = ();
      while (<PIR>) {
          chomp;
          if ( /^ENTRY/ ) {
              $entry = $_;
          }
          elsif ( /^(TITLE)\s+(\S.*)/ ) {
              $title = "$1\n\t $2";
          }
          elsif ( /^(ORGANISM)\s+(\S.*)/ ) {
              $org = "$1\n\t $2";
          }
          elsif ( /^ACCESSIONS/ ) {
              $acc = $_;
          }
          else {
              push @se, $_;
          }
      }
      And I tried splitting it up like this:
      foreach $r(@se) { @y=split(//,$r); }
      But I am not getting the answer. How should I go about it?