in reply to how to remove duplicate strings?

Add strictures (use strict; use warnings;) to your code, clean up the issues that creates, then see if the problem remains.

As it stands, a large number of variables are initialised (maybe) but never used, and a number of arrays are referenced without it being clear what they are for. Your unique test looks fine. Your data reading looks like rubbish.

Generate a sample script using __DATA__ to provide the data and show us what you get and what you expect.
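
Something along these lines would do as a starting point. It is only a sketch: I don't know your real file format, so the layout under __DATA__ (one sequence per line) and the shortened stand-in sequences are assumptions made for illustration, and the names %seen and @unique are made up.

```perl
use strict;
use warnings;

# Assumed layout: one sequence per line under __DATA__.
# The real input format is unknown; adapt the reading loop to it.
my %seen;      # sequences already encountered
my @unique;    # first occurrence of each sequence, in input order

while (my $line = <DATA>) {
    $line =~ s/\s+\z//;          # strip the newline and any trailing whitespace
    next unless length $line;    # skip blank lines

    # Compare whole strings: keep a sequence only the first time it appears.
    push @unique, $line unless $seen{$line}++;
}

print scalar(@unique), " unique sequences:\n";
print "$_\n" for @unique;

__DATA__
MGDVEKGKKIFIMK
MGDVEKGKKIFIMK
GDVEKGKKIFIMKCSQ
GDVEKGKKIFIMKCSQ
GDVFKGKRIFIMK
```

With the stand-in data above, @unique should end up holding three strings. Replace the data with a few lines of your real input, then post what the script prints and what you expected instead.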


DWIM is Perl's answer to Gödel

Re^2: how to remove duplicate strings?
by heidi (Sexton) on Oct 30, 2006 at 05:42 UTC
    OK, fine. To be clear, I didn't want to confuse you all with my whole program. The variables you called unused are not unused; I will be using them later when printing the results. All I want to process now is the SEQUENCE, that is, the string (a continuous stretch of letters) that follows the accessions line. When I grep it, store it in a separate array, and print the array inside the loop, I get output like this:
    MGDVEKGKKIFIMKCSQCHTVEMGDVEKGGKHKTGPNLHGLFGRKTGQAPGYSYTAANKNKGIIWGEDTLMEYLENPKKYIP
    MGDVEKGKKIFIMKCSQCHTVEMGDVEKGGKHKTGPNLHGLFGRKTGQAPGYSYTAANKNKGIIWGEDTLMEYLENPKKYIP
    MGDVEKGKKIFIMKCSQCHTVEMGDVEKGGKHKTGPNLHGLFGRKTGQAPGYSYTAANKNKGIIWGEDTLMEYLENPKKYIP
    MGDVEKGKKIFIMKCSQCHTVEMGDVEKGGKHKTGPNLHGLFGRKTGQAPGYSYTAANKNKGIIWGEDTLMEYLENPKKYIP
    MGDVEKGKKIFIMKCSQCHTVEMGDVEKGGKHKTGPNLHGLFGRKTGQAPGYSYTAANKNKGIIWGEDTLMEYLENPKKYIP
    GDVEKGKKIFIMKCSQCHTVEKGSSSKHKSSSTGPNLHGLFGRKTGQAPGYSYTAANKNKGIIWGED
    GDVEKGKKIFIMKCSQCHTVEKGSSSKHKSSSTGPNLHGLFGRKTGQAPGYSYTAANKNKGIIWGED
    GDVEKGKKIFIMKCSQCHTVEKGSSSKHKSSSTGPNLHGLFGRKTGQAPGYSYTAANKNKGIIWGED
    GDVEKGKKIFIMKCSQCHTVEKGSSSKHKSSSTGPNLHGLFGRKTGQAPGYSYTAANKNKGIIWGED
    GDVEKGKKIFIMKCSQCHTVEKGSSSKHKSSSTGPNLHGLFGRKTGQAPGYSYTAANKNKGIIWGED
    GDVEKGKKIFIMKCSQCHTVEKGGSSSSKHKTGPNLHGLFGAAAAAAAARKTGQAPGYSYTAANKSSSSNKGITWGEDTLMEYLENPKKYIPGTKMIFVGIKKKEE
    GDVEKGKKIFIMKCSQCHTVEKGGSSSSKHKTGPNLHGLFGAAAAAAAARKTGQAPGYSYTAANKSSSSNKGITWGEDTLMEYLENPKKYIPGTKMIFVGIKKKEE
    GDVEKGKKIFIMKCSQCHTVEKGGSSSSKHKTGPNLHGLFGAAAAAAAARKTGQAPGYSYTAANKSSSSNKGITWGEDTLMEYLENPKKYIPGTKMIFVGIKKKEE
    GDVEKGKKIFIMKCSQCHTVEKGGSSSSKHKTGPNLHGLFGAAAAAAAARKTGQAPGYSYTAANKSSSSNKGITWGEDTLMEYLENPKKYIPGTKMIFVGIKKKEE
    GDVEKGKKIFIMKCSQCHTVEKGGSSSSKHKTGPNLHGLFGAAAAAAAARKTGQAPGYSYTAANKSSSSNKGITWGEDTLMEYLENPKKYIPGTKMIFVGIKKKEE
    GDVFKGKRIFIMKCSQCHTVESSSSKGGKHKTGPNLHGLFGSSSSSSSSSSR
    When I print the array outside the loop, it either prints only the last string, or removes the repeated letters from each string and prints a result like this:
    MGDVEKIFCSQHTPNLRAYW
    MGDVEKIFCSQHTPNLRAYW
    MGDVEKIFCSQHTPNLRAYW
    MGDVEKIFCSQHTPNLRAYW
    MGDVEKIFCSQHTPNLRAYW
    GDVEKIFMCSQHTPNLRAYW
    GDVEKIFMCSQHTPNLRAYW
    GDVEKIFMCSQHTPNLRAYW
    GDVEKIFMCSQHTPNLRAYW
    GDVEKIFMCSQHTPNLRAYW
    GDVEKIFMCSQHTPNLARYW
    GDVEKIFMCSQHTPNLARYW
    GDVEKIFMCSQHTPNLARYW
    GDVEKIFMCSQHTPNLARYW
    GDVEKIFMCSQHTPNLARYW
    GDVFKRIMCSQHTEPNL
    But the result I need is:
    MGDVEKGKKIFIMKCSQCHTVEMGDVEKGGKHKTGPNLHGLFGRKTGQAPGYSYTAANKNKGIIWGEDTLMEYLENPKKYIP
    GDVEKGKKIFIMKCSQCHTVEKGSSSKHKSSSTGPNLHGLFGRKTGQAPGYSYTAANKNKGIIWGED
    GDVEKGKKIFIMKCSQCHTVEKGGSSSSKHKTGPNLHGLFGAAAAAAAARKTGQAPGYSYTAANKSSSSNKGITWGEDTLMEYLENPKKYIPGTKMIFVGIKKKEE
    GDVFKGKRIFIMKCSQCHTVESSSSKGGKHKTGPNLHGLFGSSSSSSSSSSR
    And I need all of this as 4 elements in the same array. I hope you get it.