Hi all, I have written a program to separate strings from a data file. The problem is that I am getting duplicate copies of the original strings. I tried to remove them, but in vain. Here is my program:

open (PIR,'/home/sampir.txt');
while (<PIR>)
{
    if (/^ENTRY/)    {$entry = $_;}
    elsif(/^TITLE/)    {$title = (s/                   /\n\t\t /g,$_);}
    elsif(/^ORGANISM/){$org = (s/                   /\n\t\t /g,$_);}
    elsif(/^ACCESSIONS/){$acc = $_;}
    else
    {    
        @arr = $_;
    }
    if (defined $array2[0])
    {
        @array = split('',$arr[0]);
    }
    
}
print @array;

And this is the sample data file:

ENTRY            CCHU       #type complete
TITLE            cytochrome c [validated] - human
ORGANISM         #formal_name Homo sapiens #common_name man
ACCESSIONS       A31764; A05676; I55192; A00001
MGDVEKGKKIFIMKCSQCHTVEMGDVEKGGKHKTGPNLHGLFGRKTGQAPGYSYTAANKNKGIIWGEDTLMEYLENPKKYIP
ENTRY            CCCZ       #type complete
TITLE            cytochrome c - chimpanzee (tentative sequence)
ORGANISM         #formal_name Pan troglodytes #common_name chimpanzee
ACCESSIONS       A00002
GDVEKGKKIFIMKCSQCHTVEKGSSSKHKSSSTGPNLHGLFGRKTGQAPGYSYTAANKNKGIIWGED
ENTRY            CCMQR      #type complete
TITLE            cytochrome c - rhesus macaque (tentative sequence)
ORGANISM         #formal_name Macaca mulatta #common_name rhesus macaque
ACCESSIONS       A00003
GDVEKGKKIFIMKCSQCHTVEKGGSSSSKHKTGPNLHGLFGAAAAAAAARKTGQAPGYSYTAANKSSSSNKGITWGEDTLMEYLENPKKYIPGTKMIFVGIKKKEE
ENTRY            CCMKP      #type complete
TITLE            cytochrome c - spider monkey
ORGANISM         #formal_name Ateles sp. #common_name spider monkey
ACCESSIONS       A00004
GDVFKGKRIFIMKCSQCHTVESSSSKGGKHKTGPNLHGLFGSSSSSSSSSSR

I referred to perldoc and tried using:

    my @unique = ();
    my %seen   = ();
    foreach my $elem ( @array )
    {
        next if $seen{ $elem }++;
        push @unique, $elem;
    }

But what it does is remove the letters that repeat within a string. I don't want that to happen; I want all four strings (the ones that follow each ACCESSIONS line) in an array, without duplicate strings. Please help me out. Thanks.
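
To make the goal concrete, here is a minimal sketch of the behaviour being asked for. It is not a known-good solution, only an illustration under some assumptions: every line that does not start with ENTRY, TITLE, ORGANISM or ACCESSIONS is treated as sequence data, sequence lines belonging to one record are joined, and the `%seen` idiom is applied to whole strings rather than to single characters. The helper names `read_sequences` and `unique_strings` and the small sample records are made up for the example.

```perl
use strict;
use warnings;

# Collect one sequence string per record from PIR-style text.
# Assumption: header lines start with ENTRY, TITLE, ORGANISM or ACCESSIONS,
# and every other non-blank line is sequence data.
sub read_sequences {
    my ($fh) = @_;
    my @sequences;
    my $current = '';
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^ENTRY/) {
            # A new record starts: store the previous record's sequence.
            push @sequences, $current if length $current;
            $current = '';
        }
        elsif ($line =~ /^(?:TITLE|ORGANISM|ACCESSIONS)/) {
            next;    # header fields are ignored in this sketch
        }
        elsif ($line =~ /\S/) {
            $line =~ s/\s+//g;
            $current .= $line;    # join sequences that span several lines
        }
    }
    push @sequences, $current if length $current;
    return @sequences;
}

# Remove duplicates by comparing whole strings, keeping first-seen order.
sub unique_strings {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

# Made-up sample records; a real script would open the data file instead.
my $sample = <<'END';
ENTRY            AAAA
TITLE            example one
ACCESSIONS       X00001
GDVEKGKK
IFIMK
ENTRY            BBBB
TITLE            example two
ACCESSIONS       X00002
GDVEKGKKIFIMK
ENTRY            CCCC
TITLE            example three
ACCESSIONS       X00003
MGDVFKGKR
END

open my $fh, '<', \$sample or die "Cannot open in-memory sample: $!";
my @unique = unique_strings(read_sequences($fh));
close $fh;

print "$_\n" for @unique;
```

The key difference from the snippet above is that the array passed to the `%seen` loop holds one element per sequence. If the array was built with `split('', ...)`, each element is a single character, so the loop drops repeated letters instead of repeated strings.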

In reply to how to remove duplicate strings? by heidi
