Using your code, I was able to eliminate the redundancy, but I did not get the whole data back. The output was fragmented (an incomplete dataset was returned).
I entered the data file as follows (Data.txt):

>gi|49329899|gb|AAT60545.1| chitinase [Bacillus thuringiensis serovar konkukian str. 97-27
MLNKFKFICCTLVIFLLLPLAPFQAQAANNLGSKLLVGYWHNFDNGTGIIKLRDVSPKWDVINVSFGETG
GDRSTVEFSPVYGTDAEFKSDISYLKSKGKKVVLSIGGQNGVVLLPDNAAKQRFINSIQSLIDKYGFDGI
>gi|49330053|gb|AAT60699.1| chitinase [Bacillus thuringiensis serovar konkukian str. 97-27
MKSKKFTLLLLSLLLFLPLFLTNFITPNVVLADSQKQDQKIVGYFPSWGIYGRNYQVADIDASKLTHLNY
AFADICWNGKHGNPSTHPDNPNKQTWNCKESGVPLQNKEVPNGTLVLGEPWADVTKSYPGSGTTWEDCDK
>gi|49478343|ref|YP_037789.1| chitinase [Bacillus thuringiensis serovar konkukian str. 97-27]
MLNKFKFICCTLVIFLLLPLAPFQAQAANNLGSKLLVGYWHNFDNGTGIIKLRDVSPKWDVINVSFGETG
GDRSTVEFSPVYGTDAEFKSDISYLKSKGKKVVLSIGGQNGVVLLPDNAAKQRFINSIQSLIDKYGFDGI
>gi|49329899|gb|AAT60545.1| chitinase [Bacillus thuringiensis serovar konkukian str. 97-27
MLNKFKFICCTLVIFLLLPLAPFQAQAANNLGSKLLVGYWHNFDNGTGIIKLRDVSPKWDVINVSFGETG
GDRSTVEFSPVYGTDAEFKSDISYLKSKGKKVVLSIGGQNGVVLLPDNAAKQRFINSIQSLIDKYGFDGI
>gi|49478497|ref|YP_034712.1| chitinase [Bacillus thuringiensis serovar konkukian str. 97-27]
MKSKKFTLLLLSLLLFLPLFLTNFITPNVVLADSQKQDQKIVGYFPSWGIYGRNYQVADIDASKLTHLNY
AFADICWNGKHGNPSTHPDNPNKQTWNCKESGVPLQNKEVPNGTLVLGEPWADVTKSYPGSGTTWEDCDK
YARCGNFGELKRLKAKYPHLKTIISVGGWTWSNRFSDMAADEKTRKVFADSTVDFLREYGFDGVDLDWEY

Using your code, I obtained the following result:

>gi|49329899|gb|AAT60545.1| chitinase [Bacillus thuringiensis serovar konkukian str. 97-27
MLNKFKFICCTLVIFLLLPLAPFQAQAANNLGSKLLVGYWHNFDNGTGIIKLRDVSPKWDVINVSFGETG
+
>gi|49330053|gb|AAT60699.1| chitinase [Bacillus thuringiensis serovar konkukian str. 97-27
>gi|49478343|ref|YP_037789.1| chitinase [Bacillus thuringiensis serovar konkukian str. 97-27]
>gi|49478497|ref|YP_034712.1| chitinase [Bacillus thuringiensis serovar konkukian str. 97-27]

The result shows the unique sequences, but it is not complete. Only the first record keeps any sequence data, and even that one has lost its second line; the other three come back as bare headers. I need the complete data for each record: the >gi header together with the two lines that follow it (the lines of letters as well), for example:

MLNKFKFICCTLVIFLLLPLAPFQAQAANNLGSKLLVGYWHNFDNGTGIIKLRDVSPKWDVINVSFGETG
GDRSTVEFSPVYGTDAEFKSDISYLKSKGKKVVLSIGGQNGVVLLPDNAAKQRFINSIQSLIDKYGFDGI

Thanks.
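To make the requirement concrete, the behaviour I am after could be sketched roughly as below. This is only an illustration in Python, not the code from this thread. The function names `unique_fasta_records` and `format_records` are made up for the example, and it assumes that two records count as duplicates when their `>gi` header lines are identical (as with the repeated gi|49329899 entry above). Records with different headers but the same sequence are kept, which matches the four headers in the result I got.

```python
def unique_fasta_records(lines):
    """Return (header, sequence_lines) pairs, keeping the first copy of each header.

    Assumption: a record is a duplicate when its '>' header line is identical
    to one already seen. Every kept record is returned whole, i.e. the header
    plus all sequence lines up to the next header.
    """
    records = {}      # header -> list of sequence lines; dicts keep insertion order
    current = None    # sequence list of the record being read, None while skipping
    for raw in lines:
        line = raw.strip()          # drops newline, stray '\r' and padding
        if not line:
            continue                # ignore blank lines
        if line.startswith(">"):
            if line in records:
                current = None      # duplicate header: skip its sequence lines
            else:
                current = records[line] = []
        elif current is not None:
            current.append(line)    # sequence line of a record we are keeping
    return list(records.items())


def format_records(records):
    """Turn (header, sequence_lines) pairs back into FASTA text."""
    out = []
    for header, seq_lines in records:
        out.append(header)
        out.extend(seq_lines)
    return "".join(line + "\n" for line in out)


# Hypothetical usage, assuming the input file is named Data.txt:
#     with open("Data.txt") as fh:
#         print(format_records(unique_fasta_records(fh)), end="")
```

Keeping the sequence lines in a list per header means a record with three lines (like the last one in my file) comes back with all three, not just the first.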
In reply to Re^2: How to eliminate redundancy in huge dataset (1,000 - 10,000) by Kenin
in thread How to eliminate redundancy in huge dataset (1,000 - 10,000) by Kenin