Re: Extracting field information from a GenBank file.

Hi all.

Thank you both for the help. For others who might find this node, I have included my working code below. It assumes the format of GenBank files does not change, although given the power of Perl, modifications should be easy to make:

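#!/usr/bin/perl -w
use strict;
use autodie;

my $dna;
open my $fh, '<', 'fluseq.txt';
my $data = do { local $/; <$fh> };   # slurp the whole file
close $fh;

# Everything after the ORIGIN keyword is the sequence block
if ($data =~ /ORIGIN(.*)/s)
{
    $dna = $1;
    $dna =~ s/\s+//g;    # remove whitespace and newlines
    $dna =~ s/\d+//g;    # remove position numbers
    $dna =~ s/\/\///;    # remove the terminating //
}

print $dna if defined $dna;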
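
For anyone who wants to try the extraction without fluseq.txt at hand, here is a minimal sketch that runs the same three substitutions on an in-memory string. The sub name extract_dna and the sample record are made up for illustration; the sample is not a real GenBank entry.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: same steps as the script above, applied to a string.
sub extract_dna {
    my ($data) = @_;
    return unless $data =~ /ORIGIN(.*)/s;   # no ORIGIN block found
    my $dna = $1;
    $dna =~ s/\s+//g;    # whitespace and newlines
    $dna =~ s/\d+//g;    # position numbers
    $dna =~ s/\/\///;    # terminating //
    return $dna;
}

# Made-up sample in the GenBank layout (assumption, not a real entry).
my $sample = <<'END';
LOCUS       TEST0001       20 bp    DNA
ORIGIN
        1 agcaaaagca ggtagatatt
//
END

print extract_dna($sample), "\n";
```

Wrapping the steps in a sub makes it easy to check the behaviour on small strings before pointing the script at a real file.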

Re^2: Extracting field information from a GenBank file.
by hdb (Monsignor) on Jul 15, 2013 at 08:43 UTC

A couple of comments on your code:

If the word 'ORIGIN' appears anywhere else in the text, your code would break.
If you use $/='ORIGIN' your file would automatically be split at all occurrences of this word and you could just use the last bit.
Instead of removing all kinds of unwanted characters you could tell Perl to remove everything but a, c, g, and t.

Most of it is clearly a matter of taste but it feels more direct to me this way:

use strict;
use autodie;

open my $fh, '<', 'fluseq.txt';
my @tmp = do {local $/='ORIGIN'; <$fh>};
my $dna = pop @tmp;
$dna =~ s/[^acgt]//gi; # delete all but a, c, g, and t
print $dna;
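
To make the first point concrete, here is a minimal sketch of another way to guard against a stray 'ORIGIN'. It assumes ORIGIN starts a line of its own and the record ends with // at the start of a line, as in the usual GenBank flat-file layout. The sub name origin_sequence and the sample text are hypothetical. It anchors the match to the start of a line, and it keeps every letter rather than only a, c, g, and t, so ambiguity codes such as n survive; whether you want that depends on your data.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the sequence of the first record, or undef.
sub origin_sequence {
    my ($data) = @_;
    # ORIGIN must start a line; capture lazily up to a line starting with //
    return unless $data =~ m{^ORIGIN[^\n]*\n(.*?)^//}ms;
    my $dna = $1;
    $dna =~ s/[^a-z]//gi;   # drop digits and whitespace, keep all letters
    return lc $dna;
}

# Made-up sample (assumption): 'ORIGIN' also occurs inside a free-text line.
my $sample = <<'END';
COMMENT     The word ORIGIN also appears in this free-text line.
ORIGIN
        1 agcaaaagca ggtagatatn
//
END

print origin_sequence($sample), "\n";
```

Because the capture stops at the first //, this returns only the first record of a multi-record file, whereas taking the last chunk after splitting on 'ORIGIN' returns the last one.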
