Re: replace first and last occurrences of N

What does "large file" mean in your context? What file sizes are you talking about?

I think this could work for files of less than 100 MB, since it reads the whole input into memory at once. I didn't test it with large files, though.

#!/usr/bin/perl
use strict;
use warnings;

{
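    local $/;                 # slurp mode: read all of DATA at once
    my $data = <DATA>;

    # Each match is a run of two or more n's (the run may continue across
    # line breaks); replace() marks its first and last character.
    $data =~ s/((?:n+\n?)+n+)/replace($1)/ge;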

    print $data, "\n";

    sub replace {
        my $s = shift;

        # Overwrite the first and last character of the run with '^'.
        substr( $s, 0, 1, '^' );
        substr( $s, -1, 1, '^' );

        return $s;
    }
}
__DATA__
acacccacacacaccacacccacacaccacacccacacccacacaccaca
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
cccacaccacacccacacaccacacaccacacccacacccacacacacca
cacccacacaccacacccacacacaccctaaccctaacccctaaccccta
accctaacccnnnnnnnnnnnnnnnnnnnnnnnnnnnccctaaccctaac
ccctaaccctaaccctaaccgtaaccctaaccctttaccctaacccgaac
ccctaacnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnggggg
gaccctgaccgtgaccctgaccctaacccgaacccgaacccgaaccccga
accccgaaccccgaaccccaaccccaaccccaaccccaaccctaacccct
caccctcaccctcgacccccgacccccgacccccgacccccaccccgaac
ggnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnaccctaaccctaaaaccctaaccctagcc
ctagccctagccctagccctaacccctaacccctaaccctaagccgaagc

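Since runs of N in genome data can be long, one possible variant of the script above is to record where each run starts and ends and then overwrite only those two characters in place, instead of passing every run through a replacement sub. This is a sketch, not tested on real genome files: the helper name `mark_run_ends` and the sample string are made up for illustration, it also accepts a capital N, and, like the script above, it leaves runs of a single N unchanged.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): marks the first and last N of
# every run of two or more n/N characters with '^', editing the string in
# place through a reference. A run may continue across line breaks.
sub mark_run_ends {
    my ($data_ref) = @_;

    # First pass: record the offsets of the first and last character of each
    # run. @- and @+ hold the start and end offsets of the last match.
    my @ends;
    while ( $$data_ref =~ /[nN][nN\n]*[nN]/g ) {
        push @ends, $-[0], $+[0] - 1;
    }

    # Second pass: overwrite just those characters. '^' has the same length
    # as the N it replaces, so the recorded offsets stay valid.
    substr( $$data_ref, $_, 1, '^' ) for @ends;

    return @ends / 2;    # number of runs marked
}

my $seq  = "acnnn\nnnca\nNNNNg\nanc\n";
my $runs = mark_run_ends( \$seq );
print "$runs runs marked\n$seq";
```

For the sample string this should report 2 marked runs, leaving the lone n in the last line unchanged. Collecting the offsets before editing avoids changing the string while the `//g` loop is still scanning it; the whole file still has to fit in memory, though.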

Re^2: replace first and last occurrences of N by ini2005 (Novice) on Jul 12, 2008 at 13:30 UTC
Thanks. The files are up to 1 GB...
Re^3: replace first and last occurrences of N by linuxer (Curate) on Jul 12, 2008 at 13:34 UTC
OK, what about newlines? Are there newlines in your data files, or is each file just one long string of characters from the class [a-zA-Z]? Update: I know that for DNA data the character class can be much smaller ;o))
Re^4: replace first and last occurrences of N by ini2005 (Novice) on Jul 12, 2008 at 13:38 UTC
Yes, there are newlines; the files look exactly like your example (DATA). These are DNA files of whole genomes, and they are quite large... Update: it might also be a capital N, not just n.
Re^2: replace first and last occurrences of N by linuxer (Curate) on Jul 13, 2008 at 20:17 UTC
massa's post made me think about my script again. I wonder why I got stuck on the idea of letting an extra subroutine do the replacement... and why I forgot everything about character classes... Here's an updated version of my script without the extra subroutine; all the work is done by the regex. The DNA data is now read from a file, so if your system has enough memory to hold the whole file, this will hopefully work for you.

#!/usr/bin/perl
use strict;
use warnings;

my $file = shift @ARGV;
die "no dna file specified!\n" if !defined $file;

open my $fh, '<', $file or die "$file: $!\n";
my $data = do { local $/; <$fh> };    # slurp: the whole file is held in memory
close $fh;

# Replace the first and last N of each run with '^'. A run may continue
# across line breaks, both n and N count, and the middle may be empty, so a
# run of exactly two N's is marked too; a single N is left as it is.
$data =~ s/[nN]([nN\n]*)[nN]/^$1^/g;

print $data, "\n";

__END__