Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,
I have a file with multiple patterns, e.g. XYZATGC. I want to match the substring ATGC and trim the sequence so that the output is XYZ. Can anyone tell me how to do this?
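One minimal way to do this, sketched here as an illustration rather than taken from any reply, is to find the position of the first ATGC with Perl's built-in index function and keep everything before it with substr. The sub name trim_at_motif is hypothetical. Returning the sequence unchanged when ATGC is absent is an assumption, since the question doesn't say what should happen in that case.

```perl
use strict;
use warnings;

# Hypothetical helper: return the part of $seq before the first $motif.
# If $motif does not occur, $seq is returned unchanged (an assumption).
sub trim_at_motif {
    my ($seq, $motif) = @_;
    my $pos = index($seq, $motif);    # -1 when $motif is not found
    return $pos < 0 ? $seq : substr($seq, 0, $pos);
}

print trim_at_motif('XYZATGC', 'ATGC'), "\n";    # prints XYZ
```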

Re: pattern matching
by Eliya (Vicar) on Feb 10, 2012 at 10:50 UTC

      Sorry for not being clear. The input looks like this:
      XYZATGCAATGGCGTC.....
      YAZXSATGCAVVGBHYN...
      There are many such patterns.

        Each sequence should be trimmed from the beginning of ATGC to the end of the string, leaving only the part before it: for the string XYZATGCAATGGCGTC....., the result should be XYZ.
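Since whatever follows ATGC is arbitrary, a substitution that deletes ATGC and everything after it covers every case. A minimal sketch, assuming Perl 5.14 or later for the non-destructive /r modifier. The sub name prefix_before_atgc is hypothetical, and the two sample strings are just the visible parts of the examples above.

```perl
use 5.014;    # enables strict and say; s///r needs 5.14+
use warnings;

# Hypothetical helper: delete the first "ATGC" and everything after it.
# /r returns the modified copy and leaves $seq untouched;
# /s lets "." match newlines too, in case a sequence spans lines.
sub prefix_before_atgc {
    my ($seq) = @_;
    return $seq =~ s/ATGC.*//sr;
}

# Visible parts of the example sequences from the question.
for my $seq (qw(XYZATGCAATGGCGTC YAZXSATGCAVVGBHYN)) {
    say prefix_before_atgc($seq);    # XYZ, then YAZXS
}
```

If ATGC does not occur, the substitution matches nothing and /r simply returns the original string.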

Re: pattern matching
by Xiong (Hermit) on Feb 10, 2012 at 11:49 UTC

    I'm not sure you've given enough information. We need a broader selection of example inputs and outputs.

    For instance, here's a perfectly valid script that does exactly what you demand -- no more, no less:

    #!/usr/bin/perl
    # xyzatgc.pl
    # = Copyright 2011 Xiong Changnian <xiong@cpan.org> =
    # = Free Software = Artistic License 2.0 = NO WARRANTY =

    use 5.014002;
    use strict;
    use warnings;
    #~ use Devel::Comments '#####', ({ -file => 'debug.log' });

    #----------------------------------------------------------------------------#

    # Pass input filename on command line: $ xyzatgc.pl infile.txt
    my $in_filename = shift
        or die "Usage: $0 infile.txt";

    # Construct output filename: infile.txt => infile.out
    my ($stem) = $in_filename =~ /^(.+)\.txt\z/
        or die "Input filename must end in .txt";
    my $out_filename = $stem . q|.out|;

    # Slurp in entire input file.
    my $indata;
    {
        open my $in_fh, '<', $in_filename
            or die "Couldn't open $in_filename for reading: $!";
        local $/ = undef;    # slurp
        $indata = <$in_fh>;
        close $in_fh
            or die "Couldn't close $in_filename: $!";
    }

    # Substitute as required.
    $indata =~ s/XYZATGC/XYZ/g;

    # Write results to output file.
    open my $out_fh, '>', $out_filename
        or die "Couldn't open $out_filename for writing: $!";
    say {$out_fh} $indata;
    close $out_fh
        or die "Couldn't close $out_filename: $!";

    # Terminate.
    say 'Done.';
    __END__

    Input:

    XYZATGC XYZATGC XYZATGC XYZATGC XYZATGC XYZ xyz foo ATGC atgc JAPHATGC XYZATGC XYZATGCXYZATGCXYZATGC

    Output:

    XYZ XYZ XYZ XYZ XYZ XYZ xyz foo ATGC atgc JAPHATGC XYZ XYZXYZXYZ

    Now I'm going to wager that's not quite what you want. Please don't try to explain in English words what you'd rather see. Instead, show us a fuller example of input and output.

    We'll see what we can do.

    I'm not the guy you kill, I'm the guy you buy. —Michael Clayton

      Hi, this is not what I wanted ($indata =~ s/XYZATGC/XYZ/g;). I don't know what string will come after ATGC. In a string like XYZATGCCVFGBGVFCD..., as soon as ATGC is found at some position, the script should cut off everything from there on (ATGC itself and the rest that follows it), keep only the part before it (here XYZ), and write that to a new file. Am I clear now?


      Here are some examples.
      Input:

      XYZATGCACGTGFVGFCCV.......
      YZXCVFDCXZATGCXCCXZZSDD
      Output:
      XYZ (in new file file1.txt)
      YZXCVFDCXZ (in new file file2.txt)
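A sketch of the whole task as described above: read the input one sequence per line, trim each line at the first ATGC, and write each result to its own new file. The fileN.txt naming is inferred from the example output, skipping lines that contain no ATGC is an assumption, and write_prefixes and trim.pl are hypothetical names.

```perl
use 5.014;
use warnings;

# Hypothetical helper: for every line of $in_filename that contains "ATGC",
# write the part before the first "ATGC" to a new file: file1.txt, file2.txt, ...
# Lines without "ATGC" are skipped (an assumption). Returns the number of files written.
sub write_prefixes {
    my ($in_filename) = @_;
    open my $in_fh, '<', $in_filename
        or die "Couldn't open $in_filename for reading: $!";
    my $n = 0;
    while (my $line = <$in_fh>) {
        chomp $line;
        my $pos = index($line, 'ATGC');
        next if $pos < 0;    # no ATGC on this line (also skips blank lines)
        $n++;
        my $out_filename = "file$n.txt";
        open my $out_fh, '>', $out_filename
            or die "Couldn't open $out_filename for writing: $!";
        say {$out_fh} substr($line, 0, $pos);
        close $out_fh or die "Couldn't close $out_filename: $!";
    }
    close $in_fh;
    return $n;
}

# Input filename on the command line, e.g.: perl trim.pl input.txt
write_prefixes($ARGV[0]) if @ARGV;
```

Reading line by line instead of slurping keeps each sequence separate, so every line gets its own cut point.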

        - # Substitute as required.
        - $indata =~ s/XYZATGC/XYZ/g;
        + # This is not terribly efficient.
        + my @outdata = split q|ATGC|, $indata;
        + $indata = $outdata[0];
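A possible refinement of the diff above (a sketch, not part of the original reply): split accepts a third LIMIT argument, and with a limit of 2 Perl splits only at the first ATGC instead of across the whole string. The first field is the part before it. Like the diff, applying this to the slurped $indata keeps everything before the first ATGC in the entire file, not line by line. before_first_atgc is a hypothetical name.

```perl
use 5.014;
use warnings;

# Hypothetical helper. split's third argument (LIMIT) caps the number of
# fields, so with a limit of 2 the string is split at the first "ATGC" only.
# The first field is the text before that ATGC (or the whole string if none).
sub before_first_atgc {
    my ($data) = @_;
    my ($prefix) = split /ATGC/, $data, 2;
    return $prefix;
}

say before_first_atgc('XYZATGCACGTGFVGFCCV');    # XYZ
```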
Re: pattern matching
by RichardK (Parson) on Feb 10, 2012 at 13:49 UTC

    You could try something like this:

    use v5.14;
    use warnings;

    my @values = ( qw/XYZATGCACGTGFVGFCCV YZXCVFDCXZATGCXCCXZZSDD/ );
    for my $v (@values) {
        my ($pre) = $v =~ m/^(.*?)ATGC/;
        say $pre;
    }
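One thing to watch with this approach: when a value contains no ATGC, the list-context match returns an empty list, $pre stays undef, and say emits an "uninitialized value" warning under use warnings. A minimal sketch that checks for that case. The sub name atgc_prefix and the third sample value are hypothetical.

```perl
use v5.14;
use warnings;

# Hypothetical helper: the text before the first "ATGC", or undef if absent.
# The non-greedy .*? makes the capture stop at the first ATGC, not the last.
sub atgc_prefix {
    my ($v) = @_;
    my ($pre) = $v =~ m/^(.*?)ATGC/;    # empty list (so undef) on no match
    return $pre;
}

# The third value is a made-up example with no ATGC in it.
for my $v (qw/XYZATGCACGTGFVGFCCV YZXCVFDCXZATGCXCCXZZSDD NOMOTIFHERE/) {
    my $pre = atgc_prefix($v);
    say defined $pre ? $pre : "$v: no ATGC found";
}
```

This prints XYZ, then YZXCVFDCXZ, then "NOMOTIFHERE: no ATGC found".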

      Yes, thank you, everyone. I'll try them.