This post is a follow-up to the one I posted earlier. My original aim was to match a four-letter word ('ABCD') in each row of the input file and report the 10 characters that follow the matched word in the output file. I was able to do that. Then I found that 'ABCD' occurs twice in each row, with 22 characters between the two occurrences. These 22 characters should be split into two tags of 11 characters each, and both should be reported in the same column. I was able to do that as well. My problem now is the heading: the code below prints it, but it is repeated after every two lines (22 characters). I also need to reformat the file so that the next column gives the frequency of each unique row (which in effect reduces the number of rows).
```perl
#!/usr/bin/perl -w
use strict;
use warnings;

my @input_files = <*.seq>;
my $local_count = 0;
my %hash;

foreach my $input_file (@input_files) {
    unless (open(INPUT, $input_file)) {
        print "Cannot open file \"$input_file\"\n\n";
        exit;
    }
    my $sequence = 'ABCD';
    my @headings = ('Tags', 'Frequency');
    my $headings = join("\t", @headings);

    while (my $line = <INPUT>) {
        # open the output file on the first line read
        if ($local_count == 0) {
            my $outfile = $input_file;
            $outfile =~ s/.seq/.tag.txt/gi;
            unless (open(OUTPUT, ">$outfile")) {
                print "Cannot open file \"$outfile\"\n\n";
                exit;
            }
        }
        chomp $line;
        foreach ($line =~ m/$sequence/i) {
            # capture the two 11-character halves between the markers
            if ($line =~ m/$sequence(.{11})(.{11})$sequence/) {
                # the heading is printed here, once per matching line
                print OUTPUT "\n", $headings, "\n", $1, "\n", $2;
            }
            $local_count++;
        }
    }
}
```
The output I am getting now looks like this:

```
Tags	Frequency
CDDDDDDDDDD
BCDDEDDDDDR
Tags	Frequency
CDEDEDDDESE
CEEESEEDESE
Tags	Frequency
```
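A minimal sketch of one way to get both behaviours, assuming each row contains at most one `ABCD` + 22 characters + `ABCD` pair as described above. In the code above the heading is repeated because the `print OUTPUT "\n", $headings, ...` statement runs once for every matching line. Here the tags are first counted in a hash, and the heading is printed once per output file, after the whole input file has been read, followed by one row per unique tag and its frequency. The helper names `count_tags` and `write_report` are hypothetical, and sorting by descending frequency is an arbitrary choice; rows without the marker pair are simply skipped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read rows from $fh and count every 11-character tag
# found between two occurrences of $marker (marker, 22 chars, marker).
sub count_tags {
    my ($fh, $marker) = @_;
    my %freq;
    while (my $line = <$fh>) {
        chomp $line;
        # \Q...\E treats the marker as a literal string, not a pattern.
        if ($line =~ m/\Q$marker\E(.{11})(.{11})\Q$marker\E/) {
            $freq{$1}++;    # first half of the 22 characters
            $freq{$2}++;    # second half, reported in the same column
        }
    }
    return \%freq;
}

# Hypothetical helper: print the heading once, then one row per unique tag.
sub write_report {
    my ($out, $freq) = @_;
    print {$out} join("\t", 'Tags', 'Frequency'), "\n";
    # Most frequent tags first; ties broken alphabetically.
    for my $tag (sort { $freq->{$b} <=> $freq->{$a} || $a cmp $b } keys %$freq) {
        print {$out} "$tag\t$freq->{$tag}\n";
    }
}

# One report per *.seq file, e.g. sample.seq -> sample.tag.txt
for my $input_file (glob '*.seq') {
    # Escape the dot and anchor at the end so only the extension is replaced.
    (my $outfile = $input_file) =~ s/\.seq$/.tag.txt/i;
    open my $in,  '<', $input_file or die "Cannot open file \"$input_file\": $!\n";
    open my $out, '>', $outfile    or die "Cannot open file \"$outfile\": $!\n";
    write_report($out, count_tags($in, 'ABCD'));
    close $in;
    close $out;
}
```

Because the frequencies are only known once every row has been read, the report is written at the end rather than line by line; that is what collapses repeated tags into a single row. Using lexical filehandles opened inside the loop also gives each `.seq` file its own output file, instead of relying on a counter that is never reset between files.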