bluray has asked for the wisdom of the Perl Monks concerning the following question:
This post is a follow-up to the one I posted earlier. My original aim was to match a four-letter word ('ABCD') in the rows of the input file and then write the 10 characters following the match to the output file. I was able to do that. Then I found that 'ABCD' occurs twice in each row, with 22 characters between the two occurrences. These 22 characters should be split into two strings of 11 characters each, and both should be reported in the same column. I was able to do that as well. My problem now is getting a heading for the file: the code below prints the heading, but it is repeated after every two lines (that is, after every 22 characters). I also need to reformat the file so that the next column gives the frequency of each unique row, which in effect reduces the number of rows.
#!usr/bin/perl -w
use strict;
use warnings;

my @input_files=<*.seq>;
my $local_count=0;
my %hash;

foreach my $input_file(@input_files) {
    unless (open(INPUT, $input_file)) {
        print "Cannot open file \"$input_file\"\n\n";
        exit;
    }
    my $sequence='ABCD';
    my @headings=('Tags', 'Frequency');
    my $headings=join("\t",@headings);
    while (my $line=<INPUT>) {
        if ($local_count==0){
            my $outfile=$input_file;
            $outfile=~s/.seq/.tag.txt/gi;
            unless (open (OUTPUT, ">$outfile")) {
                print "Cannot open file \"$outfile\"\n\n";
                exit;
            }
        }
        chomp $line;
        foreach($line=~m/$sequence/i){
            if ($line=~m/$sequence(.{11})(.{11})$sequence/){
                print OUTPUT "\n",$headings,"\n",$1,"\n",$2;
            }
            $local_count++;
        }
    }
}
The output I am getting now is in the format below:
Tags Frequency
CDDDDDDDDDD
BCDDEDDDDDR
Tags Frequency
CDEDEDDDESE
CEEESEEDESE
Tags Frequency
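For illustration, here is a minimal sketch of the output I am after: the heading printed once, then each unique tag followed by its frequency. It is not taken from any reply. The helper names `count_tags` and `format_report` and the sample rows are made up, and it assumes each row holds at most one 'ABCD' + 22 characters + 'ABCD' stretch. The idea is to collect the tags in a hash first and print only after all rows have been read, instead of printing inside the loop.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count every 11-character tag found between
# two copies of the marker, over all of the given lines.
sub count_tags {
    my ($marker, @lines) = @_;
    my %count;
    for my $line (@lines) {
        # marker, two 11-character tags, marker again
        if ($line =~ /\Q$marker\E(.{11})(.{11})\Q$marker\E/i) {
            $count{$_}++ for ($1, $2);
        }
    }
    return \%count;
}

# Hypothetical helper: heading once, then one "tag<TAB>frequency"
# row per unique tag, most frequent first (ties sorted by tag).
sub format_report {
    my ($count) = @_;
    my @rows = map  { "$_\t$count->{$_}" }
               sort { $count->{$b} <=> $count->{$a} || $a cmp $b }
               keys %$count;
    return join("\n", "Tags\tFrequency", @rows) . "\n";
}

# Made-up rows standing in for the lines of one .seq file.
my $tag = 'C' . 'D' x 10;    # CDDDDDDDDDD, 11 characters
my @sample = (
    'xx' . 'ABCD' . $tag . 'BCDDEDDDDDR' . 'ABCD' . 'yy',
    'ABCD' . 'CDEDEDDDESE' . $tag . 'ABCD',
);

print format_report(count_tags('ABCD', @sample));
```

With these sample rows the heading appears once, and CDDDDDDDDDD is listed with a frequency of 2 while the other two tags get 1 each. For real files, the lines read from each `.seq` file would be passed in place of `@sample`, and the report written to the matching output file.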
Replies are listed 'Best First'.

Re: Creating a column of frequency for the unique entries of another column
by toolic (Bishop) on Oct 28, 2011 at 18:43 UTC

Re: Creating a column of frequency for the unique entries of another column
by Cristoforo (Curate) on Oct 29, 2011 at 20:28 UTC
by bluray (Sexton) on Oct 29, 2011 at 21:10 UTC

Re: Creating a column of frequency for the unique entries of another column
by Marshall (Canon) on Oct 29, 2011 at 11:18 UTC
by bluray (Sexton) on Oct 29, 2011 at 16:50 UTC
by aaron_baugher (Curate) on Oct 29, 2011 at 17:24 UTC