Below is an example of what is in the NRT2.txt file.
    Gloin1
I would like to return the following in an output file for each element in the array. Since this is not an exact match, I expect I need to use a regex.
    >Gloin1_46659 MVKLFARPLPIDP....
    >Gloin1_30454 MIKLFDKPSKELS....
    Gloin1: 2 occurrences in NRT2.txt
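A minimal sketch of the requested count, assuming the names are compared against FASTA header lines of the form `>NAME_NUMBER ...`. The sample lines are held in an array here rather than read from NRT2.txt, and `count_matches` is a hypothetical helper name. Anchoring the pattern with `^>` and a trailing `_` keeps `Gloin1` from also matching a header such as `>Gloin10_...`, and `\Q...\E` protects against regex metacharacters in a name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical in-memory copy of the sample; in practice read these from NRT2.txt.
my @fasta_lines = (
    ">Gloin1_46659 MVKLFARPLPIDP....\n",
    ">Gloin1_30454 MIKLFDKPSKELS....\n",
);

# The "array" of names from the question.
my @names = ('Gloin1');

# Count header lines whose ID is NAME followed by an underscore.
# grep in scalar context returns the number of matching elements.
sub count_matches {
    my ($name, @lines) = @_;
    return scalar grep { /^>\Q$name\E_/ } @lines;
}

for my $name (@names) {
    my $n = count_matches($name, @fasta_lines);
    print "$name: $n occurrences in NRT2.txt\n";
}
```

For the sample data this prints `Gloin1: 2 occurrences in NRT2.txt`.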
Re^3: Counting matches
by hippo (Archbishop) on May 29, 2017 at 15:08 UTC
Here is an SSCCE which matches your spec. Enjoy.
by Nicpetbio23! (Acolyte) on May 29, 2017 at 16:00 UTC
by hippo (Archbishop) on May 29, 2017 at 16:04 UTC
There is absolutely nothing preventing you from expanding the @hant and @counts arrays. Why do you suppose I set them up as arrays in the first place even though you had only given us one data point in your sample?
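To illustrate the point about expanding the arrays, here is a small sketch that keeps `@hant` and `@counts` as parallel arrays (the count for `$hant[$i]` lives in `$counts[$i]`). This is not hippo's posted code, which is not reproduced here; the extra names `Sample2` and `Sample3` and their header line are hypothetical test data.

```perl
use strict;
use warnings;

# Parallel arrays: $counts[$i] holds the count for $hant[$i].
# Sample2 and Sample3 are hypothetical extra names; add as many as you need.
my @hant   = ('Gloin1', 'Sample2', 'Sample3');
my @counts = (0) x @hant;    # one zero per name

my @fasta_lines = (
    ">Gloin1_46659 MVKLFARPLPIDP....",
    ">Gloin1_30454 MIKLFDKPSKELS....",
    ">Sample3_1 MSTNE....",    # hypothetical header
);

for my $i (0 .. $#hant) {
    # Scalar assignment puts grep in scalar context, so this stores a count.
    $counts[$i] = grep { /^>\Q$hant[$i]\E_/ } @fasta_lines;
}

print "$hant[$_]: $counts[$_] occurrences in NRT2.txt\n" for 0 .. $#hant;
```

Adding a genome is just a matter of adding another element to `@hant`; nothing else in the loop changes.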
by Nicpetbio23! (Acolyte) on May 29, 2017 at 17:07 UTC
by shmem (Chancellor) on May 29, 2017 at 17:19 UTC
by poj (Abbot) on May 29, 2017 at 17:38 UTC
How big are the 2 files? Try:
poj
by Nicpetbio23! (Acolyte) on May 29, 2017 at 17:47 UTC
by Nicpetbio23! (Acolyte) on May 29, 2017 at 18:20 UTC
by poj (Abbot) on May 29, 2017 at 18:49 UTC
by AnomalousMonk (Archbishop) on May 29, 2017 at 19:19 UTC
For those monks following along at home, please note that the sample NRT2.txt Fasta file given here has none of the genomes in the sample Genomes_used_hant.txt file! You'll have to roll your own sample data. Thank you, Nicpetbio23!.
Give a man a fish: <%-{-{-{-<
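One low-effort way to roll your own sample data is to keep it in strings and open in-memory filehandles on them: `open` on a scalar reference returns a handle that reads like a real file, so code written against filehandles can be tested unchanged. The data below is entirely hypothetical; it is chosen only so that some names do appear in the FASTA headers.

```perl
use strict;
use warnings;

# Hypothetical sample data held in strings instead of files on disk.
my $hant_data  = "Gloin1\nSample2\n";
my $fasta_data = <<'END';
>Gloin1_46659 MVKLFARPLPIDP....
ACDEFGHIK
>Gloin1_30454 MIKLFDKPSKELS....
>Sample2_7 MSTNE....
END

# Opening a reference to a scalar gives an in-memory filehandle.
open my $matches_fh, '<', \$hant_data  or die "Can't open in-memory matches: $!";
open my $fasta_fh,   '<', \$fasta_data or die "Can't open in-memory fasta: $!";

# Read the names (one per line, newline removed) and the header lines.
my @names   = map { chomp; $_ } <$matches_fh>;
my @headers = grep { /^>/ } <$fasta_fh>;

print scalar(@headers), " headers; names: @names\n";
```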
Re^3: Counting matches
by kcott (Archbishop) on May 30, 2017 at 06:53 UTC
Now that you've provided an indication of your data, a much better solution (than my earlier tentative suggestion) presents itself. Assuming you have a filehandle, e.g. $matches_fh, to your file of match data (Genomes_used_Hant.txt in your example), and another, e.g. $fasta_fh, to your fasta data (NRT2.txt in your example), you can capture the wanted counts like this:
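Since the posted code is not reproduced here, the following is only a sketch of one single-pass approach using the same names ($matches_fh, $fasta_fh, %count); `count_genomes` is a hypothetical helper. It assumes one name per line in the match file and FASTA headers of the form `>NAME_NUMBER ...` where NAME contains no underscore or whitespace. Each header costs one hash lookup, instead of one regex test per wanted name, which matters if the files are large.

```perl
use strict;
use warnings;

# Build %count from two filehandles, reading the FASTA data only once.
# Assumption: headers look like ">NAME_NUMBER ..." and NAME has no "_".
sub count_genomes {
    my ($matches_fh, $fasta_fh) = @_;

    my %count;
    while (my $name = <$matches_fh>) {
        chomp $name;
        next unless length $name;
        $count{$name} = 0;    # every wanted name starts at zero
    }

    while (my $line = <$fasta_fh>) {
        # Capture the ID prefix of a header line; sequence lines are skipped.
        next unless $line =~ /^>([^_\s]+)_/;
        $count{$1}++ if exists $count{$1};    # ignore names we were not asked for
    }
    return %count;
}

# Demo with in-memory filehandles and hypothetical data.
my $matches = "Gloin1\nSample2\n";
my $fasta   = ">Gloin1_46659 MVKLF....\n>Gloin1_30454 MIKLF....\n>Other9_1 M....\n";
open my $matches_fh, '<', \$matches or die "Can't open matches: $!";
open my $fasta_fh,   '<', \$fasta   or die "Can't open fasta: $!";

my %count = count_genomes($matches_fh, $fasta_fh);
printf "%s: %d occurrences in NRT2.txt\n", $_, $count{$_} for sort keys %count;
```

If your genome names can themselves contain underscores, the capture pattern would need adjusting (for example, matching up to the last underscore instead of the first).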
The code you've presented in a couple of your posts in this thread uses the 3-argument form of open with lexical filehandles: this is very good. You are not, however, checking for I/O errors: this is not good at all. The easiest method is to let Perl do this checking for you with the autodie pragma; the alternative is to do it yourself, as shown in the open documentation. In the test code below, I've used Inline::Files purely for convenience. The count information is in %count: you can format and output it however you want.
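A short sketch contrasting the two error-checking styles mentioned above. The helper names and the missing path are hypothetical. `use autodie` is lexically scoped, so here it only affects the first sub; the second sub checks `open` and `close` by hand in the style of the open documentation.

```perl
use strict;
use warnings;

# With autodie, a failed open or close throws an exception that
# names the file and the OS error, so no "or die" is needed.
sub count_lines_autodie {
    my ($path) = @_;
    use autodie;    # applies only to the rest of this block
    open my $fh, '<', $path;
    my $n = 0;
    $n++ while <$fh>;
    close $fh;
    return $n;
}

# The same thing with explicit checks on each I/O call.
sub count_lines_manual {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Can't open '$path': $!";
    my $n = 0;
    $n++ while <$fh>;
    close($fh) or die "Can't close '$path': $!";
    return $n;
}

# Demo: both versions die on a (hypothetical) missing file.
my $missing = '/no/such/dir/NRT2.txt';
for my $counter (\&count_lines_autodie, \&count_lines_manual) {
    eval { $counter->($missing); 1 } or print "Caught: $@";
}
```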
Output:
— Ken
Re^3: Counting matches
by johngg (Canon) on May 29, 2017 at 14:34 UTC
Something along these lines then:
A more comprehensive example of your input file would be needed to be sure of the solution. Update: This is too simplistic; ignore it, as examples of both files are needed before making a stab at a solution.
Cheers, JohnGG