some bioinformatics

hgraf has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: some bioinformatics by anneli (Pilgrim) on Oct 01, 2011 at 05:02 UTC
Hi hgraf, That's a regular expression, and it's probably the most important bit of your code. The bits in parens are the capture groups; they're `$1`, `$2`, and so on. `\S+` means one or more non-whitespace characters, and `\s*` is zero or more whitespace characters (a bare `\s` matches exactly one). It depends on the format of your data, but repeated `(\S+)\s*(\S+)\s*` sections may be enough.
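To make the capture groups concrete, here is a minimal sketch. The header line and the helper name `parse_header` are made up for illustration, since the original data was not posted. It uses `\s+` between the first two fields so that a single token cannot be split across two groups.

```perl
use strict;
use warnings;

# Hypothetical helper: pull up to three whitespace-separated tokens
# out of a FASTA-style header line.
sub parse_header {
    my ($line) = @_;
    # In list context a successful match returns the captures ($1, $2, $3);
    # a failed match returns an empty list.
    return $line =~ /^>(\S+)\s+(\S+)\s*(\S*)/;
}

# Made-up header; the real data was not shown in the thread.
my ( $id, $first, $second ) = parse_header('>seq1 Homo sapiens');
print "id=$id first=$first second=$second\n";
```

Each pair of parentheses fills the next numbered variable, so adding a field means adding another `(\S+)` group and another variable on the left-hand side.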
Re: some bioinformatics by Marshall (Canon) on Oct 01, 2011 at 05:47 UTC
It sounds like what you have is a FASTA file. Search CPAN for "fasta" and you will get pages of modules that have something to do with the FASTA format. I wrote one FASTA parser; use it if you want. I did this to demo one particular type of parsing technique, and there are many other ways to write code that parses a FASTA file. I don't recommend that you rely on my code, though, because there is more general-purpose code available as a CPAN module or "library function" that you can use. The basic formula for success here is to explain what you want to do in terms of (a) the data that you have now and (b) the information that you want to produce, and then show (c) your Perl code so far. I highly recommend trying to understand how to use the BioPerl modules. But in any event, you have not shown either (a) or (b) above, so it is not possible to discuss (c).
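Marshall's parser is not reproduced in this thread. As a separate illustration of what "parsing a FASTA file" involves, here is a minimal core-Perl sketch. The sub name `read_fasta`, the hash keys, and the sample data are assumptions made for this example; for real work a CPAN module is likely the better choice.

```perl
use strict;
use warnings;

# Parse FASTA text from a filehandle into a list of records.
# Each record is a hash with id, desc and seq keys (names chosen for this sketch).
sub read_fasta {
    my ($fh) = @_;
    my @records;
    while ( my $line = <$fh> ) {
        chomp $line;
        next if $line =~ /^\s*$/;    # skip blank lines
        if ( $line =~ /^>(\S+)\s*(.*)/ ) {
            # Header line: start a new record.
            push @records, { id => $1, desc => $2, seq => '' };
        }
        elsif (@records) {
            # Sequence line: append to the current record.
            $records[-1]{seq} .= $line;
        }
    }
    return @records;
}

# Made-up sample data, read through an in-memory filehandle.
my $data = ">seq1 first example\nACGT\nTTGA\n>seq2\nGGCC\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
for my $rec ( read_fasta($fh) ) {
    print "$rec->{id} [$rec->{desc}] $rec->{seq}\n";
}
close $fh;
```

To read a real file, open it with `open my $fh, '<', $path` and pass that handle instead of the in-memory one.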
Re: some bioinformatics by Cristoforo (Curate) on Oct 01, 2011 at 20:04 UTC
I commented on a similar problem here and here. If the module discussed, Bio::SeqIO, doesn't solve your problem, maybe Bio::Seq would.
Re: some bioinformatics by pvaldes (Chaplain) on Oct 01, 2011 at 08:55 UTC
`($line =~ /^>(\S+)\s*(.*)/ ); # what is exactly doing?`
This line looks for: the beginning of the line, then a literal `>`, followed by at least one non-whitespace character, captured for later reuse as the ID; followed, or not, by some whitespace; followed, or not, by anything at all. The "anything" is captured too, so you can pass it to a second variable later (the description). The whitespace and the description block are optional, i.e., after the `>`: `blablabla blobloblo #OK`, `blabla #OK`, `b #OK`, `   blabla #NOT, note the whitespaces before`.
"Now, what I need it to do is, add more strings. I need two extra strings." `my $first_extra_string = 'I am a cow'; my $second_extra_string = 'moo too';`
"How can add more $ to it, such as acession numbers?" `my $adding_more_dollar_signs = "I am rocowfeller, I have a lot of \$\$\$\$\$!";`
Maybe you should explain better what exactly you want to do? Something like this? `($line =~ /^>(\S+)\s*(.*)\s*(.*)\s*(.*)/ ); my $daisy_cow = $3; my $shawn_the_sheep = $4;` (Note that, as written, the first greedy `(.*)` swallows the rest of the line, so `$3` and `$4` come out empty; use `(\S+)` for the fields you want to keep separate.)
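If the extra strings are simply whitespace-separated fields at the start of the header, one alternative to stacking more capture groups is `split` with a limit. This is a sketch of a different technique from the regex above. It assumes a header layout (ID, accession, free-text description) that was not confirmed in the thread, and the name `header_fields` is hypothetical.

```perl
use strict;
use warnings;

# Split a header into an ID plus up to $n whitespace-separated extra fields;
# whatever is left over stays together in the final field.
sub header_fields {
    my ( $line, $n ) = @_;
    chomp $line;
    return unless $line =~ s/^>//;    # not a header line: return an empty list
    # split ' ' skips leading whitespace and splits on runs of whitespace;
    # the limit keeps the remainder of the line intact in the last field.
    return split ' ', $line, $n + 2;
}

# Made-up header with an assumed accession number in second position.
my ( $id, $accession, $description ) =
    header_fields( '>seq1 AB123 Homo sapiens gene', 1 );
print "$id | $accession | $description\n";
```

Raising `$n` yields more leading fields without touching a regex, which may be easier to maintain if the number of extra strings changes.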
Re: some bioinformatics by Anonymous Monk on Oct 01, 2011 at 02:33 UTC
Hi, As a start, tell us exactly what your starting data looks like, and what bits of it you wish to extract. J.C.
Re: some bioinformatics by Anonymous Monk on Oct 01, 2011 at 02:06 UTC
(he says without looking up) Have you read perlintro?