mgamar has asked for the wisdom of the Perl Monks concerning the following question:

Hello! I need help. I have a BED file with the start and end coordinates of some smallRNAs, and I want to know whether these coordinates correspond to microRNAs. I think I have to extract the sequences from a reference genome using the coordinates in the BED file and then compare them with the sequences of the microRNAs. Could somebody tell me how to manage the variables, or give me some tips to start this? Thank you very much!!

Replies are listed 'Best First'.
Re: extract sequences of DNA from coordinates
by ww (Archbishop) on Sep 24, 2014 at 20:53 UTC

    I suggest you visit the wiki on BioPerl and then take your question to the BioPerl site, starting from its QuickStart link.

    The reasons for this recommendation?

    • Your question is essentially unintelligible to the uninitiated -- you may understand all the terms you use, but that's not necessarily true for all to whom your question is directed.
    • Conversely, those frequenting bioperl.org may have ready answers.

    If you have a notion of how you'd accomplish your goal, write out the steps of your solution; translate them to pseudocode; and read such texts as Learning Perl, the Tutorials here and perhaps some of the college-level (freely-available) introductory Perl courses.

    Then, if you run into stumbling blocks, bring the code you've written, the error messages, sample data, and a narrative explanation of how your work fails to produce the desired (or expected) results.

    Then you have a question with which we'll be pleased to help.




    If you didn't program your executable by toggling in binary, it wasn't really programming!

Re: extract sequences of DNA from coordinates
by erix (Prior) on Sep 25, 2014 at 06:22 UTC

    Hi mgamar. How are your readers here to know what a BED file is? When asking questions, don't start from your own position; try to imagine that of the person you're talking to.

    Especially on a site like PerlMonks. Many people here are happy to help but please provide appropriate links to the context of your question; in the case of BED files, that is the UCSC site.

    Explain what a BED file is normally for (I'll assume you know), and its format. Perhaps include the first 5 lines of one of your BED files. It'd only be polite to include Wikipedia links for 'jargon' words like smallRNA and microRNA.

    ( I remembered they (UCSC) explain the format quite clearly; look what I found in the FAQ. )
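    For readers who have not met the format, here is a minimal sketch of reading one BED line in Perl. It assumes whitespace-separated columns in the standard BED order, of which only the first three are required; parse_bed_line is a hypothetical helper name, not part of any library.

```perl
use strict;
use warnings;

# Names of the 12 standard BED columns, in order; only the first three
# (chrom, chromStart, chromEnd) are required.
my @BED_FIELDS = qw(chrom chromStart chromEnd name score strand
                    thickStart thickEnd itemRgb blockCount blockSizes blockStarts);

# Turn one BED line into a hash ref keyed by column name.
# parse_bed_line is a hypothetical helper for illustration only.
sub parse_bed_line {
    my ($line) = @_;
    chomp $line;
    my @values = split ' ', $line;    # split on any run of whitespace
    my %rec;
    @rec{ @BED_FIELDS[0 .. $#values] } = @values;
    return \%rec;
}

my $rec = parse_bed_line("chr7 127471196 127472363 Pos1 0 +\n");

# chromStart is 0-based and chromEnd is exclusive, so length = end - start.
my $len = $rec->{chromEnd} - $rec->{chromStart};
print "$rec->{name} on $rec->{chrom} ($rec->{strand}) spans $len bases\n";
```

    The detail that matters most for extracting sequences is the coordinate convention: the start is counted from 0 and the end is not included, so the feature length is simply end minus start.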

Re: extract sequences of DNA from coordinates
by thanos1983 (Parson) on Sep 24, 2014 at 20:41 UTC

    Hello mgamar,

    Welcome to the community. Please read How (Not) To Ask A Question and How do I post a question effectively?.

    A well-written question attracts more people who are willing to help you solve your problem. Post your code, and show what you have tried so far and where it failed.

    Not everyone is familiar with your problem domain, but we are willing to read and make an effort to help. However, from this alone:

    Could somebody tell me how to manage the variables, or give me some tips to start this?

    it is impossible to know what your problem actually is, or how to solve it. Provide us with a sample of your file and a sample of the output you have in mind.

    I hope my answer did not confuse you but helped you.

    Seeking for Perl wisdom...on the process of learning...not there...yet!
Re: extract sequences of DNA from coordinates
by roboticus (Chancellor) on Sep 25, 2014 at 12:21 UTC

    mgamar:

    Now that erix provided a pointer to some information about your problem, here's a little sketch you might be able to pull some ideas out of:

    use strict;
    use warnings;

    ####
    # Read BED file into an array
    ####
    my $BED_FILE = <<EOBED;
    chr7 127471196 127472363 Pos1 0 + 127471196 127472363 255,0,0
    chr7 127472363 127473530 Pos2 0 + 127472363 127473530 255,0,0
    chr7 127473530 127474697 Pos3 0 + 127473530 127474697 255,0,0
    chr7 127474697 127475864 Pos4 0 + 127474697 127475864 255,0,0
    chr7 127475864 127477031 Neg1 0 - 127475864 127477031 0,0,255
    chr7 127477031 127478198 Neg2 0 - 127477031 127478198 0,0,255
    chr7 127478198 127479365 Neg3 0 - 127478198 127479365 0,0,255
    chr7 127479365 127480532 Pos5 0 + 127479365 127480532 255,0,0
    chr7 127480532 127481699 Neg4 0 - 127480532 127481699 0,0,255
    EOBED

    my @ABED;
    open my $BFH, '<', \$BED_FILE or die "Can't read BED data: $!";
    while (my $line = <$BFH>) {
        my ($chrom, $chrBeg, $chrEnd, $name, $score, $strand, $thickBeg,
            $thickEnd, $RGB, $blkCnt, $blkSzs, $blkBegs) = split /\s+/, $line;
        push @ABED, [ $chrBeg, $chrEnd, $name ];
    }

    ####
    # Compare each coordinate in my list of things to check
    # against the items we stored in the BED element array
    ####
    my $COORDS_FILE = <<EOCOORDS;
    joe 123456789 123456900 Doesn't touch any of 'em
    bill 127473500 127474000 Intersects Pos2 and Pos3
    judy 127480550 127480560 Inside Neg4
    EOCOORDS

    open my $CFH, '<', \$COORDS_FILE or die "Can't read coordinate data: $!";
    while (my $line = <$CFH>) {
        my ($item, $beg, $end) = split /\s+/, $line;
        print "Checking item $item [$beg, $end] against BED file\n";
        for my $rBED (@ABED) {
            if (overlaps([$beg, $end], $rBED)) {
                print "Item $item overlaps $rBED->[2]\n";
            }
        }
        print "\n";
    }

    sub overlaps {
        my ($lBeg, $lEnd) = @{$_[0]};
        my ($rBeg, $rEnd) = @{$_[1]};
        # Two ranges overlap when each begins before the other ends.
        # This also catches identical ranges, which a set of strict
        # endpoint-inside-range tests would miss. Ranges that merely
        # touch do not count, which suits BED's exclusive end coordinate.
        return ($lBeg < $rEnd and $rBeg < $lEnd) ? 1 : 0;
    }

    When I run it, I get:

    Checking item joe [123456789, 123456900] against BED file

    Checking item bill [127473500, 127474000] against BED file
    Item bill overlaps Pos2
    Item bill overlaps Pos3

    Checking item judy [127480550, 127480560] against BED file
    Item judy overlaps Neg4
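
    The original question also asked about pulling the sequences themselves out of a reference genome and comparing them with microRNA sequences. A minimal sketch of that step follows. Everything in it is an assumption for illustration: the tiny %genome hash stands in for a real genome read from a FASTA file, the microRNA names and sequences are invented, and extract_region and revcomp are hypothetical helper names.

```perl
use strict;
use warnings;

# Toy stand-in for a reference genome: chromosome name => sequence.
# A real genome would be loaded from a FASTA file instead.
my %genome = ( chrT => 'AAACCCGGGTTTGGACGTTT' );

# Reverse-complement a DNA string (needed for features on the '-' strand).
sub revcomp {
    my ($seq) = @_;
    my $rc = reverse $seq;            # scalar context: reverses the string
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

# Return the bases covered by a BED interval, or undef if the interval
# does not fit on the chromosome. BED starts are 0-based and ends are
# exclusive, which maps directly onto substr(offset, length).
sub extract_region {
    my ($genome, $chrom, $beg, $end, $strand) = @_;
    my $chr_seq = $genome->{$chrom};
    return unless defined $chr_seq;
    return unless $beg >= 0 && $beg < $end && $end <= length($chr_seq);
    my $seq = substr $chr_seq, $beg, $end - $beg;
    return $strand eq '-' ? revcomp($seq) : $seq;
}

# Invented microRNA sequences, written as DNA (T instead of U).
my %mirna = ( 'toy-miR-1' => 'CCGGGT', 'toy-miR-2' => 'AAACGT' );

# Each entry: chrom, start, end, name, strand (as in BED columns 1-4 and 6).
for my $bed ( [ 'chrT', 3, 12, 'small1', '+' ], [ 'chrT', 12, 20, 'small2', '-' ] ) {
    my ($chrom, $beg, $end, $name, $strand) = @$bed;
    my $seq = extract_region(\%genome, $chrom, $beg, $end, $strand);
    next unless defined $seq;
    for my $mir (sort keys %mirna) {
        print "$name ($seq) contains $mir\n" if index($seq, $mirna{$mir}) >= 0;
    }
}
```

    With this toy data, small1 is reported as containing toy-miR-1 and small2, once reverse-complemented, as containing toy-miR-2. Exact substring matching is only the simplest possible comparison; tools that use 1-based, inclusive coordinates need the BED start adjusted by one before extraction.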

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

Re: extract sequences of DNA from coordinates
by Laurent_R (Canon) on Sep 24, 2014 at 21:54 UTC
    Sorry, I am a software engineer, not a biologist. Even though I did take some biology and genetics courses at university, it was so many years ago that I have forgotten most of it and, even if I did remember, what I learned at the time is most probably completely outdated (I don't think that anyone dreamed at the time of, one day, sequencing the human genome). In brief, I just don't understand your question. I don't even know what a BED file is, and although I remember to a certain extent what RNA is about, I do not have a clue what smallRNAs and microRNAs are.

    I sincerely hope that some monk here might be able to help you, and it might happen, but I would really suggest that you explain your problem in CS terms, show some code that is failing to do what you want, explain what result you are looking for and how what you get is different, etc. If you do that, then you have a real chance of getting help. With the way you presented your problem, you would need a lot of luck.

    Or, maybe, as already suggested, you should go to a Bioperl forum, where you are much more likely to find people understanding you and your language.