mesana has asked for the wisdom of the Perl Monks concerning the following question:

Hi! This is my input file:
gene chrom start end
NRAS chr1 115247085 115250671
NRAS chr1 115250775 115250813
NRAS chr1 115251152 115251275
NRAS chr1 115252190 115252349
NRAS chr1 115256421 115256599
NRAS chr1 115258671 115258798
NRAS chr1 115259279 115259515
TNNT2 chr1 201328142 201328383
TNNT2 chr1 201328751 201328791
TNNT2 chr1 201330407 201330497
TNNT2 chr1 201331041 201331155
TNNT2 chr1 201331238 201331243
TNNT2 chr1 201332423 201332534
TNNT2 chr1 201333426 201333503
TNNT2 chr1 201334319 201334435
TNNT2 chr1 201334738 201334798
TNNT2 chr1 201335966 201335999
I need to generate a file with intervals of 200, considering that if an interval does not finish exactly in the second column, it should end in the second row. The same goes for each gene (the name is in the first column). For example:
NRAS chr1 115247085 115247285
NRAS chr1 115247287 115247487
NRAS chr1 115247489 115247689
...
NRAS chr1 115250400 115250600 (for example...)
NRAS chr1 115250602 115250906
NRAS chr1 115250908 115251108

Does anyone have any idea?

Thanks a lot.

Re: continuuos intervals
by perldigious (Priest) on Aug 31, 2016 at 13:23 UTC

    Hi mesana,

    Welcome to the monastery.

    What you are asking would probably be pretty trivial and simple to implement in Perl, but the desired output pattern you describe isn't quite clear to me.

    Your output begins at the first input "start" value, 115247085, and then adds 200 to get your first output "end" value, 115247285. But then for some reason you add 2 to this to get your output's next "start" value, 115247287. This pattern continues, but 115250400 less 115247489 is not divisible by 202, so the pattern seemingly broke down there. You pick the pattern up again, but at 115250602 inexplicably add 304 to get 115250906. Then your original pattern of adding 2 and then adding 200 continues. I'm also not clear on why you would stop at the value 115251108, but perhaps that was just an arbitrary stopping point. Do you see why we might be confused? :-)
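    The arithmetic above can be checked with a few lines of Perl. This is only a sketch; the variable names are illustrative and the numbers are copied from the example output in the question.

```perl
use strict;
use warnings;

# First three example output rows from the question.
my @starts = (115247085, 115247287, 115247489);
my @ends   = (115247285, 115247487, 115247689);

# Each interval is 200 wide, and each next start is the previous end plus 2.
printf "width %d\n", $ends[$_] - $starts[$_]     for 0 .. $#starts;
printf "step  %d\n", $starts[$_] - $ends[$_ - 1] for 1 .. $#starts;

# A start-to-start stride of 202 cannot land on 115250400:
# the remainder is non-zero.
my $remainder = (115250400 - 115247489) % 202;
print "remainder $remainder\n";

# The odd interval in the example is 304 wide rather than 200.
my $odd_width = 115250906 - 115250602;
print "odd width $odd_width\n";
```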

    If you explain that a little more clearly the monks here can probably help you out.

    Update: Fixed some number errors of my own.

    I love it when things get difficult; after all, difficult pays the mortgage. - Dr. Keith Whites
    I hate it when things get difficult, so I'll just sell my house and rent cheap instead. - perldigious
Re: continuuos intervals
by GotToBTru (Prior) on Aug 31, 2016 at 12:13 UTC

    In your example output, I get how the first interval is 115247085 to 115247285 .. but I wonder what happened to 115247286, and I can't figure out how you decided where any of the following intervals started.

    But God demonstrates His own love toward us, in that while we were yet sinners, Christ died for us. Romans 5:8 (NASB)

Re: continuuos intervals
by Anonymous Monk on Aug 31, 2016 at 11:23 UTC
    Sure, first read the file, then match or split, then print. Simple perlintro stuff.
Re: continuuos intervals
by mesana (Initiate) on Aug 31, 2016 at 14:06 UTC
    Thank you a lot, you are right! My explanation was not so clear. So... the input file is composed of intervals of different lengths, each a piece of a gene (the name is in the first column). I have to split these intervals into smaller intervals of 200, "completing the gap" among the intervals. At the same time, the intervals have to be of 200, but the next start has to be 200+2. To make it easier, this is a simpler input file. The intervals I have to create are of 10.

    input.txt
    gene1 chr1 10 25
    gene1 chr1 46 99
    gene1 chr1 103 190
    gene2 chr4 50 63
    gene2 chr4 90 110

    output.txt
    gene1 chr1 10 20
    gene1 chr1 22 53
    gene1 chr1 55 65
    gene1 chr1 67 77
    gene1 chr1 79 89
    gene1 chr1 91 99
    gene2 chr4 50 60
    gene2 chr4 62 99
    gene2 chr4 101 110
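
One possible reading of the description above, sketched in Perl. The assumptions are mine, not confirmed in the thread: each output interval covers 10 units of the input rows, the gap between two rows of the same gene does not count toward those 10, the next interval starts at the previous end plus 2, and the last interval of a gene is cut off at the gene's final end. `split_gene` is a hypothetical helper name, and the sample data is inlined rather than read from input.txt.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split one gene's rows into windows that each cover
# $size units of the rows; the gaps between rows are skipped.
sub split_gene {
    my ($rows, $size, $gap) = @_;    # $rows: [[start, end], ...] sorted by start
    return () unless @$rows;
    my @windows;
    my $i   = 0;                     # index of the row the cursor is in
    my $pos = $rows->[0][0];         # cursor: start of the next window
    while ($i < @$rows) {
        my ($start, $need, $end) = ($pos, $size);
        while (1) {
            my $avail = $rows->[$i][1] - $pos;
            if ($avail >= $need) { $end = $pos + $need;   last }
            if ($i == $#$rows)   { $end = $rows->[$i][1]; last }  # clip at the gene's final end
            $need -= $avail;         # carry the remainder into the next row
            $pos = $rows->[++$i][0];
        }
        push @windows, [$start, $end];
        last if $end >= $rows->[-1][1];
        $pos = $end + $gap;
        # If the next start falls past the current row, jump to the next row's start.
        while ($i < @$rows && $pos > $rows->[$i][1]) {
            $i++;
            $pos = $rows->[$i][0] if $i < @$rows;
        }
    }
    return @windows;
}

# Sample rows from the post above.
my $input = <<'END';
gene1 chr1 10 25
gene1 chr1 46 99
gene1 chr1 103 190
gene2 chr4 50 63
gene2 chr4 90 110
END

# Group the rows by gene, remembering the order in which genes first appear.
my (@order, %rows, %chrom);
for my $line (split /\n/, $input) {
    my ($gene, $chr, $start, $end) = split ' ', $line;
    push @order, $gene unless exists $rows{$gene};
    $chrom{$gene} = $chr;
    push @{ $rows{$gene} }, [$start, $end];
}

for my $gene (@order) {
    print join(' ', $gene, $chrom{$gene}, @$_), "\n"
        for split_gene($rows{$gene}, 10, 2);
}
```

Under these assumptions the gene2 rows come out as 50 60, 62 99 and 101 110, matching the sample output. For gene1 the rows match through 79 89; the sketch then prints 91 105 and carries on through the 103-190 row, which the sample output does not include, so that part of the rule would still need confirming.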

      Hi mesana, just a few helpful new PerlMonks user tips.

      Please put separate <p> and <code> tags around your input and output data like you did in your original post. This helps the monks here quickly copy and paste your data if necessary.

      Please reply directly to a node you are responding to by clicking the individual "reply" button on that node. This helps keep conversations organized and understandable.

      You can similarly "edit" any of your own nodes if you need to make fixes and updates, but it's considered courteous to include "UPDATE" or "EDIT" text when you do so explaining what you changed. If you intend to delete large portions this way, it's also nice to use <del> tags around the deleted portions instead so it appears like this. This also helps future readers in case you get responses discussing things you have deleted.

      A quick and helpful PerlMonks tutorial for other basic things can be found in How do I post a question effectively?
