biomonk has asked for the wisdom of the Perl Monks concerning the following question:
Hello all, I come from a biology background and started using Perl recently. I want to add more information to the output generated by a program, and to do that I would like to use the raw data that produced the output. I think this can be done by parsing and searching files. The problem is that the raw input files are large, and searching them with regular expressions takes too long, so I need to find another way to do it. I thought this would be a great place to ask for help.
    ......
    Geneset=GO0035091 Size=77 ES=0.525 NES=2.913 NominalP=0.000
    Geneset=GO0030163 Size=54 ES=0.463 NES=2.248 NominalP=0.013
    Geneset=GO0007067 Size=44 ES=0.484 NES=1.975 NominalP=0.018

    GO0046800 GO0046800 CD209 CD209L CD209L1 CD209_HUMAN CLC4M
    GO0032104 GO0032104 CART CARTPT CART_HUMAN GHRL GHRL_HUMAN
    ......
    GO0035091 GO0035091 41_HUMAN 5NTD_HUMAN 9804 A-388D4.1 .....

    rs10904494 NP_817124  17881
    rs7906287  NP_817124  39800
    rs4881551  41_HUMAN   21567
    rs5416721  5NTD_HUMAN 0
    ......

    Marker     CHI2
    rs3749375  11.7268615355335
    rs10499549 10.4656064706897
    rs5416721  9.85374546064131

    Geneset    Genes       SNP/Marker  ES    NES   NominalP
    GO0035091  41_HUMAN    rs4881551   ....  ....  ....
               5NTD_HUMAN  rs5416721   ....  ....  ....
    ......
This can be done by first getting each Geneset and its corresponding genes, which I have done (for the rest I need your help). Next, the markers from the CHI2 file (which contains the markers/SNPs of interest) are used to search the SNP/Marker-to-gene map file, and the matches are stored in a file. That file then needs to be searched for the genes from our Geneset in order to get the SNPs and print them into a new file along with the old data. I need your guidance to do this, so please help me out. Because my programming skills are limited, I need a little explanation rather than just code, so that I can understand it and use or modify it in the future.
You can look at my code here:

    # Read the enrichment output and the gene set file into memory.
    open(OUTPUT, "<C:\\Documents and Settings\\shra1\\Desktop\\prj\\schneider_breast_copy_num_pathway_enrichment1.txt")
        or die "Cannot open enrichment output: $!";
    @output = <OUTPUT>;
    close(OUTPUT);
    open(GENESETS, "<C:\\Documents and Settings\\shra1\\Desktop\\prj\\human.gmt")
        or die "Cannot open gene set file: $!";
    @geneSets = <GENESETS>;
    close(GENESETS);

    @NewgeneSet = ();
    @genesInSet = ();

    # Take the gene set ID (the text after "Geneset=") from the first 10 lines.
    $i = 0;
    while ($i < 10) {
        @outputLineSplit = split(/\t/, $output[$i]);
        #print "$outputLineSplit[0] \n";
        $setName = $outputLineSplit[0];
        $equalLoc = index($setName, "=");
        $setName = substr($setName, $equalLoc + 1, length($setName));
        #print "$setName\n";
        $genesInSet[$i] = $setName;    # scalar element, not the slice @genesInSet[$i]
        $i++;
    }

    # For each gene set ID, scan every line of the gene set file (the slow part).
    foreach $genesInSet (@genesInSet) {
        print "$genesInSet\n";
        foreach $geneSets (@geneSets) {
            if ($geneSets =~ m/$genesInSet/i) {
                #print "$geneSets\n";
            }
        }
    }
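The lookup described above (markers from the CHI2 file, then the SNP/Marker-to-gene map, then the genes of each Geneset) can probably be done without regex scans by loading each file once into a hash. The following is only a minimal sketch under some assumptions: columns are separated by whitespace, the first two columns of a gene set line are the set ID and a description, and the in-memory strings (sample rows copied from the post) stand in for the real files. All variable names are hypothetical.

```perl
use strict;
use warnings;

# Sample rows from the post; in practice these would be the real files.
my $chi2_txt = <<'END';
Marker CHI2
rs3749375 11.7268615355335
rs10499549 10.4656064706897
rs5416721 9.85374546064131
END

my $map_txt = <<'END';
rs10904494 NP_817124 17881
rs7906287 NP_817124 39800
rs4881551 41_HUMAN 21567
rs5416721 5NTD_HUMAN 0
END

my $gmt_txt = <<'END';
GO0046800 GO0046800 CD209 CD209L CD209L1 CD209_HUMAN CLC4M
GO0035091 GO0035091 41_HUMAN 5NTD_HUMAN 9804 A-388D4.1
END

my $result_txt = <<'END';
Geneset=GO0035091 Size=77 ES=0.525 NES=2.913 NominalP=0.000
END

# 1. Markers of interest: marker => CHI2 value.
my %chi2;
open my $chi2_fh, '<', \$chi2_txt or die "chi2: $!";
my $header = <$chi2_fh>;                      # skip the "Marker CHI2" header
while (my $line = <$chi2_fh>) {
    my ($marker, $value) = split ' ', $line;
    $chi2{$marker} = $value if defined $value;
}
close $chi2_fh;

# 2. gene => [markers], keeping only markers that appear in %chi2.
my %snps_for_gene;
open my $map_fh, '<', \$map_txt or die "map: $!";
while (my $line = <$map_fh>) {
    my ($marker, $gene) = split ' ', $line;
    next unless defined $gene && exists $chi2{$marker};
    push @{ $snps_for_gene{$gene} }, $marker;
}
close $map_fh;

# 3. gene set ID => [genes]; the second column (description) is skipped.
my %genes_for_set;
open my $gmt_fh, '<', \$gmt_txt or die "gmt: $!";
while (my $line = <$gmt_fh>) {
    my ($set, undef, @genes) = split ' ', $line;
    next unless defined $set;
    $genes_for_set{$set} = \@genes;
}
close $gmt_fh;

# 4. Walk the enrichment output and join everything by hash lookup.
my @rows;
open my $res_fh, '<', \$result_txt or die "result: $!";
while (my $line = <$res_fh>) {
    my %field = $line =~ /(\w+)=(\S+)/g;      # Geneset=..., ES=..., etc.
    my $set = $field{Geneset} or next;
    for my $gene (@{ $genes_for_set{$set} || [] }) {
        for my $marker (@{ $snps_for_gene{$gene} || [] }) {
            push @rows, join "\t", $set, $gene, $marker,
                                   @field{qw(ES NES NominalP)};
        }
    }
}
close $res_fh;

print join("\t", qw(Geneset Gene Marker ES NES NominalP)), "\n";
print "$_\n" for @rows;
```

Each file is read only once, and every later search is a hash lookup rather than a pass over all lines, so the run time should grow roughly with the total file size instead of with the product of the file sizes. With only the sample rows shown here, the single reported marker is rs5416721 for 5NTD_HUMAN, because rs4881551 is not among the three CHI2 rows included above.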
Replies are listed 'Best First'.
Re: Searching and Parsing Biological data
by pc88mxer (Vicar) on Jul 01, 2008 at 17:00 UTC