Re^2: Parsing BLAST

Re^3: Parsing BLAST by srdst13 (Pilgrim) on Apr 25, 2006 at 01:58 UTC
Unless your sequence is quite large (and so you have many thousands of unique 20mers), I would go the hash route. It will be VERY fast if memory isn't limiting. If that isn't feasible, break your sequence into FASTA sequences of 20 base pairs each and give each one a unique ID. Then BLAST away using tabular output, and you can parse the results to your heart's content using simple Perl. Sean
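Both routes in that reply are easy to sketch in Perl. This is a minimal sketch, not a tested tool: the helper names (`clean_seq`, `count_kmers`, `windows_to_fasta`) and the `win_<start>` ID scheme are hypothetical, and it assumes the sequence is one plain string with any FASTA header already stripped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Normalise a pasted sequence: uppercase, no whitespace or newlines.
sub clean_seq {
    my ($seq) = @_;
    $seq = uc $seq;
    $seq =~ s/\s+//g;
    return $seq;
}

# The "hash route": count every k-mer that actually occurs in $seq.
# Only observed k-mers become keys, so memory grows with the number of
# distinct windows in the sequence, not with 4**k.
sub count_kmers {
    my ($seq, $k) = @_;
    $k = 20 unless defined $k;
    $seq = clean_seq($seq);
    my %count;
    for my $i (0 .. length($seq) - $k) {    # empty range if $seq is shorter than $k
        $count{ substr($seq, $i, $k) }++;
    }
    return \%count;
}

# The fallback: one FASTA record per k-bp window, each with a unique ID
# (hypothetical "win_<start>" naming, 1-based start position).
sub windows_to_fasta {
    my ($seq, $k) = @_;
    $k = 20 unless defined $k;
    $seq = clean_seq($seq);
    my $fasta = '';
    for my $i (0 .. length($seq) - $k) {
        $fasta .= sprintf(">win_%d\n%s\n", $i + 1, substr($seq, $i, $k));
    }
    return $fasta;
}

# Usage: print the 20-mers seen exactly once in a toy 24 bp sequence.
my $counts = count_kmers('ACGTACGTACGTACGTACGTACGT');
my @unique = sort grep { $counts->{$_} == 1 } keys %$counts;
print "$_\n" for @unique;
```

The string returned by `windows_to_fasta` can be written to a file and used as the BLAST query set.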
Re^3: Parsing BLAST by Anonymous Monk on Apr 25, 2006 at 00:28 UTC
I always parse blast in its -m 8 or -m 9 tabular output format. Much easier to parse.
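A minimal Perl sketch of parsing that format, assuming legacy `blastall` tabular output (`-m 8`, or `-m 9`, which is the same plus `#` comment lines) with its 12 tab-separated columns. The field names in `@COLS`, the helper names, and the "exact match" rule (100% identity over the full query length, no mismatches, no gaps) are illustrative choices, not part of BLAST itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Column order of blastall tabular output (-m 8 / -m 9).
my @COLS = qw(query subject pct_identity aln_length mismatches gap_opens
              q_start q_end s_start s_end evalue bit_score);

# Turn one tabular line into a hash ref keyed by column name.
# Returns undef for comment lines, blank lines and malformed lines.
sub parse_tabular_line {
    my ($line) = @_;
    chomp $line;
    return undef if $line =~ /^#/ || $line !~ /\S/;
    my @f = split /\t/, $line;
    return undef unless @f == @COLS;
    my %hit;
    @hit{@COLS} = @f;
    return \%hit;
}

# Exact, full-length hit: 100% identity over all $len bases, no mismatches, no gaps.
sub is_exact_match {
    my ($hit, $len) = @_;
    return $hit->{pct_identity} == 100
        && $hit->{aln_length} == $len
        && $hit->{mismatches} == 0
        && $hit->{gap_opens} == 0;
}

# Read every line from an open filehandle and keep only the exact hits.
sub exact_hits {
    my ($fh, $len) = @_;
    my @hits;
    while (my $line = <$fh>) {
        my $hit = parse_tabular_line($line) or next;
        push @hits, $hit if is_exact_match($hit, $len);
    }
    return @hits;
}
```

Usage, with a hypothetical file name: `open my $fh, '<', 'blast_m8.out' or die $!; print "$_->{query}\t$_->{subject}\t$_->{s_start}\n" for exact_hits($fh, 20);`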
Re^3: Parsing BLAST by Anonymous Monk on Apr 25, 2006 at 09:03 UTC
Is this homework?
Re^4: Parsing BLAST by cumurph (Novice) on Apr 25, 2006 at 16:53 UTC
Any suggestions on implementing the hashing method, or websites with code I might be able to use/modify? This is part of a class project for the bioinformatics class I'm in. The rest of my classmates and I (seven of us) are all trying to figure this out. The professor has given us some leads, but the code he gave us isn't working right. Thanks! -Rob
Re^5: Parsing BLAST by MadraghRua (Vicar) on Apr 25, 2006 at 21:23 UTC
Perhaps he gave it to you like that so you could learn how to read and debug the code? The code tutorials will really help you if you stop, breathe, and then take the time to go through, understand and then use them. The Monks don't usually do your homework for you; it's a point of principle that doing your homework for you doesn't help you learn the language. I'm going to give you some pointers on how you might tackle the problem; it's up to you to do something with it. Or not.

I could structure it something like this:

1. Create a hash of all possible 20mers
   a. Start by making an array containing four strings: A, T, G, C
   b. Count the number of array elements you have
   c. For each array element, use shift to get it from the left side of the array
   d. Add each of the four nucleotides to the shifted element
   e. Add each new string back onto the right side of the array with push
   f. Repeat for each of the original elements in the array
   g. You should end up with 4^20 array elements, about 1.0995e12
   h. Use each array element as a hash key and set the value of the key to zero
   i. Thinking about it, the size of the array will get pretty large, so maybe start with four arrays, each containing one nucleotide. Each array then ends up a quarter of the total size. You can break it down even further by creating more arrays earlier, such as individual arrays for the first 64 combinations (3mers), and then carry on from there. Play with it and see what works best.
2. Read the files in from your directory:
   a. Read a directory of file names
   b. For each file, grab the sequence and the name
   c. Close the file
   d. Process the sequence before starting on the next file
3. Process the file as follows:
   a. Make the sequence one long concatenated string
   b. You know you want to look at a window of 20 bases; you have to decide how many bases you want to step down the sequence, e.g. read the first 20-base window, step down 5 bases, read the next 20-base window, and so on
   c. For each window, match the window to a hash key and autoincrement the value of that key
   d. If you run out of sequence, end the processing
4. Report on the matches:
   a. Use the hash to find keys with a value of 0, 1, 2, 3, 4, etc.
   b. You have the sequence name, so print the output as sequence name, patterns with 0 hits, patterns with 1 hit, and so on. If you're only interested in single hits for that sequence, then only print those out.
   c. If you use tabs between each value, you can open the output in Excel as tab-delimited text.

http://www.perlmonks.com/?node_id=9073

This is a fairly straightforward project, really. You should be able to figure it out with the first five chapters of Merlyn's Learning Perl book, which is pretty compact.

Good luck

MadraghRua
yet another biologist hacking perl....
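Steps 1, 3 and 4 of that outline can be sketched in Perl roughly as follows. This is a minimal sketch, not a drop-in solution: the function names are hypothetical, and the usage lines use a toy k of 2 because pre-building the full table for k = 20 means 4**20 (about 1.1e12) keys, far more than fits in a typical machine's memory. For real 20-mers, counting only the windows that actually occur (the hash route suggested earlier in the thread) avoids building that table.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Step 1: build every k-mer by shift/push. Each round takes every existing
# string off the left of the array and pushes its four one-base extensions
# onto the right. Only practical for small k: the result has 4**k strings.
sub all_kmers {
    my ($k) = @_;
    my @kmers = ('');
    for my $round (1 .. $k) {
        my $n = scalar @kmers;              # strings present before this round
        for (1 .. $n) {
            my $prefix = shift @kmers;
            push @kmers, map { $prefix . $_ } qw(A T G C);
        }
    }
    return @kmers;
}

# Step 3: slide a $k-base window along $seq, moving $step bases each time,
# and increment that window's count. Windows that are not already keys
# (for example ones containing N) are skipped.
sub count_windows {
    my ($seq, $k, $step, $counts) = @_;
    for (my $i = 0; $i + $k <= length($seq); $i += $step) {
        my $window = substr($seq, $i, $k);
        $counts->{$window}++ if exists $counts->{$window};
    }
    return;
}

# Step 4: group k-mers by hit count (0, 1, 2, ...).
sub kmers_by_hits {
    my ($counts) = @_;
    my %by_hits;
    for my $kmer (sort keys %$counts) {
        push @{ $by_hits{ $counts->{$kmer} } }, $kmer;
    }
    return \%by_hits;
}

# One tab-delimited line per hit count: name, count, then the k-mers.
sub report_lines {
    my ($name, $by_hits) = @_;
    return map { join("\t", $name, $_, @{ $by_hits->{$_} }) . "\n" }
           sort { $a <=> $b } keys %$by_hits;
}

# Usage with a toy k of 2 (16 possible 2-mers) and a step of 1.
my %counts = map { $_ => 0 } all_kmers(2);
count_windows('ACGTAC', 2, 1, \%counts);
print report_lines('toy_seq', kmers_by_hits(\%counts));
```

Passing a larger `$step` (5 in the outline's example) skips windows, so fewer k-mers get counted per sequence.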
Re^6: Parsing BLAST by cumurph (Novice) on Apr 26, 2006 at 15:12 UTC
Re^5: Parsing BLAST by cumurph (Novice) on Apr 25, 2006 at 17:13 UTC
Question: is there a program that will search all my out files for just the ones that have exact matches? That might just solve my problem. Thanks!
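If the out files were written in tabular format (`-m 8` or `-m 9`, as suggested earlier in the thread), a short Perl script can do this. This sketch assumes that format, a `.out` file extension, and 20 bp queries; it will not work on the default pairwise BLAST report. The function name is hypothetical. It follows the directory-reading steps from MadraghRua's outline: read the directory, then open and scan each file in turn.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# List the files in $dir whose names end in ".out" (an assumed naming
# convention) that contain at least one exact, full-length tabular hit.
sub files_with_exact_hits {
    my ($dir, $len) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @files = sort { $a cmp $b }
                grep { /\.out$/ && -f "$dir/$_" } readdir($dh);
    closedir($dh);

    my @found;
    FILE: for my $file (@files) {
        open(my $fh, '<', "$dir/$file") or die "Cannot read $dir/$file: $!";
        while (my $line = <$fh>) {
            next if $line =~ /^#/;              # -m 9 comment lines
            my @f = split /\t/, $line;
            next unless @f >= 12;               # not a tabular hit line
            # Columns 3-6: % identity, alignment length, mismatches, gap openings.
            if ($f[2] == 100 && $f[3] == $len && $f[4] == 0 && $f[5] == 0) {
                push @found, $file;
                close($fh);
                next FILE;                      # one exact hit is enough
            }
        }
        close($fh);
    }
    return @found;
}

# Usage (hypothetical script and directory names):
#   perl find_exact.pl blast_results
if (@ARGV) {
    print "$_\n" for files_with_exact_hits($ARGV[0], 20);
}
```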