I don't know much about file formats, but the input file I am using is a FASTA file, which stores DNA sequences. I am a beginner doing this as a grad school project, so this is pretty much the actual code and there isn't much else to it. The regular expression seems fine: it gives the desired results on a small test file with a few lines, but it doesn't work on larger files.
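One possible cause, offered as a guess since the full code isn't shown: FASTA files usually wrap each sequence over many short lines. If the file is read line by line, a barcode site that crosses a line break will never match, even though the same regex works on a small test file where everything sits on one line. The sketch below is in Python (an assumption; it is not the original code). It joins the wrapped lines of each record into one string, so the regex can be run per record rather than on a single giant line for the whole file. It also strips `\r`, in case the file has Windows line endings.

```python
from typing import Iterator, List, Optional, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, joining wrapped sequence lines per record."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in handle:
        # Remove trailing "\n" and any "\r" left over from Windows line endings.
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header ends the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        else:
            chunks.append(line)
    # Emit the last record in the file.
    if header is not None:
        yield header, "".join(chunks)
```

Usage would look like `for name, seq in read_fasta(open("input.fasta")): ...`, where `input.fasta` is a placeholder filename. Only one record is held in memory at a time, which avoids building one enormous line.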
To give more context on the actual problem: the 10 random characters are random barcodes flanked by a specific sequence (the abc and def in my example code). Once I get the 5 characters (i.e., DNA bases) before and after this fragment, I will use them to figure out which gene the random barcode inserted into. This way each gene will be associated with a unique barcode.
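A minimal sketch of the extraction described above, again in Python and not the original code. It assumes the flanks are literal strings (`LEFT_FLANK` and `RIGHT_FLANK` are hypothetical names holding the `abc`/`def` placeholders) and that bases are limited to A, C, G, T and N in either case. The three named groups capture the 5 bases before the left flank, the 10-base barcode, and the 5 bases after the right flank.

```python
import re
from typing import List, Tuple

# Hypothetical placeholders: replace with the real flanking sequences.
LEFT_FLANK = "abc"
RIGHT_FLANK = "def"

BASE = "[ACGTNacgtn]"
BARCODE_RE = re.compile(
    f"(?P<before>{BASE}{{5}})"      # 5 bases immediately before the left flank
    + re.escape(LEFT_FLANK)
    + f"(?P<barcode>{BASE}{{10}})"  # the 10-base random barcode
    + re.escape(RIGHT_FLANK)
    + f"(?P<after>{BASE}{{5}})"     # 5 bases immediately after the right flank
)


def find_barcodes(sequence: str) -> List[Tuple[str, str, str]]:
    """Return (before, barcode, after) for every non-overlapping match."""
    return [m.group("before", "barcode", "after")
            for m in BARCODE_RE.finditer(sequence)]
```

This should be run on a full record's sequence with the line breaks already removed. Searching one line at a time would miss any site that a line break splits.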
Yes, the original file displays fine in the text editor. Also, I don't really see bizarre characters, just normal characters placed one on top of another. That's why I thought it might be an issue with my PC: the output file I create after removing the newlines is one single, very long line of text, which my PC may be having trouble loading.
But anyway, thanks for the help. I will keep experimenting to see if I can get this to work, because from the replies I have gotten, the problem doesn't seem to be with the code itself but with something else.