in reply to Regular expressions across multiple lines

Why doesn't it work with your large file? Does it work with a file half the size of your large file? One tenth the size?


Replies are listed 'Best First'.
Re^2: Regular expressions across multiple lines
by abcd (Novice) on Apr 24, 2016 at 17:05 UTC
    I tried it on a small file with only a few lines and it worked perfectly. With the large file it just does nothing. I tried it with a 10 MB file and it still doesn't work. I tried writing the chomped text to a .txt file. When I open that file in a text editor, it shows weird overlapping text (like some sort of graphical problem). The only thing I can think of is that my PC is too slow and the process hangs or something. But if this is the only way to do it, I will try it on my work PC.
      Is this a plain ASCII file, or does it use a multi-byte character encoding? A "too slow" PC is not likely; some other issue is afoot here. Could it be a Unicode issue? Can you cut this down to a simple example, (a) this works and (b) this doesn't work, without huge files? The actual code would also be VERY useful.
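
One way to answer the encoding question without eyeballing the file is to count the bytes that usually cause this kind of trouble. The sketch below is only an illustration: the sub name `count_suspect_bytes` is made up for this example, and it assumes the file is small enough to slurp in one go (10 MB is). Carriage returns left over from Windows or old Mac line endings would also be consistent with the "overlapping text" symptom, though that is a guess.

```perl
use strict;
use warnings;

# Hypothetical helper: count carriage returns and non-ASCII bytes in a byte string.
sub count_suspect_bytes {
    my ($bytes) = @_;
    my $cr        = ($bytes =~ tr/\r//);        # tr returns the number of matches
    my $non_ascii = ($bytes =~ tr/\x80-\xFF//); # any byte above 0x7F
    return ($cr, $non_ascii);
}

# Usage: perl check.pl yourfile.fasta
if (@ARGV) {
    my $path = shift @ARGV;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $bytes = do { local $/; <$fh> };  # slurp the whole file as raw bytes
    close $fh;
    my ($cr, $non_ascii) = count_suspect_bytes($bytes);
    print "carriage returns: $cr, non-ASCII bytes: $non_ascii\n";
}
```

If the carriage-return count is non-zero, `chomp` alone will leave `\r` characters in the text on a Unix-style Perl, which can break a pattern that is supposed to match across what used to be line breaks.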
        I don't know much about file formats, but the input file I am using is a FASTA file, which stores DNA sequences. I am a beginner and doing this as a grad school project, so this is pretty much the actual code and there isn't much else to it. The regular expression is fine, as it gives the desired results when I use it on a test file with a few lines, but it doesn't work on larger files.

        To give more context on the actual problem: the 10 random characters are random barcodes flanked by a specific sequence (the abc and def in my example code). Once I get the 5 characters (i.e. DNA bases) before and after this fragment, I will use them to figure out which gene the random barcode inserted into. In this way I will have each gene associated with a unique barcode.
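
The task described above could be sketched roughly as follows. This is not the poster's code, which is not shown in this thread. The flanking sequences `abc` and `def` are the placeholders from the description, the sub name `find_barcodes` is invented for this example, and the sketch assumes the input is ordinary FASTA (a `>` header line followed by sequence lines). It normalizes all line-ending styles first, so stray `\r` characters cannot split a match.

```perl
use strict;
use warnings;

# Placeholder flanking sequences taken from the description; replace with the real ones.
my ($left, $right) = ('abc', 'def');

# Hypothetical helper: returns one [before5, barcode, after5] triple per hit.
sub find_barcodes {
    my ($fasta) = @_;
    my @hits;
    $fasta =~ s/\R/\n/g;                      # normalize \r\n and bare \r to \n (Perl 5.10+)
    for my $record (split /^>/m, $fasta) {    # one chunk per FASTA record
        next unless length $record;           # skip the empty chunk before the first '>'
        my ($header, $seq) = split /\n/, $record, 2;
        next unless defined $seq;
        $seq =~ s/\s+//g;                     # join the sequence lines into one string
        # 5 bases, left flank, 10-base barcode, right flank, 5 bases
        while ($seq =~ /(.{5})\Q$left\E(.{10})\Q$right\E(.{5})/g) {
            push @hits, [$1, $2, $3];
        }
    }
    return @hits;
}

# Usage: perl barcodes.pl yourfile.fasta
if (@ARGV) {
    my $path = shift @ARGV;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my $fasta = do { local $/; <$fh> };       # slurp; fine for files of tens of MB
    close $fh;
    print join("\t", @$_), "\n" for find_barcodes($fasta);
}
```

Because all whitespace is stripped before matching, the pattern itself never has to deal with newlines, which sidesteps the multi-line matching problem entirely. Note that this loop does not report overlapping hits.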

      Try it on a 100 KB file, just to see whether it is simply taking too long.

      At what length of file does it stop working?