Re: multiple processes to access one file

Re^2: multiple processes to access one file by julio_514 (Acolyte) on Aug 21, 2012 at 01:34 UTC
Yes, they are.
Re^3: multiple processes to access one file by BrowserUk (Patriarch) on Aug 21, 2012 at 02:32 UTC
Having 4 processes (or, as I would use, threads) reading from different parts of the same file is not a problem. The problems come entirely from the hideous format design of FastQ files. As Wikipedia puts it:

"it can make parsing complicated due to the unfortunate choice of "@" and "+" as markers (these characters can also occur in the quality string)."

Dividing the file size into 4 and having each thread/process seek to a different position in the file is simple and fast. The problem is how to then skip forward from the calculated start position to locate the start of the nearest (next) 4-line record.

The format specifies that the first character of the 1st line of each record is '@', and the first character of the 3rd line is '+'; but dumbly, these marker characters can also appear as part of the quality information in the 4th line, including as its first character.

That makes leaping into the file and finding the start of a record surprisingly difficult. The only simple alternative is to read forward in groups of 4 lines from the start of the file, which kinda defeats the purpose.

Theoretically, reading forward until you have 4 consecutive lines where the 1st and 3rd start with '@' and '+' respectively, and the other two start with neither, should (I think) establish a datum point from which a starting position for each thread/process can be calculated. I'll get back to you once I've convinced myself of that.
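A minimal Perl sketch of the heuristic described above, not code from the thread. It assumes strictly 4-line records (no wrapped sequence lines); `is_marker`, `next_record_start` and `chunk_starts` are hypothetical names chosen for illustration. Because the rule as stated also skips valid records whose quality line begins with '@' or '+', the boundaries are computed once and shared, so that one worker's end is exactly the next worker's start.

```perl
use strict;
use warnings;

# True if the character is one of the two FastQ marker characters.
sub is_marker { my ($ch) = @_; return $ch eq '@' || $ch eq '+'; }

# Byte offset of the first line at or after $offset that opens a window of
# four lines matching the heuristic; empty return if none is found.
sub next_record_start {
    my ($fh, $offset) = @_;
    return 0 if $offset <= 0;

    # Step back one byte so that landing exactly on a line start keeps it.
    seek($fh, $offset - 1, 0) or die "seek failed: $!";
    my $partial = <$fh>;    # discard the remainder of the current line

    my @win;                # sliding window of [byte position, first char]
    while (1) {
        my $pos  = tell($fh);
        my $line = <$fh>;
        last unless defined $line;
        push @win, [ $pos, substr($line, 0, 1) ];
        shift @win if @win > 4;
        next if @win < 4;
        my @c = map { $_->[1] } @win;
        return $win[0][0]
            if $c[0] eq '@' && $c[2] eq '+'
            && !is_marker($c[1]) && !is_marker($c[3]);
    }
    return;
}

# Boundaries for $n workers: worker k handles bytes [ $b[k], $b[k+1] ).
sub chunk_starts {
    my ($path, $n) = @_;
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";
    my $size = (-s $fh) || 0;
    my @starts;
    for my $i (0 .. $n - 1) {
        my $s = next_record_start($fh, int($size * $i / $n));
        last unless defined $s;
        # Skip duplicates when two nominal offsets resolve to the same record.
        push @starts, $s unless @starts && $s <= $starts[-1];
    }
    close $fh;
    return (@starts, $size);
}

# Optional demo: perl script.pl reads.fastq
if (@ARGV) {
    my @b = chunk_starts($ARGV[0], 4);
    print "worker $_: bytes $b[$_] .. $b[$_ + 1]\n" for 0 .. $#b - 1;
}
```

Each worker would then open its own handle, seek to its start offset, and read four lines at a time until tell() reaches its end offset.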
Re^4: multiple processes to access one file by Anonymous Monk on Aug 21, 2012 at 07:03 UTC
How about Tie::File? From what I understand, it does not store the file in memory, so could it be suitable?

We have a list of all the possible regular expressions matching a sequence header for the fastqs that we are working with. Say, for a set of 4 lines, you would have:

`@BLABLA:1:2:3:2`
`ACGTACGT...`
`+`
`DDDEEHFG...`

So all headers match: `^@\S+:\d+:\d+:\d+:\d+\n`

But headers are not of a fixed length, so how would you seek into a file if you need the byte start and end positions?
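On the byte-position question, one possible approach is to record offsets with tell() during a single sequential pass, so header length does not need to be fixed. This is a minimal sketch, assuming strictly 4-line records and the header pattern quoted above; `index_records` and `fetch_record` are hypothetical names, not part of any module.

```perl
use strict;
use warnings;

# Header pattern as quoted in the post (the leading '@' is escaped for Perl).
my $HEADER = qr/^\@\S+:\d+:\d+:\d+:\d+\n/;

# One sequential pass; returns a reference to a list of [start, end) byte
# offset pairs, one pair per 4-line record.
sub index_records {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";
    my @index;
    until (eof($fh)) {
        my $start  = tell($fh);
        my $header = <$fh>;
        die "Unexpected header at byte $start\n" unless $header =~ $HEADER;
        for (1 .. 3) {
            defined(my $line = <$fh>)
                or die "Truncated record at byte $start\n";
        }
        push @index, [ $start, tell($fh) ];
    }
    close $fh;
    return \@index;
}

# Read one record back using a [start, end) pair from the index.
sub fetch_record {
    my ($path, $span) = @_;
    my $len = $span->[1] - $span->[0];
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";
    seek($fh, $span->[0], 0) or die "seek failed: $!";
    my $got = read($fh, my $buf, $len);
    die "Short read at byte $span->[0]\n" unless defined $got && $got == $len;
    close $fh;
    return $buf;
}

# Optional demo: perl script.pl reads.fastq
if (@ARGV) {
    my $idx = index_records($ARGV[0]);
    printf "%d records; first spans bytes %d..%d\n",
        scalar(@$idx), @{ $idx->[0] } if @$idx;
}
```

Building the index still costs one full read of the file, so it pays off for repeated random access rather than for the initial split between processes.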
Re^5: multiple processes to access one file by BrowserUk (Patriarch) on Aug 21, 2012 at 07:29 UTC
Re^5: multiple processes to access one file by julio_514 (Acolyte) on Aug 21, 2012 at 07:04 UTC