PerlMonks  

Re: Optimise file line by line parsing, substitute SPLIT

by BrowserUk (Patriarch)
on Jun 03, 2013 at 13:29 UTC


in reply to Optimise file line by line parsing, substitute SPLIT

Pick up a paperback book. Bend it in half in one hand and use your thumb to flick through the pages. Time how long it took.

Now, using the same book, read the first word on every page. Did it take you longer?

Reading a file and doing nothing with the lines is the metaphorical equivalent of flicking through the pages.
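(Editorial aside: the "flick through" baseline versus actually splitting each line can be timed with the core Benchmark module. The data below is synthetic, invented for this sketch, not from the thread.)

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Synthetic tab-separated data; the field layout is invented for this sketch.
my @lines = map { join( "\t", "f$_", 'x', 'y', 'z' ) . "\n" } 1 .. 10_000;

cmpthese( 50, {
    # "Flick through the pages": touch each line, do nothing with it.
    read_only => sub { my $n = 0; $n++ for @lines },
    # "Read the first word on every page": split each line into fields.
    and_split => sub { my $n = 0; for (@lines) { my @f = split /\t/; $n++ } },
} );
```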


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^2: Optimise file line by line parsing, substitute SPLIT
by vsespb (Chaplain) on Jun 03, 2013 at 13:54 UTC

    That's true, but not always.

    Sometimes you don't need to process even 1% of the data.

    You might just read it, split, and drop the 99.9% of lines where field1 <> 'abcd' (that's where SQL can help).

    Or you read webserver logs into an in-memory hash (grouped by IP) and then score new site visitors in real time (you need access only to the records for a particular IP, and SQL would be slower).

    Or maybe you read a list of files from a text file, read the file listing from disk, and then compare the two in memory (no use for SQL).

    Or the general case: you read data from text files (1M lines) and skip all records that are not already in another in-memory hash (10K entries).
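    (Editorial aside: the "general case" above might be sketched roughly like this. The field layout, separator, and the `%wanted` keys are all hypothetical.)

```perl
use strict;
use warnings;

# Keys already held in memory (10K entries in the real case; two here).
my %wanted = map { $_ => 1 } qw(abcd efgh);

# Stand-in for the 1M-line text file.
my $data = join '', map { join( "\t", @$_ ) . "\n" }
    [qw(abcd 1 2)], [qw(zzzz 3 4)], [qw(efgh 5 6)];
open my $fh, '<', \$data or die $!;

my @kept;
while ( my $line = <$fh> ) {
    chomp $line;
    # split with a LIMIT of 2: peel off field1 only, skip splitting the rest.
    my ($field1) = split /\t/, $line, 2;
    push @kept, $line if $wanted{$field1};
}
printf "kept %d lines\n", scalar @kept;    # prints "kept 2 lines"
```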

      When you post code that does any one of those things you cite, more quickly than you can read the file and do nothing, I'll stump up for a nice polyurethane "Code Magician of the Year" award and send it to you.


        more quickly than you can read the file and do nothing

        It does not have to be more quickly; comparable time is enough. 20%-30% is already significant.

        Also, the idea that only the whole application's run time (from start to finish) matters is a bit wrong.

        Often the startup time (when the file is actually read) is what matters; after startup the application does something useful (and may be blocked by disk/network IO, or waiting for user action) until the system reboots.

        Do you want me to paste code where split() takes more than 20% of the time when I just read a file into memory and skip some/most of the records?

      In this kind of application, try to filter first, e.g. next unless /^abcd/, and only split if you need the fields separated.

        Yes, sure, if I filter by the first field. Otherwise split+regexp will be slower than just a regexp, or than split+comparison.
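        (Editorial aside: the filter-first versus split-first trade-off discussed above can be measured with the core Benchmark module. Here roughly 0.1% of synthetic lines start with 'abcd'; the hit rate and field layout are invented for this sketch.)

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# ~0.1% of lines match the filter, mimicking a sparse-hit log scan.
my @lines = map { ( $_ % 1000 ? 'other' : 'abcd' ) . "\tx\ty\tz\n" } 1 .. 10_000;

cmpthese( 100, {
    # split every line, then compare the first field
    split_first => sub {
        my $hits = 0;
        for (@lines) { my @f = split /\t/; $hits++ if $f[0] eq 'abcd' }
    },
    # cheap regexp reject first, split only the survivors
    filter_first => sub {
        my $hits = 0;
        for (@lines) { next unless /^abcd\t/; my @f = split /\t/; $hits++ }
    },
} );
```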
