PerlMonks  

Re^3: Optimise file line by line parsing, substitute SPLIT

by BrowserUk (Patriarch)
on Jun 03, 2013 at 14:11 UTC [id://1036763]


in reply to Re^2: Optimise file line by line parsing, substitute SPLIT
in thread Optimise file line by line parsing, substitute SPLIT

When you post code that does any one of the things you cite more quickly than you can read the file and do nothing, I'll stump up for a nice polyurethane "Code Magician of the Year" award and send it to you.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use every day'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^4: Optimise file line by line parsing, substitute SPLIT
by vsespb (Chaplain) on Jun 03, 2013 at 14:32 UTC
    more quickly than you can read the file and do nothing

    It doesn't have to be faster, just comparable in time. A 20-30% difference is already significant.

    Also, the idea that only the whole application's run time (from start to finish) matters is a bit wrong.

    Often the startup time (when the file is actually read) is what matters. After startup, the application does its useful work (possibly blocked on disk/network I/O or waiting for user action) until the system reboots.

    Do you want me to paste code where split() takes more than 20% of the time when I just read the file into memory and skip some or most of the records?
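    A minimal sketch of how the share of time spent in split() could be measured: it times one pass over the data that only reads lines, then a second pass that also splits each line. The in-memory test data (100_000 lines of 11 random tab-separated integers) and the helper `time_pass` are hypothetical, made up for illustration and not code from this thread. Reading through an in-memory filehandle keeps disk speed out of the measurement, so the split share it reports may well be higher than for a real file on disk.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw( time );

# Hypothetical test data: $LINES lines of 11 tab-separated random integers,
# built in memory so disk speed does not dominate the measurement.
my $LINES = 100_000;
my $data  = join '',
    map { join( "\t", map { int rand 1000 } 1 .. 11 ) . "\n" } 1 .. $LINES;

# Read $text line by line through an in-memory filehandle, calling $per_line
# on each line. Returns the elapsed seconds and the sum of the callback results.
sub time_pass {
    my ( $text, $per_line ) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    my $start = time;
    my $total = 0;
    while ( my $line = <$fh> ) {
        $total += $per_line->($line);
    }
    my $elapsed = time() - $start;
    close $fh;
    return ( $elapsed, $total );
}

# Pass 1: read only. Pass 2: read and split every line into fields.
my ($t_read)  = time_pass( $data, sub { 1 } );
my ($t_split) = time_pass( $data, sub { my @f = split /\t/, $_[0]; scalar @f } );

printf "read only:    %.3f s\n", $t_read;
printf "read + split: %.3f s\n", $t_split;
printf "split share:  %.0f%%\n",
    $t_split > 0 ? 100 * ( $t_split - $t_read ) / $t_split : 0;
```

    Timings vary between machines and perl builds, so it is worth running it several times before drawing conclusions.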

      Do you want me paste code where split() taking more {blah}

      I want you to post code -- directly comparable to the OP's -- where doing something takes longer than doing nothing.

      But if you really want to play, show me code that filters a 2-million-line file of 11 tab-separated fields on the value of one field, where I supply the field number and filter value on the command line, more quickly than:

      #! perl -slw
      use strict;
      use Time::HiRes qw[ time ];

      # -s turns -FNO=... and -V=... on the command line into $FNO and $V
      our $FNO //= 6;     # zero-based index of the field to test
      our $V   //= 500;   # value that field must equal

      my $start = time;
      my @filtered;
      while( <> ) {
          my @fields = split( "\t", $_ );
          $fields[ $FNO ] == $V and push @filtered, $_;
      }
      printf "Took %f seconds\n", time() - $start;
      printf "Kept %u records\n", scalar @filtered;

      __END__
      C:\test>1036737 -FNO=6 -V=500 < numbers.tsv
      Took 19.072147 seconds
      Kept 2005 records

      C:\test>1036737 -FNO=6 -V=500 < numbers.tsv
      Took 19.021369 seconds
      Kept 2005 records
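      One possible tweak for this kind of filter is to give split a LIMIT argument, so it stops once the tested field has been separated. The sketch below wraps that in a hypothetical `filter_lines` helper with made-up sample lines. It has not been benchmarked against the code above, and any gain probably depends on how far to the left the tested field sits.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the lines whose field number $fno (0-based)
# equals $value numerically. split's third argument (LIMIT) caps the result
# at $fno + 2 fields, so fields 0 .. $fno come back separately and the rest
# of the line stays unsplit in one final element that is never examined.
sub filter_lines {
    my ( $fno, $value, @lines ) = @_;
    my @kept;
    for my $line (@lines) {
        my @fields = split /\t/, $line, $fno + 2;
        push @kept, $line if defined $fields[$fno] && $fields[$fno] == $value;
    }
    return @kept;
}

# Usage with a few hypothetical sample lines (field 1 is tested against 500).
my @sample = ( "1\t500\t3\n", "4\t499\t6\n", "7\t500\t9\n" );
my @kept   = filter_lines( 1, 500, @sample );
printf "Kept %d of %d lines\n", scalar @kept, scalar @sample;
```

      A limit of `$fno + 1` would not work: the element at index `$fno` would then hold the rest of the line, tabs included, instead of just the tested field.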


        I thought your point was that the OP is actually doing nothing with the data (read = nothing, read + split = nothing too), and that once he goes on to read every word on every page, the split time will be insignificant.

        But it seems you mean that the OP's benchmark is flawed, because it compares doing nothing against split.

        Otherwise, I agree that split can't really be optimized, just as I wrote above.
