What do you mean when you say that Perl buffers I/O automatically? Also, I have an option that puts a dot on the screen each time I successfully grab a screen, and I set $| = 1 so that the dots appear as I like. What am I actually doing when I set $| = 1? Am I harming performance by doing this?

Re^3: most efficient way to scrape data and put it into a tsv file
by Aristotle (Chancellor) on Aug 30, 2004 at 10:55 UTC

    Yes, setting $| on a filehandle disables buffering. If you do that to produce dots as you go, I guess you're taking a penalty of a few nanoseconds per dot. Considering that establishing a connection and pulling a page over HTTP will take anywhere from a few milliseconds to several seconds, I don't see why you should even care about your prints.
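
    A minimal sketch of the progress-dot pattern being discussed. The @pages list and the placeholder fetch step are hypothetical stand-ins, not the original poster's code:

```perl
use strict;
use warnings;

# Autoflush the currently selected output handle (STDOUT unless
# select() has changed it), so each dot appears as soon as it is printed.
$| = 1;

# Hypothetical stand-in for the list of pages being scraped.
my @pages = map { "page$_" } 1 .. 5;

my $fetched = 0;
for my $page (@pages) {
    # ... fetch and parse $page here ...
    $fetched++;
    print '.';    # one small write per page, dwarfed by the HTTP request
}
print "\n";
```

    Without the $| = 1 line, the dots may not show up on a terminal until the final newline is printed.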

    Performance should be your very last concern, and only once you have found that your code is actually too slow in practice. Even then, you don't start by guessing at what could be made faster: you profile the code and look at where it's actually spending its time. If a script takes 3 minutes to run and you make some random part of the code 10 times faster, it doesn't help you much if that part only accounted for 1 second of the total runtime anyway. You're now down from 3:00 to 2:59.1.
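
    One rough way to see where the time goes before reaching for a real profiler is to accumulate wall-clock time per phase. This is only a sketch: the timed() helper and the phase names are made up for illustration, and the sleep stands in for a slow HTTP fetch.

```perl
use strict;
use warnings;
use Time::HiRes ();    # core module; sub-second time() and sleep()

my %spent;             # seconds accumulated per phase

# Hypothetical helper: run a code block and add its wall-clock
# time to the named phase.
sub timed {
    my ($phase, $code) = @_;
    my $start  = Time::HiRes::time();
    my @result = $code->();
    $spent{$phase} += Time::HiRes::time() - $start;
    return @result;
}

for my $i (1 .. 3) {
    timed(fetch => sub { Time::HiRes::sleep(0.05) });   # simulated download
    timed(print => sub { print STDERR '.' });           # unbuffered dot
}
print STDERR "\n";

printf "%-6s %.4f s\n", $_, $spent{$_} for sort keys %spent;
```

    For anything beyond a quick check like this, a dedicated profiler such as Devel::NYTProf (run as perl -d:NYTProf script.pl) gives per-line and per-subroutine figures.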

    The network I/O is going to take so much more time in your script than anything else it does that no optimization is going to matter. If you want it to go faster, make your downloads go faster. If you can't do that, don't bother changing anything.

    Makeshifts last the longest.

Re^3: most efficient way to scrape data and put it into a tsv file
by iburrell (Chaplain) on Aug 30, 2004 at 17:05 UTC
    Perl buffers I/O by default. Files are block buffered, STDOUT is line buffered when it goes to a terminal and block buffered when redirected, and STDERR is unbuffered. This means Perl waits until it sees the end of a line, or until the buffer fills up, before it does an operating system write.

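    Are you writing your output to STDOUT or to a file? It sounded like the latter. Setting $| = 1 only unbuffers the currently selected output handle, which is STDOUT unless you have changed it with select(). It is possible to unbuffer files as well, but this is mainly used for logs, where each line should be written independently.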
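
    A small sketch of unbuffering a single file handle while leaving everything else buffered. The temporary file is a hypothetical stand-in for a real log file:

```perl
use strict;
use warnings;
use IO::Handle;                # provides the autoflush method on handles
use File::Temp qw(tempfile);

# A temporary file stands in for the real log (hypothetical).
my ($log, $path) = tempfile(UNLINK => 1);

# Flush after every print to this handle only; STDOUT and other
# handles keep their default buffering.
$log->autoflush(1);

print {$log} "fetched page 1\n";

# The line has already been written out, even though $log is still open.
my $size = -s $path;
print "log size so far: $size bytes\n";
```

    The older idiom for the same thing is to select() the handle, set $| = 1, and then select() the previous handle again.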