Mano_Man has asked for the wisdom of the Perl Monks concerning the following question:
Is collecting error messages into a variable like the one below and printing them all at the end

```perl
my $toPrint = "";
if (...) { $toPrint .= "Error message1\n"; }
if (...) { $toPrint .= "Error message2\n"; }
print $toPrint if $toPrint;
```

better than just printing each message on the spot? This is of course regarding a much bigger $toPrint message, with much longer text. Secondly - is there a limit on how much a string like $toPrint can hold? Can I spam $toPrint with, let's say, 20,000 lines? Is this even a good idea, or is there a better way? Appreciate any help, O great monks. Mano.
Re: Performance In Perl
by Discipulus (Canon) on Mar 15, 2017 at 08:49 UTC
You'll receive detailed answers for sure, but my feeling is that the maximum length of a string is bounded by your RAM: see Maximum string length. Then, in terms of performance, I suspect it is much more convenient to print out such messages as soon as possible, without accumulating them into a variable. In fact $toPrint will keep growing, consuming more RAM with every append you make to it. In the other scenario you do not even need a variable: you just print out a string and it is gone. With modern hardware, though, I suspect 20k lines are an affordable task, both to print and to read. L*
There are no rules, there are no thumbs.. Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
by vrk (Chaplain) on Mar 15, 2017 at 09:40 UTC
There are really two questions to answer here: is one big print faster than many small ones, and how much can a string hold? To answer the second question first, Perl has no built-in limit on string size. Try a simple test program using the splendid Devel::Size module, along these lines:
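A minimal sketch of such a test program, assuming one million 80-character lines and using Devel::Size's total_size (vrk's exact code may have differed):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Devel::Size qw(total_size);   # reports the memory footprint of a Perl variable

# Build a string of one million 80-character lines (79 characters plus "\n").
my $line    = ('x' x 79) . "\n";
my $toPrint = '';
$toPrint .= $line for 1 .. 1_000_000;

printf "length:     %d bytes\n", length $toPrint;
printf "total_size: %d bytes\n", total_size(\$toPrint);
```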
On my machine, the output shows that even with one million 80-character lines, you're only using a couple of hundred megabytes of RAM. To answer the I/O speed question, you can try benchmarking it with a program like the one below:
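A sketch of such a benchmark, printing to STDERR with the size() checks commented out as described (the details here are assumptions, not vrk's original code):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(timeit timestr);
use Devel::Size qw(size);   # only needed for the commented-out size() checks

my $chunk = ('x' x 79) . "\n";
my $count = 1_000_000;

# First loop: accumulate everything into one big string, then print it once.
my $t1 = timeit(1, sub {
    my $big = '';
    for (1 .. $count) {
        $big .= $chunk;
        # size(\$big);      # uncomment to watch the string grow (much slower!)
    }
    print STDERR $big;
});

# Second loop: print each small chunk as soon as it is produced.
my $t2 = timeit(1, sub {
    for (1 .. $count) {
        # size(\$chunk);    # uncomment for the matching check here
        print STDERR $chunk;
    }
});

print "one big print:     ", timestr($t1), "\n";
print "many small prints: ", timestr($t2), "\n";
```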
If you run it, redirect STDERR or the comparison is meaningless: perl test.pl 2>/dev/null. Be warned that it may be a false comparison nonetheless. On my machine, printing one big lump is faster than printing one million small chunks. However, if you uncomment the size() calls to see how much the total string sizes differ, you'll find the first loop suddenly takes four times longer, because it's doing a lot more calculation at each loop iteration. Probably the only right way to answer your question is to try both in your program and with your input and see which one performs faster. It really depends on how much you can afford to keep in memory and how much computation you need to do for each individual chunk to print.
Re: Performance In Perl
by Eily (Monsignor) on Mar 15, 2017 at 09:51 UTC
It depends on where your prints go. In any case, there's probably some buffering going on (unless your handle is hot). If your output does not go directly to a terminal, that buffering pretty much does the same thing as you are trying to do, with the question of the size of the chunks already handled. If your output goes to a terminal, then it is flushed every time a \n is encountered, so you might benefit from constructing bigger messages. This is a candidate for benchmarking, as shown by vrk.

NB: I tried the following: perl -E "for (1..10) { say 'Hello'; sleep(1); }" > test.txt together with tail -f test.txt. It confirmed that redirecting STDOUT to a file replaces line buffering with block buffering: with the redirect in place I got all the lines at once at the end, whereas even with $| = 0 the "Hello"s are displayed straight away when printing to the console. More information on buffering here.
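A small sketch illustrating this behaviour, assuming you want each line to reach a redirected file immediately (the -t test and $| autoflush are standard Perl; the script itself is only an illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# To a terminal, Perl's STDOUT is line-buffered: every "\n" flushes, so each
# "Hello" appears immediately even with $| = 0.  Redirected to a file or pipe,
# STDOUT is block-buffered instead, so nothing appears until the buffer fills
# or the program exits -- unless autoflush is switched on.

$| = 1 unless -t STDOUT;   # force a flush after every print when redirected

for my $i (1 .. 10) {
    print "Hello $i\n";
    sleep 1;
}
```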
Re: Performance In Perl
by kcott (Archbishop) on Mar 16, 2017 at 00:24 UTC
G'day Mano,

Welcome to the Monastery.

In general, calling print 20,000 times with individual records will be slower than calling it once with all records. I ran the following Benchmark several times.
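A sketch along the lines kcott describes, assuming 20,000 records of 100 'X's plus a newline, constants built with the x operator, and /dev/null as the output target on a Unix-like system (his actual code may have differed):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

use constant RECORD   => ('X' x 100) . "\n";
use constant COUNT    => 20_000;
use constant BIG_STR  => RECORD x COUNT;      # one ~2MB concatenated string
use constant BIG_LIST => (RECORD) x COUNT;    # a list of 20,000 records

open my $out, '>', '/dev/null' or die "Can't open /dev/null: $!";

cmpthese(-5, {
    print_singly => sub { print {$out} RECORD for 1 .. COUNT },   # 20,000 print calls
    print_string => sub { print {$out} BIG_STR },                 # one print, one string
    print_list   => sub { print {$out} BIG_LIST },                # one print, a list
});
```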
Here's a representative result: the two single-print approaches were roughly an order of magnitude faster than 20,000 individual calls to print.
You didn't give any indication of record size (error messages can vary wildly in length): I just used 100 'X's (plus a newline). If that's a reasonable guess, I don't imagine you'd have any problem with ~2MB of data (either holding it in memory or passing it to print).

As you can see, printing every record singly was slower than the other methods. A single print with concatenated records appears a little faster than using a list; however, that wasn't the case in all runs: I'd consider those two too close to call. Also bear in mind that, because I've used constant values, Perl may have performed some optimisations at compile time. Consider what other code is involved as you capture records and add them to a string or use them to populate an array.

There are some other factors to take into consideration. Is this a one-off run? If not, how frequently is it run? How long does the entire process take to run? Is it being run by multiple processes at the same time? Are there other users on the system? How might this affect them?

Although printing records individually may be slower in the benchmark scenario I present, if done correctly, this method should have a substantially smaller memory footprint. In addition, spreading the printing tasks over the life of the process may mean it plays more nicely with other, concurrent processes.

There's a fair amount to think about. I'd recommend writing your own benchmark, using more representative data, and running it in an environment that's closer to the one in which the code will actually run. See also: "perlperf - Perl Performance and Optimization Techniques".

— Ken
Re: Performance In Perl
by Ratazong (Monsignor) on Mar 15, 2017 at 11:30 UTC
Hi Mano_Man, I assume it is mostly a design decision. However, I have experienced an issue in the past when redirecting the output to a file using > in a Windows batch file: IIRC it didn't work when printing a text that was too long - it seemed that the data sent out by print came much faster than it could be written to the file. Unfortunately, I can't find the old script where this happened... but it might be something you want to check/consider. HTH, Rata
Re: Performance In Perl
by afoken (Chancellor) on Mar 17, 2017 at 07:25 UTC
Ignore the performance part of your code for now; you've got plenty of good answers. But nobody has yet mentioned a much deeper problem:
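A sketch of the pattern being criticised, based on the code in the original question (the input validation and file names are made up for illustration):

```perl
#!/usr/bin/perl
# The pattern under discussion: collect every error message in memory and
# write the error log only at the very end of the run.
use strict;
use warnings;

my $toPrint = '';

while (my $input = <STDIN>) {
    chomp $input;
    $toPrint .= "Not a number: '$input'\n" if $input !~ /^\d+$/;
    # ... real processing here; if anything dies, $toPrint is lost ...
}

# Only ever reached if the script survives all the way to the end.
if ($toPrint) {
    open my $log, '>', 'error.log' or die "Can't write error.log: $!";
    print {$log} $toPrint;
    close $log or die "Can't close error.log: $!";
}
```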
This is your basic idea, right? Now imagine that this (pseudo-)script crashes seemingly randomly. Look into your error log. You see NOTHING. The script MUST NOT crash before the very end, or the error log is never written. Unfortunately, crashes and bugs usually ignore such rules and occur anywhere in your code. Now, let's do it right. Don't print to STDERR yourself; use warn and die as intended. Yes, both finally write to STDERR, but you can catch both if you want (eval, $SIG{__DIE__}, $SIG{__WARN__}). But that's not the point. The point is that STDERR is unbuffered. Everything you write there ends up in the log, ASAP. So:
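A sketch of that approach: warn writes to STDERR immediately, so the message is already in the log before anything later can crash. The "Oh noes" message and foo() come from afoken's description; the check itself is a made-up stand-in:

```perl
#!/usr/bin/perl
# Same job, but every problem is reported the moment it is detected.
use strict;
use warnings;

while (my $input = <STDIN>) {
    chomp $input;

    warn "Oh noes" if $input =~ /[^\x00-\x7F]/;   # stand-in for the real check

    foo($input);   # may crash for certain inputs -- but the warning above
                   # has already reached STDERR (and thus the log) by then
}

sub foo {
    my ($data) = @_;
    # ... real processing would go here ...
}
```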
Now, the last thing you see in the log is "Oh noes at example.pl line 20". And as it turns out in my example, the condition leading to this is a check to work around a known bug in an XS module, triggered by a certain combination of input data to foo(). And that's why those three lines should read:
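A guess at the spirit of those three lines - a comment naming the known bug plus a message that explains itself (the module name and ticket number are placeholders):

```perl
    # Known bug: foo() in Hypothetical::XS::Module crashes on non-ASCII input
    # (upstream ticket #NNN).  Report it with enough context to recognise the case.
    warn "input '$input' contains non-ASCII bytes; this is the known foo() crash case"
        if $input =~ /[^\x00-\x7F]/;
```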
And now, buffered vs. unbuffered. Perl files are usually buffered, if only because the libc below perl buffers. Even if you spoon-feed a file character by character, perl and/or libc will usually buffer that until either the buffer is full or perl/libc decides that it's time to flush the buffer. As long as only the buffer is being written, everything is quite fast: it all happens in memory, in userspace. When the buffer is flushed, libc issues a syscall to actually write the file. The syscall switches to kernel mode, which is expensive, and the kernel does a lot of work to really write the file. This takes significantly more time.

An unbuffered file still uses a buffer, but it is automatically flushed after each write command, so the syscall happens for every write. This is obviously slower than a buffered file, especially if you write character by character. But because the buffer is flushed after every write, a subsequent crash in userspace does not affect the log file: it has already been written.

What you have written here is another buffering layer that is flushed only once, at the very end of the program. Does that improve performance? Maybe a tiny bit. It also ties up RAM that could be used for better purposes, which may become significant if you append lots of data to the buffer.

What does actually happen? Is writing the error log really the bottleneck? You can find out. Devel::NYTProf is an excellent tool that shows you where your code really spends its time. That's where you really want to start optimizing.

Alexander
-- Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re: Performance In Perl
by dsheroh (Monsignor) on Mar 16, 2017 at 08:37 UTC
How often will this run? Once? Daily? Even if you run it once a second, 24/7, for a year, the performance difference is likely to be so small that the total time saved over that entire year will be less than the time it took you to type out your question. Optimizing for programmer time is generally far more effective than trying to micro-optimize for CPU time.
by dsheroh (Monsignor) on Aug 29, 2017 at 08:52 UTC
Also, while kcott's numbers above show a 981% difference (roughly 0.2ms vs 2ms for 20k lines of output, which is to say a fraction of a microsecond per line), I note that his test builds the long strings using the x operator instead of doing 20k individual concatenations. Let's see what happens if we actually build the output string line-by-line, as the code in your original post does (a sketch of such a benchmark appears at the end of this reply).

The results: only a 26% difference between printing line-by-line and appending line-by-line. It seems that the primary optimization behind a single print being so much faster in kcott's test was that it built the entire output string in one operation instead of handling each line of output separately - which is not an optimization you would be able to apply in the case your question describes.

And, again translating this back into real numbers, the difference is 14.3 million lines/second printing them individually vs. an even 18 million lines/sec if they're concatenated first. That's 0.07 microseconds/line vs. 0.056 microseconds/line - a savings of approximately one second per 70 million lines of output, or over four billion lines to get a one-minute difference. Whoopty-freaking-do.

How many times would each of those 100 users have to process their 700M input files for the aggregate difference to add up to the time you spent reading this reply, never mind the time I spent writing it? This kind of micro-optimization is just not worth it in 99% of cases - and, for the other 1%, you'll get bigger gains by using C or a similar high-performance language instead of Perl, and then micro-optimizing the C code if you still need more speed at that point.
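A sketch of such a line-by-line benchmark, assuming the same 100-character records as kcott's test and /dev/null as the output target (dsheroh's actual code and timings are not reproduced here):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $record = ('X' x 100) . "\n";
my $count  = 20_000;

open my $out, '>', '/dev/null' or die "Can't open /dev/null: $!";

cmpthese(-5, {
    # print each record as soon as it is "generated"
    print_each_line => sub {
        print {$out} $record for 1 .. $count;
    },
    # append each record to a buffer, then print the buffer once at the end
    append_then_print => sub {
        my $buffer = '';
        $buffer .= $record for 1 .. $count;
        print {$out} $buffer;
    },
});
```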
Re: Performance In Perl
by Anonymous Monk on Mar 15, 2017 at 15:48 UTC
If you want performance in your development time however ... use Perl.