(I tried to make this as readable as possible, but I'm not sure I succeeded.)

file size is unknown

Not by us, but kevyt is probably aware of it and can make a value judgement, reconciling the size of his file with the memory resources available.

Yes, he knows the file size, and yes, he can decide whether reading the entire file into memory is a problem.
But the major point is: he probably does not (or didn't) know that the entire file is first read into memory.
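To illustrate that point, here is a minimal sketch. The three-line sample input is made up, and the in-memory filehandle (`open my $fh, '<', \$data`) needs Perl 5.8 or later. In list context `<$fh>` returns all remaining lines at once, while in scalar context it returns one line per call; the line counter `$.` shows when each line was actually read.

```perl
use strict;
use warnings;

# Hypothetical three-line input; an in-memory filehandle stands in for a real file.
my $data = "line1\nline2\nline3\n";

# List context: <$fh> reads EVERY line before map's block runs even once,
# so $. (the current input line number) is already 3 for every element.
open my $fh, '<', \$data or die "Cannot open: $!";
my @seen_in_map = map { $. } <$fh>;
close $fh;

# Scalar context: each <$fh> reads a single line, so line N is processed
# before line N+1 is read, and $. counts up 1, 2, 3.
open $fh, '<', \$data or die "Cannot open: $!";
my @seen_in_while;
while (my $line = <$fh>) {
    push @seen_in_while, $.;
}
close $fh;

print "map:   @seen_in_map\n";     # map:   3 3 3
print "while: @seen_in_while\n";   # while: 1 2 3
```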

Processing does not start until you are done reading.

I can't think why that would be a problem here. Please could you expand on why this is bad.

Reasons why this could be(come) a problem:

Most people will not realize that you are reading it into memory first. If you really want to read it all at once, then I would suggest reading it into an array first.
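A minimal sketch of that array-first style, assuming Perl 5.8 or later for the in-memory filehandle. The sample data and the helper `strip_digits_before_pipe` are hypothetical; the helper assumes the task (from the thread title) is to delete the digits that appear before the first `|`.

```perl
use strict;
use warnings;

# Hypothetical sample input; an in-memory filehandle stands in for the real file.
my $data = "ab12|cd34\n5e6f|78\n";

# Hypothetical helper: delete digits before the first '|', keep the rest as-is.
sub strip_digits_before_pipe {
    my ($line) = @_;
    my ($head, $rest) = split /\|/, $line, 2;   # limit 2: split at the first '|' only
    $head =~ tr/0-9//d;                          # delete digits in the head only
    return defined $rest ? "$head|$rest" : $head;
}

open my $fh, '<', \$data or die "Cannot open: $!";

# Reading into an array makes it obvious that the whole file is now in memory.
my @lines = <$fh>;
close $fh;

# Only now does the processing start, one element at a time.
my @result;
for my $line (@lines) {
    push @result, strip_digits_before_pipe($line);
}
print @result;
```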

This is a difficult topic. To what extent do you balance using the features of Perl, or any language, against making your code accessible to beginners in the language?

Well, the thing is, this is not my code. And this is not your code either. It will be kevyt's code. He needs to fully understand it.
In the code I normally write (when I am not helping people), I do not really care about that and use every feature I need.

It has to depend on the type of workplace, the experience level of the workforce and the amount of staff churn.

Exactly. And this is a site that offers help (to beginners?).
So you should keep it as simple as possible, or add enough explanation that they can understand it (or at least references to the documentation).

An experienced, stable programming team can perhaps make greater use of language features. However, if you never expose people to new techniques, they will never learn them. This exposure can be via training/mentoring or by encouraging and rewarding self-study. Personally, I am in favour of educating programmers so they can make more informed choices from a larger tool bag in order to solve problems.

I'm in favour of educating as well. But IMHO he can't educate himself from your post. If you wanted to educate him, then you should (IMHO) have started by explaining why he cannot use tr/// to accomplish this task, then moved on to a long version of the code (as in reading into an array), and finally moved to the shorter version.
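A minimal sketch of that progression. The sample string and the helper name `strip_digits_before_pipe` are hypothetical, and it assumes the task is to delete only the digits that come before the first `|`. The key point is that tr/// has no way to stop at a `|`, so on its own it deletes every digit in the string.

```perl
use strict;
use warnings;

my $s = "ab12|cd34";

# tr/// works on the whole string: it has no notion of "stop at the first |",
# so the digits after the pipe are deleted as well.
(my $tr_only = $s) =~ tr/0-9//d;    # "ab|cd" (too much removed)

# Long version (hypothetical helper): find the first '|' and apply tr///
# only to the part before it.
sub strip_digits_before_pipe {
    my ($line) = @_;
    my $end = index $line, '|';
    $end = length $line if $end < 0;    # no '|': treat the whole line as the head
    my $head = substr $line, 0, $end;
    $head =~ tr/0-9//d;                 # delete digits in the head only
    return $head . substr($line, $end); # re-attach '|' and everything after it
}

# Short version: the same idea as one substitution; /e runs the replacement
# as code, so tr/// only ever sees the captured head.
(my $short = $s) =~ s{^([^|]*)}{ (my $h = $1) =~ tr/0-9//d; $h }e;

print "$tr_only\n", strip_digits_before_pipe($s), "\n", "$short\n";
```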

Basically, what you did was give him some code and hope that he would either understand it or look it up.

The benchmark

If I run your benchmark I get completely different results, which show that ByLine is the fastest (on Slackware)...

File: 689K, perl version: 5.6.1

Source file: /usr/share/dict/words. Modification: perl -pli -e '$_="$_|$_";' words. Using -7 as count.
Benchmark: running Array, ByLine, Map, each for at least 7 CPU seconds...
     Array:  8 wallclock secs ( 7.45 usr +  0.09 sys =  7.54 CPU) @  1.59/s (n=12)
    ByLine:  7 wallclock secs ( 7.05 usr +  0.07 sys =  7.12 CPU) @  2.25/s (n=16)
       Map:  7 wallclock secs ( 7.45 usr +  0.05 sys =  7.50 CPU) @  1.33/s (n=10)
         Rate    Map  Array ByLine
Map    1.33/s     --   -16%   -41%
Array  1.59/s    19%     --   -29%
ByLine 2.25/s    69%    41%     --
Benchmark: running Array, ByLine, Map, each for at least 7 CPU seconds...
     Array:  8 wallclock secs ( 7.47 usr +  0.06 sys =  7.53 CPU) @  1.59/s (n=12)
    ByLine:  7 wallclock secs ( 7.06 usr +  0.07 sys =  7.13 CPU) @  2.24/s (n=16)
       Map:  8 wallclock secs ( 7.48 usr +  0.03 sys =  7.51 CPU) @  1.33/s (n=10)
         Rate    Map  Array ByLine
Map    1.33/s     --   -16%   -41%
Array  1.59/s    20%     --   -29%
ByLine 2.24/s    69%    41%     --

Using Benchmark::Forking

Benchmark: running Array, ByLine, Map, each for at least 7 CPU seconds...
     Array:  8 wallclock secs ( 7.22 usr +  0.14 sys =  7.36 CPU) @  1.63/s (n=12)
    ByLine:  7 wallclock secs ( 7.02 usr +  0.10 sys =  7.12 CPU) @  2.25/s (n=16)
       Map:  8 wallclock secs ( 7.14 usr +  0.05 sys =  7.19 CPU) @  1.39/s (n=10)
         Rate    Map  Array ByLine
Map    1.39/s     --   -15%   -38%
Array  1.63/s    17%     --   -27%
ByLine 2.25/s    62%    38%     --
Benchmark: running Array, ByLine, Map, each for at least 7 CPU seconds...
     Array:  7 wallclock secs ( 7.22 usr +  0.11 sys =  7.33 CPU) @  1.64/s (n=12)
    ByLine:  7 wallclock secs ( 6.98 usr +  0.12 sys =  7.10 CPU) @  2.25/s (n=16)
       Map:  7 wallclock secs ( 7.10 usr +  0.09 sys =  7.19 CPU) @  1.39/s (n=10)
         Rate    Map  Array ByLine
Map    1.39/s     --   -15%   -38%
Array  1.64/s    18%     --   -27%
ByLine 2.25/s    62%    38%     --

File: 689K, perl version: 5.8.4

Source file: /usr/share/dict/words. Modification: perl -pli -e '$_="$_|$_";' words. Using -7 as count.
         Rate    Map  Array ByLine
Map    1.31/s     --    -4%   -37%
Array  1.37/s     5%     --   -34%
ByLine 2.08/s    59%    52%     --
         Rate    Map  Array ByLine
Map    1.31/s     --    -4%   -37%
Array  1.37/s     5%     --   -34%
ByLine 2.07/s    58%    51%     --

Using Benchmark::Forking

         Rate  Array    Map ByLine
Array  1.38/s     --    -2%   -33%
Map    1.40/s     2%     --   -32%
ByLine 2.07/s    50%    48%     --
         Rate  Array    Map ByLine
Array  1.38/s     --    -2%   -33%
Map    1.41/s     2%     --   -32%
ByLine 2.07/s    50%    46%     --

File: 689K, perl version: 5.8.7

Source file: /usr/share/dict/words. Modification: perl -pli -e '$_="$_|$_";' words. Using -7 as count.
         Rate    Map  Array ByLine
Map    1.27/s     --    -6%   -37%
Array  1.36/s     7%     --   -33%
ByLine 2.03/s    59%    49%     --
         Rate    Map  Array ByLine
Map    1.27/s     --    -5%   -37%
Array  1.33/s     5%     --   -34%
ByLine 2.03/s    60%    52%     --

Using Benchmark::Forking

         Rate  Array    Map ByLine
Array  1.35/s     --    -3%   -34%
Map    1.39/s     3%     --   -32%
ByLine 2.05/s    52%    47%     --
         Rate  Array    Map ByLine
Array  1.34/s     --    -3%   -34%
Map    1.38/s     3%     --   -32%
ByLine 2.03/s    52%    47%     --

File: 5.1M, perl version: 5.6.1

Source file: /usr/share/dict/dutch. Modification: perl -pli -e '$_="$_|$_";' dutch. Using -20 as count.
Benchmark: running Array, ByLine, Map, each for at least 20 CPU seconds...
     Array: 29 wallclock secs (28.83 usr +  0.17 sys = 29.00 CPU) @  0.24/s (n=7)
    ByLine: 24 wallclock secs (23.43 usr +  0.18 sys = 23.61 CPU) @  0.34/s (n=8)
       Map: 29 wallclock secs (28.97 usr +  0.32 sys = 29.29 CPU) @  0.20/s (n=6)
       s/iter    Map  Array ByLine
Map      4.88     --   -15%   -40%
Array    4.14    18%     --   -29%
ByLine   2.95    65%    40%     --
Benchmark: running Array, ByLine, Map, each for at least 20 CPU seconds...
     Array: 30 wallclock secs (28.86 usr +  0.23 sys = 29.09 CPU) @  0.24/s (n=7)
    ByLine: 24 wallclock secs (23.49 usr +  0.25 sys = 23.74 CPU) @  0.34/s (n=8)
       Map: 30 wallclock secs (29.10 usr +  0.27 sys = 29.37 CPU) @  0.20/s (n=6)
       s/iter    Map  Array ByLine
Map      4.90     --   -15%   -39%
Array    4.16    18%     --   -29%
ByLine   2.97    65%    40%     --

Using Benchmark::Forking

Benchmark: running Array, ByLine, Map, each for at least 20 CPU seconds...
     Array: 25 wallclock secs (23.82 usr +  0.43 sys = 24.25 CPU) @  0.25/s (n=6)
    ByLine: 27 wallclock secs (26.28 usr +  0.35 sys = 26.63 CPU) @  0.34/s (n=9)
       Map: 33 wallclock secs (31.89 usr +  0.42 sys = 32.31 CPU) @  0.22/s (n=7)
       s/iter    Map  Array ByLine
Map      4.62     --   -12%   -36%
Array    4.04    14%     --   -27%
ByLine   2.96    56%    37%     --
Benchmark: running Array, ByLine, Map, each for at least 20 CPU seconds...
     Array: 29 wallclock secs (28.11 usr +  0.40 sys = 28.51 CPU) @  0.25/s (n=7)
    ByLine: 27 wallclock secs (26.26 usr +  0.36 sys = 26.62 CPU) @  0.34/s (n=9)
       Map: 28 wallclock secs (26.80 usr +  0.35 sys = 27.15 CPU) @  0.22/s (n=6)
       s/iter    Map  Array ByLine
Map      4.53     --   -10%   -35%
Array    4.07    11%     --   -27%
ByLine   2.96    53%    38%     --

File: 5.1M, perl version: 5.8.4

Source file: /usr/share/dict/dutch. Modification: perl -pli -e '$_="$_|$_";' dutch. Using -20 as count.
         Rate    Map  Array ByLine
Map    1.27/s     --    -6%   -38%
Array  1.35/s     6%     --   -34%
ByLine 2.04/s    61%    51%     --
         Rate    Map  Array ByLine
Map    1.27/s     --    -6%   -37%
Array  1.35/s     6%     --   -34%
ByLine 2.03/s    60%    51%     --

Using Benchmark::Forking

       s/iter  Array    Map ByLine
Array    4.70     --    -2%   -34%
Map      4.63     2%     --   -33%
ByLine   3.10    52%    49%     --
       s/iter  Array    Map ByLine
Array    4.71     --    -2%   -34%
Map      4.64     2%     --   -33%
ByLine   3.12    51%    49%     --

File: 5.1M, perl version: 5.8.7

Source file: /usr/share/dict/dutch. Modification: perl -pli -e '$_="$_|$_";' dutch. Using -20 as count.
         Rate    Map  Array ByLine
Map    1.26/s     --    -5%   -36%
Array  1.32/s     5%     --   -32%
ByLine 1.95/s    55%    48%     --
         Rate    Map  Array ByLine
Map    1.26/s     --    -5%   -36%
Array  1.32/s     5%     --   -33%
ByLine 1.97/s    56%    49%     --

Using Benchmark::Forking

       s/iter  Array    Map ByLine
Array    4.81     --    -3%   -34%
Map      4.67     3%     --   -32%
ByLine   3.20    50%    46%     --
       s/iter  Array    Map ByLine
Array    4.82     --    -3%   -34%
Map      4.66     3%     --   -31%
ByLine   3.20    51%    46%     --

But as stated before, the difference will hardly be noticed by anyone.
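The benchmark code itself is not reproduced in this post, so the following is only a hypothetical reconstruction of the three variants (Array, ByLine, Map) using `cmpthese` from the core Benchmark module. The variant bodies, the generated sample data and the helper name are assumptions. A negative count, such as the -7 and -20 used above, tells `cmpthese` to run each variant for at least that many CPU seconds; -1 keeps this sketch short. It also reads from an in-memory filehandle rather than a file on disk, so its numbers will likely differ from the results above.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical input shaped like the modified words file ("word|word" per line).
my $data = join '', map { "word$_|word$_\n" } 1 .. 5_000;

# Hypothetical helper: delete digits before the first '|'.
sub strip_digits_before_pipe {
    my ($line) = @_;
    $line =~ s{^([^|]*)}{ (my $h = $1) =~ tr/0-9//d; $h }e;
    return $line;
}

my %variant = (
    Array => sub {
        open my $fh, '<', \$data or die "Cannot open: $!";
        my @lines = <$fh>;                      # whole input in memory first
        close $fh;
        my $out = '';
        $out .= strip_digits_before_pipe($_) for @lines;
        return $out;
    },
    ByLine => sub {
        open my $fh, '<', \$data or die "Cannot open: $!";
        my $out = '';
        while (my $line = <$fh>) {              # one line at a time
            $out .= strip_digits_before_pipe($line);
        }
        close $fh;
        return $out;
    },
    Map => sub {
        open my $fh, '<', \$data or die "Cannot open: $!";
        # list context: <$fh> reads everything before map starts
        my $out = join '', map { strip_digits_before_pipe($_) } <$fh>;
        close $fh;
        return $out;
    },
);

# Negative count: run each variant for at least 1 CPU second, then print a
# comparison table in the same format as the results above.
cmpthese(-1, \%variant);
```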


In reply to Re^6: Removing digits until you see | in a string by Animator
in thread Removing digits until you see | in a string by kevyt
