in reply to Re: Speeds vs functionality
in thread Speeds vs functionality

The first check on every byte in a CSV stream is the check on the separation character. Every extra test on that byte will cause that extra test to be executed for every single byte in the stream.

Is it really so difficult to lift the single/multi-byte test out of the loop?

Even if it means that everything inside the loop is duplicated, that needn't imply a maintenance problem.

You could, for example, make the body of the (now two) loops an inline function. Inline functions have been part of the C standard for 15 years (since C99), and gcc had them long before that.

If you really feel the need to support compilers that lack them, you could always substitute (another) of those awful multiline macros.
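To make the idea concrete, here is a rough sketch of what I mean. It is not the module's actual code: count_fields, on_separator, sep and sep_len are names made up for illustration, and the shared body is reduced to counting separators, where a real parser would also deal with quotes and escapes. The single/multi-byte decision is made once, before either loop is entered.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical shared loop body. Here it only counts the separator;
 * a real parser would do its quote/escape handling in this spot. */
static inline void on_separator(size_t *fields)
{
    ++*fields;
}

/* Count the fields in buf[0..len). sep/sep_len are assumed names for
 * the separator string and its length in bytes. */
size_t count_fields(const char *buf, size_t len,
                    const char *sep, size_t sep_len)
{
    size_t fields = 1;
    size_t i;

    if (sep_len == 0)
        return fields;              /* no separator: one field */

    if (sep_len == 1) {
        /* Single-byte loop: one comparison per byte, nothing else. */
        const char c = sep[0];
        for (i = 0; i < len; i++)
            if (buf[i] == c)
                on_separator(&fields);
    }
    else {
        /* Multi-byte loop: only this path pays for memcmp. */
        for (i = 0; i + sep_len <= len; ) {
            if (buf[i] == sep[0] && memcmp(buf + i, sep, sep_len) == 0) {
                on_separator(&fields);
                i += sep_len;       /* skip the whole separator */
            }
            else {
                i++;
            }
        }
    }
    return fields;
}
```

The loop control is written twice, but the shared work lives in one place, so the duplication stays small. On a compiler without inline, on_separator could become a macro with the same body.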


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^3: Speeds vs functionality
by Tux (Canon) on Jul 29, 2014 at 13:14 UTC

    For speed it is just one single loop. The test for the separation character occurs - besides the check for every next byte - 5 extra times when looking ahead, e.g. after an escape character or a quotation character. Splitting the test out of the loop currently is difficult.

    The code is littered with multi-line macros, and I do not think they are awful at all. They work also on all old compilers, and as I am the maintainer, there is no one else that will see them. When digging through perl5 core code, one gets used to multi-line macros. It doesn't bother me.

    I will have another look at the approach salva suggested and see if I can improve speed there. Having also $paid work, that will not finish this week though.

    FWIW, all feedback here is warmly welcomed and appreciated, even if I might not agree with some of it.


    Enjoy, Have FUN! H.Merijn
      For speed it is just one single loop.

      My point was that by duplicating the loop you can handle the single-byte case in one copy and the multi-byte case in the other, and decide once, up front, which loop to enter. Neither case then carries the burden of the repeated single/multi-byte tests inside the loop, and both benefit.

      The inline functions (my preference) versus multiline macros (yours?) discussion was simply about ways to mitigate some or all of the copy-and-paste code duplication.

