
If this problem is only reproducible on Windows + Cygwin, then "OS confusion" like ...

> CRLF line endings

... seems to be a plausible theory. (I seem to remember a similar discrepancy discussed here not too long ago... (or was it WSL?))

Now I'd turn the test around.

I'd try to generate the string programmatically with various forms of line endings and see what happens.

Those variants could also be written to disk and tested again.
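A minimal Perl sketch of that experiment follows. The record ('X' x 45), the pattern qr/Query/ and the count of 20_000 records are made-up assumptions, not the data or regex from the original post. It builds the same records with LF and CRLF endings, scans them line by line through an in-memory filehandle, then writes the identical bytes to a temporary file and scans that as well:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Time::HiRes qw(time);

# Count lines and regex hits on an already opened filehandle,
# returning (lines, hits, elapsed seconds).
sub scan {
    my ($fh, $re) = @_;
    my ($lines, $hits) = (0, 0);
    my $t0 = time;
    while (my $line = <$fh>) {
        $lines++;
        $hits++ if $line =~ $re;
    }
    return ($lines, $hits, time - $t0);
}

# Scan the same bytes once from memory and once from a temporary file.
sub compare_sources {
    my ($data, $re) = @_;
    my %result;

    open my $mem, '<', \$data or die "in-memory open: $!";
    $result{memory} = [ scan($mem, $re) ];
    close $mem;

    # :raw on write so the line endings reach the disk unchanged.
    my ($tmp, $path) = tempfile(UNLINK => 1);
    binmode $tmp, ':raw';
    print {$tmp} $data;
    close $tmp or die "close $path: $!";

    # Default layers on read, i.e. what a normal script would get.
    open my $disk, '<', $path or die "open $path: $!";
    $result{disk} = [ scan($disk, $re) ];
    close $disk;

    return \%result;
}

# Hypothetical test data: identical records per line-ending variant.
my %eol    = (LF => "\n", CRLF => "\r\n");
my $record = 'X' x 45;
my $re     = qr/Query/;

for my $name (sort keys %eol) {
    my $data = ($record . $eol{$name}) x 20_000;   # string repetition
    my $r    = compare_sources($data, $re);
    for my $source (qw(memory disk)) {
        printf "%-4s %-6s lines=%d hits=%d %.3fs\n",
            $name, $source, @{ $r->{$source} };
    }
}
```

Comparing the memory and disk rows for each variant should show whether a slowdown follows the line ending, the source of the data, or neither; the absolute timings will of course depend on the machine.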

Unfortunately I have no Windows machine at my disposal right now.

My bet is that a string generated with plain "\n" behaves normally.

If not, we would at least have taken the filesystem out of the equation.

And with generated text we could test whether the run time grows linearly with the size of the input.
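A rough way to check the scaling, sketched under assumptions (hypothetical record, pattern and sizes): double the number of records a few times and report the time per line. A roughly constant figure suggests linear behaviour; a figure that keeps growing as the input doubles hints at something worse.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Build $n records with line ending $eol, scan them through an
# in-memory filehandle and return (lines, hits, elapsed seconds).
sub time_scan {
    my ($n, $eol, $re) = @_;
    my $data = ('X' x 45 . $eol) x $n;    # hypothetical record
    open my $fh, '<', \$data or die "in-memory open: $!";
    my ($lines, $hits) = (0, 0);
    my $t0 = time;
    while (my $line = <$fh>) {
        $lines++;
        $hits++ if $line =~ $re;
    }
    my $elapsed = time - $t0;
    close $fh;
    return ($lines, $hits, $elapsed);
}

# Hypothetical sizes: each step doubles the input.
for my $variant (['LF', "\n"], ['CRLF', "\r\n"]) {
    my ($name, $eol) = @$variant;
    for my $n (5_000, 10_000, 20_000, 40_000) {
        my ($lines, undef, $secs) = time_scan($n, $eol, qr/Query/);
        printf "%-4s n=%-6d %.3fs %.3f us/line\n",
            $name, $lines, $secs, 1e6 * $secs / $lines;
    }
}
```

The sizes are kept small on purpose so that a pathological case still finishes; they can be raised once the behaviour is known.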

Cheers Rolf
(addicted to the Perl Programming Language :)
see Wikisyntax for the Monastery

Re^3: RE on lines read from in-memory scalar is very slow (cygwin and \n)
by kcott (Archbishop) on Jan 24, 2024 at 16:04 UTC
    "My bet is that a string generated with plain "\n" behaves normal."

    I went back and did an additional check on the data I used for my earlier tests. Each record in that data ends with just LF. For a specific test, I also created two tiny files: check_lf, which contains just "qwerty<LF>", and check_crlf, in which I forced a CRLF ending, so its contents are "qwerty<CR><LF>".

    $ for i in test_data test_data_Q check_lf check_crlf; do head -1 $i | cat -vet; done
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX$
    QueryXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX$
    qwerty$
    qwerty^M$

    For anyone unfamiliar with "cat -vet": it marks the end of each line (the LF) with a "$" and shows a carriage return as "^M".
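    For anyone without cat at hand, a rough Perl stand-in is sketched below. It recreates the two check files byte for byte in a temporary directory and prints their first line through a simplified cat -vet mapping. The helper name vet is made up, and only TAB, CR and LF are handled; other control characters and high-bit bytes, which cat -v also rewrites, are left alone.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Simplified `cat -vet`: TAB -> ^I, CR -> ^M, "$" before each LF.
sub vet {
    my ($s) = @_;
    $s =~ s/\t/^I/g;
    $s =~ s/\r/^M/g;
    $s =~ s/\n/\$\n/g;
    return $s;
}

# Recreate check_lf and check_crlf with exact bytes (:raw, no translation).
my $dir   = tempdir(CLEANUP => 1);
my %files = (check_lf => "qwerty\n", check_crlf => "qwerty\r\n");

for my $name (sort keys %files) {
    my $path = File::Spec->catfile($dir, $name);
    open my $out, '>:raw', $path or die "write $path: $!";
    print {$out} $files{$name};
    close $out or die "close $path: $!";

    # Rough equivalent of `head -1 $path | cat -vet`.
    open my $in, '<:raw', $path or die "read $path: $!";
    my $first = <$in>;
    close $in;
    print "$name: ", vet($first);
}
```

    Run as-is, it should print "check_crlf: qwerty^M$" followed by "check_lf: qwerty$", matching the check_crlf and check_lf results above.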

    Edit: I changed several instances of NL to LF. This was for consistency with other parts of my post as well as LF being a generally recognised de facto standard (CRLF is more usual than CRNL).

    — Ken

      Sorry if I'm too tired to see the conclusion of these tests. (?)

      Are you saying that you reran them with different line endings, but the results were the same?

      Cheers Rolf