Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

When I try to use POSIX::strncmp, I get a message indicating I should use eq instead. However, what I really want to do is compare the first n characters of two strings (in my case, n == 36). How would eq efficiently do this (I will be working with literally millions of such strings)?

Thanks,

lpye

Replies are listed 'Best First'.
Re: strncmp functionality
by diotalevi (Canon) on Apr 04, 2003 at 16:07 UTC

    Benchmark these two and pick your favorite: substr($a,0,36) eq substr($b,0,36) or unpack('A36', $a) eq unpack('A36', $b). If those aren't fast enough, consider implementing your compare function in C and calling it through Inline::C.
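
    A minimal sketch of the two candidates, wrapped in helpers whose names (prefix_eq_substr, prefix_eq_unpack) are invented here for illustration. One thing worth knowing before benchmarking: the 'A' template of unpack strips trailing spaces and NULs, so the two expressions can disagree when the compared prefix ends in whitespace.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the first $n characters of both strings match.
sub prefix_eq_substr {
    my ($x, $y, $n) = @_;
    return substr($x, 0, $n) eq substr($y, 0, $n);
}

# Hypothetical helper: same idea via unpack. The 'A' template strips
# trailing spaces and NULs from the extracted field.
sub prefix_eq_unpack {
    my ($x, $y, $n) = @_;
    return unpack("A$n", $x) eq unpack("A$n", $y);
}

my $p = 'x' x 36;

# Identical 36-character prefix, different tails.
print prefix_eq_substr("${p}foo", "${p}bar", 36) ? "same\n" : "different\n";

# Trailing whitespace: substr keeps it, unpack 'A' drops it.
print prefix_eq_substr("abc ", "abc", 4) ? "same\n" : "different\n";
print prefix_eq_unpack("abc ", "abc", 4) ? "same\n" : "different\n";
```

    If trailing whitespace is significant in your data, the substr form is probably the safer of the two.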

Re: strncmp functionality
by broquaint (Abbot) on Apr 04, 2003 at 16:30 UTC
    How would eq efficiently do this
    I'm always a little dubious of benchmarks, but here's one for your given case:
        use Inline C;
        use POSIX qw( strncmp );
        use Benchmark qw( cmpthese );
        use strict;
        use warnings;

        my $n    = 36;
        my @nums = 0 .. 99_999;
        my @str  = map join('', 'a' .. 'z'), @nums;

        no warnings qw/ uninitialized void /;
        cmpthese(-10, {
            strncmp => sub {
                for(@nums) { strncmp_perl(@str[$_,$_ + 1], $n); }
            },
            eq_substr => sub {
                for(@nums) { substr($str[$_], 0, $n) eq substr($str[$_ + 1], 0, $n) }
            },
            eq_unpack => sub {
                for(@nums) { unpack("A$n", $str[$_]) eq unpack("A$n", $str[$_ + 1]) }
            },
        });

        __END__
        __C__
        int strncmp_perl(char* s1, char* s2, int n) {
            return strncmp(s1, s2, (int)n);
        }

        __output__
        Benchmark: running eq_substr, eq_unpack, strncmp, each for at least 10 CPU seconds...
         eq_substr: 11 wallclock secs (10.11 usr + 0.01 sys = 10.12 CPU) @ 5.34/s (n=54)
         eq_unpack: 11 wallclock secs (10.16 usr + 0.00 sys = 10.16 CPU) @ 2.07/s (n=21)
           strncmp: 11 wallclock secs (10.04 usr + 0.00 sys = 10.04 CPU) @ 5.88/s (n=59)
                    Rate eq_unpack eq_substr   strncmp
        eq_unpack 2.07/s        --      -61%      -65%
        eq_substr 5.34/s      158%        --       -9%
        strncmp   5.88/s      184%       10%        --
    So it looks like the Inline::C strncmp() is slightly faster (about 10%) than substr() and eq combined, while unpack() is well behind both, but always take benchmarks with a grain of salt.
    HTH

    _________
    broquaint

      Thanks for that. I used the following comparison in a benchmark:
      index(substr($str[$_], 0, $n), $str[$_ + 1], 0) != 0
      and came up with:
          Benchmark: running eq_index, eq_substr, each for at least 10 CPU seconds...
            eq_index: 14 wallclock secs (10.08 usr + 0.00 sys = 10.08 CPU) @ 3.08/s (n=31)
           eq_substr: 16 wallclock secs (10.30 usr + 0.00 sys = 10.30 CPU) @ 3.01/s (n=31)
                      Rate eq_substr  eq_index
          eq_substr 3.01/s        --       -2%
          eq_index  3.08/s        2%        --
      Thoughts or comments?

      -Lynn

      update (broquaint): added formatting
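
      One caveat on the index() form, sketched below under the assumption that the goal is strncmp-style equality of the first n characters: index() searches for the whole second string inside the truncated first one, so the two tests can give different answers. The helper names (same_by_index, same_by_substr) and the small n are invented for illustration only.

```perl
use strict;
use warnings;

my $n = 3;    # small n so the examples are easy to read

# Hypothetical helper wrapping the index() test from the post above
# ("!= 0" there means "different", so "== 0" here means "same").
sub same_by_index {
    my ($x, $y) = @_;
    return index(substr($x, 0, $n), $y, 0) == 0;
}

# Hypothetical helper: compare the first $n characters of both strings.
sub same_by_substr {
    my ($x, $y) = @_;
    return substr($x, 0, $n) eq substr($y, 0, $n);
}

# First pair: equal prefixes, but the second string is longer than $n.
# Second pair: the second string is shorter than $n and a prefix of the first.
for my $pair (["abcdef", "abcxyz"], ["abcdef", "ab"]) {
    my ($x, $y) = @$pair;
    printf "%-8s %-8s index=%s substr=%s\n", $x, $y,
        (same_by_index($x, $y)  ? "same" : "diff"),
        (same_by_substr($x, $y) ? "same" : "diff");
}
```

      The two agree only when the second string is exactly n characters long, so the index() timing above may not be measuring the same comparison as eq_substr.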

Re: strncmp functionality
by Mr. Muskrat (Canon) on Apr 04, 2003 at 16:08 UTC

    You want to use eq for checking the equality of strings.

    perlop says:

    Binary "==" returns true if the left argument is numerically equal to the right argument.
    Binary "eq" returns true if the left argument is stringwise equal to the right argument.

    Update! The docs for POSIX say this:

    strcmp() is C-specific, use eq or cmp instead, see the perlop manpage.

    2nd update: diotalevi has a better answer, since I didn't take into account that you only want to compare the first 36 characters.
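
    If a three-way result in the spirit of C's strncmp() is wanted rather than plain equality, cmp can stand in for eq. A small sketch: strncmp_like is a hypothetical name, it returns -1, 0 or 1 rather than the arbitrary negative or positive values C allows, and it compares Perl characters without stopping at a NUL.

```perl
use strict;
use warnings;

# Hypothetical helper: three-way comparison of the first $n characters.
# Returns -1, 0 or 1, as cmp does.
sub strncmp_like {
    my ($x, $y, $n) = @_;
    return substr($x, 0, $n) cmp substr($y, 0, $n);
}

print strncmp_like("abcdef", "abcxyz", 3), "\n";    # prefixes "abc" and "abc"
print strncmp_like("abcdef", "abcxyz", 4), "\n";    # prefixes "abcd" and "abcx"
```

    Because the result follows cmp's conventions, the helper can also be used inside a sort block.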