in reply to The sum of absolute differences in the counts of chars in two strings.

Now I just saw Perlbotics has posted an Inline C solution already but anyway, this is a bit simpler:
use Inline C; print char_freq_diff_sum("123456", "abcdefghijkl"),"\n"; __DATA__ __C__ #define abs(x) (x<0?-x:x) /* no need for math.h */ int char_freq_diff_sum(char *s, char *t) { int i, sum=0, freqs[256]={0}; for(i=0; i<strlen(s); ++i) freqs[s[i]]++; for(i=0; i<strlen(t); ++i) freqs[t[i]]--; for(i=0; i<256; ++i) sum += abs(freqs[i]); return sum; }
Of course it makes the same dangerous assumptions like 8-bit character set. Unicode from UCS2 should be possible using the same principle though, a couple orders of magnitude slower but still much faster than pure Perl.
  • Comment on Re: The sum of absolute differences in the counts of chars in two strings.
  • Download Code

Replies are listed 'Best First'.
Re^2: The sum of absolute differences in the counts of chars in two strings.
by JavaFan (Canon) on Nov 19, 2011 at 19:55 UTC
    Do note that in C, strlen isn't a constant time operation - it walks the string looking for a NUL byte. So, for speed issues, one either caches the result, or does away with the index altogether:
    while (*s) {freq[*s++]++;} while (*t) {freq[*t++]--;}
    Your freqs[256]={0}; is unfamiliar syntax to me. Some gcc modernism to initialize the array?
      int freqs[256]={0};

      I think that syntax is part of C99, inherited back as a "good idea" from C++. If the initializer list in curly brackets is shorter than the number of elements in the entity being initialized, all elements not enumerated in the list will initialize to zero. Ex: int freqs[256]={1}; would initialize as freqs[0] = 1, freqs[1] = 0, freqs[2] = 0, etc.

      gcc offers some additional features beyond that, but the syntax used here should be portable to any C99 implementation.


      Dave

      I'm not sure when this array initialization was standardized but since I learned C almost 20 years ago I haven't been treated to any compiler that didn't understand it.

      You've got a point about the strlen. Actually I had it in a separate function first that got a const char* string argument, in that case any optimizer worth its name should notice it's an invariant and do the caching transparently. I haven't looked at the compiled code so it may be that a non-const string would disable this optimization, although the code is simple enough that I'd think the optimizer would notice the string is never modified inside the loop and do it anyway.