Ah, the "One-at-a-Time" algorithm. That's quite simple compared to the first one on the cited page, his "new hash". The new hash consumes 12 bytes at a time and keeps a 12-byte internal state, yet despite the extra complexity it is actually faster (on the order of 6n instructions instead of 9n for an n-byte key) because it handles more in a gulp.
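
For reference, here is a minimal sketch of One-at-a-Time as it is commonly written in C. The function name, parameter names, and the use of `uint32_t` are my own choices rather than anything taken from the cited page, so treat it as an illustration, not the canonical source. It shows why the per-byte cost is what it is: every input byte goes through its own add, shift-add, and shift-xor round, and only the three final mixing steps are paid once per key.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch of the One-at-a-Time hash; names and types are illustrative. */
uint32_t one_at_a_time(const unsigned char *key, size_t len)
{
    uint32_t hash = 0;

    /* Per-byte round: mix in one byte, then stir the whole state. */
    for (size_t i = 0; i < len; i++) {
        hash += key[i];
        hash += hash << 10;
        hash ^= hash >> 6;
    }

    /* Final avalanche, done once regardless of key length. */
    hash += hash << 3;
    hash ^= hash >> 11;
    hash += hash << 15;

    return hash;
}
```

Calling `one_at_a_time((const unsigned char *)"a", 1)` should give `0xca2e9442`. The 12-byte-at-a-time hash does more work per round, but spreads that work over twelve input bytes instead of one, which is where the lower per-byte figure comes from.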