Yes, I tried applying the seen/sort/join deduplication before the matching step, which avoids the overhead of handling both 2-1 and 1-2, but with the data given there was no benefit (in fact, it was slower).
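For anyone unfamiliar with the idiom, here is a minimal sketch of the deduplication being discussed, written in Python as an assumption (the original code and its matching step are not shown here). The function name `unique_unordered_pairs` is hypothetical. The idea is to build a canonical key by sorting the two members, so that 1-2 and 2-1 map to the same key, and to skip any pair whose key has already been seen. A tuple stands in for the joined string, which avoids separator collisions.

```python
from typing import Hashable, Iterable, Iterator, Tuple

Pair = Tuple[Hashable, Hashable]


def unique_unordered_pairs(pairs: Iterable[Pair]) -> Iterator[Pair]:
    """Yield each unordered pair once, keeping the first orientation seen.

    Hypothetical helper; assumes the two members of a pair are comparable
    (e.g. both ints or both strings) so they can be sorted.
    """
    seen = set()
    for a, b in pairs:
        # Canonical key: sorting makes (1, 2) and (2, 1) identical,
        # the Python analogue of a "seen{join sort ...}" check.
        key = tuple(sorted((a, b)))
        if key in seen:
            continue  # mirror image already handled, skip it
        seen.add(key)
        yield a, b


print(list(unique_unordered_pairs([(1, 2), (2, 1), (1, 3), (3, 1), (2, 3)])))
```

Filtering like this halves the number of pairs passed on, but each pair now pays for a sort and a set lookup, so whether it wins depends on how expensive the matching step is relative to that bookkeeping.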
I did benchmark your code against mine, and on the DATA, mine was about 40% faster. On a much larger data set, the difference might well be smaller.