Yes, I think letting punct be single-character tokens is the way to go here. That regex is probably intuitive to some, but it looks like a maintenance nightmare.
Thanks for the info on the link rule. I had completely forgotten to stop and think about everything it would match.