This is a really hard problem if you want high accuracy (I've done this for money before). Start by separating the individual citations from one another without trying to parse them. Then build up a library of regexes iteratively: begin with a few obvious, conservative patterns, examine the errors and misses, and add more patterns to cover them. If you want to get really fancy, you can disambiguate fields, e.g. by recognizing known authors' names so they land in the author list rather than the title. Eventually you'll just have to throw up your hands and say "good enough"...
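To make the loop concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `[n]` numbering used to split citations, the two citation styles the patterns target, and the names `SPLIT_RE`, `PATTERNS`, `split_citations`, `parse_citation` and `parse_all`. Your input will almost certainly need different rules.

```python
import re

# Assumption: each citation starts with a bracketed number such as "[12]".
# Real bibliographies may need a different split rule (blank lines, bullets, ...).
SPLIT_RE = re.compile(r"\s*\[\d+\]\s*")

# A small library of conservative patterns, tried in order.
# Each one is an illustrative guess at a single citation style.
PATTERNS = [
    # Style A: Authors (Year). Title. Venue.
    re.compile(
        r"^(?P<authors>.+?)\s+\((?P<year>\d{4})\)\.\s+"
        r"(?P<title>[^.]+)\.\s+(?P<venue>.+?)\.?$"
    ),
    # Style B: Authors, "Title," Venue, Year.
    re.compile(
        r'^(?P<authors>[^"]+?),\s+"(?P<title>[^"]+?),?"\s+'
        r"(?P<venue>.+?),\s+(?P<year>\d{4})\.?$"
    ),
]


def split_citations(text):
    """Step 1: separate the citations without parsing them."""
    return [c.strip() for c in SPLIT_RE.split(text) if c.strip()]


def parse_citation(citation):
    """Step 2: return the fields from the first matching pattern, or None."""
    for pattern in PATTERNS:
        match = pattern.match(citation)
        if match:
            return match.groupdict()
    return None


def parse_all(text):
    """Parse what we can and keep the misses for manual review."""
    parsed, misses = [], []
    for citation in split_citations(text):
        fields = parse_citation(citation)
        if fields is None:
            misses.append(citation)
        else:
            parsed.append(fields)
    return parsed, misses


# Made-up sample input, not real references.
SAMPLE = (
    "[1] Smith, J. and Doe, A. (2001). Parsing references with regexes. "
    "Journal of Examples. "
    '[2] A. Doe and B. Roe, "A quoted title," Proc. of Something, 1999. '
    "[3] Some citation in a style nobody planned for"
)

parsed, misses = parse_all(SAMPLE)
for fields in parsed:
    print(fields)
print("Unparsed:", misses)
```

The `misses` list is the part that drives the iteration: read through it, write a new conservative pattern for the most common unhandled style, append it to `PATTERNS`, and rerun. Spot-check `parsed` as well, since a pattern that matches the wrong thing is harder to notice than one that matches nothing.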