in reply to Partitioning a set of strings by regular expressions

The module Biblio::Citation::Parser::Standard uses template matching to parse bibliography entries. It claims to have collected 400 templates. I guess you can inspect them and add your own.
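To make "template matching" concrete, here is a minimal sketch of the idea (in Python for brevity; it ports directly to Perl regexes). The two templates and the `parse_citation` name are made up for illustration and are not taken from the module, which ships its own, much larger set.

```python
import re

# Hypothetical templates, tried in order. Each one describes a citation layout
# with named groups for the fields we want to pull out.
TEMPLATES = [
    # Authors (Year). Title. Journal, Volume, Pages.
    re.compile(
        r"^(?P<authors>.+?) \((?P<year>\d{4})\)\. (?P<title>.+?)\. "
        r"(?P<journal>.+?), (?P<volume>\d+), (?P<pages>\d+-\d+)\.?$"
    ),
    # Authors. Title. Journal Volume (Year): Pages.
    re.compile(
        r"^(?P<authors>.+?)\. (?P<title>.+?)\. (?P<journal>.+?) "
        r"(?P<volume>\d+) \((?P<year>\d{4})\): (?P<pages>\d+-\d+)\.?$"
    ),
]


def parse_citation(text):
    """Return the fields of the first template that matches, or None."""
    for template in TEMPLATES:
        match = template.match(text.strip())
        if match:
            return match.groupdict()
    return None
```

Anything that falls through all templates comes back as `None`, which gives you a list of leftovers to write new templates for.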

Another approach: assuming you can easily extract the title, an author and/or a DOI, ISBN, etc. (getting the title is not as difficult as getting the volume), search for these terms at a scientific publisher such as ScienceDirect or Springer and try to get the result back in a standard format like BibTeX. Or perhaps there is a huge database of scientific publications that you can fuzzy-match against your data. I am not talking about doing the search manually, but about creating an automatic tool, e.g. a web scraper.
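The fuzzy-matching step could look roughly like this. It is only a sketch: the record layout (dicts with a `title` key), the `best_match` name and the 0.85 threshold are my assumptions, and the similarity measure is Python's standard `difflib` ratio rather than anything a particular database provides.

```python
from difflib import SequenceMatcher


def normalize(title):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in title)
    return " ".join(cleaned.split())


def best_match(title, candidates, threshold=0.85):
    """Return (record, score) for the closest title, or (None, score) if too far off.

    `candidates` is assumed to be a list of dicts with a "title" key,
    e.g. records returned by a publisher search.
    """
    best, best_score = None, 0.0
    wanted = normalize(title)
    for record in candidates:
        score = SequenceMatcher(None, wanted, normalize(record["title"])).ratio()
        if score > best_score:
            best, best_score = record, score
    if best_score >= threshold:
        return best, best_score
    return None, best_score
```

The threshold needs tuning on real data: too low and unrelated papers get merged, too high and small typos in the extracted title cause misses.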

There are also tools that claim to do this sort of thing: AnyStyle.io and Crossref.org. AnyStyle is open source, and Crossref offers a public REST API.
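For Crossref, the REST API has a `/works` endpoint that takes free-form citation text in the `query.bibliographic` parameter. Here is a small sketch that only builds the request URL; the `crossref_query_url` helper name and the default of 3 rows are my choices, and the actual HTTP request is left out.

```python
from urllib.parse import urlencode

CROSSREF_WORKS = "https://api.crossref.org/works"


def crossref_query_url(citation, rows=3):
    """Build a Crossref /works search URL for a raw citation string."""
    params = {"query.bibliographic": citation, "rows": rows}
    return CROSSREF_WORKS + "?" + urlencode(params)
```

Fetching that URL returns JSON, with the candidate records under `message.items`. It is worth checking the returned titles against your own before trusting the top hit.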

A combination of all of the above can be successful.
