Re: Sorting file with regex

I suggest that you break this process down into two parts: data extraction (parsing), and then sorting or otherwise using the data. For instance, you could create an SQLite database (file ...), then write a script that uses one of the many TeX-processing packages on http://search.cpan.org to extract and parse those bibliography entries. The script would place both the entire entry and the various bits extracted from it into one or more relational database tables. Once you have written and perfected that script, you have a tool that you can use to do anything you want with what is in that file: you can easily search and sort by any field, and so on.
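The post recommends Perl modules from CPAN. Purely to illustrate the extract-then-use split, here is a minimal Python sketch that uses only the standard library (`re`, `sqlite3`). It assumes BibTeX-style entries with brace-delimited values (`@type{key, field = {value}, ...}`), which may not match the actual file. The sample data, the table layout, and the helper names `parse_entries`, `load`, and `sorted_by` are all hypothetical. A hand-rolled scanner like this is no substitute for a real parser: it ignores quoted values, `@string` macros, and comments.

```python
import re
import sqlite3

# Hypothetical sample: assumes BibTeX-style entries whose field values are
# brace-delimited. Quoted values, @string macros and comments are NOT handled.
SAMPLE = """
@book{lamport1994,
  author = {Leslie Lamport},
  title  = {{LaTeX}: A Document Preparation System},
  year   = {1994}
}
@article{knuth1984,
  author = {Donald E. Knuth},
  title  = {Literate Programming},
  year   = {1984}
}
"""

ENTRY_START = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")  # "@type{key,"
FIELD_START = re.compile(r"\s*,?\s*(\w+)\s*=\s*\{")       # ", name = {"
ENTRY_END = re.compile(r"\s*,?\s*\}")                     # optional ",", then "}"


def read_braced(text, pos):
    """Read a {...} value whose opening brace ends just before pos.

    Counts nested braces so values like {{LaTeX}: ...} stay intact.
    Returns (value, position just after the closing brace).
    """
    depth, start = 1, pos
    while depth:
        if text[pos] == "{":
            depth += 1
        elif text[pos] == "}":
            depth -= 1
        pos += 1
    return text[start:pos - 1], pos


def parse_entries(text):
    """Extraction step: yield (type, key, raw entry text, {field: value})."""
    for m in ENTRY_START.finditer(text):
        fields, pos = {}, m.end()
        while (f := FIELD_START.match(text, pos)):
            value, pos = read_braced(text, f.end())
            fields[f.group(1).lower()] = value
        close = ENTRY_END.match(text, pos)
        end = close.end() if close else pos
        yield m.group(1).lower(), m.group(2), text[m.start():end], fields


def load(conn, text):
    """Store each whole entry plus its extracted fields in two tables."""
    conn.execute("CREATE TABLE IF NOT EXISTS entry (key TEXT PRIMARY KEY, type TEXT, raw TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS field (key TEXT REFERENCES entry(key), name TEXT, value TEXT)")
    for etype, key, raw, fields in parse_entries(text):
        conn.execute("INSERT INTO entry VALUES (?, ?, ?)", (key, etype, raw))
        conn.executemany("INSERT INTO field VALUES (?, ?, ?)",
                         [(key, name, value) for name, value in fields.items()])
    conn.commit()


def sorted_by(conn, field_name):
    """Using step: return entry keys ordered by one extracted field."""
    rows = conn.execute(
        """SELECT e.key FROM entry e
           LEFT JOIN field f ON f.key = e.key AND f.name = ?
           ORDER BY f.value, e.key""",
        (field_name,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # pass a filename instead to keep the database
    load(conn, SAMPLE)
    print(sorted_by(conn, "year"))
    print(sorted_by(conn, "author"))
```

Because the raw entry text is stored alongside the extracted fields, a query can choose the order and the script can then re-emit the original entries verbatim in that order. The parsing code never needs to change when you want a different sort.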

This “separation of concerns” should greatly simplify your efforts in the long run.

However – and let me just go ahead and say this – if this is a one-time effort involving a reasonably sized bibliography, consider simply having a nice graduate assistant do the job by hand and, ahem, “get ’er done.™” This is going to be a somewhat intricate program that ultimately will be doing what any person can do (and do better) just by reading the entries. Weigh the benefits of the various approaches, of which computer software is only one.

Many substantial documents that are typeset with TeX are actually authored in XML-based “semantic” document languages such as DocBook. Find out whether the source document exists in some other, more parseable form. Even though such source documents usually are not published, publishers often have access to them and can provide them to researchers. In such files, elements such as bibliography entries (and citations) are fully expressed as XML data structures from which TeX (or anything else) is generated.
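If an XML source does turn up, the extraction step mostly disappears. As a hedged sketch, assume DocBook 4-style `biblioentry` markup without a namespace. Under that assumption, Python's standard `xml.etree.ElementTree` can read the fields directly, with no regular expressions. The sample entries and the `entries` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical DocBook 4-style fragment (no XML namespace).
SAMPLE = """<bibliography>
  <biblioentry id="lamport1994">
    <author><firstname>Leslie</firstname><surname>Lamport</surname></author>
    <title>LaTeX: A Document Preparation System</title>
    <pubdate>1994</pubdate>
  </biblioentry>
  <biblioentry id="knuth1984">
    <author><firstname>Donald E.</firstname><surname>Knuth</surname></author>
    <title>Literate Programming</title>
    <pubdate>1984</pubdate>
  </biblioentry>
</bibliography>"""


def entries(xml_text):
    """Return one dict per biblioentry; the structure is already explicit."""
    root = ET.fromstring(xml_text)
    result = []
    for entry in root.iter("biblioentry"):
        result.append({
            "id": entry.get("id"),
            "surname": entry.findtext("author/surname", default=""),
            "title": entry.findtext("title", default=""),
            "year": entry.findtext("pubdate", default=""),
        })
    return result


if __name__ == "__main__":
    # Sort by first author's surname, then by year.
    for e in sorted(entries(SAMPLE), key=lambda e: (e["surname"], e["year"])):
        print(e["id"], e["surname"], e["year"], e["title"])
```

DocBook 5 documents place these elements in the `http://docbook.org/ns/docbook` namespace and use `xml:id`. For such input, the tag names and the `id` lookup would need adjusting accordingly.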