postindustrial_hamst has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I've got a .tex file that contains bibliography entries and I would like to sort them alphabetically, but they look something like this: '\bibitem{all99}Allison P.D. (...)'
I'd like them to be sorted by the last name (in this case Allison) and I can't figure out how to do that using sort and regexes. A plain sort just orders them by the key inside the {} braces. Can you help me?

Re: Sorting file with regex
by davido (Cardinal) on Jun 11, 2014 at 07:09 UTC

    There's not enough information provided. Is the last name always the first thing that follows the closing brace? How would you determine the end of the last name and the start of something that isn't relevant to the sort (such as "P.D.")? You can't assume that last names don't contain spaces or other non-alpha characters; consider "Van den Berghe", a single last name consisting of three words.

    It is easy to sort by a portion of a string, but hard to guess at what might be the boundaries of the portion of the string you wish to use in the sort criteria.


    Dave

Re: Sorting file with regex
by CountZero (Bishop) on Jun 11, 2014 at 10:17 UTC
    It looks like there is something really wrong with your bibliography and \bibitems. For starters, it is advisable to keep your bibliography in a separate file with the .bib extension and not in a .tex file. More importantly, however, the names should be written as "first last" or "first middle last", and then (La)TeX will take care of proper sorting in your bibliography. There should be no need to sort it yourself.

    You will find more info at LaTeX Bibliography Management

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics
Re: Sorting file with regex
by wjw (Priest) on Jun 11, 2014 at 07:15 UTC

    Maybe just pull out the substring between '}' and '(', and store it in a hash that uses the whole string as the key and the substring as the value. It's kind of brute force, but if regex is not your strong suit (as it is not really mine), then this method should work for you. Once that is done, sorting is a matter of making an array of the values (the authors), sorting that array, and printing out the hash keys whose values match the entries of the sorted array.
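
The hash idea above could be sketched roughly as follows. This is only a sketch under assumptions: the sample entries are made up, every entry is assumed to contain both a '}' and a '(', and instead of building a separate sorted array of values it sorts the hash keys by their values directly, which gives the same order with less bookkeeping.

```perl
use strict;
use warnings;

# Hypothetical sample entries; a real script would read them from the .tex file.
my @entries = (
    '\bibitem{smi01}Smith J. (2001) Some title',
    '\bibitem{all99}Allison P.D. (1999) Another title',
    '\bibitem{van05}Van den Berghe L. (2005) A third title',
);

# Map each whole entry (hash key) to the text between '}' and '(' (hash value).
my %author_of;
for my $entry (@entries) {
    my $start = index($entry, '}');
    next if $start < 0;                       # no closing brace: skip
    my $end = index($entry, '(', $start);
    next if $end < 0;                         # no opening parenthesis: skip
    $author_of{$entry} = substr($entry, $start + 1, $end - $start - 1);
}

# Sort the entries by their author substring, ignoring case.
my @sorted = sort { lc($author_of{$a}) cmp lc($author_of{$b}) } keys %author_of;
print "$_\n" for @sorted;
```

Because the whole entry is the hash key, two identical lines collapse into one, and entries with the same author text come out in no particular order relative to each other.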

    Hope that is helpful...

    ...the majority is always wrong, and always the last to know about it...

    Insanity: Doing the same thing over and over again and expecting different results...

    A solution is nothing more than a clearly stated problem...otherwise, the problem is not a problem, it is a fact

Re: Sorting file with regex
by vinoth.ree (Monsignor) on Jun 11, 2014 at 09:24 UTC

    Hi,

    Assuming that the last name comes right after the }, I have written the code below, which sorts based on the last name.

    use strict;
    use warnings;
    use Data::Dumper;

    my @lines;
    while (<DATA>) {
        chomp;
        # Capture the first word after the closing brace as the sort key;
        # skip lines that do not match so a stale $1 is never reused.
        next unless /\}(\w+)/;
        push @lines, [ $1, $_ ];
    }

    # Sort the [key, line] pairs by the captured key.
    @lines = sort { $a->[0] cmp $b->[0] } @lines;
    print Dumper \@lines;

    __DATA__
    '\bibitem{all99}Allison P.D.'
    '\bibitem{al200}Vinoth B.D.'
    '\bibitem{al200}Ana A.D.'

    All is well

      Further to davido's point above, how does this handle names like Van den Berghe or O'Boyle?
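
To make the question concrete: \w+ stops at the first space or apostrophe, so it would yield "Van" and "O" for those two names. One possible workaround, sketched below, is to take everything after the closing brace up to the run of initials. The helper name surname_key, the sample entries, and the "Surname Initials" layout (capital letters each followed by a dot) are all assumptions for illustration, not something the original data is known to guarantee.

```perl
use strict;
use warnings;

# Hypothetical sample entries in the "Surname Initials" layout from the question.
my @entries = (
    q[\bibitem{obo10}O'Boyle M.J. (2010) Title one],
    q[\bibitem{ber05}Van den Berghe L. (2005) Title two],
    q[\bibitem{all99}Allison P.D. (1999) Title three],
);

# Return the surname: the text after the first '}' up to a run of initials.
# Returns an empty string when the assumed layout is not found.
sub surname_key {
    my ($entry) = @_;
    return $1 if $entry =~ /\}\s*(.+?)\s+(?:[A-Z]\.)+/;
    return '';
}

# Compare what \w+ captures with what the initials-based pattern captures.
for my $entry (@entries) {
    my ($first_word) = $entry =~ /\}(\w+)/;
    printf "%-8s | %s\n", $first_word // '', surname_key($entry);
}

# Sort the entries by surname, ignoring case.
my @sorted = sort { lc(surname_key($a)) cmp lc(surname_key($b)) } @entries;
print "$_\n" for @sorted;
```

This still breaks on entries that spell out first names or put the initials before the surname, so it is only as good as the layout assumption.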

Re: Sorting file with regex
by Anonymous Monk on Jun 11, 2014 at 08:28 UTC

    As the other monks have said, your post doesn't include enough information. The input in particular is unclear: does the file consist entirely of lines that look like your single example? Can you provide more examples of input?

    Anyway, here's one way to do it in Perl that makes use of Tie::File and the Schwartzian transform. It may be a little naive due to lack of sample input.

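    use Tie::File;

    # $filename is assumed to hold the path to the .tex file.
    tie my @file, 'Tie::File', $filename or die "tie failed";

    # Schwartzian transform: pair each line with its sort key, sort by the
    # key, then unwrap. Lines that do not match get an empty key.
    @file = map  { $$_[0] }
            sort { $$a[1] cmp $$b[1] }
            map  { [ $_, /^\\bibitem\{.+?\}\s*(.+)$/ ? $1 : '' ] }
            @file;

    # Assigning to the tied array rewrites the file in sorted order.
    untie @file;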

    On the other hand, maybe you should look into a more "complete" module such as Text::BibTeX?

Re: Sorting file with regex
by perlfan (Parson) on Jun 11, 2014 at 12:59 UTC
    There are tools for managing BibTeX out there. There are also Perl modules for parsing and interacting with BibTeX. TIMTOWTDI, but don't reinvent the wheel.
Re: Sorting file with regex
by Madams (Pilgrim) on Jun 15, 2014 at 05:12 UTC
    Please, to preserve your sanity while using (La)TeX...

    Listen first to CountZero, then listen to perlfan and save yourself much misery.

Re: Sorting file with regex
by locked_user sundialsvc4 (Abbot) on Jun 11, 2014 at 12:39 UTC

    I suggest that you break this process down into two parts: data extraction (parsing), then sorting or otherwise using the data. For instance, you could create an SQLite2 database (file ...), then write a script which uses one of the many TeX-processing packages on http://search.cpan.org to extract and parse those bibliography entries ... placing both the entire entry and the various bits that have been extracted from it into one or more relational database tables. Now, once you have written and perfected that script, you have a tool that you can actually use to do anything you want with what is in that file. You can easily search and sort by any field, and so on.

    This “separation of concerns” should greatly simplify your efforts in the long run.
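
A small sketch of that two-step split, with a plain in-memory list of hashes standing in for the database tables so that it needs no extra modules. The function names parse_entry and sort_by, the field names, and the "\bibitem{key}Author (rest" layout are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Step 1: parsing. Turn one raw line into a record (hash ref).
# Returns nothing when the line is not a \bibitem entry.
sub parse_entry {
    my ($line) = @_;
    return unless $line =~ /^\s*\\bibitem\{([^}]*)\}\s*([^(]*?)\s*(\(.*)?$/;
    return +{ key => $1, author => $2, rest => $3 // '', raw => $line };
}

# Step 2: using the data. Sort records by any parsed field, ignoring case.
sub sort_by {
    my ($field, @records) = @_;
    return sort { lc($a->{$field}) cmp lc($b->{$field}) } @records;
}

# Hypothetical sample input; a real script would read the .tex file instead.
my @lines = (
    q[\bibitem{smi01}Smith J. (2001) Title B],
    q[\bibitem{all99}Allison P.D. (1999) Title A],
    q[% a comment line that is not an entry],
    q[\bibitem{ber05}Van den Berghe L. (2005) Title C],
);

# Non-matching lines return an empty list, so map silently drops them.
my @records   = map { parse_entry($_) } @lines;
my @by_author = sort_by('author', @records);
my @by_key    = sort_by('key',    @records);

print "$_->{author}\n" for @by_author;
print "$_->{key}\n"    for @by_key;
```

Once parsing lives in its own function, swapping the in-memory list for a database table, or the regex for a proper parsing module, does not touch the sorting code.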

    However – and let me just go ahead and say this – if this is a one-time effort involving a reasonably sized bibliography, consider using a nice graduate-assistant just to do the job by hand and, ahem, “get ’er done.™” This is going to be a somewhat-intricate program that ultimately will be doing what any person can do (and, do better) when they read. Weigh the benefits of various approaches, of which computer software is only one.

    Many substantial documents which are formatted in TeX are written in XML-based “semantic” document languages such as DocBook. Evaluate whether the source document exists in any other, more parseable form. Even though such source-documents usually are not published, publishers often have access to them and can provide them to researchers. In such files, elements such as bibliography-entries (and citations) are fully expressed as an XML data structure from which TeX (or anything else) is generated.