Re: Sorting file with regex
by davido (Cardinal) on Jun 11, 2014 at 07:09 UTC
There's not enough information provided. Is the last name always the first thing that follows the closing brace? How would you determine where the last name ends and where something irrelevant to the sort (such as "P.D.") begins? You can't assume that last names don't contain spaces or other non-alpha characters; consider "Van den Berghe", a single last name consisting of three words.
It is easy to sort by a portion of a string, but hard to guess at what might be the boundaries of the portion of the string you wish to use in the sort criteria.
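To make that concrete, here is a minimal sketch of sorting by a portion of each line. The boundary rule is purely an assumption for illustration (the key runs from the closing brace up to a trailing run of initials such as "P.D."), and the sample entries are made up; real data may well need a different rule.

```perl
use strict;
use warnings;

# Hypothetical sample lines; the real input format was never specified.
my @entries = (
    '\bibitem{van01}Van den Berghe L.',
    '\bibitem{all99}Allison P.D.',
    '\bibitem{smi05}Smith J.',
);

# ASSUMPTION: the sort key starts right after the closing brace and ends
# before a trailing token that looks like initials (e.g. "P.D." or "L.").
sub sort_key {
    my ($line) = @_;
    my ($key) = $line =~ /\}\s*(.*?)\s+(?:[A-Z]\.)+\s*$/;
    return defined $key ? lc $key : '';   # unmatched lines sort first
}

my @sorted = sort { sort_key($a) cmp sort_key($b) } @entries;
print "$_\n" for @sorted;
```

The sorting itself is one line; everything that could go wrong lives in the regex inside sort_key, which is exactly the part that cannot be written reliably without knowing the real input.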
Re: Sorting file with regex
by CountZero (Bishop) on Jun 11, 2014 at 10:17 UTC
Re: Sorting file with regex
by wjw (Priest) on Jun 11, 2014 at 07:15 UTC
Maybe just pull out the substring between '}' and '(', and stick it in a hash that uses the whole string as the key and the substring as the value. Kind of brute force, but if regex is not your strong suit (as it is not really mine), then this method should work for you. Once that is done, sorting is a matter of making an array of the values (the authors), sorting that array, and printing out the hash keys whose values match the entries in the sorted array.
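A rough sketch of that idea follows. The sample lines are invented, and it assumes every entry really does have the author between '}' and '('. As a small variation, it sorts the hash keys by their stored value directly instead of building a separate array of values, which avoids having to look values back up.

```perl
use strict;
use warnings;

# Hypothetical sample lines, assuming the author sits between '}' and '('.
my @lines = (
    '\bibitem{smi05}Smith J. (2005) Some title',
    '\bibitem{all99}Allison P.D. (1999) Another title',
    '\bibitem{ber10}Van den Berghe L. (2010) Third title',
);

# Whole line is the hash key, extracted author substring is the value.
my %author_of;
for my $line (@lines) {
    my $start = index($line, '}');
    my $end   = index($line, '(', $start + 1);
    next if $start < 0 || $end < 0;        # skip lines without both markers
    my $author = substr($line, $start + 1, $end - $start - 1);
    $author =~ s/^\s+|\s+$//g;             # trim surrounding whitespace
    $author_of{$line} = $author;
}

# Sort the keys (whole lines) by their stored author value.
my @sorted = sort { $author_of{$a} cmp $author_of{$b} } keys %author_of;
print "$_\n" for @sorted;
```

Note that using the whole line as the hash key silently collapses exact duplicate lines into one entry.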
Hope that is helpful...
...the majority is always wrong, and always the last to know about it...
Insanity: Doing the same thing over and over again and expecting different results...
A solution is nothing more than a clearly stated problem...otherwise, the problem is not a problem, it is a fact
Re: Sorting file with regex
by vinoth.ree (Monsignor) on Jun 11, 2014 at 09:24 UTC
Hi,
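Assuming that the last name immediately follows the }, the code below sorts the lines by last name. Note that \w+ captures only the first word after the brace.
use strict;
use warnings;
use Data::Dumper;

my @lines;
while (<DATA>) {
    chomp;
    # Skip lines with no word after '}' so a stale $1 is never reused.
    next unless /\}(\w+)/;
    push @lines, [ $1, $_ ];    # [ sort key, original line ]
}
# Sort by the captured name; each element stays a [key, line] pair.
@lines = sort { $a->[0] cmp $b->[0] } @lines;
print Dumper \@lines;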
__DATA__
'\bibitem{all99}Allison P.D.'
'\bibitem{al200}Vinoth B.D.'
'\bibitem{al200}Ana A.D.'
Re: Sorting file with regex
by Anonymous Monk on Jun 11, 2014 at 08:28 UTC
As the other monks have said, your post doesn't include enough information. In particular, the input is unclear: does the file consist entirely of lines that look like your single example? Can you provide more examples of input?
Anyway, here's one way to do it in Perl that makes use of Tie::File and the Schwartzian transform. It may be a little naive due to lack of sample input.
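use strict;
use warnings;
use Tie::File;
my $filename = shift or die "usage: $0 FILE\n";   # path from the command line (assumption)
tie my @file, 'Tie::File', $filename or die "tie failed: $!";
# Schwartzian transform: pair each line with its sort key, sort, unwrap.
@file = map  { $_->[0] }
        sort { $a->[1] cmp $b->[1] }
        map  { [ $_, /^\\bibitem\{.+?\}\s*(.+)$/ ? $1 : '' ] } @file;
untie @file;   # the file itself has been rewritten in sorted order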
On the other hand, maybe you should look into a more "complete" module such as Text::BibTeX?
Re: Sorting file with regex
by perlfan (Parson) on Jun 11, 2014 at 12:59 UTC
There are tools for managing BibTeX out there. There are also Perl modules for parsing and interacting with BibTeX. TIMTOWTDI, but don't reinvent the wheel.
Re: Sorting file with regex
by Madams (Pilgrim) on Jun 15, 2014 at 05:12 UTC
Please, to preserve your sanity while using (La)TeX...
Listen first to CountZero, then listen to perlfan, and save yourself much misery.
Re: Sorting file with regex
by locked_user sundialsvc4 (Abbot) on Jun 11, 2014 at 12:39 UTC
I suggest that you break this process down into two parts: data extraction (parsing), then sorting or otherwise using the data. For instance, you could create an SQLite database (file ...), then write a script which uses one of the many TeX-processing packages on http://search.cpan.org to extract and parse those bibliography entries ... placing both the entire entry and the various bits that have been extracted from it into one or more relational database tables. Once you have written and perfected that script, you have a tool that you can actually use to do anything you want with what is in that file. You can easily search and sort by any field, and so on.
This “separation of concerns” should greatly simplify your efforts in the long run.
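As a rough illustration of that split, the sketch below keeps parsing and sorting in separate steps. It uses plain in-memory hashes rather than a database, and the line layout ('\bibitem{key}Author (year) rest'), the parse_entry helper, and the sample data are all assumptions made up for the example; the same records could instead be inserted into a database table.

```perl
use strict;
use warnings;

# Step 1: parsing. Turn one raw line into a record (hash reference).
# ASSUMPTION: lines look like '\bibitem{key}Author text (year) rest'.
sub parse_entry {
    my ($raw) = @_;
    if ($raw =~ /^\\bibitem\{([^}]+)\}\s*(.*?)\s*\((\d{4})\)/) {
        return { key => $1, author => $2, year => $3, raw => $raw };
    }
    return;    # empty list for lines that do not match
}

# Hypothetical input, including one line that is not an entry.
my @raw = (
    '\bibitem{smi05}Smith J. (2005) Some title',
    'a stray line that is not an entry',
    '\bibitem{all99}Allison P.D. (1999) Another title',
);

my @records = map { parse_entry($_) } @raw;   # unparseable lines drop out

# Step 2: use the data. Sorting by any extracted field is now trivial.
my @by_author = sort { $a->{author} cmp $b->{author} } @records;
my @by_year   = sort { $a->{year}   <=> $b->{year}   } @records;

print "$_->{raw}\n" for @by_author;
```

Once the parsing step is trusted, changing the sort order or adding a search touches only the second step.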
However – and let me just go ahead and say this – if this is a one-time effort involving a reasonably sized bibliography, consider using a nice graduate-assistant just to do the job by hand and, ahem, “get ’er done.™” This is going to be a somewhat-intricate program that ultimately will be doing what any person can do (and, do better) when they read. Weigh the benefits of various approaches, of which computer software is only one.
Many substantial documents that are typeset in TeX are actually written in XML-based “semantic” document languages such as DocBook. Evaluate whether the source document exists in any other, more parseable form. Even though such source documents usually are not published, publishers often have access to them and can provide them to researchers. In such files, elements such as bibliography entries (and citations) are fully expressed as an XML data structure from which TeX (or anything else) is generated.