Interesting Regex Question

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Re: Interesting Regex Question by busunsl (Vicar) on Jul 02, 2001 at 14:46 UTC
CPAN has a module for dealing with BibTeX: bibtex
Re: Interesting Regex Question by jeroenes (Priest) on Jul 02, 2001 at 14:48 UTC
There is a package by Dana Jacobsen that is very useful for this kind of thing. I use it very often, though it is no longer maintained. You can find it here. If you want to write it yourself, you have to write a real parser; try Parse::RecDescent. If it's always in the newline format as you posted it, try to split on /\s*,\s*\n/.

Cheers,

Jeroen

"We are not alone"(FZ)
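A minimal sketch of the split suggested above, assuming the entry always has one field per line with the separating comma at the end of the line. The sample entry (including the `year` field) and the variable names are made up for illustration, and the pattern allows optional whitespace around the comma, which is an assumption about the input.

```perl
use strict;
use warnings;

# Hypothetical sample entry in the one-field-per-line layout.
my $entry = <<'BIB';
@article{
author = {Author, J.P } ,
title = "A paper, about things" ,
year = 2001
}
BIB

# Strip the "@type{" opener and the final "}" so only the fields remain.
(my $body = $entry) =~ s/\A\s*\@\w+\s*\{\s*//;
$body =~ s/\s*\}\s*\z//;

# Split only on a comma that ends a line; commas inside a value
# are not followed by a newline, so they are left alone.
my @fields = split /\s*,\s*\n/, $body;

print "$_\n" for @fields;
```

This only works while every field sits on its own line. A value that itself wraps across lines after a comma would be split in the wrong place, which is where a real parser becomes necessary.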
Re: Interesting Regex Question by tachyon (Chancellor) on Jul 02, 2001 at 15:03 UTC
The only function of the commas between the entries is to define the elements of a list. (BTW, you need () not {} to define a list.) I assume you are presenting a list, as what you do present will not compile. Once we assign to the @article array, the commas between the elements are discarded, as they have done their job of separating the list elements. Thus we can do what you ask like this:

```perl
my @article = ( 'author = {Author, J.P }' ,
                'title = "A paper about things"' ,
                'etc...' );

# iterate over @article array removing all commas
# the commas in the list assignment are gone having
# been used to assign the list elements to the array
# @article so we have no problem with them now.
s/,//g for @article;

# this is the same as
foreach (@article) {
    $_ =~ s/,//g;
}

# in less idiomatic Perl where we do not take advantage of
# the aliasing of array elements to $_ we have to write
for my $i (0..$#article) {
    $article[$i] =~ s/,//g;
}

# to print all the elements of a list
# on newlines I would normally just write:
print "$_\n" for @article;

# this takes advantage of the aliasing to $_ in a for loop
# alternatively in long perl
foreach my $stuff (@article) {
    print "$stuff\n";
}

# to join all the edited elements with commas
my $joined = join ",", @article;
print $joined;
```

Update - Whoops! Having solved the wrong problem, here is a solution to the bibtex code parsing such as I understand it. Thanks to kschwab. Assuming a record as you have shown, this does the trick.

```perl
my $string = <<'STRING';
@article{
author = {Author, J.P } ,
title = "A paper, about things" ,
etc..
}
STRING

my $open = '';
# strip the leading "@article{" and keep it as the start of the result
my $commaless = $string =~ s/^(\s*\@\w+\s*\{)// ? $1 : '';
for (split //, $string) {
    # an opening delimiter: remember which closing delimiter to wait for
    if (/[{"]/ and not $open) {
        $open = /\{/ ? '}' : '"';
        $commaless .= $_;
        next;
    }
    # drop commas only while inside a delimited value
    $commaless .= $_ unless /,/ and $open;
    $open = '' if $open eq $_;
}
print $commaless;
```

First we eat up the opening @article{ so we get into the guts of the problem. Then the code splits the string into characters. Now, if we find an opening delimiter, we set $open to the appropriate closing delimiter, '}' or '"' depending on what it is, and declare it open season on commas until we find the closing delimiter. While a delimiter is open we add every character except commas to $commaless, and thus remove the commas as desired; outside the delimiters everything, including the commas that separate the fields, is copied unchanged. Trying to do this with a single regex would be difficult, and certainly harder to understand.

Hope this helps.

cheers

tachyon

s&&rsenoyhcatreve&&&s&n\w+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print
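As a point of comparison with the character-by-character loop above, here is a sketch that does the same job with one substitution and a small helper. It assumes values are either `{...}` with no nested braces or `"..."` with no embedded quotes; BibTeX allows nested braces, which this does not handle. The helper name `strip_commas` is hypothetical.

```perl
use strict;
use warnings;

my $string = <<'STRING';
@article{
author = {Author, J.P } ,
title = "A paper, about things" ,
etc..
}
STRING

# Hypothetical helper: return a copy of the text with all commas deleted.
sub strip_commas {
    my ($text) = @_;
    $text =~ tr/,//d;
    return $text;
}

# Find each delimited value (a brace group without nested braces, or a
# double-quoted string) and replace it with its comma-free version.
# The outer "@article{ ... }" never matches because it contains braces.
(my $commaless = $string) =~ s/(\{[^{}]*\}|"[^"]*")/strip_commas($1)/ge;

print $commaless;
```

The commas that separate the fields sit outside every matched value, so they survive. Whether this is easier to follow than the explicit loop is a matter of taste, and the loop is the easier one to extend if nesting has to be tracked.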
Re: Re: Interesting Regex Question by kschwab (Vicar) on Jul 02, 2001 at 16:51 UTC
Lots of good info, but I think you may have mistaken the original node's example BibTeX source for Perl code. The piece:

```bibtex
@article{
author = {Author, J.P } ,
title = "A paper about things" ,
etc..
}
```

is BibTeX source, and not an attempt by the node creator to make a Perl array.
Re: Re: Re: Interesting Regex Question by tachyon (Chancellor) on Jul 02, 2001 at 18:26 UTC
Thanks for the heads up. As you correctly assume, I've got no idea what bibtex is - strange source though. I've modified the posted code (the second bit, which processes the string) to handle this, assuming you get the entire entity as a single string. Hope that's correct.

tachyon

s&&rsenoyhcatreve&&&s&n\w+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print
Re:{4} Interesting Regex Question by jeroenes (Priest) on Jul 02, 2001 at 20:03 UTC
