Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello Perl Monks, say I have a BibTeX entry such as
@article{ 
   author = {Author, J.P } ,
   title  = "A paper about things" ,
   etc..
}
I want to replace the commas only within {}s and/or ""s (i.e. the commas between entries have to stay). This is so I can then split() the entry on commas and get the elements into some arrays. I'll spare you my vain attempts; suffice to say I'm just not monkey enough ...

Replies are listed 'Best First'.
Re: Interesting Regex Question
by busunsl (Vicar) on Jul 02, 2001 at 14:46 UTC
    CPAN has a module to deal with BibTeX: bibtex
Re: Interesting Regex Question
by jeroenes (Priest) on Jul 02, 2001 at 14:48 UTC
    There is a package by Dana Jacobsen that is very useful for this kind of thing. I use it very often. It's not under maintenance, though. You can find it here.

    If you want to write it yourself, you have to write a real parser. Try Parse::RecDescent. If it's always in the newline format as you posted it, try to split on /\s*,\s*\n/.
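
    A rough sketch of that split, assuming every field really does sit on its own line as in the post. The `year` field is made up to stand in for the elided "etc..", and the opening `@article{` and closing `}` are stripped first so that only the field list gets split:

```perl
use strict;
use warnings;

# Sample entry; "year = 2001" is an assumption replacing the original "etc..".
my $entry = <<'BIB';
@article{
   author = {Author, J.P } ,
   title  = "A paper about things" ,
   year   = 2001
}
BIB

# Drop the leading "@article{" and the trailing "}".
(my $body = $entry) =~ s/^\s*\@\w+\s*\{\s*//;
$body =~ s/\s*\}\s*$//;

# Split only on commas that end a line, so "Author, J.P" stays intact.
my @fields = split /\s*,\s*\n/, $body;

# Trim the leftover indentation from each field.
s/^\s+|\s+$//g for @fields;

print "$_\n" for @fields;
```

    This breaks as soon as a comma inside a value happens to fall at the end of a line, which is why a real parser is the safer route.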

    Cheers,

    Jeroen
    "We are not alone"(FZ)

Re: Interesting Regex Question
by tachyon (Chancellor) on Jul 02, 2001 at 15:03 UTC

    The only function of the commas between the entries is to define the elements of a list (you need () not {} to define a list, BTW). I assume you are presenting a list, as what you do present will not compile. Once we assign to the @article array, the commas between the elements are discarded, as they have done their job of separating the list elements.

    Thus we can do what you ask like this:

    my @article = (
        'author = {Author, J.P }',
        'title = "A paper about things"',
        'etc...',
    );

    # iterate over @article array removing all commas
    # the commas in the list assignment are gone having
    # been used to assign the list elements to the array
    # @article so we have no problem with them now.
    s/,//g for @article;

    # this is the same as
    foreach (@article) {
        $_ =~ s/,//g;
    }

    # in less idiomatic Perl where we do not take advantage of
    # the aliasing of array elements to $_ we have to write
    for my $i (0..$#article) {
        $article[$i] =~ s/,//g;
    }

    # to print all the elements of a list
    # on newlines I would normally just write:
    print "$_\n" for @article;

    # this takes advantage of the aliasing to $_ in a for loop
    # alternatively in long perl
    foreach my $stuff (@article) {
        print "$stuff\n";
    }

    # to join all the edited elements with commas
    my $joined = join ",", @article;
    print $joined;

    Update - Whoops!

    Having solved the wrong problem, here is a solution to the BibTeX parsing as I understand it. Thanks to kschwab. Assuming a record like the one you have shown, this does the trick.

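    my $string = <<'STRING';
    @article{
       author = {Author, J.P } ,
       title  = "A paper, about things" ,
       etc..
    }
    STRING

    # closing delimiter we are waiting for; empty when outside a value
    my $open = '';

    # eat the opening "@article{" and keep it as the start of the output
    my $commaless = $string =~ s/^(\s*\@\w+\s*\{)// ? $1 : '';

    for (split //, $string) {
        # opening delimiter found: remember the matching closing one
        if (/\{|"/ and not $open) {
            $open = /\{/ ? '}' : '"';
            $commaless .= $_;
            next;
        }
        # copy every char except commas inside an open value
        $commaless .= $_ unless /,/ and $open;
        # closing delimiter found: back to normal mode
        $open = '' if $open eq $_;
    }
    print $commaless;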

    First we eat up the opening @article{ so we get into the guts of the problem. Then the code splits the string into characters. Now, if we find an opening delimiter we set $open to the appropriate closing delimiter '}' or '"' depending on what it is and declare it open season on commas until we find the closing delimiter. We add all the chars which are not commas to $commaless and thus remove the commas as desired.

    Trying to do this with a single regex would be difficult, and certainly harder to understand.
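
    For comparison, here is a hedged sketch of what a regex-based variant could look like. It is not part of the original solution, and it only works under the assumption that braces are never nested and quoted values contain no escaped quotes. The helper `strip_commas` and the `year` field are made up for the illustration:

```perl
use strict;
use warnings;

# Sample record; "year = 2001" is an assumption replacing the original "etc..".
my $string = <<'STRING';
@article{
   author = {Author, J.P } ,
   title  = "A paper, about things" ,
   year   = 2001
}
STRING

# Hypothetical helper: return a copy of the value with its commas deleted.
sub strip_commas {
    my ($value) = @_;
    $value =~ tr/,//d;
    return $value;
}

# Match each innermost {...} or "..." value and rewrite it without commas.
# The outer "@article{ ... }" never matches because it contains other braces.
(my $commaless = $string) =~ s/(\{[^{}]*\}|"[^"]*")/strip_commas($1)/ge;

print $commaless;
```

    Once nested braces such as {The {Perl} Book} show up, the pattern no longer matches the whole value, so the character-by-character loop (extended with a depth counter) or a real parser is the more robust choice.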

    hope this helps

    cheers

    tachyon

    s&&rsenoyhcatreve&&&s&n\w+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

      Lots of good info, but I think you may have mistaken the original node's example BibTeX source for Perl code. The piece:

      @article{
         author = {Author, J.P } ,
         title  = "A paper about things" ,
         etc..
      }

      is BibTeX source, and not an attempt by the node creator to make a Perl array....

        Thanks for the heads up. As you correctly assume, I've got no idea what BibTeX is - strange source though. I've modified the posted code (the second bit, which processes the string) to handle this, assuming you get the entire entry as a single string. Hope that's correct.

        tachyon

        s&&rsenoyhcatreve&&&s&n\w+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print