Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Monks,

I need some help with split.

Input:
<AU>PAUL A. HUBBARD,<SUP>1</SUP> WENFENG YU,<SUP>2</SUP> HORST SCHULZ,<SUP>2</SUP> JUNG-JA P. KIM<SUP>1</SUP></AU>

Output:

<contribgroup>
<contributor>PAUL A. HUBBARD<SUP>1</SUP></contributor>
<contributor>WENFENG YU<SUP>2</SUP></contributor>
<contributor>HORST SCHULZ<SUP>2</SUP></contributor>
<contributor>JUNG-JA P. KIM<SUP>1</SUP></contributor>
</contribgroup>

In the input, the <SUP>(.*?)</SUP> is optional.

So I tried using split while retaining the <SUP>:

my @au = map '<contributor">'.$_.'</contributor>', split /(,|(?=<\/SUP>)) /, $au;

Or is there any other easy way to do it?

Re: split with '|'
by ikegami (Patriarch) on Apr 22, 2005 at 05:56 UTC

    What follows is an alternative which is probably faster than the OP's due to the lack of lists and arrays:

    $au = 'PAUL A. HUBBARD,<SUP>1</SUP>'
        . ' WENFENG YU,<SUP>2</SUP>'
        . ' HORST SCHULZ,<SUP>2</SUP>'
        . ' JUNG-JA P. KIM<SUP>1</SUP>';

    $au =~ s{,((?:<SUP>.*?</SUP>)?)\s*}
            {$1</contributor>\n<contributor>}g;

    $au = "<contributor>$au</contributor>\n";

    print($au);

    __END__
    <contributor>PAUL A. HUBBARD<SUP>1</SUP></contributor>
    <contributor>WENFENG YU<SUP>2</SUP></contributor>
    <contributor>HORST SCHULZ<SUP>2</SUP></contributor>
    <contributor>JUNG-JA P. KIM<SUP>1</SUP></contributor>

    So far, I've always found that split with a capture in the pattern is at least as easy to implement as a straight regexp.

    Updated.

      Thanks for your reply, but I don't need the comma before <SUP>, and the <SUP>.*?</SUP> is optional.

        Sorry, tired. Fixed.
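
        To check the optional-<SUP> case raised above, here is a minimal sketch built on the same substitution. The sub name contributors_by_subst and the sample input (with one author lacking a <SUP>) are made up for illustration, and it assumes authors are always separated by a comma:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): wraps each author of an
# <AU>...</AU> string in <contributor> tags using a single substitution.
sub contributors_by_subst {
    my ($au) = @_;
    $au =~ s{</?AU>}{}g;    # drop the outer <AU> wrapper, if present
    # A comma, an optional <SUP>...</SUP>, then whitespace ends one author.
    $au =~ s{,((?:<SUP>.*?</SUP>)?)\s*}{$1</contributor>\n<contributor>}g;
    return "<contributor>$au</contributor>\n";
}

# Assumed sample input: the second author has no <SUP> marker.
print contributors_by_subst(
    '<AU>PAUL A. HUBBARD,<SUP>1</SUP> WENFENG YU, JUNG-JA P. KIM<SUP>1</SUP></AU>'
);
```

        Because the <SUP> group is optional inside the capture, a bare comma is simply replaced by the closing and opening tags.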

Re: split with '|'
by jbrugger (Parson) on Apr 22, 2005 at 06:14 UTC
    Something like this?
    my $au = '<AU>PAUL A. HUBBARD,<SUP>1</SUP> WENFENG YU,<SUP>2</SUP> HORST SCHULZ,<SUP>2</SUP> JUNG-JA P. KIM<SUP>1</SUP></AU>';
    my @au = map '<contributor>'.$_.'</SUP></contributor>', split(/<\/SUP>/, $au);
    pop @au;
    foreach my $line (@au) {
        $line =~ s/\,|<.?AU>|(>)\s/$1/gi;
        print $line ."\n";
    }
    "We all agree on the necessity of compromise. We just can't agree on when it's necessary to compromise." - Larry Wall.
Re: split with '|'
by Anonymous Monk on Apr 22, 2005 at 05:58 UTC

    The above code is not working properly. Could anyone find what the problem is?

    Sorry for posting the question here; I was not able to edit the above node.

      Here you go:
      map '<contributor>'.$_.'</contributor>', split ........

      Update: nm, it doesn't work. It's way too complicated to do this with split. I don't think it's even possible to do it with one map, one split, and no other looping commands.

      Here's a version that uses split, but only after fixing the location of the comma:

      $au =~ s{,(<SUP>.*?</SUP>)}{$1,}g;
      $au = join '', map "<contributor>$_</contributor>\n", split /,\s*/, $au;
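
      As a self-contained sketch of this comma-moving approach: the sub name split_authors, the stripping of the <AU> wrapper, and the sample input are assumptions added for illustration, not part of the code above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns one string per author,
# with any <SUP> marker kept and the separating comma removed.
sub split_authors {
    my ($au) = @_;
    $au =~ s{</?AU>}{}g;                  # drop the outer <AU> wrapper
    $au =~ s{,(<SUP>.*?</SUP>)}{$1,}g;    # "NAME,<SUP>1</SUP>" -> "NAME<SUP>1</SUP>,"
    return split /,\s*/, $au;             # every comma now ends one author
}

# Assumed sample input: the second author has no <SUP> marker.
my @authors = split_authors(
    '<AU>PAUL A. HUBBARD,<SUP>1</SUP> WENFENG YU, JUNG-JA P. KIM<SUP>1</SUP></AU>'
);

print "<contribgroup>\n",
      (map "<contributor>$_</contributor>\n", @authors),
      "</contribgroup>\n";
```

      Moving the comma first means the split pattern never has to know about <SUP> at all, so authors with and without a marker go through the same path.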
        Still tired ;-)? Unmatched ( in regex; marked by <-- HERE in m/,(( <-- HERE ?:<SUP>.*?</
      "Sorry for posting the question here; I was not able to edit the above node."
      No need to be sorry. If you sign up for an account, you can edit your nodes. That way, we can attribute questions to a person. Accounts here are free (definitely as in beer, mostly as in speech).

      thor

      Feel the white light, the light within
      Be your own disciple, fan the sparks of will
      For all of us waiting, your kingdom will come