rajaman has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

$string = 'C*ID1*Mac*C release for EA\'s D*ID1*Spore1 game*D; D*ID1*Spore 1*D is better than D*ID2*Spore 2 game*D.';

I am trying to split the above string and save the result in an array in such a way that each tagged segment is a separate array element. For example, 'C*ID1*Mac*C', 'D*ID1*Spore1 game*D', 'D*ID1*Spore 1*D', and 'D*ID2*Spore 2 game*D' should become four separate array elements.

The usual splitting on space or '*' does not work, e.g.:

@fields = split(/\s/, $string);

Please suggest.

Thank you.

Re: splitting a string on pre-defined tags
by tybalt89 (Monsignor) on May 22, 2018 at 20:18 UTC
    #!/usr/bin/perl
    # http://perlmonks.org/?node_id=1215064
    use strict;
    use warnings;
    use Data::Dumper;

    my $string = 'C*ID1*Mac*C release for EA\'s D*ID1*Spore1 game*D; D*ID1*Spore 1*D is better than D*ID2*Spore 2 game*D.';
    my @fields;
    push @fields, $& while $string =~ /\b([A-Z])\*.*?\*\1\b/g;
    print Dumper \@fields;

    Outputs:

    $VAR1 = [
              'C*ID1*Mac*C',
              'D*ID1*Spore1 game*D',
              'D*ID1*Spore 1*D',
              'D*ID2*Spore 2 game*D'
            ];

      You beat me to it!

      use strict;
      use warnings;
      use 5.10.0;

      my $string = "C*ID1*Mac*C release for EA's D*ID1*Spore1 game*D; D*ID1*Spore 1*D is better than D*ID2*Spore 2 game*D.";
      my @fields;
      while ($string =~ /([A-Z])\*   # A single capital letter followed by star
                         ID\d+\*     # String 'ID' followed by a number and a star
                         .*?         # Anything
                         \*\1        # A star followed by the original single capital letter
                        /gx) {
          push @fields, $&;
      }

      Jim

Re: splitting a string on pre-defined tags
by Corion (Patriarch) on May 22, 2018 at 20:20 UTC

    Usually it's easier to match what you want to keep instead of splitting on the stuff you don't want:

    #!perl -w
    use strict;
    use Data::Dumper;

    my $string = q{C*ID1*Mac*C release for EA's D*ID1*Spore1 game*D; D*ID1*Spore 1*D is better than D*ID2*Spore 2 game*D.};
    my @sections;
    while( $string =~ m!(([CD])\*(.*?)\*\2)!g) {
        push @sections, $1;
    };
    print Dumper \@sections;
    __END__
    $VAR1 = [
              'C*ID1*Mac*C',
              'D*ID1*Spore1 game*D',
              'D*ID1*Spore 1*D',
              'D*ID2*Spore 2 game*D'
            ];

    The regular expression looks for a C or D followed by a *, then matches as little as possible (the .*? is non-greedy) until it finds a * followed by the same letter it matched at the start (the \2 backreference).
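To see why the non-greedy quantifier matters here, a minimal sketch follows. It reuses Corion's pattern on a shortened copy of the sample string; the variable names @lazy and @greedy are only illustrative and do not appear in the thread.

```perl
use strict;
use warnings;

# Shortened copy of the sample string (two D-tagged segments).
my $string = q{D*ID1*Spore 1*D is better than D*ID2*Spore 2 game*D.};

# Non-greedy: .*? stops at the first "*D" that closes the tag.
my @lazy;
push @lazy, $1 while $string =~ /(([CD])\*.*?\*\2)/g;

# Greedy: .* runs on to the last "*D", merging both segments into one match.
my @greedy;
push @greedy, $1 while $string =~ /(([CD])\*.*\*\2)/g;

print "lazy:   $_\n" for @lazy;
print "greedy: $_\n" for @greedy;
```

With the greedy form, everything from the first opening tag to the last closing tag comes back as a single element, which is not what the question asks for.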

Re: splitting a string on pre-defined tags
by tybalt89 (Monsignor) on May 22, 2018 at 22:10 UTC

    General rule of thumb:

    If you know what you don't want, use split.

    If you know what you want, use a regex match.
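As a rough illustration of that rule of thumb, here is a small sketch. The comma-separated string and the names $csv, @parts, $text, and @tags are made up for the example; only the tag pattern comes from the replies above.

```perl
use strict;
use warnings;

# You know what you don't want (the comma separator): split on it.
my $csv   = 'alpha,beta,gamma';    # hypothetical input
my @parts = split /,/, $csv;

# You know what you want (a tagged segment): match it globally.
my $text = 'C*ID1*Mac*C release for D*ID2*Spore 2 game*D.';
my @tags;
push @tags, $1 while $text =~ /(([A-Z])\*.*?\*\2)/g;

print "split: $_\n" for @parts;
print "match: $_\n" for @tags;
```

In the original question the unwanted text between segments has no fixed shape, which is why matching the tags is the easier route there.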