in reply to Re: Regex for matching and appending
in thread Regex for matching and appending

Hi,

I would like my output to be:

$individual[0]=
-OECHEM 658567-

1 2 0000 V2000
4 \t 5 8.7 7.655 3
2 \t 55 6 4 5
M END
$$$$

$individual[1]=
-OECHEM 35343-

3 6 0000 V2000
1 \t 7 6 4.6 9
2 \t 45 0 3 5
M END
$$$$

i.e. having 2 separate regex groups (or array elements), each containing just everything up to M END, with $$$$ then added on the next line.

I have so far tried:

my @individual = split /\$\$\$\$/, $file;

The split works but doesn't include the token used to split on, i.e. I end up with

$individual[0] =
-OECHEM 658567-

1 2 0000 V2000
4 \t 5 8.7 7.655 3
2 \t 55 6 4 5
M END
> <compound id>
665765765
> <source>
db1

$individual[1] =
-OECHEM 35343-

3 6 0000 V2000
1 \t 7 6 4.6 9
2 \t 45 0 3 5
M END
> <compound id>
3546789
> <source>
db1

I tried a regex like /(.*)?M END/ig. I may have to do it in two steps: grouping first, then substitution.

Thanks.

Re^3: Regex for matching and appending
by moritz (Cardinal) on Dec 10, 2008 at 10:21 UTC
    split removes the token it matched, but since you know what it is (here: $$$$), you can simply add it again.
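
A minimal sketch of that "split, then add it back" idea, assuming the whole file has already been slurped into a string. The helper name split_records and the sample data are made up for illustration; the part of each chunk after the M END line is dropped before $$$$ is appended again.

```perl
use strict;
use warnings;

# Hypothetical helper: split a slurped string on the $$$$ line, keep each
# chunk only up to and including its "M END" line, then re-append $$$$.
sub split_records {
    my ($file) = @_;
    my @records;
    for my $chunk (split /^\$\$\$\$\n?/m, $file) {
        # Skip chunks that contain no "M END" line (e.g. trailing blank text).
        next unless $chunk =~ /\A(.*?^M END\n)/ms;
        push @records, $1 . '$$$$' . "\n";
    }
    return @records;
}

# Sample input modelled on the data in the question (illustrative only).
my $file = <<'SDF';
-OECHEM 658567-

1 2 0000 V2000
4 \t 5 8.7 7.655 3
M END
> <compound id>
665765765
$$$$
-OECHEM 35343-

3 6 0000 V2000
1 \t 7 6 4.6 9
M END
> <source>
db1
$$$$
SDF

my @individual = split_records($file);
print for @individual;
```

Each element of @individual then ends with the M END line followed by $$$$, and the "> <...>" data fields are gone.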

    Instead of slurping the whole file and then splitting, you can set the input record separator accordingly:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;

    my @records;
    {
        # Each read from <DATA> now returns one record ending in the $$$$ line.
        local $/ = '$$$$' . "\n";
        while (my $record = <DATA>) {
            # Keep only the part before the M END line, dropping the data fields.
            my ($stripped) = split /\nM END\n/, $record, 2;
            # Put M END and the $$$$ terminator back.
            push @records, "$stripped\nM END\n$/";
        }
    }
    print Dumper \@records;

    __DATA__
    -OECHEM 658567-

    1 2 0000 V2000
    4 \t 5 8.7 7.655 3
    2 \t 55 6 4 5
    M END
    > <compound id>
    665765765
    > <source>
    db1
    $$$$
    -OECHEM 35343-

    3 6 0000 V2000
    1 \t 7 6 4.6 9
    2 \t 45 0 3 5
    M END
    > <compound id>
    3546789
    > <source>
    db1
    $$$$
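
For comparison, the single-regex attempt from the question, /(.*)?M END/ig, probably fails because "." does not match a newline unless the /s modifier is given, so the capture never spans more than one line. A hedged sketch of a one-pass regex version follows; records_by_regex and the sample data are illustrative names, not part of the original post, and the pattern assumes every record is terminated by a $$$$ line with a trailing newline.

```perl
use strict;
use warnings;

# Hypothetical helper: capture each record up to its "M END" line with one
# global match, skip the data fields, and re-append the $$$$ terminator.
sub records_by_regex {
    my ($file) = @_;
    my @records;
    # /s: "." also matches newlines; /m: "^" matches at every line start;
    # ".*?" is non-greedy, so each capture stops at the first "M END" line.
    while ($file =~ /(.*?^M END\n).*?^\$\$\$\$\n/msg) {
        push @records, $1 . '$$$$' . "\n";
    }
    return @records;
}

# Sample input modelled on the data in the question (illustrative only).
my $sample = <<'SDF';
-OECHEM 658567-

1 2 0000 V2000
M END
> <compound id>
665765765
$$$$
-OECHEM 35343-

3 6 0000 V2000
M END
> <source>
db1
$$$$
SDF

print for records_by_regex($sample);
```

Setting the record separator as above is likely the simpler choice for large files, since it avoids holding the whole file in memory.
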
      Thank you, that's great; it works. Just out of interest, does __DATA__ act as an "in-file" filehandle? Thanks.
        Just out of interest, does __DATA__ act as an "in file" file handler? thanks

        Aye, DATA is a file handle that points to where __DATA__ is. See perldata for details.

        Kinda. See perldata and search for "__DATA__".