in reply to Regex for matching and appending

So what have you tried so far? Show us some code, it's likely that it needs only minor corrections.

It would also help to see what you want the output to be, because your verbal description isn't very exact (at least not to me).

Re^2: Regex for matching and appending
by joec_ (Scribe) on Dec 10, 2008 at 10:03 UTC
    Hi,

    I would like my output to be:

    $individual[0]=
    -OECHEM 658567-

    1 2 0000 V2000
    4 \t 5 8.7 7.655 3
    2 \t 55 6 4 5
    M END
    $$$$

    $individual[1]=
    -OECHEM 35343-

    3 6 0000 V2000
    1 \t 7 6 4.6 9
    2 \t 45 0 3 5
    M END
    $$$$

    i.e. having 2 separate regex groups (or array elements), each containing just everything up to M END, and then adding $$$$ on the next line.

    I have so far tried:

    my @individual; @individual = split (/\$\$\$\$/,$file);

    The split works but doesn't include the token used to split on, i.e. I end up with

    $individual[0] =
    -OECHEM 658567-

    1 2 0000 V2000
    4 \t 5 8.7 7.655 3
    2 \t 55 6 4 5
    M END
    > <compound id>
    665765765
    > <source>
    db1

    $individual[1] =
    -OECHEM 35343-

    3 6 0000 V2000
    1 \t 7 6 4.6 9
    2 \t 45 0 3 5
    M END
    > <compound id>
    3546789
    > <source>
    db1

    I tried a regex like /(.*)?M END/ig. I may have to do it in two steps: grouping first, then substitution.

    Thanks.

      split removes the token it matched, but since you know what it is (here: $$$$), you can simply add it again.
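A minimal sketch of that idea, assuming the whole file has already been slurped into $file as in the question. The sample data here is abbreviated and made up for illustration; the regex that cuts each chunk off after the M END line is my own addition, not something from the original post.

```perl
use strict;
use warnings;

# Hypothetical, shortened sample input; the real script would slurp the SD file into $file.
my $file = join '',
    "-OECHEM 658567-\n\n1 2 0000 V2000\nM END\n> <source>\ndb1\n\$\$\$\$\n",
    "-OECHEM 35343-\n\n3 6 0000 V2000\nM END\n> <source>\ndb1\n\$\$\$\$\n";

my @individual;
for my $chunk (split /^\$\$\$\$\n/m, $file) {
    # Keep everything up to and including the "M END" line; skip chunks without one.
    next unless $chunk =~ /\A(.*?^M END\n)/ms;
    # split dropped the separator, so append it again by hand.
    push @individual, $1 . "\$\$\$\$\n";
}

print @individual;
```

Each element of @individual then ends with M END followed by a $$$$ line, and the data items after M END are gone.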

      Instead of slurping the whole file and then splitting, you can set the input record separator accordingly:

      #!/usr/bin/perl
      use strict;
      use warnings;
      use Data::Dumper;

      my @records;
      {
          local $/ = '$$$$' . "\n";
          while (my $record = <DATA>) {
              my ($stripped) = split /\nM END\n/, $record, 2;
              push @records, "$stripped\nM END\n$/";
          }
      }
      print Dumper \@records;

      __DATA__
      -OECHEM 658567-

      1 2 0000 V2000
      4 \t 5 8.7 7.655 3
      2 \t 55 6 4 5
      M END
      > <compound id>
      665765765
      > <source>
      db1
      $$$$
      -OECHEM 35343-

      3 6 0000 V2000
      1 \t 7 6 4.6 9
      2 \t 45 0 3 5
      M END
      > <compound id>
      3546789
      > <source>
      db1
      $$$$
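For comparison, the regex route from the question can probably be made to work as well. /(.*)?M END/ig never reaches across lines because, without the /s modifier, the dot does not match a newline. The following is only a sketch with abbreviated, made-up sample data; the pattern is my own guess at what was intended, not a tested replacement for the record-separator version.

```perl
use strict;
use warnings;

# Hypothetical, shortened sample input standing in for the slurped file.
my $file = join '',
    "-OECHEM 658567-\n\n1 2 0000 V2000\nM END\n> <source>\ndb1\n\$\$\$\$\n",
    "-OECHEM 35343-\n\n3 6 0000 V2000\nM END\n> <source>\ndb1\n\$\$\$\$\n";

# /s lets . cross newlines, /m lets ^ match at every line start,
# /g in list context returns the captured group of every match.
# A record starts at the beginning of the string or right after a $$$$ line.
my @individual = $file =~ /(?:\A|^\$\$\$\$\n)(.*?^M END\n)/msg;

# Append the separator that the pattern deliberately left out.
$_ .= "\$\$\$\$\n" for @individual;

print @individual;
```

This does the grouping and the appending in two steps, as the question anticipated, but it needs the whole file in memory, which the record-separator loop avoids.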
        Thank you, that's great, it works. Just out of interest, does __DATA__ act as an "in-file" filehandle? Thanks.