bmal has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have a string ($sentence) containing patterns in this format:

A~~~.*~A

Now I want to extract every "A~~~.*~A" from the string. Each time I find a match, I set $_ to the rest of the string (everything after the match), then remove that from a copy of the sentence with s/$_// and keep the result in $remainPart, so that $remainPart stores only the portion up to the end of this "A~~~.*~A" match.

However, I have a problem: I extract $_ from $sentence, but cannot use it in a substitution on $remainPart (which starts out equal to $sentence). I wonder why. Any suggestions?

Here is the source code:

#!/usr/bin/perl

$sentence = " he PP study VB A~~~language~A NP and CC play VB A~~~game~A NP .";
$_ = $sentence;

while ($_ ne '') {
    if (m/A~~~(.*)~A/g) {                # global match
        $remainPart = $sentence;
        ($_) = /(\G(.|\n)*)/g;           # store rest of the string
                                         # (after pos) to $_
        $remainPart =~ s/$_//;
        print "$remainPart\n";
    } else {
        $_ = '';
    }
}

By rights, the first time through, $_ should contain

" NP and CC play VB A~~~game~A NP ."

but when $remainPart is printed, it is still the same as $sentence, which indicates that the substitution did not succeed.

Replies are listed 'Best First'.
Re: Regexp Substitution Problem!!
by davorg (Chancellor) on Jan 02, 2002 at 16:34 UTC

    I'm not really sure what you're trying to do. If all you want to do is remove the A~~~(.*)~A sections from the string, then all you need is something like:

    s/A~~~(.*?)~A//g;

    Note that I've changed your .* to .*?. See Death to Dot Star! for the reason why.

    --
    <http://www.dave.org.uk>

    "The first rule of Perl club is you do not talk about Perl club."
    -- Chip Salzenberg
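
The difference between .* and .*? in that substitution can be seen in a small sketch. The helper name strip_markers and the shortened sample sentence are illustrative assumptions, not part of the original posts:

```perl
use strict;
use warnings;

# Hypothetical helper: remove every match of $pattern from $text.
sub strip_markers {
    my ($text, $pattern) = @_;
    $text =~ s/$pattern//g;
    return $text;
}

my $sentence = "study VB A~~~language~A NP and CC play VB A~~~game~A NP .";

# Greedy: .* runs to the LAST ~A, so the text between the two markers is lost too.
print "[", strip_markers($sentence, qr/A~~~.*~A/), "]\n";

# Non-greedy: .*? stops at the FIRST ~A, so each marker is removed on its own.
print "[", strip_markers($sentence, qr/A~~~.*?~A/), "]\n";
```

With the greedy pattern only "study VB" and "NP ." survive; with the non-greedy one, "NP and CC play VB" is kept as well.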

Re: Regexp Substitution Problem!!
by simon.proctor (Vicar) on Jan 02, 2002 at 16:56 UTC
    I wasn't quite sure what you wanted but if I understand you rightly then this should help. From what I can tell, you want to grab anything A~~~stuff~A and remove it from the string and print out what you removed. This snippet will do that for you:

    $sentence = " he PP study VB A~~~language~A NP and CC play VB A~~~game~A NP .";
    $_ = $sentence;

    while (m/(A~~~([^~]+)~A)/g) {
        # debug print
        print $1 . "\t" . $2 . "\n";

        # Remove our match
        push @parts, $2;
        $_ =~ s/$1//g;
    }
    $sentence = $_;

    # print the bits we took out
    foreach (@parts) {
        print $_ . "\n";
    }

    # print the new sentence
    print $sentence . "\n";


    Hope that helps - one problem you did have was with your original match of A~~~(.*)~A. This greediness would have caused you real problems :P

    For example, if your sentence had been
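    $sentence = "A~~~one~A and A~~~two~A";
    $sentence =~ m/A~~~(.*)~A/g;
    print $1;
    Then you are going to get unexpected results: because .* is greedy, $1 ends up as "one~A and A~~~two" rather than "one". You can use the non-greedy quantifier (.*?) to help out here, or you can use a more restrictive pattern such as the [^~]+ in the snippet above.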

    HTH :)
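
A minimal sketch comparing the three patterns mentioned in this reply on the same sample string. The helper name captures is a hypothetical one introduced only for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: return every capture of $pattern found in $text.
# In list context, a //g match returns all captured substrings.
sub captures {
    my ($text, $pattern) = @_;
    return $text =~ /$pattern/g;
}

my $sentence = "A~~~one~A and A~~~two~A";

print join("|", captures($sentence, qr/A~~~(.*)~A/)),    "\n";  # greedy
print join("|", captures($sentence, qr/A~~~(.*?)~A/)),   "\n";  # non-greedy
print join("|", captures($sentence, qr/A~~~([^~]+)~A/)), "\n";  # negated class
```

The greedy pattern yields a single capture spanning both markers, while the other two yield "one" and "two" separately. The negated class only works as long as the marker text itself never contains a "~".
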
Re: Regexp Substitution Problem!!
by Trimbach (Curate) on Jan 02, 2002 at 18:07 UTC
    I think you're trying to go about this in a weird way, where what you really want is to do successive matches against a string, finding a marker (the A~~~blah~A part) and then the stuff that occurred up to the marker. The whole "match something then erase what you just matched" is kinda strange, as it's much easier to just find what you want, stick it into a separate variable (where you can fold-spindle-and-mutilate as much as you want) without having to mess with the original string itself. Maybe something like this?
    #!/usr/bin/perl

    my $sentence = " he PP study VB A~~~language~A NP and CC play VB A~~~game~A NP .";

    while ($sentence =~ m/(.*?)A~~~(.*?)~A/sg) {
        my $before_marker = $1;
        my $marker        = $2;
        print "Before marker is: $before_marker\n";
        print "The marker is: $marker\n\n";
        # Do stuff with $before_marker and $marker
    }
    which will produce:
    Before marker is: he PP study VB
    The marker is: language

    Before marker is: NP and CC play VB
    The marker is: game
    Does this get at what you're trying to do? If not, post some more details as to what you really want, as I suspect everyone's a little confused right now.

    Gary Blackburn
    Trained Killer

Following: Regexp Substitution Problem!!
by bmal (Novice) on Jan 02, 2002 at 17:20 UTC

    Oh, I think I found the answer. The problem is with the string itself. The string in my first post causes no problem:

    $sentence = " he PP study VB A~~~language~A NP and CC play VB A~~~game~A NP ."

    but if I change the string into this

    $sentence = " he PP study VB A~~~language~A NP and CC play VB A~~~(G|g)ames?~A NP ."

    then the substitution fails, because $_ is interpreted as a regex: /(G|g)ames?/ matches Games, Game, games and game, but it will not match the literal text "(G|g)ames?". I think that's the reason.

    But how can I substitute $_ out of $remainPart anyway?
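
One common way to handle this is to quote the interpolated string with \Q...\E (or the quotemeta function), so that its regex metacharacters are matched literally. The following is a sketch under that assumption; the helper name remove_literal and the variable $rest are hypothetical, and the sample strings are taken from the post above:

```perl
use strict;
use warnings;

# Hypothetical helper: delete the first literal occurrence of $literal from $text.
sub remove_literal {
    my ($text, $literal) = @_;
    $text =~ s/\Q$literal\E//;    # \Q...\E escapes regex metacharacters
    return $text;
}

my $sentence = " he PP study VB A~~~language~A NP and CC play VB A~~~(G|g)ames?~A NP .";
my $rest     = " NP and CC play VB A~~~(G|g)ames?~A NP .";

# Unquoted: $rest is compiled as a regex, so "(G|g)ames?" cannot match itself.
(my $unquoted = $sentence) =~ s/$rest//;
print "unquoted: [$unquoted]\n";

# Quoted: $rest is matched character for character and removed.
print "quoted:   [", remove_literal($sentence, $rest), "]\n";
```

The unquoted substitution leaves the sentence unchanged, while the quoted one leaves only the part up to and including A~~~language~A.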