in reply to Re: regex, pos, \G, and substr
in thread regex, pos, \G, and substr

I think it's perceptive to split the guts of the phrases on [ ]and[ ], but it's really important in my case that the leftovers are only digits. While I could put grep { /^\d+$/ } in front of the split, I'd lose sight of any non-digit stuff that was (mistakenly) there by the time the replacement side of the (s)ubstitute operator had run. In other words, I'd rather leave everything alone if there's anything "non-digit" besides the "and" splitters in there. BTW, I like the single-quotes for delimiting the split regex.
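
To make the concern concrete, here is a minimal sketch of what filtering with grep would do. The sub name naive_expand and the sample string are made up for this illustration; they are not from the original code. The non-digit item simply vanishes from the result, with nothing to show it was ever there.

```perl
use strict;
use warnings;

# Hypothetical helper (name invented for this example): expand one
# "N and N and N" list, silently keeping only the digit-only items.
sub naive_expand {
    my ($list) = @_;
    return join ' ',
        map  { "update $_ mark complete" }
        grep { /^\d+$/ }                 # drops 'junk' without any notice
        split /\s+and\s+/, $list;
}

# 'junk' is filtered out, so the malformed request looks well-formed afterwards.
print naive_expand('junk and 5843 and 1522'), "\n";
# prints: update 5843 mark complete update 1522 mark complete
```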

Re^3: regex, pos, \G, and substr
by BrowserUk (Patriarch) on Jun 03, 2007 at 03:34 UTC

    That's what I meant by strengthening the regex. Note that the non-conformant additional third line is left untouched:

    #! perl -slw
    use strict;

    my $data_stg =
        'junk text update 8923 mark complete update 8324 mark ' .
        'complete more junk update 5438 and 5843 and 1522 mark ' .
        'complete more junk update junk and 5843 and 1522 mark ' .
        'complete update 8435 and 9323 mark complete true junk'
    ;

    $data_stg =~ s[update ((?:\d+|\s|and)+) mark complete]{
        join ' ', map{
            "update $_ mark complete"
        } split '\s+and\s+', $1
    }ge;

    print $data_stg;

    __END__
    ## Output wrapped to match input for easier verification.
    junk text update 8923 mark complete update 8324 mark
    complete more junk update 5438 mark complete update 5843 mark complete update 1522 mark
    complete more junk update junk and 5843 and 1522 mark
    complete update 8435 mark complete update 9323 mark complete true junk

    Alternatively, verify that the split values are numeric, produce a warning and put the original back if not:

    #! perl -slw
    use strict;

    my $data_stg =
        'junk text update 8923 mark complete update 8324 mark ' .
        'complete more junk update 5438 and 5843 and 1522 mark ' .
        'complete more junk update junk and 5843 and 1522 mark ' .
        'complete update 8435 and 9323 mark complete true junk'
    ;

    $data_stg =~ s[(update (.+?) mark complete)]{
        my @numbers = split '\s+and\s+', $2;
        if( grep{ !/^\d+$/ } @numbers ) {
            warn "Malformed request: '$1'\n";
            $1;
        }
        else{
            join ' ', map{
                "update $_ mark complete"
            } @numbers;
        }
    }ge;

    print $data_stg;

    __END__
    ## Output wrapped to match input for easier verification.
    Malformed request: 'update junk and 5843 and 1522 mark complete'
    junk text update 8923 mark complete update 8324 mark
    complete more junk update 5438 mark complete update 5843 mark complete update 1522 mark
    complete more junk update junk and 5843 and 1522 mark
    complete update 8435 mark complete update 9323 mark complete true junk

    BTW, I like the single-quotes for delimiting the split regex.

    Most don't. They consider it a bad habit of mine.


Re^3: regex, pos, \G, and substr
by ysth (Canon) on Jun 04, 2007 at 01:02 UTC
    I'd rather leave everything alone if there's anything "non-digit" besides the and splitters in there.
    Then leave that part the same as in your original looping regex:
    s[update (\d+(?: and \d+)+) mark complete]{...}ge;
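
For anyone wanting to try it, here is one way the pieces might fit together. This is only a sketch: the replacement body is not given in the post above, so the one used here is borrowed from BrowserUk's first example, and the wrapper sub expand_updates and the sample string are invented for the illustration.

```perl
use strict;
use warnings;

# Hypothetical wrapper (name invented for this example) around the stricter
# pattern. The replacement body is an assumption, borrowed from the earlier
# reply in this thread.
sub expand_updates {
    my ($text) = @_;
    # Matches only when everything between "update" and "mark complete"
    # is digits joined by " and "; anything else is left untouched.
    $text =~ s[update (\d+(?: and \d+)+) mark complete]{
        join ' ', map { "update $_ mark complete" } split / and /, $1
    }ge;
    return $text;
}

print expand_updates(
    'update 5438 and 5843 mark complete update junk and 5843 mark complete'
), "\n";
# prints: update 5438 mark complete update 5843 mark complete update junk and 5843 mark complete
```

A single-number request such as "update 8923 mark complete" does not match this pattern at all, which is harmless: leaving it alone gives the same text the replacement would have produced.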