dspivey has asked for the wisdom of the Perl Monks concerning the following question:

Hello, Monks. I'm not new to regex, but I can't seem to figure this substitution problem out.

Problem: I want to do multiple substitutions, but only on one part of the line. Quite simply, replace hyphen (-) with underscore (_) in the left-side (key) portion of this config data. Basically, everything before the colon should be subjected to the substitution of - to _. Because hyphens can exist in the value portion, I've had difficulty bailing out of the expression.

session-redis-hosts-fault-tolerant: "tyk-redis-1.gateways.svc.cluster.local,tyk-redis-2..."
message-center-db: "http://message-center-db-1b.message-center-db.svc.cluster.local"

* I completed this using multiple steps, but there must be a 1-liner for this.
* I'm also doing in-line modification of multiple files.

This is the solution I came up with. It works, but how would I improve this using more advanced regex features? I've recently read about "backtracking control verbs", but I can't seem to figure out how to apply any of them to my problem.

perl -pi -e '($match) = m/^([^:]+)\:/; $ds = $match; $ds =~ s/\-/\_/g; s/$match/$ds/' *.config

Re: Repeated substitution on 1 side of a line only
by choroba (Cardinal) on Dec 16, 2016 at 15:18 UTC
    No verbs needed. Just replace - with _, and once you find :, replace the whole remainder of the string with itself:
    perl -pe 'BEGIN { %h = qw( - _ ) } s/(-)|(:.*)/$h{$1}$2/g'

    Explanation: One of $h{$1} and $2 is always empty, as only one part of the alternative can match.
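To see the alternation at work outside a one-liner, here is a minimal sketch of the same substitution wrapped in a sub. The name fix_key and the sample line are made up for illustration, and the `no warnings` line is only needed because the sketch runs under `use warnings`, which the one-liner does not.

```perl
use strict;
use warnings;

# Map each character to replace onto its replacement.
my %h = ( '-' => '_' );

# fix_key is a hypothetical helper name, not part of the original reply.
sub fix_key {
    my ($line) = @_;

    # Either (-) matches a hyphen in the key, or (:.*) swallows the colon
    # and the rest of the line, so hyphens in the value are never matched
    # on their own.
    no warnings 'uninitialized';    # one of $1 and $2 is always undef
    $line =~ s/(-)|(:.*)/$h{$1}$2/g;
    return $line;
}

print fix_key(qq{message-center-db: "http://message-center-db-1b.svc"\n});
```

Once the (:.*) branch has matched, the rest of the line is already consumed, which is what stops the /g loop from reaching hyphens in the value.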

    Update: Added the explanation and wrapped the hash initialization into a BEGIN.

    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,

      Or skip the hash altogether...

      #!/usr/bin/perl

      # http://perlmonks.org/?node_id=1177915

      use strict;
      use warnings;

      while (<DATA>) {
          s#-|(:.*)# $1 // '_' #ge;
          print;
      }

      __DATA__
      session-redis-hosts-fault-tolerant: "tyk-redis-1.gateways.svc.cluster.local,tyk-redis-2..."
      message-center-db: "http://message-center-db-1b.message-center-db.svc.cluster.local"
        I wanted to get the result without /e or any other "advanced" tricks.

Re: Repeated substitution on 1 side of a line only
by BrowserUk (Patriarch) on Dec 16, 2016 at 15:28 UTC

    Another approach that avoids the regex engine:

    $s = 'message-center-db: "http://message-center-db-1b.message-center-db.svc.cluster.local"';;
    substr( $s, 0, index( $s, ':' ) ) =~ tr[-][_];;
    print $s;;
    message_center_db: "http://message-center-db-1b.message-center-db.svc.cluster.local"
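One thing to watch with the substr/index approach: index returns -1 when the line has no colon, and substr( $s, 0, -1 ) then covers everything except the last character. A minimal sketch with a guard for that case follows; the sub name fix_key is hypothetical and the guard is an addition, not part of the reply above.

```perl
use strict;
use warnings;

# fix_key is a hypothetical helper name used only for this sketch.
sub fix_key {
    my ($line) = @_;

    my $colon = index( $line, ':' );
    return $line if $colon < 0;    # no colon: leave the line untouched

    # substr() is used as an lvalue, so tr/// edits the key part in place.
    substr( $line, 0, $colon ) =~ tr/-/_/;
    return $line;
}

print fix_key(qq{message-center-db: "http://message-center-db-1b.svc"\n});
```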

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". The enemy of (IT) success is complexity.
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Repeated substitution on 1 side of a line only
by tybalt89 (Monsignor) on Dec 16, 2016 at 15:38 UTC
    #!/usr/bin/perl

    # http://perlmonks.org/?node_id=1177915

    use strict;
    use warnings;

    while (<DATA>) {
        s#.*?:# $& =~ tr/-/_/r #e; # /r is your friend :)
        print;
    }

    __DATA__
    session-redis-hosts-fault-tolerant: "tyk-redis-1.gateways.svc.cluster.local,tyk-redis-2..."
    message-center-db: "http://message-center-db-1b.message-center-db.svc.cluster.local"
Re: Repeated substitution on 1 side of a line only
by LanX (Saint) on Dec 16, 2016 at 15:04 UTC
    TIMTOWTDI

    Personally I would split on colon first, replace the first part and print out all parts again instead of a second replace like you do.
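A rough sketch of the split-first idea, assuming the first colon is the separator. The sub name fix_key is invented for illustration; the limit of 2 keeps later colons (such as the one in http://) inside the value.

```perl
use strict;
use warnings;

# fix_key is a hypothetical helper name used only for this sketch.
sub fix_key {
    my ($line) = @_;

    # Limit of 2: split only at the first colon.
    my ( $key, $value ) = split /:/, $line, 2;
    return $line unless defined $value;    # no colon on this line

    $key =~ tr/-/_/;
    return "$key:$value";
}

print fix_key(qq{message-center-db: "http://message-center-db-1b.svc"\n});
```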

    Anyway for a one liner like you've shown you could use the eval modifier in substitute to run a replacement in the match only.

    Something like s# ^([^:]+) # $1 =~ tr/-/_/ #xe

    That's untested; I'm not sure whether $1 is a read-only value, and I don't know all the tr options by heart.

    If it is, try the r modifier with an embedded s///, or copy to another var.
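For what it's worth, $1 is read-only, so a plain tr/// on it dies with "Modification of a read-only value attempted". A minimal sketch of the /e idea using tr///r (Perl 5.14+), which returns a changed copy instead of modifying $1, is below. The sub name fix_key and the (?=:) look-ahead are additions for illustration; the look-ahead keeps lines without a colon untouched.

```perl
use strict;
use warnings;
use 5.014;    # tr///r needs Perl 5.14 or later

# fix_key is a hypothetical helper name used only for this sketch.
sub fix_key {
    my ($line) = @_;

    # /e evaluates the replacement as Perl code; /r makes tr return the
    # transliterated copy rather than trying to modify read-only $1.
    $line =~ s{ ^ ([^:]+) (?=:) }{ $1 =~ tr/-/_/r }xe;
    return $line;
}

print fix_key(qq{message-center-db: "http://message-center-db-1b.svc"\n});
```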

    hope you got the idea. :)

    update

    A "pure" regex without eval could probably work with \K meta in the regex to continue searching after each hyphen.

    Or by combining look around assertions.

    Cheers Rolf
    (addicted to the Perl Programming Language and ☆☆☆☆ :)
    Je suis Charlie!

      Ah, yes, you've helped me answer one of my questions that gave me trouble! That was, how to modify something that had been captured in a group? I realized I could capture the key (before the colon), but then didn't know how to just modify that part of it. I see your solution now, makes sense. Thank you!

      Yes, certainly more than one way, but I got so excited when I started reading about (*SKIP) (and the like) and recursion (?R) that I thought perhaps there was a solution there.

Re: Repeated substitution on 1 side of a line only
by AnomalousMonk (Archbishop) on Dec 16, 2016 at 15:09 UTC

    Assuming that the first : (colon) is the separator between the "key" and "value" parts of a record (needs Perl version 5.10+ for the \K operator):

    c:\@Work\Perl\monks>perl -wMstrict -le
    "use 5.010;
     ;;
     my @ra = (
       'session-redis-hosts-fault-tolerant: \"tyk-redis-1.gateways.svc.cluster.local,tyk-redis-2...\"',
       'message-center-db: \"http://message-center-db-1b.message-center-db.svc.cluster.local\"',
       'a-b-c:d-e-f:g-h-i:j-k',
       );
     ;;
     for my $s (@ra) {
       print qq{'$s'};
       $s =~ s{ \G [^:]*? \K - (?= [^:]* :) }{_}xmsg;
       print qq{'$s' \n};
       }
    "
    'session-redis-hosts-fault-tolerant: "tyk-redis-1.gateways.svc.cluster.local,tyk-redis-2..."'
    'session_redis_hosts_fault_tolerant: "tyk-redis-1.gateways.svc.cluster.local,tyk-redis-2..."'

    'message-center-db: "http://message-center-db-1b.message-center-db.svc.cluster.local"'
    'message_center_db: "http://message-center-db-1b.message-center-db.svc.cluster.local"'

    'a-b-c:d-e-f:g-h-i:j-k'
    'a_b_c:d-e-f:g-h-i:j-k'

    Update: Actually, the look-ahead portion of
        $s =~ s{ \G [^:]*? \K - (?= [^:]* :) }{_}xmsg;
    seems unnecessary. Changing the first inverted character class to [^:-] is simpler:
        $s =~ s{ \G [^:-]* \K - }{_}xmsg;
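A small sketch of how the simplified \G ... \K form behaves, wrapped in a sub whose name (fix_key) is made up for illustration. \G pins each attempt to where the previous match ended, [^:-]* can cross neither a colon nor a hyphen, and \K keeps everything before the hyphen out of the replacement. One difference from the look-ahead version, as far as I can tell: a line with no colon at all gets every hyphen replaced.

```perl
use strict;
use warnings;
use 5.010;    # \K needs Perl 5.10 or later

# fix_key is a hypothetical helper name used only for this sketch.
sub fix_key {
    my ($line) = @_;

    # Each match starts where the last one ended (\G), skips non-colon,
    # non-hyphen characters, then replaces only the hyphen itself (\K).
    # Once the next character to cross is a colon, the match fails and
    # the /g loop stops.
    $line =~ s{ \G [^:-]* \K - }{_}xmsg;
    return $line;
}

print fix_key("a-b-c:d-e-f:g-h-i:j-k\n");
```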


    Give a man a fish:  <%-{-{-{-<

      Thank you for the reply. I was hoping that because I'm using in-line modification from the command line, that it could somehow be a 1-liner rather than a for loop?

      To answer your question, yes, the colon is the delimiter.

        The for-loop is just for the purpose of showing an example with various strings. Concentrate on the s/// and what it's doing, and in particular, see the update. No reason the s/// couldn't just be dropped into something like the one-liner you show in the OP.



        I was hoping that because I'm using in-line modification from the command line, that it could somehow be a 1-liner
        REM Windows CMD prompt
        perl -F: -e "$F[0] =~ s/-/_/g; print join(':',@F);"

        # Linux/Unix/Cygwin shell
        perl -F: -e '$F[0] =~ s/-/_/g; print join(":",@F);'

        Disclaimer: Not tested.

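        Explanation: -F: sets the split pattern to ':'. Since Perl 5.20 it also turns on auto-split (-a) and auto-loop (-n); on older perls, pass -a and -n explicitly. Add -i and the file names to edit the files in place, as in the OP.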
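Roughly what the auto-split one-liner does for each line, written out as an ordinary sub so it can be tried without the command-line switches. This is only a sketch: the sub name fix_lines and the in-memory filehandle are assumptions for illustration, and the `if @F > 1` guard (skip lines without a colon) is an addition that the one-liner does not have.

```perl
use strict;
use warnings;

# fix_lines is a hypothetical helper name used only for this sketch.
sub fix_lines {
    my ($text) = @_;

    # Read the text line by line, standing in for the implicit -n loop.
    open my $in, '<', \$text or die "cannot open in-memory handle: $!";

    my $out = '';
    while ( my $line = <$in> ) {
        my @F = split /:/, $line;        # what -a with -F: does per line
        $F[0] =~ s/-/_/g if @F > 1;      # only the part before the first colon
        $out .= join ':', @F;            # put the colons back
    }
    close $in;
    return $out;
}

print fix_lines(qq{message-center-db: "http://message-center-db-1b.svc"\n});
```

Because the line is split on every colon, the value is cut into several fields too, but join(':', @F) glues them back unchanged.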