flightdm has asked for the wisdom of the Perl Monks concerning the following question:

Greetings, Merry Monks of the Monastery - I'm stumped trying to figure out a regex for this, and it's a bit of a surprise to me; I've been merrily regexing for years, but I've come a sudden cropper here. I'd appreciate some help. I need to replace all tabs in a line that are not a) next to a double-quote, b) at the beginning of the line, or c) at the end of the line. Now, I could say something nasty like
s/(?:^\t|\t$)/-=-|1|-=-/g;
s/([^"])(?:\t")/$1-=-|2|-=-/g;
s/(?:"\t)([^"])/-=-|3|-=-$1/g;
s/"\t"/-=-|4|-=-/g;
s/\t//g;
#...and then the obvious Opposite Thing
but... my Perlish soul yearns for something more - you know, prettier. Isn't there some magic look-around regex that could do it all in one pass? I've been fiddling and fiddling with it, well past the point of practicality, and now I just want to know if it's doable.

Re: Regex for replacing a character "not next to" another character
by BrowserUk (Patriarch) on Nov 04, 2016 at 01:42 UTC

    Try this:

    $s = "\tfred\tx\t\"abc\tdef\"\ty\tz\t";;
    print "'$s'";;
    ' fred x "abc def" y z '
    $s =~ s[(?<!^)(?<!")\t(?!"|$)][***]g;;
    print "'$s'";;
    ' fred***x "abc***def" y***z '

    It didn't replace the tab at the beginning of the string, the one preceding the first double quote or following the second, nor the one at the end.

    Expanded, the regex is:

    s[
        (?<!^)      # not preceded by the start of string
        (?<!")      # nor by a double quote
        \t          # replace tabs
        (?!"|$)     # that are also not followed by a double quote or the end of string
    ][***]gx;
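For anyone who wants to run this outside a REPL, here is a minimal self-contained sketch of the same substitution. The sub name `mark_inner_tabs` and the visible-escape printing are assumptions added for illustration; only the pattern itself comes from the reply above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a copy of $line with every "inner" tab
# replaced by ***, using the expanded pattern from the reply above.
sub mark_inner_tabs {
    my ($line) = @_;
    (my $out = $line) =~ s{
        (?<!^)      # not at the start of the string
        (?<!")      # not right after a double quote
        \t          # a tab
        (?!"|$)     # not right before a double quote or the end of the string
    }{***}gx;
    return $out;
}

my $in  = "\tfred\tx\t\"abc\tdef\"\ty\tz\t";
my $out = mark_inner_tabs($in);

# Show the remaining tabs as visible \t escapes.
(my $visible = $out) =~ s/\t/\\t/g;
print "$visible\n";
# prints: \tfred***x\t"abc***def"\ty***z\t
```

The two look-behinds are written one after the other, so both must hold at the same position; this is how the pattern expresses "neither at the start nor after a quote".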

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". The enemy of (IT) success is complexity.
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Outstanding! Thanks much, especially for the detailed explanation. Worked perfectly!
Re: Regex for replacing a character "not next to" another character
by tybalt89 (Monsignor) on Nov 04, 2016 at 01:36 UTC
    s/(?<=[^"])\t(?!"|$)//g

    If I understand your requirements...

      Hi

      Probably I still need more coffee ... ;)

      ... but shouldn't this  look-behind assertion

      >  (?<=[^"])

      rather be negated?

       (?<![^"])

      edit

      Never mind.

      Indeed, not enough coffee: you are mixing two approaches, which confused me.

      UPDATE

      There is a limitation in your approach when dealing with multiple lines (though the OP didn't explicitly ask for this).

      DB<135> $str = "\tstart\tmiddle1\t\"quote1\tquote2\"\tmiddle2\tend\t";
      => "\tstart\tmiddle1\t\"quote1\tquote2\"\tmiddle2\tend\t"

      DB<136> $str .= "\n$str"
      => "\tstart\tmiddle1\t\"quote1\tquote2\"\tmiddle2\tend\t\n\tstart\tmiddle1\t\"quote1\tquote2\"\tmiddle2\tend\t"

      DB<137> p $str
       start middle1 "quote1 quote2" middle2 end
       start middle1 "quote1 quote2" middle2 end
      => 1

      DB<138> p $str =~ s/ (?<=[^"]) \t (?!"|$) /***/gmxr
       start***middle1 "quote1***quote2" middle2***end
      ***start***middle1 "quote1***quote2" middle2***end
      => 1

      DB<139> p $str =~ s/ (?<!")(?<!^) \t (?!"|$) /***/gmxr
       start***middle1 "quote1***quote2" middle2***end
       start***middle1 "quote1***quote2" middle2***end
      => 1

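      You filter tabs "preceded by any character which isn't a quote" with (?<=[^"]), supposing that a line start is not preceded by any character. In a multi-line string, though, the tab at the start of the second line is preceded by a newline, which is a character that isn't a quote.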

      As you can see BUK's approach still works in this case.
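The difference can be reproduced outside the debugger with a short script. This is only a sketch: the two-line sample string and the variable names are made up for illustration, while the two patterns are the ones compared above, both run with /m.

```perl
use strict;
use warnings;

# Two identical lines, each with a leading, an inner, and a trailing tab.
my $line = "\ta\tb\t";
my $text = "$line\n$line";

# Positive look-behind: some character that is not a quote precedes the tab.
(my $positive = $text) =~ s/(?<=[^"])\t(?!"|$)/***/gm;

# Two negative look-behinds: neither a quote nor a line start precedes the tab.
(my $negative = $text) =~ s/(?<!")(?<!^)\t(?!"|$)/***/gm;

# Show tabs and newlines as visible escapes.
for my $result ($positive, $negative) {
    (my $visible = $result) =~ s/\t/\\t/g;
    $visible =~ s/\n/\\n/g;
    print "$visible\n";
}
# prints: \ta***b\t\n***a***b\t
# prints: \ta***b\t\n\ta***b\t
```

In the first result the tab that opens the second line is replaced, because the newline before it satisfies [^"]. In the second result it survives, because under /m the ^ inside the negative look-behind matches right after the newline.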

      FWIW:

      At first it confused me that (?<!"|^) wasn't used, but the regex engine rejects "variable length look-behind assertions" (which is not really the case here).

      DB<140> p $str =~ s/ (?<!"|^) \t (?!"|$) /***/gmxr
      Variable length lookbehind not implemented in regex m/ (?<!"|^) \t (?!"|$) / at (eval 110)[multi_perl5db.pl:644] line 2.

      Cheers Rolf
      (addicted to the Perl Programming Language and ☆☆☆☆ :)
      Je suis Charlie!

        For a multi-line string, simply add \n to the negated character group:

        s/(?<=[^"\n])\t(?!"|$)//gm

        and of course a /m for the $
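A quick way to check the adjusted pattern on a multi-line string is sketched below. The sample text and variable names are assumptions for illustration, and *** is used as the replacement instead of the empty string so that the hits stay visible.

```perl
use strict;
use warnings;

# Two identical lines with leading, inner, quote-adjacent, and trailing tabs.
my $line = "\ta\tb\t\"c\td\"\te\t";
my $text = "$line\n$line";

# \n in the negated class stops the newline from counting as
# "a character that is not a quote"; /m lets $ match before each newline.
(my $fixed = $text) =~ s/(?<=[^"\n])\t(?!"|$)/***/gm;

# Show tabs and newlines as visible escapes.
(my $visible = $fixed) =~ s/\t/\\t/g;
$visible =~ s/\n/\\n/g;
print "$visible\n";
# prints: \ta***b\t"c***d"\te\t\n\ta***b\t"c***d"\te\t
```

Both lines come out the same: only the tabs between a and b and between c and d are replaced, while the leading, trailing, and quote-adjacent tabs are left alone.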