r has asked for the wisdom of the Perl Monks concerning the following question:

Hi out there,
which of you can solve the following problem?

Our goal is to read from an unformatted text file and print it out with every run of exactly two identical consecutive characters enclosed in parentheses. The condition is to do it in a one-line Perl script, if possible using a regex (and not split, because that's too easy).

If our input file were:

sssshjfdhgriuu
xxddvvggggaaaiii
ougur9s  oknddpp
hgorsagnnnnccc
nkjl(())uh

... the output should be:

sssshjfdhgri(uu)
(xx)(dd)(vv)ggggaaaiii
ougur9s(  )okn(dd)(pp)
hgorsagnnnnccc
nkjl((()()))uh

One of the non-working (and more than one line) attempts:
while (<>) { s/(.)((?!\1).)\2(?!\2)/$1\($2$2\)$3/g; s/^(.)\1(?!\1)/\($1$1\)$2/; print; }
Looking forward to your answers,
r

Replies are listed 'Best First'.
Re: How to do it in one line?
by Courage (Parson) on Jul 03, 2002 at 22:40 UTC
    seems like homejob, because it's useless but tuitfull. Anyway (on Win32 shell):
    perl -wne "s/(.)\1+(?!\1)/length($&)>2?qq[$&]:qq[($1$1)]/ge;print" file.txt
    on Unix just
    perl -wne 's/(.)\1+(?!\1)/length($&)>2?qq[$&]:qq[($1$1)]/ge;print' file.txt
    Got same result on your input!

    Courage, the Cowardly Dog.

      Same basic idea, but I shaved a few chars off of it....
      #        1         2         3         4
      #2345678901234567890123456789012345678901234
      perl -pe's/(.)\1(\1*)(?!\1)/$2?$&:"($&)"/ge' file.txt

      -Blake

        Thanks a lot, although it is not a homejob ;-) The question is whether it's possible using a regex exclusively (no ternary operator).

        Explore the unknown,
        know the known.

        -r
Re: How to do it in one line?
by Abigail-II (Bishop) on Jul 04, 2002 at 11:03 UTC
    Not the shortest possible answer, but without using /e:
    perl -pe's/(\G|(.)(?!\2))((.)\4)(?!\4)/$2($3)/g' file

    Abigail

      I don't suppose that someone would consider explaining how this works for me?

      I spent a couple of hours on trying to understand this in conjunction with perlman:perlre and I am frankly lost.

        s/(\G            # Either where we finished the previous time
                         # (or at the beginning the first time).
           |             # Or
           (.)(?!\2))    # A character ($2) not followed by itself
          (              # Capture ($3)
           (.)\4(?!\4)   # A character ($4) followed exactly once by itself.
          )              # End capture ($3)
         /$2($3)/xg      # Replace with $2 (what preceded) followed by
                         # "($3)" (the duplicate).
        It boils down to this: if you have exactly two occurrences of a character, something else should precede those characters, or they should be at the beginning of the string (or where we left off the previous time).

        Abigail