Klainn has asked for the wisdom of the Perl Monks concerning the following question:

I'm sure this is very simple, but my Perl is very weak, so I'm not really having much luck.

I have a string like this:

cd / ; /path/to/R/R_latest --vanilla --args "fName='rGSDPlan';jobCode=682718;jobId=6827181;" < job682718.R > job_6827181.txt

I want to translate every ; to && except where it falls between double quotes. I'm sure this is pretty simple, but I'm having no luck. I need to do this within a bash shell script, so I was looking to do an in-place replace on the string.

Any tips would be appreciated!

Re: In place replace, ignoring between quotes
by toolic (Bishop) on Oct 25, 2013 at 17:50 UTC
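    Brute force...
    use warnings;
    use strict;
    my $str = q(cd / ; /path/to/R/R_latest --vanilla --args "fName='rGSDPlan';jobCode=682718;jobId=6827181;" < job682718.R > job_6827181.txt);
    my $new;
    my $out = 1;                             # true while we are outside double quotes
    for (split //, $str) {                   # walk the string one character at a time
        $out = ! $out if /"/;                # every double quote flips the state
        $new .= (/;/ and $out) ? '&&' : $_;  # replace ';' only outside quotes
    }
    print "$new\n";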
      This is neat. This is an implementation of a state machine, right? In the debugger it looks like $out just undefines and redefines itself at every double quote. I don't understand how you can just say "equals not yourself". How does that work?
        Yes, this is a state machine. I spend a lot of time coding in Verilog, where toggling a bit is a natural part of the language. I guess this would be cleaner in Perl:
        $out = $out ? 0 : 1 if $_ eq '"';

        I do not know why Perl sets $out to undef.

        If $var is true, then !$var is false ('').
        If $var is false, then !$var is true (1).

        That new value is then assigned to $var.

        Alternatively, you could say $var ^= 1 to do the same sort of thing, just with 1 and 0 instead of 1 and ''.
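To see what the toggle actually stores, here is a small sketch (the sub toggle_history is hypothetical) that records the flag after each flip for both idioms mentioned above. It shows that `!` alternates between '' and 1 and never produces undef; '' is simply what that false value looks like when printed, which may be why it looked undefined in the debugger.

```perl
use strict;
use warnings;

# Record the flag after each flip, starting from 1.
# 'not' uses $flag = !$flag (flips between '' and 1);
# anything else uses $flag ^= 1 (flips between 0 and 1).
sub toggle_history {
    my ($idiom, $steps) = @_;
    my $flag = 1;
    my @seen;
    for (1 .. $steps) {
        if ($idiom eq 'not') { $flag = !$flag }
        else                 { $flag ^= 1 }
        push @seen, $flag;
    }
    return @seen;
}

my @not = toggle_history('not', 4);
my @xor = toggle_history('xor', 4);

# Quote each value so the empty string is visible.
print join(',', map { "'$_'" } @not), "\n";   # '','1','','1'
print join(',', map { "'$_'" } @xor), "\n";   # '0','1','0','1'
```

Either value works as the "outside quotes" flag, since only its truth is tested.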

Re: In place replace, ignoring between quotes
by LanX (Saint) on Oct 25, 2013 at 17:44 UTC
    > I'm sure this is very simple, but my Perl is very weak so I'm not really having much luck.

    Nope, it's not trivial.

    But a simple solution is three-phase regexing: extract all parts within quotes and replace them with a safe placeholder (using special characters like '§'), then do your translation, and at the end reinsert the extracted parts.

    Other approaches can be found by searching for "recursive parsing regexes", but IMHO they are overkill in this case.

    Cheers Rolf

    ( addicted to the Perl Programming Language)

    update

    proof of concept

    DB<131> $str=$str0
    => " bla0 \"ignore a1\" bla1 \"ignore a2\" bla2"
    DB<132> $x=0;$extract[$x++]=$& while $str =~ s/"[^"]*"/<$x>/
    => ""
    DB<133> @extract
    => ("\"ignore a1\"", "\"ignore a2\"")
    DB<134> $str =~ s/a/A/g
    => 3
    DB<135> $str
    => " blA0 <0> blA1 <1> blA2"
    DB<136> $str =~ s/<(\d+)>/$extract[$1]/g
    => 2
    DB<137> $str
    => " blA0 \"ignore a1\" blA1 \"ignore a2\" blA2"
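A minimal sketch of the same three-phase idea applied to the OP's string (the sub name stash_and_replace is hypothetical). It assumes the input contains no NUL characters, so "\x00N\x00" is used as the placeholder instead of '§', and that double quotes are balanced and unescaped.

```perl
use strict;
use warnings;

# Three phases:
#   1. stash every "..." run and leave a numbered placeholder behind,
#   2. translate ';' in what is left (no quoted text remains),
#   3. put the stashed quoted runs back.
# Assumption: the input never contains NUL, so "\x00N\x00" cannot clash.
sub stash_and_replace {
    my ($str) = @_;
    my @stash;

    # Phase 1: push returns the new element count, so count - 1 is the index.
    $str =~ s/("[^"]*")/"\x00" . (push(@stash, $1) - 1) . "\x00"/ge;

    # Phase 2: only unquoted text can still contain a ';'.
    $str =~ s/;/&&/g;

    # Phase 3: swap each placeholder back for its stashed quoted run.
    $str =~ s/\x00(\d+)\x00/$stash[$1]/g;

    return $str;
}

my $cmd = q{cd / ; /path/to/R/R_latest --vanilla --args "fName='rGSDPlan';jobCode=682718;jobId=6827181;" < job682718.R > job_6827181.txt};
print stash_and_replace($cmd), "\n";
```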
Re: In place replace, ignoring between quotes (one regex)
by LanX (Saint) on Oct 25, 2013 at 18:24 UTC
    Here is a pure regex solution, using the /e (eval) option to handle the two cases:
    DB<151> $str=$str0
    => " bla0 \"ignore a1\" bla1 \"ignore a2\" bla2"
    DB<152> $str =~ s/("[^"]*?"|a)/ $1 eq "a" ? "A" : $1 /ge
    => 5
    DB<153> $str
    => " blA0 \"ignore a1\" blA1 \"ignore a2\" blA2"

      Even shorter, and more general in usage:

      DB<229> $_=$str0
      => " bla0 \"ignore a1\" bla1 \"ignore a2\" bla2"
      DB<230> s/("[^"]*")|a/ $1 or 'A' /ge
      => 5
      DB<231> $_
      => " blA0 \"ignore a1\" blA1 \"ignore a2\" blA2"

      Or, for the OP:

      DB<226> $_=q{cd / ; /path/to/R/R_latest --vanilla --args "fName='rGSDPlan';jobCode=682718;jobId=6827181;" < job682718.R > job_6827181.txt}
      DB<227> s/("[^"]*")|;/ $1 or '&&' /ge
      => 2
      DB<228> $_
      => "cd / && /path/to/R/R_latest --vanilla --args \"fName='rGSDPlan';jobCode=682718;jobId=6827181;\" < job682718.R > job_6827181.txt"
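The same substitution wrapped in a reusable sub, as a sketch (the name replace_outside_quotes is hypothetical). It tests `defined $1` instead of `$1 or '&&'`. Both work here, because a captured quoted string always contains at least its two quote characters and is therefore never false, but `defined` states the intent directly. Since the engine scans left to right, a ';' inside quotes is always swallowed by the first alternative before the bare ';' branch can see it.

```perl
use strict;
use warnings;

# Either capture a whole "..." run (put back unchanged) or match a bare ';'
# (replaced with '&&').
sub replace_outside_quotes {
    my ($str) = @_;
    $str =~ s/("[^"]*")|;/ defined $1 ? $1 : '&&' /ge;
    return $str;
}

my $cmd = q{cd / ; /path/to/R/R_latest --vanilla --args "fName='rGSDPlan';jobCode=682718;jobId=6827181;" < job682718.R > job_6827181.txt};
print replace_outside_quotes($cmd), "\n";
```

From a bash script, the same substitution can be applied with something like `printf '%s\n' "$cmd" | perl -pe 's/("[^"]*")|;/ defined $1 ? $1 : "&&" /ge'`, where `$cmd` is a hypothetical shell variable holding the command.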

Re: In place replace, ignoring between quotes
by LanX (Saint) on Oct 25, 2013 at 18:42 UTC
    Last but not least:

    DB<171> $str=$str0
    => " bla0 \"ignore a1\" bla1 \"ignore a2\" bla2"
    DB<172> @parts=split '"',$str
    => (" bla0 ", "ignore a1", " bla1 ", "ignore a2", " bla2")
    DB<173> map {s/a/A/ unless $x++%2} @parts
    => (1, 1, 1, 1, 1)
    DB<174> $str=join '"',@parts
    => " blA0 \"ignore a1\" blA1 \"ignore a2\" blA2"

    Also as a one-liner:

    DB<185> $x=0;$str=$str0
    => " bla0 \"ignore a1\" bla1 \"ignore a2\" bla2"
    DB<186> $str= join '"', map {s/a/A/ unless $x++%2;$_} split '"', $str
    => " blA0 \"ignore a1\" blA1 \"ignore a2\" blA2"
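The split/join trick written out for the OP's ';', as a sketch (the sub name split_and_replace is hypothetical; balanced, unescaped double quotes are assumed). Two details differ from the debugger demo: s///g, so a piece holding several semicolons is fully translated, and a limit of -1 on split, because without it split drops trailing empty fields and a string ending in a closing quote would lose that quote on the way back through join.

```perl
use strict;
use warnings;

# After splitting on '"', even-indexed pieces lie outside double quotes
# and odd-indexed pieces lie inside them.
sub split_and_replace {
    my ($str) = @_;
    # Limit -1 keeps trailing empty fields so join restores every quote.
    my @parts = split /"/, $str, -1;
    for my $i (grep { $_ % 2 == 0 } 0 .. $#parts) {
        $parts[$i] =~ s/;/&&/g;    # /g: a piece may hold several ';'
    }
    return join '"', @parts;
}

print split_and_replace(q{a;b "c;d"}), "\n";   # a&&b "c;d"
```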

Re: In place replace, ignoring between quotes
by AnomalousMonk (Archbishop) on Oct 25, 2013 at 21:27 UTC

    This can also be done as a 'pure' regex (but without the need for /e evaluation) using the Special Backtracking Control Verbs of Perl 5.10+ (see perlre).

    >perl -wMstrict -le "my $s = q{cd / ; /path/latest --van --args \"fName='foo';jobCode=12;jobId=34;\" < j1.R > j1.txt}; print qq{'$s'}; ;; my $d_quo = qr{ \" [^^\"]* (?: \\. [^\"]*)* \" }xms; ;; $s =~ s{ $d_quo (*SKIP)(*FAIL) | ; }{&&}xmsg; print qq{'$s'}; "
    'cd / ; /path/latest --van --args "fName='foo';jobCode=12;jobId=34;" < j1.R > j1.txt'
    'cd / && /path/latest --van --args "fName='foo';jobCode=12;jobId=34;" < j1.R > j1.txt'

    Note: Without the escapology required by the Windoze command line, the $d_quo regex is
        my $d_quo = qr{ " [^"]* (?: \\. [^"]*)* " }xms;
    I hope that's a little more clear!
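A sketch of the same (*SKIP)(*FAIL) idea as a script rather than a Windows one-liner (the sub name skip_fail_replace and the test string are hypothetical). The negated classes are written [^"\\] here, the usual unrolled-loop form. With [^"]* the first class already consumes the backslash of an escaped quote, so the match ends at \" and the rest of the string is misread.

```perl
use strict;
use warnings;
use 5.010;    # (*SKIP) and (*FAIL) need Perl 5.10 or later

# A double-quoted string; excluding the backslash from the negated
# classes lets the \\. branch step over escapes such as \".
my $d_quo = qr{ " [^"\\]* (?: \\. [^"\\]* )* " }xs;

sub skip_fail_replace {
    my ($str) = @_;
    # When a quoted string matches, (*SKIP)(*FAIL) throws that match away
    # and resumes scanning after it, so only a ';' outside quotes is replaced.
    $str =~ s{ $d_quo (*SKIP)(*FAIL) | ; }{&&}xg;
    return $str;
}

print skip_fail_replace(q{a ; "b\";c" ; d}), "\n";   # a && "b\";c" && d
```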

Re: In place replace, ignoring between quotes
by Lennotoecom (Pilgrim) on Oct 25, 2013 at 19:05 UTC
    Following LanX's steps:
    $_ = q(aaa; ; bbb ; "ccc ; ddd ;" ee;e ";" fff ;);
    s/(".+;.+"|;)/$1 ne ';' ? $1 : '&&'/ge;
    outputs:
    aaa&& && bbb && "ccc ; ddd ;" ee;e ";" fff &&
    So it fits the author's task, as long as the input doesn't contain
    constructions like this:
    ; " ; " ; " ; " ;
    It seems to me that these should be processed with toggles =_=
    LanX's code for the author's task:
    $b = 0;
    $_ = q(aaa; ; bbb ; "ccc ; ddd ;" ee;e ";" fff ;);
    $a = join '', map {$b = !$b if /"/; s/;/&&/ unless $b; $_} split //;
    print "$a\n";
    output:
    aaa&& && bbb && "ccc ; ddd ;" ee&&e ";" fff &&
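To make the limitation concrete, a sketch running both regexes on the test string above. The greedy ".+;.+" starts at the first quote and, after backtracking, ends at the closing quote of ";", so the ';' in ee;e is never examined. A negated class "[^"]*" cannot cross a double quote, and gives the same result as the toggle version.

```perl
use strict;
use warnings;

my $input = q(aaa; ; bbb ; "ccc ; ddd ;" ee;e ";" fff ;);

# Greedy: one match runs from the first '"' to the last usable '"',
# hiding the unquoted ';' in ee;e between two quoted strings.
(my $greedy = $input) =~ s/(".+;.+"|;)/$1 ne ';' ? $1 : '&&'/ge;

# Negated class: each "..." run is matched on its own.
(my $bounded = $input) =~ s/("[^"]*")|;/ defined $1 ? $1 : '&&' /ge;

print "$greedy\n";    # aaa&& && bbb && "ccc ; ddd ;" ee;e ";" fff &&
print "$bounded\n";   # aaa&& && bbb && "ccc ; ddd ;" ee&&e ";" fff &&
```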
Re: In place replace, ignoring between quotes
by aitap (Curate) on Oct 26, 2013 at 07:52 UTC
    Another approach (not the fastest), relying on Text::ParseWords to split the command into "words" (hopefully) the way the shell does:
    $ perl -MText::ParseWords=quotewords -nle 'print join " ", map { $_ eq ";" ? "&&" : $_ } quotewords qr/\s+/, 1, $_'
    cd / ; /path/to/R/R_latest --vanilla --args "fName='rGSDPlan';jobCode=682718;jobId=6827181;" < job682718.R > job_6827181.txt
    cd / && /path/to/R/R_latest --vanilla --args "fName='rGSDPlan';jobCode=682718;jobId=6827181;" < job682718.R > job_6827181.txt
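The one-liner as a reusable sub, as a sketch (the name parsewords_replace is hypothetical). It also shows the trade-offs of working on words: runs of whitespace come back as single spaces, and a ';' only counts when it stands alone as a word, so cd /;ls is left untouched.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

# Split on whitespace roughly the way a shell would, keeping the quotes
# (second argument 1), then swap any word that is exactly ';' for '&&'.
sub parsewords_replace {
    my ($line) = @_;
    my @words = quotewords(qr/\s+/, 1, $line);
    return join ' ', map { $_ eq ';' ? '&&' : $_ } @words;
}

print parsewords_replace(q{cd / ; ls "a;b"}), "\n";   # cd / && ls "a;b"
print parsewords_replace(q{cd /;ls}), "\n";           # ';' is not a separate word: unchanged
```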
Re: In place replace, ignoring between quotes
by Anonymous Monk on Oct 25, 2013 at 22:30 UTC

    Look ahead for an even number of quotes:

    use 5.014;    # say needs 5.10+, the /r modifier needs 5.14+
    $_ = q[cd / ; /path/to/R/R_latest --vanilla --args "fName='rGSDPlan';jobCode=682718;jobId=6827181;" < job682718.R > job_6827181.txt];
    say s/;(?=[^"]*(?:"[^"]*"[^"]*)*$)/&&/gr;

    Output:

    cd / && /path/to/R/R_latest --vanilla --args "fName='rGSDPlan';jobCode=682718;jobId=6827181;" < job682718.R > job_6827181.txt
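The same look-ahead as a sub, as a sketch (the name lookahead_replace is hypothetical); it needs neither say nor /r. The look-ahead accepts a ';' only when the rest of the string breaks down into complete "..." pairs, that is, when an even number of double quotes follows. It assumes balanced, unescaped double quotes, and because each ';' rescans to the end of the string it can get slow on long inputs with many semicolons.

```perl
use strict;
use warnings;

sub lookahead_replace {
    my ($str) = @_;
    # ;            a semicolon...
    # (?= ... $)   ...followed by unquoted text and whole "..." pairs
    #              up to the end of the string
    $str =~ s/;(?=[^"]*(?:"[^"]*"[^"]*)*$)/&&/g;
    return $str;
}

print lookahead_replace(q{a ; "b;c" ; d}), "\n";   # a && "b;c" && d
```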
Re: In place replace, ignoring between quotes
by choroba (Cardinal) on Oct 26, 2013 at 06:48 UTC
    Just watch out for quoted double quotes:
    echo '"'
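A sketch illustrating this warning (the command string is made up). A double-quote-only regex sees the " inside '"' as the start of a quoted string, so it protects the wrong text. Treating '...' runs as quoted as well fixes this particular case; escaped quotes still need more work.

```perl
use strict;
use warnings;

my $cmd = q{echo '"' ; ls "a;b"};

# Double quotes only: the lone " inside '"' opens a bogus "..." run,
# so the wrong semicolon gets replaced.
(my $naive = $cmd) =~ s/("[^"]*")|;/ defined $1 ? $1 : '&&' /ge;

# Recognizing '...' strings too keeps the quoted double quote in place.
(my $aware = $cmd) =~ s/('[^']*'|"[^"]*")|;/ defined $1 ? $1 : '&&' /ge;

print "$naive\n";   # echo '"' ; ls "a&&b"
print "$aware\n";   # echo '"' && ls "a;b"
```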
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
      (continuing Re^2: In place replace, ignoring between quotes (one regex))

      This should handle single quotes, double quotes and escaping:

      DB<145> p $s3=q{ "no" yo \\' yo ' no \\\\\\' no ' yo }
       "no" yo \' yo ' no \\\' no ' yo
      DB<146> $_=$s3; s/ ( \\. | ' (?: \\.|[^'] )* ' | " (?: \\.|[^"] )* " ) | o / $1 or 'O' /xge ;
      => 6
      DB<147> p $_
       "no" yO \' yO ' no \\\' no ' yO

      Whether escapes outside of quotes need handling depends on the use case; if they don't, just drop the first alternative (\\.).

      Of course, both quote cases can be combined:

      DB<148> $_=$s3; s/ ( \\. | (['"]) (?: \\.|[^\2] )* \2 ) | o / $1 or 'O' /xge ;
      => 6
      DB<149> p $_
       "no" yO \' yO ' no \\\' no ' yO

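      I think it's a good example of how the regex engine does backtracking! =) Inside a character class \2 is not a backreference, so [^\2] does not exclude the opening quote: the greedy loop runs to the end of the string and only backtracks to the last matching quote. With two double-quoted strings on one line it would swallow the text between them.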

      update

      Maybe easier to reuse, with a non-greedy quantifier:

      DB<110> $_=$s3; s/ ( \\. | (['"]) (?: \\.|. )*? \2 ) | o / $1 or 'O' /xge ; print ; ()
       "no'" yO \' yO ' no \\\' no ' yO
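A sketch carrying the non-greedy pattern over to the OP's task, replacing ; instead of o (the sub name quote_aware_replace and the test strings are hypothetical). Like the debugger version, it treats a backslash escape outside quotes as one unit, so \; is left alone; drop the first alternative if that is not wanted.

```perl
use strict;
use warnings;

# Alternatives, tried left to right at each position:
#   \\.                        a backslash escape outside quotes (kept)
#   (['"]) (?: \\. | . )*? \2  a '...' or "..." string; \2 is the opening
#                              quote, and \\. steps over escaped quotes (kept)
#   ;                          a bare semicolon (replaced)
sub quote_aware_replace {
    my ($str) = @_;
    $str =~ s{ ( \\. | (['"]) (?: \\. | . )*? \2 ) | ; }
             { defined $1 ? $1 : '&&' }xgse;
    return $str;
}

print quote_aware_replace(q{echo '"' ; ls "a\";b" ; pwd}), "\n";   # echo '"' && ls "a\";b" && pwd
```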
