PerlMonks  

Re^3: How to replace envs in path?

by BillKSmith (Monsignor)
on May 22, 2022 at 23:07 UTC


in reply to Re^2: How to replace envs in path?
in thread How to replace envs in path?

I really do not understand your test cases. Please post code that we can run to reproduce both your successes and your failures (and tell the difference). Note how my example used Test::More to show that my function did exactly what I expected (perhaps not what you wanted, but you can tell). You have a good start with your pass and fail test cases. Unfortunately, we cannot tell whether they are single- or double-quoted strings, or whether you intend to include newlines. The same applies to your expected results. Are backslashes literal or are they escapes? The result fragments you posted are much harder to test than complete strings.
Bill

Replies are listed 'Best First'.
Re^4: How to replace envs in path?
by ovedpo15 (Pilgrim) on May 23, 2022 at 06:14 UTC
    Hey Bill! The input of that sub is a path so under Unix it could be both - escaped or not escaped. Consider the following example:

    set playground="${HOME}/playground"
    set file1=${playground}'/fi$le1'
    set file2=${playground}'/file2$'
    set file3=${playground}'/f$i$l$e$3$'
    set file4=${playground}'/fi\$le4'
    mkdir -p ${playground}
    touch ${file1}
    touch ${file2}
    touch ${file3}
    touch ${file4}

    It contains different use cases. If you take a look at them, you can see that:

    ls -la $HOME/playground
    total 8
    drwxr-s--- 2 root root 4096 May 22 06:26 .
    drwxr-s--- 7 root root 4096 May 22 06:26 ..
    -rw-r----- 1 root root    0 May 22 06:26 f$i$l$e$3$
    -rw-r----- 1 root root    0 May 22 06:26 fi$le1
    -rw-r----- 1 root root    0 May 22 06:26 fi\$le4
    -rw-r----- 1 root root    0 May 22 06:26 file2$

    The two most interesting ones are file1 and file4. The name of file1 contains an unescaped $, while the name of file4 contains a backslash before the $. I can even go further and create something like:

    > touch $HOME/playground/fi\\\\\$le5
    > ls -la $HOME/playground/
    total 8
    drwxr-s--- 2 root root 4096 May 22 22:57 .
    drwxr-s--- 7 root root 4096 May 22 06:26 ..
    -rw-r----- 1 root root    0 May 22 06:26 f$i$l$e$3$
    -rw-r----- 1 root root    0 May 22 06:26 fi$le1
    -rw-r----- 1 root root    0 May 22 06:26 fi\$le4
    -rw-r----- 1 root root    0 May 22 22:57 fi\\$le5
    -rw-r----- 1 root root    0 May 22 06:26 file2$

    All of them are considered valid paths. My utility gets those paths and should check if there is a defined env in that path and if so, replace it. So the current algorithm is:
    - While there is a substring in the path that starts with $ and does not have backslash before the $, do:
    -- If the env is defined, replace it.
    -- Otherwise, escape the $ symbol with a backslash to indicate that it does not have a defined env (otherwise you will get an infinite loop).
    - Remove all the backslashes that escaped $.
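
The steps above can be sketched in Perl as follows. This is only a minimal sketch under stated assumptions, not the poster's actual code: the name expand_path is hypothetical, a hash reference stands in for %ENV, variable values are assumed not to contain a $ themselves, and "no backslash before the $" is written as a negative lookbehind.

```perl
use strict;
use warnings;

# Hypothetical helper: $env is a hash reference standing in for %ENV.
# Assumes variable values do not themselves contain '$'.
sub expand_path {
    my ($path, $env) = @_;

    # Step 1: handle each $NAME or ${NAME} that has no backslash before the $.
    while ( $path =~ /(?<!\\)\$\{?(\w+)\}?/ ) {
        my ($name, $start, $len) = ($1, $-[0], $+[0] - $-[0]);
        if ( defined $env->{$name} ) {
            # Defined: replace the whole reference with its value.
            substr($path, $start, $len, $env->{$name});
        }
        else {
            # Undefined: insert a backslash so the loop skips it next time.
            substr($path, $start, 0, '\\');
        }
    }

    # Step 2: remove every backslash that escapes a $.
    $path =~ s/\\\$/\$/g;
    return $path;
}

print expand_path('/home/$USER/fi$le1', { USER => 'bill' }), "\n";  # /home/bill/fi$le1
print expand_path('fi\$le4', {}), "\n";                              # fi$le4
```

The second call shows the corner case: the backslash that was part of the original file name is removed in step 2 together with the ones the loop added.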

    So the problem with that algorithm is a path where the $ symbol is already escaped. In other words, it cannot distinguish between the escaping added by the algorithm and escaping that was in the original path. It's a really rare corner case, but it could happen, and I'm wondering how to handle it.
    The one thing that came to my mind to solve it is to add some rare string instead of just the backslash. For example, as you suggested, add UNSPECIFIED instead of the backslash. So you get something like:

    drwxr-s--- 2 root root 4096 May 22 22:57 .
    drwxr-s--- 7 root root 4096 May 22 06:26 ..
    -rw-r----- 1 root root    0 May 22 06:26 fUNSPECIFIED$iUNSPECIFIED$lUNSPECIFIED$eUNSPECIFIED$3UNSPECIFIED$
    -rw-r----- 1 root root    0 May 22 06:26 fiUNSPECIFIED$le1
    -rw-r----- 1 root root    0 May 22 06:26 fi\UNSPECIFIED$le4
    -rw-r----- 1 root root    0 May 22 22:57 fi\\UNSPECIFIED$le5
    -rw-r----- 1 root root    0 May 22 06:26 file2UNSPECIFIED$

    And then just remove every occurrence of UNSPECIFIED. But I see two problems with it:
    1. Of course, if a user creates a file that contains UNSPECIFIED in the path, it will break. But if there is no better solution that handles 100% of the cases, then it will do.
    2. My regex is [^\\]\$\{?(\w+)\}?. The [^\\] is to say "every char except backslash". How can I say "every substring that starts with $ and does not have UNSPECIFIED before it"? In other words, how do I change the regex to support UNSPECIFIED instead of a backslash?
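
One way to express "not preceded by UNSPECIFIED" in a Perl regex is a negative lookbehind, since the marker has a fixed length. The sketch below is illustrative only; the names $UNMARKED_VAR and first_unmarked_var are hypothetical. Unlike [^\\], a lookbehind consumes no character, so it also matches a variable at the very start of the string.

```perl
use strict;
use warnings;

# Hypothetical pattern: $NAME or ${NAME} whose $ is not directly
# preceded by the marker string UNSPECIFIED.
my $UNMARKED_VAR = qr/(?<!UNSPECIFIED)\$\{?(\w+)\}?/;

# Hypothetical helper: returns the name of the first unmarked variable,
# or undef if the path contains none.
sub first_unmarked_var {
    my ($path) = @_;
    return $path =~ $UNMARKED_VAR ? $1 : undef;
}

for my $path ( 'fi$le1', 'fiUNSPECIFIED$le1', '$HOME/x' ) {
    my $name = first_unmarked_var($path);
    print "$path: ", ( defined $name ? "variable '$name'" : 'no unmarked variable' ), "\n";
}
# fi$le1: variable 'le1'
# fiUNSPECIFIED$le1: no unmarked variable
# $HOME/x: variable 'HOME'
```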
      Please excuse me for not answering your latest question. Based on this new info, my understanding of your real requirement has changed. I now believe that you want to replace all UNIX variable references in a UNIX pathname with their values, the same as UNIX does, with one exception: if the variable is not defined, you want to leave the reference in place (UNIX would replace it with a null string). This view of the requirements suggests a much different solution.

      use strict;
      use warnings;
      use Test::More tests => 5;

      my %ENV = (          # Overrides the global for this test
          playground => 'myplay',
          HOME       => 'myhome',
          le1        => 'fix_for_file1'
      );
      my @expected = (
          q(set playground="myhome/playground")."\n",
          q(set file1=myplay'/fifix_for_file1')."\n",
          q(set file2=myplay'/file2$')."\n",
          q(set file3=myplay'/f$i$l$e$3$')."\n",
          q(set file4=myplay'/fi\$le4')."\n",
      );
      #                         look
      #                        behind   |------ $1 ------|
      my $with_braces    = qr/ (?<!\\)  ( \$ \{ (\w+) \} ) /x;
      my $without_braces = qr/ (?<!\\)  ( \$    (\w+)    ) /x;
      #                                         |$2 |
      while (<DATA>) {
          s!$with_braces   !$ENV{$2}//$1!xge;
          s!$without_braces!$ENV{$2}//$1!xge;
          is( $_ , shift @expected, $_ );
      }
      __DATA__
      set playground="${HOME}/playground"
      set file1=${playground}'/fi$le1'
      set file2=${playground}'/file2$'
      set file3=${playground}'/f$i$l$e$3$'
      set file4=${playground}'/fi\$le4'

      OUTPUT:

      1..5
      ok 1 - set playground="myhome/playground"
      #
      ok 2 - set file1=myplay'/fifix_for_file1'
      #
      ok 3 - set file2=myplay'/file2$'
      #
      ok 4 - set file3=myplay'/f$i$l$e$3$'
      #
      ok 5 - set file4=myplay'/fi\$le4'
      #
      Bill
        #                         look
        #                        behind   |------ $1 ------|
        my $with_braces    = qr/ (?<!\\)  ( \$ \{ (\w+) \} ) /x;
        my $without_braces = qr/ (?<!\\)  ( \$    (\w+)    ) /x;
        #                                         |$2 |

        Note that this use of lookbehind is generally not a safe way to look for an unescaped $, because generally when backslashes are used for escaping, they are also used for escaping backslashes themselves: \\$HOME would then be considered an escaped backslash followed by a replaceable variable reference.

        In this context, it's less obvious what is correct since backslashes are not generally being used for escaping. As such, I'm not sure that the spec is coherent as a whole - it looks like there would be no way to express a string that should have one or more actual backslashes followed by a variable.

        Hugo
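
For the conventional case described above, where a backslash can also escape a backslash, a common alternative to lookbehind is to consume each backslash-plus-character pair as a unit in the same alternation that matches variable references. The following is a minimal sketch under that assumption, not something the thread settled on. The name expand_vars is hypothetical, a hash reference stands in for %ENV, escape pairs are left in the output unchanged, and undefined variables are kept as written.

```perl
use strict;
use warnings;

# Hypothetical helper: expands $NAME and ${NAME}, treating a backslash
# plus the following character as an opaque escape pair.
sub expand_vars {
    my ($str, $env) = @_;

    # Capture 1: an escape pair (backslash + any character), kept as is.
    # Capture 2: a whole variable reference; 3 or 4 holds its name.
    $str =~ s{ (\\.) | ( \$ (?: \{ (\w+) \} | (\w+) ) ) }{
        defined $1 ? $1 : ( $env->{ $3 // $4 } // $2 )
    }gsex;

    return $str;
}

my %env = ( HOME => 'myhome' );
print expand_vars( '\$HOME',         \%env ), "\n";  # \$HOME
print expand_vars( '\\\\$HOME',      \%env ), "\n";  # \\myhome
print expand_vars( '${HOME}/fi$le1', \%env ), "\n";  # myhome/fi$le1
```

Because the scan moves left to right and an escape pair always uses up the character after the backslash, an even number of backslashes before a $ leaves the variable replaceable and an odd number leaves it escaped.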
