Amphiaraus has asked for the wisdom of the Perl Monks concerning the following question:

PROBLEMS BELOW APPEAR SOLVED

I have one problem left to solve, appears straightforward, but my code is doing incomplete job of getting rid of bogus "current dir" characters -- i.e. changing "/./" to "/"

n00b perlmonk humbly requests grizzled old perlmonks if they can help. What the heck is the correct terminology for a n00b perlmonk anyway?

My comment and code follow

# D) Rid input files of all instances of the unix current directory symbol "/./"
# Example: /vobs/synergy_core_apps/code/./././personalize/src would be changed to
# /vobs/synergy_core_apps/code/personalize/src

$file_vep_cleaned =~ s/\/\.\//\//g;

=======================

I am getting started in adding sed to my Perl programs and would appreciate some help with these 2 sed problems.

I need to write a number of sed statements to clean up errors from input files consisting of clearcase directories and files of the form:

/vobs/dirname1/dirname2/dirname3/filename
or
/vobs/dirname1/dirname2/dirname3

I have completed all the needed sed statements except for two that are needed to clean up the following errors:

1) Convert duplicated front-slash markers to single front-slash markers, i.e.

/vobs/dirname1////dirname2//dirname3////filename
becomes
/vobs/dirname1/dirname2/dirname3/filename

I have tried to replace 2 or more "/"'s with a single "/" using

$filename =~ s/\/[\/]+/\//;

but the above statement does nothing.

What Perl sed syntax might be used to make this transformation?

2) Another error is to rid my input file of all instances of the unix parent directory symbol ".." - i.e.

/vobs/synergy_core_apps/code/../code/personalize/src

would be changed to

/vobs/synergy_core_apps/code/personalize/src

Does anyone know how this might be done using Perl sed syntax?

Thank You,
Amphaiaraus

Replies are listed 'Best First'.
Re: 2 substitution problems
by parv (Parson) on Mar 28, 2009 at 07:31 UTC

    s/// stands for substitution operation, not sed. sed(1) is not synonymous to substitution, even though that may be its most used feature. Could you please update the title of your post to "2 substitution problems"?

    /vobs/dirname1////dirname2//dirname3////filename becomes /vobs/dirname1/dirname2/dirname3/filename

    I have tried to replace 2 or more "/"'s with a single "/" using

    $filename =~ s/\/[\/]+/\//;

    but the above statement does nothing

    That last statement is false, which you would have noticed if you had printed the file name before & after s/// operation. You would have seen that at least first set of consecutive '/' is squashed to single one. To flatten all the rest, all you need to do is globally replace by using /g flag ...

    # Used a delimiter other than '/' to avoid escaping '/'. $p =~ s!//+!/!g;

    As for your second problem, the subdirectory needs to be captured to reference in the pattern (and later in substitution) ...

    $p = '/vobs/synergy_core_apps/code/../code/personalize/src'; print 'before: ' , $p , "\n"; # Besides the /g flag, used /x in order to make the pattern stand out +. $p =~ s{ ([^/]+) /[.]{2}/ \1 } /$1/gx; print 'after: ' , $p , "\n";

    If the paths being cleansed actually exist on the file system, I would rather use &Cwd::realpath.

    See also: perlretut, perlre, YAPE::Regex::Explain, perlfaq6, Mastering Regular Expressions, etc.

      The  s{ ([^/]+) /[.]{2}/ \1 } /$1/gx; regex only handles the 'foo/../foo' case; it does not handle something like 'foo/bar/../../baz'.

      Purely as a mental exercise, a deprecated regex approach to handling repeated return to a parental directory might be something like:

      >perl -wMstrict -le "my $parent = qr{ \.\. / }xms; my $dir = qr{ (?> [^/]+ /) (?<! $parent) }xms; print 'output:'; for my $f (@ARGV) { 1 while $f =~ s{ $dir $parent }{}xmsg; print $f } " /vobs/foo/../foo/bar/me/my/../../me/my/moo /vobs/foo/../bar/me/my/../../ma/mo/moo /vobs/x/y/z/../../../x/y/z/filename /vobs/x/y/z/../../../a/b/c/filename /a/../b/c/d/../../e/f/g/h/i/../../../j/k/l/m/n/o/p/../../../../q output: /vobs/foo/bar/me/my/moo /vobs/bar/ma/mo/moo /vobs/x/y/z/filename /vobs/a/b/c/filename /b/e/f/j/k/l/q
      However, this falls into the class of Stupid Regex Tricks because:
      • I doubt it covers all possible variations of *nix path;
      • Even if it covers all such variations (or the subset thereof that the OPer is interested in), extensive testing would be needed to verify this;
      • It does not handle symlinks (see Re^3: 2 substitution problems below);
      • It is not portable to other file systems;
      • And finally, I'm sure there are a few other objections one could think of given a little time.

      As Amphiaraus said, the preferred solution would be some module like Cwd that would overcome all these objections.

      Update: Added symlink objection to list of regex solution objections per parv.

        Given a path actually exists, problem with a solution not actually changing directories is specified in File::Spec/METHODS ...

        canonpath

        No physical check on the filesystem, but a logical cleanup of a path.

        $cpath = File::Spec->canonpath( $path ) ;

        Note that this does *not* collapse x/../y sections into y. This is by design. If /foo on your system is a symlink to /bar/baz, then /foo/../quux is actually /bar/quux, not /quux as a naive ../-removal would give you. If you want to do this kind of processing, you probably want "Cwd"'s "realpath()" function to actually traverse the filesystem cleaning up paths like this.

        Otherwise, OP should test|improve on AnomalousMonk's and/or graff's solutions.

Re: 2 sed problems
by graff (Chancellor) on Mar 28, 2009 at 14:36 UTC
    TMTOWTDI:
    $_ = my $original = "/one/two///three/four/..//fourX/five////six/seven +"; # you can do job 1 alone like this: tr!/!/!s; print "1: $original -> $_\n"; # OR you can do jobs 1 and 2 together like this; my @srcpaths = split m!/+!, $original; my @outpaths = (); for my $p ( @srcpaths ) { if ( $p eq '..' ) { pop @outpaths; } else { push @outpaths, $p; } } $_ = join( "/", @outpaths ); print "2: $original -> $_\n";
    The latter is more verbose, but it's easier to see and understand what's being done; also, it will work on things like /one/two/three/../../two/three (which the regex solutions suggested above don't handle). If you want, you can even test for existence as you go (with -d join("/",@outpaths) (*)), and test for paths that are just wrong (too many "/.."):
    pop @outpaths or die "Bad path: $original\n";

    The regex solutions were assuming you only want to remove the "/.." in cases where the next part of the path matched the previous part of the path, but it seems to me that you really should have a solution that handles all the "/.." cases -- it's entirely possible that you'll run into a lot of different ones.

    (*) Update: footnote about -d join("/",@outpaths) -- with an absolute path like "/one/two", $p is an empty string on the first iteration, and you don't want to use "-d" on that. Putting full error checks into the loop would be something like this:

    my @outpath = (); for my $p ( split m!/+!, $original ) { if ( $p eq '..' ) { pop @outpath or die "Bad path: $original"; } else { if ( @outpath ) { my $chk = join "/", @outpath, $p; die "Path not found: $chk" unless ( -d $chk ); } push @outpath, $p; } } my $goodpath = join "/", @outpath;

      If the original path started out with ../, program will die too soon for no good reason. In this case, ../ should be preserved.

        Yup, good point. And the best way to fix it would be to "use Cwd", which makes all that code unnecessary anyway:
        #!/usr/bin/perl use Cwd qw/abs_path/; print abs_path( $_ ) for ( @ARGV );
        That prints an empty string (no error message) whenever any non-final path component is non-existent or unreadable (due to permissions).
Re: 2 sed problems
by Anonymous Monk on Mar 28, 2009 at 07:26 UTC
    Hi,
    Try this..
    First problem:   $var=~ s/\/+/\//g;

    Second Problem:   $var=~s/(.*?)\/(.*?)\/\.\.\/\2\/(.*)/$1\/$2\/$3/g;