Yet another substitution problem

Amphiaraus has asked for the wisdom of the Perl Monks concerning the following question:

PROBLEMS BELOW APPEAR SOLVED

I have one problem left to solve, appears straightforward, but my code is doing incomplete job of getting rid of bogus "current dir" characters -- i.e. changing "/./" to "/"

n00b perlmonk humbly requests grizzled old perlmonks if they can help. What the heck is the correct terminology for a n00b perlmonk anyway?

My comment and code follow

# D) Rid input files of all instances of the unix current directory symbol "/./"
# Example: /vobs/synergy_core_apps/code/./././personalize/src would be changed to
# /vobs/synergy_core_apps/code/personalize/src

$file_vep_cleaned =~ s/\/\.\//\//g;

=======================

I am getting started in adding sed to my Perl programs and would appreciate some help with these 2 sed problems.

I need to write a number of sed statements to clean up errors from input files consisting of clearcase directories and files of the form:

/vobs/dirname1/dirname2/dirname3/filename
or
/vobs/dirname1/dirname2/dirname3

I have completed all the needed sed statements except for two that are needed to clean up the following errors:

1) Convert duplicated front-slash markers to single front-slash markers, i.e.

/vobs/dirname1////dirname2//dirname3////filename
becomes
/vobs/dirname1/dirname2/dirname3/filename

I have tried to replace 2 or more "/"'s with a single "/" using

$filename =~ s/\/[\/]+/\//;

but the above statement does nothing.

What Perl sed syntax might be used to make this transformation?

2) Another error is to rid my input file of all instances of the unix parent directory symbol ".." - i.e.

/vobs/synergy_core_apps/code/../code/personalize/src

would be changed to

/vobs/synergy_core_apps/code/personalize/src

Does anyone know how this might be done using Perl sed syntax?

Thank You,
Amphaiaraus

Comment on Yet another substitution problem Download Code

Replies are listed 'Best First'.
Re: 2 substitution problems by parv (Parson) on Mar 28, 2009 at 07:31 UTC
s/// stands for substitution operation, not sed. sed(1) is not synonymous to substitution, even though that may be its most used feature. Could you please update the title of your post to "2 substitution problems"? `/vobs/dirname1////dirname2//dirname3////filename` becomes `/vobs/dirname1/dirname2/dirname3/filename` I have tried to replace 2 or more "/"'s with a single "/" using `$filename =~ s/\/[\/]+/\//;` [download] but the above statement does nothing That last statement is false, which you would have noticed if you had printed the file name before & after `s///` operation. You would have seen that at least first set of consecutive '/' is squashed to single one. To flatten all the rest, all you need to do is globally replace by using `/g` flag ... `# Used a delimiter other than '/' to avoid escaping '/'. $p =~ s!//+!/!g;` [download] As for your second problem, the subdirectory needs to be captured to reference in the pattern (and later in substitution) ... `$p = '/vobs/synergy_core_apps/code/../code/personalize/src'; print 'before: ' , $p , "\n"; # Besides the /g flag, used /x in order to make the pattern stand out +. $p =~ s{ ([^/]+) /[.]{2}/ \1 } /$1/gx; print 'after: ' , $p , "\n";` [download] If the paths being cleansed actually exist on the file system, I would rather use &Cwd::realpath. See also: perlretut, perlre, YAPE::Regex::Explain, perlfaq6, Mastering Regular Expressions, etc.	[reply] [d/l] [select]
Re^2: 2 sed problems by AnomalousMonk (Archbishop) on Mar 28, 2009 at 15:23 UTC
The `s{ ([^/]+) /[.]{2}/ \1 } /$1/gx;` regex only handles the `'foo/../foo'` case; it does not handle something like `'foo/bar/../../baz'`. Purely as a mental exercise, a deprecated regex approach to handling repeated return to a parental directory might be something like: >perl -wMstrict -le "my $parent = qr{ \.\. / }xms; my $dir = qr{ (?> [^/]+ /) (?<! $parent) }xms; print 'output:'; for my $f (@ARGV) { 1 while $f =~ s{ $dir $parent }{}xmsg; print $f } " /vobs/foo/../foo/bar/me/my/../../me/my/moo /vobs/foo/../bar/me/my/../../ma/mo/moo /vobs/x/y/z/../../../x/y/z/filename /vobs/x/y/z/../../../a/b/c/filename /a/../b/c/d/../../e/f/g/h/i/../../../j/k/l/m/n/o/p/../../../../q output: /vobs/foo/bar/me/my/moo /vobs/bar/ma/mo/moo /vobs/x/y/z/filename /vobs/a/b/c/filename /b/e/f/j/k/l/q [download] However, this falls into the class of Stupid Regex Tricks because: I doubt it covers all possible variations of nix path; Even if it covers all such variations (or the subset thereof that the OPer is interested in), extensive testing would be needed to verify this; It does not handle symlinks (see Re^3: 2 substitution problems below); It is not portable to other file systems; And finally, I'm sure there are a few other objections one could think of given a little time. As Amphiaraus said, the preferred solution would be some module like Cwd that would overcome all these objections. Update:* Added symlink objection to list of regex solution objections per parv.	[reply] [d/l] [select]
Re^3: 2 substitution problems by parv (Parson) on Mar 28, 2009 at 16:24 UTC
Given a path actually exists, problem with a solution not actually changing directories is specified in File::Spec/METHODS ... canonpath No physical check on the filesystem, but a logical cleanup of a path. `$cpath = File::Spec->canonpath( $path ) ;` [download] Note that this does not collapse x/../y sections into y. This is by design. If /foo on your system is a symlink to /bar/baz, then /foo/../quux is actually /bar/quux, not /quux as a naive ../-removal would give you. If you want to do this kind of processing, you probably want "Cwd"'s "realpath()" function to actually traverse the filesystem cleaning up paths like this. Otherwise, OP should test\|improve on AnomalousMonk's and/or graff's solutions.	[reply] [d/l]
Re^4: 2 substitution problems by AnomalousMonk (Archbishop) on Mar 28, 2009 at 22:11 UTC
Re^5: 2 substitution problems by parv (Parson) on Mar 29, 2009 at 03:04 UTC
Re: 2 sed problems by graff (Chancellor) on Mar 28, 2009 at 14:36 UTC
TMTOWTDI: `$_ = my $original = "/one/two///three/four/..//fourX/five////six/seven +"; # you can do job 1 alone like this: tr!/!/!s; print "1: $original -> $_\n"; # OR you can do jobs 1 and 2 together like this; my @srcpaths = split m!/+!, $original; my @outpaths = (); for my $p ( @srcpaths ) { if ( $p eq '..' ) { pop @outpaths; } else { push @outpaths, $p; } } $_ = join( "/", @outpaths ); print "2: $original -> $_\n";` [download] The latter is more verbose, but it's easier to see and understand what's being done; also, it will work on things like `/one/two/three/../../two/three` (which the regex solutions suggested above don't handle). If you want, you can even test for existence as you go (with `-d join("/",@outpaths)` ()), and test for paths that are just wrong (too many "/.."): `pop @outpaths or die "Bad path: $original\n";` [download] The regex solutions were assuming you only want to remove the "/.." in cases where the next part of the path matched the previous part of the path, but it seems to me that you really should have a solution that handles all the "/.." cases -- it's entirely possible that you'll run into a lot of different ones. () Update: footnote about `-d join("/",@outpaths)` -- with an absolute path like "/one/two", $p is an empty string on the first iteration, and you don't want to use "-d" on that. Putting full error checks into the loop would be something like this: `my @outpath = (); for my $p ( split m!/+!, $original ) { if ( $p eq '..' ) { pop @outpath or die "Bad path: $original"; } else { if ( @outpath ) { my $chk = join "/", @outpath, $p; die "Path not found: $chk" unless ( -d $chk ); } push @outpath, $p; } } my $goodpath = join "/", @outpath;` [download]	[reply] [d/l] [select]
Re^2: 2 substitution problems by parv (Parson) on Mar 28, 2009 at 16:10 UTC
If the original path started out with `../`, program will die too soon for no good reason. In this case, `../` should be preserved.	[reply] [d/l] [select]
Re^3: 2 sed problems by graff (Chancellor) on Mar 28, 2009 at 16:33 UTC
Yup, good point. And the best way to fix it would be to "use Cwd", which makes all that code unnecessary anyway: `#!/usr/bin/perl use Cwd qw/abs_path/; print abs_path( $_ ) for ( @ARGV );` [download] That prints an empty string (no error message) whenever any non-final path component is non-existent or unreadable (due to permissions).	[reply] [d/l]
Re: 2 sed problems by Anonymous Monk on Mar 28, 2009 at 07:26 UTC
Hi, Try this.. First problem: `$var=~ s/\/+/\//g;` Second Problem: `$var=~s/(.?)\/(.?)\/\.\.\/\2\/(.*)/$1\/$2\/$3/g;`	[reply] [d/l] [select]