Regex for ignoring paths

Amblikai has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, I'm having trouble coming up with a nice regex for a quick script I'm writing.

Essentially, I'm parsing a file for certain pieces of information, and the file also contains a Unix path, which triggers my regex.

It's probably best explained with simplified code:

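use strict;
use warnings;

# Slurp all of DATA into one string (undef $/ disables line splitting)
my $line = do {
  local $/ = undef;
  <DATA>;
};
# Collect every "word.word" token, including the ones inside the path
my @substrings = $line =~ /(\w+\.\w+)/g;
print "$_\n" foreach @substrings;
__DATA__
      Path to file: /users/me/foo.baz/filename.ext
my_content(word.other)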

Which obviously gives me:

foo.baz
filename.ext
word.other

The problem I have is that I'd like to pick up only "word.other".

I've tried /(?<!\/)(\w+\.\w+)/, but that just (rightly) gives me "oo.baz" and "ilename.ext" as well as "word.other".

Obviously I could strip out the paths first, but it seems like there should be a nice way to do it in a regex; I just can't think of one. Any thoughts? Thanks!
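For comparison, the "strip out the paths first" route could look something like the minimal sketch below. It assumes (hypothetically) that a path is any whitespace-delimited token containing a `/`, and it holds the sample input in a heredoc instead of `__DATA__`; the variable names are illustrative only.

```perl
use strict;
use warnings;

# Sample input from the question, held in a string instead of __DATA__
my $text = <<'END';
      Path to file: /users/me/foo.baz/filename.ext
my_content(word.other)
END

# Assumption: a "path" is any run of non-whitespace that contains a slash.
# Copy the text, then delete every such run from the copy.
(my $stripped = $text) =~ s{\S*/\S*}{}g;

# With the paths gone, the original pattern only sees the wanted token
my @substrings = $stripped =~ /(\w+\.\w+)/g;
print "$_\n" for @substrings;    # prints: word.other
```

The weak point of this approach is the path heuristic itself: anything with a slash in it is thrown away, which may or may not be what the real data needs.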


Re: Regex for ignoring paths by haukex (Archbishop) on Oct 31, 2018 at 11:21 UTC
As always with regexes, a single example is not really enough. For example, are the strings you're looking for always in parentheses? Is it really safe to assume that you don't want any matches that come after a slash? And so on. See Re: How to ask better questions using Test::More and sample data. Anyway, a negative lookbehind could work, as long as you set the conditions right: `(?<![\/\w])(\w+\.\w+)`. Because the lookbehind rejects a word character as well as a slash, a match can no longer start partway through a word such as "foo.baz".
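A small sketch that runs both lookbehinds against the sample data makes the difference visible. The data is held in a heredoc rather than `__DATA__`, and the variable names are only illustrative.

```perl
use strict;
use warnings;

my $text = <<'END';
      Path to file: /users/me/foo.baz/filename.ext
my_content(word.other)
END

# Question's version: only a slash is forbidden before the match, so the
# engine simply starts one character later ("oo.baz", "ilename.ext").
my @naive = $text =~ /(?<!\/)(\w+\.\w+)/g;

# Reply's version: a slash OR a word character is forbidden before the
# match, so it cannot begin in the middle of a path component.
my @wanted = $text =~ /(?<![\/\w])(\w+\.\w+)/g;

print "naive:  @naive\n";     # naive:  oo.baz ilename.ext word.other
print "wanted: @wanted\n";    # wanted: word.other
```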
Re^2: Regex for ignoring paths by Amblikai (Scribe) on Oct 31, 2018 at 13:34 UTC
Yeah, I guess I really oversimplified it with the example. My workaround was to look only for occurrences where the match was in parentheses, which was fine. However, I couldn't guarantee that they would only ever occur in parentheses, whereas I could guarantee that I never wanted a match preceded by a slash, so that seemed like the more elegant approach, and to me better practice. Anyway, long story short, you answered my question! I can't believe I missed putting the first part of the regex in the negative lookbehind. Seems so simple now! Thanks for your help!
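The exact parentheses workaround isn't shown, so the following is only a guess at what it might look like, assuming the token sits directly between `(` and `)`. It also shows the limitation described: a token outside parentheses is silently missed.

```perl
use strict;
use warnings;

my $text = <<'END';
      Path to file: /users/me/foo.baz/filename.ext
my_content(word.other)
also.here outside parentheses
END

# Assumed workaround: accept a word.word token only when wrapped in ( )
my @in_parens = $text =~ /\((\w+\.\w+)\)/g;
print "$_\n" for @in_parens;    # prints: word.other (also.here is missed)
```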
Re: Regex for ignoring paths by harangzsolt33 (Chaplain) on Oct 31, 2018 at 14:48 UTC
I would do something like this: `use strict; use warnings; my $fullname = '/windows/system32/cmd.exe'; my $p = rindex($fullname, '/'); my $filename = ++$p ? substr($fullname, $p) : $fullname; print "$fullname\n\n$filename"; exit;` This is a simple & fast solution.
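Wrapped in a sub, the same rindex/substr idea might look like the sketch below; `filename_of` is a hypothetical name. The ternary can be dropped because when there is no slash, `rindex` returns -1, so the offset becomes 0 and `substr` returns the whole string anyway. Note that this extracts the last component of a known path, which is a different task from filtering tokens out of free text.

```perl
use strict;
use warnings;

# Hypothetical helper: everything after the last '/', or the whole string
sub filename_of {
    my ($fullname) = @_;
    my $p = rindex($fullname, '/');    # -1 when there is no slash
    return substr($fullname, $p + 1);  # offset 0 => whole string
}

print filename_of('/windows/system32/cmd.exe'), "\n";   # cmd.exe
print filename_of('cmd.exe'), "\n";                      # cmd.exe
```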
Re^2: Regex for ignoring paths by Anonymous Monk on Oct 31, 2018 at 20:39 UTC
How does this answer the question?
Re^3: Regex for ignoring paths by harangzsolt33 (Chaplain) on Nov 01, 2018 at 02:44 UTC
He wants to extract file names from paths. I just wanted to point out that there is an easy way to do this without using a regex.
Re^4: Regex for ignoring paths by Anonymous Monk on Nov 01, 2018 at 06:25 UTC
Re: Regex for ignoring paths by dbuckhal (Chaplain) on Oct 31, 2018 at 18:05 UTC
Another simple, fast solution, based on a snippet found on page 23 of Dominus's book, Higher-Order Perl: `sub short { my $path = shift; $path =~ s{.*/}{}; $path; }` ...or as a callback: `my $short = sub { my $path = shift; $path =~ s{.*/}{}; $path; };`

Callback example:

perl -Mstrict -we '
    my $dir = shift or die "missing dir name...\n";
    die "not a directory\n" unless -d $dir;
    my $short = sub { my $path = shift; $path =~ s{.*/}{}; $path; };
    sub dosub {
        my $_dir = shift;
        opendir my $dh, $_dir or die "could not open $_dir\n";
        while ( my $file = readdir($dh) ) {
            next if $file eq "." || $file eq "..";
            if ( -d "$_dir/$file" ) {
                dosub("$_dir/$file");
            } else {
                print "full: $_dir/$file\n";
                print "shortened: ", $short->("$_dir/$file"), "\n\n\n";
            }
        }
    }
    dosub($dir);
' temp01

__output__
full: temp01/subtemp01/subsubtemp01/file01
shortened: file01

full: temp01/subtemp01/subsubtemp02/file02
shortened: file02

full: temp01/subtemp01/subsubtemp03/file03
shortened: file03

full: temp01/subtemp01/subsubtemp04/file04
shortened: file04

Edit: shortened output a bit...
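The `short` sub can be checked on its own, without walking a directory tree. The key detail is the greedy `.*` before the slash: it consumes everything up to the last `/`, whereas a bare `s{./}{}` would remove only one character and the first slash. The sample paths below are taken from this thread.

```perl
use strict;
use warnings;

# Greedy .* backtracks to the LAST slash, so everything before the final
# path component is deleted; a string with no slash is left unchanged.
sub short {
    my $path = shift;
    $path =~ s{.*/}{};
    return $path;
}

print short('temp01/subtemp01/subsubtemp01/file01'), "\n";   # file01
print short('/users/me/foo.baz/filename.ext'), "\n";         # filename.ext
print short('word.other'), "\n";                             # word.other
```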
Re^2: Regex for ignoring paths by Anonymous Monk on Oct 31, 2018 at 18:12 UTC
How does this help the OP?
Re^3: Regex for ignoring paths by dbuckhal (Chaplain) on Oct 31, 2018 at 19:26 UTC
Stripping the path? ...or did I misinterpret the OP? If so, then move along, not much to see here... :)


"be consistent"
	PerlMonks