PerlMonks  

Regex for ignoring paths

by Amblikai (Scribe)
on Oct 31, 2018 at 11:12 UTC

Amblikai has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, I'm having a problem coming up with a nice regex for a quick script I'm writing.

Essentially, I'm parsing a file for certain pieces of information, and the file also contains a Unix path that triggers my regex.

This is probably best explained with simplified code:

my $line=do { local $/=undef; <DATA>; };
my @substrings=$line=~/(\w+\.\w+)/g;
print "$_\n" foreach(@substrings);
__DATA__
Path to file: /users/me/foo.baz/filename.ext
my_content(word.other)

Which obviously gives me:

foo.baz
filename.ext
word.other

The problem I have is that I'd like to pick up only "word.other".

I've tried /(?<!\/)(\w+\.\w+)/, but that just (rightly) gives me "oo.baz" and "ilename.ext" etc.

Obviously I could strip out the paths first, but it seems like there should be a nice way to do it in a regex; I just can't think of one. Any thoughts? Thanks!
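For comparison, the "strip out the paths first" fallback mentioned above might be sketched as follows. This is only an illustration: it assumes a path is a slash followed by a run of non-whitespace characters, which is not stated in the question, and the variable names are made up.

```perl
use strict;
use warnings;

# Sample text from the question above.
my $text = "Path to file: /users/me/foo.baz/filename.ext\n"
         . "my_content(word.other)\n";

# Assumption: a path is a slash followed by non-whitespace characters.
# Copy the text, then delete every such run from the copy.
(my $stripped = $text) =~ s{/\S+}{}g;

# The original pattern now only sees what is left.
my @substrings = $stripped =~ /(\w+\.\w+)/g;
print "$_\n" foreach @substrings;
```

With the sample data this prints only word.other, but it would also discard any wanted token that happens to follow a slash without intervening whitespace.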

Replies are listed 'Best First'.
Re: Regex for ignoring paths
by haukex (Archbishop) on Oct 31, 2018 at 11:21 UTC

    As always with regexes, a single example is not really enough. For example, are the strings you're looking for always in parentheses? Is it really safe to assume that you don't want any matches that come after a slash? And so on. See Re: How to ask better questions using Test::More and sample data.

    Anyway, a negative lookbehind could work, as long as you set the conditions right: (?<![\/\w])(\w+\.\w+) (live demo)
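A minimal sketch comparing the two lookbehinds on the sample data from the question (the variable names here are illustrative, not from the thread):

```perl
use strict;
use warnings;

# Sample text from the question.
my $text = "Path to file: /users/me/foo.baz/filename.ext\n"
         . "my_content(word.other)\n";

# Rejecting only a preceding slash: the engine just starts the match
# one character later, so path components still match in truncated form.
my @slash_only = $text =~ /(?<!\/)(\w+\.\w+)/g;

# Rejecting a preceding slash OR word character: a match can only begin
# at the start of a word, and that word must not follow a slash.
my @slash_or_word = $text =~ /(?<![\/\w])(\w+\.\w+)/g;

print "slash only:    @slash_only\n";
print "slash or word: @slash_or_word\n";
```

On this input the first pattern yields oo.baz, ilename.ext and word.other, while the second yields only word.other. Whether that holds for real input depends on the assumption that nothing wanted ever follows a slash.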

      Yeah, I guess I really over-simplified it with the example.

      My workaround was to match only occurrences where the pattern was in parentheses, which was fine.

      However, I couldn't guarantee that they would only ever occur in parentheses, whereas I could guarantee that I never wanted a match preceded by a slash, so the lookbehind seemed like the more elegant approach and, to me, better practice.

      Anyway, long story short, you answered my question! I can't believe I missed including the first part of the regex in the negative lookbehind. Seems so simple now!

      Thanks for your help!
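The exact workaround regex is not shown in the thread, so the parentheses pattern below is a guess at what it might have looked like. The third input line is invented to show a token outside parentheses, which the workaround misses and the lookbehind still finds.

```perl
use strict;
use warnings;

# First two lines are from the question; the third is a made-up line
# with a token that is not wrapped in parentheses.
my $text = "Path to file: /users/me/foo.baz/filename.ext\n"
         . "my_content(word.other)\n"
         . "bare token.here\n";

# Hypothetical workaround: only accept tokens enclosed in parentheses.
my @in_parens = $text =~ /\((\w+\.\w+)\)/g;

# Lookbehind from the reply above: reject a preceding slash or word character.
my @lookbehind = $text =~ /(?<![\/\w])(\w+\.\w+)/g;

print "in parens:  @in_parens\n";
print "lookbehind: @lookbehind\n";
```

Here the parentheses version finds word.other only, while the lookbehind version finds word.other and token.here.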

Re: Regex for ignoring paths
by harangzsolt33 (Chaplain) on Oct 31, 2018 at 14:48 UTC
    I would do something like this :
    use strict;
    use warnings;
    my $fullname = '/windows/system32/cmd.exe';
    my $p = rindex($fullname, '/');
    my $filename = ++$p ? substr($fullname, $p) : $fullname;
    print "$fullname\n\n$filename";
    exit;
    This is a simple & fast solution.
      how does this answer the question?
        He wants to extract file names from paths. I just want to point out that there is an easy way to do this without using regex.
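For the regex-free filename extraction described in this reply, the core File::Basename module is another option. A small sketch, run against the same example path, with the rindex approach alongside for comparison (variable names are illustrative):

```perl
use strict;
use warnings;
use File::Basename qw(basename);

my $fullname = '/windows/system32/cmd.exe';

# rindex approach: take everything after the last slash. rindex returns -1
# when there is no slash, so $p + 1 is 0 and the whole string is kept.
my $p = rindex($fullname, '/');
my $by_rindex = substr($fullname, $p + 1);

# Core-module alternative.
my $by_module = basename($fullname);

print "$by_rindex\n$by_module\n";
```

Both lines of output should read cmd.exe for this path.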
Re: Regex for ignoring paths
by dbuckhal (Chaplain) on Oct 31, 2018 at 18:05 UTC

    Another simple, fast solution, based on a snippet found on page 23 of Dominus's book, Higher-Order Perl:

    sub short { my $path = shift; $path =~ s{.*/}{}; $path; }
    ...or as a callback:
    my $short = sub { my $path = shift; $path =~ s{.*/}{}; $path; };
    Callback example:
    perl -Mstrict -we '
      my $dir = shift or die "missing dir name...\n";
      die "not a directory\n" unless -d $dir;
      my $short = sub { my $path = shift; $path =~ s{.*/}{}; $path; };
      sub dosub {
        my $_dir = shift;
        opendir my $dh, $_dir or die "could not open $_dir\n";
        while ( my $file = readdir($dh) ) {
          next if $file eq "." || $file eq "..";
          if ( -d "$_dir/$file" ) {
            dosub ("$_dir/$file");
          }
          else {
            print "full: $_dir/$file\n";
            print "shortened: ", $short->("$_dir/$file"), "\n\n\n";
          }
        }
      }
      dosub($dir);
    ' temp01

    __output__
    full: temp01/subtemp01/subsubtemp01/file01
    shortened: file01
    full: temp01/subtemp01/subsubtemp02/file02
    shortened: file02
    full: temp01/subtemp01/subsubtemp03/file03
    shortened: file03
    full: temp01/subtemp01/subsubtemp04/file04
    shortened: file04

    Edit: shortened output a bit...

      how does this help the OP?
        Stripping the path? ...or did I misinterpret the OP? If so, then move along, not much to see here... :)

Node Type: perlquestion [id://1224976]
Approved by marto
Front-paged by Corion