jacques has asked for the wisdom of the Perl Monks concerning the following question:

This has been dogging me for some time. Here is the data:

$data = 'Win32::TieRegistry(Delimiter=>"/")';
$data = "Acme::Beatnik";
$data = "CGI qw(:all)";
$data = "Win32::API::Type";
$data = "DBI";
I want to match the module name (bareword) and not the list, which may or may not be present (the last scalar, for example, has none). The problem is the double colons. Here was my first attempt:

$data =~ /(\w+.*?)(\s*|\()/;

The problem is the dot symbol doesn't match a colon. So here was my second attempt:

$data =~ /(\w+[.:]*?)(\s*|\()/;

Still doesn't match the colon. Third attempt:

$data =~ /(\w+[.*:*]?)(\s*|\()/;

But this only matches the first colon. Getting closer! So I thought: I want to match every alphanumeric character or double colons followed by more alphanumeric characters (and possibly more double colons followed by ...), so maybe I should use the pipe symbol. Fourth attempt:

$data =~ /(\w+(.|::.*)*?)(\s*|\()/;

That's when I realized this is getting silly. Should I be using lookaheads?

Re: Regexp Question
by simonm (Vicar) on Aug 12, 2003 at 23:40 UTC
    Even simpler: $data =~ /([\w\:]+)(.*)/;
      I saw that sneaky update. I guess I was thinking too hard without first properly examining the problem.

      Thanks.
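A minimal sketch of simonm's pattern run over the five sample strings from the question, assuming the goal is to get the module name in $1 and whatever follows it in $2. The helper name `split_module` is hypothetical, introduced only for this illustration, and the colon is written unescaped (`[\w:]`), which is equivalent inside a character class.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split a "use"-style string into the module name and
# whatever follows it, using simonm's pattern (colon left unescaped, which is
# equivalent inside a character class).
sub split_module {
    my ($s) = @_;
    # $1: leading run of word characters and colons (the module name).
    # $2: everything after it, possibly the empty string.
    return $s =~ /([\w:]+)(.*)/ ? ( $1, $2 ) : ();
}

# The five sample strings from the question.
my @data = (
    'Win32::TieRegistry(Delimiter=>"/")',
    'Acme::Beatnik',
    'CGI qw(:all)',
    'Win32::API::Type',
    'DBI',
);

for my $data (@data) {
    my ( $name, $rest ) = split_module($data);
    print "module=[$name] rest=[$rest]\n";
}
```

For `CGI qw(:all)` the second capture keeps its leading space (` qw(:all)`), and for bare names such as `DBI` it is the empty string, so trim or test $2 as needed.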

Re: Regexp Question
by hossman (Prior) on Aug 12, 2003 at 23:45 UTC

    First of all...

    The problem is the dot symbol doesn't match a colon. So here was my second attempt:

    ...that's not true: . will match a ":". But you've said match zero or more of any character, and be non-greedy about it, so it is happily matching zero characters for the .*? (after \w+ has already stopped at the first colon), zero whitespace characters for the \s*, and calling it a day.
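To see this concretely, the sketch below runs the question's four attempts against the first sample string and prints what ends up in $1. The helper `first_capture` is a hypothetical name for illustration, and the last pattern, which demands a real terminator (whitespace, `(` or end of string) instead of an optional one, is an illustrative variant rather than something proposed in the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the first capture group of $re against $str,
# or undef if there is no match.
sub first_capture {
    my ( $re, $str ) = @_;
    return $str =~ $re ? $1 : undef;
}

my $data = 'Win32::TieRegistry(Delimiter=>"/")';

# "." really does match a colon.
print "dot matches a colon\n" if ':' =~ /^.$/;

# The four attempts from the question. In each, \w+ stops at the first ":",
# and everything after it is allowed to match zero characters (lazy
# quantifiers, \s*), so the match succeeds early. Inside [...] the "." and
# "*" are literal, so [.*:*]? matches at most one ".", "*" or ":".
my @attempts = (
    qr/(\w+.*?)(\s*|\()/,
    qr/(\w+[.:]*?)(\s*|\()/,
    qr/(\w+[.*:*]?)(\s*|\()/,
    qr/(\w+(.|::.*)*?)(\s*|\()/,
);
for my $i ( 0 .. $#attempts ) {
    my $got = first_capture( $attempts[$i], $data );
    print 'attempt ', $i + 1, ": $got\n";
}

# Illustrative variant (not from the thread): require a real terminator
# after the name, so .*? must keep expanding until it reaches one.
my $terminated = qr/(\w+.*?)(\s|\(|\z)/;
my $fixed      = first_capture( $terminated, $data );
print "with terminator: $fixed\n";
```

With the original patterns, $1 is "Win32" for every attempt except the third, which yields "Win32:" because the optional character class consumes exactly one colon.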

    Second of all: you said you "want to match the module name (bareword) and not the list", but it really looks like what you meant is that you want to "capture" the module name in $1; your regexps are making an effort to match other things after the name.

    It seems like the most straightforward way to accomplish what you want is to ignore the multitude of ways that a list might be put on the end, and just look for what you want: word characters or colons...

    bester:~> perl -le 'print $1 if "Win32::TieRegistry(Delimiter=>\"/\")" =~ /([\w:]+)/;'
    Win32::TieRegistry

    ...if you have some reason to be really strict about only accepting words separated by double colons, then be strict...

    bester:~> perl -le 'print $1 if "Win32::Tie::Reg:istry(Delimiter=>\"/\")" =~ /(\w+(::\w+)+)/;'
    Win32::Tie::Reg
    bester:~> perl -le 'print $1 if "Win32::Tie::Reg:istry(Delimiter=>\"/\")" =~ /(\w+(::\w+)*)/;'
    Win32::Tie::Reg

    UPDATE: Albannach pointed out that there probably should be a * in my second example to make it work on package names without any colons at all, like CGI.
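A short sketch contrasting the permissive and strict approaches on the deliberately malformed name above. The helpers `loose_name` and `strict_name` are hypothetical, and the strict pattern uses a non-capturing group `(?:...)`, as Aristotle's reply does, so the whole name lands in $1 however many `::` pieces it has.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, permissive: any leading run of word characters and
# colons.
sub loose_name {
    my ($s) = @_;
    return $s =~ /([\w:]+)/ ? $1 : undef;
}

# Hypothetical helper, strict: a word followed by zero or more "::word"
# pieces. (?:...) groups without capturing, so the whole name is in $1.
sub strict_name {
    my ($s) = @_;
    return $s =~ /(\w+(?:::\w+)*)/ ? $1 : undef;
}

for my $data ( 'Win32::Tie::Reg:istry(Delimiter=>"/")', 'CGI qw(:all)' ) {
    my $loose  = loose_name($data);
    my $strict = strict_name($data);
    print "loose=[$loose] strict=[$strict]\n";
}
```

The loose pattern returns `Win32::Tie::Reg:istry` unchanged, while the strict one stops at `Win32::Tie::Reg`; for `CGI qw(:all)` both return `CGI`.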

      It seems like the most straightforward way to accomplish what you want is to ignore the multitude of ways that a list might be put on the end, and just look for what you want: word characters or colons...

      Good point.

Re: Regexp Question
by Aristotle (Chancellor) on Aug 13, 2003 at 01:17 UTC
    #!/usr/bin/perl -wl
    print / ( \w+ (?: :: \w+ )* ) /x while <DATA>;
    __END__
    Win32::TieRegistry(Delimiter=>"/")
    Acme::Beatnik
    CGI qw(:all)
    Win32::API::Type
    DBI

    Makeshifts last the longest.