tdruttenberg has asked for the wisdom of the Perl Monks concerning the following question:

My string is in this format:

'DHEC(optional dot)(unspecified number of alphanumerics)'

For example:

$if_descr = 'DHEC.177628'
or
$if_descr = 'DHEC177628'

I want to grab everything but the dot and assign it to $circuit_id. I've been trying to use a lookaround to grab the string before and after the dot, but have been unsuccessful.

Here is the latest thing I tried:

($circuit_id) = $if_descr =~ / .*? ( DHEC (?: \.? ) [^\s]+ ) /xims;

I've also tried simply:

($circuit_id) = $if_descr =~ /.*? ( DHEC (?=\.) [^\s]+ ) /xims;

No dice.

How do I do this?

TDR
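Running the two attempts above against both sample strings shows why neither works. A capture group always records one contiguous stretch of the input, so if the dot lies inside the group it ends up in $1. A lookahead only checks for the dot without consuming it, so the next part of the pattern swallows it anyway. This is a diagnostic sketch; `%result` is a name introduced here, not from the thread.

```perl
use strict;
use warnings;

my %result;    # diagnostic only: input string => [attempt 1, attempt 2]

for my $if_descr ('DHEC.177628', 'DHEC177628') {
    # Attempt 1: the optional dot sits inside the capture group,
    # so when it is present it is captured along with the rest.
    my ($first) = $if_descr =~ / .*? ( DHEC (?: \.? ) [^\s]+ ) /xims;

    # Attempt 2: (?=\.) only peeks at the dot without consuming it,
    # so [^\s]+ swallows it next; with no dot, the lookahead fails.
    my ($second) = $if_descr =~ / .*? ( DHEC (?=\.) [^\s]+ ) /xims;

    $result{$if_descr} = [ $first, $second ];
    printf "%-12s attempt 1: %-12s attempt 2: %s\n",
        $if_descr, $first, defined $second ? $second : 'undef';
}
```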

Re: regexp help -- grab almost a whole string
by ikegami (Patriarch) on May 19, 2009 at 18:08 UTC
    my $circuit_id = ( $if_descr =~ /([^.]*)(?:\.(.*))?/ ? $1 . (defined($2) ? $2 : '') : undef );
    But it can be shortened to:
    (my $circuit_id = $if_descr) =~ s/\.//;
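A minimal check, using only core Perl, that the long form and the shortened substitution agree on both sample strings. The names `%long` and `%short` are illustrative.

```perl
use strict;
use warnings;

my (%long, %short);

for my $if_descr ('DHEC.177628', 'DHEC177628') {
    # Long form: text before the first dot in $1, text after it (if any)
    # in $2, glued back together; $2 is undef when there is no dot.
    $long{$if_descr} = ( $if_descr =~ /([^.]*)(?:\.(.*))?/
                         ? $1 . (defined($2) ? $2 : '')
                         : undef );

    # Short form: copy the string, then delete the first dot in the copy.
    (my $copy = $if_descr) =~ s/\.//;
    $short{$if_descr} = $copy;

    print "$if_descr -> $long{$if_descr} / $short{$if_descr}\n";
}
```

Note that `s/\.//` removes only the first dot; `s/\.//g` would remove all of them, should a description ever contain more than one.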
Re: regexp help -- grab almost a whole string
by moritz (Cardinal) on May 19, 2009 at 18:06 UTC
    if ($if_descr =~ m/DHEC\.?(\w+)/) {
        print "'$1'\n";
        $circuit_id = $1;
    }

    Nice and simple, no need for look-arounds whatsoever.
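One detail worth checking: only `(\w+)` is captured, so `$1` is `177628` rather than `DHEC177628`. If the prefix should be part of the ID, it has to be put back. A sketch, assuming the prefix is always the literal `DHEC` (`%suffix` and `%circuit_id` are illustrative names):

```perl
use strict;
use warnings;

my (%suffix, %circuit_id);

for my $if_descr ('DHEC.177628', 'DHEC177628') {
    if ($if_descr =~ m/DHEC\.?(\w+)/) {
        # $1 holds only what follows "DHEC" and the optional dot.
        $suffix{$if_descr} = $1;
        # Assumption: the prefix is always the literal "DHEC",
        # so it can simply be put back in front.
        $circuit_id{$if_descr} = "DHEC$suffix{$if_descr}";
        print "$if_descr: \$1 = '$suffix{$if_descr}', circuit_id = '$circuit_id{$if_descr}'\n";
    }
}
```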

Re: regexp help -- grab almost a whole string
by kennethk (Abbot) on May 19, 2009 at 18:09 UTC
    Since your desired capture is essentially two strings, you would have to join the two halves together after the capture. How about:

    $circuit_id = join q{}, $if_descr =~ /([^.]*)\.?([^.]*)/;
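To see what `join` is working with, here is a small sketch that prints the captured list for each sample string. With no dot, the first group takes the whole string and the second group matches the empty string (not undef), which is why the joined result is still correct.

```perl
use strict;
use warnings;

my (%parts, %circuit_id);

for my $if_descr ('DHEC.177628', 'DHEC177628') {
    # In list context the match returns its capture groups; with no dot,
    # [^.]* takes the whole string and the second group matches ''.
    my @parts = $if_descr =~ /([^.]*)\.?([^.]*)/;
    $parts{$if_descr}      = [@parts];
    $circuit_id{$if_descr} = join q{}, @parts;
    print "$if_descr -> ('", join("', '", @parts), "') -> $circuit_id{$if_descr}\n";
}
```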

    Easier, though, is just to do:

    $circuit_id = $if_descr; $circuit_id =~ s/\.//;
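For comparison, here is a sketch of the copy-then-substitute version next to two alternatives not mentioned in the thread: the `/r` flag, which assumes Perl 5.14 or later, and `tr/.//d`, which deletes every dot rather than just the first.

```perl
use strict;
use warnings;

my $if_descr = 'DHEC.177628';

# The two-statement version: copy, then substitute on the copy.
my $copy = $if_descr;
$copy =~ s/\.//;

# Perl 5.14 and later: /r returns the changed string and leaves
# $if_descr untouched, so this is a single expression.
my $with_r = $if_descr =~ s/\.//r;

# tr/// treats "." literally; /d deletes every dot, not just the first.
(my $with_tr = $if_descr) =~ tr/.//d;

print "$copy $with_r $with_tr (original: $if_descr)\n";
```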
Re: regexp help -- grab almost a whole string
by tdruttenberg (Initiate) on May 19, 2009 at 19:31 UTC
    What I need is a single regexp that will get the result the first time. It is to be returned from a subroutine kinda like this:
    sub get_regexp {
        my ($ip) = @_;
        if    ($blah)     { return qr{}; }
        elsif ($blahblah) { return qr{}; }
    }

    Calling program:

    $REGEXP = get_regexp($ip);
    ($circuit_id) = $if_descr =~ $REGEXP;
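To make the constraint concrete: a pattern that skips the dot needs two capture groups, and the calling program's list assignment keeps only the first. Here is a sketch with a hypothetical `get_regexp` that ignores `$ip` and always returns one two-group pattern. The address `192.0.2.1` is an arbitrary placeholder.

```perl
use strict;
use warnings;

# Hypothetical stand-in for get_regexp(); the real selection logic
# based on $ip is not shown in the thread, so $ip is ignored here.
sub get_regexp {
    my ($ip) = @_;
    return qr/(DHEC)\.?(\w+)/;
}

my $if_descr = 'DHEC.177628';
my $REGEXP   = get_regexp('192.0.2.1');

# The calling program's list assignment keeps only the first group...
my ($circuit_id) = $if_descr =~ $REGEXP;

# ...even though the match itself returned two pieces.
my @captures = $if_descr =~ $REGEXP;

print "circuit_id = '$circuit_id'; captures = (@captures)\n";
```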

    I want to get the correct result with one regexp, without having to do any cleanup after the fact (i.e., removing the dot in a separate step).

    Is there a way to do this? If not, I will simply leave the dot in.

    TDR

      Since the two pieces you want are separated by the dot, they must be captured in separate buffers, and there is no way to join them with a matching regular expression alone. If you change the calling program to use the join solution I suggested above, it will not break existing behavior, except that a failed match will store an empty string ("") in $circuit_id instead of undef (both of which evaluate as false). If you cannot change the calling program, then I see no solution.

      $REGEXP = get_regexp($ip); $circuit_id = join q{}, $if_descr =~ $REGEXP;
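Here is a sketch of that change to the caller, again using a hypothetical `get_regexp` whose `$ip` argument is ignored. It adds a non-matching string, `eth0`, to show the failure case.

```perl
use strict;
use warnings;

# Hypothetical get_regexp(): always returns the same two-group pattern.
sub get_regexp { return qr/^(DHEC)\.?(\w+)$/ }

my $REGEXP = get_regexp('192.0.2.1');
my %circuit_id;

for my $if_descr ('DHEC.177628', 'DHEC177628', 'eth0') {
    # join concatenates every captured group; a failed match returns
    # an empty list, so the result is '' (false but defined), not undef.
    $circuit_id{$if_descr} = join q{}, $if_descr =~ $REGEXP;
    print "'$if_descr' -> '$circuit_id{$if_descr}'\n";
}
```

Patterns that have a single capture group behave exactly as before, since `join` of a one-element list is just that element.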

      I think I made a similar mistake while designing a new test system at work. The assumption was that every value could be tested and checked with a regexp.

      $regexp = ...; if($value !~ /$regexp/) { $err .= "wrong value ($value)" }

      Everything was fine until the issue of integer ranges came up, e.g., a value must be an integer between 0 and 12800. Now I am thinking about redesigning the engine to accept a sub reference instead of a regexp:

      $subref = ...; if(! &$subref($value)) { $err .= "wrong value ($value)" }

      because the only reasonable way to fix it is to abandon the assumption that "a regexp can match even the universe" (a pity; I like regexps...).
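Here is a sketch of such an engine. It assumes a hypothetical `%rule` table whose entries are either `qr//` patterns or code references, and it uses `ref` to tell them apart (`Regexp` versus `CODE`). The names `%rule`, `check`, `circuit`, and `port` are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical check table: each rule is either a qr// pattern or a
# code reference that returns true for an acceptable value.
my %rule = (
    circuit => qr/^DHEC\.?\w+$/,
    port    => sub { my ($v) = @_; $v =~ /^\d+$/ && $v >= 0 && $v <= 12800 },
);

sub check {
    my ($name, $value) = @_;
    my $rule = $rule{$name};
    # ref() on a qr// object returns 'Regexp'; on an anonymous sub, 'CODE'.
    my $ok = ref($rule) eq 'CODE' ? $rule->($value) : $value =~ $rule;
    return $ok ? '' : "wrong value ($value)";
}

my $err = '';
$err .= check(circuit => 'DHEC.177628');
$err .= check(port    => 80);
$err .= check(port    => 20000);
print "errors: $err\n";
```

The same idea would apply to the original question. If `get_regexp` could return a code reference instead of a `qr//`, the dot could be stripped inside it, though that also means changing the calling program.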

      I'm not certain why it must be a "single regexp", but this will do the job in a single line of execution:

      if ($if_descr =~ /(DHEC)\.?(\w+)/) { $circuit_id = "$1$2"; }

      Would this satisfy your requirement?

      Blessings,

      ~Polyglot~