strfry() has asked for the wisdom of the Perl Monks concerning the following question:

Hi,
I was wondering what kind of regex (or split? or both (: ) I would need to match "anything.whatever". I mean, I want to match the "anything" that ends in .whatever, and return only "anything".

example:
my @i = "l-12345.in.some.domain.com";

I want to return only "domain", but only if it is followed by .<something> (dot something). (:

PS: I'd also like to find a way to match that pattern no matter what comes after it, like
my @i = "l-12345.in.some.domain.com/blargh/index.html";
and, once again, return only "domain".
Thanks!

strfry()

edit mirod: changed the title

Re: pattern matching
by davorg (Chancellor) on Jun 06, 2001 at 17:45 UTC

    This captures the first run of word characters that is immediately followed by '.com':

    my $domain;
    if ($str =~ /(\w+)\.com/) {
        $domain = $1;
    } else {
        # no match
        $domain = '';
    }
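A minimal sketch wrapping the same match in a subroutine, assuming you want an empty string back when nothing matches. The name `com_label` is hypothetical. It also shows a limitation of `\w`: it does not include `-`, so a hyphenated name is cut short.

```perl
use strict;
use warnings;

# Hypothetical helper around /(\w+)\.com/: returns the run of word
# characters just before the first ".com", or '' when there is no match.
sub com_label {
    my ($str) = @_;
    return $str =~ /(\w+)\.com/ ? $1 : '';
}

print com_label("l-12345.in.some.domain.com"), "\n";
print com_label("l-12345.in.some.domain.com/blargh/index.html"), "\n";
print com_label("foo-bar.com"), "\n";   # only "bar": \w does not match '-'
```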
    --
    <http://www.dave.org.uk>

    "Perl makes the fun jobs fun
    and the boring jobs bearable" - me

Re: pattern matching
by japhy (Canon) on Jun 06, 2001 at 17:55 UTC
    Well, putting aside your desire to match the "important" part of a domain name, you probably want to use a regex that says "match and save a set of non-. characters that are followed by a ., then non-. and non-/ characters, and then either a / or the end of the string".

    Basically, you want to ensure you're a) only looking at the domain name, and b) getting the penultimate .-separated sequence.

    It would probably be more intuitive to use two split()s:
    ($domain) = split '/', $string;
    $wanted = (split /\./, $domain)[-2];

    # or
    $wanted = (split /\./, (split '/', $string)[0])[-2];
    Here's the regex approach:
    ($wanted) = $string =~ m{
      ( [^.]+ )   # save the non-. sequence to $1
      \.          # .
      [^./]+      # the final non-. non-/ sequence
      (?: / | $)  # / or the end of the string
    }x;
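Both approaches can be checked side by side against the examples from the question. This is a sketch: the sub names `by_split` and `by_regex` are hypothetical, and each /x comment sits on its own line so it does not comment out the rest of the pattern.

```perl
use strict;
use warnings;

# Approach 1 (hypothetical name): keep the host part before the first '/',
# then take the penultimate dot-separated label.
sub by_split {
    my ($string) = @_;
    my ($host) = split m{/}, $string;
    return (split /\./, $host)[-2];
}

# Approach 2 (hypothetical name): the regex, one /x comment per line.
sub by_regex {
    my ($string) = @_;
    my ($wanted) = $string =~ m{
        ( [^.]+ )   # the non-. sequence (capture group 1)
        \.          # .
        [^./]+      # the final non-. non-/ sequence
        (?: / | $)  # / or the end of the string
    }x;
    return $wanted;
}

for my $s ("l-12345.in.some.domain.com",
           "l-12345.in.some.domain.com/blargh/index.html",
           "www.google.somethingnotnormal") {
    printf "%-45s split=%s regex=%s\n", $s, by_split($s), by_regex($s);
}
```

Both return `undef` for a host with no dot at all (e.g. `localhost`), so check the result before using it.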


    japhy -- Perl and Regex Hacker
      Hmm, I used part of your code in a subroutine, and it's giving me the error "Use of uninitialized value at ./index.cgi line 29."
      Here's the function:
      sub getd {
          my $string = @_;
          my $wanted;
          my $domain;
          ($domain) = split '/', $string;
          $wanted = (split /\./, $domain)[-2];
          return $wanted;
      }
      my $variable = "www.google.com";
      print &getd($variable); # this is line 29.

      Any ideas?

      strfry()
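        You're not doing any rudimentary data-checking, or you'd see that my $string = @_ was assigning a number to your variable: an array assigned to a scalar is evaluated in scalar context and yields its element count (here 1), not its first element.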
        # try one of these:
        my ($string) = @_;
        my $string = shift;
        my $string = $_[0];
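A small sketch of the scalar-versus-list difference, followed by one possible corrected `getd`. The helpers `count_args` and `first_arg` are hypothetical, and the `defined` checks in `getd` are an addition of this sketch, not part of the thread; they make a dotless host return `undef` instead of producing warnings later.

```perl
use strict;
use warnings;

# Scalar assignment from @_ gives the number of arguments.
sub count_args { my $string = @_;   return $string }
# List assignment from @_ gives the first argument.
sub first_arg  { my ($string) = @_; return $string }

# Sketch of a corrected getd() with extra (assumed) input checks.
sub getd {
    my ($string) = @_;
    return unless defined $string;
    my ($host) = split m{/}, $string;   # part before the first '/'
    return unless defined $host;        # e.g. empty input
    my @labels = split /\./, $host;
    return @labels >= 2 ? $labels[-2] : undef;
}

print count_args("www.google.com"), "\n";
print first_arg("www.google.com"), "\n";
print getd("www.google.com"), "\n";
```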


        japhy -- Perl and Regex Hacker
      Yes yes yes yes! That's it! Thank you! (:
      Now all I have to do is fiddle with it until I understand exactly what's taking place, hehe.
      Thanks!

      strfry()
Re: pattern matching
by mirod (Canon) on Jun 06, 2001 at 17:47 UTC

    You can use the ($result) = ($string =~ m/pattern/); idiom this way:

    #!/usr/bin/perl -w
    use strict;

    while( my $string=<DATA>)
      { my( $domain)= ($string=~ m{(?:^|\.)   # the beginning of the string or .
                                   ([^.]*)    # anything but . (and store it in $1)
                                   \.com      # .com
                                   (?:\/|$)   # a / or the end of the string
                                  }x);
        print "domain: $domain\n";
      }

    __DATA__
    l-12345.in.some.domain.com
    l-12345.in.some.domain.com/blargh/index.html
    domain.com/blargh/index.html
    domain.com
    domain.com/
    l-12345.in.some.domain.com/blargh/foo.com
    nope.com.domain.com/blargh/foo.com
    l-12345.in.some.domain.com/blargh/nope.foo.com
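The idiom works because a match in list context returns its captured groups, while in scalar context it only reports success. A short sketch of the difference, using a simpler pattern than the one above (the example strings are illustrative):

```perl
use strict;
use warnings;

my $string = "l-12345.in.some.domain.com/blargh/index.html";

# List context: the parentheses on the left receive the first capture.
my ($domain) = ($string =~ m{([^./]+)\.com});

# Scalar context: the same match only returns true (1) or false.
my $matched = ($string =~ m{([^./]+)\.com});

# On failure, the list assignment leaves the variable undefined.
my ($none) = ("example.org" =~ m{([^./]+)\.com});

print "domain:  $domain\n";
print "matched: $matched\n";
print "none is ", (defined $none ? "defined" : "undef"), "\n";
```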
Re: pattern matching
by tachyon (Chancellor) on Jun 06, 2001 at 18:09 UTC

    This works, and unlike \w it also captures names containing hyphens, such as foo-bar. The character class [^./] matches anything except a dot or a forward slash, so the capture stops at the nearest dot or slash to the left of the .com.

    Also note that you assign to @i, which is an array, rather than to $i, which is a scalar and is probably what you had in mind.

    tachyon

    my $i = "l-12345.in.some.domain.com/blargh/index.html";
    my ($result) = $i =~ m|([^./]+)\.com|;
    print $result;

    # if you want to allow several endings like .com, .gov, etc.
    # (add further endings to the alternation as needed)
    ($result) = $i =~ m#([^./]+)\.(?:com|org|gov|edu)#;
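A quick sketch comparing `\w+` with `[^./]+` on a hyphenated name; the host `www.foo-bar.com` is a made-up example, and the list of endings is illustrative rather than exhaustive.

```perl
use strict;
use warnings;

# Hypothetical example host with a hyphen in the name.
my $host = "www.foo-bar.com/index.html";

my ($with_w)     = $host =~ m|(\w+)\.com|;    # \w excludes '-', so only "bar"
my ($with_class) = $host =~ m|([^./]+)\.com|; # keeps "foo-bar"

# Several endings; this list is illustrative, not exhaustive.
my ($multi) = "www.example.org" =~ m#([^./]+)\.(?:com|org|gov|edu)#;

print "$with_w\n$with_class\n$multi\n";
```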
      Hmm, but I want to be able to match it with *anything*, regardless of naming conventions
      (e.g. www.google.somethingnotnormal),
      mainly because I'd like to know how, not because I need it. (The regex you just showed me actually works perfectly, and I'm really grateful... but I'm curious.) (:
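One way to accept *any* final label is to drop the fixed list of endings and anchor on the last dot of the host instead. A sketch, assuming the host is everything before the first `/`; the sub name `label_before_last_dot` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the label just before the last dot of
# the host part, whatever the final label is; undef if the host has no dot.
sub label_before_last_dot {
    my ($url) = @_;
    my ($host)  = $url =~ m{^([^/]*)};           # host: text before the first '/'
    my ($label) = $host =~ /([^.]+)\.[^.]+\z/;   # second-to-last label
    return $label;
}

print label_before_last_dot("www.google.somethingnotnormal"), "\n";
print label_before_last_dot("l-12345.in.some.domain.com/blargh/index.html"), "\n";
```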
Re: pattern matching
by shotgunefx (Parson) on Jun 06, 2001 at 17:52 UTC
    The first problem is that you're assigning a scalar to an array.

    It should be my $i = "l-12345.in.some.domain.com";

    Second, your question seems a bit vague. Do you know which two things you need to match?

    If so, you should be able to say

    $i =~/($first)\.($second)/;


    Now the text matched by $first is in $1 and the text matched by $second is in $2, if they were found.

    Of course, this doesn't take boundaries into account.
    If $first = "dog" and $second = "com", this will match fogdog.com as well. You need to decide what your boundaries are going to be.
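As an illustration, here is one possible boundary rule, which is an assumption rather than the only sensible choice: `$first` must start the string or follow a dot, and `$second` must be followed by `/` or the end of the string. `\Q...\E` quotes any regex metacharacters in the variables. The sub name `match_pair` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns ($1, $2) on a match, an empty list otherwise.
# Assumed boundaries: start-of-string or '.' before $first,
# '/' or end-of-string after $second.
sub match_pair {
    my ($i, $first, $second) = @_;
    return $i =~ m{(?:^|\.)(\Q$first\E)\.(\Q$second\E)(?:/|$)} ? ($1, $2) : ();
}

my @hit  = match_pair("www.dog.com/index.html", "dog", "com");
my @miss = match_pair("fogdog.com", "dog", "com");   # "fog" blocks the match
print "hit: @hit\n";
print "miss: ", scalar(@miss), " fields\n";
```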

    -Lee

    "To be civilized is to deny one's nature."
      Well, say I have www.google.com; I want to match "google", not "www" and not "com".
      Another example: if I have ww2.mirror.google.com, I still want it to match only "google".
      Does that help any?

      strfry()