in reply to if/else loop prints extra values

G'day myfrndjk,

You have a number of issues with your regex.

($domain) = $url =~ m|www.([A-Z a-z 0-9]+.{3}).|x;

As $domain is assigned at the start of each iteration, and used within the body of the loop, it would make sense to get this part fixed first.

I'm guessing that, if the input was "www.example.com.au", the expected output should be either "example.com.au" or "example.com". Please clarify. (FYI: your code produces "example.co", see below.)

[Please take a look at the guidelines in "How do I post a question effectively?" for information on useful materials to include with your post. (In this specific instance, sample input and expected output would have been on the list.)]

This test code:

#!/usr/bin/env perl -l
use strict;
use warnings;
my $url = 'www.example.com.au';
my ($domain) = $url =~ m|www.([A-Z a-z 0-9]+.{3}).|x;
print '$domain=[', defined $domain ? $domain : '<undef>', ']';

produces this output:

$domain=[example.co]

Adding these additional lines of code:

my ($alpha_num, $any_three, $final_dot) = $url =~ m|www.([A-Z a-z 0-9]+)(.{3})(.)|x;
print '$alpha_num=[', defined $alpha_num ? $alpha_num : '<undef>', ']';
print '$any_three=[', defined $any_three ? $any_three : '<undef>', ']';
print '$final_dot=[', defined $final_dot ? $final_dot : '<undef>', ']';

and the output now shows which parts of the regex are capturing which parts of the domain:

$alpha_num=[example]
$any_three=[.co]
$final_dot=[m]

The dot ('.') metacharacter is special in regexes: it matches any character except newline [including newline if the /s modifier is used].
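Here's a minimal sketch of that behaviour. The test strings are made up purely for illustration; they are not taken from your data.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Each entry records whether an unescaped dot matched the character
# sitting between 'www' and 'example'.
my @results;

# Unescaped dot matches any ordinary character, here 'X'.
push @results, ('wwwXexample'  =~ /www.example/  ? 'match' : 'no match');

# Without /s, the dot does not match a newline.
push @results, ("www\nexample" =~ /www.example/  ? 'match' : 'no match');

# With /s, the dot matches a newline as well.
push @results, ("www\nexample" =~ /www.example/s ? 'match' : 'no match');

print "$_\n" for @results;
```

This should print "match", "no match", "match", one per line.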

You seem to have used it, expecting a literal dot, in "m|www.". I'm not sure what's intended with ".{3}).", hence the request for clarification earlier. Anyway, this problem needs fixing.
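As a rough illustration of the difference an escaped dot makes, the sketch below compares "www." with "www\." against two strings. The second string ("wwwXexample.com.au") is a hypothetical bad input I invented to show the false positive, and the capture group is simplified to a plain alphanumeric class; neither is a statement of what your final regex should be.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical inputs: one genuine 'www.' prefix, one without a literal dot.
my @urls = ('www.example.com.au', 'wwwXexample.com.au');

my %captured;
for my $url (@urls) {
    # Unescaped dot: any character is accepted after 'www'.
    my ($loose)  = $url =~ m|www.([A-Za-z0-9]+)|;
    # Escaped dot: only a literal '.' is accepted after 'www'.
    my ($strict) = $url =~ m|www\.([A-Za-z0-9]+)|;

    $captured{$url} = { loose => $loose, strict => $strict };

    print "$url: loose=[",  defined $loose  ? $loose  : '<undef>',
          "] strict=[", defined $strict ? $strict : '<undef>', "]\n";
}
```

The unescaped version captures "example" from both strings; the escaped version captures it only from the first and leaves the second undefined.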

You also have a problem with spaces in "[A-Z a-z 0-9]". I suspect this is the result of a misunderstanding about the /x modifier:

"/x tells the regular expression parser to ignore most whitespace that is neither backslashed nor within a character class." [my emphasis]

Decide whether you want domains with spaces or not; modify the character class to have no spaces or just one space.
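To see the effect, here's a small sketch using a made-up string containing a space ("my domain"). Under /x, the whitespace outside the brackets is ignored, but the spaces inside the character class still count as a literal space.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical test string containing a space.
my $str = 'my domain';

# Spaces inside the class are literal, even under /x, so this matches.
my $with_space    = $str =~ m/^ [A-Z a-z 0-9]+ $/x ? 1 : 0;

# No space in the class: the space in the string now prevents a match.
my $without_space = $str =~ m/^ [A-Za-z0-9]+ $/x   ? 1 : 0;

print "with space in class:    $with_space\n";
print "without space in class: $without_space\n";
```

The first test succeeds and the second fails, which is probably not what you want when extracting a domain name.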

[See also: perlrequick, perlretut, perlre, strict, warnings, autodie, open()]

-- Ken

Re^2: if/else loop prints extra values
by AnomalousMonk (Archbishop) on Jun 28, 2014 at 13:05 UTC
    ... issues with your regex.

    ($domain) = $url =~ m|www.([A-Z a-z 0-9]+.{3}).|x;

    kcott: Hi Ken! The discussion stemming from Re: Perl prints only last line of array indicates the regex in question is working just fine for myfrndjk although I share your puzzlement as to how it could. Anyhoo... ++ for a valiant explanatory effort.

      G'day AnomalousMonk,

      [Sorry for the late reply. As you may have noticed, I haven't been here much in recent weeks. Various other things are taking up my time at the moment — none of which are any cause for concern :-)]

      I found it curious that mangling 5 valid URLs would result in 3 different valid URLs.

      I found it curiouser that those mangled URLs all turned out to be shopping sites.

      I found it curiousest that some of those names sounded familiar: perhaps from "Nodes To Consider".

      -- Ken

Re^2: if/else loop prints extra values
by myfrndjk (Sexton) on Jun 28, 2014 at 09:14 UTC

    thanks for your suggestion