http://qs1969.pair.com?node_id=869331

iphone has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have a log file with lines like the ones below. Can anyone help me get only the link out of each line? I have tried the following, but it misses the first line.

($resolution_link) = $match =~ /(http:.*(\d+))/;

Bug found in the build. Please check check https://web.com/fluent/x/JIOUAQ for more details.

Bug found in your build please check http://web.com/fixedbuglink/CR2745 for the fix

............

Re: Getting only the link in a line
by Your Mother (Archbishop) on Nov 04, 2010 at 02:08 UTC

    You might also look at URI::Find and friends.

Re: Getting only the link in a line
by Utilitarian (Vicar) on Nov 03, 2010 at 21:56 UTC
    Your .* can match anything, including spaces, so if there are digits in the line after the URL, the match runs on past the end of it. Also, the URL in the first line doesn't end in decimal digits (\d), and you need to allow for the possibility of SSL (https). There are better (more precise) solutions, but try:
    ($resolution_link) = $match =~ {(https?://\S+)};
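For reference, a minimal sketch of the intended match. This is a reconstruction, not the poster's exact code: as written, the line above has no m before the braces, and Perl accepts alternative delimiters such as {...} only after an explicit m. With m{...} the // in the URL also needs no escaping. It is run here against the two sample lines from the question:

```perl
use strict;
use warnings;

# The two sample log lines from the question.
my @lines = (
    'Bug found in the build. Please check check https://web.com/fluent/x/JIOUAQ for more details.',
    'Bug found in your build please check http://web.com/fixedbuglink/CR2745 for the fix',
);

my @links;
for my $match (@lines) {
    # Braces work as regex delimiters only after an explicit m, and with
    # non-slash delimiters the // in the URL needs no backslashes.
    my ($resolution_link) = $match =~ m{(https?://\S+)};
    push @links, $resolution_link if defined $resolution_link;
}
print "$_\n" for @links;
```

Since \S+ stops at the first whitespace, the capture ends where the URL does, whatever follows it on the line.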
    print "Good ",qw(night morning afternoon evening)[(localtime)[2]/6]," fellow monks."

      There seems to be a syntax error in the code you provided. I used the code below and it worked, but now the problem is that some links have a "." (dot) at the end. I need to remove that. How do I do that?

      ($resolution_link) = $match =~ /(https?:\/\/(\S+))/;
        That's why I said there were better solutions available ;) This was a quick hack to solve a specific case.

        One solution would be to strip punctuation from the end of the link in a second pass.
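A minimal sketch of that second pass, assuming the links have already been extracted and that only sentence punctuation (. , ; : ! ?) should be stripped. That character set is an assumption, and the link values below are hypothetical. A trailing / is deliberately left alone because it can be a meaningful part of a URL.

```perl
use strict;
use warnings;

# Hypothetical first-pass results; the second one picked up the full stop
# that ended the sentence in the log line.
my @links = (
    'https://web.com/fluent/x/JIOUAQ',
    'http://web.com/fixedbuglink/CR2745.',
);

# Second pass: the foreach variable aliases each element, so the
# substitution edits @links in place. Only punctuation at the very
# end of the string (\z) is removed; '/' is not in the class.
for my $link (@links) {
    $link =~ s/[.,;:!?]+\z//;
}
print "$_\n" for @links;
```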

Re: Getting only the link in a line
by kcott (Archbishop) on Nov 04, 2010 at 01:11 UTC

    This regex handles the data you've provided and the case where the URL has a terminal ".":

    /(https?:\S+?)[.]?\s/
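A short sketch applying this regex to the two sample lines from the question, plus one hypothetical line (not from the original log) where the URL is followed directly by a full stop:

```perl
use strict;
use warnings;

# Sample lines from the question plus one hypothetical line whose URL
# is immediately followed by a full stop.
my @lines = (
    'Bug found in the build. Please check check https://web.com/fluent/x/JIOUAQ for more details.',
    'Bug found in your build please check http://web.com/fixedbuglink/CR2745 for the fix',
    'Fixed, see http://web.com/fixedbuglink/CR2745. Thanks',
);

my @links;
for my $line (@lines) {
    # Non-greedy \S+? stops before an optional '.' that precedes whitespace.
    push @links, $1 if $line =~ /(https?:\S+?)[.]?\s/;
}
print "$_\n" for @links;
```

Because the pattern ends in \s, it needs whitespace after the URL (or after its trailing '.'), so a URL at the very end of a chomped line will not match.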

    -- Ken

      Thanks, it worked. Can you please explain how it takes care of the "." (dot)? Thanks

        I'll give a quick breakdown here. Refer to perlre for details (I've indicated the appropriate sections).

        • You were originally missing your first line because its URL was https and you'd only specified http. The s? means zero or one 's' (see Quantifiers).
        • \S+? matches non-whitespace characters non-greedily, i.e. as few as possible while still letting the rest of the pattern match, which stops it from capturing the terminal period if one exists (further down under Quantifiers).
        • [.] stops '.' from being a special (match any character except newline) metacharacter by placing it in a character class (see Metacharacters).
        • [.]? just says zero or one literal '.' (that's Quantifiers again).
        • \s at the end anchors the URL (and optional '.') to the whitespace that follows it (see Character Classes and other Special Escapes).
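To make the non-greedy point concrete, a small sketch comparing greedy \S+ with non-greedy \S+? on a hypothetical log line (not from the original log) whose URL ends a sentence:

```perl
use strict;
use warnings;

# Hypothetical log line where the URL ends a sentence.
my $line = 'Fixed, see http://web.com/fixedbuglink/CR2745. Thanks';

# Greedy \S+ swallows the '.', then [.]? matches the empty string.
my ($greedy) = $line =~ /(https?:\S+)[.]?\s/;

# Non-greedy \S+? leaves the '.' for [.]? to consume.
my ($lazy) = $line =~ /(https?:\S+?)[.]?\s/;

print "greedy:     $greedy\n";
print "non-greedy: $lazy\n";
```

Both patterns match the line; only the non-greedy one keeps the period out of the capture group.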

        -- Ken

Re: Getting only the link in a line
by poolpi (Hermit) on Nov 04, 2010 at 10:20 UTC

    See Regexp::Common::URI::http

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Regexp::Common qw /URI/;

    while (<DATA>) {
        chomp;
        /$RE{URI}{HTTP}{-keep}/
            and print "Contains an HTTP URI.\nHost=$3\n";
    }
    __DATA__
    http://192.168.1.10/index.html
    http://www.delicious.com/search?p=perl&chk=&context=main|&fr=del_icio_us&lc=

    Output:
    Contains an HTTP URI.
    Host=192.168.1.10
    Contains an HTTP URI.
    Host=www.delicious.com


    hth,
    PooLpi

    'Ebry haffa hoe hab im tik a bush'. Jamaican proverb