Re: string matching

Use URI. URIs are more complicated than they seem, and the module makes handling them easier. Here's an example with a couple of surprise cases that show why a regular expression can be much harder to get right.

use strict;
use warnings;
use URI;

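URI:
for my $raw ( <DATA> )
{
    chomp $raw;
    my $uri = URI->new($raw);

    # scheme() returns undef when the input has no scheme, so guard the comparison.
    if ( ( $uri->scheme || "" ) ne "https" )
    {
        warn "$uri is not secure, skipping\n";
        next URI;
    }
    # path() excludes the query string and the fragment.
    if ( $uri->path =~ m,/\z, )
    {
        warn "$uri has a trailing slash, skipping\n";
        next URI;
    }
    print "GOOD: $uri\n";
}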

__DATA__
http://perlmonks.org/?node_id=825405
https://gmail.com
https://gmail.com/
https://perlmonks.org/?
https://mail.google.com/mail/#inbox
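
To see why the path check in the loop above behaves as it does, it can help to print the components that URI extracts from each string. This is a minimal sketch, assuming the URI module from CPAN is installed; `uri_parts` and `@samples` are hypothetical names used only for illustration, not part of URI.

```perl
use strict;
use warnings;
use URI;

# Hypothetical helper: return the components the checks above rely on.
# A component that is absent comes back as undef, which is different
# from one that is present but empty (for example a bare "?").
sub uri_parts {
    my ($raw) = @_;
    my $uri = URI->new($raw);
    return {
        scheme   => scalar( $uri->scheme ),
        path     => scalar( $uri->path ),
        query    => scalar( $uri->query ),
        fragment => scalar( $uri->fragment ),
    };
}

# Sample strings taken from the data above, plus one upper-case scheme.
my @samples = (
    'https://gmail.com',
    'HTTPS://gmail.com',
    'https://perlmonks.org/?',
    'https://mail.google.com/mail/#inbox',
);

for my $raw (@samples) {
    my $p = uri_parts($raw);
    my @shown = map { defined $p->{$_} ? "$_='$p->{$_}'" : "$_=undef" }
        qw(scheme path query fragment);
    print "$raw\n    @shown\n";
}
```

The query and the fragment are reported separately from the path, which is why a string that ends in `?` or `#inbox` can still have a path that ends in a slash.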


Re^2: string matching by ungalnanban (Pilgrim) on Feb 27, 2010 at 09:53 UTC
We can meet this requirement in a single line. Example:

    use strict;
    use warnings;
    open( FH, "<", "data" ) or die "Cannot open data: $!";
    foreach ( <FH> ) { if ( $_ =~ m/^https.*[^\/]\n$/ ) { print $_; } }
Re^3: string matching by Your Mother (Archbishop) on Feb 27, 2010 at 20:25 UTC
I think you missed the point, and an `i` modifier.

    while ( <DATA> ) {
        print if /\Ahttps.*[^\/]\n\z/;
    }
    __DATA__
    http://perlmonks.org/?node_id=825405
    HTTPS://gmail.com
    https://gmail.com/
    httpsux
    https://perlmonks.org/?
    https://mail.google.com/mail/#inbox

This prints the lines below. Each is either not a URI at all or has a path that "ends" with a trailing slash, since the fragment and the empty query string are irrelevant to the URI path. It also skips HTTPS://gmail.com, which should have matched, because the pattern is case-sensitive.

    httpsux
    https://perlmonks.org/?
    https://mail.google.com/mail/#inbox

If you know for a fact that your data set is simple and normalized enough, you could use a straightforward regular expression. URI is simple and robust, however, so not using it is just sloth, and it will eventually bite you or the dev who inherits your code. Trusting input data to be well-formed is risky and only appropriate in one-offs.
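
The difference between the two approaches can be checked side by side. The following is a sketch only, assuming the URI module from CPAN is installed; `regex_ok`, `uri_ok`, and `@samples` are hypothetical names made up for this comparison. It applies the pattern from the reply above and a URI-based check to the same strings.

```perl
use strict;
use warnings;
use URI;

# The line-based pattern from the reply above. The newline is re-added
# because the pattern expects an unchomped input line.
sub regex_ok {
    my ($raw) = @_;
    return "$raw\n" =~ /\Ahttps.*[^\/]\n\z/ ? 1 : 0;
}

# URI-based check: the scheme must be https and the path must not end in "/".
sub uri_ok {
    my ($raw) = @_;
    my $uri    = URI->new($raw);
    my $scheme = $uri->scheme;    # undef when the string has no scheme
    return 0 unless defined $scheme && $scheme eq 'https';
    return $uri->path =~ m{/\z} ? 0 : 1;
}

my @samples = (
    'http://perlmonks.org/?node_id=825405',
    'HTTPS://gmail.com',
    'https://gmail.com',
    'https://gmail.com/',
    'httpsux',
    'https://perlmonks.org/?',
    'https://mail.google.com/mail/#inbox',
);

for my $raw (@samples) {
    printf "%-40s regex=%d uri=%d\n", $raw, regex_ok($raw), uri_ok($raw);
}
```

The two checks agree on the plain cases and disagree on the upper-case scheme, the schemeless string, and the strings where a query or fragment follows a trailing slash.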