Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear all,

I have the following sample strings:
human.NT_113898 human.contig.1 human.2 human.IV
What I want to do is remove everything up to and including the first period, and capture everything after it, yielding:
NT_113898 contig.1 2 IV
How come my regex below doesn't work:
/\.?(\S+)/; print "$1\n";
What's the right way to do it?

Re: Removing first part of string with regex
by akho (Hermit) on Jan 02, 2009 at 10:08 UTC
    The ? makes the period optional, so your regex matches the whole word. /[^.]*\.(\S*)/ would work.
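A minimal sketch of the difference, using one of the sample words from the question. The helper name `capture_with` is hypothetical and only there to keep the comparison short:

```perl
use strict;
use warnings;

# Hypothetical helper: return what $1 captures for a pattern, or undef on no match.
sub capture_with {
    my ($pattern, $string) = @_;
    return $string =~ $pattern ? $1 : undef;
}

my $word = 'human.contig.1';

# Optional period: the match succeeds at offset 0 with zero periods,
# so \S+ takes the whole word.
print capture_with(qr/\.?(\S+)/, $word), "\n";

# Mandatory period after a run of non-period characters:
# only the part after the first period is captured.
print capture_with(qr/[^.]*\.(\S*)/, $word), "\n";
```

The first print shows the whole word, the second only the part after the first period.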
Re: Removing first part of string with regex
by ikegami (Patriarch) on Jan 02, 2009 at 10:11 UTC
    You made the period optional. Get rid of the question mark.
    /\.(\S+)/; print "$1\n";
Re: Removing first part of string with regex
by linuxer (Curate) on Jan 02, 2009 at 11:02 UTC

    I would use split() (with LIMIT) for this:

    my ( $first, $rest ) = split m{\.}, $line, 2;
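A short sketch of how the LIMIT of 2 behaves on the sample words: split stops after the first period, so any later periods stay in the second field. The helper name `after_first_period` is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: everything after the first period,
# or undef when the string contains no period at all.
sub after_first_period {
    my ($line) = @_;
    my ( $first, $rest ) = split m{\.}, $line, 2;
    return $rest;
}

for my $word (qw(human.NT_113898 human.contig.1 human.2 human.IV)) {
    print after_first_period($word), "\n";
}
```

Without the LIMIT, 'human.contig.1' would be cut into three fields and $rest would hold only 'contig'.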
Re: Removing first part of string with regex
by pdcawley (Hermit) on Jan 02, 2009 at 14:21 UTC
    Let's rewrite your regular expression with the (?x) flag and comment it, shall we? That gives us
    qr{(?smx)
        \.?      # Match the first instance of 0 or 1 periods
        (        # then capture
            \S+  # one or more non-whitespace characters
        )
    }
    So, if we follow that recipe on the string human.NT_113898, the matcher looks at the start of the string, sees that the test for 0 or 1 periods succeeds there, so it scarfs up the rest of the string into $1. Which isn't what you wanted. What you actually want depends a little on what you're expecting as input. Assuming that there's always going to be at least one period in the input string, something like
    qr{(?smx)
        \.    # Find a full stop
        (.*)  # and capture everything after it
    }
    will do. However, if there's a possibility of there not being a period, you might have to do
    qr{(?smx)
        ^          # From the beginning
        (?:        # In a group...
            [^.]*  # Match any character except period, any number of times
            \.     # followed by a period
        )?         # match the group 0 or 1 times
        (.*)       # then capture everything else
    }
    If there's a trick to understanding why a regular expression doesn't do what you want, it's to break it down like this and go through each subexpression and explain to yourself what it's trying to match. Most of the time this narrative approach will lead you to your bug and to its fix remarkably quickly.
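To see how the two corrected patterns differ, here is a small sketch that runs both against a string with a period and one without. The helper names `tail_strict` and `tail_lenient` are hypothetical; the patterns are the ones described above with the comments left out:

```perl
use strict;
use warnings;

# Hypothetical helper: requires a period; returns undef when there is none.
sub tail_strict {
    my ($string) = @_;
    return $string =~ m{ \. (.*) }smx ? $1 : undef;
}

# Hypothetical helper: the "prefix plus period" group is optional,
# so a string without a period comes back unchanged.
sub tail_lenient {
    my ($string) = @_;
    return $string =~ m{ ^ (?: [^.]* \. )? (.*) }smx ? $1 : undef;
}

for my $string ('human.contig.1', 'human') {
    my $strict = tail_strict($string);
    $strict = '(no match)' unless defined $strict;
    printf "%-15s strict=%s lenient=%s\n", $string, $strict, tail_lenient($string);
}
```

Which behaviour is right depends on whether a word without a period should be passed through or rejected.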
Re: Removing first part of string with regex
by andreas1234567 (Vicar) on Jan 02, 2009 at 10:05 UTC
    Would /\w*\.(\S+)/ work? See perlre.
    --
    No matter how great and destructive your problems may seem now, remember, you've probably only seen the tip of them. [1]
Re: Removing first part of string with regex
by sathiya.sw (Monk) on Jan 02, 2009 at 13:41 UTC
    /(\.?)(\S+)/; print ">$1<>$2<\n";
    If you modify your code as above, you can see for yourself that the first group, being optional, matches the empty string.
    All of the input is matched by the second group.
    Sathiyamoorthy
Re: Removing first part of string with regex
by balakrishnan (Monk) on Jan 02, 2009 at 13:31 UTC
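    You can also do it this way:
    open(my $fh, '<', '1') or die "Cannot open '1': $!";  # '1' is the input file name from the original snippet
    while (<$fh>) { s/^[^.]*\.// && print; }              # strip up to the first period; print only lines that changed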
Re: Removing first part of string with regex
by Anonymous Monk on Jan 02, 2009 at 18:03 UTC
    =) this should be as simple as $string =~ s/^\w+?\.//; Best regards, --les
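If the sample really is one line of whitespace-separated words (an assumption; the question does not say), the same substitution can be applied to each word in turn. A sketch, with the hypothetical helper name `strip_prefixes`:

```perl
use strict;
use warnings;

# Hypothetical helper: strip the leading "word." from every
# whitespace-separated token on a line.
sub strip_prefixes {
    my ($line) = @_;
    my @words = split ' ', $line;
    s/^\w+?\.// for @words;    # modifies each element of @words in place
    return join ' ', @words;
}

print strip_prefixes('human.NT_113898 human.contig.1 human.2 human.IV'), "\n";
```

Because \w cannot match a period, the non-greedy \w+? and a plain \w+ stop at the same place here.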
Re: Removing first part of string with regex
by Anonymous Monk on Jan 06, 2009 at 04:20 UTC
    You can also use this:
    my @array = ('human.NT_113898', 'human.contig.1', 'human.2', 'human.IV');
    foreach (@array) { print "$1\n" if /^\w+\.(\S+)/; }  # capture everything after the first period, later periods included