Alessandro has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks, I am trying to write a small Perl script that connects to a webpage, downloads the content and prints it. So far, so good... The thing is, I don't want to print all the content, only the lines that start with ">". The data look like this:

>protein_name1
blablabla
>protein_name2
blablabla
>protein_name3
blablabla

Here is the code I have:
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;

print "Please, enter an IPR number: ";
chomp (my $IPR_numb = <STDIN>);
my $data = get("http://www.uniprot.org/uniprot/?query=database:(type:interpro+id:IPR$IPR_numb)&format=fasta");
my (@header) = $data =~ /^>(.+)/;
print "@header\n";
The problem is that I only get the first line starting with ">"; all the others are ignored. So in my example, the only output I get is:

>protein_name1

What I would like instead is:

>protein_name1
>protein_name2
>protein_name3

I don't understand why Perl stops after the first match of the regex instead of continuing to look for more.
Thanks a lot for your help.

Re: Manipulating data retrieved with LWP
by toolic (Bishop) on Nov 20, 2015 at 17:35 UTC
    You want your regular expression to match repeatedly (g) and to let ^ match at the start of every line, not just at the start of the string (m):
use strict;
use warnings;

my $data = '>protein_name1
blablabla
>protein_name2
blablabla
>protein_name3
blablabla
';
my (@header) = $data =~ /^>(.+)/gm;
print "@header\n";

__END__

protein_name1 protein_name2 protein_name3
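To see why "g" on its own is not enough, here is a small comparison of the three variants run against the same sample data (the string is just the example from the question, not real UniProt output). Without m, ^ can only match at the very start of the string, so even with g there is nowhere else for the pattern to anchor and you still get a single header.

```perl
use strict;
use warnings;

# Sample data in the same shape as the headers shown in the question
my $data = ">protein_name1\nblablabla\n>protein_name2\nblablabla\n>protein_name3\nblablabla\n";

# No flags: ^ anchors only at the start of the string, and matching stops after the first hit
my @plain = $data =~ /^>(.+)/;

# g alone: keeps searching, but ^ still only matches at the start of the string
my @g_only = $data =~ /^>(.+)/g;

# gm: keeps searching, and ^ also matches right after every newline
my @gm = $data =~ /^>(.+)/gm;

print "no flags: @plain\n";
print "/g only:  @g_only\n";
print "/gm:      @gm\n";

# Prints:
# no flags: protein_name1
# /g only:  protein_name1
# /gm:      protein_name1 protein_name2 protein_name3
```

Note that .+ never crosses a newline (there is no s flag), which is why each capture stops at the end of its header line.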
      I had tried "g" but didn't know about "m". Something new learned.
      Thanks a lot.
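For completeness, a sketch that produces exactly the output requested in the question: the ">" is kept and each header goes on its own line. Assumptions: the heredoc stands in for the downloaded page, and extract_headers is a hypothetical helper name, not something provided by LWP::Simple. Moving the ">" inside the capture group keeps it in the result, and printing in a loop avoids the space-joined output that "@header" gives.

```perl
use strict;
use warnings;

# Hypothetical helper: return every header line, keeping the leading ">".
# Must be called in list context so the //g match returns all captures.
sub extract_headers {
    my ($text) = @_;
    return $text =~ /^(>.+)/gm;
}

# Stand-in for the page body. In the real script this would come from
# LWP::Simple's get(), which returns undef if the request fails, so it is
# worth checking with `defined` before matching.
my $data = <<'FASTA';
>protein_name1
blablabla
>protein_name2
blablabla
>protein_name3
blablabla
FASTA

my @headers = extract_headers($data);

# Print one header per line instead of joining them with spaces
print "$_\n" for @headers;
```

With the sample data this prints >protein_name1, >protein_name2 and >protein_name3 on separate lines.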