Manipulating data retrieved with LWP

Alessandro has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks, I am trying to write a small Perl script that connects to a web page, downloads the content and prints it. So far so good... the thing is, I don't want to print all the content, only the lines that start with a ">". The data look like this:

>protein_name1
blablabla
>protein_name2
blablabla
>protein_name3
blablabla

Here is the code I have:

#!/usr/bin/perl

use strict; 
use warnings; 
use LWP::Simple;

print "Please, enter an IPR number: ";
chomp (my $IPR_numb = <STDIN>);
my $data = get("http://www.uniprot.org/uniprot/?query=database:(type:interpro+id:IPR$IPR_numb)&format=fasta");
my (@header) = $data =~ /^>(.+)/; 
print "@header\n";

The problem is, I only get the first line starting with ">" and all the others are ignored. So in my example, the only output I get is:

>protein_name1

I would like to have instead:

>protein_name1
>protein_name2
>protein_name3

I don't understand why Perl stops after finding the first match of the regex and doesn't keep looking for more.
Thanks a lot for your help.

Re: Manipulating data retrieved with LWP by toolic (Bishop) on Nov 20, 2015 at 17:35 UTC
You want your regular expression to match many times (g) and to let ^ match at the start of every line rather than only at the start of the string (m):

use strict;
use warnings;

my $data = '>protein_name1
blablabla
>protein_name2
blablabla
>protein_name3
blablabla
';
my (@header) = $data =~ /^>(.+)/gm;
print "@header\n";

__END__
protein_name1 protein_name2 protein_name3

See also: perlre. Perhaps also http://www.bioperl.org/wiki/Main_Page?
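To see what each modifier contributes on its own, here is a small sketch that runs the same pattern with no flags, with /g only, with /m only, and with both. The sample string stands in for the downloaded data, and the array names are made up for illustration. The last match moves the ">" inside the capture group, on the assumption that the marker should appear in the output as the question requested.

```perl
use strict;
use warnings;

# Sample data standing in for the downloaded FASTA text (same shape as in the question).
my $data = join "\n",
    '>protein_name1', 'blablabla',
    '>protein_name2', 'blablabla',
    '>protein_name3', 'blablabla', '';

# No flags: ^ anchors only at the start of the string, and the match runs once.
my @none = $data =~ /^>(.+)/;

# /g alone: the match is repeated, but ^ still matches only at the start of the string.
my @g_only = $data =~ /^>(.+)/g;

# /m alone: ^ can match after any newline, but only the first match is returned.
my @m_only = $data =~ /^>(.+)/m;

# /gm: every line beginning with ">" is captured.
my @both = $data =~ /^>(.+)/gm;

# Capture the ">" too, to keep the marker in the output.
my @with_marker = $data =~ /^(>.+)/gm;

# Number of matches for each variant, then the headers one per line.
print join(' ', scalar(@none), scalar(@g_only), scalar(@m_only), scalar(@both)), "\n";
print "$_\n" for @with_marker;
```

This prints "1 1 1 3" followed by the three header lines, which is consistent with /g alone not being enough: without /m there is only one position in the whole string where ^ can match.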
Re^2: Manipulating data retrieved with LWP by Alessandro (Acolyte) on Nov 20, 2015 at 18:10 UTC
I had tried to use "g" but was unaware of "m". Learned something new. Thanks a lot.