MiriamH has asked for the wisdom of the Perl Monks concerning the following question:

I think this might be a basic issue, but I don't know what to do.

I have a massive list of data that summarizes different events at my university. I need to print out all the titles of the events. The data is structured so that it says "summary", gives the title, and then says "description" immediately after. This happens at least 50 times. I have a script that will print out the title, but only for the FIRST time it sees "summary" and "description". How do I make it continue through the data?

if ($string =~ m/.+?(SUMMARY.+?DESCRIPTION).+?$/i) {
    print 'Summary: ', $1, $/;    ## prints title of event
}

Re: Foreach Loop in Data Scraping
by davido (Cardinal) on Jun 19, 2012 at 17:19 UTC

    It could be as simple as this:

    while( $string =~ m/(SUMMARY.+?DESCRIPTION)/ig ) { print "Summary: $1\n" }

    The while() loop continues looping as long as its condition is true. In this case, the condition is whether or not a match occurs. And by using the /g modifier, multiple matches can be detected within the same string. Each time a match occurs the internal position marker advances so that the next match will come after the current match. perlre and perlretut describe the use of the /g modifier in greater detail.
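To make the position marker visible, here is a minimal sketch. The sample string is made up, since the real data is not shown in this thread. It records where each match begins and prints where the next search will resume.

```perl
use strict;
use warnings;

# Hypothetical sample data; the real event list is not shown in the thread.
my $string = 'SUMMARY:Art Fair DESCRIPTION:Paintings SUMMARY:Book Sale DESCRIPTION:Used books';

my @starts;
while ( $string =~ m/SUMMARY.+?DESCRIPTION/ig ) {
    # $-[0] is the offset where this match began;
    # pos($string) is where the next /g search will resume.
    push @starts, $-[0];
    printf "match began at %d, next search resumes at %d\n", $-[0], pos($string);
}
```

Without /g, the match would restart from offset 0 every time and the loop would never end.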

    I removed the leading and trailing instances of .+?, because I don't think they were actually doing anything useful for you, and they potentially get in the way of repeated matches (the /g modifier). As you're probably aware, .+? says to non-greedily match one or more characters. If your only reason for that is to assure that there is at least one character separating matches, you could just do it with a single dot to the left and right of the capture. We're not trying to anchor to anything outside of the capture anyway, from what I can tell.
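As a rough illustration of how the padding gets in the way, the sketch below runs both patterns with /g against a made-up single-line string containing three events and counts the captures. The sample data is an assumption, not the original poster's input.

```perl
use strict;
use warnings;

# Hypothetical single-line sample with three events.
my $string = 'x SUMMARY:A DESCRIPTION:a SUMMARY:B DESCRIPTION:b SUMMARY:C DESCRIPTION:c x';

# Padded pattern: the trailing .+?$ must reach the end of the string,
# so the first match consumes everything after the first DESCRIPTION.
my @padded = $string =~ m/.+?(SUMMARY.+?DESCRIPTION).+?$/ig;

# Trimmed pattern: each match stops at DESCRIPTION,
# leaving the rest of the string for the next match.
my @trimmed = $string =~ m/(SUMMARY.+?DESCRIPTION)/ig;

print scalar(@padded), " capture(s) with padding, ",
      scalar(@trimmed), " without\n";
```

In list context, m//g returns every captured group at once, which is a handy alternative to the while loop when you just want all the results in an array.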


    Dave

      Good point. I guess I was over thinking it. Thank you!

      But your code doesn't cut off after the description; it just breaks the data into lines that each start with the event title. Definitely a step in the right direction!
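If the goal is to print only the title, one possible approach is to move the capturing parentheses inside the two keywords. The sketch below assumes that each title sits between "SUMMARY:" and the next "DESCRIPTION", and that the data may contain newlines. The sample string is invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample data; the layout (colon after each keyword,
# one field per line) is an assumption about the real input.
my $string = "SUMMARY:Art Fair\nDESCRIPTION:Paintings on the quad\n"
           . "SUMMARY:Book Sale\nDESCRIPTION:Used books\n";

my @titles;
# /s lets . match newlines, in case a title wraps onto a second line.
# The \s* on either side keeps surrounding whitespace out of the capture.
while ( $string =~ m/SUMMARY:?\s*(.+?)\s*DESCRIPTION/gis ) {
    push @titles, $1;
}

print "Summary: $_\n" for @titles;
```

Because only the text between the keywords is captured, neither "SUMMARY" nor "DESCRIPTION" shows up in the output.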

Re: Foreach Loop in Data Scraping
by jwkrahn (Abbot) on Jun 19, 2012 at 17:21 UTC

    You need to use a while loop, like this:

    while ( $string =~ /(SUMMARY.+?DESCRIPTION)/ig ) {
        print "Summary: $1\n";    ## prints title of event
    }

Re: Foreach Loop in Data Scraping
by ckj (Chaplain) on Jun 20, 2012 at 07:02 UTC
    MiriamH, I think you just want to display all the titles using a regular expression. To find multiple matches you should use a while or foreach loop. I'd go with a while loop; try your code this way:
    while ($string =~ m/SUMMARY(.*?)DESCRIPTION/gi) {
        print 'Summary: ', $1, "\n";    ## prints title of event
    }
    Let me know if you still run into problems after this.
      I did what you said, but it only gave me the first event, and there are 15+ events that I want to capture. I indexed the words "summary" and "description" and then began printing out what is between the indexes; however, I am putting the numbers in manually each time. Is there a way for Perl to automatically go to the next index (instead of $SUM1 and $SUM2 being specific numbers)?
      my $substr = 'SUMMARY';
      my $offset = 0;
      my $result = index($string, $substr, $offset);
      while ($result != -1) {
          print "$result\n";
          $offset = $result + 1;
          $result = index($string, $substr, $offset);
      }
      my $SUM1 = "298";
      my $SUM2 = "877";
      my $length = $SUM2 - $SUM1;
      my $fragment = substr $string, $SUM1, $length;
      #print " string: <$string>\n";
      print " <$fragment>\n";
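One way to avoid typing the offsets by hand is to let index() find both keywords inside the loop and feed the results straight into substr(). This is only a sketch: the sample string is made up, and it assumes every "SUMMARY" is followed by a "DESCRIPTION" before the next event starts.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the real event list.
my $string = "SUMMARY:Art Fair\nDESCRIPTION:Paintings\n"
           . "SUMMARY:Book Sale\nDESCRIPTION:Used books\n";

my @fragments;
my $offset = 0;
while (1) {
    # Find the next SUMMARY at or after the current offset.
    my $start = index( $string, 'SUMMARY', $offset );
    last if $start == -1;

    # Find the DESCRIPTION that follows it.
    my $end = index( $string, 'DESCRIPTION', $start );
    last if $end == -1;

    # Take everything from SUMMARY up to (not including) DESCRIPTION.
    push @fragments, substr( $string, $start, $end - $start );

    # Resume the search just past this DESCRIPTION.
    $offset = $end + length('DESCRIPTION');
}

print " <$_>\n" for @fragments;
```

Note that index() is case-sensitive, unlike the /i regexes above, so the keywords must appear in the data exactly as written here.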