Dr.Avocado has asked for the wisdom of the Perl Monks concerning the following question:

Hello Perl Masters,
I have a problem I need to solve: I'm writing a script that prints the text between the first two commas after a designated string in a text file I'm looking through. For example, if I'm searching:

So, I went to the $string and I wanted to buy a book, but it was too expensive,

it should print " but it was too expensive". The commas may be on different lines, and I may need to print several instances of this in each document I search.
Does anyone know a good way of doing this? Thanks in advance for any help.

Re: Getting Data from Between Two Commas
by bart (Canon) on Aug 19, 2007 at 19:53 UTC
    You don't say what substring it should come after, and this smells suspiciously like homework, but you should definitely take a look at split:
    my $text = 'So, I went to the $string and I wanted to buy a book, but it was too expensive,';
    my @phrases = split /,/, $text;
    The phrase you're looking for is one of the items in the @phrases array.
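    A minimal sketch of how to pick that item out, assuming the wanted phrase is simply the field right after the first field that contains the search string. The helper name phrase_after is hypothetical. Note that a final field with no closing comma is still returned, so this is looser than the regex approaches in the other replies.

```perl
use strict;
use warnings;

# Hypothetical helper: return the comma-separated field that follows the
# first field containing $target, or undef if there is no such field.
sub phrase_after {
    my ( $text, $target ) = @_;
    my @phrases = split /,/, $text;    # trailing empty fields are dropped
    for my $i ( 0 .. $#phrases - 1 ) {
        return $phrases[ $i + 1 ] if index( $phrases[$i], $target ) >= 0;
    }
    return;                            # undef in scalar context
}

my $text   = 'So, I went to the store and I wanted to buy a book, but it was too expensive,';
my $phrase = phrase_after( $text, 'store' );
print "[$phrase]\n" if defined $phrase;    # prints "[ but it was too expensive]"
```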
Re: Getting Data from Between Two Commas
by bobf (Monsignor) on Aug 19, 2007 at 20:08 UTC

    split is likely the most appropriate solution here, as bart mentioned. However, in the spirit of TIMTOWTDI, here is a regex solution:

    use strict;
    use warnings;

    my @text = (
        "So, I went to the store and I wanted to buy a book, but it was too expensive, so\nI went home",
        "So, I went to the store and I wanted to buy a book, but it was\ntoo expensive, so I went home",
        "So, I went to the store and I wanted to\nbuy a book, but it was too expensive, so I went home",
    );

    my $query = 'store';

    foreach my $teststr ( @text ) {
        # test the match itself: $1 keeps its old value when a match fails
        if ( $teststr =~ m/ \Q$query\E   # query string, metacharacters quoted
                           [^,]*         # zero or more chars that are not a comma
                           ,             # followed by one comma
                           (             # begin capture
                             [^,]*       # zero or more chars that are not a comma
                           )             # end capture
                           ,             # followed by one comma
                         /x ) {
            print "found: [$1]\n";
        }
    }
    Output:
    found: [ but it was too expensive]
    found: [ but it was
    too expensive]
    found: [ but it was too expensive]

    I added a few comments to help you understand what was going on, but please take some time to read perlre. It will serve you well in the future. Extending this example to trim extra white space and to find multiple matches in the source text is left as an exercise to the reader.
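    For anyone who wants to check their answer to that exercise, here is one possible sketch; the name phrases_after and the sample text are made up for illustration. Using the regex with the /g modifier in a while loop visits each occurrence of the query, and two substitutions trim the ends and fold embedded newlines into single spaces.

```perl
use strict;
use warnings;

# Hypothetical helper: every phrase between the first two commas after
# each occurrence of $query, with surrounding whitespace trimmed.
sub phrases_after {
    my ( $text, $query ) = @_;
    my @found;
    while ( $text =~ m/ \Q$query\E   # query string, metacharacters quoted
                        [^,]*         # anything up to the first comma
                        ,             # the first comma
                        ( [^,]* )     # capture up to the next comma
                        ,             # the second comma
                      /gx ) {
        ( my $phrase = $1 ) =~ s/^\s+|\s+$//g;    # trim both ends
        $phrase =~ s/\s+/ /g;                     # fold newlines and runs of spaces
        push @found, $phrase;
    }
    return @found;
}

my $text = "I went to the store, and then, I went to the store again, but\nit was closed, so I left.";
print "found: [$_]\n" for phrases_after( $text, 'store' );
```

    This prints "found: [and then]" and "found: [but it was closed]". Because each search resumes after the second comma of the previous match, an occurrence of the query inside a captured phrase is not matched again.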

    Update: Also, think about how else a comma might be used and how that will affect how you find and process matches in the text. For example, could commas be used as separators in numbers? What if the text read "...buy a book that cost $1,234.00, but..."?
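    One possible way to handle the $1,234.00 case, assuming a comma with a digit on each side always belongs to a number and is never a separator: define a "field" pattern that accepts such commas and use it in place of [^,]*. The atomic group (?>...) stops the regex engine from backtracking into a number and splitting it anyway if the rest of the match fails. This is only a heuristic; a list written without spaces, such as 3,4,5, would be read as a single field.

```perl
use strict;
use warnings;

# A "field" is any run of non-comma characters, except that a comma with a
# digit on both sides (as in 1,234.00) is kept inside the field.
# (?> ... ) is atomic, so a number is never split on backtracking.
my $field = qr/ (?> (?: [^,] | (?<=\d) , (?=\d) )* ) /x;

my $text  = 'So, I went to the store and I wanted to buy a book that cost $1,234.00, but it was too expensive, so I left.';
my $query = 'store';

if ( $text =~ m/ \Q$query\E $field , ( $field ) , /x ) {
    print "found: [$1]\n";    # prints "found: [ but it was too expensive]"
}
```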

    If, as bart suspects, this is homework (and I hope it is not), please please please make sure you fully understand how whatever code you use actually works, and the reasons for choosing that method over another. It will come back to bite you if you don't.

Re: Getting Data from Between Two Commas
by dsheroh (Monsignor) on Aug 19, 2007 at 21:40 UTC
    Splitting on commas is nice and obvious and all, but it seems likely to be simpler to split on $string, ignore the first returned section of text (since it was before the first occurrence of $string), and then use a regex to extract the text between the first and second commas in each of the remaining sections returned by split.

    Regardless of whether you split on commas or on $string, there's a potential for trouble if $string appears between a pair of commas, but I think that splitting on $string is more likely to do the Right Thing. (Can't say for sure, though, as the correct behaviour in this case isn't clearly defined in the original question.)
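    A minimal sketch of the split-on-$string idea, assuming $string is literal text (hence \Q...\E) and using a made-up helper name, phrases_after_split. The first chunk returned by split is discarded and a short regex pulls the text between the first two commas out of each remaining chunk.

```perl
use strict;
use warnings;

# Every chunk after the first starts right after an occurrence of $string,
# so the wanted phrase sits between the first and second commas of the chunk.
sub phrases_after_split {
    my ( $text, $string ) = @_;
    my ( undef, @chunks ) = split /\Q$string\E/, $text;   # drop text before the first hit
    my @found;
    for my $chunk (@chunks) {
        push @found, $1 if $chunk =~ /^[^,]*,([^,]*),/;
    }
    return @found;
}

my $text = "I went to the store, and then, I went to the store again, but it was closed, so I left.";
print "found: [$_]\n" for phrases_after_split( $text, 'store' );
```

    This prints "found: [ and then]" and "found: [ but it was closed]". Because split consumes every occurrence of $string, a second occurrence between the two commas cuts the chunk short, which is exactly the ambiguous case described above.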

Re: Getting Data from Between Two Commas
by GrandFather (Saint) on Aug 19, 2007 at 20:23 UTC

    If your document is not huge (say, less than a few hundred megabytes) you could just slurp it into a variable:

    my $text = do {local $/; <$InFile>};

    After that, all you need to do is interpolate the string to be matched into your regex (remembering to use \Q and \E to quote metacharacters), skip all non-comma (and probably non-full-stop) characters up to the first comma, then capture the characters between the first comma and the next.
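    Putting those steps together gives something like the sketch below. The sample text is invented, and it is a literal string here rather than slurped from $InFile so the example runs on its own.

```perl
use strict;
use warnings;

# In real use: my $text = do { local $/; <$InFile> };
my $text = "So, I went to the store and I wanted to buy a book, but it was\n"
         . "too expensive, so I went home. Later I went back to the store.\n"
         . "Then I saw a friend, who was buying a book, and we chatted.\n";

my $string = 'store';

# \Q...\E quotes metacharacters in $string; [^,.]* refuses to cross a
# comma or a full stop; ([^,]*) captures up to the next comma. Both
# character classes match newlines, so a phrase may span lines. /g in
# scalar context walks through every occurrence.
my @found;
while ( $text =~ /\Q$string\E[^,.]*,([^,]*),/g ) {
    push @found, $1;
}
print "found: [$_]\n" for @found;
```

    Only the first "store" yields a phrase (" but it was\ntoo expensive"). The second is followed by a full stop before any comma, so the [^,.]* rule rejects it instead of reaching into the next sentence for " who was buying a book".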

    Have a read through perlretut and perlre for the details.


    DWIM is Perl's answer to Gödel
Re: Getting Data from Between Two Commas
by FunkyMonk (Bishop) on Aug 19, 2007 at 22:35 UTC
    If I've interpreted the question correctly, this should do the trick:

    my $search = "So, I went to the shops and I wanted to buy a book, but it was too expensive, and it sucked. Bleedin' Harry bleedin' Potter, who'd have thought!";
    my $start = "shops";
    my ( $match ) = $search =~ m/$start.*?,(.*?),/s;
    print $match;

    Output:

    but it was too expensive