Getting Data from Between Two Commas

Dr.Avocado has asked for the wisdom of the Perl Monks concerning the following question:

Re: Getting Data from Between Two Commas by bart (Canon) on Aug 19, 2007 at 19:53 UTC
You don't say what substring it should come after, and this does smell suspiciously like homework, but you should definitely take a look at split: `$text = 'So, I went to the $string and I wanted to buy a book, but it was too expensive,'; @phrases = split /,/, $text;` The phrase you're looking for is one of the items in the `@phrases` array.
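A minimal sketch of the split approach. The question never says which piece is wanted, so this assumes it is the comma-separated piece immediately after the first piece that contains the search string; the sample text and the variable names `$string` and `$wanted` are illustrative only.

```perl
use strict;
use warnings;

# Hypothetical inputs: the text to search and the word the phrase must follow.
my $text   = 'So, I went to the store and I wanted to buy a book, but it was too expensive, so I went home';
my $string = 'store';

# Break the text into the pieces between commas.
my @phrases = split /,/, $text;

# Assumption: the wanted phrase is the piece right after the first piece
# that contains $string. index() does a plain substring test, no regex.
my $wanted;
for my $i ( 0 .. $#phrases - 1 ) {
    if ( index( $phrases[$i], $string ) >= 0 ) {
        $wanted = $phrases[ $i + 1 ];
        last;
    }
}

if ( defined $wanted ) {
    $wanted =~ s/^\s+|\s+$//g;    # trim surrounding white space
    print "found: [$wanted]\n";
}
else {
    print "no match\n";
}
```

Using `index` rather than a regex means a search string containing characters such as `.` or `+` is still matched literally.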
Re: Getting Data from Between Two Commas by bobf (Monsignor) on Aug 19, 2007 at 20:08 UTC
split is likely the most appropriate solution here, as bart mentioned. However, in the spirit of TIMTOWTDI, here is a regex solution:

```perl
use strict;
use warnings;

my @text = (
    "So, I went to the store and I wanted to buy a book, but it was too expensive, so\nI went home",
    "So, I went to the store and I wanted to buy a book, but it was\ntoo expensive, so I went home",
    "So, I went to the store and I wanted to\nbuy a book, but it was too expensive, so I went home",
);

my $query = 'store';

foreach my $teststr ( @text ) {
    # Test the match itself: a failed match does not reset $1.
    if ( $teststr =~ m/
            \Q$query\E  # query string (metacharacters quoted)
            [^,]*       # zero or more chars that are not a comma
            ,           # followed by one comma
            (           # begin capture
              [^,]*     # zero or more chars that are not a comma
            )           # end capture
            ,           # followed by one comma
        /x )
    {
        print "found: [$1]\n";
    }
}
```

Output:

```
found: [ but it was too expensive]
found: [ but it was
too expensive]
found: [ but it was too expensive]
```

I added a few comments to help you understand what is going on, but please take some time to read perlre. It will serve you well in the future. Extending this example to trim extra white space and to find multiple matches in the source text is left as an exercise for the reader.

Update: Also, think about how else a comma might be used and how that will affect how you find and process matches in the text. For example, could commas be used as separators in numbers? What if the text read "...buy a book that cost $1,234.00, but..."?

If, as bart suspects, this is homework (and I hope it is not), please please please make sure you fully understand how whatever code you use actually works, and the reasons for choosing that method over another. It will come back to bite you if you don't.
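One possible take on the exercise and the comma-in-numbers caveat, offered only as a sketch: match repeatedly with `/g`, trim each capture, and treat a comma sandwiched between two digits as ordinary text rather than a separator. That digit rule is an assumption about what counts as a number; it would also glue a bare list like `1,2,3` into one field. The sample text is made up.

```perl
use strict;
use warnings;

# Hypothetical sample: a price with a thousands separator, and two occurrences of the query.
my $text = "I went to the store, and a book cost \$1,234.00, too much.\n"
         . "Later I went to the store again, and the book was \$999, still too much.";
my $query = 'store';

# Assumption: a separator is a comma that is NOT between two digits;
# anything else (including the comma in 1,234) counts as field text.
my $sep   = qr/ (?: (?<!\d) , | , (?!\d) ) /x;
my $field = qr/ (?: [^,] | (?<=\d) , (?=\d) ) /x;

my @found;
while ( $text =~ m/
            \Q$query\E    # query string, metacharacters quoted
            $field*       # rest of the field containing the query
            $sep          # first separator
            ( $field* )   # capture the whole next field
            $sep          # second separator
        /gx )
{
    ( my $phrase = $1 ) =~ s/^\s+|\s+$//g;    # trim surrounding white space
    push @found, $phrase;
}

print "found: [$_]\n" for @found;
```

Because `$field` and `$sep` never match the same comma, the greedy `$field*` cannot backtrack into treating the comma in `1,234` as a separator.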
Re: Getting Data from Between Two Commas by GrandFather (Saint) on Aug 19, 2007 at 20:23 UTC
If your document is not huge (say, less than a few hundred megabytes) you could just slurp it into a variable: `my $text = do {local $/; <$InFile>};` After that, all you need to do is interpolate the string to be matched into your regex (remembering to use \Q and \E to quote metacharacters), skip all non-comma (and probably non-full-stop) characters up to the first comma, then capture the characters from the first comma to the next. Have a read through perlretut and perlre for the details.
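A sketch of the slurp-then-match recipe. To keep it self-contained it reads from an in-memory filehandle (`open` on a scalar reference) instead of a real file; the document text and `$string` are hypothetical. Excluding `.` from the skipped characters follows the "probably full stop" suggestion, so the query and the first comma must fall in the same sentence.

```perl
use strict;
use warnings;

# Self-contained demo: read from an in-memory filehandle instead of a real
# file. With a real document you would write: open my $InFile, '<', $path
my $document = "So, I went to the store and I wanted to buy a book,\n"
             . "but it was too expensive, so I went home.\n";
my $string   = 'store';    # hypothetical search string

open my $InFile, '<', \$document or die "Cannot open in-memory file: $!";
my $text = do { local $/; <$InFile> };    # slurp: undef $/ reads it all at once
close $InFile;

# \Q...\E quotes metacharacters in $string; [^,.]* skips to the first comma
# without crossing a full stop; ([^,]*) captures up to the next comma.
my $phrase;
if ( $text =~ m/\Q$string\E[^,.]*,([^,]*),/ ) {
    ( $phrase = $1 ) =~ s/^\s+|\s+$//g;    # trim the newline and spaces
    print "found: [$phrase]\n";
}
```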
Re: Getting Data from Between Two Commas by dsheroh (Monsignor) on Aug 19, 2007 at 21:40 UTC
Splitting on commas is nice and obvious and all, but it seems likely to be simpler to split on $string, ignore the first returned section of text (since it was before the first occurrence of $string), and then use a regex to extract the text between the first and second commas in each of the remaining sections returned by split. Regardless of whether you split on commas or on $string, there's a potential for trouble if $string appears between a pair of commas, but I think that splitting on $string is more likely to do the Right Thing. (Can't say for sure, though, as the correct behaviour in this case isn't clearly defined in the original question.)
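The split-on-`$string` idea might look like the following sketch. The sample text is invented, `\Q...\E` is added so the search string is split on literally, and captures are left untrimmed. As noted above, an occurrence of `$string` between a pair of commas will still confuse it.

```perl
use strict;
use warnings;

# Hypothetical inputs.
my $text = "So, I went to the store and I wanted to buy a book, but it was too expensive, so I went home. "
         . "Next day I went back to the store, checked again, still no luck, and left.";
my $string = 'store';

# Split on the search string itself (quoted, so it is matched literally).
my @sections = split /\Q$string\E/, $text;
shift @sections;    # the first section comes before any occurrence of $string

# In each remaining section, take the text between the first and second commas.
my @found;
for my $section (@sections) {
    push @found, $1 if $section =~ /^[^,]*,([^,]*),/;
}

print "found: [$_]\n" for @found;
```

If `$string` never occurs, split returns a single section, the `shift` discards it, and nothing is reported.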
Re: Getting Data from Between Two Commas by FunkyMonk (Bishop) on Aug 19, 2007 at 22:35 UTC
If I've interpreted the question correctly, this should do the trick:

```perl
my $search = "So, I went to the shops and I wanted to buy a book, but it was too expensive, and it sucked. Bleedin' Harry bleedin' Potter, who'd have thought!";
my $start  = "shops";
my ( $match ) = $search =~ m/$start.*?,(.*?),/s;
print $match;
```

Output:

```
 but it was too expensive
```