in reply to Trying to find all items in between quotation and speech marks

The basic idea below is to use a global match (/g). There will be problems with contractions (e.g. "isn't"), so this isn't perfect. Have fun, hope it helps.
#!/usr/bin/perl -w
use strict;

my $text ="";
while (<DATA>)
{
    s/\n/ /;             #not graceful way of \n
    $text = $text.$_;    #not either, but not main point...
}

#main point is to use match global (/g)
#think about /m and /s options also
#this is just an example of an idea....
#
my @quotes = $text =~ m/["'](.*?)["']/g;
print join("\n",@quotes),"\n";

__DATA__
"Mary and a little lamb", she said.
She thought 'Hang on a tick'.
this 'is a line spanning
quote' and course not!
I really do not know about the really stange things,
"But this is another line
span quote"
Nobody loves me
"Mary and a large lamb"
this is just nonsense
the PM said 'something'
__END__

OUTPUT:
Mary and a little lamb
Hang on a tick
is a line spanning quote
But this is another line span quote
Mary and a large lamb
something
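On the /s hint in the comments: a minimal sketch, assuming the whole input is already available as one multi-line string. The here-doc stands in for a slurped file, and the sample text is trimmed from the data above. With /s the dot also matches newlines, so the line-joining loop is not needed.

```perl
use strict;
use warnings;

# Assumption: the input is already one multi-line string. A file could be
# slurped instead with: my $text = do { local $/; <$fh> };
my $text = <<'END_TEXT';
"Mary and a little lamb", she said.
this 'is a line spanning
quote' and course not!
END_TEXT

# /s lets . match newlines, so a quote can span lines;
# /g in list context returns every captured group.
my @quotes = $text =~ m/["'](.*?)["']/gs;

# Collapse embedded line breaks so each quote prints on one line.
s/\s+/ /g for @quotes;

print join("\n", @quotes), "\n";
```

This keeps the same loose pattern as the original, so it has the same weaknesses with contractions and mismatched quote characters.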

Re^2: Trying to find all items in between quotation and speech marks
by AnomalousMonk (Archbishop) on Jan 24, 2009 at 19:25 UTC
    Note that the regex m/["'](.*?)["']/g (and other, similar regexes in this thread) will match unbalanced quotes; e.g., it will match the substring q{bar} in the string q{foo 'bar" baz}.

    Although he or she does not say so, what the OPer probably wants is something to match balanced quotes.
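A quick way to see the unbalanced match for yourself; a small sketch, where the variable names are only for illustration:

```perl
use strict;
use warnings;

# The loose pattern from this thread: either quote character may
# open the match and either may close it.
my $loose = qr/["'](.*?)["']/;

my $string = q{foo 'bar" baz};

# List context with /g returns every captured group.
my @found = $string =~ /$loose/g;

print "matched: $_\n" for @found;
```

The single quote opens the match and the double quote closes it, so bar is reported even though the quote characters do not pair up.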

      OK, there is also the issue of contractions ("don't", etc.). Another refinement:
      m/(["'])(.+?)\1/;
      If you capture the ["'], then \1 looks for whichever quote character matched at the beginning. I think some \W is also needed in some fashion, for a "don't" in the middle of a sentence. There may be a special case when the line begins with the quote, with no character at all in front of it. Not sure what the requirements are when 'foo" is encountered, or for other non-standard English constructions.
      m/\W(["'])(.+?)\1\W/;
      It's tricky to think of all cases!
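To compare the two patterns above on a few awkward inputs, here is a rough sketch. The quoted() helper and the sample strings are made up for illustration, and the third pattern is my own assumption rather than anything proposed in this thread: it swaps the \W guards for zero-width lookarounds, so a quote at the very start or end of the string still qualifies.

```perl
use strict;
use warnings;

# quoted() is a hypothetical helper: it returns the text found between
# matching quote characters for a given pattern.
sub quoted {
    my ($re, $text) = @_;
    my @found;
    while ($text =~ /$re/g) {
        push @found, $2;    # $1 is the quote character, $2 the quoted text
    }
    return @found;
}

my @patterns = (
    [ 'backref'    => qr/(["'])(.+?)\1/ ],                # from the post above
    [ 'with \W'    => qr/\W(["'])(.+?)\1\W/ ],            # from the post above
    [ 'lookaround' => qr/(?<!\w)(["'])(.+?)\1(?!\w)/ ],   # assumption, see lead-in
);

my @samples = (
    q{foo 'bar" baz},                              # mismatched quote characters
    q{I don't know, she said 'wait', and left},    # contraction before a quote
    q{'Hang on' she said},                         # quote at start of string
);

for my $sample (@samples) {
    print "$sample\n";
    for my $p (@patterns) {
        my ($name, $re) = @$p;
        printf "  %-10s [%s]\n", $name, join('|', quoted($re, $sample));
    }
}
```

None of the three should match the mismatched 'bar". On the second string the plain backreference starts at the apostrophe in don't and captures "t know, she said ", while the other two find wait. On the third string the \W version finds nothing because no character precedes the opening quote, and the lookaround version finds Hang on. An apostrophe at the end of a word (a plural possessive, say) can still fool all three.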