FWIW, and just as a matter of interest: the reason the regex in your OP
my @strings = $data =~ /\"[^\"]+\"/g;
was "... extracting almost every line..." may be that it does not handle an empty (i.e., zero-length) string properly. The [^\"]+ sub-expression requires at least one non-double-quote character, so a "" empty string cannot match at its opening quote. If there is any "" in the text, matching gets "out of sync": the closing quote of the empty string is taken as the opening quote of a spurious match, whose body then runs on to the next double-quote in the text.

use warnings;
use strict;

use Data::Dump qw(dd);

my $data = do { local $/; <DATA> };

my @strings = $data =~ /\"[^\"]+\"/g;
dd \@strings;

__DATA__
nothing
"hello"
foo "bar" quz
"hello2" "world"
foo2 "bar2" quz2 "baz" blah
blah2 "" blah3
many
lines
of
unquoted stuff
"example 1 for instance"

Output:

c:\@Work\Perl\monks\kepler>perl extract_double_quote_bodies_2.pl
[
  "\"hello\"",
  "\"bar\"",
  "\"hello2\"",
  "\"world\"",
  "\"bar2\"",
  "\"baz\"",
  "\" blah3\nmany\nlines\nof\nunquoted stuff\n\"",
]

Note that the character class [^"] ("not a double-quote") also matches the newline character, which is why the spurious last match above spans several lines.
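As a minimal sketch of one possible fix, assuming empty strings should be extracted as well: changing the + quantifier to * lets "" match at its own opening quote, so the matching stays in sync. The helper name extract_quoted and the sample text are made up for illustration.

```perl
use warnings;
use strict;

# Hypothetical helper (not from the original post): return every
# double-quoted string in $text, including empty "" strings.
# The * quantifier allows a zero-length body between the quotes.
sub extract_quoted {
    my ($text) = @_;
    return $text =~ /"[^"]*"/g;
}

# Sample text modelled on the tail of the __DATA__ section above.
my $sample = qq{blah2 "" blah3\nmany\nlines\n"example 1 for instance"\n};

print "$_\n" for extract_quoted($sample);
```

With this sample, the two strings printed are "" and "example 1 for instance"; the unquoted lines in between are no longer swallowed.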

Update: Also note that /"[^"]+"/g and /"[^"]*"/g will not properly handle a double-quoted string containing an escaped double-quote (e.g., "x\"y") and will end up "out of sync" in the same way as /"[^"]+"/g with an empty string.
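A common way to cope with backslash-escaped quotes is to treat the string body as a run of either "any character that is neither a double-quote nor a backslash" or "a backslash followed by any one character". The following is only a sketch under that assumption (backslash is the sole escape character, and every quote in the text is balanced); the helper name extract_quoted_escaped is hypothetical.

```perl
use warnings;
use strict;

# Hypothetical helper: return every double-quoted string in $text,
# allowing empty strings and backslash-escaped characters in the body.
#   [^"\\]  one character that is neither a double-quote nor a backslash
#   \\.     a backslash followed by any one character (/s lets . match newline)
sub extract_quoted_escaped {
    my ($text) = @_;
    return $text =~ /"(?:[^"\\]|\\.)*"/gs;
}

my $sample = 'a "x\"y" b "" c "plain"';

print "$_\n" for extract_quoted_escaped($sample);
```

For this sample the three strings printed are "x\"y", "" and "plain". The two alternatives inside the group cannot match the same character, so the pattern does not backtrack badly on long inputs. It will still lose sync on text with an unbalanced double-quote.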

Give a man a fish: <%-{-{-{-<

In reply to Re: Extract pattern match from file by AnomalousMonk
in thread Extract pattern match from file by kepler
