You might be able to parse this with a fancy regex, but in my opinion a simple state machine would be easier to read, write, and debug. If I understand your post correctly, your string has four kinds of tokens:

quoted strings, which are opaque between the quotes
runs of characters that are neither whitespace nor a close parenthesis and that begin with anything but an open parenthesis or double quote: /[^)("\s][^)\s]*/
runs of whitespace, used as a separator between the first two types of tokens
parenthesized strings, which may contain any of the first three types of tokens
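
To make the token kinds above concrete, here is a minimal sketch that labels each piece of a line. The helper name label_tokens and the labels are my own, not from the original question, and the bare-token pattern also excludes whitespace from the first character, which is an assumption about the intent. Parentheses are reported individually rather than grouped.

```perl
use strict;
use warnings;

# Hypothetical helper: label each lexeme in a line with its kind.
# Scanning stops at the first character that fits no kind
# (for example an unterminated double quote).
sub label_tokens {
  my ($line) = @_;
  my @labeled;
  while ($line =~ /\G("[^"]*"|\(|\)|\s+|[^()"\s][^)\s]*)/gc) {
    my $tok  = $1;
    my $kind = $tok =~ /^"/  ? 'quoted'
             : $tok eq '('   ? 'open'
             : $tok eq ')'   ? 'close'
             : $tok =~ /^\s/ ? 'space'
             :                 'bare';
    push @labeled, "$kind:$tok";
  }
  return @labeled;
}

print join('|', label_tokens('xxx ("a b") "()"')), "\n";
```

The parentheses inside "()" never show up as open or close tokens, because the quoted alternative is tried first and swallows them.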

Assuming there are no parenthesized tokens within parenthesized tokens, you could use something like this:

use strict;
use warnings;

while (my $line = <DATA>) {
  chomp $line;

  # store tokens other than separators
  my @aTokens;

  # state: are we inside or outside of a parenthesized token?
  my $bParen;
  my $sInParens='';

  while ($line =~ /("[^"]+"|\(|\)|[^)\s]+|\s+)/g) {
    my $sToken = $1;
    if ($sToken eq '(') {
      #starting a parenthesized token
      $bParen=1;
    } elsif ($sToken eq ')') {
      #ending a parenthesized token: add it to the list
      $bParen=0;
      push @aTokens, "($sInParens)";
      $sInParens='';
    } elsif ($bParen) {
      # in the middle of a parenthesized token
      $sInParens .= $sToken;
    } elsif ($sToken =~ /^\S/) {
      # not a parenthesized token
      # either a quoted or unquoted non-whitespace token
      # add it to the list
      push @aTokens, $sToken;
    }
  }
  local $"='> <';
  printf "input : %s\n%s\n", "<$line>", "tokens: <@aTokens>";
}

__DATA__
xxx "()" ("charset" "ISO-8859-1") (")") "xxx"

If you also need parenthesized tokens within parenthesized tokens, then the loop is only slightly more complicated. Change the flag $bParen to a counter, $iParenCount, that is incremented for each '(' and decremented for each ')'. Keep building the token until $iParenCount returns to 0. Parentheses within quotes have no effect on this count because the "[^"]+" alternative comes first in the regex, which ensures that only parentheses outside of quotes get parsed into separate tokens:

use strict;
use warnings;

while (my $line = <DATA>) {
  chomp $line;
  my @aTokens;
  my $sInParens='';
  my $iParenCount = 0;

  while ($line =~ /("[^"]+"|\(|\)|[^)\s]+|\s+)/g) {
    my $sToken = $1;
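    if ($sToken eq '(') {
      # a nested '(' is part of the enclosing token; the outermost is not
      if ($iParenCount) {
        $sInParens .= $sToken;
      }
      $iParenCount++;
    } elsif ($sToken eq ')') {
      $iParenCount--;
      if ($iParenCount) {
        # closing a nested token: keep building the outer one
        $sInParens .= $sToken;
      } else {
        # closing the outermost token: add it to the list
        push @aTokens, "($sInParens)";
        $sInParens='';
      }
    } elsif ($iParenCount) {
      # in the middle of a parenthesized token
      $sInParens .= $sToken;
    } elsif ($sToken =~ /^\S/) {
      # a quoted or unquoted token outside of parentheses
      push @aTokens, $sToken;
    }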
  }
  local $"='> <';
  print "paren count: $iParenCount\n";
  printf "input : %s\n%s", "<$line>", "tokens: <@aTokens>\n";
}

__DATA__
xxx "()" ("charset" "ISO-8859-1") (")") "xxx" ((a)(b)(c)) yyy
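
The counter only works if the parentheses balance, so it may be worth checking that before tokenizing. Here is a minimal sketch of such a check. The helper name parens_balanced is hypothetical, and treating a ')' that has no matching '(' as an error is my assumption about what you would want.

```perl
use strict;
use warnings;

# Hypothetical helper: return 1 if the parentheses outside of quoted
# strings are balanced, 0 otherwise.
sub parens_balanced {
  my ($line) = @_;
  my $depth = 0;
  # Quoted strings are matched first and skipped whole, so any
  # parentheses inside them never reach the counter.
  while ($line =~ /"[^"]*"|([()])/g) {
    next unless defined $1;          # matched a quoted string
    if ($1 eq '(') {
      $depth++;
    } elsif (--$depth < 0) {
      return 0;                      # ')' with no matching '('
    }
  }
  return $depth == 0 ? 1 : 0;        # leftover '(' means unbalanced
}

for my $s ('(a (b) c)', '(")") x', '(a (b c)', 'a) (b') {
  printf "%-12s %s\n", $s, parens_balanced($s) ? 'balanced' : 'unbalanced';
}
```

Of the four sample strings, only the first two are balanced. The last one has as many '(' as ')', but the ')' comes first, which is why the check looks at the running depth rather than only at the final count.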

Best, beth

Update: added some discussion about handling nested parenthesized tokens.

Update: Fixed overly greedy regex

In reply to Re: Extracting a parenthesized fragment from a string by ELISHEVA
in thread Extracting a parenthesized fragment from a string by fce2
