in reply to extracting a substring from a string - multiple variables

Is the data some sort of home-grown imitation of XML? If it was "real" XML, there wouldn't be a slash before the first close-angle-bracket. (I guess since it isn't real XML, it wouldn't help to recommend an XML parsing module.)

Do you mean something like this?

my $string = '...blah...<file fiop="foo" length="bar"/>baz</file>...blah...';
my ( $foo, $bar, $baz );
if ( $string =~ s{<file fiop="([^"]+)" length="([^"]+)"/>([^<]+)</file>}{} ) {
    ( $foo, $bar, $baz ) = ( $1, $2, $3 );
    print "extracted $foo, $bar, $baz; left $string\n";
}

Re^2: extracting a substring from a string - multiple variables
by walinsky (Scribe) on Oct 27, 2007 at 23:10 UTC
    Actually you hit it right on the spot; it's home-grown XML from Cupertino...
    The baz part is raw binary data, inserted in the XML; that's why I want to extract it before parsing the valid XML.
    I hadn't even noticed the close-angle-bracket (thanks - but it's really there).

    I've tried your code; but it doesn't seem to get me there.
    Any further suggestions?
      When I run my snippet as posted, I get the following output:
      extracted foo, bar, baz; left ...blah......blah...
      Do you get something different when you run it? Or do you want something different from that?

      When you try to use the "s{...}{}" expression in your own code, is it possible that your "raw binary data" (in "the baz part") might contain a byte value of 0x3C? That byte is a "<" character as far as the regex is concerned, so "[^<]+" would stop there and the match would fail before ever reaching "</file>". Something like this might work better in that case:

      s{<file fiop="([^"]+)" length="([^"]+)"/>(.*?)</file>}{}s
      (update: added the "s" modifier at the end, in case the raw binary stuff might contain a line-feed)

      Note the question mark after ".*" -- that's the important thing that was missing from your initial attempt: it makes the wildcard match non-greedy (stops matching as soon as possible).
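
For illustration, here is a minimal sketch of that pattern run against a payload that contains both a 0x3C byte and a line-feed. The payload bytes, the surrounding "<doc>" wrapper and the variable names are made up for the example; they are not taken from the real data.

```perl
use strict;
use warnings;

# Hypothetical payload: a NUL, a 0x3C byte ("<"), a line-feed and 0xFF.
my $payload = "\x00\x3C\x0A\xFF";
my $string  = qq{<doc><file fiop="foo" length="4"/>$payload</file><other/></doc>};

my ( $fiop, $length, $data );

# (.*?) is non-greedy, so it stops at the first "</file>";
# the trailing "s" modifier lets "." match the line-feed as well.
if ( $string =~ s{<file fiop="([^"]+)" length="([^"]+)"/>(.*?)</file>}{}s ) {
    ( $fiop, $length, $data ) = ( $1, $2, $3 );
}

printf "fiop=%s length=%s bytes=%d left=%s\n",
    $fiop, $length, length($data), $string;
```

One caveat with this approach: if the binary data ever happens to contain the literal bytes "</file>", the non-greedy match will stop there and the extracted data will be cut short.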

        where do I send the flowers ;)

        the 's' modifier at the end did the trick!

        thanks for your continuous effort (and updating your comment ;)
Re^2: extracting a substring from a string - multiple variables
by duff (Parson) on Oct 29, 2007 at 13:46 UTC

    I find it funny that the right solution ("use a parser") is shot down because this isn't exactly XML. Are you sure an HTML parser wouldn't parse it properly? Weird how everyone gets stuck on regex.