Re^2: extracting a substring from a string

Re^3: extracting a substring from a string - multiple variables by mwah (Hermit) on Oct 28, 2007 at 00:24 UTC
Ohh, if there is binary data within the tag and the "length" field gives its length, you could easily construct a regex that extracts binary data of exactly that length:

    my $binary = pack 'F*', (3.141592) x 10;   # make binary vector of length 80 bytes (10 doubles)
    my $string = '...blah...<file fiop="foo" length="' . length($binary) . '"/>'
               . $binary . '</file>...blah...';

    my ($fiop, $length, $data) = $string =~ m{
        <file                     # tag anchor
        \s+ fiop="([^"]+)"        # (fiop)
        \s+ length="([^"]+)"      # (length)
        />                        # end: start file tag
        ((??{ "\\C{$2}" }))       # self-modifying regex for binary stuff
                                  # (\C = one byte; removed in Perl 5.24, where
                                  #  "." under /s does the same on a byte string)
        </file>                   # end: file tag
    }sx;

    print "$fiop, $length (data comes below)\n";
    print join ',', unpack('F*', $data);       # extract binary data again

    (my $notags = $string) =~ s{<file.+</file>}{}s;
    print "\n$notags\n";

In the above I pack a binary sequence of 10 pi values (doubles, 10 x 8 bytes) into the tag, match a binary sequence of that length ($2), and unpack it afterwards.

Regards mwa
Re^4: extracting a substring from a string - multiple variables by walinsky (Scribe) on Oct 28, 2007 at 01:07 UTC
Seems like graff was faster again ;) I did like your approach though! I'm not quite sure whether (in your solution) $binary is known in advance. In my case I'm handling POSTed data, where $length is declared in the data and $binary just sits between the <file/> tags. Still, if $length and $binary can be extracted, unpack sounds more logical to me.
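To make the unpack idea concrete, here is a minimal sketch, assuming the POST body is already in a byte string. The variable names and the sample body are made up for illustration; only the `fiop` and `length` attributes come from the thread. The `x` template skips to the first payload byte and `a` copies the declared number of bytes without looking at them.

```perl
use strict;
use warnings;

# Hypothetical POST body: a <file .../> start tag, raw bytes, then </file>.
my $payload = pack 'F*', (3.141592) x 10;
my $post    = 'head<file fiop="foo" length="' . length($payload) . '"/>'
            . $payload . '</file>tail';

my ($fiop, $length, $data);
if ($post =~ m{<file\s+fiop="([^"]+)"\s+length="(\d+)"/>}g) {
    ($fiop, $length) = ($1, $2);
    my $offset = pos $post;                  # first byte after the start tag
    # 'x' skips $offset bytes, 'a' copies the next $length bytes verbatim
    ($data) = unpack "x$offset a$length", $post;
}

my @numbers = unpack 'F*', $data;            # decode the doubles again
print scalar(@numbers), " values extracted for fiop=$fiop\n";
```

Whether this reads better than a plain `substr` is a matter of taste; both copy bytes by position and never scan the payload.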
Re^5: extracting a substring from a string - multiple variables by mwah (Hermit) on Oct 28, 2007 at 08:39 UTC
walinsky: "in my case I'm handling POSTed data, where $length is declared in the data, and $binary just sits between the <file/> tags."

If that's so, you definitely can't use any approach other than blindly extracting a byte sequence of the given length (as in Example 2), because the data might at some point contain a sequence like `\x00€µ</file>³á>>~`, which would otherwise break your program (if you used a regex like `... =~ m{<file>.*?</file>} ...`).

Regards mwa
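A small sketch of that failure mode, using a made-up payload that happens to contain the closing tag as raw bytes. The regexes and variable names here are illustrative assumptions, not code from the thread.

```perl
use strict;
use warnings;

# Hypothetical payload that contains "</file>" in the middle of the data.
my $payload = "\x00\x01</file>\xfe\xff";
my $post    = '<file fiop="foo" length="' . length($payload) . '"/>'
            . $payload . '</file>';

# Naive approach: stop at the first "</file>", wherever it occurs.
my ($naive) = $post =~ m{<file[^>]*/>(.*?)</file>}s;

# Length-driven approach: trust the declared length, ignore the content.
my $exact;
if ($post =~ m{<file[^>]*\slength="(\d+)"/>}g) {
    $exact = substr $post, pos($post), $1;
}

printf "naive: %d bytes, exact: %d bytes\n", length($naive), length($exact);
```

The non-greedy match silently returns a truncated chunk, while the length-driven copy returns the whole payload.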
Re^4: extracting a substring from a string - multiple variables by walinsky (Scribe) on Oct 28, 2007 at 12:41 UTC
Seems like we're getting somewhere now; the code throws an error, though, when applying the regex:

    Quantifier in {,} bigger than 32766 in regex; marked by <-- HERE in m/\\C{ <-- HERE 304507}/

Any idea?
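The error is raised when the interpolated pattern is compiled at run time, so it can be trapped with `eval`. The sketch below is one hedged way to cope, not the fix used in the thread: it tries to compile a counted match for the declared length and copies the bytes with `substr` if perl refuses. The sample string and the use of `.` under `/s` instead of `\C` are assumptions, and the exact limit depends on the perl build (32766 in the message above; newer perls reportedly allow more).

```perl
use strict;
use warnings;

my $length = 304507;                         # the size from the error message
my $string = '<file length="' . $length . '"/>' . ("\xAB" x $length) . '</file>';

my $start = index($string, '/>') + 2;        # offset of the first payload byte

# Compiling a regex with an oversized quantifier dies at run time,
# so wrap the compilation in eval and inspect the outcome.
my $re = eval { qr/\G(.{$length})/s };

my $data;
if (defined $re) {
    pos($string) = $start;                   # \G anchors the match here
    ($data) = $string =~ $re;
}
else {
    # The regex engine refused the count: copy the bytes directly instead.
    $data = substr $string, $start, $length;
}

print defined($re) ? "regex used\n" : "fell back to substr\n";
```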
Re^5: extracting a substring from a string - multiple variables by mwah (Hermit) on Oct 28, 2007 at 16:19 UTC
Hmmm .. you are hitting the "quantifier length limit" of your perl implementation (32766 according to the message, although I expected 0xffff (?)). (1) How long is your binary chunk at all (the above message says "304507" - dooh!) and (2) what number is in the `... length="xxx" ...` field? Really that large?

Update: How to read arbitrarily big binary chunks from within regular expressions ...

You could advance until you hit the data (right after the closing of the start tag) and simply read the data that follows. This implies you have one `...<file>..</file>...` entry per string at this point.

    # ...
    my $binary = pack 'F*', (3.141592) x 8001;   # this will dump a 64K+ binary chunk
    my $string = '...blah...<file fiop="foo" length="' . length($binary) . '"/>'
               . $binary . '</file>...blah...';

    my ($fiop, $length, $data);
    if ( $string =~ m{<file \s+ fiop="([^"]+)" \s+ length="([^"]+)" />}gx ) {
        ($fiop, $length) = ($1, $2);                      # extract tag properties as usual
        $data = substr $string, pos($string), $length;    # extract data by direct string copy
    }

    print "$fiop, $length\n";
    print join ',', unpack('F*', $data);

    (my $notags = $string) =~ s{<file.+</file>}{}s;
    print "\n$notags\n";
    # ...

Regards mwa
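If a string can hold more than one entry, the same position-based copy can be repeated in a `while` loop. This is only a sketch under that assumption: the two sample payloads and the `@found` list are invented for illustration. After each copy, `pos` is moved past the payload so that its bytes are never scanned for tags.

```perl
use strict;
use warnings;

# Two hypothetical entries in one string; the second payload contains "</file>".
my @payloads = (pack('F*', (3.141592) x 3), "raw</file>bytes");
my $string   = '';
for my $i (0 .. $#payloads) {
    $string .= qq{<file fiop="f$i" length="} . length($payloads[$i]) . '"/>'
             . $payloads[$i] . '</file>';
}

my @found;
while ($string =~ m{<file\s+fiop="([^"]+)"\s+length="(\d+)"/>}g) {
    my ($fiop, $length) = ($1, $2);
    my $start = pos $string;                 # first payload byte
    push @found, [ $fiop, substr($string, $start, $length) ];
    # Resume the search after the payload so its bytes are never scanned.
    pos($string) = $start + $length;
}

printf "%s: %d bytes\n", $_->[0], length($_->[1]) for @found;
```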
Re^6: extracting a substring from a string - multiple variables by walinsky (Scribe) on Oct 28, 2007 at 18:57 UTC
Re^7: extracting a substring from a string - multiple variables by graff (Chancellor) on Oct 28, 2007 at 20:30 UTC
Re^7: extracting a substring from a string - multiple variables by mwah (Hermit) on Oct 28, 2007 at 20:25 UTC
Re^7: extracting a substring from a string - multiple variables by walinsky (Scribe) on Oct 28, 2007 at 21:46 UTC