Capturing text between literal string delimiters (was: Regular expressions)

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Regular expressions by broquaint (Abbot) on Jun 24, 2002 at 12:46 UTC
`use Data::Dumper;
my $string = "sp Hello there sp \n Hey hey sp How are you? sp";
my @result = $string =~ m<
    [ ]?    # optional space
    sp      # literal 'sp'
    [ ]?    # optional space
    (.*?)   # non-greedy capture
    [ ]?    # optional space
    sp      # literal 'sp'
    [ ]?    # optional space
>xg;
print Dumper(\@result);
__output__
$VAR1 = [
          'Hello there',
          'How are you?'
        ];`
That code does the trick, although you may want to make it more generic if you're working with more complex strings. Also check out `split()` if possible. HTH
`_________ broquaint`
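One way the "more generic" version hinted at above might look is sketched here. The helper name `between` is hypothetical, and the sketch assumes the delimiter is a literal string that should only count when it is not glued to a word character on either side.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return the text found
# between successive pairs of a literal delimiter.
sub between {
    my ($text, $delim) = @_;
    my $d = quotemeta $delim;    # treat the delimiter as literal text
    # The lookarounds reject a delimiter embedded in a longer word;
    # /s lets '.' cross newlines; \s* drops padding next to the delimiters.
    return $text =~ /(?<!\w)$d(?!\w)\s*(.*?)\s*(?<!\w)$d(?!\w)/gs;
}

my @result = between("sp Hello there sp \n Hey hey sp How are you? sp", 'sp');
print join('|', @result), "\n";    # prints "Hello there|How are you?"
```

Because of `quotemeta`, a delimiter containing regex metacharacters would be matched literally as well.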
Re: Re: Regular expressions by aersoy (Scribe) on Jun 24, 2002 at 13:05 UTC
Hello, I think your Regular Expression is unnecessarily complex. Besides it chokes on a string like this: "sp Hello there! spelling spoiler spooky asp wizard! sp \n Hey hey sp How are you? sp". A simpler and better solution would be `@result = split /\bsp\b/, $string;` (where `\b` matches a word boundary) -- Alper Ersoy
Re: Re: Re: Regular expressions by broquaint (Abbot) on Jun 24, 2002 at 13:21 UTC
> I think your Regular Expression is unnecessarily complex.
Indeed it is, but it does give the output specified in the root node.
> Besides it chokes on a string like this: "sp Hello there! spelling spoiler spooky asp wizard! sp \n Hey hey sp How are you? sp".
Unfortunately so, which is why I recommended making it more generic (i.e. not relying on spaces around 'sp').
> A simpler and better solution would be
That would be nice, but unfortunately it gives this incorrect output:
`$VAR1 = [ '', ' Hello there ', ' Hey hey ', ' How are you? ' ];`
As outlined below, it's splitting the string on 'sp' as opposed to grabbing the text between each pair (as though the first 'sp' were a `<sp>` and the second a `</sp>`, and so on):
`01              2             3
sp Hello there sp \n Hey hey sp How are you? sp`
`_________ broquaint`
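A minimal sketch of how the split approach could still be made to pair the delimiters, assuming they are always balanced: after the split, the odd-numbered fields are the ones that sat between an opening and a closing 'sp'. The variable names here are illustrative and not from the thread.

```perl
use strict;
use warnings;

my $string = "sp Hello there sp \n Hey hey sp How are you? sp";

# Split on the standalone word 'sp'. Field 0 is the (empty) text before
# the first delimiter, so fields alternate: outside, inside, outside, ...
my @fields = split /\bsp\b/, $string;

# Keep only the odd-numbered fields and trim the surrounding whitespace.
my @between;
for my $i (grep { $_ % 2 } 0 .. $#fields) {
    (my $text = $fields[$i]) =~ s/^\s+|\s+$//g;
    push @between, $text;
}

print join('|', @between), "\n";    # prints "Hello there|How are you?"
```

If a delimiter were missing somewhere in the input, every later field would flip between "inside" and "outside", so this only suits well-formed data.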
Re: Re: Re: Re: Regular expressions by aersoy (Scribe) on Jun 24, 2002 at 13:48 UTC
Re: Re: Regular expressions by kidd (Curate) on Jun 24, 2002 at 12:53 UTC
Thanks for your reply, that works great...
Re: Regular expressions by robobunny (Friar) on Jun 24, 2002 at 12:39 UTC
although your example result doesn't demonstrate this behavior, i think this is what you want: `@array = split(/\ssp\s/, $string);` Update: well now that i think about it, that probably isn't what you want at all (it will match sp's that occur inside words). sp probably isn't the best delimiter in the world...
Re: Re: Regular expressions by joealba (Hermit) on Jun 24, 2002 at 13:49 UTC
Split will work fine, but you have to do some funny schtuff. `my @strings = map {/\w/ ? /^\s*(.*?)\s*$/ : ()} split /\bsp\b/, $string;`
Re: Regular expressions by flounder99 (Friar) on Jun 24, 2002 at 13:14 UTC
I think you had better add some word boundaries in there to avoid catching words that start or end with "sp".
`use Data::Dumper;
$string = "sp Hello there sp \n Hey hey sp How are you? sp I need a dsp chip and have a spelling test.";
@results = $string =~ m/\bsp\s+(.*?)\s+sp\b/g;
print Data::Dumper->Dump([\@results]);
__OUTPUT__
$VAR1 = [
          'Hello there',
          'How are you?'
        ];`
-- flounder
Re: Regular expressions by robobunny (Friar) on Jun 24, 2002 at 12:53 UTC
aaaahhhh now i see :) you can probably do this with a single regular expression, but i believe this will work (assuming there is only a single set of delimiters on each line). `for (split("\n", $string)) { push @array, (/sp (.*) sp/); }`
Re: Regular expressions by neilwatson (Priest) on Jun 24, 2002 at 12:44 UTC
`m/sp\s([\w\|\s\|\?\|\!]*?)\ssp.*sp\s([\w\|\s\|\?\|\!]*?)\ssp/i` So that $1 is "Hello there" and $2 is "How are you" (I think). What's the newline character for? Of course, regular expressions sometimes elude me, so this could be wrong. Try it, and see what the other monks say. Neil Watson watson-wilson.ca
Re: Re: Regular expressions by kidd (Curate) on Jun 24, 2002 at 12:48 UTC
I just created the account; I'm the one who posted the message. The thing is that $string is just an example. In reality it's a big text with a lot of lines, which have delimiters... but what I want is to get only the words inside the delimiters, which can be anything.
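For a large multi-line text like the one described, one possible approach is to read it line by line and collect every delimited chunk. This is only a sketch: it assumes that a pair of 'sp' delimiters never spans a line break, and it reads from an in-memory string (the sample data is made up) where a real script would open the actual file.

```perl
use strict;
use warnings;

# Made-up sample standing in for the big text; a real script would
# open the file by name instead of a reference to a scalar.
my $text = "sp Hello there sp\nHey hey\nsp How are you? sp and sp fine sp\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @found;
while (my $line = <$fh>) {
    # /g in list context returns every capture found on this line.
    push @found, $line =~ /\bsp\s+(.*?)\s+sp\b/g;
}
close $fh;

print join('|', @found), "\n";    # prints "Hello there|How are you?|fine"
```

Reading one line at a time keeps memory use flat no matter how large the file is, and a line may hold any number of delimiter pairs.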
Re: Re: Re: Regular expressions by little (Curate) on Jun 24, 2002 at 12:59 UTC
If the source file you mean does not change very often, it might be easier to preprocess it once and convert its contents into a handier format (a plain text file where the line separator separates your terms, or an XML file), or even save them in a faster format (e.g. a Berkeley DB file or even an RDBMS), so you might save some CPU and memory resources when actually processing the data.
Have a nice day
All decision is left to your taste
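The plain-text variant of that preprocessing idea might be sketched as follows: extract the terms once, write them one per line, and let later runs simply read the lines back. The temporary file stands in for a real cache file, and the sketch assumes no extracted term contains a newline (the pattern has no /s flag, so '.' cannot match one).

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my $source = "sp Hello there sp \n Hey hey sp How are you? sp";

# One-off preprocessing step: pull out the delimited terms.
my @terms = $source =~ /\bsp\s+(.*?)\s+sp\b/g;

# Store one term per line; a temp file stands in for the real cache file.
my ($out, $path) = tempfile(UNLINK => 1);
print {$out} "$_\n" for @terms;
close $out or die "Cannot close $path: $!";

# Later runs skip the regex work and just read the lines back.
open my $in, '<', $path or die "Cannot open $path: $!";
chomp(my @cached = <$in>);
close $in;

print join('|', @cached), "\n";    # prints "Hello there|How are you?"
```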