Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

If I have a line like this:

$string = "sp Hello there sp \n Hey hey sp How are you? sp";

How can I get only the content between "sp", so I can get a result like this:

@result = ('Hello there', 'How are you?');

edited: Mon Jun 24 14:39:26 2002 by jeffa - title change

Re: Regular expressions
by broquaint (Abbot) on Jun 24, 2002 at 12:46 UTC
    use Data::Dumper;

    my $string = "sp Hello there sp \n Hey hey sp How are you? sp";
    my @result = $string =~ m<
        [ ]?    # optional space
        sp      # literal 'sp'
        [ ]?    # optional space
        (.*?)   # non-greedy capture
        [ ]?    # optional space
        sp      # literal 'sp'
        [ ]?    # optional space
    >xg;
    print Dumper(\@result);

    __output__
    $VAR1 = [
              'Hello there',
              'How are you?'
            ];
    That code does the trick, although you may want to make it more generic if you're working on more complex strings. Also check out split() if it fits your data.
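A minimal sketch of what "more generic" could look like (an assumption, not broquaint's posted code): anchor 'sp' with word boundaries (\b) so it only matches as a whole word, let \s* absorb whatever whitespace surrounds it instead of a single optional space, and add /s so a pair of delimiters may span a newline. It is run here on the harder test string Alper Ersoy posts further down the thread, where 'sp' also occurs inside 'spelling', 'spoiler', 'spooky' and 'asp'.

```perl
use strict;
use warnings;
use Data::Dumper;

# Harder test string from later in the thread: 'sp' also appears inside words.
my $string = "sp Hello there! spelling spoiler spooky asp wizard! sp \n Hey hey sp How are you? sp";

# \bsp\b : 'sp' as a whole word only, so 'spelling' and 'asp' are ignored
# \s*    : swallow any whitespace (including newlines) next to a delimiter
# (.*?)  : non-greedy capture up to the next closing 'sp'
# /s     : let '.' match a newline, so a pair may span lines
# /g     : in list context, return the capture from every sp ... sp pair
my @result = $string =~ /\bsp\b\s*(.*?)\s*\bsp\b/sg;

print Dumper(\@result);
```

On the original string from the question the same pattern returns 'Hello there' and 'How are you?'; on this one the first capture keeps the words that merely contain 'sp'.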
    HTH

    _________
    broquaint

      Hello,

      I think your Regular Expression is unnecessarily complex. Besides it chokes on a string like this: "sp Hello there! spelling spoiler spooky asp wizard! sp \n Hey hey sp How are you? sp".

      A simpler and better solution would be

      @result = split /\bsp\b/, $string;

      (where \b is a word boundary)

      --
      Alper Ersoy

        I think your Regular Expression is unnecessarily complex.
        Indeed it is, but it does give the specified output in the root node.
        Besides it chokes on a string like this: "sp Hello there! spelling spoiler spooky asp wizard! sp \n Hey hey sp How are you? sp".
        Unfortunately so, which is why I recommended making it more generic (i.e. not relying on spaces around 'sp').
        A simpler and better solution would be
        That would be nice, but unfortunately it gives this incorrect output:
        $VAR1 = [ '', ' Hello there ', ' Hey hey ', ' How are you? ' ];
        As outlined below, it's splitting the string on 'sp' rather than grabbing the text between each pair of them (as though the first 'sp' were a <sp> and the second a </sp>, and so on).
        0   1              2             3
          sp Hello there sp \n Hey hey sp How are you? sp
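Numbering the fields that split returns from 0, the text inside each sp ... sp pair always lands in the odd-numbered fields (1, 3, ...), while the even ones hold whatever sits outside a pair. So split can still be used if only the odd fields are kept and trimmed. This is a sketch of that idea under the assumption that delimiters always come in pairs; it was not posted in the thread.

```perl
use strict;
use warnings;

my $string = "sp Hello there sp \n Hey hey sp How are you? sp";

# Split on whole-word 'sp'. Field 0 is whatever precedes the first delimiter,
# so the text inside each sp ... sp pair ends up at odd indices 1, 3, 5, ...
my @fields = split /\bsp\b/, $string;

# Keep only the odd-numbered fields and trim surrounding whitespace.
my @result;
for my $i (grep { $_ % 2 } 0 .. $#fields) {
    (my $text = $fields[$i]) =~ s/^\s+|\s+$//g;
    push @result, $text;
}

print "$_\n" for @result;
```

Because field 0 is always "outside", this also works when the text does not start with a delimiter; it breaks if a delimiter is ever missing its partner.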

        _________
        broquaint

      Thanks for your reply; that works great...
Re: Regular expressions
by robobunny (Friar) on Jun 24, 2002 at 12:39 UTC
    Although your example result doesn't demonstrate this behavior, I think this is what you want:
    @array = split(/\s*sp\s*/, $string);
    Update: well, now that I think about it, that probably isn't what you want at all (it will match sp's that occur inside words). sp probably isn't the best delimiter in the world...
      Split will work fine, but you have to do some funny schtuff:

      my @strings = map { /\w/ ? /^\s*(.*?)\s*$/ : () } split /\bsp\b/, $string;
Re: Regular expressions
by flounder99 (Friar) on Jun 24, 2002 at 13:14 UTC
    I think you had better add some word boundaries in there to avoid catching words that start or end with "sp".
    use Data::Dumper;

    $string = "sp Hello there sp \n Hey hey sp How are you? sp I need a dsp chip and have a spelling test.";
    @results = $string =~ m/\bsp\s+(.*?)\s+sp\b/g;
    print Data::Dumper->Dump([\@results]);

    __OUTPUT__
    $VAR1 = [
              'Hello there',
              'How are you?'
            ];

    --

    flounder

Re: Regular expressions
by robobunny (Friar) on Jun 24, 2002 at 12:53 UTC
    aaaahhhh now i see :)
    You can probably do this with a single regular expression, but I believe this will work (assuming there is only a single set of delimiters on each line).
    for(split("\n", $string)) { push @array, (/sp (.*) sp/); }
Re: Regular expressions
by neilwatson (Priest) on Jun 24, 2002 at 12:44 UTC
    m/sp\s([\w\s?!]+?)\ssp.*?sp\s([\w\s?!]+?)\ssp/is

    So that $1 is "Hello there" and $2 is "How are you?" (I think). What's the newline character for? Of course, regular expressions sometimes elude me, so this could be wrong. Try it, and see what the other monks say.
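Numbered captures like $1 and $2 only work when you know in advance how many pairs the text contains. A sketch of the usual alternative, assuming the same whole-word 'sp' delimiters: run the match with /g in scalar context inside a while loop, so each pass picks up the next pair, however many there are. The third pair in the test string is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical input with a third pair added to show the count is not fixed.
my $string = "sp Hello there sp \n Hey hey sp How are you? sp and sp a third one sp";

my @result;
# In scalar context, m//g resumes where the previous match ended
# (tracked by pos($string)), so each pass captures the next sp ... sp pair.
while ($string =~ /\bsp\b\s*(.*?)\s*\bsp\b/sg) {
    push @result, $1;
}

printf "%d terms: %s\n", scalar @result, join(' | ', @result);
```

Processing one pair per iteration also means you can act on each term as it is found instead of building the whole list first.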

    Neil Watson
    watson-wilson.ca

      I just created the account; I'm the one who posted the question. The $string is just an example: in reality it's a big text with a lot of lines that contain delimiters, but what I want is to get only the words inside the delimiters, which can be anything.
        If the source file you mean does not change very often, it might be easier to preprocess it once and convert its contents into a handier format (a plain-text file where the line separator separates your terms, or an XML file), or even save them in a faster format (e.g. a Berkeley DB file or even an RDBMS), so you can save some CPU and memory when actually processing the data.
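A rough sketch of that one-off preprocessing step into a plain-text file with one term per line. It assumes the delimiters are whole-word 'sp' and that a pair may span a line break, which is why the whole input is slurped instead of read line by line. The subroutine name preprocess_terms is hypothetical, and the in-memory filehandles stand in for real files; in practice the handles would come from open on the actual source and output paths.

```perl
use strict;
use warnings;

# Hypothetical helper: read all text from $in_fh, write one extracted term
# per line to $out_fh, and return the number of terms written.
sub preprocess_terms {
    my ($in_fh, $out_fh) = @_;
    my $text = do { local $/; <$in_fh> };    # slurp: a pair may span lines
    my $count = 0;
    while ($text =~ /\bsp\b\s*(.*?)\s*\bsp\b/sg) {
        (my $term = $1) =~ s/\s+/ /g;        # collapse embedded newlines/runs of spaces
        print {$out_fh} "$term\n";
        $count++;
    }
    return $count;
}

# Demo with in-memory filehandles instead of real files.
my $source = "sp Hello there sp \n Hey hey sp How are you? sp\n";
my $output = '';
open my $in,  '<', \$source or die "open input: $!";
open my $out, '>', \$output or die "open output: $!";
my $n = preprocess_terms($in, $out);
close $out;

print "wrote $n terms:\n$output";
```

Later runs can then read the terms back with a plain line-by-line loop, without touching the regex again.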

        Have a nice day
        All decision is left to your taste