tpane has asked for the wisdom of the Perl Monks concerning the following question:

How do I extract a substring from the middle of a larger string? For example, I have a string (one of many in a list, each a different length): WXYZ 100 THIS IS THE STRING I WANT 56.78 0905 and I would like to extract "THIS IS THE STRING I WANT". I can't use substr() alone because the offsets are not always the same (for example, the first two words could be "WXY 10"). I tried combining substr() with index(), but couldn't figure out how to use an alphabetic or numeric wildcard as the search substring. The first two 'words' and the last two 'words' always have the same characteristics (i.e. all letters or all digits).

Replies are listed 'Best First'.
Re: substring extraction
by tlm (Prior) on Jul 20, 2005 at 23:18 UTC

    I have a hard time understanding your question. Why would you want to "extract" a known substring from another: you already have it! Maybe you meant "remove"? If that's the case, then the simplest thing IMO is to use a regular expression:

    $string =~ s/\Q$substring//;
    Alternatively, you can use substr, index, and length:
    my $offset = index( $string, $substring );
    substr( $string, $offset, length $substring, '' ) if $offset > -1;

    the lowliest monk

      The way that I understood the question was that if you consider the string to be broken up into `words' by the spaces, the OP requires all words apart from the first two and the last two.
Re: substring extraction
by gellyfish (Monsignor) on Jul 20, 2005 at 22:29 UTC

    You mean like using the regular expression thingies as described in perlre ?
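    For the "alpha or numeric wildcard" the OP asks about, character classes are the usual tool. A minimal sketch, assuming the layout is always letters, digits, the wanted text, a decimal number, then digits; the sub name extract_middle is made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the middle text, or undef if the line
# does not fit the assumed "letters digits TEXT number digits" layout.
sub extract_middle {
    my ($line) = @_;
    return $line =~ /^[A-Za-z]+\s+\d+\s+(.+?)\s+[\d.]+\s+\d+\s*$/
        ? $1
        : undef;
}

print extract_middle('WXYZ 100 THIS IS THE STRING I WANT 56.78 0905'), "\n";
```

    [A-Za-z]+ stands in for "any run of letters" and \d+ for "any run of digits", so the offsets no longer matter.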

    /J\

Re: substring extraction
by shemp (Deacon) on Jul 20, 2005 at 22:30 UTC
    Use a regular expression. From your description, it seems that you want everything except the first and last 2 'words'. I assume that by word you mean something without whitespace. This will work for my interpretation of the problem:
    use strict;
    use warnings;
    ...
    my $source_string;
    # set the value of $source_string to the original string
    my ($wanted_string) = $source_string =~ /^\S+\s+\S+\s+(.+)\s+\S+\s+\S+$/; # undef if no match
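    Here is a runnable variant of that idea, offered as a sketch rather than the one true way. It spells the pattern out with /x, uses a non-greedy capture, and tolerates trailing whitespace; the sub name middle_by_whitespace and the sample lines are only for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper built on the whitespace-delimited "word" assumption.
sub middle_by_whitespace {
    my ($line) = @_;
    my ($middle) = $line =~ /
        ^ \S+ \s+ \S+ \s+   # first two words
        (.+?)               # everything in between
        \s+ \S+ \s+ \S+     # last two words
        \s* $               # optional trailing whitespace
    /x;
    return $middle;         # undef when the line has fewer than five words
}

for my $line ('WXYZ 100 THIS IS THE STRING I WANT 56.78 0905',
              'WXY 10 ANOTHER ONE 1.23 0001',
              'too short') {
    my $middle = middle_by_whitespace($line);
    print defined $middle ? "$middle\n" : "no match: $line\n";
}
```

    Taking the capture in list context means a failed match leaves you with undef instead of a stale $1 from some earlier match.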
Re: substring extraction
by ishnid (Monk) on Jul 20, 2005 at 23:00 UTC
    TIMTOWTDI - again assuming that `words' are separated by whitespace:
    while (<DATA>) {
        my @pieces = split;
        my $extracted = join ' ', @pieces[ 2 .. @pieces - 3 ];
        print $extracted, "\n";
    }
    __DATA__
    WX 200 A string of interest 86.34 0906
    WTTS 320 Peer here my peer 17.34 1001
    XGR 400 Please take me with you 19.87 1201
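    If you want the split approach as a reusable sub, something like the following might do. It is a sketch under the same whitespace assumption; the name middle_words is hypothetical, and the explicit check for short lines is my addition:

```perl
use strict;
use warnings;

# Hypothetical helper: middle words of a line, or undef when there are
# fewer than five whitespace-separated fields.
sub middle_words {
    my ($line) = @_;
    my @pieces = split ' ', $line;
    return undef if @pieces < 5;
    return join ' ', @pieces[ 2 .. $#pieces - 2 ];
}

print middle_words('WX 200 A string of interest 86.34 0906'), "\n";
```

    One difference from the regex answers: split followed by join collapses any run of whitespace inside the wanted text to a single space.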
Re: substring extraction
by mikeraz (Friar) on Jul 20, 2005 at 22:42 UTC

    Do you want to remove the substring from the larger string or pull it out for processing? I'm getting hung up on your choice of "extract."

    If your data is consistent with:

    WX 200 A string of interest 86.34 0906
    WTTS 320 Peer here my peer 17.34 1001
    XGR 400 Please take me with you 19.87 1201
    

    You could:
    while (<STRING_SRC>) {
      # skip lines that don't fit the layout instead of reusing a stale $1
      next unless /^[A-Za-z]+ [0-9]+ (.*?) [0-9.]+ [0-9]+$/;
      my $dotoit = $1;
      do_func($dotoit);
    }
    
    The regular expression captures your string in $1. If you need to extract the value, as in pull it out of the original line, then:
    while (<STRING_SRC>) {
      next unless /^[A-Za-z]+ [0-9]+ (.*?) [0-9.]+ [0-9]+$/;
      my $dotoit = $1;
      s/\Q$dotoit\E//; # \Q...\E so regex metacharacters in the text match literally
      do_func($dotoit);
    }
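    The capture and the removal can also be folded into a single substitution. A minimal sketch, assuming the same single-space layout as the sample data; cut_middle is a made-up name, and it takes a reference so it can modify the caller's string:

```perl
use strict;
use warnings;

# Hypothetical helper: removes the middle text from $$line_ref in place
# and returns it, or returns undef and leaves the line alone on no match.
sub cut_middle {
    my ($line_ref) = @_;
    if ( $$line_ref =~ s/^([A-Za-z]+ [0-9]+) (.*?) ([0-9.]+ [0-9]+)$/$1 $3/ ) {
        return $2;    # capture groups from the successful match are still set
    }
    return undef;
}

my $line = 'XGR 400 Please take me with you 19.87 1201';
my $cut  = cut_middle(\$line);
print "cut:  $cut\n";
print "left: $line\n";
```

    Doing it in one s/// avoids interpolating the extracted text back into a second pattern, so there is nothing to escape.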
    

    Be Appropriate && Follow Your Curiosity