New Novice has asked for the wisdom of the Perl Monks concerning the following question:

Enlightened Perlmonks,

I want to extract a piece of information from a string. I know that the bit I am interested in is always preceded by a certain sequence of characters. However, the position of that sequence varies from string to string.

substr asks for a precise position, which I do not have because it changes for each string. Using $` to read out the rest of the string following the given sequence (as part of an if-clause) is not working either: it seems to pick up part of my count variable instead of the substring I am interested in.

What wisdom can you offer your humble novice, honoured monks?

Re: Alternative to Substr?
by tachyon (Chancellor) on Sep 29, 2004 at 11:59 UTC

    You can do it with an RE, or with substr and index:

    my $str  = "abcdefghijklmnopqrstuzwxyz";
    my $find = "efg";

    # RE: \Q...\E quotes any metacharacters in $find
    if ( $str =~ m/\Q$find\E(.*)\z/s ) {
        printf "Remainder after %s is %s\n", $find, $1;
    }

    # substr and index: index returns -1 when $find is absent
    my $i = index $str, $find;
    if ( $i != -1 ) {
        printf "Remainder after %s is %s\n", $find, substr( $str, $i + length($find) );
    }

    cheers

    tachyon

Re: Alternative to Substr?
by JediWizard (Deacon) on Sep 29, 2004 at 11:58 UTC

    It sounds to me like you need a regular expression. Check out perlre, and pay special attention to (?<=...), the zero-width lookbehind assertion. I'd be happy to help with the regex, but I would need more specific examples of what you are trying to accomplish.

    May the Force be with you
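For illustration, a minimal sketch of the (?<=...) idea, using the DosID= format that comes up later in the thread. The sample line, and the assumption that the six characters of interest directly follow "DosID=", are hypothetical. Perl requires a lookbehind to be fixed-width, which a literal such as DosID= is.

```perl
use strict;
use warnings;

# Hypothetical sample line; real input will differ.
my $line = 'some text DosID=AB1234" more text';

# (?<=DosID=) is a zero-width lookbehind: it asserts that "DosID="
# precedes the current position without including it in the match.
my ($dosid) = $line =~ /(?<=DosID=)(.{6})/;

print "$dosid\n" if defined $dosid;    # prints AB1234
```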
      Hi,

      Thanks for this.

      I am trying to read the 6 characters following "DosID" in a longer string into a new variable ($dosid). Here is my code, which uses the position. Because the position is not always the same (sometimes it starts at 81, sometimes at 44, etc.), I have to use another if-clause to correct the value. This gets rather bothersome if the position jumps around all the time.

      foreach $input (@input) {
          if ( ($input =~ 'DosID=') && ($count =~ /[02468]$/) ) {
              $count2++;
              $dosid = substr($input, 81, 6);
              if ($dosid =~ 'detai') {
                  $dosid = substr($input, 44, 6);
              }
              if ($dosid =~ '"') {
                  chop($dosid);
              }
          }
      }

      The two count variables ($count, $count2) count the lines of the input file and the lines containing the search string "DosID", respectively. I am only interested in every second occurrence.

      I tried it with $', but this seems to pick up the value of one of the count variables instead of the remainder of the string.
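One likely explanation, sketched with hypothetical values: $` and $' always refer to the most recent successful match, and in the if-condition above that is the test on $count, not the one on $input. The number 42 and the sample line below are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical values shaped like the question's if-condition.
my $input = 'xx DosID=AB1234" yy';
my $count = 42;

my ( $before, $after );
if ( $input =~ /DosID=/ && $count =~ /[02468]$/ ) {
    # $` and $' describe the most recent successful match, which is the
    # test on $count ("2" matched in "42"), not the test on $input.
    $before = $`;    # "4"
    $after  = $';    # "" (nothing follows the final digit)
}

# Capturing in the same match that finds "DosID=" avoids the problem.
my ($rest) = $input =~ /DosID=(.*)/;

print "[$before] [$after] [$rest]\n";    # prints [4] [] [AB1234" yy]
```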

      Another thing: I would prefer not to have to specify the number of characters in the substring. Instead, I would like to use a delimiter as a stop signal (in my example, a " character should end the substring). I could use a loop that checks every character and appends it to my variable, but I thought there might be a more elegant way of doing things.
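A sketch of the delimiter idea, assuming the value starts right after "DosID=" and ends at the next " character (the sample line is hypothetical). A negated character class handles the "stop before" part without a character-by-character loop.

```perl
use strict;
use warnings;

# Hypothetical input; assumes the value after "DosID=" ends at a '"'.
my $input = 'abc DosID=XY98" def';

# [^"]* matches any run of characters that are not a double quote,
# so the capture stops just before the first '"'.
my $dosid;
if ( $input =~ /DosID=([^"]*)/ ) {
    $dosid = $1;
}
print "$dosid\n";    # prints XY98
```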

      I guess I am looking for a function like substr that takes regular expressions and conditions instead of fixed parameters: the start of the substring would be defined as "begin after" and the end as "stop before".
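There is no built-in function of that shape, but a small wrapper around a regex behaves much the same. The name between() and its argument order are hypothetical, chosen only for this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text that begins after $begin and
# stops before the next occurrence of $end, or undef if there is none.
sub between {
    my ( $str, $begin, $end ) = @_;
    # \Q...\E treats $begin and $end as literal text, not regex syntax;
    # .*? is non-greedy, so it stops at the first $end after $begin.
    return $str =~ /\Q$begin\E(.*?)\Q$end\E/s ? $1 : undef;
}

my $line  = 'xx DosID=AB1234" yy';    # hypothetical sample
my $dosid = between( $line, 'DosID=', '"' );
print defined $dosid ? "$dosid\n" : "not found\n";    # prints AB1234
```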

        You should really take another (closer) look at perlre (or perhaps perlretut), particularly the bits about using the match operator (m//).

        my $count = 0;
        foreach (@input) {
            if (/DosID(.{6})/ and $count++ % 2) {
                $dosid = $1;
            }
        }

        Of course, this overwrites the value of $dosid each time it finds a match, so you might need to change the logic a bit.

        --
        <http://www.dave.org.uk>

        "The first rule of Perl club is you do not talk about Perl club."
        -- Chip Salzenberg
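One way to change the logic of the loop above so nothing is overwritten, sketched under the assumption that every second matching line is wanted: push each capture onto an array instead of assigning to a scalar. The input lines are hypothetical, and this sketch matches "DosID=" rather than "DosID" so the = is not captured.

```perl
use strict;
use warnings;

# Hypothetical input lines.
my @input = (
    'DosID=AAAAAA', 'DosID=BBBBBB', 'no id here',
    'DosID=CCCCCC', 'DosID=DDDDDD',
);

my $count = 0;
my @dosids;
foreach (@input) {
    # $count++ runs only when the regex matches, so it counts matches;
    # "% 2" is true for the 2nd, 4th, ... matching line.
    if ( /DosID=(.{6})/ and $count++ % 2 ) {
        push @dosids, $1;
    }
}
print "@dosids\n";    # prints BBBBBB DDDDDD
```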

        You are making hard work of it. The modulus operator % is good for skipping every N items, and you can increment the counter in the same expression. FWIW, you have both $count and $count2; use strict, and Perl will tell you the error of your ways.

        my $count = 0;
        for my $input (@input) {
            next unless $count++ % 2 == 0;    # handle the 1st, 3rd, 5th, ... line
            print "Got $1\n" if $input =~ m/DosID=(.{6})/;
        }

Re: Alternative to Substr?
by TheEnigma (Pilgrim) on Sep 29, 2004 at 12:07 UTC
    Including your code would make it easier to answer your question. But if you reference $`, I have to assume you're using a regex, not substr. However, if you are using a regex, $` contains the part of the string before the matched text, not after; the part after is in $'. Also, using $` (like $& and $') anywhere in a program slows down every regex match.

    I think you want to do something like this:

    # $match holds what you want to match
    # $string holds the string to extract from
    if ($string =~ /$match(.+)/) {
        # ... do something with $1 ...
    }

    This assumes that $match only occurs once in the string, and that you want everything after that point.

    TheEnigma
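To make the before/after distinction concrete, here is a sketch on a hypothetical line. It uses the @- and @+ arrays (start and end offsets of the last successful match) with substr to recover the same text as $` and $' without naming those variables.

```perl
use strict;
use warnings;

my $str = 'header DosID=AB1234" trailer';    # hypothetical sample

my ( $before, $after );
if ( $str =~ /DosID=/ ) {
    # $-[0] and $+[0] are the start and end offsets of the whole match.
    $before = substr( $str, 0, $-[0] );    # same text as $`
    $after  = substr( $str, $+[0] );       # same text as $'
}
print "before: [$before]\n";    # before: [header ]
print "after:  [$after]\n";     # after:  [AB1234" trailer]
```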

      This assumes that $match only occurs once in the string

      No it does not. Perl will take the first match.

      and that you want everything after that point.

      Your RE won't deliver the rest of the string in a very common circumstance, because . does not match a newline unless you add /s.
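A quick way to see the /s difference, using a hypothetical two-line string. \Q...\E is added here so that any regex metacharacters in $match are taken literally.

```perl
use strict;
use warnings;

my $match  = 'DosID=';
my $string = "DosID=AB1234\nsecond line";    # hypothetical multi-line input

# Without /s, . stops at the newline.
my ($plain) = $string =~ /\Q$match\E(.+)/;

# With /s, . also matches "\n", so the capture runs to the end.
my ($dotall) = $string =~ /\Q$match\E(.+)/s;

print "without /s: [$plain]\n";     # without /s: [AB1234]
print "with /s:    [$dotall]\n";    # includes the newline and "second line"
```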

        Both very good points. But like I said, I had to make some assumptions, not knowing what his data looks like or what he wants to extract from it.

        If $match did occur more than once, I don't know which one he wants to use; maybe he wants all of them, which would require a change to my code. So I assumed the simplest scenario. Likewise with your second point: I was assuming simple lines of text with a newline only at the end.

        TheEnigma
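If all of the occurrences were wanted, a //g match in list context collects them, while a plain match returns only the first, as noted above. The sample string and the " delimiter are assumptions for this sketch.

```perl
use strict;
use warnings;

my $string = 'DosID=AAAAAA" x DosID=BBBBBB" y';    # hypothetical sample

# A plain match stops at the first occurrence...
my ($first) = $string =~ /DosID=([^"]*)/;

# ...while //g in list context returns the capture from every match.
my @all = $string =~ /DosID=([^"]*)/g;

print "first: $first\n";    # first: AAAAAA
print "all:   @all\n";      # all:   AAAAAA BBBBBB
```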

Re: Alternative to Substr?
by Anonymous Monk on Sep 29, 2004 at 12:34 UTC
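    # split on the (quoted) sequence and keep what follows it
    $bit = substr( ( split /\Q$sequence\E/, $data, 2 )[1], 0, $length );

    # capture directly in a match
    ($bit) = $data =~ /\Q$sequence\E(.{$length})/s;

    # index and substr (assumes $sequence is present)
    $bit = substr( $data, index( $data, $sequence ) + length($sequence), $length );

    # strip everything up to the sequence, then everything after $length chars
    $bit = $data;
    $bit =~ s/.*?\Q$sequence\E//s;
    $bit =~ s/(.{$length}).*/$1/s;

    # substr on $' after a successful match
    $bit = substr( $', 0, $length ) if $data =~ /\Q$sequence\E/;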