kevind0718 has asked for the wisdom of the Perl Monks concerning the following question:

Dear Wise and Kind Monks:

I am trying to build a regular expression that will extract the data between slashes within a string of course. I played with this for a while, but I have hit a wall.

Please consider this code
my $str = '%/%/ISIN/US1252691001';
$str =~ m{/(.+)/};
print $str . "\n";
print $1;

produces the following:
C:\KBD\dev>perl -w tstRegExp.pl
%/%/ISIN/US1252691001
%/ISIN
C:\KBD\dev>

I need to get just "ISIN", but I am obviously not smart enough to build the correct regular expression.

Your assistance is requested.

kd

Replies are listed 'Best First'.
Re: extract text between slashes
by mwah (Hermit) on Oct 31, 2007 at 17:26 UTC
    You said:
    extract the data between slashes within a string of course

    which might be more than one item.

    my $str = '%/%/ISIN/US1252691001';
    my @strings = $str =~ m{(?<=/)[^/]+}g;
    print join ', ', @strings, "\n";
    print $strings[1];
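    A runnable sketch of the snippet above, showing what the look-behind match returns for the sample string. The helper name fields_after_slash is made up for illustration. Because the look-behind requires a preceding slash, the leading % (before the first slash) is not returned, which is why ISIN ends up at index 1.

```perl
use strict;
use warnings;

# Hypothetical helper: return every run of non-slash characters
# that is immediately preceded by a slash.
sub fields_after_slash {
    my ($str) = @_;
    return $str =~ m{(?<=/)[^/]+}g;
}

my @strings = fields_after_slash('%/%/ISIN/US1252691001');
print join(', ', @strings), "\n";   # %, ISIN, US1252691001
print $strings[1], "\n";            # ISIN
```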

    Regards

    mwa

      print join ', ', @strings, "\n";

      I'm not sure whether you intended a trailing ', ' in the output. If not, you could use parentheses

      print join(', ', @strings), "\n";

      I might instead tinker with the list separator

      print do { local $" = q{, }; qq{@strings\n}; };
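      To make the difference between the three forms visible, here is a small sketch that builds each string instead of printing it directly. The variable names are made up for illustration, and the list is simply the one the regex above would return for the sample string.

```perl
use strict;
use warnings;

my @strings = ('%', 'ISIN', 'US1252691001');

# Without parentheses, "\n" becomes one more element of the joined list,
# so a separator is placed in front of it.
my $trailing = join ', ', @strings, "\n";

# With parentheses, join only sees @strings.
my $joined = join(', ', @strings) . "\n";

# Localizing $" changes the separator used when an array is interpolated.
my $interpolated = do { local $" = q{, }; qq{@strings\n} };

print $trailing;       # %, ISIN, US1252691001, (note the trailing ", ")
print $joined;         # %, ISIN, US1252691001
print $interpolated;   # %, ISIN, US1252691001
```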

      Cheers,

      JohnGG

        I'm not sure whether you intended a trailing ', ' in the output. If not, you could use parentheses

        .oO you got me ...

        If I'd say now "I didn't care if a trailing comma would show up" nobody would believe me. If I'd say "I tried to make the output of print "@strings\n"; more distinct for a beginner", the same would happen.

        Therefore the best would be to take the blame and admit failure ;-)

        Thanks & Regards

        mwa

Re: extract text between slashes
by halley (Prior) on Oct 31, 2007 at 16:26 UTC
    Your current attempt is good, but the match is greedy. It looks for the longest possible match, not the shortest. Adding a ? after the .+ would work fine. To understand "greediness," check the perldocs for regular expressions: perlre
    m{/(.*?)/}

    A second issue is if the string has empty slots between slashes, such as the string "%///US1252691001". You probably want to be able to return an empty result in this case, so I changed your use of .+ (one or more) to .* (zero or more) characters. Otherwise, you might get a match back of "/" for strings like my example.
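    The effect of greediness and of .+ versus .* can be checked with a short sketch. The helper first_capture is hypothetical, introduced only for this comparison. Note that the non-greedy pattern returns the first field between slashes, which for the sample string is %.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first capture of $re against $str,
# or undef if the pattern does not match.
sub first_capture {
    my ($re, $str) = @_;
    return $str =~ $re ? $1 : undef;
}

my $sample = '%/%/ISIN/US1252691001';
my $empty  = '%///US1252691001';

print first_capture(qr{/(.+)/},  $sample), "\n";        # greedy: %/ISIN
print first_capture(qr{/(.+?)/}, $sample), "\n";        # non-greedy: %
print first_capture(qr{/(.+?)/}, $empty),  "\n";        # .+? needs one char: /
print '[', first_capture(qr{/(.*?)/}, $empty), "]\n";   # .*? may be empty: []
```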

    Update: As others mentioned but I didn't parse correctly, to get the THIRD field (e.g., "~/~/THIS/~") takes a little more work. Instead of a bunch of complicated lookaheads and lookbehinds, or switching to a split() instead, I would just parse through. This has the advantage of easily changing the pattern to capture the other fields if the requirements change.

    m{/.*?/(.*?)/}
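    A minimal sketch of the "parse through" pattern above applied to both example strings. The helper name third_field is an assumption for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: skip the field between the first two slashes,
# then capture the field between the second and third slash.
sub third_field {
    my ($str) = @_;
    return $str =~ m{/.*?/(.*?)/} ? $1 : undef;
}

print third_field('%/%/ISIN/US1252691001'), "\n";   # ISIN
print third_field('~/~/THIS/~'), "\n";              # THIS
```

    The pattern requires a slash after the captured field, so a string with only two slashes does not match and the helper returns undef.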

    --
    [ e d @ h a l l e y . c c ]

      Adding a ? after the .+ would work fine

      I might be missing something but I don't think that will work as desired. It will return the first item between slashes which is %, not ISIN. I think split might be better here. Something like

      my $str = '%/%/ISIN/US1252691001';
      my @elems = split m{/}, $str;
      my $isin = $elems[2];

      Cheers,

      JohnGG

      Update:

      You need a more complex regex to do this without split using zero-width look-around assertions, an alternation of two look-behinds and a look-ahead with an alternation.

      my @elems = $str =~ m{(?(?<=\A)|(?<=/))(.*?)(?=/|\z)}g;
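      As a quick check, this sketch runs the look-around regex and split side by side on the sample string. It only covers that one input; behaviour on strings with empty fields between slashes is not examined here.

```perl
use strict;
use warnings;

my $str = '%/%/ISIN/US1252691001';

# Look-around version: each field must start at the beginning of the
# string or right after a slash, and end before a slash or the end.
my @by_regex = $str =~ m{(?(?<=\A)|(?<=/))(.*?)(?=/|\z)}g;

# split version for comparison.
my @by_split = split m{/}, $str;

print join('|', @by_regex), "\n";   # %|%|ISIN|US1252691001
print join('|', @by_split), "\n";   # %|%|ISIN|US1252691001
print $by_regex[2], "\n";           # ISIN
```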

      Also, keep in mind that he wants the contents of the *second* pair of slashes. Assuming that the first one with the percent sign is static, m{/\%/(.*?)/} might work. Otherwise, he could grab all matches and filter out the wrong ones, or split the whole string beforehand:
      # method 1
      @matches = $string =~ m{/(.*?)/}g;
      # method 2
      @matches = split m{/}, $string;
      # print the one you want
      print $matches[1];

      __________
      Systems development is like banging your head against a wall...
      It's usually very painful, but if you're persistent, you'll get through it.

        Unfortunately, your method 1 isn't going to do the trick because the regex is going to consume %/%/ when doing the first match and the next attempted match is left with ISIN/US1252691001 to work with so the match fails.

        $ perl -le '
        > $string = q{%/%/ISIN/US1252691001};
        > @matches = $string =~ m{/(.*?)/}g;
        > print for @matches;'
        %
        $
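        One possible way around the consumed slash, offered only as a sketch and not as something proposed in the thread, is to check for the closing slash with a look-ahead so that it is still available as the opening slash of the next match. The helper name between_slashes is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: return every field that has a slash on both
# sides. The closing slash is only looked at, not consumed, so it can
# open the next match.
sub between_slashes {
    my ($str) = @_;
    return $str =~ m{/(.*?)(?=/)}g;
}

my @matches = between_slashes('%/%/ISIN/US1252691001');
print "$_\n" for @matches;   # prints % and then ISIN
print $matches[1], "\n";     # ISIN
```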

        Cheers,

        JohnGG

      I think we don't know enough about what he's looking for. It was said that he's looking for the text between the second pair of slashes. What if he is looking for the string between the last %/ and the very next / ? I think the input string is not described well enough in the original question. For all I can tell, he could treat %/%/ as a fixed token and extract all of the following characters up to the next /, but this assumes all his input strings begin with %/%/ followed by what he needs to extract, which may not be a correct assumption.
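      If the fixed-prefix reading were the right one, it could be sketched as below. This assumes every input starts with the literal %/%/, which the original question does not confirm; the helper name after_fixed_prefix is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: require the literal prefix %/%/ at the start of
# the string, then capture everything up to the next slash (or the end).
sub after_fixed_prefix {
    my ($str) = @_;
    return $str =~ m{\A%/%/([^/]+)} ? $1 : undef;
}

print after_fixed_prefix('%/%/ISIN/US1252691001'), "\n";   # ISIN
```

      Inputs that do not begin with %/%/ make the helper return undef rather than a wrong field.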
Re: extract text between slashes
by Anonymous Monk on Oct 31, 2007 at 19:41 UTC
    won't /ISIN/ work for you?
      won't /ISIN/ work for you?

      I'd guess it won't work because that would fail to catch the other candidates like CUSIP, NSIN and probably even more - depending on the intention of the O.P.
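      If the set of identifier types were known in advance, a literal match could be widened with an alternation. This is only a sketch: the list of types is a guess taken from the reply above, the second input string is made up, and the helper name identifier_type is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: capture the identifier type only when it is one
# of a known set. The set below is an assumption, not a complete list.
sub identifier_type {
    my ($str) = @_;
    return $str =~ m{/(ISIN|CUSIP|NSIN)/} ? $1 : undef;
}

print identifier_type('%/%/ISIN/US1252691001'), "\n";   # ISIN
print identifier_type('%/%/CUSIP/125269100'), "\n";     # CUSIP (made-up input)
```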

      Regards

      mwa