extract text between slashes

kevind0718 has asked for the wisdom of the Perl Monks concerning the following question:

Re: extract text between slashes by mwah (Hermit) on Oct 31, 2007 at 17:26 UTC
You said: extract the data between slashes within a string, which of course might be more than one item. `my $str = '%/%/ISIN/US1252691001'; my @strings = $str =~ m{(?<=/)[^/]+}g ; print join ', ', @strings, "\n"; print $strings[1];` Regards mwa
Re^2: extract text between slashes by johngg (Canon) on Oct 31, 2007 at 19:36 UTC
`print join ', ', @strings, "\n";` I'm not sure whether you intended a trailing `', '` in the output. If not, you could use parentheses: `print join(', ', @strings), "\n";` I might instead tinker with the list separator: `print do { local $" = q{, }; qq{@strings\n}; };` Cheers, JohnGG
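The three variants above can be compared side by side. This is only a small sketch; the array contents and variable names are made up for illustration and are not taken from the posts.

```perl
use strict;
use warnings;

my @strings = ('%', 'ISIN', 'US1252691001');

# Without parentheses, "\n" is one more element of the list being joined,
# so a separator is emitted in front of it.
my $without_parens = join ', ', @strings, "\n";

# With parentheses, only @strings is joined and the newline is appended after.
my $with_parens = join(', ', @strings) . "\n";

# Localizing $" (the list separator) changes how @strings interpolates
# inside this do block only.
my $with_separator = do { local $" = q{, }; qq{@strings\n} };

print $without_parens, $with_parens, $with_separator;
```

The first string ends in `, ` followed by the newline; the other two do not.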
Re^3: extract text between slashes by mwah (Hermit) on Oct 31, 2007 at 20:10 UTC
"I'm not sure whether you intended a trailing ', ' in the output. If not, you could use parentheses" .oO you got me ... If I'd say now "I didn't care if a trailing comma would show up" nobody would believe me. If I'd say "I tried to make the output of `print "@strings\n";` more distinct for a beginner", the same would happen. Therefore the best would be to take the blame and admit failure ;-) Thanks & Regards mwa
Re: extract text between slashes by halley (Prior) on Oct 31, 2007 at 16:26 UTC
Your current attempt is good, but the match is greedy. It looks for the longest possible match, not the shortest. Adding a `?` after the `.+` would work fine. To understand "greediness," check the perldocs for regular expressions: perlre `m{/(.*?)/}` A second issue is if the string has empty slots between slashes, such as the string `"%///US1252691001"`. You probably want to be able to return an empty result in this case, so I changed your use of `.+` (one or more) to `.*` (zero or more) characters. Otherwise, you might get a match back of `"/"` for strings like my example. Update: As others mentioned but I didn't parse correctly, to get the THIRD field (e.g., `"~/~/THIS/~"`) takes a little more work. Instead of a bunch of complicated lookaheads and lookbehinds, or switching to a `split()` instead, I would just parse through. This has the advantage of easily changing the pattern to capture the other fields if the requirements change. `m{/.*?/(.*?)/}` -- `[ e d @ h a l l e y . c c ]`
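To make the greedy versus non-greedy difference concrete, here is a minimal sketch run against the sample string used elsewhere in the thread. The greedy pattern `m{/(.+)/}` is an assumption about what the original attempt looked like, since the question itself is not shown; the variable names are illustrative.

```perl
use strict;
use warnings;

my $str = '%/%/ISIN/US1252691001';

# Greedy (assumed original attempt): .+ stretches to the last possible slash.
my ($greedy) = $str =~ m{/(.+)/};

# Non-greedy: .*? stops at the first slash that follows.
my ($lazy) = $str =~ m{/(.*?)/};

# "Parse through": skip one slash-delimited field, capture the next one.
my ($third) = $str =~ m{/.*?/(.*?)/};

# Empty slots: .*? can return an empty field, .+? is forced to swallow a slash.
my $gappy = '%///US1252691001';
my ($empty_ok) = $gappy =~ m{/(.*?)/};
my ($slash)    = $gappy =~ m{/(.+?)/};

print "greedy=[$greedy] lazy=[$lazy] third=[$third] ",
      "empty_ok=[$empty_ok] slash=[$slash]\n";
```

Here `$greedy` is `%/ISIN`, `$lazy` is `%`, `$third` is `ISIN`, `$empty_ok` is the empty string and `$slash` is `/`.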
Re^2: extract text between slashes by johngg (Canon) on Oct 31, 2007 at 16:44 UTC
"Adding a `?` after the `.+` would work fine" I might be missing something but I don't think that will work as desired. It will return the first item between slashes, which is `%`, not `ISIN`. I think split might be better here. Something like `my $str = '%/%/ISIN/US1252691001'; my @elems = split m{/}, $str; my $isin = $elems[2];` Cheers, JohnGG Update: You need a more complex regex to do this without `split`, using zero-width look-around assertions: an alternation of two look-behinds and a look-ahead with an alternation. `my @elems = $str =~ m{(?:(?<=\A)|(?<=/))(.*?)(?=/|\z)}g;`
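The `split` approach can be wrapped up as a small sketch. `nth_field` is a hypothetical helper name introduced here, not something from the thread, and passing a limit of `-1` is an added assumption: it keeps trailing empty fields, which `split` otherwise drops.

```perl
use strict;
use warnings;

# Hypothetical helper: return the zero-based $n-th slash-separated field,
# or undef if the string has fewer fields than that.
sub nth_field {
    my ($str, $n) = @_;
    # A limit of -1 preserves trailing empty fields.
    my @elems = split m{/}, $str, -1;
    return $elems[$n];
}

my $isin  = nth_field('%/%/ISIN/US1252691001', 2);
my $empty = nth_field('%///US1252691001', 1);

# Default split drops the trailing empty field; the -1 limit keeps it.
my @default_split = split m{/}, '%/%/ISIN/';
my @keep_trailing = split m{/}, '%/%/ISIN/', -1;

print "isin=[$isin] empty=[$empty] ",
      scalar(@default_split), " vs ", scalar(@keep_trailing), " fields\n";
```

Indexing into the `split` result makes it easy to pick a different field later by changing only the index.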
Re^2: extract text between slashes by EvanK (Chaplain) on Oct 31, 2007 at 17:00 UTC
Also, keep in mind that he wants the contents of the second pair of slashes. Assuming that the first one with the percent sign is static, `m{/\%/(.*?)/}` might work. Otherwise, he could grab all matches and filter out the wrong ones, or split the whole string beforehand. Method 1: `@matches = $string =~ m{/(.*?)/}g;` Method 2: `@matches = split m{/}, $string;` Then print the one you want: `print $matches[1];` __________ Systems development is like banging your head against a wall... It's usually very painful, but if you're persistent, you'll get through it.
Re^3: extract text between slashes by johngg (Canon) on Oct 31, 2007 at 19:50 UTC
Unfortunately, your method 1 isn't going to do the trick: the regex consumes `%/%/` when doing the first match, so the next attempted match is left with only `ISIN/US1252691001` to work with and fails. `$ perl -le ' > $string = q{%/%/ISIN/US1252691001}; > @matches = $string =~ m{/(.*?)/}g; > print for @matches;' % $` Cheers, JohnGG
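One possible way around the consumed slash, sketched here as a suggestion rather than anything proposed in the thread, is to turn the closing slash into a look-ahead so it stays available as the opening slash of the next match. Variable names are illustrative.

```perl
use strict;
use warnings;

my $string = q{%/%/ISIN/US1252691001};

# Closing slash is consumed, so the next match has no opening slash left
# in front of ISIN and the global match stops after one result.
my @consuming = $string =~ m{/(.*?)/}g;

# Closing slash is only asserted with (?=/), not consumed, so it can serve
# as the opening slash of the following match.
my @lookahead = $string =~ m{/(.*?)(?=/)}g;

print "consuming: @consuming\n";
print "lookahead: @lookahead\n";
```

With the look-ahead version `$lookahead[1]` is `ISIN`. The last field, `US1252691001`, is still not returned because no slash follows it.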
Re^2: extract text between slashes by RaduH (Scribe) on Oct 31, 2007 at 17:21 UTC
I think we don't know enough about what he's looking for. It was said that he's looking for the text between the second pair of slashes. What if he is looking for the string between the last `%/` and the very next `/`? I think the input string is not described well enough in the original question. For all I can tell, he could be looking for `%/%/` as a fixed token and then suck out all of the following characters until the first `/`, but this assumes all his input strings begin with `%/%/` followed by what he needs to extract, which may not be a correct assumption.
Re: extract text between slashes by Anonymous Monk on Oct 31, 2007 at 19:41 UTC
Won't `/ISIN/` work for you?
Re^2: extract text between slashes by mwah (Hermit) on Oct 31, 2007 at 20:00 UTC
"Won't `/ISIN/` work for you?" I'd guess it won't work because that would fail to catch the other candidates like `CUSIP`, `NSIN` and probably even more, depending on the intention of the O.P. Regards mwa