To Match the text

amexmythili has asked for the wisdom of the Perl Monks concerning the following question:

Re: To Match the text by AnomalousMonk (Archbishop) on Mar 22, 2009 at 14:08 UTC
Here is an approach using split and avoiding a substitution:

```
>perl -wMstrict -le "my $var= 'sclerosing 1954, 5-7, 54, 59f-60d, 90, 114'; my (undef, $tail) = split m{\s* , \s*}xms, $var, 2; my $tag = qq{<t> $tail </t>}; print $tag; "
<t> 5-7, 54, 59f-60d, 90, 114 </t>
```

If you feel you must use a regex substitution, read perlre as suggested by Perlbotics, and also perlretut and perlrequick, and take a look at the regex tutorials on this site: Pattern Matching, Regular Expressions, and Parsing.
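In this approach, the limit (the third argument to split) is what keeps the page locators together. A small comparison on the same sample line, with and without the limit:

```perl
use strict;
use warnings;

my $var = 'sclerosing 1954, 5-7, 54, 59f-60d, 90, 114';

# Without a limit, split breaks the line at every comma.
my @all = split m{\s* , \s*}xms, $var;      # 6 fields

# With a limit of 2, split stops after the first comma,
# so everything after it stays in one field.
my @two = split m{\s* , \s*}xms, $var, 2;   # 2 fields

print scalar(@all), " fields without a limit\n";
print "Head: $two[0]\nTail: $two[1]\n";
```

Because the split happens at the very first comma, an entry whose name itself contains a comma (such as "cribriform carcinoma, invasive, 89, 91-94, 112") would put part of the name into the tail.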
Re: To Match the text by johngg (Canon) on Mar 22, 2009 at 19:01 UTC
How about using split with a third argument limiting the resultant fields to two, splitting on the first comma and space that is followed by a digit? It would of course fail if the item itself contained <comma><space><digit>. Here, I printf'ed both fields so you can see where the demarcation lies.

```perl
use strict;
use warnings;

while( <DATA> ) {
    chomp;
    my( $item, $index ) = split m{,\s(?=\d)}, $_, 2;
    printf qq{%-42s%s\n}, $item, $index;
}

__END__
sclerosing 1954, 5-7, 54, 59f-60d, 90, 114
cribriform carcinoma, invasive, 89, 91-94, 112
comedo-type DCIS, 25-26
comedo-type necrosis, LCIS with, 55, 59, 65
complex sclerosing lesions (radial scar), 8-9, 54, 59, 90
```

Here is the output.

```
sclerosing 1954                           5-7, 54, 59f-60d, 90, 114
cribriform carcinoma, invasive            89, 91-94, 112
comedo-type DCIS                          25-26
comedo-type necrosis, LCIS with           55, 59, 65
complex sclerosing lesions (radial scar)  8-9, 54, 59, 90
```

I hope this is helpful.

Cheers,

JohnGG
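If the goal is the `<t>...</t>` markup from the original question, the same split can feed it directly. A minimal sketch, assuming the whole list of locators should be wrapped as one unit; `tag_index` is a hypothetical helper name, and the same <comma><space><digit> caveat applies:

```perl
use strict;
use warnings;

# Hypothetical helper: wrap everything after the item name in <t>...</t>.
sub tag_index {
    my ($line) = @_;
    # Split once, at the first ", " that is followed by a digit.
    my ( $item, $index ) = split m{,\s(?=\d)}, $line, 2;
    return $line unless defined $index;    # no locators: leave unchanged
    return "$item, <t>$index</t>";
}

my @lines = (
    'sclerosing 1954, 5-7, 54, 59f-60d, 90, 114',
    'cribriform carcinoma, invasive, 89, 91-94, 112',
    'comedo-type necrosis, LCIS with, 55, 59, 65',
);
print tag_index($_), "\n" for @lines;
```

This prints `sclerosing 1954, <t>5-7, 54, 59f-60d, 90, 114</t>`, `cribriform carcinoma, invasive, <t>89, 91-94, 112</t>` and `comedo-type necrosis, LCIS with, <t>55, 59, 65</t>`; the commas inside the item names are left alone because they are not followed by a digit.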
Re: To Match the text by Perlbotics (Archbishop) on Mar 22, 2009 at 12:00 UTC
Hi. What have you tried? Regular expressions (perlre) and split are likely the key to solving your problem.
Re^2: To Match the text by amexmythili (Novice) on Mar 22, 2009 at 12:16 UTC
Hi, I tried the code below, but it's catching the number 1954. I only want to catch "5–7, 54, 59f-60d, 90, 114". `$var=~s#\b(\d(?:a-z{1})?\-?\d(?:a-z{1})?)\b#'<t>'.$1.'</t>'#ges;` How can I do this?
Re^3: To Match the text by Perlbotics (Archbishop) on Mar 22, 2009 at 14:00 UTC
Hi. Please put your code fragments between `<code>...</code>` tags. Then your RE becomes `$var=~s#\b(\d(?:[a-z]{1})?\-?\d(?:[a-z]{1})?)\b#'<t>'.$1.'</t>'#ges;`. Note that `[..]` becomes a link when used outside code sections.

From what you describe, I guess you want to do a match only, but your code fragment does something different (a substitution). I guess you want to match the text after the first comma (maybe without the surrounding whitespace?), so you could try:

```perl
#...
# alt1: match only, leave $var unmodified:
my $var = "sclerosing 1954, 5–7, 54, 59f-60d, 90, 114";
if ( $var =~ /^[^,]+,\s*(.+?)\s*$/ ) {
    print "Match: ==>$1<==\n";   # ==>5–7, 54, 59f-60d, 90, 114<==
}

# alt2: add <t>markup</t>, substitute $var2:
my $var2 = "sclerosing 1954, 5–7, 54, 59f-60d, 90, 114";
if ( $var2 =~ s{^([^,]+,\s*)(.+?)(\s*)$}{$1<t>$2</t>$3} ) {
    print "Markup: ==>$var2<==\n";   # ==>sclerosing 1954, <t>5–7, 54, 59f-60d, 90, 114</t><==
}
#...
```

You wrote: "Hi, I tried the code below, but it's catching the number 1954." Here, however, your code fragment returns `5–7, <t>54</t>, <t>59f</t>-<t>60d</t>, <t>90</t>, 114` ???

It is hard to guess what you really want, so if the explanation above doesn't help, please explain your problem more clearly, for example by providing examples of what your input is and what output you expect (a few lines each within `<code>..</code>` tags).

BTW: You don't need the `e` modifier. Instead of `s{(pattern)}{'<t>'.$1.'</t>'}e`, a simple `s{(pattern)}{<t>$1</t>}` would be sufficient.
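To illustrate the BTW: with `/e` the replacement is evaluated as a Perl expression (here a string concatenation), while without it the replacement is already an interpolated string, so both forms give the same result. The sample string is only illustrative:

```perl
use strict;
use warnings;

my $with_e    = '54, 90, 114';
my $without_e = '54, 90, 114';

# /e: the replacement '<t>'.$1.'</t>' is run as Perl code.
$with_e    =~ s{(\d+)}{'<t>'.$1.'</t>'}ge;

# No /e: the replacement is interpolated like a double-quoted string.
$without_e =~ s{(\d+)}{<t>$1</t>}g;

print "$with_e\n$without_e\n";
```

`/e` earns its keep only when the replacement needs real computation (a function call, arithmetic, a hash lookup); for plain markup the interpolated form is simpler to read.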
Re^4: To Match the text by Anonymous Monk on Mar 22, 2009 at 17:12 UTC
Re^3: To Match the text by graff (Chancellor) on Mar 22, 2009 at 14:15 UTC
First, when you post Perl code, put "<c>" or "<code>" at the beginning of the code and "</c>" or "</code>" at the end, so that it will display correctly.

Your regex has a few problems:

- try putting "+" after each "\d" in the captured region, so that you can match "54" as well as "5";
- you are using the "e" modifier at the end, which says that the replacement string should be executed as a snippet of Perl code; your replacement does happen to be valid Perl, but the "e" is unnecessary, since a plain `<t>$1</t>` replacement does the same job;
- you say you want to capture a whole string like "5-7, 54, 59f-60d, 90, 114", but your regex (when it works) will only capture the individual pieces between commas, one match at a time.

If you just want to remove the initial part, that's a simpler job:

```perl
my $var = "sclerosing 1954, 5–7, 54, 59f-60d, 90, 114";
$var =~ s/^\S+\s+\d+,\s//;   # remove first word and digit string
```

If you also want the remainder to be broken into separate pieces, use split:

```perl
my @pieces = split /,\s/, $var;
```

In fact, split by itself could do both things at once:

```perl
my $var = "sclerosing 1954, 5–7, 54, 59f-60d, 90, 114";
my ( $junk, @pieces ) = split /,\s*/, $var;
```

(updated to fix last snippet)
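A minimal sketch that combines the regex fixes above ("+" after each "\d", real character classes, no "e") and tags each locator separately. It rests on a few assumptions: the locators start after the first ", " that is followed by a digit (as in johngg's split), and a range may use either a hyphen or an en dash (U+2013), since the sample data contains both "5–7" and "59f-60d". The name `tag_locators` is hypothetical:

```perl
use strict;
use warnings;

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: wrap each page locator after the item name in <t>...</t>.
sub tag_locators {
    my ($line) = @_;

    # One locator: digits, an optional letter, optionally followed by a
    # range to more digits and an optional letter. Accepting the en dash
    # (\x{2013}) as a range separator is an assumption.
    my $locator = qr/\d+[a-z]?(?:[-\x{2013}]\d+[a-z]?)?/;

    # Split once, at the first ", " followed by a digit, so a number inside
    # the item name (like 1954) is never tagged.
    my ( $item, $index ) = split m{,\s(?=\d)}, $line, 2;
    return $line unless defined $index;

    # No /e needed: the replacement is an ordinary interpolated string.
    $index =~ s{\b($locator)\b}{<t>$1</t>}g;
    return "$item, $index";
}

print tag_locators("sclerosing 1954, 5\x{2013}7, 54, 59f-60d, 90, 114"), "\n";
```

Because the range is part of one locator, "59f-60d" is tagged as a single unit rather than as `<t>59f</t>-<t>60d</t>`.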