Re: How to remove roman numbers
by moritz (Cardinal) on Jun 25, 2012 at 10:29 UTC
Why do you want another approach? Does yours not work? If so, please show the code you've written, the result you got, and what you wanted instead.
My first crude approach would be s/\b[IVXLCDM]+\s+//g.
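Running the one-liner on a few sample strings shows what it does and doesn't touch. This is a minimal sketch: the sample strings and the helper name `strip_crude` are made up for illustration, and only the substitution comes from the post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper wrapping the substitution from the post:
# a run of Roman-numeral letters starting at a word boundary,
# followed by whitespace (which is removed too).
sub strip_crude {
    my ($s) = @_;
    $s =~ s/\b[IVXLCDM]+\s+//g;
    return $s;
}

# Made-up sample lines.
my @samples = (
    'II William Vassilakis',
    'Iilliam H. Schneider',
    'Alessandro Calvi, I',
    'Volume IV of the series',
);

print strip_crude($_), "\n" for @samples;
```

Because the pattern requires trailing whitespace, it leaves the "I" in "Iilliam" alone, but it also misses a numeral at the very end of a line, as in the "Calvi, I" sample.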
Thank you, Mr. moritz, for your reply. But don't you think that if an author's name starts with I, V, M, etc., the regex will remove that letter from the name and change the author's name?
Just as \w matches a "word" character and \d matches a decimal digit character, I want to know whether there is a special character class for Roman numerals as well.
But don't you think that if an author's name starts with I, V, M, etc., the regex will remove that letter from the name and change the author's name?
Why wonder about that if you can simply try?
Just as \w matches a "word" character and \d matches a decimal digit character, I want to know whether there is a special character class for Roman numerals as well.
Even if it existed, it would only help you if the Roman numerals were written with special characters, for example Ⅰ (U+2160 ROMAN NUMERAL ONE) instead of I (U+0049 LATIN CAPITAL LETTER I).
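To illustrate the point: if the input really does use the dedicated code points, an ordinary character range can target them. This is a sketch that assumes the numerals come from U+2160 to U+217F (the Roman numeral characters in Unicode's Number Forms block); the helper name `strip_unicode_numerals` is hypothetical, and plain ASCII letters such as "XIV" are deliberately not affected.

```perl
#!/usr/bin/perl
use strict;
use warnings;
binmode STDOUT, ':encoding(UTF-8)';

# U+2160..U+216F are ROMAN NUMERAL ONE .. ONE THOUSAND and
# U+2170..U+217F are the SMALL ROMAN NUMERAL forms.
# Leading whitespace before the numeral is removed as well.
sub strip_unicode_numerals {
    my ($s) = @_;
    $s =~ s/\s*[\x{2160}-\x{217F}]+//g;
    return $s;
}

print strip_unicode_numerals("Henry \x{2167}"), "\n";  # U+2167 ROMAN NUMERAL EIGHT
print strip_unicode_numerals('Louis XIV'), "\n";        # ASCII letters are untouched
```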
Because the pattern below is anchored with \b on both sides, it won't remove the "I" at the start of an author's name such as "Iilliam" (there is no word boundary after that "I"); however, it will still remove the personal pronoun "I". I'd do something like this:
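#!/usr/bin/perl
use strict;
use warnings;

# One author per list element ("Iilliam" checks that a leading I in a name survives).
my @data = (
    '1. Iilliam H. Schneider, IV',
    '2. William Vassilakis, II',
    '3. Alessandro Calvi, I',
);

foreach my $data (@data) {
    # Delete whole words made only of Roman-numeral letters; "Iilliam" is
    # kept because there is no word boundary between "I" and "i".
    $data =~ s/\b[IVXLCDM]+\b//g;
    print "$data\n";
}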
I used "Iilliam" instead of "William" for demonstration purposes.
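If the numeral always comes last, as in the sample data, anchoring the pattern to the end of the string keeps it away from a pronoun "I" earlier in the line and also removes the dangling comma. A sketch under that assumption; the helper name `strip_suffix` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: remove a numeral-letter word only when it ends the
# string, optionally preceded by a comma and whitespace.
sub strip_suffix {
    my ($s) = @_;
    $s =~ s/,?\s*\b[IVXLCDM]+\s*$//;
    return $s;
}

print strip_suffix($_), "\n"
    for '1. Iilliam H. Schneider, IV', 'I met Alessandro Calvi, I';
```

It still only looks at letters, so a trailing degree or initial such as "MD" or "C" would be removed too.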
Re: How to remove roman numbers
by roboticus (Chancellor) on Jun 25, 2012 at 11:25 UTC
Priti24:
If it's just people's names, then you don't need to do anything particularly heroic. You could keep a small hash table covering a reasonable range of Roman numerals and look for a match in the expected position. Or, if the names are all formatted as in your examples, you could use a regex that looks for a comma followed by a numeral. A simplified version would be something like: s/, I?V?I*//;. Extending it to a larger range is left as an exercise for the reader.
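The lookup-table idea might look roughly like this. It is a sketch that assumes the suffix is the last comma-separated field and that I to XX is a "reasonable range"; the helper name `strip_by_table` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Lookup table of the numerals accepted as name suffixes (I..XX here).
my @numerals = qw(I II III IV V VI VII VIII IX X
                  XI XII XIII XIV XV XVI XVII XVIII XIX XX);
my %is_numeral = map { $_ => 1 } @numerals;

# Hypothetical helper: drop the last comma-separated field if it is in the
# table; otherwise return the name unchanged.
sub strip_by_table {
    my ($name) = @_;
    if ($name =~ /^(.*),\s*(\S+)\s*$/ && $is_numeral{$2}) {
        return $1;
    }
    return $name;
}

print strip_by_table($_), "\n"
    for 'William Vassilakis, II', 'Alessandro Calvi, I', 'Mary Smith, MD';
```

Because only exact table entries count, "MD" and anything outside the chosen range are left alone.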
Note: The regex will match some strings that aren't standard Roman numerals, and there's at least one other string it will match that it shouldn't. Generate *plenty* of test cases (especially degenerate cases) to tune your code against.
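A small table of inputs makes that kind of tuning easy to start. This sketch wraps the simplified substitution from the post in a hypothetical helper `strip_simple` and runs it over a few made-up ordinary and degenerate names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The simplified substitution from the post, wrapped for testing.
sub strip_simple {
    my ($s) = @_;
    $s =~ s/, I?V?I*//;
    return $s;
}

# Made-up ordinary and degenerate inputs.
my @cases = (
    'Alessandro Calvi, I',
    'Iilliam H. Schneider, IV',
    'John Smith, VIII',
    'John Smith, IX',
    'Smith, Ivan',
);

printf "%-26s => %s\n", $_, strip_simple($_) for @cases;
```

The last two rows show why both the range and the matching need work: "IX" is only partly removed (leaving "John SmithX"), and a comma before an ordinary word starting with "I" is treated as a numeral (giving "Smithvan").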
Have fun with it!
...roboticus
When your only tool is a hammer, all problems look like your thumb.
Re: How to remove roman numbers
by zentara (Cardinal) on Jun 25, 2012 at 13:26 UTC
#> How do I write a pattern for removing roman numerals? The first 10 is
#> enough.
# Well, the first ten roman numerals are:
# I, II, III, IV, V, VI, VII, VIII, IX, X
# Just put those in a regex.
s/\b(I|II|...)\b//g;
# would remove roman numerals, provided they aren't touching any word
# characters.
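Spelling the alternation out in full from the list of ten numerals gives something runnable. In this sketch, building the pattern with join and qr//, removing the whitespace in front of the numeral, and the helper name `strip_first_ten` are choices made here, not part of the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The first ten Roman numerals, as listed in the post.
my @first_ten = qw(I II III IV V VI VII VIII IX X);

# Longest alternatives first, so "VIII" is tried before "V".
my $alt        = join '|', sort { length($b) <=> length($a) } @first_ten;
my $numeral_re = qr/\b(?:$alt)\b/;

# Also swallow the whitespace in front of the numeral so no double space is left.
sub strip_first_ten {
    my ($s) = @_;
    $s =~ s/\s*$numeral_re//g;
    return $s;
}

print strip_first_ten('Henry VIII and Louis XIV'), "\n";
```

"XIV" survives because it is outside the first ten: the trailing \b stops the "X" alternative from matching just its first letter.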
Re: How to remove roman numbers
by CountZero (Bishop) on Jun 25, 2012 at 13:45 UTC
Regexp::Common::number has all you need to identify Roman numbers.
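A sketch of how that module's `$RE{num}{roman}` pattern could be used here, assuming the numeral is a comma-separated suffix at the end of the name. Regexp::Common is a CPAN module, not part of core Perl; the helper name `strip_roman_suffix` is hypothetical, and the module's documentation should be checked for the exact range and case handling of the pattern.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Regexp::Common qw(number);   # CPAN module; exports %RE

# Hypothetical helper: remove a trailing ", <Roman numeral>" suffix.
# Anchoring to the comma and the end of the string keeps the pattern
# away from the rest of the name.
sub strip_roman_suffix {
    my ($name) = @_;
    $name =~ s/,\s*$RE{num}{roman}\s*$//;
    return $name;
}

print strip_roman_suffix($_), "\n"
    for 'Iilliam H. Schneider, IV', 'William Vassilakis, II', 'Alessandro Calvi';
```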
CountZero
"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James
My blog: Imperial Deltronics