propellerhat has asked for the wisdom of the Perl Monks concerning the following question:
I have several hundred text files, in each of which I need to copy a 3-digit serial number from its representation in Arabic numerals ([0-9]) to a representation in English words ('zero' to 'nine').
Thus, article number "345" also needs the label "threefourfive", and article number "004" also needs the label "zerozerofour".
The serial number appears exactly once in the text of each file, after the label "No.", as in "No. 345".
The English representation is a LaTeX command, prefixed by "\", as in "\threefourfive". As a placeholder for the English representation, each file contains the string "\zerozerozero".
If it helps, the serial number also appears in the filename, as in "abstract-345.tex".
Using the LaTeX package "catchfile", the English representation allows each article title to be maintained in a single separate file, so that it can be used in several documents (catalogue, abstract, article).
I do not know how to approach this. Perhaps a substitution with "s///"? It occurs to me that matching with the greedy modifier "/g" could also match ordinary English words in the text files.
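Because the number appears once as "No. NNN", a single pass per file is enough: read the number, build the word form, then replace only the literal placeholder. The sketch below assumes one file's text is already in a string. `fill_placeholder` and `@digit_words` are hypothetical names. The pattern matches the full control word `\zerozerozero` (a lookahead rejects longer names such as `\zerozerozeros`), so ordinary words like "zero" in the prose are never touched.

```perl
use strict;
use warnings;

# Hypothetical lookup table: digit -> English word.
my @digit_words = qw(zero one two three four five six seven eight nine);

sub fill_placeholder {
    my ($text) = @_;
    # Step 1: find the serial number, which appears once as "No. NNN".
    my ($num) = $text =~ /\bNo\.\s+([0-9]{3})\b/
        or die "no 'No. NNN' found\n";
    my $eng = join '', map { $digit_words[$_] } split //, $num;
    # Step 2: replace only the literal control word \zerozerozero;
    # the lookahead stops it matching a longer name like \zerozerozeros.
    $text =~ s/\\zerozerozero(?![A-Za-z])/\\$eng/;
    return $text;
}

my $in = "Title: \\zerozerozero\nNo. 345\nzero, one and two are ordinary words.\n";
print fill_placeholder($in);
```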
Re: multiple-pass search?
by choroba (Cardinal) on Dec 09, 2021 at 19:01 UTC
Yes, substitution is the right tool. You can use Lingua::EN::Numbers to turn digits into words.
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
use Lingua::EN::Numbers qw{ num2en };
my $text = 'abc No. 347 xyz';
$text =~ s/No\. \K(\d+)/join "", "\\", map num2en($_), split m{}, $1/ge;
print $text; # abc No. \threefourseven xyz
I used /e, which evaluates the replacement part as code. The regex matches "No. " followed by a number, but replaces just the number due to \K. It splits the number into digits, replaces each with its word (via num2en), and joins them together with a \ at the beginning.
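If Lingua::EN::Numbers is not installed, the same \K + /e substitution can be tried with core Perl only. The sketch below replaces num2en with a plain lookup array `@word`. That array is an assumption, and it is only adequate here because every input is a single digit.

```perl
use strict;
use warnings;

# Core-Perl stand-in for num2en on single digits (assumption: input is 0-9).
my @word = qw(zero one two three four five six seven eight nine);

my $text = 'abc No. 347 xyz';
# \K keeps "No. " out of the replaced part; /e runs the right-hand side as code.
$text =~ s/No\. \K(\d+)/join "", "\\", map $word[$_], split m{}, $1/ge;
print "$text\n";    # abc No. \threefourseven xyz
```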
Re: multiple-pass search?
by jdporter (Paladin) on Dec 09, 2021 at 20:29 UTC
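use strict;
use warnings;
use Tie::File;
sub replace_serialnumbers_in_file($)
{
    my @word = qw( zero one two three four five six seven eight nine );
    my $filename = shift;
    # /g is required: in list context a match without /g returns only the first digit
    my $serno = join '', map $word[$_], $filename =~ /(\d)/g; # assuming no other digits in the filename
    tie my @lines, 'Tie::File', $filename or die "Cannot tie $filename: $!";
    s/\\zerozerozero/\\$serno/g for @lines;   # changes are written back to the file
    untie @lines;                             # flush and release the file
}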
You don't need multiple passes if you take the serial number from the filename.
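As an alternative to Tie::File, Perl's built-in in-place editing (`$^I`, the variable behind the `-i` switch) can do the same job. This is a sketch, not jdporter's code. It assumes the file names follow the article-/abstract-/catalogue-NNN.tex pattern from this thread and keeps a `.bak` copy of each edited file. `fix_files` is a hypothetical name. The directory part is stripped first, so digits in the path do not leak into the serial number.

```perl
use strict;
use warnings;

my @word = qw(zero one two three four five six seven eight nine);

# Hypothetical driver: take NNN from each <category>-NNN.tex file name and
# replace the \zerozerozero placeholder inside that file.
sub fix_files {
    my @files = @_;
    for my $file (@files) {
        my ($base) = $file =~ m{([^/\\]+)\z};   # drop any directory part
        my ($num) = $base =~ /^(?:article|abstract|catalogue)-([0-9]{3})\.tex\z/
            or next;                             # skip names that don't fit
        my $eng = join '', map { $word[$_] } split //, $num;
        # In-place edit of this one file, keeping a .bak backup.
        local ($^I, @ARGV) = ('.bak', $file);
        while (<>) {
            s/\\zerozerozero(?![A-Za-z])/\\$eng/;
            print;                               # goes to the rewritten file
        }
    }
}

# e.g.  perl fix.pl documents/*.tex
fix_files(@ARGV) if @ARGV;
```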
use strict;
use warnings;
use File::Find;
my $dir = "documents";
find( sub {
    my $filename = $_;
    return unless ( $filename =~ /abstract-([0-9][0-9][0-9]).tex/
                    && -f $filename );
    my $serialnumber = $1 ;
    # ... (rest of the processing not shown)
}, $dir );
So I take it the filename will have the exact pattern abstract-NNN.tex.
If so, the regex you gave is too broad. It will match, for example, nonabstract-000stexts.
You need to anchor the beginning and end, and escape the dot: /^abstract-(\d{3})\.tex$/
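A quick way to see the difference is to run a few names through both patterns. This is a small sketch with made-up sample names. The loose pattern accepts all three; the anchored one accepts only the first. If a trailing newline must also be rejected, `\z` is stricter than `$`.

```perl
use strict;
use warnings;

# The unanchored pattern from the question versus the anchored one.
my $loose  = qr/abstract-([0-9][0-9][0-9]).tex/;
my $strict = qr/^abstract-(\d{3})\.tex$/;

for my $name (qw(abstract-345.tex nonabstract-000stexts abstract-345.tex.bak)) {
    printf "%-24s loose:%-3s strict:%s\n", $name,
        ($name =~ $loose  ? 'yes' : 'no'),
        ($name =~ $strict ? 'yes' : 'no');
}
```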
Re: multiple-pass search?
by LanX (Saint) on Dec 09, 2021 at 20:22 UTC
This should get you started; I kept it flexible so that you can adjust it.
DB<49> sub english_num { my ($pre,$num) = @_; my $eng = join "-", map {(qw/zero one two three four five six seven eight nine/)[$_] } split //,$num; return "$pre \\$eng"}
DB<50> $txt =" some text No. 345 other text No. 123 end text"
DB<51> $txt =~ s/(No.) (\d{3})/english_num($1,$2)/ge
DB<52> say $txt
some text No. \three-four-five other text No. \one-two-three end text
edit
If you are sure that it's always exactly 3 digits, you can also use a hard-wired regex with a lookup array:
s/(No.) (\d)(\d)(\d)/$1 \\$nums[$2]-$nums[$3]-$nums[$4]/g
DB<94> $_ =" some text No. 345 other text No. 123 end text"
DB<95> p
some text No. 345 other text No. 123 end text
DB<96> s/(No.) (\d)(\d)(\d)/$1 \\$nums[$2]-$nums[$3]-$nums[$4]/g
DB<97> p
some text No. \three-four-five other text No. \one-two-three end text
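The session above uses `@nums` without showing its definition. The sketch below assumes it holds the digit names in order. LaTeX control-word names consist of letters only, so `\three-four-five` would be read as `\three` followed by the text "-four-five". For the `\threefourfive` form asked for in the question, the separator has to go. This version also escapes the dot so that it matches only a literal period.

```perl
use strict;
use warnings;

# Assumed definition of the lookup array used in the debugger session.
my @nums = qw(zero one two three four five six seven eight nine);

$_ = " some text No. 345 other text No. 123 end text";
# Same hard-wired three-digit regex, joined without hyphens so the
# result is a single LaTeX control word such as \threefourfive.
s/(No\.) (\d)(\d)(\d)/$1 \\$nums[$2]$nums[$3]$nums[$4]/g;
print "$_\n";
```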
update
After reading the OP again: please provide an SSCCE clarifying input and expected output.
This is about the best I can do by way of providing an SSCCE:
1) files:
a) title files (one title per file):
title-001.tex
title-002.tex
...
title-999.tex
b) catchfile index (one file of a thousand lines; a thousand titles is about three or four times the number needed):
\CatchFileDef{\zerozerozero}{title-000.tex}{}
\CatchFileDef{\zerozeroone}{title-001.tex}{}
...
\CatchFileDef{\nineninenine}{title-999.tex}{}
c) document files (several categories, having same title):
article-001.tex
article-002.tex
...
article-003.tex
abstract-001.tex
abstract-002.tex
...
abstract-003.tex
catalogue-001.tex
catalogue-002.tex
...
catalogue-003.tex
2) In the head of each document file is a placeholder for the English representation of the serial number of the title: "\zerozerozero". If the placeholder is not useful, I can delete it.
3) In the head of each document file is the serial number of the title, in Arabic representation: "No. 345".
4) The serial number of the title appears also in the filename of the document file: "article-345".
5) The objective is to write in the document file the English representation of the serial number of the title: "\threefourfive".
6) Once the English representations are in place, I can use Perl to make necessary adjustments.
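The catchfile index in (b) is entirely mechanical, so it can be generated rather than typed. This is a sketch assuming the title files are named title-NNN.tex as listed. `index_line` is a hypothetical helper. The output follows the `\CatchFileDef{<command>}{<file>}{<setup>}` layout shown in the index lines.

```perl
use strict;
use warnings;

my @word = qw(zero one two three four five six seven eight nine);

# Build one \CatchFileDef line for serial number $n (0..999).
sub index_line {
    my ($n) = @_;
    my $num = sprintf '%03d', $n;                       # 7 -> "007"
    my $eng = join '', map { $word[$_] } split //, $num;
    return "\\CatchFileDef{\\$eng}{title-$num.tex}{}";
}

print index_line($_), "\n" for 0 .. 999;
```

Redirecting the output to a file (for example `perl make-index.pl > titles-index.tex`, both names hypothetical) gives the thousand-line index.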
Unfortunately you have not provided so much as a single line of Perl here. As such it is impossible to know at which point you are encountering a problem, let alone what that problem is.
Here is the sort of SSCCE you could have written:
use strict;
use warnings;
use Test::More tests => 3;
my $filename = 'abstract-345.tex';
my $have = <<'EOT';
foo
Here: \zerozerozero
bar
No. 345
baz
EOT
my $want = <<'EOT';
foo
Here: \threefourfive
bar
No. 345
baz
EOT
my @digits = qw/zero one two three four five six seven eight nine/;
my ($arabic) = $filename =~ /-([0-9]{3})\.tex/;
(my $eng = $arabic) =~ s/([0-9])/$digits[$1]/g;
$have =~ s/\\zerozerozero/\\$eng/;
is $arabic, '345', 'Digits extracted';
is $eng, 'threefourfive', 'Converted to English';
is $have, $want, 'Replaced in text';
Now you can see how to perform these three operations. If that doesn't solve your problem, you need to provide some runnable code that demonstrates the problem you are having (ideally with a test such as the one shown here). That way we will know what it is you are actually asking.
There's a detailed rationale at How to ask better questions using Test::More and sample data.
Re: multiple-pass search?
by jwkrahn (Abbot) on Dec 10, 2021 at 07:00 UTC
It occurs to me that matching with the greedy modifier "/g"
From perlop:
g Match globally, i.e., find all occurrences.
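The distinction can be checked directly. This small sketch uses a made-up sample string. It compares a substitution with and without /g, then compares a greedy quantifier against a non-greedy one. /g controls how many matches are made; greediness is a property of the quantifier.

```perl
use strict;
use warnings;

my $s = 'No. 1, No. 22, No. 333';

# Without /g only the first match is replaced; with /g every match is.
(my $once = $s) =~ s/No\. (\d+)/N$1/;
(my $all  = $s) =~ s/No\. (\d+)/N$1/g;

# \d+ is greedy with or without /g; the non-greedy \d+? takes one digit at a time.
my @greedy = $s =~ /(\d+)/g;
my @lazy   = $s =~ /(\d+?)/g;

print "$once\n$all\n@greedy\n@lazy\n";
```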