kemuel has asked for the wisdom of the Perl Monks concerning the following question:
I am trying to convert a huge text file into "OSIS XML markup". One thing that I have to do is replace all the quotation marks with tags, so that:
He said, »Someone once said ›This is what they said‹ but I say something else.«
is turned into something like:
He said, <q marker="»" sID="1234"/>Someone once said <q marker="›" sID="3456"/>This is what they said<q marker="‹" eID="3456"/> but I say something else.<q marker="«" eID="1234"/>
I wrote the following code and it works, but it is extremely slow, especially with the huge multiline text files I feed it:
    #!/usr/bin/perl
    use warnings;
    use Getopt::Long;

    my $input   = "";
    my $output  = "";
    my $datamat = "";
    my $doodle  = "";

    GetOptions ('infile=s' => \$input, 'outfile=s' => \$output) or die $!;

    open my $in_fh,  '<', $input  or die "Can't open $input: $!";
    open my $out_fh, '>', $output or die "Can't open $output: $!";

    # Read the whole input file into one string.
    while (<$in_fh>) {
        $datamat .= $_;
    }

    my $i = 0;

    # Outer quotation marks: one match plus one substitution per quotation.
    while ($datamat =~ m/»(.*?)«/gs) {
        $i++;
        $datamat =~ s/»(.*?)«/<q marker="»" sID="$i"\/>$1<q marker="«" eID="$i"\/>/s;
    }

    # Inner quotation marks, handled the same way.
    while ($datamat =~ m/›(.*?)‹/gs) {
        $i++;
        $datamat =~ s/›(.*?)‹/<q marker="›" sID="$i"\/>$1<q marker="‹" eID="$i"\/>/s;
    }

    print { $out_fh } $datamat or die $!;

    close $in_fh  or die $!;
    close $out_fh or die $!;
This script took about an hour to work through one of my files.
Is there a more efficient way to do this that is not so time- and CPU-demanding?
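For illustration, here is a minimal sketch of one single-pass alternative. It is not taken from the replies. The posted loops run a fresh regex over the whole string once per quotation, whereas this version visits each quotation mark exactly once with `s///ge` and keeps a stack of open quotes to pair start and end IDs. The helper name `tag_quotes`, the positional `infile outfile` arguments, and the UTF-8 input encoding are assumptions made for the example. IDs are handed out in order of appearance, so the numbering differs from the original script, which numbers all » quotes before any › quotes.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;    # this source file contains literal » « › ‹ characters

# Closing marker => the opening marker it has to pair with.
my %opener_for = ( '«' => '»', '‹' => '›' );

# tag_quotes is a hypothetical helper name, not part of the original post.
# It expects a decoded (character) string and returns the tagged copy.
sub tag_quotes {
    my ($text) = @_;
    my $next_id = 0;
    my @open;    # stack of [marker, id] for quotes not yet closed

    my $tag_for = sub {
        my ($marker) = @_;
        my $want = $opener_for{$marker};
        if ( !defined $want ) {                    # opening marker
            my $id = ++$next_id;
            push @open, [ $marker, $id ];
            return qq{<q marker="$marker" sID="$id"/>};
        }
        if ( @open && $open[-1][0] eq $want ) {    # closer matching the top of the stack
            my $id = ( pop @open )->[1];
            return qq{<q marker="$marker" eID="$id"/>};
        }
        return $marker;                            # unbalanced closer: left untouched
    };

    # One left-to-right pass; each quotation mark is replaced exactly once.
    $text =~ s/([»«›‹])/$tag_for->($1)/ge;
    return $text;
}

# Assumed command line: script.pl infile outfile (both UTF-8).
if ( !caller && @ARGV == 2 ) {
    my ( $infile, $outfile ) = @ARGV;
    open my $in,  '<:encoding(UTF-8)', $infile  or die "Can't open $infile: $!";
    open my $out, '>:encoding(UTF-8)', $outfile or die "Can't open $outfile: $!";
    local $/;                                      # slurp mode: quotes may span lines
    my $text = <$in>;
    $text = '' unless defined $text;
    print {$out} tag_quotes($text) or die "Can't write $outfile: $!";
    close $in  or die $!;
    close $out or die $!;
}
```

An opening mark that is never closed still receives an `sID` tag in this sketch, so real input may need an extra check of whatever is left on the stack at the end.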
Replies are listed 'Best First'.

Re: Replace quotation-marks with tags in a huge text-file
by Eily (Monsignor) on Sep 12, 2015 at 14:52 UTC
by kemuel (Novice) on Sep 12, 2015 at 15:51 UTC

Re: Replace quotation-marks with tags in a huge text-file
by AnomalousMonk (Archbishop) on Sep 12, 2015 at 15:25 UTC

Re: Replace quotation-marks with tags in a huge text-file
by poj (Abbot) on Sep 12, 2015 at 14:20 UTC
by kemuel (Novice) on Sep 12, 2015 at 14:40 UTC
by poj (Abbot) on Sep 12, 2015 at 14:48 UTC
by kemuel (Novice) on Sep 12, 2015 at 15:59 UTC
by poj (Abbot) on Sep 12, 2015 at 16:11 UTC