Re: convert tags to punctuation

If you know which tag corresponds to which punctuation mark, it should be a cinch to convert each one via a substitution. Something like this should work:


my $line = 'Text with unusual punctuation<91><91><91>  I<92>m not going to lie<93> this is odd text<94>';

$line =~ s/<91>/./g;
$line =~ s/<92>/'/g;
$line =~ s/<93>/,/g;
$line =~ s/<94>/!/g;
# etc.

Or, if processing the entire file, instead of line by line, you could try it this way:

my $source = 'my_filename.txt';
my $target = 'new_filename.txt'; #THIS FILE WILL BE OVERWRITTEN

# Three-argument open with lexical filehandles
open my $in, '<', $source or die "Can't open $source. $!\n";
my @array = <$in>;
close $in;

s/<91>/./g for @array;
s/<92>/'/g for @array;
s/<93>/,/g for @array;
s/<94>/!/g for @array;

open my $out, '>', $target or die "Can't open $target. $!\n";
print {$out} @array;
close $out;
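For a large input it may be preferable not to hold every line in an array. The following is only a sketch of a line-by-line variant of the same idea; the `@rules` list and the `convert_stream` sub are names made up for this illustration, and the demo uses in-memory filehandles so that it runs without touching any files.

```perl
use strict;
use warnings;

# Hypothetical rule list: [ pattern, replacement ] pairs, one per tag.
my @rules = (
    [ qr/<91>/, '.' ],
    [ qr/<92>/, "'" ],
    [ qr/<93>/, ',' ],
    [ qr/<94>/, '!' ],
);

# Read one line at a time from $in, apply every rule, write to $out.
sub convert_stream {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        for my $rule (@rules) {
            my ($pattern, $replacement) = @$rule;
            $line =~ s/$pattern/$replacement/g;
        }
        print {$out} $line;
    }
    return;
}

# Demo on in-memory filehandles. For real files, open them with
# open my $in, '<', $source and open my $out, '>', $target instead.
my $input  = "Wait<91><91><91> I<92>m here<94>\nNo tags on this line\n";
my $output = '';
open my $in,  '<', \$input  or die "Can't read input: $!\n";
open my $out, '>', \$output or die "Can't write output: $!\n";
convert_stream($in, $out);
close $in;
close $out;
print $output;
```

Only one line is in memory at a time, so the size of the file should not matter much for memory use.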

Blessings,

~Polyglot~

Re^2: convert tags to punctuation by BillKSmith (Monsignor) on Jan 13, 2021 at 15:24 UTC
As a variation on Polyglot's solution, you can define the tags in a hash. The advantage is that it is more easily expanded if more tags are needed. I have chosen to specify the characters by name (charnames) because I find single punctuation marks, embedded in quotes, hard to read.

use strict;
use warnings;
my %tags = (
    91 => "\N{FULL STOP}",         # '.'
    92 => "\N{APOSTROPHE}",        # '''
    93 => "\N{COMMA}",             # ','
    94 => "\N{EXCLAMATION MARK}",  # '!'
);
my $line = 'Text with unusual punctuation<91><91><91>'
          .'I<92>m not going to lie<93> this is odd text<94>'
          ;
$line =~ s/<(\d\d)>/$tags{$1}/ge;
print $line, "\n";

Bill
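One thing to watch with the hash lookup: a two-digit tag that is not in the hash still matches `<(\d\d)>`, so it would be replaced by an undefined value (with a warning) and disappear from the text. A possible way around that, offered only as a sketch, is to build the pattern from the hash keys so that unknown tags are never matched. The names `$tag_re` and `convert_known_tags` are invented for this example.

```perl
use strict;
use warnings;

my %tags = (
    91 => '.',
    92 => "'",
    93 => ',',
    94 => '!',
);

# Build an alternation from the hash keys, so only known tags match.
my $known  = join '|', map { quotemeta } sort keys %tags;
my $tag_re = qr/<($known)>/;

# Return a copy of $text with every known tag replaced.
sub convert_known_tags {
    my ($text) = @_;
    $text =~ s/$tag_re/$tags{$1}/g;
    return $text;
}

print convert_known_tags('I<92>m not going to lie<93> this is odd<94> <99>'), "\n";
```

Here the hypothetical `<99>` is left in place, which makes leftover tags easy to spot in the output.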
Re^3: convert tags to punctuation by Anonymous Monk on Jan 15, 2021 at 19:01 UTC
Bill -- I think your code is more maintainable. The document I am messing with is about 600,000 lines long. Is there a way to speed this up? Is there a way to get a complete list of <ab> tags?
Re^4: convert tags to punctuation by BillKSmith (Monsignor) on Jan 16, 2021 at 16:47 UTC
You should ask the person who prepares your input file if he can direct you to either a specification of the file format or the documentation of the program that created it. If this fails, I would write a Perl program to list all the tags. The only way I know to get the values is to use an editor to examine the tags in context and make your best guess. (It usually will be obvious.)

It is nearly impossible to guess what will or will not make a Perl program faster. The usual advice is to profile your program and work only on those parts which use the most time. Use the Benchmark module to measure any possible improvement. In your case, I/O is probably taking much longer than processing. Slurping the entire file into memory is probably not an option. Reading the file in large blocks may help, but it is not easy to get right. I recommend against any optimization unless it is absolutely necessary.

Bill
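A program to list the tags could look something like the sketch below. It assumes that every tag is a run of digits between angle brackets, as in the examples in this thread; if the real file uses other tag shapes, the pattern would have to be widened. The `count_tags` sub is a made-up name, and the sample data is invented so that the demo runs without a file.

```perl
use strict;
use warnings;

# Count every <digits> tag read from a filehandle.
# Assumption: tags look like <91>; adjust the pattern if yours differ.
sub count_tags {
    my ($fh) = @_;
    my %count;
    while (my $line = <$fh>) {
        $count{$1}++ while $line =~ /<(\d+)>/g;
    }
    return \%count;
}

# Demo on an in-memory filehandle; for a real file use
# open my $fh, '<', $filename instead.
my $sample = "a<91><91> b<92>\nc<91> d<105>\n";
open my $fh, '<', \$sample or die "Can't read sample: $!\n";
my $count = count_tags($fh);
close $fh;

# Print the tags in numeric order with the number of occurrences.
for my $tag (sort { $a <=> $b } keys %$count) {
    printf "<%s> %d\n", $tag, $count->{$tag};
}
```

The counts also give a hint about which tags are worth looking up first in an editor.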
Re^5: convert tags to punctuation by Anonymous Monk on Jan 16, 2021 at 17:26 UTC
Re^6: convert tags to punctuation by haukex (Archbishop) on Jan 16, 2021 at 18:07 UTC
Re^4: convert tags to punctuation by LanX (Saint) on Jan 16, 2021 at 16:59 UTC
> Is there a way to speed this up?

What makes you think it's not fast enough?

Update: quoting davido from the CB: "who cares about how fast Perl runs; it's almost always the network or IO that are standing in the way."

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery
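If you do want numbers rather than guesses, the core Benchmark module can compare two variants on the same data. The following is only a sketch: the sample string and the sub names `chained` and `hashed` are invented for the comparison, and the timings will depend on the machine and on the real data, so no results are shown here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my %tags = ( 91 => '.', 92 => "'", 93 => ',', 94 => '!' );

# Invented sample text, repeated to give the regex engine some work.
my $sample = 'Text<91><91><91> I<92>m not going to lie<93> this is odd text<94> ' x 50;

# One substitution per tag, as in the first reply.
sub chained {
    my $text = $sample;
    $text =~ s/<91>/./g;
    $text =~ s/<92>/'/g;
    $text =~ s/<93>/,/g;
    $text =~ s/<94>/!/g;
    return $text;
}

# A single substitution with a hash lookup.
sub hashed {
    my $text = $sample;
    $text =~ s/<(\d\d)>/$tags{$1}/g;
    return $text;
}

# Make sure both variants agree before timing them.
die "results differ\n" unless chained() eq hashed();

# Run each variant for at least one CPU second and print a comparison table.
cmpthese( -1, { chained => \&chained, hashed => \&hashed } );
```

This measures only the substitutions, not the file I/O, so it cannot tell you whether the substitutions are the slow part of the whole program.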
