Latest challenge: the file is in Rich Text Format (RTF). I don't need to parse it or convert it to HTML; I just want the data, without any formatting.
Here is my end result. The regexes could no doubt be tightened up.
Update: applied suggestions to regexes #1 and #2.

```perl
use strict;
use warnings;

my $infilename  = 'orderform.rtf';
my $outfilename = 'orderdata.out';

# Slurp the whole RTF file into one string
my $wholefile = do {
    open my $in, '<', $infilename or die "Can't open $infilename: $!\n";
    local $/;    # undefine the record separator to read the file in one go
    <$in>;
};

$wholefile =~ s/^\s*\{.*\n//mg;           # remove lines starting with braces (/m: every line, not just the first)
$wholefile =~ s/\\[\w-]+\b//g;            # remove RTF commands (control words)
$wholefile =~ s/(\n)\s+(\S+)/$1$2/g;      # remove leading whitespace and blank lines
$wholefile =~ s/([^\\\n]+)\\(\n)/$1$2/g;  # get rid of \ at the ends of lines
$wholefile =~ s/\\\n//g;                  # get rid of lines with a \ only

open my $out, '>', $outfilename or die "Can't open $outfilename: $!\n";
print $out $wholefile;
close $out or die "Can't close $outfilename: $!\n";
```
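To try the substitutions without touching any files, they can be wrapped in a sub and run on an in-memory string. This is only a sketch, not the original script: `strip_rtf` and the sample text are hypothetical, the sample merely imitates RTF that ends lines with a backslash, and the brace-line pattern runs under `/m` so it removes every line starting with `{` rather than only the first.

```perl
use strict;
use warnings;

# strip_rtf is a hypothetical helper: the same sequence of substitutions,
# applied to a string instead of a slurped file.
sub strip_rtf {
    my ($rtf) = @_;
    $rtf =~ s/^\s*\{.*\n//mg;           # drop every line that starts with a brace
    $rtf =~ s/\\[\w-]+\b//g;            # drop control words such as \pard, \f0, \fs24
    $rtf =~ s/(\n)\s+(\S+)/$1$2/g;      # drop leading whitespace and blank lines
    $rtf =~ s/([^\\\n]+)\\(\n)/$1$2/g;  # drop a backslash at the end of a text line
    $rtf =~ s/\\\n//g;                  # drop lines holding only a backslash
    return $rtf;
}

# Hypothetical sample (single quotes: '\\' is one backslash, '\r' stays literal)
my $sample = join "\n",
    '{\rtf1\ansi\ansicpg1252',
    '{\fonttbl\f0\fswiss Helvetica;}',
    '\pard\ql',
    '',
    '\f0\fs24 \cf0 Order 42\\',
    'Qty: 3\\',
    '\\',
    'Total: 9}';

print strip_rtf($sample), "\n";
```

On this sample the three data lines come through, but a leading blank line and the final `}` that closes the RTF group survive, so the output may still need a last trim.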
Re: Remove RTF Formatting
by jdporter (Paladin) on Feb 23, 2011 at 14:51 UTC
Re: Remove RTF Formatting
by TomDLux (Vicar) on Feb 28, 2011 at 23:35 UTC
by jdporter (Paladin) on Mar 01, 2011 at 15:00 UTC
by toolic (Bishop) on Mar 02, 2011 at 01:11 UTC
by jdporter (Paladin) on Mar 02, 2011 at 02:37 UTC
by toolic (Bishop) on Mar 02, 2011 at 01:27 UTC
by GotToBTru (Prior) on Mar 01, 2011 at 14:59 UTC