SavannahLion has asked for the wisdom of the Perl Monks concerning the following question:
How do I disable or prevent Perl from processing special regex characters such as \, * and ? in a quoted string so I can process them as regular characters?
I'm stuck on this one. I'm pulling data from a couple of sources, and amongst the many chunks of information I'm sucking up, the darn thing kept choking. Come to find out, there are a couple of "illegal" characters that are causing problems for me. For example, characters like \, * and ? (amongst others) are triggering problems elsewhere within the code. Unfortunately, I can't just send these characters on their merry way, since some of this data is being converted into files on an NTFS partition.
In other words, I'm slurping down data from files (roughly 30GB worth of data). The data within is digested and new NTFS-palatable files are created. Perl did exactly what I wanted it to do, with the exception of the file naming convention. To my surprise, Linux happily wrote illegal Windows characters to the NTFS partition, causing Windows to balk when trying to do anything with them. Linux (and Perl) created files such as:
C:\random\location\ab\delta.txt
and
C:\random\location\ab*delta.txt
(Technically, Linux wrote to /mnt/c/random/location/. Windows sees it as C:\random\location\).
Where C:\random\location\ is the full path and ab\delta.txt or ab*delta.txt is the file name. So, thinking I was smart, I just did a s/// for all the illegal characters and just replaced them all with _. That worked until I got to ab\delta.txt and ab*delta.txt where both would be renamed to ab_delta.txt, one overwriting the other. OK, so I tried to be a little smarter and tried to use iteration creating ab_delta-1.txt and ab_delta-2.txt but if Perl dies for any reason, I get a bunch of files ending in -3 -4 -5 and no idea what was what.
OK..... Looking back at my internal file structure, I finally decided to do a bit of substitution, turning \, * and ? into [bs], [a] and [q]. Yaaay! It started working. Until I ran into files that needed to be ab\delta.txt and ab\\delta.txt. I was getting ab[bs]delta.txt for both. DOH!!
So here's what I've come up with so far (with all the extraneous crusty stuff removed):
    my $test = 'illegal\characters*example?';
    my @illegal = qw(\ * ?);
    my @legal = qw(bs a q);
    my $c = 0;
    foreach my $val (@illegal) {
        $test =~ s/\Q$val\E/[$legal[$c]]/g;
        $c++;
    }
    print $test."\n";
Please, enlighten me and direct me to the proper solution.
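One possible direction, offered as a minimal sketch and not a definitive fix: keep the mapping in a hash and replace every illegal character in a single pass, with `quotemeta` doing the escaping that `\Q...\E` does in the loop above. The names `%tag_for`, `encode_name` and `decode_name` are hypothetical, and the sketch assumes the original names never contain a literal `[bs]`, `[a]` or `[q]`, since otherwise decoding would be ambiguous. Also worth checking: inside a single-quoted Perl literal, `\\` is one backslash, so a test string written as `'ab\\delta.txt'` holds the same text as `'ab\delta.txt'`. That may account for both names producing ab[bs]delta.txt in a test script. Data read from a file is not affected by this.

```perl
use strict;
use warnings;

# Hypothetical mapping from characters Windows rejects in file names
# to the bracketed tags used in the question ([bs], [a], [q]).
my %tag_for = (
    '\\' => 'bs',    # one backslash (single quotes collapse \\ to \)
    '*'  => 'a',
    '?'  => 'q',
);
my %char_for = reverse %tag_for;    # tag => original character

# quotemeta escapes each key so it is matched literally, not as regex syntax.
my $class      = join '', map { quotemeta } keys %tag_for;
my $illegal_re = qr/([$class])/;
my $tags       = join '|', map { quotemeta } keys %char_for;
my $tag_re     = qr/\[($tags)\]/;

# Replace every illegal character with its tag in one pass.
sub encode_name {
    my ($name) = @_;
    $name =~ s/$illegal_re/[$tag_for{$1}]/g;
    return $name;
}

# Reverse the encoding (assumes no tag appeared in the original name).
sub decode_name {
    my ($name) = @_;
    $name =~ s/$tag_re/$char_for{$1}/g;
    return $name;
}

print encode_name('illegal\characters*example?'), "\n";
print encode_name('ab\\\\delta.txt'), "\n";    # literal holds two backslashes
```

Because each character gets its own tag, a name with two backslashes encodes to two `[bs]` tags and stays distinct from a name with one, and `decode_name` recovers the original under the assumption stated above.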
Replies are listed 'Best First'.
Re: Disable Regex
by roboticus (Chancellor) on Aug 26, 2009 at 10:04 UTC
Re: Disable Regex
by james2vegas (Chaplain) on Aug 26, 2009 at 05:59 UTC
by AnomalousMonk (Archbishop) on Aug 26, 2009 at 06:20 UTC
by SavannahLion (Pilgrim) on Aug 26, 2009 at 07:04 UTC
by james2vegas (Chaplain) on Aug 26, 2009 at 07:11 UTC
by SavannahLion (Pilgrim) on Aug 26, 2009 at 07:26 UTC
by ig (Vicar) on Aug 27, 2009 at 09:05 UTC
Re: Disable Regex
by jwkrahn (Abbot) on Aug 26, 2009 at 08:33 UTC