Esteemed Monks,
I am having a problem parsing a Windows text file with regular expressions. Somehow, lines in the file that clearly should match my regexes do not match. I assume the problem is due to the file's encoding under Windows, but I simply can't get this to work.
The file contains hundreds of lines, some of which are in the format
Text.1 // Text.2. I have been using the following code:
#! /usr/bin/perl -w
use strict;
use locale;
use utf8;
while ( <> )
{
    if ( /\/\/ / )
    {
        # Apply long list of regex-based substitutions
        print;
    }
}
When I print the file in the console, every character appears to be separated from the next by a strange extra whitespace character. I believe that as a result of this, the regexes don't match.
Since I could not get it to work under Windows, I tried to convert the Windows file to Unix format under Linux using the shell utility dos2unix. I also tried to convert the character encoding using recode latin1..utf8. Neither of these worked.
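For what it's worth, an extra character between every letter is a typical sign of UTF-16 text being read one byte at a time, which neither a line-ending conversion nor a latin1 conversion would address. Below is a minimal sketch of how such a file might be read, assuming (and this is only an assumption, the encoding has not been confirmed) that the file is UTF-16 and starts with a byte-order mark. The helper names guess_layer and matching_lines are hypothetical, and the fallback to UTF-8 is a guess as well.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): pick a PerlIO
# layer by looking for a byte-order mark at the start of the file.
sub guess_layer {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $head = '';
    read $fh, $head, 2;
    close $fh;
    # FF FE or FE FF is a UTF-16 BOM; Encode's "UTF-16" uses it to pick
    # the byte order and removes it from the decoded text.
    return ':raw:encoding(UTF-16)'
        if $head eq "\xFF\xFE" or $head eq "\xFE\xFF";
    # Assumption: anything without a UTF-16 BOM is treated as UTF-8.
    return ':raw:encoding(UTF-8)';
}

# Return the decoded lines that contain "// ", without line endings.
sub matching_lines {
    my ($path) = @_;
    my $layer = guess_layer($path);
    open my $fh, "<$layer", $path or die "Cannot open $path: $!";
    my @hits;
    while ( my $line = <$fh> ) {
        $line =~ s/\r?\n\z//;    # strip both CRLF and LF endings
        push @hits, $line if $line =~ m{// };
    }
    close $fh;
    return @hits;
}

# Only runs when file names are given on the command line.
if (@ARGV) {
    binmode STDOUT, ':encoding(UTF-8)';
    print "$_\n" for map { matching_lines($_) } @ARGV;
}
```

If the first two bytes of the file turn out not to be FF FE or FE FF, the UTF-16 assumption is wrong and this sketch does not apply as written. The substitutions would go where the line is pushed onto @hits.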
Can you please advise how I can ensure that the Windows text file is read in and processed correctly?
Your help is much appreciated. Thanks in advance!
Cheers -
Pat