Re: Why does my Perl regex substitution for linebreak fail?
by kyle (Abbot) on Mar 05, 2008 at 22:22 UTC
|
my $lines = join "", <DATA>;
$lines =~ s/\n\n=/=/gm;
print $lines;
__DATA__
line 1

======
line after break line
Produces:
line 1======
line after break line
That's what I'd expect, but maybe it's not what you wanted.
If you want to remove a blank line before the marker, do s/\n\n=/\n=/gm. Then the output is:
line 1
======
line after break line
You can do s/\n=/=/gm (which sounds like what you describe), but when there's no blank line before the marker, that joins the marker onto the previous line, just like the first output above.
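To see the three substitutions side by side, here is a small sketch that runs each one on the same text, once with a blank line before the marker and once without. The sub name apply_variants and the sample strings are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: apply each substitution discussed above to a copy
# of the input and return the results keyed by the pattern used.
sub apply_variants {
    my ($text) = @_;
    my %out;
    ( $out{'s/\n\n=/=/g'}   = $text ) =~ s/\n\n=/=/g;      # removes both newlines
    ( $out{'s/\n\n=/\n=/g'} = $text ) =~ s/\n\n=/\n=/g;    # removes only the blank line
    ( $out{'s/\n=/=/g'}     = $text ) =~ s/\n=/=/g;        # removes a single newline
    return \%out;
}

my $with_blank    = "line 1\n\n======\nline after break line\n";
my $without_blank = "line 1\n======\nline after break line\n";

for my $input ( $with_blank, $without_blank ) {
    my $results = apply_variants($input);
    for my $pattern ( sort keys %$results ) {
        ( my $shown = $results->{$pattern} ) =~ s/\n/\\n/g;    # make newlines visible
        print "$pattern => $shown\n";
    }
    print "\n";
}
```

Printing the newlines as a literal \n makes it easy to check which pattern left the marker on its own line and which one glued it to "line 1".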
As an aside, you can avoid reading in the whole file by setting the input record separator.
$/ = '=';
while (<DATA>) {
s/\n\n=/\n=/m;
print;
}
__DATA__
line 1

======
line after break line
Produces...
line 1
======
line after break line
See perlvar for info about $/ (aka $INPUT_RECORD_SEPARATOR if you use English).
|
|
The m modifier is useless since ^, $, etc. aren't used. In fact, why aren't you using \z when you change the IRS (input record separator)? And why not use "\n\n=" as the IRS?
local $/ = "\n\n=";
while (<DATA>) {
s/\n\n=\z/\n=/;
print;
}
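One way to try this without a data file is to open a filehandle on a string, which lets you see exactly which records $/ produces. This is only a sketch; the sub name read_records and the sample text are illustrative and not from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split $text into records using $separator as the
# input record separator, fix up each record, and return the list.
sub read_records {
    my ( $text, $separator ) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    local $/ = $separator;    # restored automatically when the sub returns
    my @records;
    while ( my $record = <$fh> ) {
        # \z anchors at the very end of the record, where the separator sits
        $record =~ s/\n\n=\z/\n=/;
        push @records, $record;
    }
    close $fh;
    return @records;
}

my $text    = "line 1\n\n======\nline after break line\n";
my @records = read_records( $text, "\n\n=" );
print scalar(@records), " records\n";
print @records;
```

Because each record ends with the separator itself, the substitution only ever needs to look at the end of the record, which is why \z fits here and the m modifier adds nothing.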
|
|
I agree. The original regex, as posed by pat mc, removes both \n characters rather than just the single one that pat mc said should be removed. I also presume, based on the inquiry, that pat mc is looking for regex solutions, but several of the other nodes in this thread have good ideas for alternatives to the regex approach.
|
|
Thanks, kyle, for drawing my attention to the use of the IRS, an aspect of file handling in Perl I was unaware of so far.
Re: Why does my Perl regex substitution for linebreak fail?
by igelkott (Priest) on Mar 05, 2008 at 22:43 UTC
|
... OK when I print the result to the console but not when I redirect the output into a file ...
Could your file actually have \r\n (Windows-style) line endings? You could also see different terminal behavior if you are running Cygwin with Unix line endings.
If you have a Unix-like system available, you might try pushing a small bit of your processed and unprocessed file through "od". I sometimes use something like "tail -3 foo | od -bc" to keep from getting fooled by "friendly" systems.
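If od is not at hand, a rough Perl stand-in can make the line endings visible. This is only a sketch: the sub name show_line_endings is made up here, and it escapes just carriage returns, newlines, and tabs rather than every control character.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return a copy of $text with carriage returns,
# newlines, and tabs spelled out as \r, \n, and \t.
sub show_line_endings {
    my ($text) = @_;
    my %escape = ( "\r" => '\r', "\n" => '\n', "\t" => '\t' );
    $text =~ s/([\r\n\t])/$escape{$1}/g;
    return $text;
}

# Two sample lines: one with a Windows-style ending, one with a Unix ending.
for my $sample ( "line 1\r\n", "line 1\n" ) {
    print show_line_endings($sample), "\n";
}
```

To check a real file, read its lines from a filehandle and pass each one through the same sub; any line that shows \r\n at the end has a carriage return in front of its newline.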
|
|
For mostly-printable files the output of "tail -3 foo | cat -A" is less cluttered.
|
|
Thanks, igelkott, for addressing the console part of my post. Can you please explain to me in more basic terms what your suggestion is? I am fairly new to Linux and hence don't quite understand what issue you are pointing at.
The file I intend to operate on, however, was generated with the 'cat' command in the shell, concatenating other files generated under Linux. I am not sure, therefore, whether the inter-operating-system issue applies here.
Thanks again -
Pat
|
|
igelkott -
Your answer got right to the core of the issue. I searched for \r and got matches in exactly those lines which resisted the replacement. What exactly is this \r character, anyway?
I have no idea how that \r entered my fully Linux-based and Linux-generated file.
Any thoughts on this?
Thanks again for shedding some light on this.
Cheers -
Pat
Re: Why does my Perl regex substitution for linebreak fail?
by halfcountplus (Hermit) on Mar 05, 2008 at 23:18 UTC
|
What you are literally asking for ('I have a long file from which I want to remove a single linebreak before all lines starting with the string "= = = =".') is this:
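#!/usr/bin/perl
use strict;
use warnings;
my $l = '';    # previous line, held back until we see what follows it
while (<DATA>) {
    chomp $l if /^====/;    # next line is a marker: drop the held line's newline
    print $l;
    $l = $_;
}
print $l;    # flush the last held line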
__DATA__
one
two
====three
four
Re: Why does my Perl regex substitution for linebreak fail?
by graff (Chancellor) on Mar 06, 2008 at 03:39 UTC
|
You say you want to remove a single linebreak before all lines starting with the string "= = = =", but your snippet would remove two linebreaks ("\n\n" is replaced with nothing). Just curious about that.
Anyway, I think others have already given good ideas. Here's another one, that doesn't require holding the entire file in memory at once (unless of course the file does not actually contain any instance of "\n===="):
#!/usr/bin/perl
use strict;
use warnings;
$/ = "\n====";
while (<>) {
s/\n====$/====/;
print;
}
Setting the INPUT_RECORD_SEPARATOR ($/, see perlvar) like that makes things very simple. If the file happens to have CRLF line termination, you may need to set $/ to "\r\n====" (and include "\r" in the s/// as well).
(updated upon realizing that a CRLF file would just need a modified s///; the original $/ setting above would still work fine -- oops! I just noticed that ikegami already posted this idea, as I should have known he would!)
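The updated advice could be sketched as follows: keep $/ as "\n====" and let the substitution tolerate an optional \r. The sub name join_marker_lines and the in-memory filehandle are assumptions made so the example runs without a data file; a real script would read from <> as above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: join each "====" marker line onto the line before it,
# whether lines end in "\n" or "\r\n".
sub join_marker_lines {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    local $/ = "\n====";    # also matches the tail of "\r\n===="
    my $out = '';
    while ( my $record = <$fh> ) {
        $record =~ s/\r?\n====\z/====/;    # drop the line break, and the \r if present
        $out .= $record;
    }
    close $fh;
    return $out;
}

print join_marker_lines("one\ntwo\n====three\nfour\n");
```

Making the \r optional means the same substitution handles both kinds of file, so there is no need to know the line-ending style in advance.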
|
|
Yes, graff, you are right in observing that my regex contains two linebreaks - in contrast to what I actually intended to do. The curious thing is that the regex performs as expected when it should match one linebreak but not when it contains two linebreaks - in that case it appears to do NOTHING at all, although the file definitely does contain several consecutive linebreak-only lines.
I am still puzzled and am starting to believe the issue is not due to the Perl-side of things but rather an I/O or even a Linux problem.
Any conjectures on this one?
Thanks again -
Pat
|
|
I'm not sure I follow what you are describing there. The best thing to do is to present a minimal script and data set that still (even after what you've learned) produces results that you consider to be unexpected, and point out how it differs from what you would expect.
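As an illustration of what such a minimal test case might look like, the sketch below assumes the data has CRLF line endings (a \r did turn up elsewhere in this thread) and counts how often each pattern matches. The sample data and the sub name count_matches are invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count how many times $pattern matches in $text.
sub count_matches {
    my ( $text, $pattern ) = @_;
    my $count = () = $text =~ /$pattern/g;
    return $count;
}

# The same two lines and marker, once with Unix and once with Windows endings.
my $unix = "line 1\n\n====\nline 2\n";
my $crlf = "line 1\r\n\r\n====\r\nline 2\r\n";

for my $case ( [ 'LF data', $unix ], [ 'CRLF data', $crlf ] ) {
    my ( $label, $data ) = @$case;
    printf "%-9s  \\n=: %d match(es)   \\n\\n=: %d match(es)\n",
        $label,
        count_matches( $data, qr/\n=/ ),
        count_matches( $data, qr/\n\n=/ );
}
```

On the CRLF data the two-newline pattern finds nothing, because a \r sits between the two \n characters, while the one-newline pattern still matches. That would be consistent with a substitution that works with one linebreak and appears to do nothing with two.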