in reply to Re: Adding back missing newlines between records
in thread Adding back missing newlines between records

perl -pe 's/(?<!\n)\n# file/\n\n# file/g' record_file

I don't see how that is supposed to work. The -p flag creates a while (<>) loop around the code specified for the -e flag (with print; as the last line in the while loop). The s/// operator in your code is going to operate on the $_ variable, and the diamond operator (<>) will assign each line in the file to $_, one line at a time.

As far as I can tell, at some point $_ will be equal to the string "# file\n", and the previous string will have been "hello world\n" (i.e. not "\n" as desired). Your regex is looking for "\n# file" preceded by a "\n". First, because it seems to me that the diamond operator will produce the line "# file\n", your regex won't match because there is no "\n# file" in that line. Second, it looks to me like you are doing a negative lookbehind beyond the start of the string. How is that supposed to work?

Re^3: Adding back missing newlines between records
by johngg (Canon) on Nov 06, 2009 at 11:29 UTC
    I don't see how that is supposed to work.

    It's supposed to work because, as ikegami pointed out, you use the -0777 switch to make the interpreter slurp the whole file in one go, the equivalent of undefining $/ in a script. Thus, the global replace operates on a single string which is the whole file and the while implied by -p only iterates once.

    I hope this is helpful.

    Cheers,

    JohnGG

Re^3: Adding back missing newlines between records
by Anonymous Monk on Nov 06, 2009 at 11:27 UTC
    It's the magic -0777 option that sets the input record separator, so instead of reading lines, it reads records of no more than oct(0777) (511) bytes, or, if your platform doesn't have record-oriented files, it reads the whole file.

      Ah. Sorry, I thought my browser was rendering something wrong when I saw -0777.

      Thanks.

      The -0 flag is described differently here:

      http://affy.blogspot.com/p5be/ch17.htm

      According to them, the -0 option is not the number of bytes to read. Instead, it's the character (in octal format) that is to be considered the end of a "line". It says 0777 is not a valid character, so it is never found in the file and your whole file gets slurped as one line.

      Is that accurate?
Re^3: Adding back missing newlines between records
by 7stud (Deacon) on Nov 06, 2009 at 12:06 UTC
    The -p flag creates a while (<>) loop around the code specified for the -e flag (with print; as the last line in the while loop)

    Actually, that's not quite accurate. According to what I read, the while loop looks like this:

    LINE:
      while (<>) {
          # your code goes here
      } continue {
          print or die "-p destination: $!\n";
      }

    A continue block gets executed the instant before the loop condition is evaluated. So 'redo' does not cause the continue block to execute, but 'next' does, and a normal iteration of the loop causes the continue block to execute as well.

    This works for me:

    perl -pe 'if($_ eq "\n"){$n=1;next;} if($n){$n=0;next;}else{s/# file/\n# file/;}' data1.txt