in reply to Adding back missing newlines between records

My problem is that somehow I can't seem to search across the newline, which is what I thought the /s was supposed to help with.

The s modifier makes . match every character, including the newline, which it doesn't match by default. It's useless here since your pattern doesn't use . at all.

The -p causes the expression to be applied to each line of input. You're trying to match something you haven't read yet! One way of fixing this is to change the definition of "line" so that the whole file is read at once (-0777).

Then there's the issue that /(?=\w)\n/ will never match. How can the next character be both a word character and a newline?

perl -0777pe's/(?<!\n)\n# file/\n\n# file/g' record_file

Re^2: Adding back missing newlines between records
by 7stud (Deacon) on Nov 06, 2009 at 11:08 UTC
    perl -pe 's/(?<!\n)\n# file/\n\n# file/g' record_file

    I don't see how that is supposed to work. The -p flag creates a while(<>) loop around the code specified for the -e flag (with print; as the last line in the while loop). The s/// operator in your code is going to operate on the $_ variable, and the diamond operator (<>) will assign each line in the file to $_, one line at a time.

    As far as I can tell, at some point $_ will be equal to the string "# file\n", and the previous string will have been "hello world\n" (i.e. not "\n" as desired). Your regex is looking for "\n# file" preceded by a "\n". First, because it seems to me that the diamond operator will produce the line "# file\n", your regex won't match because there is no "\n# file" in that line. Second, it looks to me like you are doing a negative lookbehind beyond the start of the string. How is that supposed to work?

      I don't see how that is supposed to work.

      It's supposed to work because, as ikegami pointed out, you use the -0777 switch to make the interpreter slurp the whole file in one go, the equivalent of undefining $/ in a script. Thus, the global replace operates on a single string which is the whole file and the while implied by -p only iterates once.
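
      As a rough check of both points, the sketch below counts how often the implicit loop body runs, and then does the same substitution in script style by undefining $/. The variable names and the tiny inputs are assumptions made up for the demonstration.

      ```shell
      # Count how many times the implicit loop body runs for a three-line input.
      per_line=$(printf 'a\nb\nc\n' | perl -ne '$n++; END { print $n }')
      slurped=$(printf 'a\nb\nc\n' | perl -0777 -ne '$n++; END { print $n }')

      # Script-style equivalent of -0777: undefine $/ locally, then read once.
      via_undef=$(printf 'x\n# file b\n' | perl -e 'local $/; $_ = <STDIN>; s/(?<!\n)\n# file/\n\n# file/g; print')

      echo "$per_line $slurped"
      ```

      This should print "3 1": three iterations in line mode, one in slurp mode.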

      I hope this is helpful.

      Cheers,

      JohnGG

      It's the magic -0777 option that sets the input record separator, so instead of reading lines, it reads records of no more than oct(0777) (511) bytes, or if your platform doesn't have record-oriented files, it reads the whole file.

        Ah. Sorry, I thought my browser was rendering something wrong when I saw -0777.

        Thanks.

        The -0 flag is described differently here:

        http://affy.blogspot.com/p5be/ch17.htm

        According to them, the -0 option is not the number of bytes to read. Instead, it's the character (in octal format) that is to be considered the end of a "line". It says 0777 is not a valid character, therefore it is never found in the file and your whole file gets slurped as one line.

        Is that accurate?
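
        One way to check this yourself is a small experiment like the sketch below. It assumes that -072 means "octal 72", i.e. a colon, as the separator; the input string and variable names are made up.

        ```shell
        # -072: octal 72 is ":", so each colon ends a record (three records here).
        colon_records=$(printf 'a:b:c' | perl -072 -ne '$n++; END { print $n }')

        # -0777: no byte has this value, so $/ ends up undefined (slurp mode).
        separator_state=$(perl -0777 -e 'print defined($/) ? "defined" : "undef"')

        echo "$colon_records $separator_state"
        ```

        If the digits were a byte count, the first command would not split on colons. It should print "3 undef", which matches the description on that page.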

      The -p flag creates a while(<>) loop around the code specified for the -e flag (with print; as the last line in the while loop)

      Actually, that's not quite accurate. According to what I read, the while loop looks like this:

      LINE:
        while (<>) {
            # your code goes here
        } continue {
            print or die "-p destination: $!\n";
        }

      A continue block gets executed the instant before the loop condition is evaluated. So 'redo' does not cause the continue block to execute, but 'next' does, and a normal iteration of the loop causes the continue block to execute as well.
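
      Here is a small sketch of that behaviour, using a plain counter loop instead of while (<>) so it needs no input. The labels body/end/cont and the variables are made up for the trace.

      ```shell
      trace=$(perl -e '
          my ($i, $redone, @log) = (0, 0);
          while ($i < 2) {
              push @log, "body$i";
              if ($i == 0 && !$redone) { $redone = 1; redo; }  # redo skips the continue block
              next if $i == 1;                                  # next still runs the continue block
              push @log, "end$i";
          } continue {
              push @log, "cont$i";
              $i++;
          }
          print join(" ", @log);
      ')
      echo "$trace"
      ```

      This prints "body0 body0 end0 cont0 body1 cont1": the redo repeats the body without a cont0 in between, and the next on the second pass still reaches cont1.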

      This works for me:

      perl -pe 'if($_ eq "\n"){$n=1;next;} if($n){$n=0;next;}else{s/# file/\n# file/;}' data1.txt
Re^2: Adding back missing newlines between records
by puterboy (Scribe) on Nov 10, 2009 at 07:21 UTC
    Thanks for the code and the helpful explanation. I have read 'man perlre' many times but as you pointed out I missed several points there. Thanks for the clarification.