Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Peace be unto you, O enlightened ones.

I have a large text file with many (~64000) similar lines in it. I would like to create a new file, differing from the original one in the following way:

If a line in the original file has 188 characters, followed by 8 digits, followed by 16 characters, the corresponding line in the new file must have the 8 digits replaced with 20020101, unless the original 8 digits are 00000000, in which case the line is not copied into the new file.

I was thinking of something along the lines of:

perl -lne 'print unless /.{188}00000000.{16}/' original.txt > new.txt

This works for getting rid of the lines with 00000000 in them, but how do I adapt this in order to implement the replacement mentioned above?

Thanks in advance,

S


Re: search and replace on a line by line basis
by suaveant (Parson) on Apr 15, 2004 at 13:25 UTC
    First of all, you probably want /^.{188}00000000.{16}$/, anchored at both ends, so that the pattern has to cover the whole line; without the anchors it could also match a longer line that merely contains that pattern. If the 8-digit number will never start with a zero (assuming it is already a date), then
    perl -lne 'print if s/^(.{188})[1-9]\d{7}(.{16})$/${1}20020101$2/' original.txt > new.txt
    would work. In principle you could use lookahead and similar constructs to eliminate the need for capturing parens (Upd: as ishnid shows).
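To see what the anchors buy you, here is a small sketch that builds one line of exactly the described shape and a second copy with eight extra characters appended, then tries both patterns against each. The filler letters and the EXTRA123 suffix are made up purely for illustration.

```perl
use strict;
use warnings;

# A line of exactly the described shape: 188 filler chars, 8 zeros, 16 filler chars.
my $exact  = ('a' x 188) . '00000000' . ('b' x 16);
# The same line with 8 extra (made-up) characters appended.
my $longer = $exact . 'EXTRA123';

my $unanchored = qr/.{188}00000000.{16}/;
my $anchored   = qr/^.{188}00000000.{16}$/;

# Report how each pattern treats each line.
for my $line ($exact, $longer) {
    printf "%d chars: unanchored=%s, anchored=%s\n",
        length($line),
        ($line =~ $unanchored ? 'match' : 'no match'),
        ($line =~ $anchored   ? 'match' : 'no match');
}
```

It prints "212 chars: unanchored=match, anchored=match" followed by "220 chars: unanchored=match, anchored=no match", so only the anchored pattern rejects the over-long line.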

    There is probably a way to do it in one regexp, but I can't think of it right now, so if a leading zero is a possibility:

    while(<>) { next unless /^(.{188})(\d{8})(.{16})$/; print $1,"20020101$3\n" unless $2 eq '00000000'; }
    untested, but should work.
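Since the loop above is marked untested, here is a sketch of the same logic moved into a subroutine so it can be checked without the real input file. The name rewrite_line is made up for illustration; it returns the rewritten line, or nothing when the line is malformed or its 8-digit field is all zeros.

```perl
use strict;
use warnings;

# Hypothetical helper holding the body of the while(<>) loop above.
# Returns the rewritten line (newline included), or an empty list
# (undef in scalar context) when the line should be dropped.
sub rewrite_line {
    my ($line) = @_;
    # Anchored at both ends; $ also matches just before a trailing "\n".
    return unless $line =~ /^(.{188})(\d{8})(.{16})$/;
    return if $2 eq '00000000';    # all-zero field: drop the line
    return "${1}20020101$3\n";
}

# In a real script you would then write:
#   while (my $line = <>) {
#       my $out = rewrite_line($line);
#       print $out if defined $out;
#   }
```

Keeping the per-line logic in a subroutine makes it easy to feed it a few hand-built lines (a normal date, a date with a leading zero, an all-zero field, a blank line) before running it over the full file.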

                    - Ant

      Or, to shorten it by one more character:

      s/^(.{188})[^0]\d{7}(.{16})$/${1}20020101$2/

      Update: As the AM below notes, this does not answer the original question; it just shortens the previous poster's regex.

        That [^0] will drop any date that starts with 0, which is not what the poster wants.

        How about:

        s/^(.{188})(?!0{8})\d{8}(.{16})$/${1}20020101$2/
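The difference between the [^0] shortcut and a negative lookahead shows up as soon as a date field legitimately starts with 0. The sketch below builds a 212-character line around each field and reports which pattern would keep it; the helper name keeps and the sample fields are made up for illustration. It uses (?!0{8}) followed by \d{8}, because the lookahead itself consumes no characters.

```perl
use strict;
use warnings;

# Hypothetical helper: wraps an 8-character field in 188 + 16 filler chars
# and returns (kept by the [^0] pattern, kept by the lookahead pattern).
sub keeps {
    my ($field) = @_;
    my $line = ('a' x 188) . $field . ('b' x 16);
    my $by_class     = $line =~ /^.{188}[^0]\d{7}.{16}$/     ? 1 : 0;
    my $by_lookahead = $line =~ /^.{188}(?!0{8})\d{8}.{16}$/ ? 1 : 0;
    return ($by_class, $by_lookahead);
}

for my $field (qw(19991231 01234567 00000000)) {
    my ($class, $look) = keeps($field);
    printf "%s  [^0]: %-7s  (?!0{8}): %s\n", $field,
        ($class ? 'kept' : 'dropped'), ($look ? 'kept' : 'dropped');
}
```

Only the lookahead version keeps 01234567 while still dropping 00000000.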

Re: search and replace on a line by line basis
by matija (Priest) on Apr 15, 2004 at 13:26 UTC
    I'm sure one of the other monks can make this into a one-liner, but let me propose a simple program:
    while (<>) { if (/^(.{188})(\d{8})(.{16})$/) { next if $2 eq "00000000"; print "${1}20020101$3\n"; } }
    Call it as script original.txt >new.txt
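Whether the surrounding fields are matched with \w or with . matters if those 188 characters can contain anything other than letters, digits and underscores: \w matches only word characters, while . matches any character except a newline. A small sketch, using a made-up prefix that contains a space and some punctuation:

```perl
use strict;
use warnings;

# Made-up 188-character prefix: 180 x's plus 8 chars including a space and punctuation.
my $prefix = ('x' x 180) . 'a b-c/d.';
my $line   = $prefix . '19991231' . ('y' x 16);

my $with_word = $line =~ /^(\w{188})(\d{8})(\w{16})$/ ? 1 : 0;
my $with_dot  = $line =~ /^(.{188})(\d{8})(.{16})$/   ? 1 : 0;

print "\\w{188}: ", ($with_word ? 'match' : 'no match'), "\n";
print ".{188}: ",   ($with_dot  ? 'match' : 'no match'), "\n";
```

The \w version fails on this line because of the space in the prefix, while the . version matches it.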
Re: search and replace on a line by line basis
by ishnid (Monk) on Apr 15, 2004 at 13:23 UTC
    This seems to work, though I'm sure there'll be prettier ways:
    perl -lne 's/(?<=.{188})(\d{8})(?=.{16})/$1 eq "00000000"?next:"20020101"/e && print' original.txt > new.txt
      Using negative lookahead for the 00000000 case tightens it up:
      perl -lne 'print if s/(?<=^.{188})(?!0{8})(\d{8})(?=.{16})/20020101/' original.txt > new.txt
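Because lookbehind and lookahead are zero-width, the match itself is only the 8 digits, which is why the replacement needs no $1 or $2. A small sketch that reports what the lookaround pattern actually matched and where; the helper name date_span and the sample line are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (matched text, start offset, end offset)
# for the lookaround pattern, or an empty list if it does not match.
sub date_span {
    my ($line) = @_;
    return unless $line =~ /(?<=^.{188})(?!0{8})(\d{8})(?=.{16})/;
    # @- and @+ hold the start and end offsets of the whole match.
    return ($1, $-[0], $+[0]);
}

my $line = ('a' x 188) . '19991231' . ('b' x 16);
my ($digits, $start, $end) = date_span($line);
print "matched $digits at offsets $start..$end\n";
```

It prints "matched 19991231 at offsets 188..196". Note that (?=.{16}) without a trailing $ only checks that at least 16 characters follow, so lines longer than 212 characters also pass; (?=.{16}$) would reject them.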

      The PerlMonk tr/// Advocate
Re: search and replace on a line by line basis
by gsiems (Deacon) on Apr 15, 2004 at 13:32 UTC
    Does:
    perl -lne 'next if /.{188}00000000.{16}/; $d="20020101"; s/(.{188})\d{8}(.{16})/$1$d$2/; print' original.txt > new.txt
    do what you need?
      That assumes that all lines in the file are in the format he describes. Your example will pass blank lines through, for example; I believe he wants anything invalid filtered out.

      perl -e'$_="nwdd\x7F^n\x7Flm{{llql0}qs\x14";s/./chr(ord$&^30)/ge;print'

        Good point.

        How about:

        perl -lne 'next if /^.{188}0{8}.{16}$/; print if s/^(.{188})\d{8}(.{16})$/${1}20020101$2/' original.txt > new.txt
Re: search and replace on a line by line basis
by pizza_milkshake (Monk) on Apr 15, 2004 at 15:42 UTC
    Just change the (1, 8, 1) to (188, 8, 16) to match your case; those values are just for testing.
#!perl -w
# cases above the blank line should be replaced
# successfully, below should not
use strict;
my ($len1, $len2, $len3) = (1, 8, 1);
my $valid = qr/^.{$len1}(\d{$len2}).{$len3}$/;
while (<DATA>) {
    next unless $_ =~ $valid;
    next unless $1 ne "00000000";
    substr($_, $len1, $len2) = "20020101";
    print;
}
__DATA__
X12345678X
X14322678X
X12534654X
X12345678X

X00000000X
12345678X
X12345678
XX
12345678


Re: search and replace on a line by line basis
by Beechbone (Friar) on Apr 16, 2004 at 21:54 UTC
    perl -lne 'print $1."20020101$3" if /^(.{188})(\d{8})(.{16})$/ and $2 != 0' original.txt > new.txt
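The $2 != 0 test works because Perl converts the captured string to a number in decimal: 00000000 becomes 0, and a leading zero in a string is not read as octal (unlike a leading zero in a numeric literal written in the source). A quick sketch with made-up field values; field_is_nonzero is a hypothetical name mirroring the one-liner's condition.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the "$2 != 0" condition in the one-liner.
sub field_is_nonzero {
    my ($field) = @_;
    return $field != 0;
}

# Show how each string numifies and whether the line would be kept.
for my $field (qw(00000000 00000001 01234567 20020101)) {
    printf "%s -> %d (%s)\n", $field, $field,
        field_is_nonzero($field) ? 'kept' : 'dropped';
}
```

It prints "00000000 -> 0 (dropped)", then "00000001 -> 1 (kept)", "01234567 -> 1234567 (kept)" and "20020101 -> 20020101 (kept)".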

    Search, Ask, Know