newbiecali has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I am new to Perl and am learning it from scratch.
I have a very big data file (3 GB) that looks like this:


00101100
00101101
11H00101
HHHHHHHH
01011001
LLLLLLLL
1011010L
0110111L
00000000
00111111


I need to replace H with 1 in column 3 and L with 0 in column 8
only, so that the file looks like this:

00101100
00101101
11100101
HHHHHHHH
01011001
LLLLLLL0
10110100
01101110
00000000
00111111


can it take text file "file1.txt"
and spit out the output in "output.txt"
something like that

I would appreciate any help.

Thanks

Re: Replace bits
by choroba (Cardinal) on Mar 30, 2015 at 21:45 UTC
    perl -pe 's/^(..)H/${1}1/; s/^(.{7})L/${1}0/;' input > output

    The output differs from yours on line 4, because the H in column 3 is replaced there too. It prints

    HH1HHHHH

    Explanation:

    • -p reads the file line by line; it prints the line after running the code on it
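    • s/REGEX/STRING/ is substitution; it replaces what's matched by the regular expression REGEX with STRING
    • ^ matches the beginning of a line
    • . matches any character except a newline
    • .{7} matches exactly seven such characters
    • () creates a "capture group"; you can use $1 or ${1} to refer to the first capture group. The braces in ${1}1 keep the following digit from being read as part of the variable name.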
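
    For anyone who prefers a script file to a one-liner, the same two substitutions could be sketched roughly as below. This is only an illustration, not part of the original answer: the names fix_line and convert_file are made up here, and the script assumes it is given the input and output file names as its two arguments. Like the one-liner, it turns HHHHHHHH into HH1HHHHH.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Apply the two substitutions from the one-liner to a single line.
# (fix_line is a hypothetical helper name.)
sub fix_line {
    my ($line) = @_;
    $line =~ s/^(..)H/${1}1/;      # H in column 3 becomes 1
    $line =~ s/^(.{7})L/${1}0/;    # L in column 8 becomes 0
    return $line;
}

# Copy the input file to the output file, fixing each line on the way.
# Reading one line at a time means the 3 GB file is never held in memory.
sub convert_file {
    my ($in_name, $out_name) = @_;
    open my $in,  '<', $in_name  or die "Cannot read $in_name: $!";
    open my $out, '>', $out_name or die "Cannot write $out_name: $!";
    while (my $line = <$in>) {
        print {$out} fix_line($line);
    }
    close $in;
    close $out or die "Cannot close $out_name: $!";
}

if (@ARGV == 2) {
    convert_file(@ARGV);
}
else {
    warn "usage: $0 INPUT OUTPUT\n";
}
```

    Saved under a name of your choice (say fixbits.pl), it would be run as perl fixbits.pl file1.txt output.txt.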
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
      can it take text file "file1.txt"
      and spit out the output in "output.txt"
      something like that

        Exactly like that. Take input in choroba's command line above and replace it with file1.txt, and likewise replace output with output.txt.


        Give a man a fish:  <%-(-(-(-<

Re: Replace bits
by hankcoder (Scribe) on Apr 05, 2015 at 07:11 UTC
    Is your data listed by column or by line? I assume it is laid out as columns on each line:

    00101100 00101101 11H00101 HHHHHHHH 01011001 LLLLLLLL 1011010L 0110111L 00000000 00111111

    If it is by column, you will need to split each line into "columns" (an array) and process them one by one. Assuming the separator character is a space, use (@columns) = split(/ /, $dataline);

    A regex may do the trick faster, but I would use the approach above because it is easier to understand and more flexible for future expansion.

    If your data is in a file, which I assume is a text file with 10 columns per line, then you just need to loop through the file and process each line's columns.
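
    A minimal sketch of that split-based approach might look like the following. It assumes, as above, that the fields are separated by single spaces; the names fix_field and fix_dataline are invented for this example, and substr is just one way to test and change a single character position.

```perl
use strict;
use warnings;

# Fix one field: H in position 3 becomes 1, L in position 8 becomes 0.
# (fix_field is a hypothetical helper name.)
sub fix_field {
    my ($field) = @_;
    if (length($field) >= 3 && substr($field, 2, 1) eq 'H') {
        substr($field, 2, 1) = '1';
    }
    if (length($field) >= 8 && substr($field, 7, 1) eq 'L') {
        substr($field, 7, 1) = '0';
    }
    return $field;
}

# Split a space-separated line into columns, fix each one, and rejoin.
# The returned line always ends in a newline.
sub fix_dataline {
    my ($dataline) = @_;
    chomp $dataline;
    my @columns = split / /, $dataline;
    return join(' ', map { fix_field($_) } @columns) . "\n";
}

# Small demonstration on three fields taken from the question's data.
print fix_dataline("00101100 11H00101 1011010L\n");
```

    To process a whole file, the same fix_dataline call would go inside a while loop that reads the input one line at a time.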