Lhamo_rin has asked for the wisdom of the Perl Monks concerning the following question:

Fellow Monks,

I have a data file like this:

    "1"
    {
    "data1" "test.V1_0";
    "StrCatDone" "0";
    "seq_name" "sequence.V5_0";
    "seq_step" "22";
    }
    "2"
    {
    "data1" "test.V1_0";
    "StrCatDone" "1";
    "seq_name" "sequence.V5_0";
    "seq_step" "41";
    }

I need to search through each file one block at a time and find the blocks in which StrCatDone is 1. When that is true, I need to change the following data point, seq_name, from sequence.V5_0 to sequence_$newtext.

I would like some opinions on how best to do this.

Thanks

Code tags added by GrandFather

Replies are listed 'Best First'.
Re: Altering data files
by BrowserUk (Patriarch) on Oct 10, 2006 at 13:25 UTC

    This will do it. See perlrun for the explanation of the switches used. In particular, look at the documentation for -i.bak for in-place editing of files; -0ooo for setting the input record separator from an octal value (here -0175, the octal code for '}', so each block is read as one record); -p for processing a file as a filter.

    If you don't understand, ask and someone will help you.

    C:\test>type 577397.dat
    "1"
    {
    "data1" "test.V1_0";
    "StrCatDone" "0";
    "seq_name" "sequence.V5_0";
    "seq_step" "22";
    }
    "2"
    {
    "data1" "test.V1_0";
    "StrCatDone" "1";
    "seq_name" "sequence.V5_0";
    "seq_step" "41";
    }

    ## Note: This should be typed as one line.
    ## On *nix systems, you'll need 's instead of "s.
    C:\test>perl -0175 -i.bak -pe"/StrCatDone. .1./ and s/sequence.V5_0/sequence.V6_3/" 577397.dat

    C:\test>type 577397.dat
    "1"
    {
    "data1" "test.V1_0";
    "StrCatDone" "0";
    "seq_name" "sequence.V5_0";
    "seq_step" "22";
    }
    "2"
    {
    "data1" "test.V1_0";
    "StrCatDone" "1";
    "seq_name" "sequence.V6_3";
    "seq_step" "41";
    }

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Altering data files
by Hofmator (Curate) on Oct 10, 2006 at 13:31 UTC
    1. Open the file
    2. Read the file blockwise by using an input record separator (see perldoc perlvar), e.g. '}', provided that character does not appear anywhere else in your data.
    3. Split each block into lines and look for StrCatDone.
    4. Depending on the value after StrCatDone alter the following line.
    5. Output the (possibly altered) block.
    Transform this into code and post it here if you have any problems with it.

    -- Hofmator

    Code written by Hofmator and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.
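
The steps Hofmator lists could be sketched roughly as follows. This is only an illustration, not code from the thread: alter_block is a hypothetical helper name, 'sequence_FOO' is a placeholder for the new name, and the input is read from an in-memory string instead of a real file so the sketch runs on its own. It also assumes '}' never appears inside the data values.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite one block (one record ending in '}').
# $new_name is an assumed replacement value, e.g. "sequence_FOO".
sub alter_block {
    my ($block, $new_name) = @_;
    my @lines = split /^/m, $block;    # step 3: split, keeping line endings
    for my $i (0 .. $#lines - 1) {
        # Step 4: only act when StrCatDone is "1"; change the following line.
        next unless $lines[$i] =~ /"StrCatDone"\s+"1"/;
        $lines[$i + 1] =~ s/("seq_name"\s+")[^"]*(")/$1$new_name$2/;
    }
    return join '', @lines;
}

# Sample data taken from the question.
my $data = <<'END';
"1"
{
"data1" "test.V1_0";
"StrCatDone" "0";
"seq_name" "sequence.V5_0";
"seq_step" "22";
}
"2"
{
"data1" "test.V1_0";
"StrCatDone" "1";
"seq_name" "sequence.V5_0";
"seq_step" "41";
}
END

# Steps 1-2: open the "file" (a string here) and read it block by block.
open my $in, '<', \$data or die "Can't open in-memory file: $!";
my $result = '';
{
    local $/ = '}';    # each read now returns everything up to the next '}'
    while (my $block = <$in>) {
        $result .= alter_block($block, 'sequence_FOO');
    }
}
close $in;

print $result;         # step 5: output the (possibly altered) blocks
```

For a real file, the in-memory handle would be replaced by an open on the data file, and the result printed to a new file or written back once the output has been checked.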

Re: Altering data files
by davidrw (Prior) on Oct 10, 2006 at 13:27 UTC
    If the "seq_name" line is _always_ directly after the "StrCatDone" line, then this (untested) one-liner should do it (not very memory-efficient though):
    perl -0777 -pe 's/("StrCatDone" "1";\s+"seq_name" ")sequence\.V5_0(?=";)/${1}sequence_FOO/g' datafile.txt
    Alternatively, you could set $/='}' (see perlvar) and apply that regex to each "line", writing out to another file as you go.
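
A minimal sketch of that second suggestion, under a few assumptions: copy_with_rename is a made-up helper name, 'sequence_FOO' is a placeholder, and in-memory filehandles stand in for the data file and the output file so the example is self-contained. It relies on "seq_name" coming directly after "StrCatDone" within a block.

```perl
use strict;
use warnings;

# Hypothetical helper: copy $in to $out one '}'-terminated record at a time,
# renaming seq_name only in records where StrCatDone is "1".
sub copy_with_rename {
    my ($in, $out, $new_name) = @_;
    local $/ = '}';    # a "line" is now everything up to the next '}'
    while (my $record = <$in>) {
        $record =~ s/("StrCatDone"\s+"1";\s*"seq_name"\s+")[^"]*/$1$new_name/;
        print {$out} $record;
    }
    return;
}

# Sample data taken from the question.
my $source = <<'END';
"1"
{
"data1" "test.V1_0";
"StrCatDone" "0";
"seq_name" "sequence.V5_0";
"seq_step" "22";
}
"2"
{
"data1" "test.V1_0";
"StrCatDone" "1";
"seq_name" "sequence.V5_0";
"seq_step" "41";
}
END

# In-memory handles for the demo; real use would open the data file for
# reading and a second, differently named file for writing.
my $copy = '';
open my $in,  '<', \$source or die "Can't read in-memory input: $!";
open my $out, '>', \$copy   or die "Can't write in-memory output: $!";
copy_with_rename($in, $out, 'sequence_FOO');
close $in;
close $out or die "Can't close in-memory output: $!";

print $copy;
```

Writing to a separate file leaves the original untouched, which does much the same job as the .bak copy made by -i.bak in the one-liners.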
Re: Altering data files
by jdporter (Paladin) on Oct 10, 2006 at 15:51 UTC

    You don't really want to do it this way, of course... But I noticed that your data structure is very similar to Perl code, so...

    use strict;
    use warnings;

    $_ = do { local $/; <> };
    s/" "/" => "/g;
    s/"\s\{/" => {/g;
    s/}/},/g;
    s/;/,/g;
    my $ds = eval "{$_}"; # Danger, Will Robinson!

    my $new_sequence = "sequence.V6_0"; # or whatever.
    for ( values %$ds ) {
        $_->{'StrCatDone'} eq '1' and $_->{'seq_name'} = $new_sequence;
    }

    use Data::Dumper;
    # go as far as we can using Data::Dumper's output options...
    $Data::Dumper::Indent = 0;
    $Data::Dumper::Terse = 1;
    $Data::Dumper::Pair = " ";
    $_ = Dumper $ds;
    # ...and do the rest by hand:
    s/,/\n/g;
    s/ \{/\n{\n/g;
    s/}/\n}/g;
    s/'/"/g;
    s/\{(.*)}/$1/s;
    print

    Execute the above with perl -i.bak.

    We're building the house of the future together.
Re: Altering data files
by codeacrobat (Chaplain) on Oct 10, 2006 at 23:00 UTC
    In your special case, a relatively simple solution would be:
    perl -pi.bak -e 's/sequence.V5_0/sequence.V6_3/ if $old =~ /StrCatDone. .1./; $old=$_' data.dat
Re: Altering data files
by systems (Pilgrim) on Oct 12, 2006 at 16:16 UTC
    Below is my attempt. It makes some assumptions, the most important being that "seq_name" will be one line below "StrCatDone". Also note I am a Perl newbie, so the code below is not of good quality. My logic is:
    1. Read the file into an array.
    2. Find each array index holding "StrCatDone" "1", using a for(;;) loop.
    3. Modify the array element at the next index, hence the ++$i.
    4. Write the array back to the file.

    use strict;
    use warnings;

    my $file = shift @ARGV;
    open(my $fh1, '<', $file) or die "Can't open file \"$file\", $!";
    my @txt = <$fh1>;
    close $fh1;

    my $newtext = '"seq_name" "sequence.V6.0";';

    # Stop one short of the last line so that ++$i always stays in range.
    for (my $i = 0; $i < $#txt; $i++) {
        next unless $txt[$i] =~ m/"StrCatDone" "1"/;
        $txt[++$i] = "$newtext\n";    # replace the following line
    }

    open(my $fh2, '>', $file) or die "Can't open file \"$file\", $!";
    print $fh2 @txt;
    close $fh2 or die "Can't close file \"$file\", $!";