Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi!
I have strings like the following:
$string1='-----------------------------NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN +NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN-------------------------------NNNN +NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN--------NNNNN +NNNNNNNNNNNN------------------------NNNNNNNNNNNNNNNNNNNN------------- +-------------------------------BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB +BBBBBBBBBBBBBB-----'; $string2='-------------------------------------NNNNNNNNNNNNNNNNNNNNNNN +NNNNNNN--NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN--NNNNNNNNNNNNNNNNNNNNNN +NNNNNNNNNNNNNNNNNNNNNNNNNN--NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN--------- +-----------------BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB'; $string3='--------------------------------NNNNNNNNNNNNNNNNNNNNNNNNNNNN +NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN-------------------------- +---------------------------------BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB +BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB-------------NNNNNNNNNNNNNNNNNNN +NNNNNNNNNNNNNNNNNNNNNNNNNNNNN'; $string4='----------------------------------BBBBBBBBBBBBBBBBBBBBBBBBBB +BBBBBBBBBBBBBBBBBBBBB-----------------------------------------------N +NNNNNNNNNNNNNNNNNNNNNNNNNNNN------------NNNNNN';

I want to expand the BBBBBBBB substring to cover all possible ---- until it either reaches the end of the string on its right (NOT on its left), or until it reaches an N, either on its left or its right. What I am looking for is if there is any way, with a pattern match obviously, to cover all these cases, because, the only way I know of is to have 4 if clauses, depending on the format of the initial string...

Replies are listed 'Best First'.
Re: How can I expand my substring?
by Corion (Patriarch) on Jun 30, 2014 at 11:47 UTC

    In your example, what is your expected result?

    Also, do the strings need to be as long as you've shown? This makes reading your example quite unwieldly. Maybe shortening each group of characters to four characters helps make the example data more understandable?

    Personally, I would solve the problem exactly as you described. Replace any - followed by a B by BB, and the same for the other direction. Keep doing that as long as you can:

    $string1='--NN--NNN--NN--NNN--BBB-----'; 1 while $string1 =~ s/B-/BB/ or $string1 =~ s/-B/BB/; print $string1;
      $string1='--------NNNNNNN--------NNNNN--------NNNNN------NNNNNNN------ +-------BBBBB-----';

      should become
      $string1_new='--------NNNNNNN--------NNNNN--------NNNNN------NNNNNNNBB +BBBBBBBBBBBBBBBBBBBBB';

      $string2='---------NNNNN-NNNNNNNNNNNNN----NNNNNNNNNNN---------BBBBBBBB +BB';
      should become
      $string2_new='---------NNNNN-NNNNNNNNNNNNN----NNNNNNNNNNNBBBBBBBBBBBBB +BBBBBB';

      $string3='-------NNNNNNN-----------------BBBBBBBBBBBBBBB-------------N +NNNNNN';

      should become
      $string3_new='-------NNNNNNNBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB +BBBNNNNNNN';

      $string4='--------BBBBBBBBBBBB-------NNNNNNNNNNNNN------NNNNNN';

      should become:
      $string4_new='--------BBBBBBBBBBBBBBBBBBBNNNNNNNNNNNNN------NNNNNN';

      Does this make more sense? Thank you for the answer, I just don't get this "1" that you have there... I tried the:
      $initial_protein=~s/B-/BB/g; $initial_protein=~s/-B/BB/g;

      but didn't work...

        The "1" has significance, in the sense that it is the loop body. You could rewrite my code as:

        while( $string1 =~ s/B-/BB/ or $string1 =~ s/-B/BB/ ) { # Replacement already happened in the while condition };

        There are other ways, but I find this approach to be closest to your description.

Re: How can I expand my substring?
by choroba (Cardinal) on Jun 30, 2014 at 12:50 UTC
    Instead of going the /e way, which is a bit hard to read, I'd remove the offending part from the string, do the replacements, and return it back:
    #!/usr/bin/perl use warnings; use strict; use Test::More tests => 4; sub fill { my $_ = shift; my $prefix = do { s/^(-*B?)// ; $1 }; 1 while s/-B/BB/g or s/B-/BB/g; return "$prefix$_" } is(fill('--------NNNNNNN--------NNNNN--------NNNNN------NNNNNNN------- +------BBBBB-----'), '--------NNNNNNN--------NNNNN--------NNNNN------NNNNNNNBBBBBBB +BBBBBBBBBBBBBBBB', 'test1'); is(fill('---------NNNNN-NNNNNNNNNNNNN----NNNNNNNNNNN---------BBBBBBBBB +B'), '---------NNNNN-NNNNNNNNNNNNN----NNNNNNNNNNNBBBBBBBBBBBBBBBBBB +B', 'test2'); is(fill('-------NNNNNNN-----------------BBBBBBBBBBBBBBB-------------NN +NNNNN'), '-------NNNNNNNBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBNN +NNNNN', 'test3'); is(fill('--------BBBBBBBBBBBB-------NNNNNNNNNNNNN------NNNNNN'), '--------BBBBBBBBBBBBBBBBBBBNNNNNNNNNNNNN------NNNNNN', 'test4');
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: How can I expand my substring?
by GrandFather (Saint) on Jul 23, 2014 at 12:29 UTC
    use strict; use warnings; my $maxLen = 20; my @tests = ( ['N' . ('-' x ($maxLen + 20)) . 'B', 'N' . ('-' x 20) . ('B' x ($m +axLen + 1))], ['-B' . ('-' x ($maxLen + 20)), '-B' . ('B' x $maxLen) . ('-' x 20 +)], ['-N-N--B--N', '-N-NBBBBBN'], ['N-N-B-', 'N-NBBB'], ['-B-N-B-N-B-N', '-BBNBBBNBBBN'], ['-B-N', '-BBN'], ['-B-', '-BB'], ['B-', 'BB'], ); for my $test (@tests) { my ($org, $ref) = @$test; my $str = $org; $str =~ s/((?=N-+)(?:N)(?:-*?))(-{1,$maxLen}B)/$1 . ('B' x length +$2)/eg; $str =~ s/(B-{1,$maxLen})(?=-*(N|$))/'B' x length $1/eg; next if $str eq $ref; print "Couldn't handle: '$org'\n"; print " Expected: '$ref'\n"; print " Got: '$str'\n" }

    which generates no output indicating all is well.

    The two regex sustitutions:

    $str =~ s/((?=N-+)(?:N)(?:-*?))(-{1,$maxLen}B)/$1 . ('B' x length +$2)/eg; $str =~ s/(B-{1,$maxLen})(?=-*(N|$))/'B' x length $1/eg;

    do the work. The rest is test code. Change maxLen to the maximum number of characters to replace.

    Perl is the programming world's equivalent of English
      Thank you very much for this!
      I tried it and it works, but I think there is a problem if the string does not contain NNNNN but only --- and BBBB.
      I tried this:
      $initial_string = '--------------------------------------------------- +--------------------------------------------------------------------- +--------------------------------------------------------------------- +------------------------------------------------------------------BBB +BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB +BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB +BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB +BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB-----------------'; print $initial_string."\n"; $maxLen = 200; $initial_string =~ s/((?=N-+)(?:N)(?:-*?))(-{1,$maxLen}B)/$1 . ('B' x +length $2)/eg; $initial_string =~ s/(B-{1,$maxLen})(?=-*(N|$))/'B' x length $1/eg; print $initial_string."\n";

      and it only expanded the BBBB substring to the right of it, not to the left.

        Your "specification" says: "... until it either reaches the end of the string on its right (NOT on its left), or until it reaches an N ..." so that is what I implemented.

        As a general the better you specify a task the easier it becomes to implement. A good approach in this sort of case is to provide tests for each case you are interested in then the tests become part of the specification and also allow you to validate the behaviour of the code.

        Also note that I already gave you a test framework so the appropriate way to check that case is to provide the start string and the expected string as I did for the other test cases. That allows us to see what you expected to happen and how that differs from what did happen. It also allows you to play with the substitution code and check that you haven't broken test cases that were previously working.

        This technique of writing tests first then writing the code so they pass is a very powerful way of developing software and, as can be seen from this example, works even for trivial code.

        Perl is the programming world's equivalent of English