Hena has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks

I'm trying to solve a dereference problem. Can I someway get substitution to accept progressive matching. Only way I could use the pos() command is below, but that is not a very good way (as far as speed is concerned and I don't like it 'couse it's doing things twice). So any better way to do this?
#!/usr/bin/perl use warnings; use strict; # use re "debug"; my %CSV = ('n' => 'n', 't' => 't', 'f' => 'f', '\\' => '\\', ); $_ = '\\\\\\\\\\'; print "$_\n"; my $pos = 0; while (m/\\(.)/g) { $pos=pos($_);} s/\\(.)/exists($CSV{$1}) ? "\\$1" : "\\\\$1"/ge; s/^(.{$pos})\\$/$1\\\\/; print "result:\n$_\n";

The first subst works fine until it hits the last character. If that is a single backstroke, then it needs to be dereferenced, but if not then not.

Replies are listed 'Best First'.
Re: Regex to dereference
by gaal (Parson) on Jan 18, 2007 at 13:30 UTC
    Are you trying to find the length of the string? length is much better than your pos loop for that purpose.

    Is the last line in your code meant to handle the case you're saying doesn't work in the main substitution? Well, sure the main one doesn't work, because /\\(.)/ doesn't match at the last character. It can't! $_ .= "\\" if /\\$/ is one way to do it -- no need for a length check at all.

    By the way, "dereference" is usually understood to mean something else that what you are using it as here. The term you are looking for is probably "escape" or colloquially "backwhack".

      The thing is that if there is only one '\' character, then it should be doubled, but if there is two then not. Basicly if there is even amount then no change, and odd, then change the last. So direct test on last char doesn't really work.

      But thanks for the term help. As all my perl (and most of my coding as well) is self-taught :).
        Ah, then you need what's called a zero-width negative look-behind.

        $str =~ s/(?<!\\)\\$/\\\\/; # match a backslash at the EOL # that isn't preceded by a backslash

        See perlre for more stuff like this.

Re: Regex to dereference
by ferreira (Chaplain) on Jan 18, 2007 at 14:09 UTC

    When you ask a question, the monks are happy to see your effort in the shape of the code you tried before asking for help. But when you're confused about how to write down the algorithm or the program or about Perl itself, it could help a prose description of what you're trying to do. Sometimes you'll be surprised about how people may offer simpler solutions you haven't thought about it.

    Well, after this unsolicited introduction, I am not sure what you're trying to do. Maybe replacing literal escape sequences like '\n' with their meaning "\n", which could be done with code like

    my %map = ( '\n' => "\n", '\t' => "\t" ); $_ = "whatever"; s/(\\[ntf\\])/$map{$1}||'?'/g
    Or maybe you're trying to do the opposite: turn newlines ("\n"), tabs ("\t") into escapes ('\n', '\t'). Where something like this might work
    my %imap = ( "\n" => '\n', "\t" => '\t' ); my $specials = join '|', keys %imap; $_ = "a\nb\tcksl"; s/($specials)/$imap{$1}/gms;

    But you may find useful a ready-to-use module like Text::Quote as well.

      Ah sorry. I tried and obviously didn't do a very good job at it :).

      Idea was to turn '\' into '\\'. Unless its in conjunction with 'n', 't', 'f', 'r', or '\'.
Re: Regex to dereference
by Moron (Curate) on Jan 18, 2007 at 14:39 UTC
    regexps are fine on their own for one-off use, but for iterative use they are often better wrapped up into a parser, e.g.:
    #untested use myParser; my %CSV = ('n' => 'n', 't' => 't', 'f' => 'f', '\\' => '\\', ); $_ = '\\\\\\\\\\'; my ($result,$status) = parse( LEXICON=>\%CSV ); package myParser; sub new { my $self = Interface( @_ ); my $class = ref( $self ) || $self; return (bless $self), $class; } sub parse { my $self = Interface( @_ ); my $bufadr = ($self -> { BUFREF } ||= \$_); # default buffer $_ my $lexicon = $self -> { LEXICON } or die; my $return = []; my $lexStatus = 1; while( length( $$bufadr ) && lexStatus ) { Throw( $self ); my $content; ( $content, $lexStatus ) = AntiLex( $self, keys %$lexicon ); $lexStatus and push @$return, $content; } return $return, $lexStatus; } sub Throw { AntiLex( shift(), '\S' ); } sub AntiLex { my $self = shift; my $bufadr = $self -> { BUFADR }; my $status; for my $pattern ( @_ || die() ) { if ( $$bufadr =~ /^\n*($pattern)(.*)$/m ) { $$bufadr = $2; return ( $1, 1 ); } } return ( undef(), 0 ); } sub Interface { return ( ref( $_[0] ) || !$#_ ) ? shift() : { @_ }; } 1;

    -M

    Free your mind

Re: Regex to dereference
by Thelonius (Priest) on Jan 18, 2007 at 23:53 UTC
    I think this (funcb) does what you want.
    #!/usr/bin/perl use warnings; use strict; my %CSV = ('n' => 'n', 't' => 't', 'f' => 'f', '\\' => '\\', ); my %CSV1 = ('\\n' => '\\n', '\\t' => '\\t', '\\f' => '\\f', '\\\\' => '\\\\', ); my @same = qw( \\ \\\\ a ab \a \a\b \a\n \n\a \n\n abc\a abc\\a ); my @diff = qw( a\\ \a\\ abc\a\\ abc\a\\\\ \a\b\c\d\e\x\g\h\\ ); print "These are the same as your function\n"; runtests(@same); print "These are different:\n"; runtests(@diff); sub runtests { for my $in (@_) { print " in: $in\n"; $_ = $in; funca(); print " funca: $_\n"; my $savea = $_; $_ = $in; funcb(); printf " funcb: %-30s %s\n\n", $_, $_ eq $savea ? "SAME" : "DIF +FERENT"; } } sub funca { my $pos = 0; while (m/\\(.)/g) { $pos=pos($_);} s/\\(.)/exists($CSV{$1}) ? "\\$1" : "\\\\$1"/ge; s/^(.{$pos})\\$/$1\\\\/; } sub funcb { my $out = ""; pos($_) = 0; s/\G(\\.|[^\\]+|\\$)/substr($1,0,1) eq '\\' ? $CSV1{$1} || "\\$1" : +$1/eg; } __END__ # output: These are the same as your function in: \ funca: \\ funcb: \\ SAME in: \\ funca: \\ funcb: \\ SAME in: a funca: a funcb: a SAME in: ab funca: ab funcb: ab SAME in: \a funca: \\a funcb: \\a SAME in: \a\b funca: \\a\\b funcb: \\a\\b SAME in: \a\n funca: \\a\n funcb: \\a\n SAME in: \n\a funca: \n\\a funcb: \n\\a SAME in: \n\n funca: \n\n funcb: \n\n SAME in: abc\a funca: abc\\a funcb: abc\\a SAME in: abc\a funca: abc\\a funcb: abc\\a SAME These are different: in: a\ funca: a\ funcb: a\\ DIFFERENT in: \a\ funca: \\a\ funcb: \\a\\ DIFFERENT in: abc\a\ funca: abc\\a\ funcb: abc\\a\\ DIFFERENT in: abc\a\\ funca: abc\\a\\\ funcb: abc\\a\\ DIFFERENT in: \a\b\c\d\e\x\g\h\ funca: \\a\\b\\c\\d\\e\\x\\g\\h\ funcb: \\a\\b\\c\\d\\e\\x\\g\\h\\ DIFFERENT NT
      Yes. This indeed does what I wanted. Thank you :D.