Regex to dereference

Hena has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Regex to dereference by gaal (Parson) on Jan 18, 2007 at 13:30 UTC
Are you trying to find the length of the string? `length` is much better than your `pos` loop for that purpose. Is the last line in your code meant to handle the case you're saying doesn't work in the main substitution? Well, sure the main one doesn't work, because /\\(.)/ doesn't match at the last character. It can't! `$_ .= "\\" if /\\$/` is one way to do it -- no need for a length check at all. By the way, "dereference" is usually understood to mean something else that what you are using it as here. The term you are looking for is probably "escape" or colloquially "backwhack".	[reply] [d/l] [select]
Re^2: Regex to dereference by Hena (Friar) on Jan 19, 2007 at 06:40 UTC
The thing is that if there is only one '\' character, then it should be doubled, but if there is two then not. Basicly if there is even amount then no change, and odd, then change the last. So direct test on last char doesn't really work. But thanks for the term help. As all my perl (and most of my coding as well) is self-taught :).	[reply]
Re^3: Regex to dereference by gaal (Parson) on Jan 19, 2007 at 07:43 UTC
Ah, then you need what's called a zero-width negative look-behind. `$str =~ s/(?<!\\)\\$/\\\\/; # match a backslash at the EOL # that isn't preceded by a backslash` [download] See perlre for more stuff like this.	[reply] [d/l]
Re: Regex to dereference by ferreira (Chaplain) on Jan 18, 2007 at 14:09 UTC
When you ask a question, the monks are happy to see your effort in the shape of the code you tried before asking for help. But when you're confused about how to write down the algorithm or the program or about Perl itself, it could help a prose description of what you're trying to do. Sometimes you'll be surprised about how people may offer simpler solutions you haven't thought about it. Well, after this unsolicited introduction, I am not sure what you're trying to do. Maybe replacing literal escape sequences like '\n' with their meaning "\n", which could be done with code like `my %map = ( '\n' => "\n", '\t' => "\t" ); $_ = "whatever"; s/(\\[ntf\\])/$map{$1}\|\|'?'/g` [download] Or maybe you're trying to do the opposite: turn newlines ("\n"), tabs ("\t") into escapes ('\n', '\t'). Where something like this might work `my %imap = ( "\n" => '\n', "\t" => '\t' ); my $specials = join '\|', keys %imap; $_ = "a\nb\tcksl"; s/($specials)/$imap{$1}/gms;` [download] But you may find useful a ready-to-use module like Text::Quote as well.	[reply] [d/l] [select]
Re^2: Regex to dereference by Hena (Friar) on Jan 19, 2007 at 06:43 UTC
Ah sorry. I tried and obviously didn't do a very good job at it :). Idea was to turn '\' into '\\'. Unless its in conjunction with 'n', 't', 'f', 'r', or '\'.	[reply]
Re: Regex to dereference by Moron (Curate) on Jan 18, 2007 at 14:39 UTC
regexps are fine on their own for one-off use, but for iterative use they are often better wrapped up into a parser, e.g.: #untested use myParser; my %CSV = ('n' => 'n', 't' => 't', 'f' => 'f', '\\' => '\\', ); $_ = '\\\\\\\\\\'; my ($result,$status) = parse( LEXICON=>\%CSV ); package myParser; sub new { my $self = Interface( @_ ); my $class = ref( $self ) \|\| $self; return (bless $self), $class; } sub parse { my $self = Interface( @_ ); my $bufadr = ($self -> { BUFREF } \|\|= \$_); # default buffer $_ my $lexicon = $self -> { LEXICON } or die; my $return = []; my $lexStatus = 1; while( length( $$bufadr ) && lexStatus ) { Throw( $self ); my $content; ( $content, $lexStatus ) = AntiLex( $self, keys %$lexicon ); $lexStatus and push @$return, $content; } return $return, $lexStatus; } sub Throw { AntiLex( shift(), '\S' ); } sub AntiLex { my $self = shift; my $bufadr = $self -> { BUFADR }; my $status; for my $pattern ( @_ \|\| die() ) { if ( $$bufadr =~ /^\n($pattern)(.)$/m ) { $$bufadr = $2; return ( $1, 1 ); } } return ( undef(), 0 ); } sub Interface { return ( ref( $_[0] ) \|\| !$#_ ) ? shift() : { @_ }; } 1; [download] -M Free your mind	[reply] [d/l]
Re: Regex to dereference by Thelonius (Priest) on Jan 18, 2007 at 23:53 UTC
I think this (funcb) does what you want. #!/usr/bin/perl use warnings; use strict; my %CSV = ('n' => 'n', 't' => 't', 'f' => 'f', '\\' => '\\', ); my %CSV1 = ('\\n' => '\\n', '\\t' => '\\t', '\\f' => '\\f', '\\\\' => '\\\\', ); my @same = qw( \\ \\\\ a ab \a \a\b \a\n \n\a \n\n abc\a abc\\a ); my @diff = qw( a\\ \a\\ abc\a\\ abc\a\\\\ \a\b\c\d\e\x\g\h\\ ); print "These are the same as your function\n"; runtests(@same); print "These are different:\n"; runtests(@diff); sub runtests { for my $in (@_) { print " in: $in\n"; $_ = $in; funca(); print " funca: $_\n"; my $savea = $_; $_ = $in; funcb(); printf " funcb: %-30s %s\n\n", $_, $_ eq $savea ? "SAME" : "DIF +FERENT"; } } sub funca { my $pos = 0; while (m/\\(.)/g) { $pos=pos($_);} s/\\(.)/exists($CSV{$1}) ? "\\$1" : "\\\\$1"/ge; s/^(.{$pos})\\$/$1\\\\/; } sub funcb { my $out = ""; pos($_) = 0; s/\G(\\.\|[^\\]+\|\\$)/substr($1,0,1) eq '\\' ? $CSV1{$1} \|\| "\\$1" : +$1/eg; } __END__ # output: These are the same as your function in: \ funca: \\ funcb: \\ SAME in: \\ funca: \\ funcb: \\ SAME in: a funca: a funcb: a SAME in: ab funca: ab funcb: ab SAME in: \a funca: \\a funcb: \\a SAME in: \a\b funca: \\a\\b funcb: \\a\\b SAME in: \a\n funca: \\a\n funcb: \\a\n SAME in: \n\a funca: \n\\a funcb: \n\\a SAME in: \n\n funca: \n\n funcb: \n\n SAME in: abc\a funca: abc\\a funcb: abc\\a SAME in: abc\a funca: abc\\a funcb: abc\\a SAME These are different: in: a\ funca: a\ funcb: a\\ DIFFERENT in: \a\ funca: \\a\ funcb: \\a\\ DIFFERENT in: abc\a\ funca: abc\\a\ funcb: abc\\a\\ DIFFERENT in: abc\a\\ funca: abc\\a\\\ funcb: abc\\a\\ DIFFERENT in: \a\b\c\d\e\x\g\h\ funca: \\a\\b\\c\\d\\e\\x\\g\\h\ funcb: \\a\\b\\c\\d\\e\\x\\g\\h\\ DIFFERENT NT [download]	[reply] [d/l]
Re^2: Regex to dereference by Hena (Friar) on Jan 19, 2007 at 06:47 UTC
Yes. This indeed does what I wanted. Thank you :D.	[reply]