in reply to Regex to extract/modify C source string literals?
To successfully deal with strings in C code, you'll also need to deal with a few non-string constructs. In this case, the list is rather small since C strings are always delimited by double quotes ("), so you only need to deal with valid C constructs that might contain such a character.
- Characters: `'"'`
- Comments: `/* " */`
- C++ comments: `// "`
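To see why those constructs matter, here is a small sketch that applies a naive "every `"..."` pair is a string" regex to a made-up C fragment in which each of the first three lines hides a `"` inside one of them. The naive pattern pairs quotes across lines and never finds the one real literal, `real`:

```perl
use strict;
use warnings;

# Made-up C fragment: the first three lines each hide a '"' in a
# non-string construct; only "real" is an actual string literal.
my $c = join "\n",
    q{char q = '"';},
    q{/* a " in a comment */},
    q{// a " in a C++ comment},
    q{puts("real");},
    '';

# Naive approach: treat every "..." pair as a string literal.
my @naive = $c =~ m/"((?:[^\\"]|\\.)*)"/g;

printf "%d naive match(es)\n", scalar @naive;
print "  [$_]\n" for @naive;
```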
So you can write a cheap little parser to pull out quoted strings:
```perl
# Assumes CCODE is a filehandle already opened on the C source file.
my $parser= qr{
    \G                              # Don't skip anything
    (?:
        [^'"/]+                     # Stuff we don't care about
      | '(?:[^\\']+|\\.)'           # '"', '\'', '\\', 'param'
      | /\* .*? \*/                 # A C comment
      | //[^\n]+                    # A C++ comment
      | /                           # /, not a comment, division
      | "((?:[^\\"]+|\\.)*)"        # A quoted string ($1)
      | (.)                         # An error ($2)
    )
}xs;
my $code= do { local($/); <CCODE> };   # Slurp the whole file
my @strings;
while( $code =~ m/$parser/g ) {
    if( defined $1 ) {
        push @strings, $1;
    } elsif( defined $2 ) {
        # Show up to 10 characters of context around the stray character
        my $char= $2;
        my $pos= pos($code)-5;
        $pos= 0 if $pos < 0;
        my $context= substr( $code, $pos, 10 );
        warn "Ignoring unexpected character ($char) in ($context)";
    }
}
```

Then you can extend that to replace strings as well.
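As a quick in-memory check, this sketch runs the same pattern (re-declared with shorter comments so it stands alone) over a made-up fragment containing a character literal, both comment styles, a division, and escaped quotes inside the one genuine string. Only that string should come out, with its escapes left as written (`say \"hi\"`):

```perl
use strict;
use warnings;

# The extracting parser from above, re-declared so this runs on its own.
my $parser = qr{
    \G                            # Don't skip anything
    (?:
        [^'"/]+                   # Stuff we don't care about
      | '(?:[^\\']+|\\.)'         # Character literal
      | /\* .*? \*/               # A C comment
      | //[^\n]+                  # A C++ comment
      | /                         # Division
      | "((?:[^\\"]+|\\.)*)"      # String contents: capture 1
      | (.)                       # Unexpected character: capture 2
    )
}xs;

# Made-up C fragment: '"' hides in a char literal and in both comment
# styles, there is a division, and the one real string has escaped quotes.
my $code = qq{char q = '"'; /* " */ x = a / b; // "\nputs("say \\"hi\\"");\n};

my @strings;
while ( $code =~ m/$parser/g ) {
    push @strings, $1 if defined $1;
    warn "Unexpected character ($2)\n" if defined $2;
}
print "[$_]\n" for @strings;
```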
Update: Enlil was kind enough to point out that `'[^']+'` won't match `'\''`, so I replaced that part with `'(?:[^\\']+|\\.)'`. Note that I support the rather strange:

```c
#define ctrl(char) ( 'char' & 31 )
```

I can't recall whether ANSI C officially allows or disallows that. (:
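To see the difference Enlil's report makes, this sketch tests the old `'[^']+'` pattern and its replacement against a few whole character literals (the sample literals are illustrative, written with `q{}` so Perl leaves their backslashes alone):

```perl
use strict;
use warnings;

my $old = qr{'[^']+'};              # the pattern Enlil pointed out
my $new = qr{'(?:[^\\']+|\\.)'};    # the replacement used in the parser

# Whole C character literals; q{'\\\\'} is the four characters '\\'.
for my $lit ( q{'"'}, q{'\''}, q{'\\\\'}, q{'char'} ) {
    my $old_ok = $lit =~ /\A$old\z/ ? 'yes' : 'no';
    my $new_ok = $lit =~ /\A$new\z/ ? 'yes' : 'no';
    printf "%-8s old: %-3s new: %s\n", $lit, $old_ok, $new_ok;
}
```

Only `'\''` should differ: the old pattern stops at the backslash-quote, leaving a stray `'` that the parser could then treat as the start of another (bogus) character literal.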
And here is a hint at how to extend it to support replacing strings:
```perl
#!/usr/bin/perl -p0777 -i.org
# Edits each file named on the command line in place (whole file at once),
# keeping the original as FILE.org.
my $parser;
BEGIN {
    $parser= qr{
        \G                              # Don't skip anything
        (
            [^'"/]+                     # Stuff we don't care about
          | '(?:[^\\']+|\\.)'           # '"', '\'', '\\', 'param'
          | /\* .*? \*/                 # A C comment
          | //[^\n]+                    # A C++ comment
          | /                           # /, not a comment, division
          | "((?:[^\\"]+|\\.)*)"        # A quoted string ($2)
          | (.)                         # An error ($3)
        )                               # Entire match ($1)
    }xs;
}
s{$parser}{
    if( defined $3 ) {
        # The file's text is in $_ here; $-[0] is the offset of this match
        warn "Ignoring unexpected character ($3) at offset $-[0]\n";
    }
    if( defined $2 ) {
        my $string= $2;
        #... manipulate $string ...
        qq["$string"];      # $2 excludes the quotes, so put them back
    } else {
        $1;                 # Everything else is copied through unchanged
    }
}ge;
```

- tye
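For in-memory use, the same idea fits in a subroutine. This is a sketch under assumptions: `fix_string` and `rewrite_strings` are hypothetical names, and changing `color` to `colour` inside literals is a made-up manipulation. A ternary in the `/e` replacement makes the returned value explicit, and unexpected characters are passed through silently instead of warned about:

```perl
use strict;
use warnings;

# Essentially the tokenizer from the hint, minus the warning branch.
my $parser = qr{
    \G                            # Don't skip anything
    (                             # Whole token: capture 1
        [^'"/]+                   # Stuff we don't care about
      | '(?:[^\\']+|\\.)'         # Character literal
      | /\* .*? \*/               # A C comment
      | //[^\n]+                  # A C++ comment
      | /                         # Division
      | "((?:[^\\"]+|\\.)*)"      # String literal: contents in capture 2
      | .                         # Anything unexpected is passed through
    )
}xs;

# Hypothetical edit applied to the contents of each string literal.
sub fix_string {
    my ($contents) = @_;          # copy before any other regex runs
    $contents =~ s/\bcolor\b/colour/g;
    return qq{"$contents"};       # capture 2 excludes the quotes
}

# Rewrite string literals; copy every other token through unchanged.
sub rewrite_strings {
    my ($code) = @_;
    $code =~ s{$parser}{ defined $2 ? fix_string($2) : $1 }ge;
    return $code;
}

print rewrite_strings(qq{/* color */ char c = '"'; puts("pick a color");\n});
```

The `color` in the comment and the `'"'` literal pass through untouched; only the text inside the real string changes.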
Here is the test script, which lists the strings found in each C file named on the command line:

```perl
use strict;
use warnings;
# cmd.exe doesn't expand wildcards, so do it ourselves on Windows
BEGIN { @ARGV= map glob($_), @ARGV if "MSWin32" eq $^O }

my $parser= qr{
    \G                              # Don't skip anything
    (?:
        [^'"/]+                     # Stuff we don't care about
      | '(?:[^\\']+|\\.)'           # '"', '\'', '\\', 'param'
      | /\* .*? \*/                 # A C comment
      | //[^\n]+                    # A C++ comment
      | /                           # /, not a comment, division
      | "((?:[^\\"]+|\\.)*)"        # A quoted string ($1)
      | (.)                         # An error ($2)
    )
}xs;

for my $file ( @ARGV ) {
    print "$file:\n";
    # Three-argument open avoids the "magic" filename handling of <>
    my $code= do {
        open my $fh, '<', $file or die "Can't read $file: $!\n";
        local $/;
        <$fh>;
    };
    my @strings;
    while( $code =~ m/$parser/g ) {
        if( defined $1 ) {
            push @strings, $1;
        } elsif( defined $2 ) {
            my $char= $2;
            my $pos= pos($code)-5;
            $pos= 0 if $pos < 0;
            my $context= substr( $code, $pos, 10 );
            warn "Ignoring unexpected character ($char) in ($context)";
        }
    }
    for my $i ( 0 .. $#strings ) {
        printf qq[%8d: "%s"\n], 1+$i, $strings[$i];
    }
}
```