To reliably find the strings in C code, you also need to handle a few non-string constructs. The list is short: C strings are always delimited by double quotes ("), so you only need to handle the other valid C constructs that can contain that character:

Characters:   '"'
Comments:     /* " */
C++ comments: // "
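To see why those three cases matter, here is a small sketch of what a naive "everything between two double quotes" pattern does with them. The C fragment is made up for illustration:

```perl
use strict;
use warnings;

# Made-up C fragment: a double quote appears in a character constant
# and in both comment styles, but only "real string" is a string literal.
my $c = <<'END_C';
char q = '"';  /* a lone " in a comment */
puts("real string");  // another " here
END_C

# Naive approach: treat every pair of double quotes as a string.
my @naive = $c =~ /"([^"]*)"/g;

# The first "string" runs from the quote inside '"' to the quote inside
# the comment, so it is garbage; the quote in the // comment is left
# unpaired and silently dropped.
print "[$_]\n" for @naive;
```

Only the second match is a real string literal, which is why the parser has to recognize character constants and comments before it can trust a double quote.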

So you can write a cheap little parser to pull out quoted strings:

my $parser= qr{
    \G                      # Don't skip anything
    (?:
        [^'"/]+             # Stuff we don't care about
      | '(?:[^\\']+|\\.)'   # '"', '\'', '\\', 'param'
      | /\* .*? \*/         # A C comment
      | //[^\n]+            # A C++ comment
      | /                   # /, not a comment, division
      | "((?:[^\\"]+|\\.)*)" # A quoted string ($1)
      | (.)                 # An error ($2)
    )
}xs;

# CCODE is assumed to be a handle already opened on the C source.
my $code= do { local($/); <CCODE> };
my @strings;
while( $code =~ m/$parser/g ) {
    if( defined $1 ) {
        push @strings, $1;
    } elsif( defined $2 ) {
        my $char= $2;
        my $pos= pos($code)-5;      # back up a few characters for context
        $pos= 0 if $pos < 0;
        my $context= substr( $code, $pos, 10 );
        warn "Ignoring unexpected character ($char) in ($context)";
    }
}
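As a quick check, here is a self-contained sketch that runs the same pattern over a made-up C fragment held in a here-doc instead of reading the CCODE handle, keeping only capture 1 (the error branch is left out for brevity):

```perl
use strict;
use warnings;

# The parser from above: each alternative consumes one chunk of C,
# and only real string literals set capture 1.
my $parser = qr{
    \G                          # resume where the last match ended
    (?:
        [^'"/]+                 # stuff we don't care about
      | '(?:[^\\']+|\\.)'       # character constant such as '"'
      | /\* .*? \*/             # C comment
      | //[^\n]+                # C++ comment
      | /                       # lone slash: division
      | "((?:[^\\"]+|\\.)*)"    # string literal, body captured
      | (.)                     # anything else: unexpected
    )
}xs;

# Made-up C fragment standing in for the contents of CCODE.
my $code = <<'END_C';
/* say "hi" */
char c = '"';
printf("a \"quoted\" word\n"); // "not a string"
x = y / 2;
END_C

# Keep only string bodies; escapes are left exactly as written in C.
my @strings;
while ( $code =~ m/$parser/g ) {
    push @strings, $1 if defined $1;
}

print qq{"$_"\n} for @strings;    # prints "a \"quoted\" word\n"
```

The quotes in the comments and in '"' are consumed by their own alternatives, so only the printf argument comes out.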
Then you can extend that to replace strings as well.

Update: Enlil was kind enough to point out that my original '[^']+' won't match '\''. I replaced that part with '(?:[^\\']+|\\.)'. Note that this still supports the rather strange:

#define ctrl(char) ( 'char' & 31 )

though I can't recall whether ANSI C officially allows or disallows it. (:
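One way to see which constants the replacement pattern accepts, including the 'char' from that macro, is to test that alternative on its own. This is a small sketch: the alternative is anchored with \A and \z so each sample has to match in full, and the samples are written with q{} so the backslashes reach the regex as they would appear in C source:

```perl
use strict;
use warnings;

# Just the character-constant alternative, anchored so that each
# sample has to match in full.
my $char_const = qr{ \A '(?:[^\\']+|\\.)' \z }x;

# q{} keeps backslashes as they would appear in C source;
# q{'\\\\'} is the C constant '\\'.
my @samples = ( q{'"'}, q{'\''}, q{'\\\\'}, q{'char'}, q{'\012'} );

for my $s (@samples) {
    printf "%-8s %s\n", $s, ( $s =~ $char_const ? 'matches' : 'no match' );
}
```

The last sample shows a limit of the pattern: it accepts either a run of ordinary characters or a single escape, so a multi-character escape such as '\012' does not match, and the full parser would report its opening quote as an unexpected character.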

And here is a hint at how to extend it to support replacing strings:

#!/usr/bin/perl -p0777 -i.org
# -p0777 reads each whole file into $_ and prints it back out;
# -i.org edits in place, keeping a backup with a .org suffix.
our $parser;    # our, not my: a my inside the -p loop is reset after the first file
BEGIN {
    $parser= qr{
        \G                       # Don't skip anything
        (
            [^'"/]+              # Stuff we don't care about
          | '(?:[^\\']+|\\.)'    # '"', '\'', '\\', 'param'
          | /\* .*? \*/          # A C comment
          | //[^\n]+             # A C++ comment
          | /                    # /, not a comment, division
          | "((?:[^\\"]+|\\.)*)" # A quoted string ($2)
          | (.)                  # An error ($3)
        )                        # Entire match ($1)
    }xs;
}
s{$parser}{
    if( defined $3 ) {
        my $char= $3;
        my $pos= $-[0]-4;        # a few characters before the bad one
        $pos= 0 if $pos < 0;
        my $context= substr( $_, $pos, 10 );
        warn "Ignoring unexpected character ($char) in ($context)";
    }
    if( defined $2 ) {
        my $string= $2;
        #... manipulate $string ...
        qq{"$string"};           # put the quotes back
    } else {
        $1;
    }
}ge;
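The script above rewrites files in place. For experimenting, the same idea can be sketched as an ordinary function over a string. The name rewrite_strings and its callback argument are hypothetical, not part of the original script, and unexpected characters are passed through silently here rather than warned about:

```perl
use strict;
use warnings;

# Same alternatives as above, with the whole chunk captured so that
# everything except string bodies can be put back unchanged.
my $parser = qr{
    \G
    (                            # capture 1: the entire chunk
        [^'"/]+                  # stuff we don't care about
      | '(?:[^\\']+|\\.)'        # character constant
      | /\* .*? \*/              # C comment
      | //[^\n]+                 # C++ comment
      | /                        # division
      | "((?:[^\\"]+|\\.)*)"     # capture 2: string body
      | (.)                      # capture 3: unexpected character
    )
}xs;

# Hypothetical helper: apply $edit to the body of each string literal
# and return the rewritten source.
sub rewrite_strings {
    my ( $code, $edit ) = @_;
    $code =~ s{$parser}{
        defined $2
          ? '"' . $edit->("$2") . '"'    # "$2" hands $edit a copy
          : $1                            # everything else is kept as-is
    }ge;
    return $code;
}

my $src = q{puts("hi"); /* "keep" */ c = '"';};
my $out = rewrite_strings( $src, sub { uc $_[0] } );
print "$out\n";    # puts("HI"); /* "keep" */ c = '"';
```

uc is only a stand-in for whatever manipulation you need; on real code it would also turn escapes such as \n into \N, so a real edit should leave escape sequences alone.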

                - tye

To see the test script, look below:
use strict;
use warnings;
# Windows shells don't expand wildcards, so glob them here.
BEGIN { @ARGV= map glob($_), @ARGV if "MSWin32" eq $^O }

my $parser= qr{
    \G                      # Don't skip anything
    (?:
        [^'"/]+             # Stuff we don't care about
      | '(?:[^\\']+|\\.)'   # '"', '\'', '\\', 'param'
      | /\* .*? \*/         # A C comment
      | //[^\n]+            # A C++ comment
      | /                   # /, not a comment, division
      | "((?:[^\\"]+|\\.)*)" # A quoted string ($1)
      | (.)                 # An error ($2)
    )
}xs;

for my $file ( @ARGV ) {
    print "$file:\n";
    # Three-arg open treats $file purely as a file name, unlike the
    # magic open that <> performs on @ARGV.
    open my $fh, '<', $file or die "Can't read $file: $!\n";
    my $code= do { local($/); <$fh> };
    close $fh;
    my @strings;
    while( $code =~ m/$parser/g ) {
        if( defined $1 ) {
            push @strings, $1;
        } elsif( defined $2 ) {
            my $char= $2;
            my $pos= pos($code)-5;
            $pos= 0 if $pos < 0;
            my $context= substr( $code, $pos, 10 );
            warn "Ignoring unexpected character ($char) in ($context)";
        }
    }
    for my $i ( 0 .. $#strings ) {
        printf qq[%8d: "%s"\n], 1+$i, $strings[$i];
    }
}

In reply to Re: Regex to extract/modify C source string literals? (cheap) by tye
in thread Regex to extract/modify C source string literals? by rodent
