I started the work based on that regexp. Completely untested. Like that regexp, my code handles C (not C++), so some changes are required for C++ source. Hopefully I got all the different types of escapes, but I didn't check any documentation.
sub handle_token {
    my ($type, $value, $pos) = @_;
    return if $type eq 'comment';
    return if $value !~ /byFoo/;
    print("Found byFoo in $type starting at byte $pos\n");
}

# ^ User code
# v Lexer code

my %str_escapes = (
    t => chr(0x09),
    n => chr(0x0A),
    r => chr(0x0D),
    # ...
);

# Converts an octal or hex escape value to a character, rejecting
# values that do not fit in one byte.
sub byte_escape {
    my ($n, $kind, $pos) = @_;
    die("Bad input: $kind escape sequence too big in string at pos $pos\n")
        if $n > 255;
    return chr($n);
}

# Handles single-character escapes such as \n, \" and \\.
sub simple_escape {
    my ($c) = @_;
    return $str_escapes{$c} if exists($str_escapes{$c});
    warn("Unrecognized escape sequence \"\\$c\"\n") if $c =~ /^[a-z]\z/;
    return $c;
}

sub handle_char_escapes {
    my ($s, $pos) = @_;
    # All escapes are decoded in a single pass so that the output of one
    # escape is never re-read as the start of another (e.g. a doubled
    # backslash followed by "n" is a backslash and an "n", not a newline).
    $s =~ s{
        \\
        (?: ([0-7]{1,3})       # group 1: octal escape, at most three digits
          | x ([0-9a-fA-F]+)   # group 2: hex escape
          | (.)                # group 3: any other escaped character
        )
    }{
          defined($1) ? byte_escape(oct($1), 'Octal', $pos)
        : defined($2) ? byte_escape(hex($2), 'Hex',   $pos)
        :               simple_escape($3)
    }xseg;
    return $s;
}

sub handle_comment {
    my ($raw_comment, $pos) = @_;
    # Strip the leading "/*" and the trailing "*/".
    handle_token('comment', substr($raw_comment, 2, -2), $pos);
}

sub handle_string {
    my ($raw_string, $pos) = @_;
    # Strip the surrounding double quotes, then decode the escapes.
    handle_token(
        'string',
        handle_char_escapes(substr($raw_string, 1, -1), $pos),
        $pos
    );
}

sub handle_char {
    my ($raw_char, $pos) = @_;
    # Strip the surrounding single quotes, then decode the escapes.
    handle_token(
        'char',
        handle_char_escapes(substr($raw_char, 1, -1), $pos),
        $pos
    );
}

sub handle_code {
    handle_token('code', @_);
}

sub lex {
    for ($_[0]) {
        # Each alternative resumes where the previous match ended (\G);
        # /c keeps pos() intact when a match fails.
        m{ \G ( /\*[^*]*\*+(?:[^/*][^*]*\*+)*/ ) }xgc  && do { handle_comment("$1", $-[0]); redo };
        m{ \G ( "(?:\\.|[^"\\])*" )              }xsgc && do { handle_string ("$1", $-[0]); redo };
        m{ \G ( '(?:\\.|[^'\\])*' )              }xsgc && do { handle_char   ("$1", $-[0]); redo };
        m{ \G ( .[^/"'\\]* )                     }xsgc && do { handle_code   ("$1", $-[0]); redo };
        m{ \G \z                                 }xgc  && last;
        die("Bad input\n");
    }
}

# ^ Lexer code
# v User code

lex($c_source);

In reply to Re^2: parsing c++ and locating string in the code part by ikegami
in thread parsing c++ and locating string in the code part by szabgab
