I realized that this is bit too hard using regex because I need to know which one of those three character appears first and recheck that each time I found a string or comment.

I went back to plain scripting and it is actually pretty easier just use index and substr function. Here is my code, code writing service ? not for me.
#!/usr/bin/env perl use strict; use warnings; my $src = do {local $/; <DATA>}; my @strings = (); my @comments = (); my $off_set = 0; my $end_index = 0; while (my ($char, $start_index) = &next_char($off_set)) { last if ($char eq "" && $start_index == -1); if ($char eq '#') { $end_index = index $src, "\n", $start_index + 1; push @comments, substr($src, $start_index, $end_index-$start_index ++1); $off_set = $end_index + 1; } elsif (($char eq '"') || ($char eq "'")) { &capture_string($char, $start_index, $end_index); } } sub capture_string($ $ $) { my $quote = shift; my $start_index = shift; my $end_index = shift; $end_index = index ($src, $quote, $start_index+1); my $char_before = substr $src, $end_index-1, 1; while ($end_index > 0 && $char_before eq '\\') { $end_index = index $src, $quote, $end_index + 1; $char_before = substr $src, $end_index-1, 1; } push @strings, substr($src, $start_index, $end_index-$start_index+1) +; $off_set = $end_index + 1; } print "[Strings]\n"; foreach my $item (@strings) { print "$item\n"; } print "[Comments]\n"; foreach my $item (@comments) { print "$item"; } sub next_char { my %has; my $position = shift; my $s_index = index $src, "'", $position; my $d_index = index $src, '"', $position; my $c_index = index $src, '#', $position; return ("", -1) if ($s_index == -1 && $d_index == -1 && $c_index == -1); $has{$s_index} = "'" if ($s_index >= 0); $has{$d_index} = '"' if ($d_index >= 0); $has{$c_index} = '#' if ($c_index >= 0); my @sorted_keys = sort { $a <=> $b} keys %has; # print "Next char is $has{$sorted_keys[0]}, and position is $sorted +_keys[0]\n"; return ($has{$sorted_keys[0]}, $sorted_keys[0]); } __DATA__ # this is a comment, should be matched. # # "I am not a string" . 'because I am inside a comment' my $string = " #I am not a comment, because I am quoted"; my $another_string = "I am a multiline string with # on each line #, have fun!"; my $descap_string = "I am a \ escaped \" \"string"; # and some comment +s; my $sescap_string = 'I am a \ escaped \' \'string'; # and some comment +s; my $empty_d =""; my $empty_s ='';
And here is the result I wanted
[Strings] " #I am not a comment, because I am quoted" "I am a multiline string with # on each line #, have fun!" "I am a \ escaped \" \"string" 'I am a \ escaped \' \'string' "" '' [Comments] # this is a comment, should be matched. # # "I am not a string" . 'because I am inside a comment' # and some comments; # and some comments;

In reply to Re^2: Multiline string and one line comments by AskandLearn
in thread Multiline string and one line comments by AskandLearn

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.