Akatsuki has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I am new to Perl, and I am struggling to find a regex for this type of string: $_ = '"", "2.90", "3.00", "3.10", "3.20", "3.30", "3.40", "3.50", "3.60", "3.70", "3.80", "3.90", "4.00", "4.10", "4.20", "5v"'. I need to find each value between the double quotes and replace them in the same file, so I need to populate the variables $1 through $16. The regex I am using is this one: \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\" Is there a shorter, more concise expression?

Re: Regex for multiple numbers between double quotes
by Athanasius (Archbishop) on Oct 11, 2016 at 14:18 UTC

    Hello Akatsuki, and welcome to the Monastery!

    It looks like your data may be in CSV (comma-separated values) format. If so, you will be better off (in the long run) parsing it with a dedicated module rather than a regular expression. For example, the following code uses Text::CSV_XS to read the values, then doubles those which are decimal numbers and leaves the others unchanged:

    use strict;
    use warnings;
    use Text::CSV_XS;

    my $csv = Text::CSV_XS->new({ allow_whitespace => 1 });
    my $string = '"", "2.90", "3.00", "3.10", "3.20", "3.30", "3.40", "3.50", ' .
                 '"3.60", "3.70", "3.80", "3.90", "4.00", "4.10", "4.20", "5v"';

    if ($csv->parse($string))
    {
        my @fields = $csv->fields;
        /(\d+\.\d+)/ && ($_ *= 2) for @fields;
        print "$_\t" for @fields;
    }
    else
    {
        $csv->error_diag();
    }

    Output:

    0:12 >perl 1707_SoPW.pl
    5.8 6 6.2 6.4 6.6 6.8 7 7.2 7.4 7.6 7.8 8 8.2 8.4 5v
    0:12 >

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

      Thank you for your answer, but I think I expressed myself badly, so I will post my entire code in case you can help me. I want to open a file, read the values between the double quotes on the first line, and replace some numbers in this file with their corresponding values from that first line. This is my code:

      #!/usr/bin/perl
      # matchtest2.plx
      use warnings;
      use strict;

      open (my $original, '<', 'data.txt') or die $!;
      open (my $fh, '>', 'output.txt') or die $!;

      while (my $line = <$original>) {
          if ($line =~ m/\"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\"/) {
              my $replace = $1;
              $line =~ s/[^.]00/$replace/g;
          }
          if ($line =~ m/\"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\"/g) {
              my $replace1 = $2;
              $line =~ s/01/$replace1/g;
          }
          if ($line =~ m/\"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\"/g) {
              my $replace2 = $3;
              $line =~ s/02/$replace2/g;
          }
          if ($line =~ m/\"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\"/g) {
              my $replace3 = $4;
              $line =~ s/03/$replace3/g;
          }
          if ($line =~ m/\"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\"/g) {
              my $replace4 = $5;
              $line =~ s/04/$replace4/g;
          }
          if ($line =~ m/\"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\"/g) {
              my $replace5 = $6;
              $line =~ s/05/$replace5/g;
          }
          if ($line =~ m/\"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\"/g) {
              my $replace6 = $7;
              $line =~ s/06/$replace6/g;
          }
          if ($line =~ m/\"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\"/g) {
              my $replace7 = $8;
              $line =~ s/07/$replace7/g;
          }
          if ($line =~ m/\"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\"/g) {
              my $replace8 = $9;
              $line =~ s/08/$replace8/g;
          }
          if ($line =~ m/\"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\"/g) {
              my $replace9 = $10;
              $line =~ s/09/$replace9 /g;
          }
          if ($line =~ m/\"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\"/g) {
              my $replace10 = $11;
              $line =~ s/[^.]10/$replace10/g;
          }
          if ($line =~ m/\"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\"/g) {
              my $replace11 = $12;
              $line =~ s/11/$replace11/g;
          }
          if ($line =~ m/\"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\"/g) {
              my $replace12 = $13;
              $line =~ s/12/$replace12 /g;
          }
          if ($line =~ m/\"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\"/g) {
              my $replace13 = $14;
              $line =~ s/13/$replace13/g;
          }
          if ($line =~ m/\"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\"/g) {
              my $replace14 = $15;
              $line =~ s/14/$replace14/g;
          }
          if ($line =~ m/\"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\", \"(.*?)\"/g) {
              my $replace15 = $16;
              $line =~ s/15/$replace15/g;
          }
          print {$fh} $line;
      }

      close $original;
      close $fh;

      And this is my original file (just a short version; the actual one has thousands of numbers):

      "", "2.90", "3.00", "3.10", "3.20", "3.30", "3.40", "3.50", "3.60", "3.70", "3.80", "3.90", "4.00", "4.10", "4.20", "5v"
      09 12 12 10 08 08 10 07 10 09 08 08 06 10 10 08 09 11 08 10 07 10 11 00 00 10 12 13 10 09 08 08 06 10 01 08 09 11 08 10 07 10 11 12 13 05

      For example, I have to replace 00 with the value between the first pair of quotes (""), so 00 would become the empty string; 01 with the second value ("2.90"), so 01 would become 2.90; and so on. Thank you very much for all your help.

        Did you look at my previous suggestion regarding the "g" regex modifier? Here's how to use it:

        #!/usr/bin/env perl
        use strict;
        use warnings;

        open (my $original, '<', 'data.txt') or die $!;
        open (my $fh, '>', 'output.txt') or die $!;

        my $first = <$original>;
        print $fh $first;
        my @replace = $first =~ /"([^"]*)"/g;

        while (my $line = <$original>) {
            $line =~ s/(\d{2})/$replace[$1]/g;
            print $fh $line;
        }

        close $original;
        close $fh;

        Is that concise enough for you?
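
        As a variation on the code above, here is a minimal sketch that leaves a two-digit code untouched when the first line has no field for it, instead of substituting an undefined value. The helper name replace_codes and the shortened sample header are assumptions made for illustration; they are not from the original post.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): replace each standalone
# two-digit code with the header field at that index, and keep the code
# as-is when no such field exists.
sub replace_codes {
    my ($line, @fields) = @_;
    $line =~ s/\b(\d{2})\b/defined $fields[$1] ? $fields[$1] : $1/ge;
    return $line;
}

# Shortened sample header, assumed for illustration.
my $header = '"", "2.90", "3.00", "3.10"';

# In list context, a match with /g returns every captured value.
my @replace = $header =~ /"([^"]*)"/g;

print replace_codes("01 02 00 03 15\n", @replace);
# prints "2.90 3.00  3.10 15" (00 becomes the empty string, 15 has no field)
```

        The \b anchors are a design choice: they stop the pattern from matching two digits in the middle of a longer number, which the unanchored \d{2} would do.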

Re: Regex for multiple numbers between double quotes
by hippo (Archbishop) on Oct 11, 2016 at 13:22 UTC
Re: Regex for multiple numbers between double quotes
by choroba (Cardinal) on Oct 11, 2016 at 13:52 UTC
    Replace them by what?

    Here's one way to change the format of the lines:

    #!/usr/bin/perl
    use warnings;
    use strict;
    use feature qw{ say };

    $_ = '"", "2.90", "3.00", "3.10", "3.20", "3.30", "3.40", "3.50", "3.60", "3.70", "3.80", "3.90", "4.00", "4.10", "4.20", "5v"';

    my $regex = join ', ', map '"(.*?)"', 1 .. 16;
    my $string = join '|', map qq(<$_>), /$regex/;
    say length $string ? $string : $_;

    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
Re: Regex for multiple numbers between double quotes
by AnomalousMonk (Archbishop) on Oct 11, 2016 at 19:35 UTC

    I think a Text::CSV approach as outlined by Athanasius here might be better in general, but if you're stuck with using regexes, here's another approach.

    I'm also stumped by the "replace by what?" question, so this solution just uses a list of one-for-one replacement strings. Note that due to the arcane demands of the Windows command line, what appears as [^^\"] below is really the [^"] complemented character set, and \" is really " for the same reason. Note also that Perl version 5.10+ is needed due to the use of the \K operator; a work-around is fairly simple for pre-5.10 Perls.

    c:\@Work\Perl\monks>perl -wMstrict -le
    "use 5.010;
     ;;
     my $old = '\"\", \"2.90\", \"3.00\", \"3.10\", \"3.20\", \"3.30\", \"3.40\", \"4.20\", \"5v\"';
     print qq{'$old'};
     ;;
     my @repl = ('', qw(9.11 9.21 9.31 9.41 9.51 9.61 9.71 6wxyz));
     my $new = replace_from_list($old, @repl);
     print qq{'$new'};
     ;;
     ;;
     sub replace_from_list {
       my ($string, @list) = @_;
       ;;
       my $i = 0;
       $string =~ s{ \G (?: \" , [^^\"]*)? \" \K [^^\"]* (?= \") }
                   {$list[ $i++ ]}xmsg;
       $i == @list or die qq{replacement list mismatch};
       return $string;
       }
    "
    '"", "2.90", "3.00", "3.10", "3.20", "3.30", "3.40", "4.20", "5v"'
    '"", "9.11", "9.21", "9.31", "9.41", "9.51", "9.61", "9.71", "6wxyz"'
    (For reasons of display and comparison, (almost) all the replacement strings are the same length as the sub-strings replaced, but there's no requirement that this be so.)
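
    For pre-5.10 Perls, one possible work-around (a sketch only, not necessarily the one the author had in mind) is to capture the prefix that \K would have discarded and put it back in the replacement via /e. The sub name replace_from_list_no_K is made up for this example, and the code is written for a script file, so no Windows command-line escaping is needed.

```perl
use strict;
use warnings;

# Hypothetical \K-free variant: group 1 holds the separator and opening
# quote that \K would have dropped from the match; it is re-emitted in
# front of each replacement string.
sub replace_from_list_no_K {
    my ($string, @list) = @_;

    my $i = 0;
    $string =~ s{ \G ( (?: " , [^"]* )? " ) [^"]* (?= ") }
                { $1 . $list[ $i++ ] }xmsge;
    $i == @list or die "replacement list mismatch";
    return $string;
}

my $old = '"", "2.90", "3.00", "5v"';
my $new = replace_from_list_no_K($old, '', qw(9.11 9.21 6wxyz));
print "'$old'\n'$new'\n";
```

    The \G anchor still forces each match to start where the previous one ended, so the fields are consumed strictly in order, as in the \K version.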


    Give a man a fish:  <%-{-{-{-<