agustina_s has asked for the wisdom of the Perl Monks concerning the following question:

Hi Perlmonks... I am very confused by the quote character '"' in Perl. When I want to print or search for '"' in some variable, do I have to use \" instead of just "? I have tried using \", but it still gives me errors. Below are the input and output files:

INPUT
DATE 13-JUN-2000
COMMERCIAL SUPPLIERS SEQUENCE
/exon="49-333"
/intron="1-48;334-385"
//
DATE 14-JUN-2000
COMMERCIAL SUPPLIERS SEQUENCE
/exon="1-120"
/intron=" "
//

OUTPUT EXPECTED
DATE "13-JUN 2002"
Exon {Translation%49-333}
Intron {Translation%1-48}
Intron (Translation%334-385}
DATE "14-JUN 2002"
Exon {Translation -}
Intron {Translation%1-120}

REAL OUTPUT
"ATE "13-JUN-2000
}xon {Translation%49-333
Intron {Translation%1-48}
}ntron {Translation%334-385
"ATE "13-JUN-2000
}xon {Translation%1-120
}ntron {Translation%

This is my code:
#!/usr/local/bin/perl -w
# A program that accept an input file: Scorpion database from Gen Bank
# and will output the database in BioWare format
my $file1="$ARGV[0]" #var to save the input database
my $result=">".$ARGV[1];
my $counter=1;
open(INFO1,$file1) or die "Can't open $file1.\n";#open file1
open(OUT,$result) or die "Can't open $result.\n";
#foreach line in the files
foreach(<INFO1>) {
    if(/^DATE\s*(.*)-(.*)-(.*)/){
        print 'DATE'."\t".'"'."$1-$2-$3".'"'."\n";
    }
    elsif(/\s*\/intron=(.+)\n/) {
        my $item;
        my $local=$1;
        $local =~ s/\"//g;
        foreach $item (split('\;',$local)) {
            print "Intron\t \{Translation%$item\}\n";
        } #end foreach
    } #end elsif
    elsif(/\s*\/exon=(.+)\n/) {
        my $item;
        my $local=$1;
        $local =~ s/\"//g;
        foreach $item (split('\;',$local)) {
            print "Exon\t", " \{Translation\%","$item\}","\n";
        }#end foreach
    }#end elsif
}
I have also tried to put the date together in one string:
print "DATE\t \"$1-$2-$3\"\n";
But it also gives me the same result as above.

The "Gory details of parsing quoted constructs" section (in perldoc) doesn't really help me. Is there something wrong in my code? Thanks so much...

Replies are listed 'Best First'.
Re: strange quotes
by jmcnamara (Monsignor) on Feb 04, 2002 at 09:05 UTC

    For the printing you should use the qq{} operator. See perlop for details. You could then write your code as:

        print qq{DATE\t "$1-$2-$3"\n};

    When I ran your program I got the "Expected" output rather than the "REAL" output. Perhaps your data source was different.

    Just a few comments:

    It would be better to use \d and \w in the date matching regex.
    It would be better to use a negated character class to match the Intron/Exon data.
    In the main loop it is better to use while instead of foreach.
    Braces don't have to be escaped in a quoted string.
    I would rewrite the main block of your code something like this:
    while (<INFO1>) {
        if (/^DATE\s*(\d+)-(\w+)-(\d+)/) {
            print qq{\nDATE\t"$1-$2-$3"\n};
        }
        elsif (/\s*\/intron="([^"]+)/) {
            foreach my $item (split('\;',$1)) {
                print "Intron\t{Translation%$item}\n";
            }
        }
        elsif (/\s*\/exon="([^"]+)/) {
            foreach my $item (split('\;',$1)) {
                print "Exon\t{Translation%$item}\n";
            }
        }
    }
    --
    John.

Re: strange quotes
by hossman (Prior) on Feb 04, 2002 at 09:19 UTC
    It's actually not a quote problem (print "DATE\t \"$1-$2-$3\"\n"; is correct).

    The problem appears to be the format of your input file, and your regex. It looks like your files are using "\r\n" as the line terminator, but your script is running on a system that uses "\n" by default.

    The .* matches every character EXCEPT "\n", so it does match the \r, which gets slurped up into the capture variables ($1, $2, $3) of your regexes. The result is that you are printing those \r's back out -- and when you print a \r to a terminal, it moves the cursor back to the beginning of the current line, and the remaining text on that line overwrites the existing text (in each case in your script, there's only one character left before the next \n -- the double quote or the closing brace.)

    The simplest way to solve your problem, is probably to set $/ = "\r\n" prior to your for loop, and use chomp inside your loop (you should also get rid of those \n in your regexes -- they aren't needed.)
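
    A minimal sketch of the effect and the fix described above, assuming the input really does use "\r\n" line endings. The helper name date_line and its $strip_eol flag are hypothetical and exist only for illustration; the regex is the one from the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the formatted DATE line, or undef if the
# line does not match. When $strip_eol is true, the "\r\n" terminator is
# removed first by setting $/ and calling chomp, as suggested above.
sub date_line {
    my ($line, $strip_eol) = @_;
    if ($strip_eol) {
        local $/ = "\r\n";   # chomp removes whatever $/ currently holds
        chomp $line;
    }
    return unless $line =~ /^DATE\s*(.*)-(.*)-(.*)/;
    return qq{DATE\t"$1-$2-$3"\n};
}

my $raw = "DATE 13-JUN-2000\r\n";    # a line as read from a CRLF file

my $broken = date_line($raw, 0);     # $3 captures "2000\r"
my $fixed  = date_line($raw, 1);     # $3 captures "2000"

print "unfixed output still contains a carriage return\n" if $broken =~ /\r/;
print $fixed;
```

    In the unfixed case the \r sits just before the closing quote, which is why the quote ends up printed over the "D" of DATE on a terminal.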

Re: strange quotes
by particle (Vicar) on Feb 04, 2002 at 14:27 UTC
    you've asked questions about this code before, at about regular expression. you took ryan's advice, which gets your regular expressions closer to working, but they are still broken. they will fail if intron or exon have no value between the quotes (eg. /exon="").

    i think you should go back and read all the responses to your query. there's some good advice in there (okay, it's from me and trs80), including code samples *with comments!* that will help you understand what you are doing. also, there's some good recommended reading... no, required reading. i'll put those links here again.
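
    one possible way to handle the empty case, as a rough sketch only: match the quoted value with [^"]* (zero or more characters) instead of requiring at least one. the helper name feature_ranges and the '-' placeholder for an empty value are assumptions made for this example, not something taken from the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: extract the ranges from a /exon="..." or
# /intron="..." line. Returns an empty list if the line is not a feature
# line, and ('-') when the quoted value is empty or only whitespace.
sub feature_ranges {
    my ($line) = @_;
    return unless $line =~ m{^\s*/(?:exon|intron)="([^"]*)"};
    my $value = $1;
    return ('-') if $value !~ /\S/;   # nothing useful between the quotes
    return split /;/, $value;
}

print join(',', feature_ranges('/intron="1-48;334-385"')), "\n";
print join(',', feature_ranges('/exon=""')), "\n";
print join(',', feature_ranges('/intron=" "')), "\n";
```

    because the match stops at the closing quote, a trailing \r or \n on the line never reaches the captured value.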

    some nodes you might want to read are:
    while or foreach?
    Opening files
    Use strict warnings and diagnostics or die
    Death to Dot Star!

    as well as shift, FileHandle, perlre, split, and while.

    please, use strict!
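
    a rough sketch of what a strict-clean skeleton for this kind of script could look like. the sub name write_dates is hypothetical, and the three-argument open, lexical filehandles and printing to the output handle are illustrative choices rather than anything prescribed in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: read $in_path and write the quoted DATE lines
# to $out_path. Only the DATE branch is shown to keep the sketch short.
sub write_dates {
    my ($in_path, $out_path) = @_;
    open(my $in,  '<', $in_path)  or die "Can't open $in_path: $!\n";
    open(my $out, '>', $out_path) or die "Can't open $out_path: $!\n";
    while (my $line = <$in>) {
        $line =~ s/\r?\n\z//;    # drop "\n" or "\r\n" before matching
        print {$out} qq{DATE\t"$1-$2-$3"\n}
            if $line =~ /^DATE\s*(\d+)-(\w+)-(\d+)/;
    }
    close $in;
    close $out or die "Can't close $out_path: $!\n";
    return;
}

# Usage, mirroring the original command line: script.pl input output
write_dates(@ARGV) if @ARGV == 2;
```

    with strict in place, a misspelled variable name becomes a compile-time error instead of a silent empty string.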

    ~Particle

Re: strange quotes
by very empty (Scribe) on Feb 04, 2002 at 09:50 UTC
    Hiho,
    I just tried to reproduce your problem and everything worked fine! I am not too good with perl regexps but I changed only things which cannot have any relation to your problem. Here is some modified code:
    #!/usr/bin/perl -w
    #A program that accept an input file: Scorpion database from Gen Bank
    #and will output the database in BioWare format
    my $file1="$ARGV[0]"; #var to save the input database
    my $counter=1;
    open(INFO1,$file1) or die "Can't open $file1.\n";#open file1
    foreach(<INFO1>) {
        if(/DATE\s*(.*)-(.*)-(.*)/){
            print 'DATE'."\t".'"'."$1-$2-$3".'"'."\n";
        }
        elsif(/\s*\/intron=(.+)\n/) {
            my $item;
            my $local=$1;
            $local =~ s/\"//g;
            foreach $item (split('\;',$local)) {
                $item = " - " if $item =~ " ";
                print "Intron\t \{Translation%$item\}\n";
            }
        } #end elsif
        elsif(/\s*\/exon=(.+)\n/) {
            my $item;
            my $local=$1;
            $local =~ s/\"//g;
            foreach $item (split('\;',$local)) {
                $item = " - " if $item =~ " ";
                print "Exon\t", " \{Translation\%","$item\}","\n";
            }
        }
    }
    The significant changes are
    • Added a semicolon after  my $file1="$ARGV[0]"
    • Removed the circumflex (^) in   if(/^DATE\s*(.*)-(.*)-(.*)/)
    • Added the if-statement to handle empty translation strings
    Now I get the following output with your example
    michael@trul:~/> perl prog.pl input
    DATE "13-JUN-2000"
    Exon {Translation%49-333}
    Intron {Translation%1-48}
    Intron {Translation%334-385}
    DATE "14-JUN-2000"
    Exon {Translation%1-120}
    Intron {Translation% - }
    So in my opinion everything works as desired.

    Regards
Re: strange quotes
by moodster (Hermit) on Feb 04, 2002 at 09:21 UTC
    Except that I had to slap a ';' onto the second my declaration near the top, and there were some complaints about uninitialized values in concatenation, your script worked fine and produced the desired results. I'm running v5.6.1 under cygwin.

    Whether you have to escape your quotes or not depends on what kind of surrounding quotes you're using. If the surrounding quotes are ", then you don't have to escape ' quotes, but other " quotes MUST be escaped (otherwise they mark the end of the string and will probably cause compilation errors). And vice versa.

    If you are using a mixture of single and double quotes you may find it more comfortable to use the qq construct for strings:

    my $string = qq( A "string" with lots of 'quotes' );

    I notice, BTW, that you open an output file but never write anything to it; I assume that you changed this for debugging purposes... :)

    Cheers,
    -- moodster