hmbscully has asked for the wisdom of the Perl Monks concerning the following question:

This is driving me crazy.

I have a flatfile with some data that I parse for display on a webpage. I do not create the flatfile and if I can fix this on my own without dealing the client, all the better.

The data is tab-delimited and for every line but one, it is happy data
EST-NY    234    5-Oct    Springfield    MA    Springfield College    Townhouse Conference Room    http://www.spfldcol.edu/home.nsf/welcome/visit/directionsc
EST-NY    923    18-Oct    Salisbury    MD    Wor Wic Community College        http://www.worwic.edu/campus/directions.pdf
EST-NY    886    19-Oct    Frederick    MD    Hood College        http://www.hood.edu/welcome_to_hood/index.cfm?pid=_maps.htm#a1

But I have this one line:
SW    328    19-Oct    Houston    TX    University of Houston - Clear Lake        "http://prtl.uhcl.edu/portal/page?_pageid=328,217631,328_217645&_dad=portal&_schema=PORTALP"
Note the "'s around the URL. This is causing me problems. I want to get rid of these stupid "'s.

My first instinct is $ddrul =~ s/\"//g; but that isn't working. I've tried searching on replacing quotes, I can find things with single quotes, but those examples don't seem to work.

This seems so obvious. Please prove me oblivious.

Re: Replacing a pesky pair of quotes.
by Eimi Metamorphoumai (Deacon) on Aug 11, 2005 at 20:19 UTC
    What do you mean it "isn't working"? Are you sure your data really is what you think it is? Note that you don't need to \ the " inside the regexp, though it doesn't hurt. Here's test code that works for me.
    #!/usr/bin/perl -lw
    use strict;
    use warnings;

    while(<DATA>){
        chomp;
        my @data = split /\t/;
        my $ddrul = $data[7];
        $ddrul =~ s/\"//g;
        print "'$ddrul'";
    }
    __DATA__
    EST-NY	234	5-Oct	Springfield	MA	Springfield College	Townhouse Conference Room	http://www.spfldcol.edu/home.nsf/welcome/visit/directionsc
    EST-NY	923	18-Oct	Salisbury	MD	Wor Wic Community College		http://www.worwic.edu/campus/directions.pdf
    EST-NY	886	19-Oct	Frederick	MD	Hood College		http://www.hood.edu/welcome_to_hood/index.cfm?pid=_maps.htm#a1
    SW	328	19-Oct	Houston	TX	University of Houston - Clear Lake		"http://prtl.uhcl.edu/portal/page?_pageid=328,217631,328_217645&_dad=portal&_schema=PORTALP"
    Prints:
    'http://www.spfldcol.edu/home.nsf/welcome/visit/directionsc'
    'http://www.worwic.edu/campus/directions.pdf'
    'http://www.hood.edu/welcome_to_hood/index.cfm?pid=_maps.htm#a1'
    'http://prtl.uhcl.edu/portal/page?_pageid=328,217631,328_217645&_dad=portal&_schema=PORTALP'
      I don't know what else I can think my data is. The example data I supplied is what it is.

      The crux of my code is

      open(INFILE,"$ew_sites_file") || die "Cant open $ew_sites_file for reading $!\n";
      while($line = <INFILE>) {
          chomp $line;
          ($unused, $site, $date, $city, $state, $facility, $unused, $ddurl) = split(/\t/,$line); #tab-delimited file
          $ddrul =~ s/"//g;
          $ddurl =~ s/\r//; #lose that bad newline break

          #build the registration site lookup flatfile : state and javascript URL
          open(OUTFILE,">>$ew_regist_parse_file") || die "cant open $ew_regist_parse_file, $!\n";
          print OUTFILE "$state|$start_tag$site$inbetween$city$inbetween$state$inbetween$date$inbetween$ddurl$semicolon$city, $state : $date$endtag\n";
          close OUTFILE;

          #build the locations flatfile
          open(OUT2FILE,">>$ew_locate_parse_file") || die "cant open $ew_locate_parse_file, $!\n";
          print OUT2FILE "$state|$city|$facility|$date|$ddurl\n";
          close OUT2FILE;
      }
      close INFILE;

      The warnings say that $ddrul =~ s/"//g; is using an uninitialized value, which I don't understand, because the variable is initialized. The output is two files; an example of one (the simpler one) is:

      TX|Austin|University of Texas at Austin|11-Oct| http://www.utexas.edu/cee/tcc/forms/tcclargemap.pdf
      TX|Houston|University of Houston - Clear Lake|19-Oct|"http://prtl.uhcl.edu/portal/page?_pageid=328,217631,328_217645&_dad=portal&_schema=PORTALP"
      TX|El Paso|University of Texas - El Paso|25-Oct| http://www.utep.edu/search/campusmaplarge.html
      Still with the extra quotes.
        A use strict would have caught the misspelled variable "ddrul", which should be "ddurl".
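        To make that concrete, here is a small sketch of how strict reports the typo. It compiles a two-line snippet with a string eval so the failure can be captured and printed instead of stopping the script. The example.com URL and the names $snippet, $compiled and $strict_error are made up for illustration; they are not from the original code.

```perl
use strict;
use warnings;

# The snippet repeats the thread's typo: $ddurl is declared, but the
# substitution targets the undeclared $ddrul. (Placeholder URL.)
my $snippet = <<'END_SNIPPET';
my $ddurl = '"http://example.com/"';
$ddrul =~ s/"//g;
1;
END_SNIPPET

# A string eval is compiled under the enclosing "use strict", so the
# undeclared variable is reported rather than silently created.
my $compiled     = eval $snippet;
my $strict_error = $compiled ? '' : $@;

print "strict says: $strict_error";
```

        Without strict, Perl quietly creates an empty global $ddrul, runs the substitution on that (hence the "uninitialized value" warning), and leaves $ddurl untouched.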
        I'd probably have written it this way.
        while (<INFILE>) {
            my (undef, $site, $date, $city, $state, $facility, undef, $ddurl)
                = /\G"?([^\t]*?)"?(?:\t|[\r\n]+$)/g;
        }
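        For comparison, a split-based version of the same fix might look like the sketch below. It keeps the field order from the original loop and strips the quotes from the correctly spelled variable. The helper name parse_site_line and the sample record are assumptions made for this example, and the eight-column layout is inferred from the data shown in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): split one tab-delimited
# record and return the fields with the URL cleaned up.
sub parse_site_line {
    my ($line) = @_;
    $line =~ s/[\r\n]+\z//;              # drop LF or CRLF line endings

    # Columns 1 and 7 are unused, as in the original loop.
    my (undef, $site, $date, $city, $state, $facility, undef, $ddurl)
        = split /\t/, $line;

    $ddurl = '' unless defined $ddurl;   # short lines leave this undef
    $ddurl =~ s/"//g;                    # strip the surrounding quotes

    return ($site, $date, $city, $state, $facility, $ddurl);
}

# Sample record built from the problem line quoted in the question.
my $sample = join "\t", 'SW', 328, '19-Oct', 'Houston', 'TX',
    'University of Houston - Clear Lake', '',
    '"http://prtl.uhcl.edu/portal/page?_pageid=328,217631,328_217645&_dad=portal&_schema=PORTALP"';

my @fields = parse_site_line("$sample\r\n");
print "$fields[5]\n";
```

        Declaring every variable with my means a misspelling such as $ddrul stops compilation under strict instead of silently doing nothing.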
Re: Replacing a pesky pair of quotes.
by samtregar (Abbot) on Aug 11, 2005 at 20:11 UTC
    Try this:

       $ddrul =~ s/"//g;

    You don't need to escape double-quotes in a regex. If that doesn't work you'll need to show us more of your code.

    Alternately you could use Text::CSV_XS or one of the other CSV modules on CPAN. They usually have a tab-delimited mode and should deal with quoted values fine.

    -sam
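    A minimal sketch of the module approach, assuming Text::CSV_XS is installed from CPAN. Setting sep_char to a tab makes the parser treat the file as tab-delimited, and it removes the surrounding quotes from quoted fields on its own. The helper name parse_tsv_line and the sample record are illustrative, not part of the original script.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# One parser object, configured for tab-separated input.
my $csv = Text::CSV_XS->new({ sep_char => "\t", binary => 1 })
    or die "Cannot create parser: " . Text::CSV_XS->error_diag;

# Hypothetical helper: parse one line and return its fields as a list.
sub parse_tsv_line {
    my ($line) = @_;
    $line =~ s/[\r\n]+\z//;    # drop LF or CRLF line endings
    $csv->parse($line) or die "Parse failed: " . $csv->error_diag;
    return $csv->fields;
}

# Sample record built from the problem line quoted in the question.
my $sample = join "\t", 'SW', 328, '19-Oct', 'Houston', 'TX',
    'University of Houston - Clear Lake', '',
    '"http://prtl.uhcl.edu/portal/page?_pageid=328,217631,328_217645&_dad=portal&_schema=PORTALP"';

my @fields = parse_tsv_line("$sample\r\n");
print "$fields[7]\n";          # the URL field, without the quotes
```

    The comma inside the URL is not a problem here, because only the tab acts as a separator.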

Re: Replacing a pesky pair of quotes.
by borisz (Canon) on Aug 11, 2005 at 20:10 UTC
    This should work fine. If not " is not part of your string.
    $ddrul =~ s/"//g;
    Boris
Re: Replacing a pesky pair of quotes.
by BUU (Prior) on Aug 11, 2005 at 20:08 UTC
    Eh? What's wrong with the quotes? Looks like perfectly valid CSV type stuff to me.
      I'm parsing the data and putting the URLs in an href, on the assumption that there are no quotes around the URL I'm reading. <a href=""...some url..""> doesn't work.