Hans Castorp has asked for the wisdom of the Perl Monks concerning the following question:

Hello Venerable Ones:

I am having a difficult time figuring out why a split operator isn't working (the way I *thought* it should, anyway!).

I have a bunch of Library of Congress subject classifications (those capital letters at the beginning of call numbers on books) that I need applied to each subject heading. This is easy for most, because in most cases one subject goes with one class. However, in some cases, such as with PN, below:

"PN" => "English, Film, Theater",

more than one subject is part of the class. So, I applied split:

### If there is more than one discipline associated with the LC Class,
### build an array of disciplines
@subjects = split(/, /, $call_subs{$call});

I want each item in the resulting array to be assigned the PN classification. However, instead of PN being assigned to English, Film, and Theater, PN is being assigned to Theater (or whatever the last item of the array happens to be).
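For readers unsure what "assigning" a class to each subject could mean in code, here is a minimal sketch. It assumes the goal is a lookup from subject to LC class; the hash name %classes_for is hypothetical and does not appear in the original script.

```perl
use strict;
use warnings;

# Data copied from the post; %classes_for is a hypothetical name.
my %call_subs = (
    "PN" => "English, Film, Theater",
    "PR" => "English",
);

# Invert the mapping: every subject gets a list of the classes it belongs to.
my %classes_for;
for my $call (sort keys %call_subs) {
    for my $subject (split /\s*,\s*/, $call_subs{$call}) {
        push @{ $classes_for{$subject} }, $call;
    }
}

for my $subject (sort keys %classes_for) {
    print "$subject: @{ $classes_for{$subject} }\n";
}
```

With this structure, each of English, Film, and Theater carries PN, and English carries PR as well.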

I even tried doing a split on white space to see what would happen to subjects that have more than one word (such as "Art History"), and when I did that the class was also assigned to the last word in the array (in that case, "History").
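As a quick check of what split itself returns for each separator, the sketch below splits one string three ways. The sample string is an assumption made up for illustration (it borrows "Art History" from the paragraph above); it is not taken from %call_subs.

```perl
use strict;
use warnings;

# Hypothetical sample value, not from the original hash.
my $disciplines = "English, Art History, Theater";

# Comma plus space: three clean items.
my @by_comma_space = split /, /, $disciplines;

# Comma only: three items, but the later ones keep a leading space.
my @by_comma = split /,/, $disciplines;

# Whitespace: multi-word subjects break apart and commas stay attached.
my @by_whitespace = split ' ', $disciplines;

print scalar(@by_comma_space), " items: ", join(" | ", @by_comma_space), "\n";
print scalar(@by_comma),       " items: ", join(" | ", @by_comma),       "\n";
print scalar(@by_whitespace),  " items: ", join(" | ", @by_whitespace),  "\n";
```

In all three cases every piece of the string ends up in the list, so if only the last item seems to survive, the loss is probably happening somewhere after the split.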

The script I am working with is 426 lines long and complex (subroutines), so I realize I may not be giving enough information here, but I was hoping that maybe I am missing something simple about how split works. I'll be happy to post more code if you think that will clarify.

Many thanks in advance!

Update

I have managed to pare down the script to about 150 lines by taking out subroutines that I don't believe are the issue, limiting the call_subs hash to three, and taking out a very long SQL query. I hope this will help.

First, the code:

#!/m1/shared/bin/perl

use DBI;

$ENV{'ORACLE_HOME'}="/oracle/app/oracle/product/11.2.0.3/db_1";

$rssdate = `date`;
chomp $rssdate;

### Require adjunct scripts to do character reencoding and date formats
require "/m1/scripts/misc/MARCtoLatin.pl";
require "/m1/scripts/misc/date.pl";

### Change to the working directory
chdir("/m1/scripts/newbooks");

### Query the Voyager database and get lists of new materials by call number.
### Then, parse the call number and associate it with disciplines as defined in
### %call_subs. Save the results to hashes.
&GetNewItems;

### Print each item to the web content database and to an RSS XML file
#&PrintRecords;

close OUT;
exit;

######################################################################
######################################################################

sub GetNewItems {

    ### Voyager RO User login
    $dbuser = "";
    $dbpassw = "";

    ### Hash by call number to associate LC Classes with disciplines
    %call_subs = (
        "PN" => "English, Film, Theater",
        "PQ" => "Foreign Languages and Literatures",
        "PR" => "English",
    );

    ### Connect to database
    $dbh = DBI->connect('dbi:Oracle:', $dbuser, $dbpassw);

    ### Get statement handle for this SQL stmt
    ###SQL query removed for brevity's sake

    # --
    # -- execute the query
    # --
    $sth->execute();
    $sth->bind_col( 1, \$bib_id );
    $sth->bind_col( 2, \$_title_marc );
    $sth->bind_col( 3, \$timedate );
    $sth->bind_col( 4, \$callno );
    $sth->bind_col( 5, \$mfhd_id );
    $sth->bind_col( 6, \$trash );
    $sth->bind_col( 7, \$location );
    $sth->bind_col( 8, \$loca_id );
    $sth->bind_col( 9, \$isbn );

    open(OUT, ">out");

    ### Process each record returned from the Voyager query.
    while($sth->fetch) {
        $location =~ s/'/''/gi;

        ### Build a persistent URL to the catalog record
        $lucyurl = qq{http://lucy2.skidmore.edu/vwebv/holdingsInfo?bibId=$bib_id};

        ### Use date.pl to convert the timestamp into a readable date
        $date = $timedate;
        $display_date = &DisplayDate($date);

        ### Get the LC Class from the call number
        $call = substr($callno, 0, 3);
        $call =~ s/\d*//gi;
        #print "$call";

        ### Convert the title to Latin1 and remove the trailing slash
        $title = &CharConv($_title_marc);
        $title =~ s/\/\s*$//gi;

        ### If there is more than one discipline associated with the LC Class,
        ### build an array of disciplines
        @subjects = split(/,/, $call_subs{$call});
        #print "@subjects";

        ### For each subject, get the record data and save it to a hash
        foreach $subject (@subjects) {
            if ($subject eq 'Dance'){
                ($uptocutter,$therest) = split(/\./, $call);
                $uptocutter =~ s/\D//g;
                if ($uptocutter < 1580 || $uptocutter > 1799){
                    next;
                    #print OUT qq{$uptocutter\n};
                }else{
                    &saveRecords;
                }
            }else{
                &saveRecords;
            } ### -- End of if ($subject)
        } ### -- End of foreach $subject
    } ### -- End of while...
} ### -- End of &GetNewItems

sub saveRecords {
    $subject =~ s/^\s*//gi;
    #print "$subject";
    #print OUT qq{$bib_id, $timedate, $title, $add_date, $subject\n\n};

    ### Build an index variable and save each field to its own hash
    $id++;
    $title{$id} = $title;
    $timedate{$id} = $timedate;
    $display_date{$id} = $display_date;
    $subject{$id} = $subject;
    $callno{$id} = $callno;
    $urls{$id} = $lucyurl;
    $location{$id} = $location;
    $loca_id{$id} = $loca_id;
    $isbn{$id} = $isbn;
} ### - End of sub saveRecords

An important point: when I print OUT at line 138 I get exactly what I want-- for example:

649790, 20130201032215, America on film : representing race, class, gender, and sexuality at the movies , , English
649790, 20130201032215, America on film : representing race, class, gender, and sexuality at the movies , , Film
649790, 20130201032215, America on film : representing race, class, gender, and sexuality at the movies , , Theater

Update

I finally figured it out, and it DID have something to do with one of the subroutines. There was a loop that was looking for repeated urls and taking them out! That's why I was getting what I wanted in the saveRecords printout but not getting it in the database.
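The subroutine that did this is not shown in the thread, so the following is only a hypothetical reconstruction of how a URL-based duplicate filter can produce the symptom described: three records go in, and only the last subject comes out. The example.org URL and all variable names are assumptions; the bib id is the one from the sample output above.

```perl
use strict;
use warnings;

# Three records for one book, one per subject, all sharing the same URL.
my $url = "http://example.org/holdingsInfo?bibId=649790";   # placeholder URL
my @records = (
    { url => $url, subject => "English" },
    { url => $url, subject => "Film"    },
    { url => $url, subject => "Theater" },
);

# Hypothetical filter keyed on URL alone: later records overwrite earlier ones.
my %seen_url;
$seen_url{ $_->{url} } = $_ for @records;
my @by_url = values %seen_url;

# Keying on URL plus subject keeps one record per (book, subject) pair.
my %seen_pair;
$seen_pair{ "$_->{url}\t$_->{subject}" } = $_ for @records;
my @by_pair = sort { $a->{subject} cmp $b->{subject} } values %seen_pair;

print "by URL:         ", join(", ", map { $_->{subject} } @by_url),  "\n";
print "by URL+subject: ", join(", ", map { $_->{subject} } @by_pair), "\n";
```

If the real filter behaves like the first variant, keying on the URL together with the subject is one possible way to drop true duplicates without collapsing the per-subject copies.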

I want to thank you all for your patience. Every time I come to the Monastery I learn something new and valuable, even (maybe especially!) when I'm completely off mark regarding the issue at hand. Many thanks.

Replies are listed 'Best First'.
Re: Split Not Working Correctly?
by kcott (Archbishop) on Jan 08, 2014 at 14:40 UTC

    G'day Hans Castorp,

    See the documentation for split.

    There's nothing fundamentally wrong with the syntax you're using but $call may be an issue. Look at this test code:

    #!/usr/bin/env perl -l

    use strict;
    use warnings;

    my %call_subs = (
        "PN" => "English, Film, Theater",
    );

    # Your line gives "Global symbol "$call" requires explicit package name ...":
    #my @subjects = split(/, /, $call_subs{$call});

    # This works:
    my @subjects = split(/, /, $call_subs{PN});

    print for @subjects;

    Output:

    English
    Film
    Theater

    You're probably quite correct in saying: "I realize I may not be giving enough information here". Try to write a small script that just reproduces the problem. The guidelines in "How do I post a question effectively?" should help you with this.

    I found parts of your description difficult to follow. For example, you write "instead of PN being assigned to English, Film, and Theater" but nowhere do you perform any such assignment.

    The problem may be a misunderstanding of what split actually does. An indication of actual vs. expected results will help in providing an answer (also explained in the guidelines I linked to above).

    -- Ken

      Many thanks Ken--I'm working on it and will be back later. I knew lack of example code would be a problem....

      Hi Ken, wanted to thank you. Turns out there was a subroutine running that took out all duplicate records. Split was working as it should. ;-) Thank you so much for your help!

Re: Split Not Working Correctly?
by Random_Walk (Prior) on Jan 08, 2014 at 15:21 UTC

    How are you assigning @subjects to PN? Where does the value of $call come from?

    use strict;
    use warnings;
    use Data::Dumper;

    my @subjects = split /, /, "foo, bar, baz";
    my %lc_class;
    $lc_class{PN} = \@subjects;
    print Dumper \%lc_class;

    Update

    Here is a more extensive example of splitting and storing...

    use strict;
    use warnings;
    use Data::Dumper;

    # a home for our results
    my %lc_class;

    # read data from the end of the script
    # a handy testing tool
    while (my $line = <DATA>) {
        chomp $line; # remove newline

        # split each line on =, optionally surrounded by space, max 2 parts
        my ($class, $subj) = split /\s*=\s*/, $line, 2;

        # split the subject on , (optional space)
        my @subjects = split /\s*,\s*/, $subj;

        # store a reference to the @subjects array
        $lc_class{$class} = \@subjects;
    }

    # show and tell
    print Dumper \%lc_class;

    __DATA__
    PN = this, that, the other
    GN = something
    RT = test with some space, testwithout, And more space

    Output:
    $VAR1 = {
              'RT' => [
                        'test with some space',
                        'testwithout',
                        'And more space'
                      ],
              'PN' => [
                        'this',
                        'that',
                        'the other'
                      ],
              'GN' => [
                        'something'
                      ]
            };

    Cheers,
    R.

    Pereant, qui ante nos nostra dixerunt!

      Nice! Thanks RW. I'm going to try to post an abbreviated form of the script so people can have a better idea of what it's doing. Still working on it, though.

      Hi RW, I just figured out there was a subroutine running that took out all duplicate records. Split was working as it should. ;-)

      Thank you so much for your help. I learn a lot each time I ask for help here.

Re: Split Not Working Correctly?
by toolic (Bishop) on Jan 08, 2014 at 14:28 UTC

      Thanks, toolic. I did use print to see what was coming out (though not with the syntax you used here, which didn't work for me). @subjects contains everything it should--all the subject headings. It's just that some of the subject headings are not getting assigned the class.

      I am reading the documentation you have linked here. Thanks.

        @subjects contains everything it should ... It's just that some of the subject headings are not getting assigned the class.

        The strings in the  @subjects array are correct; OK so far. But I don't understand (and others apparently share my confusion) how a string can be "assigned a class".