Cannot read in multiple lines

Angel has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Cannot read in multiple lines by Enlil (Parson) on Nov 03, 2002 at 22:30 UTC
I copied your code and added a } to the end of it so the sub block would be closed and it seems to work fine for me. perhaps, you are only passing one id (I tried it for 1-4), and each came back with the expected value. Only one thing will ever be returned from the `sub get_schoolname` regardless of how many times an id shows up in the file, because it will only return the last value of $schoolname from your sub. -enlil	[reply] [d/l]
Re: Cannot read in multiple lines by adrianh (Chancellor) on Nov 03, 2002 at 22:47 UTC
Works fine for me (once the missing trailing braces were added). Just for fun, we'll refactor it into more idiomatic perl... Add use strict and warnings :-) Split uses $_ by default, so don't need to state it. Since we're not looking at the end of the line we don't need the chomp at all... so we remove it... We don't need the close - the filehandle will automatically be closed when it falls out of scope. Rename $id to $id_to_fetch since it's more descriptive. Since we're only interested in the school and name we can just extract those from the split. Note we drop 'school' from the name since it's redundant. We can then simplify the name return to 'return $name if $id eq $fetch_id;' This gives us... `use strict; use warnings; sub get_schoolname { my $id_to_fetch = $_[0]; local SCHOOLDATA; open(SCHOOLDATA,"school_data.pdt") or die "Can't open school data +file: $!"; while ( <SCHOOLDATA> ) { my ($id, $name) = split /\\|/; return $name if $id eq $id_to_fetch; }; };` [download] update:* localised SCHOOLDATA so that that darn filehandle is actually closed. Thanks to sauoq and tachyon for pointing out my error.	[reply] [d/l]
Re: Re: Cannot read in multiple lines by sauoq (Abbot) on Nov 04, 2002 at 08:34 UTC
Just for fun, we'll refactor it into more idiomatic perl... Making cosmetic changes to code is not "refactoring." The only item you listed that actually makes a bit of real difference is: We don't need the close - the filehandle will automatically be closed when it falls out of scope. and that's terrible advice. The filehandle should absolutely be closed. It's a global (package) variable and probably won't go out of scope until the program ends. Leaving it open is an undocumented and probably undesired side-effect. Oh wait... You also made this subtle yet significant change: Since we're not looking at the end of the line we don't need the chomp at all... so we remove it... The problem with that is that you made an assumption based on a small bit of sample input. What if the input allows some lines to consist of only two fields? If it does, then you might well want the split to remove a newline from the school name. Granted, it will probably work without the split but recommending its removal without more information is irresponsible. By the way, IMHO, his variable names were just fine for the task at hand. I would have chosen them differently and probably closer to the names you picked but that's largely a matter of style. Style is something we all develop for ourselves. It takes time and we find ourselves making niggling little decisions like whether to use $id_to_fetch or $fetch_id as you seem to have struggled with. Personally, I would have stuck with $fetch_id because I think prepositions in variable names are ugly and cumbersome but it's still a style decision. Rewriting code just because you can is rarely a good idea. You should always have a good reason for rewriting something. Even with good reason, you should be careful to keep the functionality equivalent unless you're sure it doesn't work. In other words, if it ain't broke... </rant> -sauoq "My two cents aren't worth a dime.";	[reply]
Refactoring gone wrong (was: Cannot read in multiple lines) by adrianh (Chancellor) on Nov 07, 2002 at 03:01 UTC
Making cosmetic changes to code is not "refactoring." I beg to differ :-) Making cosmetic changes to code is almost the essence of refactoring. According to Fowler (my emphasis): Refactoring (noun): a change made to the internal structure of software to make it easier to understand and cheaper to modify without changing it's external observable behaviour. People sometimes only consider structural refactorings like Replace Type Code with State/Strategy and ignore the smaller refactorings like Inline Temp that can be just as important. Obviously 'easier to understand' is a subjective term (people can argue forever about Inline Temp vs Introduce Explaining Variable) - so take all the 'increases clarity' that follow as 'increases clarity for me'. First off, the ones that I definitely consider to be refactorings: Add use strict and warnings Okay. This is actually not a refactoring in itself, but it's the first part of the common perl big refactoring 'Make Everything Compile with use strict and warnings'. The end result of this refactoring shouldn't alter behaviour and makes things easier to understand and cheaper to modify in the future. Split uses $_ by default, so don't need to state it Doesn't alter behaviour. You could even consider this the latter half of Remove Parameter if you look at it a certain way. Increases clarity. Valid refactoring. Rename $id to $id_to_fetch since it's more descriptive Doesn't alter behaviour. Increases clarity. Valid refactoring. Since we're only interested in the school and name we can just extract those from the split .... Doesn't alter behaviour. Does increase clarity. Valid refactoring. We can then simplify the name return to 'return $name if $id eq $fetch_id;' Doesn't alter behaviour. For me, this does increase clarity. Valid refactoring. Next a bad refactoring: Since we're not looking at the end of the line we don't need the chomp at all ... You're quite correct and this only applies if each line has more than two fields. I should have made that assumption explicit in the code or left the chomp in. Finally bug-fixes gone wrong. I said: We don't need the close - the filehandle will automatically be closed when it falls out of scope. And you said. that's terrible advice. The filehandle should absolutely be closed. It's a global (package) variable and probably won't go out of scope until the program ends. Leaving it open is an undocumented and probably undesired side-effect. Quite correct. I'm stupid. It was late. Foolish Adrian. It's instructive that I made this error because of a classic mistake. Let's look at the original code again: `sub get_schoolname( $ ) { my $id = $_[0]; my $schoolname; #get groupname from current database open(SCHOOLDATA,"school_data.pdt") or die "Can't open school data +file: $!"; while ( <SCHOOLDATA> ) { chomp($_); @split_record = split( /\\|/ , $_ ); print $split_record[0]; if( $split_record[0] eq $id ) { $schoolname = $split_record[1]; } } close( SCHOOLDATA ); return( $schoolname ); };` [download] Now, when I looked at this I actually did notice that it stomped all over the global SCHOOLDATA :-) What I should have done at this point is write a failing test. However, due to a little too much hubris some variant of "The code is so simple! Who needs a test to demonstrate the bug" went through my head. Bad Adrian. Later on I coding an extra close into the loop so that the file handle would be closed after the early exit in my version. As soon as I'd finished I thought - "Well that's pointless. Once you've localised SCHOOLDATA it will be closed when it falls out of scope" and removed both close statements. I then wrote up my nice little decision to remove the close - and promptly forgot to go add the `local` that would make it valid! So, instead of removing one bug I added a new one. Very bad Adrian. Now, if I'd been a good little developer and written some failing tests: `use Test::More tests => 4; { local SCHOOLDATA; open(SCHOOLDATA, "school_data.pdt") or die; is(get_schoolname(2), 'bunny werks', 'existing id found'); is(tell(SCHOOLDATA{IO}), 0, 'filehandle still open'); }; { local SCHOOLDATA; open(SCHOOLDATA, "school_data.pdt") or die; is(get_schoolname(-1), undef, 'bad id rejected'); is(tell(SCHOOLDATA{IO}), 0, 'filehandle still open'); };` [download] I wouldn't have humiliated myself in front of my peers like this :-) So the moral of the story is: Always write a failing test when you find a bug Don't put your bug-fixing and refactoring hats on at the same time By the way, IMHO, his variable names were just fine for the task at hand. I would have chosen them differently and probably closer to the names you picked but that's largely a matter of style. Style is something we all develop for ourselves. It takes time and we find ourselves making niggling little decisions like whether to use $id_to_fetch or $fetch_id as you seem to have struggled with. Personally, I would have stuck with $fetch_id because I think prepositions in variable names are ugly and cumbersome but it's still a style decision. Didn't have to struggle with the name change. Took longer to type than it did to think about. While I agree that it is a style decision I think the fact that both of us would have chosen something other than '$id' is a pointer to the fact that the original choice wasn't as clear as it could be. Not wrong - just not as descriptive as it could be. Rewriting code just because you can is rarely a good idea. You should always have a good reason for rewriting something. For me increasing the clarity of the code (for some definition of clarity) is a good enough reason. Even with good reason, you should be careful to keep the functionality equivalent unless you're sure it doesn't work. Quite correct. Fouled up with the close & chomp changes. In other words, if it ain't broke... I tend to disagree. This approach can turn the code base of a large project into a Big Ball Of Mud in a surprisingly small period of time. Over the years I'm spending more and more time Refactoring Mercilessly and finding it works very well. <Adrian wanders off to write out 'always write a failing test for a bug' 1000 times>	[reply] [d/l] [select]
What is refactoring? (was: Refactoring gone wrong) by sauoq (Abbot) on Nov 07, 2002 at 23:24 UTC
Re: What is refactoring? by adrianh (Chancellor) on Nov 10, 2002 at 00:23 UTC
Some notes below your chosen depth have not been shown here
Re: Re: Cannot read in multiple lines by tachyon (Chancellor) on Nov 04, 2002 at 10:19 UTC
We don't need the close - the filehandle will automatically be closed when it falls out of scope. Actually that is wrong. FILE globs are not locally scoped. `C:\>type test.pl sub open_file_handle { open FILE, ">c:/test.txt" or die $! } open_file_handle(); print FILE "This file handle is not closed!" or die $! C:\>perl test.pl C:\>type test.txt This file handle is not closed! C:\>` [download] If you really feel that not explicitly closing your filehandles is good programming practice and want to depend on Perl doing it for you you need to use `open my $fh, $file or die $!;` [download] From memory this is only valid for Perl 5.6+ cheers tachyon s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print	[reply] [d/l] [select]
Re^3: Cannot read in multiple lines by adrianh (Chancellor) on Nov 07, 2002 at 03:04 UTC
Quite correct. Me very very foolish. Bad Adrian.	[reply]
Re: Re: Cannot read in multiple lines by Angel (Friar) on Nov 04, 2002 at 00:01 UTC
Ok I looked at it deeper and it seems that the entire file is being read in as one line ( ignoring the /n ) how do i fix that? ANd think you for the help already :)	[reply]
Re: Re: Re: Cannot read in multiple lines by Enlil (Parson) on Nov 04, 2002 at 00:09 UTC
I would guess that at some point you assigned ~~`$\`~~ `$/` a new value, or perhaps undefined it. If you need this you should put it in a block and localize the assignment, like so: `{ local $\ = undef; $_ = <SOME_FILE>; }` [download] `{ local $/ = undef; $_ = <SOME_FILE>; }` [download] This will keep ~~`$\`~~ `$/` from continuing to be redefined as whatever you might have set it as (and set back to the default newline), once you have left the block. Update: As dingus pointed out. I did mean `$/` as opposed to `$\`, which is the output record separator. So I have changed all instances in the code and response above. Thanks dingus -enlil	[reply] [d/l] [select]
Re: Re: Re: Re: Cannot read in multiple lines by dingus (Friar) on Nov 04, 2002 at 13:06 UTC
Re: Cannot read in multiple lines by DamnDirtyApe (Curate) on Nov 04, 2002 at 02:35 UTC
At first glance, here's how I would approach your problem: #! /usr/bin/perl use strict ; use warnings ; $\|++ ; # First, read the data file into an array. open SCHOOLS, 'schools.dat' or die "Cannot open file: $!" ; chomp( my @rows = <SCHOOLS> ) ; # Next, rip each row into an array ref of fields, making a 2d array. my @table = map { [ split /\\|/ ] } @rows ; # Last, test out the function. print "ID 4 matches " . get_schoolname( 4 ) . "\n" ; print "ID 3 matches " . get_schoolname( 3 ) . "\n" ; print "ID 2 matches " . get_schoolname( 2 ) . "\n" ; print "ID 1 matches " . get_schoolname( 1 ) . "\n" ; sub get_schoolname { my $id = shift ; for ( @table ) { return $_->[1] if $id == $_->[0] ; } } exit ; [download] A couple of things to consider: This function returns as soon as it finds an ID match. If there's a lot of records in your data file, and there's some that are accessed more frequently than others, it would be advantageous to have those close to the top of the table. If you are always going to be looking up your data by ID, you might want to put the data into a hash, rather than a 2d array. This will simplify your lookups quite a bit. Hope that helps... _______________ DamnDirtyApe Those who know that they are profound strive for clarity. Those who would like to seem profound to the crowd strive for obscurity. --Friedrich Nietzsche	[reply] [d/l]
Re: Re: Cannot read in multiple lines by chromatic (Archbishop) on Nov 04, 2002 at 06:58 UTC
That works, but I have a few questions. Why read the whole file into an array? Why process each record twice? Why perform a numeric comparison? Why the following line? `$\|++ ;`	[reply] [d/l]
Re: Re: Re: Cannot read in multiple lines by DamnDirtyApe (Curate) on Nov 04, 2002 at 09:28 UTC
Why read the whole file into an array? To prevent the file being read every time the function is called. Of course, it isn't strictly necessary, but it seemed appropriate here. Why process each record twice? If you mean why read each record in and then split them all, I felt it simplified the lookups. You could split a record once it's been matched instead, but this does it all up front. It still runs in order-n time, so unless the table is huge, it shouldn't be a big deal. Why perform a numeric comparison? Because the IDs appeared to be numeric. Is this somehow worse than using `eq`? Why the following line? `$\|++ ;` Part of my emacs template for Perl files; it prevents output buffering. For more info, see perlvar or Suffering from Buffering. If you've got reasons why these methods are poor, I'm all ears. _______________ DamnDirtyApe Those who know that they are profound strive for clarity. Those who would like to seem profound to the crowd strive for obscurity. --Friedrich Nietzsche	[reply] [d/l] [select]
Re:x4 Cannot read in multiple lines by grinder (Bishop) on Nov 04, 2002 at 14:00 UTC
Re: Re: Re: Re: Cannot read in multiple lines by Angel (Friar) on Nov 05, 2002 at 01:00 UTC
Re: Re: Re: Re: Re: Cannot read in multiple lines by DamnDirtyApe (Curate) on Nov 05, 2002 at 02:24 UTC