in reply to Re^2: Code Critique
in thread Code Critique

I'm not sure what you want in this second programming problem. Maybe you could explain it a bit more?

This is a very simple CSV file (no quote characters or embedded quotes within quotes), so just split on the commas and then do another split on whitespace to get the basic value and measurement units from the line. I used what is called a list slice to assign directly to variables without any numeric subscript stuff.

The "formula" for this sort of thing is: open the files, then read and separate the input data into one "record" at a time (a "quantum", if you will, that makes sense for processing). Then either save each record in a collection for later processing, or process the records as you go.

In your first example, I don't think there is a need to save anything past the "current record". For this one, I don't know. Explain more and show more...

#!/usr/bin/perl
use strict;
use warnings;   # the same as #!/usr/bin/perl -w in first line

while (<DATA>)
{
    next if /^\s*$/;  # skip blank lines
                      # sometimes a last blank line
                      # causes problems!
    s/\s*$//;         # or chomp; fine also

    my ($date, $description, $measurement) = (split (/,/,$_))[0,1,3];
    my ($value, $units) = split (/\s+/,$measurement);

    print "date = $date\n",
          "descript = $description\n",
          "value = $value\n",
          "units = $units\n\n";
    # instead of print, do something here ...
}

=prints

date = 20/08/2007
descript = Erythrocyte sedimentation rate
value = 3
units = mm/h

date = 20/08/2008
descript = Total white blood count
value = 6.7
units = 10*9/L

date = 04/04/2007
descript = Haemoglobin estimation 12.9 g/dL
value = 12.9
units = g/dL

=cut

__DATA__
20/08/2007,Erythrocyte sedimentation rate,,3 mm/h
20/08/2008,Total white blood count,,6.7 10*9/L
04/04/2007,Haemoglobin estimation 12.9 g/dL,,12.9 g/dL

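
The script in this post takes the "process as you go" branch of the formula. A minimal sketch of the other branch, saving each record in a collection for later, might look like the following. The @lines array, the @records array and the hash keys are names made up for the illustration; the sample rows are the ones from the post, and a real script would read them from a file instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample rows copied from the post; in real use these would come from a file.
my @lines = (
    '20/08/2007,Erythrocyte sedimentation rate,,3 mm/h',
    '20/08/2008,Total white blood count,,6.7 10*9/L',
    '04/04/2007,Haemoglobin estimation 12.9 g/dL,,12.9 g/dL',
);

my @records;    # one hash reference per input line (hypothetical layout)

foreach my $line (@lines)
{
    next if $line =~ /^\s*$/;    # skip blank lines

    my ($date, $description, $measurement) = (split /,/, $line)[0, 1, 3];
    my ($value, $units) = split /\s+/, $measurement;

    push @records, {
        date        => $date,
        description => $description,
        value       => $value,
        units       => $units,
    };
}

# Later processing can walk the saved records as often as needed.
foreach my $rec (@records)
{
    print "$rec->{date}: $rec->{description} = $rec->{value} $rec->{units}\n";
}
```

Keeping the records costs memory, so it is only worth doing when a later step (sorting, trend fitting) needs to see more than the current record.
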
If a new problem is being discussed, please start a new node.

Re^4: Code Critique
by rhiridflaidd (Novice) on Oct 05, 2010 at 06:57 UTC
    It isn't a problem. What I was trying to do was give examples of the underlying data structures that were giving me the initial headache (but I forgot to show what comes after the headers that you cleverly deduced). The data structure is in the format:
    Patient ID 1234 NAD 5678
    Date Of Birth : 01/01/1965
    Sex : yes
    Postcode : BH78
    (as you guessed) Then goes on to the gubbins
    20/08/2007,Erythrocyte sedimentation rate,,3 mm/h
    20/08/2008,Total white blood count,,6.7 10*9/L
    04/04/2007,Haemoglobin estimation 12.9 g/dL,,12.9 g/dL
    From that I split the data into 3 files. One is a CSV of basic information (anonymised, all done within an encrypted face and DPA registered), and a second file holds identifier, valuename, value. A third file gives a simple list of every test that the script has come across (that's the keys bit of the code). So I need to save the identifier as I go along, and add it to the lines that start with a date. Sorry for not being clear in the first place. But the next step will be to restructure all of this along the lines that you've taught me. Thanks once again.
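
    The "save the identifier as I go along" step can be sketched roughly as below. This is not the poster's script. It assumes that each patient block starts with a line beginning "Patient ID" followed by a number, and that test lines start with a dd/mm/yyyy date; the names @lines, $patient_id, @results and %seen_test are invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed layout: a "Patient ID <number>" header line, then dated test lines.
my @lines = (
    'Patient ID 1234 NAD 5678',
    'Date Of Birth : 01/01/1965',
    '20/08/2007,Erythrocyte sedimentation rate,,3 mm/h',
    '20/08/2008,Total white blood count,,6.7 10*9/L',
);

my $patient_id;    # remembered from the most recent header line
my @results;       # "identifier,valuename,value" rows
my %seen_test;     # every test name met so far

foreach my $line (@lines)
{
    if ($line =~ /^Patient ID\s+(\d+)/)
    {
        $patient_id = $1;                    # a new patient block starts here
    }
    elsif ($line =~ m{^\d\d/\d\d/\d{4},})
    {
        next unless defined $patient_id;     # dated line before any header

        my ($description, $measurement) = (split /,/, $line)[1, 3];
        my ($value) = split /\s+/, $measurement;

        push @results, join(',', $patient_id, $description, $value);
        $seen_test{$description}++;
    }
    # any other header line is ignored in this sketch
}

print "$_\n" for @results;
print "tests seen: ", join('; ', sort keys %seen_test), "\n";
```

    The only state carried between lines is $patient_id, so nothing else has to be saved past the current record.
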
      Great!

      I did write some more code for you to demonstrate subroutines and, I hope, to amplify my point about indenting.

      At some point, you may want to sort these test records by date. You actually have a very good basic date format to work from, as there are leading zeros for the day and month.

      Below I show some code that has 2 subroutines, one to convert the file's date format into something that can be easily sorted as a simple string and one to convert it back. VERY important here is that it is easy to understand what code belongs to what subroutine!

      Clarity of thought and the grouping of those thoughts into logical, readable "units" is the single most important thing that you can do towards writing "great code".

      Go for the improvements and report back with progress! I understand more about your record format in the input file. And Perl has some very cool ways of dealing with record processing like this. But at this point, you need to do more work before things can proceed further. And I'm sure you will do that.

      #!/usr/bin/perl
      use strict;
      use warnings;   # the same as #!/usr/bin/perl -w in first line

      while (<DATA>)
      {
          next if /^\s*$/;  # skip blank lines
          chomp;            # removes trailing \n but not spaces

          my ($date) = split (/[,]/,$_);

          print "date_original = $date\n",
                "date_sortable = ", year_first($date), "\n",
                "original date back = ", day_first(year_first($date)), "\n\n";
      }

      sub year_first   # convert 20/08/2007 to 2007-08-20
      {
          my ($date) = @_;   # one way to get the sub's input value

          my @tokens = split (/\//, $date);
          @tokens = reverse @tokens;
          my $new_date = join('-',@tokens);
          return $new_date;
      }

      sub day_first    # convert 2007-08-20 to 20/08/2007
      {
          my $date = shift;  # another way to get a single value

          my @tokens = split (/-/, $date);
          @tokens = reverse @tokens;
          my $normal_date = join('/',@tokens);
          return $normal_date;
      }

      =prints

      date_original = 20/08/2007
      date_sortable = 2007-08-20
      original date back = 20/08/2007

      date_original = 20/08/2008
      date_sortable = 2008-08-20
      original date back = 20/08/2008

      date_original = 04/04/2007
      date_sortable = 2007-04-04
      original date back = 04/04/2007

      =cut

      __DATA__
      20/08/2007,Erythrocyte sedimentation rate,,3 mm/h
      20/08/2008,Total white blood count,,6.7 10*9/L
      04/04/2007,Haemoglobin estimation 12.9 g/dL,,12.9 g/dL
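
      The code in this post converts the dates but stops short of the sort itself. As a rough sketch of that last step, the year-first form can be used as a plain string sort key. The @lines and @sorted arrays are names assumed for the example, and year_first is a compact restatement of the subroutine from the post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample rows copied from the post.
my @lines = (
    '20/08/2007,Erythrocyte sedimentation rate,,3 mm/h',
    '20/08/2008,Total white blood count,,6.7 10*9/L',
    '04/04/2007,Haemoglobin estimation 12.9 g/dL,,12.9 g/dL',
);

# Same idea as year_first in the post: 20/08/2007 becomes 2007-08-20.
sub year_first
{
    my ($date) = @_;
    return join '-', reverse split /\//, $date;
}

# Compare the date field (the text before the first comma) in year-first
# form; a simple string comparison then gives chronological order.
my @sorted = sort {
    year_first((split /,/, $a)[0]) cmp year_first((split /,/, $b)[0])
} @lines;

print "$_\n" for @sorted;
```

      This recomputes the key on every comparison, which is fine for small files. For large ones, computing the key once per line and sorting on the stored key would be the usual refinement.
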

        Thank you. My second bit of code, which I've alluded to, is what does the more intense processing of this data. The point of this is to give the data a common format. Once I have that common format, I'm doing a lot more with it, e.g.:

        - Working out trends (using least squares for now)

        - Sorting by date (I added an epoch datafield and sorted from that key as an approach.)

        - Exporting it to CSV and a graphing tool.
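
        For illustration only, the epoch sort key and the least-squares trend from the list above could be sketched as follows. This is not the poster's code: to_epoch and ls_slope are hypothetical helper names, the readings are made-up numbers, and the dates are treated as UTC midnight via timegm from the core Time::Local module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Convert dd/mm/yyyy to epoch seconds (UTC midnight).
sub to_epoch
{
    my ($date) = @_;
    my ($day, $month, $year) = split /\//, $date;
    return timegm(0, 0, 0, $day, $month - 1, $year);   # month is 0-based
}

# Ordinary least-squares slope of y against x.
sub ls_slope
{
    my ($xs, $ys) = @_;
    my $n = @$xs;
    my ($sx, $sy, $sxy, $sxx) = (0, 0, 0, 0);

    for my $i (0 .. $n - 1)
    {
        $sx  += $xs->[$i];
        $sy  += $ys->[$i];
        $sxy += $xs->[$i] * $ys->[$i];
        $sxx += $xs->[$i] ** 2;
    }

    my $denominator = $n * $sxx - $sx ** 2;
    return if $denominator == 0;    # fewer than two distinct x values
    return ($n * $sxy - $sx * $sy) / $denominator;
}

# Made-up readings, deliberately out of date order.
my @readings = (
    [ '21/01/2007', 12 ],
    [ '01/01/2007', 10 ],
    [ '11/01/2007', 11 ],
);

# Sort on the epoch key, then measure x in days since the first reading.
my @by_date = sort { $a->[0] <=> $b->[0] }
              map  { [ to_epoch($_->[0]), $_->[1] ] } @readings;
my @days    = map { ($_->[0] - $by_date[0][0]) / 86_400 } @by_date;
my @values  = map { $_->[1] } @by_date;

my $slope = ls_slope(\@days, \@values);
printf "trend: %.3f units per day\n", $slope;
```

        Measuring x in days from the first reading, rather than in raw epoch seconds, keeps the sums small and gives the slope a readable unit.
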

        My home's upside down at the moment with my study being repainted, and my PC needs rebuilding, so I haven't had much time to restructure yet. Hopefully I'll have time this weekend. Here's an example of the end output.

        http://www.flickr.com/photos/37312673@N05/5059015903/