Great!

I did write some more code for you to demonstrate subroutines and, I hope, to amplify my point about indenting.

At some point, you may want to sort these test records by date. You actually have a very good basic date format to work from, as there are leading zeros for the day and month.

Below I show some code that has two subroutines: one to convert the file's date format into something that can be easily sorted as a simple string, and one to convert it back. VERY important here is that it is easy to see what code belongs to what subroutine!

Clarity of thought and the grouping of those thoughts into logical, readable "units" is the single most important thing that you can do towards writing "great code".

Go for the improvements and report back with progress! I understand more about your record format in the input file. And Perl has some very cool ways of dealing with record processing like this. But at this point, you need to do more work before things can proceed further. And I'm sure you will do that.

#!/usr/bin/perl
use strict;
use warnings; # the same as #!/usr/bin/perl -w in first line

while (<DATA>)
{
    next if /^\s*$/;   # skip blank lines
    chomp;             # removes trailing \n but not spaces
    my ($date) = split (/[,]/,$_);
    print "date_original = $date\n",
          "date_sortable = ", year_first($date), "\n",
          "original date back = ", day_first(year_first($date)), "\n\n";
}

sub year_first   # convert 20/08/2007 to 2007-08-20
{
    my ($date) = @_;   # one way to get the sub's input value
    my @tokens = split (/\//, $date);
    @tokens = reverse @tokens;
    my $new_date = join('-',@tokens);
    return $new_date;
}

sub day_first   # convert 2007-08-20 to 20/08/2007
{
    my $date = shift;  # another way to get a single value
    my @tokens = split (/-/, $date);
    @tokens = reverse @tokens;
    my $normal_date = join('/',@tokens);
    return $normal_date;
}

=prints

date_original = 20/08/2007
date_sortable = 2007-08-20
original date back = 20/08/2007

date_original = 20/08/2008
date_sortable = 2008-08-20
original date back = 20/08/2008

date_original = 04/04/2007
date_sortable = 2007-04-04
original date back = 04/04/2007

=cut

__DATA__
20/08/2007,Erythrocyte sedimentation rate,,3 mm/h
20/08/2008,Total white blood count,,6.7 10*9/L
04/04/2007,Haemoglobin estimation 12.9 g/dL,,12.9 g/dL
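
Once a date is in year-first form, sorting by date is just a plain string comparison. Here is a minimal sketch of that idea, assuming each record is one comma-separated line with the date in the first field, as in the sample data. The names `date_key` and `sort_by_date` are made up for illustration and are not part of your script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same conversion as year_first in the post: 20/08/2007 -> 2007-08-20
sub year_first
{
    my ($date) = @_;
    return join('-', reverse split(/\//, $date));
}

# Hypothetical helper: pull the date out of a record and make it sortable.
sub date_key
{
    my ($record) = @_;
    my ($date) = split(/,/, $record);   # first comma-separated field
    return year_first($date);
}

# Hypothetical helper: return the records ordered oldest first.
sub sort_by_date
{
    my @records = @_;
    return sort { date_key($a) cmp date_key($b) } @records;
}

my @records = (
    '20/08/2008,Total white blood count,,6.7 10*9/L',
    '20/08/2007,Erythrocyte sedimentation rate,,3 mm/h',
    '04/04/2007,Haemoglobin estimation 12.9 g/dL,,12.9 g/dL',
);

print "$_\n" for sort_by_date(@records);
```

This recomputes the key on every comparison, which is fine for a small file. For a large one you would probably want to compute each key once and sort on the stored keys instead.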

Re^6: Code Critique
by rhiridflaidd (Novice) on Oct 08, 2010 at 09:42 UTC

    Thank you. My second bit of code, which I've alluded to, is what does the more intensive processing of this data. The point of this first script is to give the data a common format. Once I have that common format, I do a lot more with it, e.g.:

    - Working out trends (using least squares for now)

    - Sorting by date (I added an epoch datafield and sorted from that key as an approach.)

    - Exporting it to CSV and a graphing tool.

    My home's upside down at the moment with my study being repainted, and my PC needs rebuilding, so I haven't had much time to restructure yet. Hopefully I'll have time this weekend. Here's an example of the end output.

    http://www.flickr.com/photos/37312673@N05/5059015903/
      I looked at your graph. Nice. I would be interested in knowing which tool you used for it. The graph is labeled PSA, perhaps Prostate Specific Antigen?

      Getting off the topic of Perl, but here are some things to consider for the presentation. What's normal or not for PSA is based upon prostate size, not some absolute value. Normal tissue produces 0.066 ng/mL/cc of prostate volume. Basically any value above 2.5 is worthy of some investigation as to the cause, perhaps BPH (Benign Prostatic Hyperplasia) etc. Evidence of disease is an exponential growth rate, like 2, 4, 8, 16.

      Plot log(PSA) vs time. The logarithmic plot makes it easier to discern trends, and many biological markers exhibit this type of growth. If the log plot is a straight line going upwards and to the right, then the slope of that line tells you how fast things are progressing. From the slope of the log line, you can determine the "doubling rate", i.e. how long it takes for the tumor to double. That is a key prognostic factor and would help you quantify "getting worse fast".
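
A rough sketch of the arithmetic behind that: if log(value) follows a straight line with slope b per unit of time, then the doubling time is log(2)/b. The code below fits the slope by ordinary least squares on the log values. The helper names `ls_slope` and `doubling_time` are made up for illustration, and the readings are invented: they are just the 2, 4, 8, 16 series, assumed to be spaced six months apart.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: least-squares slope of y against x.
# Takes two array refs of equal length (at least two distinct x values).
sub ls_slope
{
    my ($x, $y) = @_;
    my $n = @$x;
    my ($sx, $sy, $sxx, $sxy) = (0, 0, 0, 0);
    for my $i (0 .. $n - 1)
    {
        $sx  += $x->[$i];
        $sy  += $y->[$i];
        $sxx += $x->[$i] ** 2;
        $sxy += $x->[$i] * $y->[$i];
    }
    return ($n * $sxy - $sx * $sy) / ($n * $sxx - $sx ** 2);
}

# Hypothetical helper: doubling time in the same units as the times given.
# Returns undef when the fitted slope is not positive (nothing is doubling).
sub doubling_time
{
    my ($times, $values) = @_;
    my @logs  = map { log($_) } @$values;   # Perl's log is the natural log
    my $slope = ls_slope($times, \@logs);
    return $slope > 0 ? log(2) / $slope : undef;
}

# Invented readings: time in months, value doubling every 6 months.
my @months = (0, 6, 12, 18);
my @psa    = (2, 4, 8, 16);

printf "doubling time = %.1f months\n", doubling_time(\@months, \@psa);
```

With these invented readings the fitted doubling time comes out at 6 months, as it should. Real data will be noisy, so the fit gives an estimate rather than an exact figure.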

      If you are writing code that is not-for-profit and can be shared amongst institutions for the general benefit of all, I will help you with this project and perhaps other Monks would also. Send a /msg to me and we'll discuss further off-line.

      I apologize for the non-Perl part of this thread. But perhaps the tip about plotting log values instead of a linear plot could be counted as relevant.

        Sorry for being away. Real life intervened. The graphs are done using Visifire. PSA is indeed prostate specific antigen. This tool works for any value though, and is designed to look at biomedical data. As you point out, the doubling rate, or just the rate of increase, is a massively useful value. Plotting log values makes the graphs easier for us, but risks making them more difficult for the end user. I was planning to open source all of this as soon as I got it into a fit state for others to understand.