So, the Java 1.4 documents are beginning to come out... and they are incredibly excited about the new regular expression support and just how *easy* string processing is getting in Java. As an example, here is the program the documentation suggests for creating a histogram of all the words in a file:

 import java.io.*;
 import java.nio.*;
 import java.nio.channels.*;
 import java.nio.charset.*;
 import java.util.*;
 import java.util.regex.*;

 public class WordCount {
  public static void main(String[] args) throws Exception {
    String filename = args[0];

    // Map File from filename to byte buffer
    FileInputStream input = new FileInputStream(filename);
    FileChannel channel = input.getChannel();
    int fileLength = (int)channel.size();
    // The released java.nio API names the read-only mode MapMode.READ_ONLY
    MappedByteBuffer buffer =
      channel.map(FileChannel.MapMode.READ_ONLY, 0, fileLength);

    // Convert to character buffer
    Charset charset = Charset.forName("ISO-8859-1");
    CharsetDecoder decoder = charset.newDecoder();
    CharBuffer charBuffer = decoder.decode(buffer);

    // Create line pattern
    Pattern linePattern = Pattern.compile(".*$",
      Pattern.MULTILINE);

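    // Create word-break pattern (whitespace or punctuation);
    // the POSIX classes need the \p prefix to be read as classes
    Pattern wordBreakPattern =
      Pattern.compile("[\\p{Space}\\p{Punct}]");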

    // Match line pattern to buffer
    Matcher lineMatcher = 
    linePattern.matcher(charBuffer);

    Map map = new TreeMap();
    Integer ONE = new Integer(1);

    // For each line
    while (lineMatcher.find()) {
      // Get line
      CharSequence line = lineMatcher.group();

      // Get array of words on line
      String words[] = wordBreakPattern.split(line);

      // For each word
      for (int i=0, n=words.length; i<n; i++) {
        if (words[i].length() > 0) {
          Integer frequency = 
          (Integer)map.get(words[i]);
          if (frequency == null) {
            frequency = ONE;
          } else {
            int value = frequency.intValue();
            frequency = new Integer(value + 1);
          }
          map.put(words[i], frequency);
        }
      }
    }
    System.out.println(map);
  }
 }

Ok... I don't know about you, but if I were a maintenance coder and I was presented with this snippet, I don't think I'd know what to do! Cognitive psychology tells us that the human mind can hold, on average, about 7 units of information at once (the classic "seven, plus or minus two" result). *This* particular program has *considerably* more than 7 logical atoms of information, which makes it larger than can be held in the mind at one moment. So, let's look at a program that duplicates this functionality in, say... Perl. Now, I know that Perl isn't the be-all and end-all of languages, but:

 #!/usr/bin/perl -w

 use strict;

 my %frequency = ();

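 # <> is in list context inside map, so every line of every file is read;
 # grep drops the empty string split returns for a leading non-word character
 $frequency{$_}++ for grep { length } map { split /\W+/ } <>;
 print "$_: $frequency{$_}\n" for sort keys %frequency;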

This program has variable declaration checking, handles multiple files on the command line, etc. Thanks to use strict and -w, there is a relatively strong guarantee that I'm not making any of the "mistakes" that are common with "interpreted" VHLLs. (I know Perl is not *really* interpreted, it's a hybrid, but people lump it in with the "interpreted" languages.)

Now, tell me... is that not a *lot* easier to comprehend? And more importantly, if you were a maintenance coder, would you not prefer to have to understand these 2 lines of code, rather than the chunk of Java? All language bigotry aside (and yes, Perl has some serious flaws), I'm beginning to see the beauty of VHLLs more and more every day. It's such a pleasure to be able to *express* my program, rather than dictate it.
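
For anyone who wants to line the two versions up, the core of the Perl program (split on non-word characters, bump a counter per word) can be sketched in Java roughly as below. This is only an illustration, not the code from the 1.4 documentation: it assumes a later Java with generics and `Map.merge`, the names `WordHistogram` and `countWords` are made up, and file reading is left out so the counting step stands alone.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class, named here only for illustration.
class WordHistogram {

    // Count how often each word occurs in the text. A "word" is a run of
    // \w characters, mirroring the Perl version's split on non-word characters.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> frequency = new TreeMap<>(); // sorted by word
        for (String word : text.split("\\W+")) {
            // split yields a leading empty string when the text
            // starts with a non-word character, so skip it
            if (!word.isEmpty()) {
                frequency.merge(word, 1, Integer::sum);
            }
        }
        return frequency;
    }

    public static void main(String[] args) {
        // prints {and=1, cat=1, hat=1, the=2}
        System.out.println(countWords("the cat and the hat"));
    }
}
```

The counting loop is the same shape as the Perl one-liner; everything around it is the ceremony the post is complaining about.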

In reply to Efficiency in maintenance coding... by eduardo
