<Samuel Jackson Voice>
Allow me to retort!
</Samuel Jackson Voice>

This code is brilliant++ and I am humbled. Since this is listed as a follow-up to my post, I suppose it's only fitting that I post the spoiler :)

To understand what the program is supposed to do, try man wc.
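
If you don't have the man page handy, the three counts can be sketched in plain C. This is only an illustration of what wc reports, not the obfuscated program itself: the struct and function names are hypothetical, and the whitespace set is an assumption taken from the character class discussed in the spoiler below.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper type, not part of the original program. */
struct wc_counts {
    long lines;
    long words;
    long bytes;
};

/* Assumed whitespace set: space, tab, newline, form feed,
   carriage return, vertical tab. */
static int is_wc_space(unsigned char c)
{
    return c != '\0' && strchr(" \t\n\f\r\v", c) != NULL;
}

/* Count lines, words, and bytes in a buffer of known length. */
static struct wc_counts wc_buffer(const char *buf, size_t len)
{
    struct wc_counts n = {0, 0, 0};
    int in_word = 0;  /* same flag idea the spoiler tracks in $i[1] */

    assert(buf != NULL);
    for (size_t k = 0; k < len; ++k) {
        unsigned char c = (unsigned char)buf[k];
        ++n.bytes;                 /* every byte counts toward the size */
        if (c == '\n')
            ++n.lines;             /* a line is counted per newline */
        if (is_wc_space(c)) {
            if (in_word) {         /* a word just ended */
                ++n.words;
                in_word = 0;
            }
        } else {
            in_word = 1;           /* inside a word */
        }
    }
    if (in_word)                   /* input ended in the middle of a word */
        ++n.words;
    return n;
}
```

Calling wc_buffer("hello world\nfoo", 15) gives 1 line, 3 words, and 15 bytes. The last word is only counted by the check after the loop, which is the same "extra word" case the spoiler handles.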

Note: to write this spoiler, I used this snippet:

    perl -MO=Deparse wc.c > deparse.pl

When figuring out the code, I cleaned up the formatting and removed the "useless" code, so your output will be different from mine.

To get the C code, I used this:

    gcc wc.c -E > wc.txt

Highlight the following section to see the code.

# note the comma. This will unshift the first command line
# argument onto @ARGV and set $_
unshift @ARGV, $_ = $ARGV[0];

# clear $_ if its length is not 2
$_ = '' if length $_ != 2;

# if the first argument is -p or -P, print The Perl Journal
if (/-p/i)
{
    # \cH is a backspace
    printf "*\cHThe Perl Journal\n";
    exit 0;
}

# set $i[11] to the length of @ARGV (remember, this is one greater
# than the number of arguments) and set $i[10] to the original length
# (number of files on the command line)
$i[10] = ($i[11] = scalar @ARGV) - 1;


# check the while{} at the end
# this will execute once for each file
do
{
    # if we had no command line arguments, we need to set $i[10] to 1 to
    # ensure the loop will exit, and we need to read our input from STDIN.
    if ($i[11] < 2)
    {
        $i[10] = 1;
        *F = *STDIN;
    }
    # if we had arguments, we want to open each file in turn. Note that
    # O_RDONLY is a filehandle, not a constant. Also note that with the
    # while at the end of this loop, $i[10] is being decremented, so we
    # can loop through the files this way.
    else
    {
        open O_RDONLY, $ARGV[$i[11] - $i[10]];
        *F = *O_RDONLY;
    }
    # read one byte at a time from the file, until EOF.
    while (read(F, $i, 1) > 0)
    {
        # increment $i[4] by one (thus, this will be the file size)
        ++$i[4];

        # set $_ to whatever byte we read
        $_ = $i;

        # if $_ is a newline, then the match will return 1, thus setting
        # $i[3] to the number of lines in the file.  ( *pp^0x0A) is
        # superfluous in the Perl program, but it is used in the C program.
        # My C is pretty rusty, but here goes:
        #    i[3] += m = ( *pp^0x0A ) ? 0 : 1;
        #    pp has been set to i, the last character read.  Here, we do an
        #    XOR with 0x0A (a newline character). If any bits are set, we
        #    know it's not a newline, so m is set to 0, else m is set to 1.
        #    i[3] is incremented by m.
        # The /* in the regex appears to be an artifact left over from an
        # embedded comment in the obfu:
        #    $i[3]+=m=( *pp^0x0A)?/*\n=;#*/0:1;
        $i[3] += m[( *pp^0x0A)?/*\n];

#-------------------------------------
# The following section is rather confusing.  It is a word count.  It works
# by setting $i[1] to a true value when it encounters a non-whitespace
# character and then incrementing $i[2] by one when it next encounters a
# whitespace character (whitespace as defined by the character class in
# the match)

    # Again, we see the /* as an artifact from the original file:
    #     if(m=/*[  \n\f\r\xB]=#*/q
    # If we match a space, newline, formfeed, carriage return, or vertical
    # tab (\xB is 0x0B, not control-B). I believe one of the two leading
    # spaces is where the embedded tab should be, but on my system, it was
    # transformed to a space.
        if (m[/*[  \n\f\r\xB]])
        {
            # if we've set $i[1], then we want to increment $i[2] by one
            # and reset $i[1] to 0 (false)
            if ($i[1])
            {
                ++$i[$i[1]];
                $i[1] = 0;
            }
        }
        # if we didn't match, set $i[1] to 2 (which is the index of the
        # array element we wish to increment for counting words).  For
        # the most part, this means, "set this variable if we have a
        # non-whitespace character"
        else
        {
            $i[1] = 2;
        }
    }
#-------------------------------------

    # if we got this far and $i[1] is true (it will be set to 2), then we
    # have an extra word that we didn't account for, so we add 1 to the
    # word count
    if ($i[1])
    {
        ++$i[$i[1]];
    }

    # print number of lines, word count, file size, and the name of the file
    printf "%7d %7d %7d %s\n", $i[3], $i[2], $i[4], $ARGV[$i[11] - $i[10]];
    close F;

    # if we had more than one argument, we need to total the results
    if ($i[11] > 2)
    {
        # This adds whatever is in @i[2..4] to the running totals in
        # @i[6..8] and then resets each per-file value.  When we get to
        # $i[5], it's never been set and is evaluated as zero.  This causes
        # the entire expression '$i[$i[1] + 4] += $i[$i[1]]' to return a
        # zero, evaluating as false and thus terminating the loop.
        for ($i[1] = 2; $i[$i[1] + 4] += $i[$i[1]]; ++$i[1])
        {
            $i[$i[1]] = 0;
        }
        $i[1] = 0;
    }
} while --$i[10];

# if we had more than one argument, we need to print the results
if ($i[11] > 2)
{
    printf "%7d %7d %7d total\n", $i[7], $i[6], $i[8];
}

I'm not going to break it down, but if you know any C, this should be relatively easy to follow by tracing through the above code. The logic is the same (though there are a few parts that I don't get).

One difference is that when you pass the C version -p as the first argument, it prints "Hello, world!\n" instead of "The Perl Journal".
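
The check behind that -p behavior is written in the C listing below as a nested ternary, but it seems to reduce to something much smaller. A sketch, with a hypothetical function name, assuming m is 1 and p is -1 as they are initialized in the listing:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: the -p test from the listing, minus the noise.
   With m == 1 and p == -1, "m+-p" is always 2 (true), and the inner
   "p+i ? 1 : 1" yields 1 on both branches, so only the middle
   condition matters. */
static int wants_banner(int argc, char *argv[])
{
    assert(argv != NULL);
    return argc > 1 && strcmp(argv[1], "-p") == 0;
}
```

Note that strcmp is case-sensitive, so the C side only reacts to -p, while the Perl match /-p/i also accepts -P.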

#include <sys/types.h>
#include <sys/stat.h>
#include <stdio.h>
#include <fcntl.h>

main(int argc, char *argv[])
{
  int m=1, i[14];
  char * pp;
  int p=-1;
  int q, F=3;
  char * qq = "Hello\, world!\n";
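  /* not a red herring: with 32-bit little-endian ints these are the
     bytes "\v\n\t \r\f", the whitespace set the qq loop scans */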
  i[12]=537463307; i[13]=3085;

  if (m+-p?(argc>1&&!strcmp(argv[1],"-p"))?p+i? 1 : 1 :  0 :  0)
  {
    printf(qq); exit(0);
  }
  qq="=;#"; argv[0][0]='\0';
  memset(i,0,48);
  i[10]=(i[11]=(q =0) + argc)-1;

  do{
    if(i[11]<2)
    {
      i[10]=1; q =F=0;
    }
    else
    {
      open(  argv[i[11] - i[10]], 0  ) ;
    }
    while(read(F, i , 1)>0)
    {
      ++i[4]^(q=0);
      pp=i;
      i[3] += m=(*pp^0x0A)? 0:1;
      for(qq=&i[12];*qq;*pp^*qq++||(q=1));
      if(m=q)
      {
        if(i[1])
        {
          i[i[1]]++;
          i[1]=0;
        }
      }
      else
      {
        i[1]=2;
      }
    }

    if(i[1])
    {
      i[i[1]]++;
    };
    printf("%7d %7d %7d %s\n", i[3], i[2], i[4], argv[i[11]-i[10]]);
    close(F);
    if(i [11]>2)
    {
      for( i[1]=2;i[i[1]+4]+=i[i[1]];i[1]++)
      {
        i[i[1]]=0;
      };i [1]=0;
    }
  } while(-- i [10]);

  if(i[11]>2)
  {
    printf("%7d %7d %7d total\n",i [7],i [6],i [8]);
  }
}
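
The two constants stored in i[12] and i[13] look arbitrary, but they appear to be the whitespace table that the for(qq=&i[12];...) loop walks. A small sketch to check this; the function name is hypothetical, and it assumes a 32-bit, little-endian int:

```c
#include <assert.h>
#include <string.h>

/* Copy the bytes of the two constants from the listing into out.
   ASSUMPTION: int is 32 bits and little-endian; on other platforms
   the bytes come out in a different order. */
static void ws_table(char out[8])
{
    int table[2] = {537463307, 3085};  /* i[12] and i[13] in the listing */

    assert(sizeof(int) == 4);
    memcpy(out, table, sizeof table);
}
```

On such a platform, 537463307 is 0x20090A0B and 3085 is 0x0C0D, so out holds vertical tab, newline, tab, space, carriage return, form feed, and then NUL bytes. That is the same set as the character class in the Perl version. It also explains memset(i,0,48): with 4-byte ints that clears only i[0] through i[11] and leaves the table alone.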

Cheers,
Ovid

Join the Perlmonks Setiathome Group or just click on the link and check out our stats.

In reply to (Ovid - Spoiler) Re: C is Perl by Ovid
in thread C is Perl by BooK
