in reply to Logfile parsing across redundant files

Certainly, in Unix, I could do something like:

1) Concatenate files into a single file

2) Then do: `sort -u <concatfile> > <sortedfile>`

... but I suspect this will eventually live on a Windows box.

Get yourself the sort program from UnxUtils. It supports -u, which appears to be all you need to use your Unix solution on a Windows box:

```
c:\>u:sort --help
Usage: u:sort [OPTION]... [FILE]...
Write sorted concatenation of all FILE(s) to standard output.

Ordering options:

Mandatory arguments to long options are mandatory for short options too.
  -b, --ignore-leading-blanks  ignore leading blanks
  -d, --dictionary-order      consider only blanks and alphanumeric characters
  -f, --ignore-case           fold lower case to upper case characters
  -g, --general-numeric-sort  compare according to general numerical value
  -i, --ignore-nonprinting    consider only printable characters
  -M, --month-sort            compare (unknown) < `JAN' < ... < `DEC'
  -n, --numeric-sort          compare according to string numerical value
  -r, --reverse               reverse the result of comparisons

Other options:

  -c, --check               check whether input is sorted; do not sort
  -k, --key=POS1[,POS2]     start a key at POS1, end it at POS 2 (origin 1)
  -m, --merge               merge already sorted files; do not sort
  -o, --output=FILE         write result to FILE instead of standard output
  -s, --stable              stabilize sort by disabling last-resort comparison
  -S, --buffer-size=SIZE    use SIZE for main memory buffer
  -t, --field-separator=SEP use SEP instead of non- to whitespace transition
  -T, --temporary-directory=DIR  use DIR for temporaries, not $TMPDIR or c:/temp
                              multiple options specify multiple directories
  -u, --unique              with -c: check for strict ordering
                              otherwise: output only the first of an equal run
  -z, --zero-terminated     end lines with 0 byte, not newline
      --help     display this help and exit
      --version  output version information and exit

POS is F[.C][OPTS], where F is the field number and C the character position
in the field.  OPTS is one or more single-letter ordering options, which
override global ordering options for that key.  If no key is given, use the
entire line as the key.

SIZE may be followed by the following multiplicative suffixes:
% 1% of memory, b 1, K 1024 (default), and so on for M, G, T, P, E, Z, Y.

With no FILE, or when FILE is -, read standard input.

*** WARNING ***
The locale specified by the environment affects sort order.
Set LC_ALL=C to get the traditional sort order that uses
native byte values.

Report bugs to <bug-textutils@gnu.org>.
```

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

Re^2: Logfile parsing across redundant files
by thezip (Vicar) on Feb 02, 2007 at 06:51 UTC

    Point well taken, TYVM.

    Now what's the Perlish way to solve this?

    Where do you want *them* to go today?

      On the basis of what you've said about the data, it could be as simple as this:

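```perl
#! perl -slw
use strict;

my $dir = $ARGV[ 0 ] || die 'Need a directory';

my %hash;
while( my $file = <"$dir/*.log"> ) {
    next if $file eq "$dir/composite.log";    # skip our own output on re-runs
    open my $fh, '<', $file or die "$file : $!";
    while( <$fh> ) {
        # Without -n/-p, -l only sets $\ for print; input lines keep
        # their newline, so chomp them or the output gets blank lines.
        chomp;
        $hash{ $_ } = 1;    # one hash entry per distinct line
    }
    close $fh;
}

open my $fh, '>', "$dir/composite.log" or die $!;
print $fh $_ for sort keys %hash;    # -l appends "\n" to each print
close $fh;
```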
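      For anyone who wants to reuse or test the same hash-based de-duplication, here is a minimal sketch that wraps it in a sub. The name `merge_unique` is hypothetical (it isn't from the thread), and the sketch additionally assumes that a trailing CR, as left by files moved between Windows and Unix, should not make two otherwise identical lines count as different.

```perl
use strict;
use warnings;

# Hypothetical helper: read every file named in @files, keep each
# distinct line once (as a hash key), and return them string-sorted.
sub merge_unique {
    my @files = @_;
    my %seen;
    for my $file (@files) {
        open my $fh, '<', $file or die "$file : $!";
        while ( my $line = <$fh> ) {
            $line =~ s/\r?\n\z//;    # drop LF or CRLF so duplicates compare equal
            $seen{$line} = 1;
        }
        close $fh;
    }
    return sort keys %seen;
}
```

      Usage would be along the lines of `print $out "$_\n" for merge_unique( glob "$dir/*.log" );`, where `$out` is a handle already opened on the composite file.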

      This assumes that all 31 log files from a particular server are located in a single directory, that no other .log files are in that directory, and that the lines can be ordered with a plain string sort. E.g., each line begins with a date/time stamp in some sensible form (YYYYMMDD HH:MM:SS) that sorts correctly as a string.
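      To see why the stamp format matters, here is a small illustrative sketch. The sample timestamps are made up, and `to_sortable` is a hypothetical helper showing one way to rewrite a month-first (MM/DD/YYYY) prefix into the YYYYMMDD form before sorting.

```perl
use strict;
use warnings;

# Zero-padded, most-significant-field-first stamps sort chronologically
# under Perl's default string sort.
my @iso = ( '20070202 06:51:00', '20070131 23:59:59', '20070201 00:00:01' );
my @iso_sorted = sort @iso;
# ('20070131 23:59:59', '20070201 00:00:01', '20070202 06:51:00')

# Month-first stamps compare by month before year, so a 2007 entry
# lands ahead of a 2006 one.
my @us = ( '12/31/2006 23:59:59', '01/15/2007 10:00:00' );
my @us_sorted = sort @us;
# ('01/15/2007 10:00:00', '12/31/2006 23:59:59')

# Hypothetical helper: rewrite a leading MM/DD/YYYY as YYYYMMDD.
sub to_sortable {
    my ($line) = @_;
    $line =~ s{^(\d\d)/(\d\d)/(\d{4})}{$3$1$2};
    return $line;
}
my @converted = map { to_sortable($_) } @us;
my @fixed = sort @converted;
# ('20061231 23:59:59', '20070115 10:00:00')
```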



        Which illustrates my question: Is using the entire line as the hash key *better* than using an MD5 digest of the line as the hash key?
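        One way to weigh the two is to write the digest-keyed version. The sketch below is hypothetical (the sub name `merge_unique_md5` is illustrative, and it takes a list of lines rather than files to keep it short); it uses the core `Digest::MD5` module's `md5` function, which returns a 16-byte binary digest. The distinct lines still have to be kept somewhere so they can be sorted and written out, so a digest key saves memory only in the hash keys, and only for lines much longer than 16 bytes. It also adds a digest computation per line and a (for ordinary log data, vanishingly small) risk that two different lines collide and one is silently dropped.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

# Hypothetical variant: key %seen by the 16-byte binary MD5 of each
# line instead of by the line itself. The first occurrence of each
# line is kept in @unique so it can be sorted and written out later.
sub merge_unique_md5 {
    my @lines = @_;
    my ( %seen, @unique );
    for my $line (@lines) {
        push @unique, $line unless $seen{ md5($line) }++;
    }
    return sort @unique;
}
```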

Re^2: Logfile parsing across redundant files
by shenme (Priest) on Feb 03, 2007 at 19:07 UTC
    Having hit this just last night, I thought I'd ask: how long has UnxUtils actually been unavailable? Clicking either of the .zip links gets you "You don't have permission to access /UnxUpdates.zip on this server."

    I'd love to equip my Windows-using brethren with some basic tools (like 'wc'), but I don't want to ask them to install Cygwin.

      Hmm. I hadn't realised that the links didn't work. I just searched for them, but didn't check them, as I've had my copies for years. A quick browse doesn't turn up any reason either.

      If you want a copy, /msg me an email address and I'll forward it to you.

