2

I just wrote the most contorted command I've ever written and I want to know how I may make it better.

I wrote this:

grep -E '00[7-9]\.|0[1-9][0-9]\.' filename.log | awk '{print $6}' | sed 's/\(.*\):.*/\1/' | sort | uniq -c | sort -rn

An example input:

2011/06/30 07:59:43:81 20626 code_file.c (252): FunctionName: 009.63 seconds

Basically what it's doing is going through a log file that lists the number of seconds it took each command to execute and grabbing any entries that took between 7 and 99 seconds. Then awk prints the sixth field, which is the function name followed by a colon. Then sed strips the trailing colon, and the result is sorted, counted, and sorted again by count.
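
For illustration, here's what the awk and sed stages produce when the sample line above is the only line in the file:

$ awk '{print $6}' filename.log
FunctionName:
$ awk '{print $6}' filename.log | sed 's/\(.*\):.*/\1/'
FunctionName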

I'm on HP-UX so some of my tools are limited, but I know that awk can do what I just did with sed. Can someone help me de-complicate my command?

Pylsa
  • 31,383
Malfist
  • 3,119

4 Answers

3
awk '/00[7-9]\.|0[1-9][0-9]\./ {  # for lines matching the regex
       split($6, parts, /:/)      # take the part of field 6 before the colon
       cs[ parts[1] ]++           # and increment the counter for that string
     }
     END {                        # after all lines have been read
       for (name in cs) {         # step through the counters
         print cs[name], name     # and output the count followed by the string
                                  #   ("," adds a space automatically)
       }
     }' filename.log | sort -rn   # standard awk doesn't support sorting, sadly

I continue to be amazed at the number of people who apparently believe that neither awk nor sed can do pattern matching, so they have to add a grep invocation.
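
For comparison, the same idea also works as a one-liner that keeps the sort | uniq -c structure of the question but lets awk do both the matching and the colon-stripping (a sketch using only POSIX awk features, which the HP-UX awk should have):

awk '/00[7-9]\.|0[1-9][0-9]\./ { sub(/:$/, "", $6); print $6 }' filename.log |
    sort | uniq -c | sort -rn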

geekosaur
  • 12,091
1

I'm so going to be downvoted for this...

#!/usr/bin/env perl
use strict;

my %counts;
while (my $line = <>) {
    my @line = split(/\s+/, $line);
    # field 6 (0-based) holds the seconds value, field 5 the "FunctionName:" token
    if ($line[6] >= 7) {
        # drop the trailing colon and count the function name
        $line[5] =~ /(.+):/ and $counts{$1}++;
    }
}

# sort the function names by descending count
my @sorted = sort {$counts{$b} <=> $counts{$a}} keys %counts;

printf("%7d\t%s\n", $counts{$_}, $_) for @sorted;
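
Usage is the same as for the pipeline: the <> operator reads any filenames given as arguments (or stdin if there are none). slow.pl is just a placeholder name here:

perl slow.pl filename.log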
grawity
  • 501,077
1

Your command is a bit brittle, as it will fail if the filename has a space in it. Otherwise, your command is actually not too bad. It's somewhat a matter of taste, but I find a chain of simple piped commands much easier to grok than one complex command, such as the large awk someone posted. It's almost like programming in a functional style.

You could, however, change the grep to eliminate the awk and sed, but now the regex is much harder to understand:


grep -P -o '(?<=\): ).+?(?=: 00[7-9]\.|: 0[1-9][0-9]\.)' filename.log | sort | uniq -c | sort -nr

To explain the regex: we use a Perl-style regex (the -P option) with a look-behind ((?<=)) and a look-ahead ((?=)) to isolate the match to exactly the function name. Note that the look-behind and look-ahead are zero-width, meaning they aren't considered part of the match, but they control what the match will be. Since the match is now exactly the function name, we can use -o to tell grep to print only the matching string rather than the entire line. I think you should leave what you have, unless you think a filename with spaces is a possibility.
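
To see the effect on the sample line from the question (assuming GNU grep, since -P is a GNU extension and may not be available on HP-UX):

$ echo '2011/06/30 07:59:43:81 20626 code_file.c (252): FunctionName: 009.63 seconds' |
    grep -P -o '(?<=\): ).+?(?=: 00[7-9]\.|: 0[1-9][0-9]\.)'
FunctionName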

frankc
  • 11
0

While I'm at it:

#!/bin/sh
grep -E '00[7-9]\.|0[1-9][0-9]\.' "$@" | awk '{print $6}' |
    sed 's/:$//' | sort | uniq -c | sort -rn

The original command is not that complicated; it's the repetition for every log that makes it look so. Stick it into a script file (or a function), call it sortbytime, and there – you have a simple one-word command.
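
For example (assuming the script above is saved as sortbytime and made executable; otherlog.log is just a placeholder name):

chmod +x sortbytime
./sortbytime filename.log otherlog.log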

grawity
  • 501,077