I am trying to extract the timestamp and the number string from the requested URL in an Apache log file whose lines look like this:

123.456.78.90 - - [16/Dec/2014:06:27:30 +0100] "GET /servlet/something.something=%2B2341231231234&subappid=hello&pass=hello&from=somebody&dlrreq=true&intflag=TRUE HTTP/1.1" 200 31 "-" "python-requests/2.5.0 CPython/2.7.3 Linux/2.6.32-431.el6.x86_64"

So far I'm able to use awk to extract the timestamp and the entire URL.

awk '{print $4,$5} {print $6}' /var/log/httpd/access_log

How can I extract just the number string 2341231231234, so that the timestamp and this string end up on the same line?

1 Answer

Assuming that all your lines use the same URL format, you can get the timestamp and the number string with a sed command like this one:

$ sed -r 's|.*\[(.*)\].*=%2B(.*)&sub.*|\1 \2|' /var/log/httpd/access_log
16/Dec/2014:06:27:30 +0100 2341231231234

That expression captures whatever is inside [ and ] (the timestamp) and whatever lies between =%2B and &sub (the number string). %2B is the URL-encoded + sign, so matching it literally keeps it out of the output.
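
A slightly stricter variant is sketched below. It assumes the number always follows a URL-encoded plus sign (%2B) and consists only of digits. It captures just the digits, and it uses -n together with the p flag so that lines without a match are skipped instead of being printed unchanged. The function name extract_ts_num is hypothetical, and -E is used in place of -r on the assumption that your sed accepts it (recent GNU sed and BSD sed both do).

```shell
#!/bin/sh
# Hypothetical helper: reads access-log lines on stdin and prints
# "<timestamp> <number>" for every line containing "=%2B<digits>".
# Lines that do not match produce no output.
extract_ts_num() {
    # ^[^[]*\[     everything up to and including the first "["
    # ([^]]*)\]    the timestamp, i.e. everything before the closing "]"
    # =%2B([0-9]+) the digits that follow the encoded "+"
    sed -nE 's|^[^[]*\[([^]]*)\].*=%2B([0-9]+).*|\1 \2|p'
}

# Usage, with the log path from the question:
#   extract_ts_num < /var/log/httpd/access_log
```

Matching only digits means the command no longer depends on &sub being the next parameter, so it should keep working if the order of the query parameters changes.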

jherran