Hi there, I'm searching for a command-line tool that works on a stream of lines (typically from tail -f) and counts them, like:
tail -f /var/log/apache2/access.log | cut -d' ' -f1 | SOME_COMMAND
and displays a top-like view such as:
52 xxx.xxx.xxx.xxx
12 xxx.xxx.xxx.xxx
6 xxx.xxx.xxx.xxx
2 xxx.xxx.xxx.xxx
It would be so handy, combined for example with this shell script:
#!/bin/sh
# NCSA log structure:
# IP - - [DATE] "METHOD URL HTTP/VERSION" STATUS LENGTH "REFERER" "USER AGENT"
# Build the sed replacement string from the field names given as arguments.
QUERY=""
while [ "$1" ] ; do
    case "$1" in
        ip)        QUERY="$QUERY"'\1'  ;;
        date)      QUERY="$QUERY"'\4'  ;;
        method)    QUERY="$QUERY"'\5'  ;;
        url)       QUERY="$QUERY"'\6'  ;;
        version)   QUERY="$QUERY"'\7'  ;;
        status)    QUERY="$QUERY"'\8'  ;;
        length)    QUERY="$QUERY"'\9'  ;;
        referer)   QUERY="$QUERY"'\10' ;; # Does not work...
        useragent) QUERY="$QUERY"'\11' ;; # Does not work
        *)         QUERY="$QUERY""$1"  ;; # Anything else is appended literally (separators, etc.)
    esac
    shift
done
# Rewrite each log line on stdin, keeping only the requested fields.
sed -r 's/^([^ ]+) ([^ ]+) ([^ ]+) \[([^]]+)] "([^ ]+) ([^"]+) HTTP\/([^"]+)" ([^ ]+) ([^ ]+) "([^"]+)" "([^"]+)"$/'"$QUERY"'/g'
With the command I'm searching for and my script, you could do: cat somelog | ncsa.sh url | SOME_COMMAND and get a top of your most-viewed URLs, or referers, or whatever you want.
(and if someone can fix the bug of \10 being interpreted as \1 followed by a 0 ... :p)
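A possible workaround, sketched here rather than taken from the thread: sed -r only knows backreferences \1 through \9, but the ident and authuser fields captured as \2 and \3 are never exposed by the script, so they can simply be left uncaptured. That shifts every later field down by two and brings referer and useragent within range, as \8 and \9:

# Capture groups become: 1=ip 2=date 3=method 4=url 5=version 6=status 7=length 8=referer 9=useragent
sed -r 's/^([^ ]+) [^ ]+ [^ ]+ \[([^]]+)] "([^ ]+) ([^"]+) HTTP\/([^"]+)" ([^ ]+) ([^ ]+) "([^"]+)" "([^"]+)"$/'"$QUERY"'/g'

The case statement then has to be renumbered accordingly: ip stays \1, date becomes \2, method \3, url \4, version \5, status \6, length \7, referer \8 and useragent \9.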
Have a great day!
-
Are you looking for uniq -c and tr?
cat /var/log/apache2/access.log | cut -d' ' -f1 | uniq -c | tr -s "\n " " "
From the uniq man page:
Filter adjacent matching lines from INPUT (or standard input), writing to OUTPUT (or standard output).
-c, --count    prefix lines by the number of occurrences
From the tr man page:
Translate, squeeze, and/or delete characters from standard input, writing to standard output.
-s, --squeeze-repeats    replace each input sequence of a repeated character that is listed in SET1 with a single occurrence of that character
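For example, on a tiny hand-made input (output spacing approximate):

printf 'a\na\nb\na\n' | uniq -c
#  2 a
#  1 b
#  1 a          (uniq only collapses adjacent duplicates)
printf 'a\na\nb\na\n' | uniq -c | tr -s "\n " " "
#  2 a 1 b 1 a  (newlines and repeated blanks folded into single spaces)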
To have it sorted in descending order:
cat /var/log/apache2/access.log | cut -d' ' -f1 | uniq -c | sort -gr | tr -s "\n " " "
An example of output (I obfuscated the IPs):
87 71.255.255.11 54 95.255.222.255 50 84.255.255.120 50 178.255.255.14 49 92.255.255.240 49 91.255.36.215 49 255.52.126.184 49 217.255.110.23 49 216.255.45.4 49 255.8.27.5
Note: my examples use cat because I don't think tail -f would work here, as there is no end of file; but you could instead just use tail -100, for example, and run it periodically.
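For instance, you could wrap the pipeline in watch, something along these lines (the interval and line counts are arbitrary; putting sort before uniq -c also groups non-adjacent occurrences of the same IP):

# Re-run every 2 seconds on the last 100 lines of the log and keep the top 20 counts.
watch -n 2 'tail -n 100 /var/log/apache2/access.log | cut -d" " -f1 | sort | uniq -c | sort -gr | head -n 20'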
Slartibartfast: I agree. The performance with tail is going to be bad. As described above (top-like behavior), you'd want to add 'watch' around that whole shebang, making it that much worse. A dedicated application (maybe in Python?) would be a decent approach.
Mandark: I'm not looking for uniq -c; I use {cat/cut/grep/sed} | sort | uniq -c | sort -gr daily, and that line is really valuable. But I want it to work with incoming data, not a file, so I really need a ... | cut -d' ' -f1 | some_top_cmd.
Mandark: Yes, in Python I could write it in minutes, or in C, but I can't imagine that it isn't already packaged! If I need this, someone needed it and wrote it before me.
Weboide: Mandark, please check out my new solution. It gives exactly what you are looking for.
Mandark: Just read your 6-minute-old edit, thanks for it. I agree, your line works: I'm actually watching my stats with a very similar line (watching cache hits and misses for a particular subset of pages from my cache log, with: watch './analyse.sh | grep --line-buffered -v "PASS$" | head -n 100 | grep -o "[^ ]*$" | sort | uniq -c | sort -gr') and the result is what I expect, but it updates every 100 lines. Having a program to do it would allow updating the statistics on every new line, showing multiple columns such as "last 10, minutely, hourly" counts, etc.
From Weboide
-
First version of a program solving this problem committed here:
http://github.com/JulienPalard/logtop
From Mandark
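For illustration, here is a rough sketch in plain awk of the kind of per-line-updating counter discussed in the comments above (this is not logtop's actual implementation; the refresh interval, the 20-line limit and the script name in the usage comment are arbitrary placeholders):

#!/bin/sh
# Sketch only: count identical input lines and redraw a top-like view
# every few input lines (refresh = 10 by default).  Typical use, as SOME_COMMAND above:
#   tail -f /var/log/apache2/access.log | cut -d' ' -f1 | ./streamtop.sh
awk -v refresh=10 '
{
    count[$0]++
    if (NR % refresh == 0) {
        printf "\033[2J\033[H"          # ANSI: clear screen, cursor home
        fflush()                        # make the clear appear before the listing
        cmd = "sort -rn | head -n 20"
        for (key in count)
            printf "%7d %s\n", count[key], key | cmd
        close(cmd)                      # flush this refresh before the next one
    }
}'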