As the question heading states, given a packet capture I want to extract the top 5 TCP (or UDP) flows, sorted by total bytes in descending order.

I have come up with this so far:
tshark -r test.pcap -q -z conv,tcp | sed "1,5d" | head -n -1 | sort -r -k5 | head -n 5

The sed and head commands remove the first 5 lines and the last line. sort then orders the output by column 5, and the final head truncates it to the top 5 lines.

An example of just the tshark command output looks like this (heading rows and last row removed):

10.215.173.1:49248         <-> 49.44.185.78:443                84 312 kB         78 10 kB         162 323 kB      215.775760000        12.0809
10.215.173.1:49212         <-> 49.44.185.78:443                83 312 kB         76 10 kB         159 322 kB      215.740042000        12.1151
10.215.173.1:49302         <-> 49.44.185.78:443                79 211 kB         80 9876 bytes     159 221 kB      215.811485000        12.0465
10.215.173.1:49242         <-> 49.44.185.78:443                82 312 kB         76 10 kB         158 322 kB      215.771412000        12.0851
10.215.173.1:49134         <-> 49.44.185.78:443                80 311 kB         76 10 kB         156 322 kB      215.647900000        12.2038
10.215.173.1:49202         <-> 49.44.185.78:443                83 312 kB         73 10 kB         156 322 kB      215.728497000        12.1263
10.215.173.1:49290         <-> 49.44.185.78:443                77 211 kB         78 9700 bytes     155 221 kB      215.803830000        12.0538
10.215.173.1:49278         <-> 49.44.185.78:443                77 211 kB         77 9612 bytes     154 221 kB      215.797622000         7.7149
10.215.173.1:49342         <-> 49.44.185.78:443                74 211 kB         75 9436 bytes     149 220 kB      215.866905000        11.9925
10.215.173.1:49360         <-> 49.44.185.78:443                73 211 kB         74 9348 bytes     147 220 kB      215.895946000        11.9642

Columns, in order: source ip:port, destination ip:port, incoming packets and bytes, outgoing packets and bytes, total packets and bytes, relative start, duration of flow.

I think you can see the problem here: some values are in kB and others in plain bytes, so sorting on the number alone gives the wrong result. Even if all the values were in kB, the sort seems to give the wrong output, which means I am using it the wrong way.

How do I convert all the byte-count values to the same unit (kB) and then sort the output correctly?

Any alternative approach using tshark is also welcome.

Trevor Philip

2 Answers

The cleanest way to do what you are asking would be to have tshark print the actual (machine-readable) numbers, so that you can sort them easily. Unfortunately, tshark seems to have switched these values from machine-readable to human-readable output in version 3.3.0, and judging by the source code this is not configurable, either with a command-line option or with one of the preferences.

Lacking such an option, the easiest approach I can see is to convert tshark's human-readable format into the human-readable format that sort -h understands, i.e. with no space between the number and the unit (kB, MB, ...) and with the word bytes removed.

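Something like this should do the trick. Note that the total-bytes value is field 9 once each unit has been joined to its number, so that is the sort key (field 5 would sort by the incoming bytes instead):

tshark -r test.pcap -q -z conv,tcp |
    sed "1,5d" |
    head -n -1 |
    sed -E -e 's/ ([kMGT]B )/\1/g' |
    sed -e 's/ bytes /     /g' |
    sort -h -r -k9,9 |
    head -n 5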

But again, the ideal solution would be for somebody to update tshark and add an option that makes the presentation of these values configurable (human-readable or machine-readable). There is a reason this format is called human-readable: it is not meant to be parsed by a machine.

gepa

After looking at the comments and answers, I think it is better to parse the output of tshark with a programming language and compute the required result there.
For me, using Python with the pandas package makes this task much easier than using the sed and sort command-line tools.
I know this was not the approach I originally intended, but it saves time and is easier.
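
For anyone who wants a starting point, here is a minimal sketch of that parsing step using only the Python standard library. It is not the exact script I used. The names ROW_RE, UNIT_FACTORS, to_bytes, parse_conversations and top_flows are made up for this example, and it assumes that tshark's kB/MB/GB are decimal multiples of 1000 and that every data row has the layout shown in the question.

```python
import re

# Assumption: tshark's size units are decimal (SI), i.e. 1 kB = 1000 bytes.
UNIT_FACTORS = {"bytes": 1, "kB": 1000, "MB": 1000**2, "GB": 1000**3, "TB": 1000**4}

# One conversation row, as in the question:
# src <-> dst | frames size unit | frames size unit | frames size unit | start | duration
ROW_RE = re.compile(
    r"^\s*(?P<src>\S+)\s+<->\s+(?P<dst>\S+)"
    r"\s+(?P<in_frames>\d+)\s+(?P<in_size>[\d.]+)\s+(?P<in_unit>bytes|[kMGT]B)"
    r"\s+(?P<out_frames>\d+)\s+(?P<out_size>[\d.]+)\s+(?P<out_unit>bytes|[kMGT]B)"
    r"\s+(?P<tot_frames>\d+)\s+(?P<tot_size>[\d.]+)\s+(?P<tot_unit>bytes|[kMGT]B)"
    r"\s+(?P<start>[\d.]+)\s+(?P<duration>[\d.]+)\s*$"
)


def to_bytes(size, unit):
    """Convert a printed size such as ("312", "kB") to a number of bytes."""
    return float(size) * UNIT_FACTORS[unit]


def parse_conversations(text):
    """Return one dict per conversation row; lines that do not match are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW_RE.match(line)
        if m is None:
            continue  # header, separator, or blank line
        rows.append({
            "src": m["src"],
            "dst": m["dst"],
            "total_frames": int(m["tot_frames"]),
            "total_bytes": to_bytes(m["tot_size"], m["tot_unit"]),
        })
    return rows


def top_flows(text, n=5):
    """Return the n conversations with the largest total byte count."""
    rows = parse_conversations(text)
    return sorted(rows, key=lambda r: r["total_bytes"], reverse=True)[:n]
```

Because header and separator lines never match the pattern, the raw output of tshark -r test.pcap -q -z conv,tcp can be passed to top_flows as a single string, without the sed and head steps. The returned list of dicts can also be handed to pandas.DataFrame if you prefer to sort and filter there. Keep in mind that the values are already rounded by tshark, so flows that print the same size cannot be told apart.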

Trevor Philip