
I am trying to import a pcap file into Elasticsearch. First I convert the pcap file to JSON like this:

tshark -T ek -j "http tcp ip" -x -r file.pcap > file.json

Then I try to load it into Elasticsearch like this:

curl -H "Content-Type: application/json" -XPOST 'localhost:9200/_bulk?pretty' --data-binary "@file.json"

But this fails with many errors saying that there are duplicate fields in the JSON. I read that Elasticsearch 6.0 no longer accepts duplicate keys, and I checked: my JSON file does indeed contain many duplicate keys. But I also read that

tshark -T ek

is supposed to de-duplicate keys itself, and the option --no-duplicate-keys seems to be gone (my tshark version is 2.2.6).

So how do I get my pcap data into elasticsearch?

frank

2 Answers


I had the same issue. Apparently the newest development release (2.5.1) fixes this. If you are on Windows, it is very easy: just download the installer from https://www.wireshark.org/download.html and install it. If you are on Linux like me, you have to download the source code from there and build Wireshark from source. I found these guides helpful: https://scottlinux.com/2013/06/07/how-to-install-the-latest-wireshark-from-source-on-debian-or-ubuntu-linux/

https://www.wireshark.org/docs/wsug_html_chunked/ChBuildInstallUnixBuild.html

nkaenzig

Feed the data to any other JSON parser (and then dump it back to JSON). Most parsers have an option to quietly ignore duplicate fields, keeping only the first or the last occurrence.

For example, the command-line tools jq or jshon can be used:

$ echo '{"foo": "111", "bar": "222", "foo": "333"}' | jq -c .
{"foo":"333","bar":"222"}
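The tshark -T ek output is newline-delimited JSON (one object per line), so the same trick can be applied line by line before the bulk upload. As a sketch in Python (the function name is mine, not from any library): the standard json module likewise keeps the last value when a key is repeated, so a parse/re-dump round trip removes the duplicates.

```python
import json

def dedup_lines(ndjson_text):
    """Re-parse each non-empty line of newline-delimited JSON.
    json.loads keeps the last value for a duplicated key, so dumping
    the object again yields a line without duplicate keys."""
    out = []
    for line in ndjson_text.splitlines():
        if line.strip():  # bulk files may contain blank lines
            out.append(json.dumps(json.loads(line)))
    return "\n".join(out)

# Example with a duplicated "foo" key:
print(dedup_lines('{"foo": "111", "bar": "222", "foo": "333"}'))
# -> {"foo": "333", "bar": "222"}
```

Running the bulk file through such a filter (or simply `jq -c . file.json > file-dedup.json`) and then POSTing the cleaned file should let the _bulk request go through, assuming the rest of the mapping is acceptable to Elasticsearch.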
grawity