Introduction
DoHlyzer is a tool for extracting statistical features from packet-captured DoH (DNS-over-HTTPS) traffic.
This article walks through installing DoHlyzer on Ubuntu 22.04 LTS, including the dependency updates and source patches needed to get it running.
Update Ubuntu
$ sudo apt update
$ sudo apt upgrade
Check Ubuntu version
$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.3 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.3 LTS (Jammy Jellyfish)"
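The same check can be scripted. Below is a minimal sketch that parses the `KEY=value` lines of `/etc/os-release`. The helper name `parse_os_release` is hypothetical. On Python 3.10 the standard library's `platform.freedesktop_os_release()` is a ready-made alternative.

```python
import shlex


def parse_os_release(text: str) -> dict:
    """Parse os-release style KEY=value lines into a dict (hypothetical helper)."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # shlex.split strips the shell-style quotes around the value.
        parts = shlex.split(value)
        info[key] = parts[0] if parts else ""
    return info


sample = '''PRETTY_NAME="Ubuntu 22.04.3 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.3 LTS (Jammy Jellyfish)"'''

info = parse_os_release(sample)
print(info["VERSION_ID"])  # 22.04
```

On a real system, pass it the contents of the file, e.g. `parse_os_release(open("/etc/os-release").read())`.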
Check Python version
$ python3 -V
Python 3.10.12
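A script can run the same check before going further. Below is a minimal sketch. The lower bound comes from `python_requires=">=3.6"` in the setup file created later. This article itself was tested only with Python 3.10.12.

```python
import sys

# Lower bound taken from python_requires=">=3.6" in this article's setup.py.
MIN_VERSION = (3, 6)


def python_ok(version_info=sys.version_info, minimum=MIN_VERSION) -> bool:
    """Return True if the interpreter's (major, minor) meets the minimum."""
    return tuple(version_info[:2]) >= minimum


print("Python", ".".join(map(str, sys.version_info[:3])), "ok:", python_ok())
```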
Set up a Python virtual environment
$ sudo apt install -y python3.10-venv
$ python3 -m venv boosting
$ source ~/boosting/bin/activate
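You can also confirm from Python that the `boosting` environment is active. Inside a venv, `sys.prefix` points at the environment directory, while `sys.base_prefix` still points at the system installation. The function below is a small sketch built on that rule.

```python
import sys


def in_virtualenv(prefix: str = sys.prefix, base_prefix: str = sys.base_prefix) -> bool:
    """True when sys.prefix (the venv, e.g. ~/boosting) differs from
    sys.base_prefix (the system Python the venv was created from)."""
    return prefix != base_prefix


print("inside a virtual environment:", in_virtualenv())
```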
Update pip (the Python package installer)
(boosting)$ pip install -U pip
(boosting)$ pip -V
pip 23.3.2 from /home/username/boosting/lib/python3.10/site-packages/pip (python 3.10)
Clone GitHub repository
(boosting)$ sudo apt install -y git
(boosting)$ git clone https://github.com/ahlashkari/DoHLyzer
Edit the requirements list
(boosting)$ cd DoHLyzer
(boosting)$ mv requirements.txt requirements.txt.org
(boosting)$ vi requirements.txt
numpy==1.22.4
scipy==1.7.3
scapy==2.4.3
matplotlib==3.5.0
scikit-learn==1.3.2
Keras==2.8.0
tensorflow==2.8.0
ijson==3.2.3
# numpy~=1.18
# scipy~=1.4.1
# scapy~=2.4.3
# matplotlib==3.1.2
# scikit-learn==0.22.1
# Keras==2.3.1
# tensorflow==2.1.0
# ijson==3.0
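The edited file keeps the old pins as `#` comments for reference. Below is a minimal sketch that lists which pins are actually active. It only understands `name==version` lines, which is all this file uses. The helper `active_pins` is hypothetical.

```python
REQUIREMENTS = """\
numpy==1.22.4
scipy==1.7.3
scapy==2.4.3
matplotlib==3.5.0
scikit-learn==1.3.2
Keras==2.8.0
tensorflow==2.8.0
ijson==3.2.3
# numpy~=1.18
# tensorflow==2.1.0
"""


def active_pins(text: str) -> dict:
    """Map package name -> pinned version for name==version lines,
    ignoring blank lines and '#' comments."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        name, sep, version = line.partition("==")
        if sep:
            pins[name.strip()] = version.strip()
    return pins


pins = active_pins(REQUIREMENTS)
print(pins["tensorflow"])  # 2.8.0
```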
Create setup file
(boosting)$ vi setup.py
#!/usr/bin/env python
from setuptools import setup, find_packages


def read_requirements(path="requirements.txt"):
    """Return requirement lines, skipping blanks and '#' comments."""
    with open(path) as f:
        lines = [line.strip() for line in f]
    return [line for line in lines if line and not line.startswith("#")]


with open("README.md") as f:
    long_description = f.read()

setup(
    name="dohlyzer",
    description="Set of tools to capture HTTPS traffic, extract statistical and time-series features from it, and analyze them with a focus on detecting and characterizing DoH (DNS-over-HTTPS) traffic.",
    long_description=long_description,
    long_description_content_type="text/markdown",
    url="https://github.com/ahlashkari/DoHlyzer",
    packages=find_packages(exclude=[]),
    python_requires=">=3.6",
    install_requires=read_requirements(),
    entry_points={
        "console_scripts": [
            "dohlyzer=meter.dohlyzer:main",
        ]
    },
)
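The `console_scripts` entry `dohlyzer=meter.dohlyzer:main` tells pip to generate a `dohlyzer` command. That command imports `meter.dohlyzer` and calls its `main()` function. The sketch below shows how such a spec breaks down. It is simplified: it ignores optional extras such as `[extra]`. The helpers `parse_console_script` and `resolve` are hypothetical.

```python
import importlib


def parse_console_script(spec: str):
    """Split 'command=package.module:function' into its three parts."""
    command, _, target = spec.partition("=")
    module, _, func = target.partition(":")
    return command.strip(), module.strip(), func.strip()


def resolve(spec: str):
    """Import the module and return the callable the command would run."""
    _, module, func = parse_console_script(spec)
    return getattr(importlib.import_module(module), func)


print(parse_console_script("dohlyzer=meter.dohlyzer:main"))
# ('dohlyzer', 'meter.dohlyzer', 'main')
```

`resolve("dohlyzer=meter.dohlyzer:main")` only works once the package has been installed.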
Setup file citation
The setup file is adapted from the following site.
Prepare flow.patch
(boosting)$ cd ./meter/features/context
(boosting)$ vi flow.patch
--- packet_flow_key.py 2023-12-16 02:19:53.536965986 -0800
+++ packet_flow_key.py.new 2023-12-16 02:58:13.650651329 -0800
@@ -1,7 +1,7 @@
#!/usr/bin/env python
-from meter.features.context import packet_direction
+from meter.features.context.packet_direction import PacketDirection
def get_packet_flow_key(packet, direction) -> tuple:
@@ -30,7 +30,7 @@
else:
raise Exception('Only TCP protocols are supported.')
- if direction == packet_direction.FORWARD:
+ if direction == PacketDirection.FORWARD:
dest_ip = packet['IP'].dst
src_ip = packet['IP'].src
src_port = packet[protocol].sport
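This patch is needed because the original code imports the `packet_direction` module and then looks up `FORWARD` on the module itself. Below is a minimal sketch of the difference. It assumes (not verified here) that `packet_direction.py` defines a `PacketDirection` enum with `FORWARD` and `REVERSE` members. The module object is a stand-in built with `types.ModuleType`.

```python
import types
from enum import Enum


# Assumed shape of meter/features/context/packet_direction.py.
class PacketDirection(Enum):
    FORWARD = 1
    REVERSE = 2


# Stand-in for the module that `from ... import packet_direction` binds.
packet_direction = types.ModuleType("packet_direction")
packet_direction.PacketDirection = PacketDirection


def is_forward(direction) -> bool:
    # Patched comparison: FORWARD is a member of the enum class, not the module.
    return direction == PacketDirection.FORWARD


try:
    packet_direction.FORWARD  # what the unpatched comparison looks up
except AttributeError as exc:
    print("unpatched lookup fails:", exc)

print(is_forward(PacketDirection.FORWARD))  # True
```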
Apply flow.patch
(boosting)$ patch -b < flow.patch
patching file packet_flow_key.py
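`patch` is strict about whitespace, so a diff copied from a web page can fail to apply. As a fallback, the same two edits can be made as plain string replacements. This is a minimal sketch that assumes both original lines appear verbatim, exactly once, in `packet_flow_key.py`. The helpers `apply_edits` and `patch_file` are hypothetical.

```python
from pathlib import Path

# The two substitutions carried by flow.patch.
FLOW_EDITS = [
    ("from meter.features.context import packet_direction",
     "from meter.features.context.packet_direction import PacketDirection"),
    ("if direction == packet_direction.FORWARD:",
     "if direction == PacketDirection.FORWARD:"),
]


def apply_edits(text: str, edits) -> str:
    """Apply each (old, new) replacement, refusing to guess when 'old'
    is missing or appears more than once."""
    for old, new in edits:
        count = text.count(old)
        if count != 1:
            raise ValueError(f"expected exactly one {old!r}, found {count}")
        text = text.replace(old, new)
    return text


def patch_file(path: str, edits) -> None:
    """Rewrite the file in place, keeping a .orig backup like `patch -b`."""
    p = Path(path)
    original = p.read_text()
    patched = apply_edits(original, edits)  # raises before anything is written
    p.with_name(p.name + ".orig").write_text(original)
    p.write_text(patched)
```

Usage: `patch_file("packet_flow_key.py", FLOW_EDITS)`. The `time.patch` change below can be handled the same way, with the single edit `("time = self.flow.packets[0][0].time", "time = float(self.flow.packets[0][0].time)")`.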
Prepare time.patch
(boosting)$ cd ../../features
(boosting)$ vi time.patch
--- packet_time.py 2023-12-19 06:53:15.148255511 -0800
+++ packet_time.py.new 2023-12-19 06:54:03.259084221 -0800
@@ -48,7 +48,7 @@
String of Date and time.
"""
- time = self.flow.packets[0][0].time
+ time = float(self.flow.packets[0][0].time)
date_time = datetime.fromtimestamp(time).strftime('%Y-%m-%d %H:%M:%S')
return date_time
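The long decimal fractions in the output CSV (e.g. `18214.71652593486127864897467`) suggest that packet timestamps are `decimal.Decimal` values in this setup. `datetime.fromtimestamp()` expects an int or float, hence the `float()` in the patch. The sketch below mirrors the patched line. The timestamp is illustrative, and `tz=timezone.utc` makes the output independent of the local time zone.

```python
from datetime import datetime, timezone
from decimal import Decimal


def first_packet_time(ts, tz=None) -> str:
    """Format a packet timestamp as the patched packet_time.py does:
    convert to float first, since fromtimestamp() expects int or float."""
    return datetime.fromtimestamp(float(ts), tz=tz).strftime("%Y-%m-%d %H:%M:%S")


print(first_packet_time(Decimal("1635587062.058030"), tz=timezone.utc))
# 2021-10-30 09:44:22
```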
Apply time.patch
(boosting)$ patch -b < time.patch
patching file packet_time.py
Install DoHlyzer
(boosting)$ cd ../../../DoHLyzer/
(boosting)$ pip install .
Successfully installed Keras-2.8.0 MarkupSafe-2.1.3 absl-py-2.0.0 astunparse-1.6.3
cachetools-5.3.2 certifi-2023.11.17 charset-normalizer-3.3.2 dohlyzer-0.0.0
flatbuffers-23.5.26 gast-0.5.4 google-auth-2.25.2 google-auth-oauthlib-0.4.6
google-pasta-0.2.0 grpcio-1.60.0 h5py-3.10.0 idna-3.6 ijson-3.2.3 joblib-1.3.2
keras-preprocessing-1.1.2 libclang-16.0.6 markdown-3.5.1 matplotlib-3.5.0
numpy-1.22.4 oauthlib-3.2.2 opt-einsum-3.3.0 protobuf-4.25.1 pyasn1-0.5.1
pyasn1-modules-0.3.0 requests-2.31.0 requests-oauthlib-1.3.1 rsa-4.9 scapy-2.4.3
scikit-learn-1.3.2 scipy-1.7.3 setuptools-scm-8.0.4 tensorboard-2.8.0
tensorboard-data-server-0.6.1 tensorboard-plugin-wit-1.8.1 tensorflow-2.8.0
tensorflow-io-gcs-filesystem-0.35.0 termcolor-2.4.0
tf-estimator-nightly-2.8.0.dev2021122109 threadpoolctl-3.2.0 tomli-2.0.1
typing-extensions-4.9.0 urllib3-2.1.0 werkzeug-3.0.1 wheel-0.42.0 wrapt-1.16.0
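To confirm that the pins from the edited `requirements.txt` were honoured, the installed versions can be compared against them with `importlib.metadata`. Below is a minimal sketch. `mismatched_pins` is a hypothetical helper, and the version lookup is injectable so it can be tested without the real packages.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins from the edited requirements.txt.
PINS = {
    "numpy": "1.22.4",
    "scipy": "1.7.3",
    "scapy": "2.4.3",
    "matplotlib": "3.5.0",
    "scikit-learn": "1.3.2",
    "Keras": "2.8.0",
    "tensorflow": "2.8.0",
    "ijson": "3.2.3",
}


def mismatched_pins(pins, get_version=version):
    """Return {package: (expected, installed_or_None)} for every pin that
    is missing or installed at a different version."""
    problems = {}
    for name, expected in pins.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            problems[name] = (expected, installed)
    return problems


print(mismatched_pins(PINS) or "all pins satisfied")
```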
Get sample data of packet-captured DoH traffic
(boosting)$ cd meter
(boosting)$ wget https://eprints.lib.hokudai.ac.jp/dspace/bitstream/2115/88092/2/DoH-Tunnel-Traffic-HKD.zip
(boosting)$ unzip DoH-Tunnel-Traffic-HKD.zip
(boosting)$ cp ./DoH-Pcaps/DoH-Pcaps-48h/tuns-48h.pcap .
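Only one capture from the archive is used here. A single member can be pulled out with Python's `zipfile` module instead of unpacking everything. This is a minimal sketch. It assumes the member path inside the archive is `DoH-Pcaps/DoH-Pcaps-48h/tuns-48h.pcap`, inferred from the `cp` command above. `extract_member` is a hypothetical helper, and the demo uses a throwaway archive, not the real dataset.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path

# Assumed member path, inferred from ./DoH-Pcaps/DoH-Pcaps-48h/tuns-48h.pcap.
MEMBER = "DoH-Pcaps/DoH-Pcaps-48h/tuns-48h.pcap"


def extract_member(archive: str, member: str = MEMBER, dest: str = ".") -> Path:
    """Copy one file out of a zip archive into dest, without its directories."""
    target = Path(dest) / Path(member).name
    with zipfile.ZipFile(archive) as zf, zf.open(member) as src, open(target, "wb") as out:
        shutil.copyfileobj(src, out)
    return target


# Self-contained demo with a throwaway archive.
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = Path(tmp) / "demo.zip"
    with zipfile.ZipFile(demo_zip, "w") as zf:
        zf.writestr(MEMBER, b"not really a pcap")
    extracted = extract_member(str(demo_zip), dest=tmp)
    demo_name = extracted.name
    demo_bytes = extracted.read_bytes()
```

With the real download this would be `extract_member("DoH-Tunnel-Traffic-HKD.zip")`.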
Sample data details
More details on the sample data can be found at the site below.
Start DoHlyzer
(boosting)$ python dohlyzer.py -f ./tuns-48h.pcap -c ./tuns-48h.csv
reading from file ./tuns-48h.pcap, link-type EN10MB (Ethernet)
Packet count: 262
Garbage Collection Began. Flows = 2
Garbage Collection Finished. Flows = 0
Packet count: 307264
Garbage Collection Began. Flows = 1
Garbage Collection Finished. Flows = 0
Garbage Collection Began. Flows = 2
Garbage Collection Finished. Flows = 0
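The first output line reports the capture's link type (`EN10MB (Ethernet)`). This value is recorded in the 24-byte global header of a classic libpcap file. Before a long run, the header can be inspected with the minimal sketch below. It does not handle pcapng files, the link-type table is only a small subset, and `read_pcap_header` is a hypothetical helper.

```python
import struct

# Small subset of LINKTYPE_* values from the tcpdump.org link-layer header types list.
LINKTYPES = {1: "EN10MB (Ethernet)", 101: "RAW", 113: "LINUX_SLL"}


def read_pcap_header(data: bytes) -> dict:
    """Decode the 24-byte global header of a classic libpcap file."""
    if len(data) < 24:
        raise ValueError("too short for a pcap global header")
    magic = data[:4]
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
        endian = "<"  # little-endian (microsecond / nanosecond variants)
    elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
        endian = ">"  # big-endian
    else:
        raise ValueError("not a classic pcap file (pcapng is not handled)")
    major, minor, _zone, _sigfigs, snaplen, network = struct.unpack(endian + "HHiIII", data[4:24])
    return {"version": (major, minor), "snaplen": snaplen,
            "linktype": LINKTYPES.get(network, network)}


# Demo header: microsecond magic, version 2.4, snaplen 262144, Ethernet.
demo_header = struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 262144, 1)
print(read_pcap_header(demo_header))
```

For the real file: `with open("tuns-48h.pcap", "rb") as f: print(read_pcap_header(f.read(24)))`.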
Waiting time
Processing took about 5 minutes on my computer.
Mark data with a label
(boosting)$ cp -p tuns-48h.csv tuns-48h.csv.org
(boosting)$ sed -i "s/DoH/Label/g" tuns-48h.csv
(boosting)$ sed -i "s/False/tuns/g" tuns-48h.csv
(boosting)$ sed -i "s/True/tuns/g" tuns-48h.csv
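The `sed` commands replace matches anywhere in the file. A column-aware alternative uses Python's `csv` module to rename the `DoH` column to `Label` and overwrite only that column's `True`/`False` values. This is a minimal sketch: `relabel` is a hypothetical helper, and the demo input is a shortened stand-in for the real CSV.

```python
import csv
import io


def relabel(src, dst, label: str, old_col: str = "DoH", new_col: str = "Label") -> None:
    """Rename old_col to new_col and set every row's value in it to label.
    Other columns are copied unchanged."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    header = next(reader)
    idx = header.index(old_col)
    header[idx] = new_col
    writer.writerow(header)
    for row in reader:
        row[idx] = label
        writer.writerow(row)


sample = "SourceIP,Duration,DoH\n192.168.11.12,0.058030,True\n192.168.11.12,121.103592,False\n"
out = io.StringIO()
relabel(io.StringIO(sample), out, "tuns")
print(out.getvalue())
```

For the real files: `with open("tuns-48h.csv.org", newline="") as src, open("tuns-48h.csv", "w", newline="") as dst: relabel(src, dst, "tuns")`.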
Confirm results
(boosting)$ head -n 3 tuns-48h.csv
SourceIP,DestinationIP,SourcePort,DestinationPort,TimeStamp,Duration,FlowBytesSent,
FlowSentRate,FlowBytesReceived,FlowReceivedRate,PacketLengthVariance,
PacketLengthStandardDeviation,PacketLengthMean,PacketLengthMedian,PacketLengthMode,
PacketLengthSkewFromMedian,PacketLengthSkewFromMode,PacketLengthCoefficientofVariation,
PacketTimeVariance,PacketTimeStandardDeviation,PacketTimeMean,PacketTimeMedian,
PacketTimeMode,PacketTimeSkewFromMedian,PacketTimeSkewFromMode,
PacketTimeCoefficientofVariation,ResponseTimeTimeVariance,
ResponseTimeTimeStandardDeviation,ResponseTimeTimeMean,ResponseTimeTimeMedian,
ResponseTimeTimeMode,ResponseTimeTimeSkewFromMedian,ResponseTimeTimeSkewFromMode,
ResponseTimeTimeCoefficientofVariation,Label
192.168.11.12,192.168.11.16,35146,443,2021-10-30 09:44:22,0.058030,
1057,18214.71652593486127864897467,5497,94726.86541444080647940720317,
241648.4709141274,491.5775329631404,344.94736842105266,74.0,66,1.653537948253031,
0.5674534528451868,1.4250798178669006,0.0006792408957783933518005540163,
0.02606225039743101984381890440,0.02310989473684210526315789474,0.002850,0.0,
2.332096548980951604474806341,0.8867190816001087,1.127752882226773168550630360,
0.0000171263761875,0.004138402613025948687341905884,0.00247025,0.0001025,
3.8e-05,1.716423138155277661149494295,0.5877267698276386,
1.675297080467948056812835091,tuns
192.168.11.12,192.168.11.16,35148,443,2021-10-30 09:44:22,121.103592,15694,
129.5915318515077570944386191,23173,191.3485770100031384700794011,27973.836644142997,
167.25380905720203,159.94650205761317,74.0,66,1.5416061829997336,0.5617002242710226,
1.0456859443975632,1444.809650456204504801097394,38.01065180256982260949407615,
53.03876174074074074074074074,51.127386,0.056111,0.1508557983168957281846502085,
1.3938895606404382,0.7166579790902743810399707875,0.8197690634662370200108166581,
0.9054109914653328417537853837,0.6178618255813953488372093023,0.000087,
4.5e-05,2.046942763246924571056582828,0.6823606421891454,
1.465393966706649347200615788,tuns
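The derived columns can be cross-checked against their inputs. The first data row above is consistent with these relationships:

- rate = bytes / Duration
- variance = (standard deviation)^2
- coefficient of variation = std / mean
- skew from median = 3 (mean - median) / std (Pearson's second skewness coefficient)
- skew from mode = (mean - mode) / std

These formulas were inferred from the numbers, not taken from DoHlyzer's source. The sketch below checks them for the packet-length columns.

```python
import math

# Values copied from the first data row of tuns-48h.csv shown above.
row = {
    "Duration": 0.058030,
    "FlowBytesSent": 1057,
    "FlowSentRate": 18214.71652593486,
    "FlowBytesReceived": 5497,
    "FlowReceivedRate": 94726.86541444081,
    "PacketLengthVariance": 241648.4709141274,
    "PacketLengthStandardDeviation": 491.5775329631404,
    "PacketLengthMean": 344.94736842105266,
    "PacketLengthMedian": 74.0,
    "PacketLengthMode": 66,
    "PacketLengthSkewFromMedian": 1.653537948253031,
    "PacketLengthSkewFromMode": 0.5674534528451868,
    "PacketLengthCoefficientofVariation": 1.4250798178669006,
}


def expected_values(r: dict) -> dict:
    """Recompute derived columns from their inputs (inferred formulas)."""
    mean = r["PacketLengthMean"]
    std = r["PacketLengthStandardDeviation"]
    return {
        "FlowSentRate": r["FlowBytesSent"] / r["Duration"],
        "FlowReceivedRate": r["FlowBytesReceived"] / r["Duration"],
        "PacketLengthVariance": std ** 2,
        "PacketLengthCoefficientofVariation": std / mean,
        "PacketLengthSkewFromMedian": 3 * (mean - r["PacketLengthMedian"]) / std,
        "PacketLengthSkewFromMode": (mean - r["PacketLengthMode"]) / std,
    }


mismatches = [name for name, value in expected_values(row).items()
              if not math.isclose(value, row[name], rel_tol=1e-6)]
print(mismatches)  # []
```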
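Besides the flow identifiers (source and destination IP address and port, and timestamp), the flow Duration and the Label, the CSV file contains the following 28 statistical features.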
Parameter | Feature |
---|---|
F1 | Number of flow bytes sent |
F2 | Rate of flow bytes sent |
F3 | Number of flow bytes received |
F4 | Rate of flow bytes received |
F5 | Variance of Packet Length |
F6 | Standard Deviation of Packet Length |
F7 | Mean Packet Length |
F8 | Median Packet Length |
F9 | Mode Packet Length |
F10 | Skew from median Packet Length |
F11 | Skew from mode Packet Length |
F12 | Coefficient of Variation of Packet Length |
F13 | Variance of Packet Time |
F14 | Standard Deviation of Packet Time |
F15 | Mean Packet Time |
F16 | Median Packet Time |
F17 | Mode Packet Time |
F18 | Skew from median Packet Time |
F19 | Skew from mode Packet Time |
F20 | Coefficient of Variation of Packet Time |
F21 | Variance of Request/response time difference |
F22 | Standard Deviation of Request/response time difference |
F23 | Mean Request/response time difference |
F24 | Median Request/response time difference |
F25 | Mode Request/response time difference |
F26 | Skew from median Request/response time difference |
F27 | Skew from mode Request/response time difference |
F28 | Coefficient of Variation of Request/response time difference |
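The F-numbers in the table follow the CSV column order, starting after the identifier columns and Duration and ending before Label. The sketch below uses the header shown above to map each F-number to its column name. This is handy when selecting feature columns for a classifier.

```python
HEADER = (
    "SourceIP,DestinationIP,SourcePort,DestinationPort,TimeStamp,Duration,"
    "FlowBytesSent,FlowSentRate,FlowBytesReceived,FlowReceivedRate,"
    "PacketLengthVariance,PacketLengthStandardDeviation,PacketLengthMean,"
    "PacketLengthMedian,PacketLengthMode,PacketLengthSkewFromMedian,"
    "PacketLengthSkewFromMode,PacketLengthCoefficientofVariation,"
    "PacketTimeVariance,PacketTimeStandardDeviation,PacketTimeMean,"
    "PacketTimeMedian,PacketTimeMode,PacketTimeSkewFromMedian,"
    "PacketTimeSkewFromMode,PacketTimeCoefficientofVariation,"
    "ResponseTimeTimeVariance,ResponseTimeTimeStandardDeviation,"
    "ResponseTimeTimeMean,ResponseTimeTimeMedian,ResponseTimeTimeMode,"
    "ResponseTimeTimeSkewFromMedian,ResponseTimeTimeSkewFromMode,"
    "ResponseTimeTimeCoefficientofVariation,Label"
)

columns = HEADER.split(",")
# Features run from FlowBytesSent up to (not including) Label.
features = columns[columns.index("FlowBytesSent"):columns.index("Label")]
f_map = {f"F{i}": name for i, name in enumerate(features, start=1)}

print(len(f_map), f_map["F21"])  # 28 ResponseTimeTimeVariance
```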
Statistical features citation
The list of statistical features is taken from the site below.
Deactivate the virtual environment in Python
(boosting)$ deactivate
$
Conclusion
This article showed how to install DoHlyzer on Ubuntu 22.04 LTS and use it to extract features from a sample DoH capture.
DoHlyzer is a capable feature extractor for packet-captured DoH traffic.
I hope it becomes more widely used.