
Collecting suspicious PowerShell commands from JoeSandbox

Last updated at 2022-02-25, posted at 2020-06-13

Introduction

There are several ways to collect suspicious PowerShell commands. In this article I scrape JoeSandbox malware analysis reports with BeautifulSoup + Python and extract the PowerShell command lines from them.

About JoeSandbox

JoeSandbox is a site that analyzes malware and produces a report.
https://www.joesandbox.com

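JoeSandbox comes in several editions; with the edition called Cloud Basic, you can analyze malware for free.
Reports produced with Cloud Basic are also made public, so you can read other people's analysis reports as well.
Editions other than Cloud Basic offer a Web API, but it does not appear to be available in Cloud Basic.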

For more details, see the following articles.

How to perform dynamic malware analysis with JoeSandbox
https://qiita.com/hanzawak/items/ec665e0f96dc65f3def3

Web scraping a dynamic malware analysis site with BeautifulSoup + Python
https://qiita.com/hanzawak/items/0a8a26d1ebe62b84f847

What I want to do

Extract PowerShell command lines from JoeSandbox Cloud Basic analysis reports.

However, simply extracting them would mix command lines executed by the malware with command lines executed by legitimate files.
So I also extract the score that JoeSandbox assigns to indicate how malicious the sample is.

The score is taken from the following part of the report.

The PowerShell command line is extracted from the following part.
In the example below, the extracted string is C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\powershell.exe -noP -sta -w 1 -enc ...

The output is written to a text file in the form shown below:
comma-separated, in the order "report number, score, PowerShell command".

ReportNumber,DetectionScore,PowerShellCommandLine
236546,56,C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\powershell.exe -noP -sta -w 1 -enc ...
236547,99,C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\powershell.exe -NoP -NonI -W Hiden -Exec Bypass ...
236548,10,C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\powershell.exe ...
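Because a PowerShell command line can itself contain commas, a line in this format can only be split safely at its first two commas. Below is a minimal sketch of reading such lines back and keeping the higher-scoring ones. It assumes the score field is a plain integer; `parse_output_line` and the sample lines are made up for illustration, and the threshold of 40 simply echoes the tentative comment in the finished code, not a JoeSandbox rule.

```python
def parse_output_line(line):
    """Split one output line into (report_num, score, cmdline), or None for error rows."""
    # maxsplit=2: commas inside the command line stay in the third field
    parts = line.rstrip("\n").split(",", 2)
    # Error rows written by the scraper look like "236543,ERROR:<message>"
    if len(parts) < 3 or parts[1].startswith("ERROR:"):
        return None
    report_num, score, cmdline = parts
    return int(report_num), int(score), cmdline


# Made-up sample lines in the output format above
lines = [
    "ReportNumber,DetectionScore,PowerShellCommandLine",
    "236546,56,powershell.exe -noP -sta -w 1 -enc AAAA",
    "236548,10,powershell.exe -Command Write-Output a,b",
    "236549,ERROR:list index out of range",
]
records = [r for r in map(parse_output_line, lines[1:]) if r is not None]
# Keep rows whose score is above an assumed threshold of 40
suspicious = [r for r in records if r[1] > 40]
print(suspicious)
# [(236546, 56, 'powershell.exe -noP -sta -w 1 -enc AAAA')]
```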

Code

The snippets below are written so that they run when joined together.

Import the required libraries

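In addition to BeautifulSoup, import requests, os and re. The lxml package must also be installed, because it is used as BeautifulSoup's parser later on.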

import requests
from bs4 import BeautifulSoup
import os
import re

Specifying the report numbers to extract

Because information is extracted from multiple reports, the range is given as from and to. For testing, only a single report is specified here.

report_num_from = 236543
report_num_to = 236543

Crawling the specified pages

The report URLs look like the one below. The "236543" part appears to be the report number, so the code loops over this number.
https://www.joesandbox.com/analysis/236543/0/html

def extract_powershell_command(report_num_from, report_num_to):
    for reoprt_num in range(report_num_from, report_num_to + 1):
        ps_cmdline = []
        try:
            target_url = 'https://www.joesandbox.com/analysis/' + str(reoprt_num) + '/0/html'
            response = requests.get(target_url)
            soup = BeautifulSoup(response.text, 'lxml')
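Keeping the URL pattern and the inclusive from/to loop separate from the HTTP request makes them easy to check without touching the network. A minimal sketch: `report_urls` and `BASE_URL` are hypothetical names, and the template is just the report URL shown above with the number substituted.

```python
# Hypothetical template built from the report URL shown above
BASE_URL = "https://www.joesandbox.com/analysis/{}/0/html"


def report_urls(report_num_from, report_num_to):
    """Yield (report_num, url) for every report number in the inclusive range."""
    # range() excludes its stop value, hence the + 1, as in the loop above
    for report_num in range(report_num_from, report_num_to + 1):
        yield report_num, BASE_URL.format(report_num)


for num, url in report_urls(236543, 236545):
    print(num, url)
```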

Getting the score

The Score appears near the top of the page.
In this example the score is 56, and this is the value we want to extract.

The corresponding HTML is a table with the id detection-details-overview-table.

It is a bit crude, but the score can be obtained like this.

detection_score = 0
table = soup.findAll("table", {"id":"detection-details-overview-table"})[0]
rows = table.findAll("tr")
detection_score = re.sub(r'.+>(.+)</td></tr>', r'\1', str(rows[0]))
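The regex above relies on the first row being on a single line: `.` does not match a newline, and when `re.sub` finds no match it returns the row's HTML unchanged, which would then be written to the file as the "score". If that matters, here is a slightly more defensive, stdlib-only sketch. `extract_score` is a hypothetical name, the sample rows are made up (the real JoeSandbox markup may differ), and, like the original, it assumes the score is the last cell of the row.

```python
import re


def extract_score(row_html):
    """Return the last <td> of a row as an int, or None if it cannot be found."""
    # re.search with DOTALL tolerates newlines between tags and returns None
    # on failure, unlike re.sub, which returns its input unchanged
    match = re.search(r">\s*([^<>]+?)\s*</td>\s*</tr>\s*$", row_html, re.DOTALL)
    if match is None:
        return None
    try:
        return int(match.group(1))
    except ValueError:
        # The last cell did not contain a plain integer
        return None


# Hypothetical row markup
one_line = "<tr><td>Score:</td><td>56</td></tr>"
multi_line = "<tr>\n<td>Score:</td>\n<td>\n56\n</td>\n</tr>\n"
print(extract_score(one_line), extract_score(multi_line))  # 56 56
```

Returning None instead of a string also makes a missing score easy to detect before it is written to the output file.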

Getting the PowerShell command line

Below is an example of a screen that contains a PowerShell command line.

I'll take the part after "cmdline" that follows powershell.exe.
The corresponding HTML is the div with the id startup1.

Let's look at what is inside it.

The entries appear to be stored in a table.
I split it into li elements, keep the ones that contain powershell.exe, and clean them up to get everything after "cmdline".
There may be a smarter way, but this is what I did.

startup = soup.find('div', id='startup1')

for line in startup.findAll('li'):
    if 'powershell.exe' in str(line):
        tmp = str(line).replace('<wbr>', '').replace('</wbr>', '')
        cmdline = re.sub(r'.+ cmdline: (.*) MD5: <span.+', r'\1', tmp)
        ps_cmdline.append(str(reoprt_num) + ',' + detection_score + ',' + cmdline)
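Because the command line is cut out of `str(line)`, it is still HTML: BeautifulSoup's default output escapes `&`, `<` and `>` in text as `&amp;`, `&lt;` and `&gt;`, characters that are common in PowerShell (call operator, redirection). Also, if the regex does not match, `re.sub` returns the whole `<li>` HTML. A stdlib-only sketch that keeps the same regex idea but unescapes the result and returns None on a miss; `extract_cmdline` and the sample `<li>` markup are hypothetical, modeled only on what the original regex expects.

```python
import html
import re


def extract_cmdline(li_html):
    """Return the text between ' cmdline: ' and ' MD5: <span' in one <li>, or None."""
    # Drop the <wbr> line-break hints, as the original code does
    # (the self-closing form is removed too, just in case)
    cleaned = (li_html.replace("<wbr>", "")
                      .replace("</wbr>", "")
                      .replace("<wbr/>", ""))
    match = re.search(r" cmdline: (.*) MD5: <span", cleaned, re.DOTALL)
    if match is None:
        return None
    # str(tag) turns &, < and > into entities; convert them back
    return html.unescape(match.group(1))


# Hypothetical <li> markup
sample = ("<li>powershell.exe (PID: 1234 cmdline: powershell.exe -NoP -Command "
          "&amp; {Get<wbr></wbr>-Date} &gt; out.txt MD5: <span>0123</span>)</li>")
print(extract_cmdline(sample))
# powershell.exe -NoP -Command & {Get-Date} > out.txt
```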

Exception handling

Some exception handling is added just in case.
At the end, the save_file function (described next) is called.

except IndexError as e:
    ps_cmdline.append('{},ERROR:{}'.format(reoprt_num,e))

except Exception as e:
    ps_cmdline.append('{},ERROR:{}'.format(reoprt_num,e))

finally:
    save_file(ps_cmdline)

Writing to a file

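This is the part that writes to the file.
It does not have to be a function, but I plan to replace file output with other processing later, so I made it a function to keep it easy to swap out.
It creates output.txt in the current working directory (for a notebook, normally the notebook's folder) and appends to it.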

def save_file(ps_cmdline):
    # utf-8 so the result does not depend on the platform's default encoding
    with open('./output.txt', 'a', encoding='utf-8') as f:
        if os.stat('./output.txt').st_size == 0:
            f.write('ReportNumber,DetectionScore,PowerShellCommandLine\n')

        for x in ps_cmdline:
            f.write(str(x) + "\n")
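As the sample output shows, each line is just the three values joined with commas, so a command line that itself contains commas or quotes makes the file ambiguous as CSV. If the file is to be opened as CSV later, one option is to let Python's csv module do the quoting. This sketch assumes the rows are passed as (report number, score, command line) tuples instead of pre-joined strings; `save_rows` is a hypothetical replacement for save_file, not the article's code.

```python
import csv
import os
import tempfile

HEADER = ["ReportNumber", "DetectionScore", "PowerShellCommandLine"]


def save_rows(rows, path="./output.txt"):
    """Append (report_num, score, cmdline) tuples to path as properly quoted CSV."""
    # Write the header only when the file is new or empty, like save_file does
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    # newline="" is what the csv module expects for files it writes to
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if write_header:
            writer.writerow(HEADER)
        # Fields containing commas or quotes are quoted automatically
        writer.writerows(rows)


# Usage: write to a temporary file and read it back
path = os.path.join(tempfile.mkdtemp(), "output.txt")
save_rows([(236546, 56, 'powershell.exe -Command "a,b"')], path)
with open(path, newline="", encoding="utf-8") as f:
    rows_read = list(csv.reader(f))
print(rows_read)
```

Reading the file back with csv.reader restores the command line exactly, including its comma and quotes.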

Completed code

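I joined the code so far and added comments.
It should go without saying, but keep the range given by from and to reasonably small.
I wrote and ran this in Jupyter Notebook, which is why it is laid out as follows.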

import requests
from bs4 import BeautifulSoup
import os
import re

report_num_from = 236547
report_num_to = 236547

def extract_powershell_command(report_num_from, report_num_to):
    """
    Extract PowerShell Command from JoeSandbox analysis result.

    Parameters
    ----------
    report_num_from : int
        First report number to analyze
    report_num_to : int
        Last report number to analyze
    """

    for report_num in range(report_num_from, report_num_to + 1):
        ps_cmdline = []
        try:
            target_url = 'https://www.joesandbox.com/analysis/' + str(report_num) + '/0/html'
            # A timeout keeps one unresponsive request from blocking the whole loop
            response = requests.get(target_url, timeout=30)
            soup = BeautifulSoup(response.text, 'lxml')

            # Check JoeSandbox Detection Score (maybe a score above 40 is malicious)
            detection_score = 0
            table = soup.findAll("table", {"id":"detection-details-overview-table"})[0]
            rows = table.findAll("tr")
            detection_score = re.sub(r'.+>(.+)</td></tr>', r'\1', str(rows[0]))

            startup = soup.find('div', id='startup1')  # 'startup1' is a table with ProcessName & CommandLine

            for line in startup.findAll('li'):
                if 'powershell.exe' in str(line):
                    tmp = str(line).replace('<wbr>', '').replace('</wbr>', '')
                    cmdline = re.sub(r'.+ cmdline: (.*) MD5: <span.+', r'\1', tmp)
                    ps_cmdline.append(str(report_num) + ',' + detection_score + ',' + cmdline)

        # Report number does not exist (no detection table, so [0] fails)
        except IndexError as e:
            ps_cmdline.append('{},ERROR:{}'.format(report_num, e))

        except Exception as e:
            ps_cmdline.append('{},ERROR:{}'.format(report_num, e))

        finally:
            save_file(ps_cmdline)

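def save_file(ps_cmdline):
    """
    Save the extraction results to a file.
    File output is kept in its own function so that it can be replaced later.

    Parameters
    ----------
    ps_cmdline : list of str
        Output lines, each "ReportNumber,DetectionScore,PowerShellCommandLine"
        or "ReportNumber,ERROR:message".
    """

    # utf-8 so the result does not depend on the platform's default encoding
    with open('./output.txt', 'a', encoding='utf-8') as f:
        if os.stat('./output.txt').st_size == 0:
            f.write('ReportNumber,DetectionScore,PowerShellCommandLine\n')

        for x in ps_cmdline:
            f.write(str(x) + "\n")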

extract_powershell_command(report_num_from, report_num_to)
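On keeping the from/to range modest: besides a small range, it is polite to pause between requests to the public site. A minimal sketch, assuming the per-report work is wrapped in a function passed in as `fetch`; both `crawl` and its 5-second default delay are hypothetical choices, not something JoeSandbox specifies.

```python
import time


def crawl(report_nums, fetch, delay_seconds=5.0):
    """Call fetch(report_num) for each report number, sleeping between calls."""
    results = []
    for i, report_num in enumerate(report_nums):
        if i > 0:
            # Pause before every request except the first
            time.sleep(delay_seconds)
        results.append(fetch(report_num))
    return results


# Usage with a stand-in fetch function (no network access)
print(crawl(range(236543, 236546), fetch=lambda n: "fetched {}".format(n), delay_seconds=0.1))
```

In the code above, `fetch` could be a function that runs the body of the for loop for a single report number.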

Execution result

Running the code above produces a result like the following.

ReportNumber,DetectionScore,PowerShellCommandLine
236547,56,C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\powershell.exe -noP -sta -w 1 -enc SQBmACgAJABQAFMAVgBFAFIAcwBJAG8AbgBUAGEAYgBMAGUALgBQA...
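The command above uses `-enc`, an abbreviation of powershell.exe's `-EncodedCommand` parameter, whose argument is Base64-encoded UTF-16LE text. To read such a command it can be decoded with the standard library. A minimal sketch; the function names are hypothetical, and the result above is truncated, so it is not decoded here.

```python
import base64


def decode_encoded_command(b64_text):
    """Decode a PowerShell -EncodedCommand argument (Base64 of UTF-16LE text)."""
    return base64.b64decode(b64_text).decode("utf-16-le")


def encode_command(script):
    """Inverse of decode_encoded_command, useful for making test input."""
    return base64.b64encode(script.encode("utf-16-le")).decode("ascii")


encoded = encode_command("Write-Output 'hello'")
print(encoded)
print(decode_encoded_command(encoded))  # Write-Output 'hello'
```

The argument has to be copied in full; a shortened Base64 string like the one above may fail with a padding error.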
