Trying u-dma-buf on Ubuntu 22.04 for MPFS-DISCO-KIT

Introduction

I maintain my own build of Ubuntu 22.04 for the MPFS-DISCO-KIT (Microchip PolarFire SoC FPGA Discovery Kit). I also develop u-dma-buf, a Linux device driver that lets user-space programs share memory with hardware, and publish it on GitHub. This article describes running u-dma-buf on this Ubuntu 22.04 for MPFS-DISCO-KIT. It covers only the results of the CPU accessing u-dma-buf. Performance and cache coherency when a DMA device accesses u-dma-buf will be covered on another occasion.

The Device Tree sources and programs presented in this article are published at the following URL.

Preparing u-dma-buf

As explained in "Memory Map of Ubuntu 22.04 for MPFS-DISCO-KIT", the CPU Core Complex accesses memory in one of the following three modes, and the mode is selected by the physical address in the MSS address space.

  • Cached: the CPU cache is enabled
  • Non-Cached: the CPU cache is disabled and the WCB (Write Combining Buffer) is also disabled
  • Non-Cached WCB: the CPU cache is disabled but the WCB (Write Combining Buffer) is enabled

In other words, whether the CPU cache is enabled depends on the address at which the DMA buffer is allocated.
I therefore prepared the following six u-dma-buf configurations.

No.  u-dma-buf name        memory-region        MSS Address Space
1    udmabuf-fabric-low    kernel               DDR-Cached Low
2    udmabuf-fabric-high   ddr_cached_high      DDR-Cached High
3    udmabuf-soc-high      ddr_non_cached_high  DDR-Non-Cached High
4    udmabuf-ddr-c0        fabricbuf0ddrc       DDR-Cached Low
5    udmabuf-ddr-nc0       fabricbuf1ddrnc      DDR-Non-Cached Low
6    udmabuf-ddr-nc-wcb0   fabricbuf2ddrncwcb   DDR-Non-Cached WCB Low

This chapter presents the Device Tree used to set up each u-dma-buf and the result of applying it as an overlay.

udmabuf-fabric-low

udmabuf-fabric-low is created under /fabric-bus@40000000.
Its dma-mask property is set to 32 so that the physical address of the allocated DMA buffer fits within 32 bits.
Specifically, I prepare the following Device Tree.

udmabuf-fabric-low.dts
/dts-v1/; /plugin/;

/ {
	fragment@0 {
		target-path = "/fabric-bus@40000000";
		#address-cells = <2>;
		#size-cells = <2>;

		__overlay__ {
			#address-cells = <2>;
			#size-cells = <2>;
			udmabuf-fabric-low {
				compatible = "ikwzm,u-dma-buf";
				device-name = "udmabuf-fabric-low";
				size = <0x200000>;
				dma-mask = <32>;
			};
		};
	};
};

Apply this Device Tree as an overlay.

shell$ dtc -I dts -O dtb -o udmabuf-fabric-low.dtb udmabuf-fabric-low.dts
shell$ sudo mkdir /config/device-tree/overlays/udmabuf-fabric-low
shell$ sudo cp udmabuf-fabric-low.dtb /config/device-tree/overlays/udmabuf-fabric-low/dtbo
shell$ dmesg | tail -6
[ 2581.177406] u-dma-buf udmabuf-fabric-low: driver version = 5.0.3
[ 2581.177460] u-dma-buf udmabuf-fabric-low: major number   = 244
[ 2581.177484] u-dma-buf udmabuf-fabric-low: minor number   = 0
[ 2581.177504] u-dma-buf udmabuf-fabric-low: phys address   = 0x0000000081a00000
[ 2581.177528] u-dma-buf udmabuf-fabric-low: buffer size    = 2097152
[ 2581.177552] u-dma-buf fabric-bus@40000000:udmabuf-fabric-low: driver installed.

The DMA buffer was allocated at physical address 0x0000_0000_81a0_0000. This means it was placed in the kernel: memory@80000000 region described in "Memory Map of Ubuntu 22.04 for MPFS-DISCO-KIT" (DDR-Cached Low in the MSS address space).

udmabuf-fabric-high

udmabuf-fabric-high is created under /fabric-bus@40000000.
Its dma-mask property is set to 40 so that the DMA buffer may also be allocated at a physical address beyond the 32-bit range.
Specifically, I prepare the following Device Tree.

udmabuf-fabric-high.dts
/dts-v1/; /plugin/;

/ {
	fragment@0 {
		target-path = "/fabric-bus@40000000";
		#address-cells = <2>;
		#size-cells = <2>;

		__overlay__ {
			#address-cells = <2>;
			#size-cells = <2>;
			udmabuf-fabric-high {
				compatible = "ikwzm,u-dma-buf";
				device-name = "udmabuf-fabric-high";
				size = <0x200000>;
				dma-mask = <40>;
			};
		};
	};
};

Apply this Device Tree as an overlay.

shell$ dtc -I dts -O dtb -o udmabuf-fabric-high.dtb udmabuf-fabric-high.dts
shell$ sudo mkdir /config/device-tree/overlays/udmabuf-fabric-high
shell$ sudo cp udmabuf-fabric-high.dtb /config/device-tree/overlays/udmabuf-fabric-high/dtbo
shell$ dmesg | tail -6
[ 3470.093101] u-dma-buf udmabuf-fabric-high: driver version = 5.0.3
[ 3470.093157] u-dma-buf udmabuf-fabric-high: major number   = 244
[ 3470.093182] u-dma-buf udmabuf-fabric-high: minor number   = 0
[ 3470.093202] u-dma-buf udmabuf-fabric-high: phys address   = 0x0000001032600000
[ 3470.093227] u-dma-buf udmabuf-fabric-high: buffer size    = 2097152
[ 3470.093252] u-dma-buf fabric-bus@40000000:udmabuf-fabric-high: driver installed.

The DMA buffer was allocated at physical address 0x0000_0010_3260_0000. This means it was placed in the ddr_cached_high: memory@1022000000 region described in "Memory Map of Ubuntu 22.04 for MPFS-DISCO-KIT" (DDR-Cached High in the MSS address space).

udmabuf-soc-high

udmabuf-soc-high is created under /soc.
Its dma-mask property is set to 40 so that the buffer may be allocated at a physical address beyond the 32-bit range. As explained in "Memory Map of Ubuntu 22.04 for MPFS-DISCO-KIT", the dma-ranges property of /soc specifies that DMA buffers be allocated at physical address 0x14_0000_0000 or above, so allocation fails if dma-mask is set below 40.
Specifically, I prepare the following Device Tree.

udmabuf-soc-high.dts
/dts-v1/; /plugin/;

/ {
	fragment@0 {
		target-path = "/soc";
		#address-cells = <2>;
		#size-cells = <2>;

		__overlay__ {
			#address-cells = <2>;
			#size-cells = <2>;
			udmabuf-soc-high {
				compatible = "ikwzm,u-dma-buf";
				device-name = "udmabuf-soc-high";
				size = <0x200000>;
				dma-mask = <40>;
				dma-noncoherent;
			};
		};
	};
};

Apply this Device Tree as an overlay.

shell$ dtc -I dts -O dtb -o udmabuf-soc-high.dtb udmabuf-soc-high.dts
shell$ sudo mkdir /config/device-tree/overlays/udmabuf-soc-high
shell$ sudo cp udmabuf-soc-high.dtb /config/device-tree/overlays/udmabuf-soc-high/dtbo
shell$ dmesg | tail -6
[ 3945.367440] u-dma-buf udmabuf-soc-high: driver version = 5.0.3
[ 3945.367485] u-dma-buf udmabuf-soc-high: major number   = 244
[ 3945.367506] u-dma-buf udmabuf-soc-high: minor number   = 0
[ 3945.367527] u-dma-buf udmabuf-soc-high: phys address   = 0x0000001412200000
[ 3945.367549] u-dma-buf udmabuf-soc-high: buffer size    = 2097152
[ 3945.367572] u-dma-buf soc:udmabuf-soc-high: driver installed.

The DMA buffer was allocated at physical address 0x0000_0014_1220_0000. This means it was placed in the ddr_non_cached_high: memory@1412000000 region described in "Memory Map of Ubuntu 22.04 for MPFS-DISCO-KIT".

udmabuf-ddr-c0

udmabuf-ddr-c0 allocates its DMA buffer in the reserved-memory region fabricbuf0ddrc.
As explained in "Memory Map of Ubuntu 22.04 for MPFS-DISCO-KIT", fabricbuf0ddrc is a memory region reserved in ddr_non_cached_low, but its address in the MSS address space lies in DDR-Cached Low.
Specifically, I prepare the following Device Tree.

udmabuf-ddr-c0.dts
/dts-v1/; /plugin/;

/ {
	fragment@0 {
		target-path = "/fabric-bus@40000000";
		#address-cells = <2>;
		#size-cells = <2>;

		__overlay__ {
			#address-cells = <2>;
			#size-cells = <2>;
			udmabuf-ddr-c0 {
				compatible = "ikwzm,u-dma-buf";
				device-name = "udmabuf-ddr-c0";
				memory-region = <&fabricbuf0ddrc>;
				size = <0x2000000>;
			};
		};
	};
};

Apply this Device Tree as an overlay.

shell$ dtc -I dts -O dtb -o udmabuf-ddr-c0.dtb udmabuf-ddr-c0.dts
shell$ sudo mkdir /config/device-tree/overlays/udmabuf-ddr-c0
shell$ sudo cp udmabuf-ddr-c0.dtb /config/device-tree/overlays/udmabuf-ddr-c0/dtbo
shell$ dmesg | tail -6
[ 4939.448074] u-dma-buf udmabuf-ddr-c0: driver version = 5.0.3
[ 4939.448127] u-dma-buf udmabuf-ddr-c0: major number   = 244
[ 4939.448149] u-dma-buf udmabuf-ddr-c0: minor number   = 0
[ 4939.448168] u-dma-buf udmabuf-ddr-c0: phys address   = 0x0000000088000000
[ 4939.448192] u-dma-buf udmabuf-ddr-c0: buffer size    = 33554432
[ 4939.448216] u-dma-buf fabric-bus@40000000:udmabuf-ddr-c0: driver installed.

The DMA buffer was allocated at physical address 0x0000_0000_8800_0000. This means it was placed in the reserved memory fabricbuf0ddrc described in "Memory Map of Ubuntu 22.04 for MPFS-DISCO-KIT".

udmabuf-ddr-nc0

udmabuf-ddr-nc0 allocates its DMA buffer in the reserved-memory region fabricbuf1ddrnc.
As explained in "Memory Map of Ubuntu 22.04 for MPFS-DISCO-KIT", fabricbuf1ddrnc is a memory region reserved in ddr_non_cached_low, and its address in the MSS address space lies in DDR-Non-Cached Low.
Specifically, I prepare the following Device Tree.

udmabuf-ddr-nc0.dts
/dts-v1/; /plugin/;

/ {
	fragment@0 {
		target-path = "/fabric-bus@40000000";
		#address-cells = <2>;
		#size-cells = <2>;

		__overlay__ {
			#address-cells = <2>;
			#size-cells = <2>;
			udmabuf-ddr-nc0 {
				compatible = "ikwzm,u-dma-buf";
				device-name = "udmabuf-ddr-nc0";
				memory-region = <&fabricbuf1ddrnc>;
				size = <0x2000000>;
			};
		};
	};
};

Apply this Device Tree as an overlay.

shell$ dtc -I dts -O dtb -o udmabuf-ddr-nc0.dtb udmabuf-ddr-nc0.dts
shell$ sudo mkdir /config/device-tree/overlays/udmabuf-ddr-nc0
shell$ sudo cp udmabuf-ddr-nc0.dtb /config/device-tree/overlays/udmabuf-ddr-nc0/dtbo
shell$ dmesg | tail -6
[ 6876.210642] u-dma-buf udmabuf-ddr-nc0: driver version = 5.0.3
[ 6876.210689] u-dma-buf udmabuf-ddr-nc0: major number   = 244
[ 6876.210713] u-dma-buf udmabuf-ddr-nc0: minor number   = 0
[ 6876.210733] u-dma-buf udmabuf-ddr-nc0: phys address   = 0x00000000c8000000
[ 6876.210756] u-dma-buf udmabuf-ddr-nc0: buffer size    = 33554432
[ 6876.210779] u-dma-buf fabric-bus@40000000:udmabuf-ddr-nc0: driver installed.

The DMA buffer was allocated at physical address 0x0000_0000_c800_0000. This means it was placed in the reserved memory fabricbuf1ddrnc described in "Memory Map of Ubuntu 22.04 for MPFS-DISCO-KIT".

udmabuf-ddr-nc-wcb0

udmabuf-ddr-nc-wcb0 allocates its DMA buffer in the reserved-memory region fabricbuf2ddrncwcb.
As explained in "Memory Map of Ubuntu 22.04 for MPFS-DISCO-KIT", fabricbuf2ddrncwcb is a memory region reserved in ddr_non_cached_low, and its address in the MSS address space lies in DDR-Non-Cached WCB Low.
Specifically, I prepare the following Device Tree.

udmabuf-ddr-nc-wcb0.dts
/dts-v1/; /plugin/;

/ {
	fragment@0 {
		target-path = "/fabric-bus@40000000";
		#address-cells = <2>;
		#size-cells = <2>;

		__overlay__ {
			#address-cells = <2>;
			#size-cells = <2>;
			udmabuf-ddr-nc-wcb0 {
				compatible = "ikwzm,u-dma-buf";
				device-name = "udmabuf-ddr-nc-wcb0";
				memory-region = <&fabricbuf2ddrncwcb>;
				size = <0x2000000>;
			};
		};
	};
};

Apply this Device Tree as an overlay.

shell$ dtc -I dts -O dtb -o udmabuf-ddr-nc-wcb0.dtb udmabuf-ddr-nc-wcb0.dts
shell$ sudo mkdir /config/device-tree/overlays/udmabuf-ddr-nc-wcb0
shell$ sudo cp udmabuf-ddr-nc-wcb0.dtb /config/device-tree/overlays/udmabuf-ddr-nc-wcb0/dtbo
shell$ dmesg | tail -6
[ 7716.790940] u-dma-buf udmabuf-ddr-nc-wcb0: driver version = 5.0.3
[ 7716.790990] u-dma-buf udmabuf-ddr-nc-wcb0: major number   = 244
[ 7716.791013] u-dma-buf udmabuf-ddr-nc-wcb0: minor number   = 0
[ 7716.791034] u-dma-buf udmabuf-ddr-nc-wcb0: phys address   = 0x00000000d8000000
[ 7716.791057] u-dma-buf udmabuf-ddr-nc-wcb0: buffer size    = 33554432
[ 7716.791080] u-dma-buf fabric-bus@40000000:udmabuf-ddr-nc-wcb0: driver installed.

The DMA buffer was allocated at physical address 0x0000_0000_d800_0000. This means it was placed in the reserved memory fabricbuf2ddrncwcb described in "Memory Map of Ubuntu 22.04 for MPFS-DISCO-KIT".

u-dma-buf Performance

Benchmark Program

To evaluate performance when the CPU accesses u-dma-buf, I prepared the following program.

u-dma-buf-file-test.c
#include        <stdio.h>
#include        <fcntl.h>
#include        <string.h>
#include        <time.h>
#include        <stdlib.h>
#include        <unistd.h>
#include        <getopt.h>
#include        <errno.h>
#include        <sys/ioctl.h>
#include        <sys/time.h>
#include        <sys/types.h>
#include        <sys/mman.h>
#include        <sys/utsname.h>
#include        <inttypes.h>

struct u_dma_buf
{
    char*      name;
    char*      dev_name;
    char*      sys_path;
    char*      version;
    uint64_t   phys_addr;
    size_t     size;
    int        dma_coherent;
    int        sync_for_cpu_file;
    int        sync_for_dev_file;
    char       sync_command[1024];
    int        sync_command_len;
};
const  int  U_DMA_BUF_READ_WRITE  =  0;
const  int  U_DMA_BUF_WRITE_ONLY  =  1;
const  int  U_DMA_BUF_READ_ONLY   =  2;

void u_dma_buf_destroy(struct u_dma_buf* this)
{
    if (this == NULL)
        return;
    
    if (this->sync_for_cpu_file >= 0) close(this->sync_for_cpu_file);
    if (this->sync_for_dev_file >= 0) close(this->sync_for_dev_file);
    if (this->name     != NULL) free(this->name);
    if (this->dev_name != NULL) free(this->dev_name);
    if (this->sys_path != NULL) free(this->sys_path);
    if (this->version  != NULL) free(this->version);
    free(this);
}

struct u_dma_buf* u_dma_buf_create(char* name)
{
    struct u_dma_buf*  this;
    char               file_name[1024];
    char               attr[1024];
    int                str_len;
    int                fd;

    if ((this = calloc(1, sizeof(struct u_dma_buf))) == NULL) {
        printf("Can not alloc u_dma_buf\n");
        goto failed;
    }
    this->sync_for_cpu_file = -1;
    this->sync_for_dev_file = -1;
    
    if ((this->name = strdup(name)) == NULL) {
        printf("Can not alloc this->name\n");
        goto failed;
    }
    str_len = sprintf(file_name, "/dev/%s", this->name);
    if ((this->dev_name = strdup(file_name)) == NULL) {
        printf("Can not alloc this->dev_name\n");
        goto failed;
    }
    str_len = sprintf(file_name, "/sys/class/u-dma-buf/%s", this->name);
    if ((this->sys_path = strdup(file_name)) == NULL) {
        printf("Can not alloc this->sys_path\n");
        goto failed;
    }
    str_len = sprintf(file_name, "%s/size", this->sys_path);
    if ((fd = open(file_name, O_RDONLY)) != -1) {
        ssize_t len = read(fd, attr, sizeof(attr) - 1);
        attr[(len > 0) ? len : 0] = '\0';   /* read() does not NUL-terminate */
        sscanf(attr, "%zu", &this->size);
        close(fd);
    } else {
        printf("Can not open %s\n", file_name);
        goto failed;
    } 
    str_len = sprintf(file_name, "%s/phys_addr", this->sys_path);
    if ((fd = open(file_name, O_RDONLY)) != -1) {
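        ssize_t len = read(fd, attr, sizeof(attr) - 1);
        attr[(len > 0) ? len : 0] = '\0';   /* read() does not NUL-terminate */
        sscanf(attr, "%" SCNx64, &this->phys_addr);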
        close(fd);
    } else {
        printf("Can not open %s\n", file_name);
        goto failed;
    } 
    str_len = sprintf(file_name, "%s/driver_version", this->sys_path);
    if ((fd = open(file_name, O_RDONLY)) != -1) {
        ssize_t len = read(fd, attr, sizeof(attr) - 1);
        attr[(len > 0) ? len : 0] = '\0';   /* read() does not NUL-terminate */
        attr[strcspn(attr, "\n")] = '\0';   /* strip the trailing newline */
        this->version = strdup(attr);
        close(fd);
    } else {
        printf("Can not open %s\n", file_name);
        goto failed;
    } 
    str_len = sprintf(file_name, "%s/dma_coherent", this->sys_path);
    if ((fd = open(file_name, O_RDONLY)) != -1) {
        ssize_t len = read(fd, attr, sizeof(attr) - 1);
        attr[(len > 0) ? len : 0] = '\0';   /* read() does not NUL-terminate */
        sscanf(attr, "%d", &this->dma_coherent);
        close(fd);
    } else {
        printf("Can not open %s\n", file_name);
        goto failed;
    } 
    str_len = sprintf(file_name, "%s/sync_for_cpu", this->sys_path);
    if ((fd = open(file_name, O_RDWR)) != -1) {
        this->sync_for_cpu_file = fd;
    } else {
        printf("Can not open %s\n", file_name);
        goto failed;
    } 
    str_len = sprintf(file_name, "%s/sync_for_device", this->sys_path);
    if ((fd = open(file_name, O_RDWR)) != -1) {
        this->sync_for_dev_file = fd;
    } else {
        printf("Can not open %s\n", file_name);
        goto failed;
    }
    return this;
      
  failed:
    u_dma_buf_destroy(this);
    return NULL;
}

int  u_dma_buf_open(struct u_dma_buf* this, int flags)
{
    return open(this->dev_name, flags);
}

void u_dma_buf_set_sync_area(struct u_dma_buf* this, unsigned int offset, unsigned int size, int direction)
{
    this->sync_command_len = 
        sprintf(this->sync_command, "0x%08X%08X\n",
                offset,
               ((size & 0xFFFFFFF0) | (direction << 2) | 1));
}

size_t u_dma_buf_sync_for_cpu(struct u_dma_buf* this)
{
    if (this->sync_command_len > 0)
        return write(this->sync_for_cpu_file,
                     this->sync_command,
                     this->sync_command_len);
    else
        return 0;
}

size_t u_dma_buf_sync_for_dev(struct u_dma_buf* this)
{
    if (this->sync_command_len > 0)
        return write(this->sync_for_dev_file,
                     this->sync_command,
                     this->sync_command_len);
    else
        return 0;
}

struct test_time
{
    struct timeval main;
    struct timeval sync_for_cpu;
    struct timeval sync_for_dev;
    struct timeval total;
};

static void diff_time(struct timeval* run_time, struct timeval* start_time, struct timeval* end_time)
{
    if (end_time->tv_usec < start_time->tv_usec) {
        run_time->tv_sec  = end_time->tv_sec  - start_time->tv_sec  - 1;
        run_time->tv_usec = end_time->tv_usec - start_time->tv_usec + 1000*1000;
    } else {
        run_time->tv_sec  = end_time->tv_sec  - start_time->tv_sec ;
        run_time->tv_usec = end_time->tv_usec - start_time->tv_usec;
    }
}

int u_dma_buf_mmap_write_test(struct u_dma_buf* this, void* buf, unsigned int size, int sync, struct test_time* time)
{
    int            fd;
    void*          iomem;
    struct timeval test_start_time, test_end_time;
    struct timeval main_start_time, main_end_time;

    if (sync == 0)
        u_dma_buf_set_sync_area(this, 0, size, U_DMA_BUF_WRITE_ONLY);
      
    if ((fd  = u_dma_buf_open(this, O_RDWR | ((sync)?O_SYNC:0))) != -1) {
        iomem = mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
        if (iomem == MAP_FAILED) {   /* do not memcpy() into a failed mapping */
            close(fd);
            return -1;
        }
        gettimeofday(&test_start_time, NULL);
        if (sync == 0)
            u_dma_buf_sync_for_cpu(this);
        gettimeofday(&main_start_time, NULL);
        memcpy(iomem, buf, size);
        gettimeofday(&main_end_time, NULL);
        if (sync == 0)
            u_dma_buf_sync_for_dev(this);
        gettimeofday(&test_end_time  , NULL);
        if (time != NULL) {
            diff_time(&time->total       , &test_start_time, &test_end_time  );
            diff_time(&time->sync_for_cpu, &test_start_time, &main_start_time);
            diff_time(&time->main        , &main_start_time, &main_end_time  );
            diff_time(&time->sync_for_dev, &main_end_time  , &test_end_time  );
        }
        munmap(iomem, size);   /* release the mapping before closing */
        (void)close(fd);
        return 0;
    } else {
        return -1;
    }
}

int u_dma_buf_mmap_read_test(struct u_dma_buf* this, void* buf, unsigned int size, int sync, struct test_time* time)
{
    int            fd;
    void*          iomem;
    struct timeval test_start_time, test_end_time;
    struct timeval main_start_time, main_end_time;

    if (sync == 0)
        u_dma_buf_set_sync_area(this, 0, size, U_DMA_BUF_READ_ONLY);
      
    if ((fd  = u_dma_buf_open(this, O_RDWR | ((sync)?O_SYNC:0))) != -1) {
        iomem = mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
        if (iomem == MAP_FAILED) {   /* do not memcpy() from a failed mapping */
            close(fd);
            return -1;
        }
        gettimeofday(&test_start_time, NULL);
        if (sync == 0)
            u_dma_buf_sync_for_cpu(this);
        gettimeofday(&main_start_time, NULL);
        memcpy(buf, iomem, size);
        gettimeofday(&main_end_time  , NULL);
        if (sync == 0)
            u_dma_buf_sync_for_dev(this);
        gettimeofday(&test_end_time  , NULL);
        if (time != NULL) {
            diff_time(&time->total       , &test_start_time, &test_end_time  );
            diff_time(&time->sync_for_cpu, &test_start_time, &main_start_time);
            diff_time(&time->main        , &main_start_time, &main_end_time  );
            diff_time(&time->sync_for_dev, &main_end_time  , &test_end_time  );
        }
        munmap(iomem, size);   /* release the mapping before closing */
        close(fd);
        return 0;
    } else {
        return -1;
    }
}

int u_dma_buf_file_write_test(struct u_dma_buf* this, void* buf, unsigned int size, int sync, struct test_time* time)
{
    int            fd;
    int            len;
    void*          ptr;
    struct timeval test_start_time, test_end_time;
    struct timeval main_start_time, main_end_time;

    if (sync == 0)
        u_dma_buf_set_sync_area(this, 0, size, U_DMA_BUF_WRITE_ONLY);
      
    if ((fd  = u_dma_buf_open(this, O_RDWR | ((sync)?O_SYNC:0))) != -1) {
        gettimeofday(&test_start_time, NULL);
        if (sync == 0)
            u_dma_buf_sync_for_cpu(this);
        gettimeofday(&main_start_time, NULL);
        len = size;
        ptr = buf;
        while(len > 0) {
            int count = write(fd, ptr, len);
            if (count <= 0) {   /* stop on error or when no progress is made */
                break;
            }
            ptr = (char*)ptr + count;   /* avoid arithmetic on void* */
            len -= count;
        }
        gettimeofday(&main_end_time, NULL);
        if (sync == 0)
            u_dma_buf_sync_for_dev(this);
        gettimeofday(&test_end_time, NULL);
        if (time != NULL) {
            diff_time(&time->total       , &test_start_time, &test_end_time  );
            diff_time(&time->sync_for_cpu, &test_start_time, &main_start_time);
            diff_time(&time->main        , &main_start_time, &main_end_time  );
            diff_time(&time->sync_for_dev, &main_end_time  , &test_end_time  );
        }
        (void)close(fd);
        return 0;
    } else {
        return -1;
    }
}

int u_dma_buf_file_read_test(struct u_dma_buf* this, void* buf, unsigned int size, int sync, struct test_time* time)
{
    int            fd;
    int            len;
    void*          ptr;
    struct timeval test_start_time, test_end_time;
    struct timeval main_start_time, main_end_time;

    if (sync == 0)
        u_dma_buf_set_sync_area(this, 0, size, U_DMA_BUF_READ_ONLY);
      
    if ((fd  = u_dma_buf_open(this, O_RDWR | ((sync)?O_SYNC:0))) != -1) {
        gettimeofday(&test_start_time, NULL);
        if (sync == 0)
            u_dma_buf_sync_for_cpu(this);
        gettimeofday(&main_start_time, NULL);
        len = size;
        ptr = buf;
        while(len > 0) {
            int count = read(fd, ptr, len);
            if (count <= 0) {   /* stop on error or end of file */
                break;
            }
            ptr = (char*)ptr + count;   /* avoid arithmetic on void* */
            len -= count;
        }
        gettimeofday(&main_end_time  , NULL);
        if (sync == 0)
            u_dma_buf_sync_for_dev(this);
        gettimeofday(&test_end_time  , NULL);
        if (time != NULL) {
            diff_time(&time->total       , &test_start_time, &test_end_time  );
            diff_time(&time->sync_for_cpu, &test_start_time, &main_start_time);
            diff_time(&time->main        , &main_start_time, &main_end_time  );
            diff_time(&time->sync_for_dev, &main_end_time  , &test_end_time  );
        }
        close(fd);
        return 0;
    } else {
        return -1;
    }
}

int main(int argc, char* argv[])
{
    struct u_dma_buf* u_dma_buf;
    char              device_name[256];
    unsigned int      err_count = 0;
    size_t            buf_size;
    void*             null_buf = NULL;
    void*             src0_buf = NULL;
    void*             src1_buf = NULL;
    void*             temp_buf = NULL;
    int               verbose  = 0;
    int               opt;
    int               optindex;
    struct option     longopts[] = {
      { "name"      , required_argument, NULL, 'n'},
      { "verbose"   , no_argument      , NULL, 'v'},
      { NULL        , 0                , NULL,  0 },
    };

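    strncpy(device_name, "udmabuf0", sizeof(device_name) - 1);
    device_name[sizeof(device_name) - 1] = '\0';
    /* "v" must be listed here, otherwise -v is rejected as an unknown option */
    while ((opt = getopt_long(argc, argv, "n:v", longopts, &optindex)) != -1) {
      switch (opt) {
        case 'n':
          strncpy(device_name, optarg, sizeof(device_name) - 1);
          break;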
        case 'v':
          verbose = 1;
          break;
        default:
          printf("error options\n");
          break;
      }
    }
    printf("device=%s\n", device_name);
    //
    // u_dma_buf
    //
    if ((u_dma_buf = u_dma_buf_create(device_name)) == NULL) {
        goto done;
    }
    printf("driver_version=%s\n"       , u_dma_buf->version     );
    printf("size=%zu\n"                , u_dma_buf->size        );
    printf("phys_addr=0x%" PRIx64 "\n" , u_dma_buf->phys_addr   );
    printf("dma_coherent=%d\n"         , u_dma_buf->dma_coherent);
    //
    // initialize buffers
    //
    buf_size = u_dma_buf->size;
    if ((null_buf = malloc(buf_size)) == NULL) {
        printf("Can not malloc null_buf\n");
        goto done;
    } else {
        memset(null_buf, 0, buf_size);
    }
    if ((src0_buf = malloc(buf_size)) == NULL) {
        printf("Can not malloc src0_buf\n");
        goto done;
    } else {
        int*   word  = (int *)src0_buf;
        size_t words = buf_size/sizeof(int);
        for(int i = 0; i < words; i++) {
            word[i] = i;
        }
    }
    if ((src1_buf = malloc(buf_size)) == NULL) {
        printf("Can not malloc src1_buf\n");
        goto done;
    } else {
        int*   word  = (int *)src1_buf;
        size_t words = buf_size/sizeof(int);
        for(int i = 0; i < words; i++) {
            word[i] = ~i;
        }
    }
    if ((temp_buf = malloc(buf_size)) == NULL) {
        printf("Can not malloc temp_buf\n");
        goto done;
    } else {
        memset(temp_buf, 0, buf_size);
    }
    //
    // define TEST1()
    //
#define TEST1(w_type,w_sync,r_type,r_sync,src,dst,size)    \
    {                                                      \
        struct test_time w_time;                           \
        struct test_time r_time;                           \
        long long w_total_usec;                            \
        long long r_total_usec;                            \
        memset(dst, 0, buf_size);                          \
        printf(#w_type " write test : sync=%d ", w_sync);  \
        u_dma_buf_##w_type##_write_test(u_dma_buf, src, size, w_sync, &w_time); \
        w_total_usec = (long long)w_time.total.tv_sec*(1000*1000)+(long long)w_time.total.tv_usec; \
        printf("time=%ld.%06ld sec (%ld.%06ld sec) ", w_time.total.tv_sec, w_time.total.tv_usec, w_time.main.tv_sec, w_time.main.tv_usec); \
        printf("%6.1f MB/sec\n", (double)size / (double)w_total_usec); \
        printf(#r_type " read  test : sync=%d ", r_sync);  \
        u_dma_buf_##r_type##_read_test (u_dma_buf, dst, size, r_sync, &r_time); \
        r_total_usec = (long long)r_time.total.tv_sec*(1000*1000)+(long long)r_time.total.tv_usec; \
        printf("time=%ld.%06ld sec (%ld.%06ld sec) ", r_time.total.tv_sec, r_time.total.tv_usec, r_time.main.tv_sec, r_time.main.tv_usec); \
        printf("%6.1f MB/sec\n", (double)size / (double)r_total_usec); \
        if (memcmp(dst, src, size) != 0) {   \
            printf("compare = mismatch\n");  \
            err_count++;                     \
        } else {                             \
            printf("compare = ok\n");        \
        }                                    \
    }
    TEST1(mmap, 1, mmap, 1, src0_buf, temp_buf, buf_size);
    TEST1(mmap, 0, mmap, 1, src1_buf, temp_buf, buf_size);
    TEST1(mmap, 1, mmap, 0, src0_buf, temp_buf, buf_size);
    TEST1(mmap, 0, mmap, 0, src1_buf, temp_buf, buf_size);
    TEST1(file, 1, mmap, 0, src0_buf, temp_buf, buf_size);
    TEST1(file, 0, mmap, 0, src1_buf, temp_buf, buf_size);
    TEST1(mmap, 0, file, 1, src0_buf, temp_buf, buf_size);
    TEST1(mmap, 0, file, 0, src1_buf, temp_buf, buf_size);

 done:
    if (temp_buf  != NULL)
        free(temp_buf);
    if (src1_buf  != NULL)
        free(src1_buf);
    if (src0_buf  != NULL)
        free(src0_buf);
    if (null_buf  != NULL)
        free(null_buf);
    if (u_dma_buf != NULL)
      u_dma_buf_destroy(u_dma_buf);
}

This test program measures mmap read, file read, mmap write, and file write performance for the u-dma-buf specified on the command line.

Benchmark Program Execution Log

The log below shows the results of running the benchmark program.

u-dma-buf-file-test.log
##  file-test-udmabuf-fabric-low
device=udmabuf-fabric-low
driver_version=5.0.3
size=2097152
phys_addr=0x81a00000
dma_coherent=1
mmap write test : sync=1 time=0.021085 sec (0.021081 sec)   99.5 MB/sec
mmap read  test : sync=1 time=0.023062 sec (0.023058 sec)   90.9 MB/sec
compare = ok
mmap write test : sync=0 time=0.020971 sec (0.020886 sec)  100.0 MB/sec
mmap read  test : sync=1 time=0.023041 sec (0.023036 sec)   91.0 MB/sec
compare = ok
mmap write test : sync=1 time=0.021026 sec (0.021022 sec)   99.7 MB/sec
mmap read  test : sync=0 time=0.023123 sec (0.023038 sec)   90.7 MB/sec
compare = ok
mmap write test : sync=0 time=0.020986 sec (0.020903 sec)   99.9 MB/sec
mmap read  test : sync=0 time=0.023063 sec (0.022997 sec)   90.9 MB/sec
compare = ok
file write test : sync=1 time=0.018645 sec (0.018641 sec)  112.5 MB/sec
mmap read  test : sync=0 time=0.023189 sec (0.023106 sec)   90.4 MB/sec
compare = ok
file write test : sync=0 time=0.018813 sec (0.018737 sec)  111.5 MB/sec
mmap read  test : sync=0 time=0.023196 sec (0.023127 sec)   90.4 MB/sec
compare = ok
mmap write test : sync=0 time=0.020976 sec (0.020891 sec)  100.0 MB/sec
file read  test : sync=1 time=0.020546 sec (0.020542 sec)  102.1 MB/sec
compare = ok
mmap write test : sync=0 time=0.020988 sec (0.020902 sec)   99.9 MB/sec
file read  test : sync=0 time=0.020360 sec (0.020302 sec)  103.0 MB/sec
compare = ok

##  file-test-udmabuf-fabric-high
device=udmabuf-fabric-high
driver_version=5.0.3
size=2097152
phys_addr=0x1032600000
dma_coherent=1
mmap write test : sync=1 time=0.020997 sec (0.020993 sec)   99.9 MB/sec
mmap read  test : sync=1 time=0.024140 sec (0.024136 sec)   86.9 MB/sec
compare = ok
mmap write test : sync=0 time=0.021072 sec (0.020989 sec)   99.5 MB/sec
mmap read  test : sync=1 time=0.024104 sec (0.024100 sec)   87.0 MB/sec
compare = ok
mmap write test : sync=1 time=0.020901 sec (0.020896 sec)  100.3 MB/sec
mmap read  test : sync=0 time=0.024230 sec (0.024146 sec)   86.6 MB/sec
compare = ok
mmap write test : sync=0 time=0.021115 sec (0.021032 sec)   99.3 MB/sec
mmap read  test : sync=0 time=0.024211 sec (0.024142 sec)   86.6 MB/sec
compare = ok
file write test : sync=1 time=0.018739 sec (0.018735 sec)  111.9 MB/sec
mmap read  test : sync=0 time=0.024265 sec (0.024183 sec)   86.4 MB/sec
compare = ok
file write test : sync=0 time=0.018952 sec (0.018875 sec)  110.7 MB/sec
mmap read  test : sync=0 time=0.024374 sec (0.024303 sec)   86.0 MB/sec
compare = ok
mmap write test : sync=0 time=0.021010 sec (0.020928 sec)   99.8 MB/sec
file read  test : sync=1 time=0.021169 sec (0.021167 sec)   99.1 MB/sec
compare = ok
mmap write test : sync=0 time=0.021256 sec (0.021050 sec)   98.7 MB/sec
file read  test : sync=0 time=0.021142 sec (0.021087 sec)   99.2 MB/sec
compare = ok

##  file-test-udmabuf-soc-high
device=udmabuf-soc-high
driver_version=5.0.3
size=2097152
phys_addr=0x1412200000
dma_coherent=0
mmap write test : sync=1 time=0.015515 sec (0.015511 sec)  135.2 MB/sec
mmap read  test : sync=1 time=0.038190 sec (0.038187 sec)   54.9 MB/sec
compare = ok
mmap write test : sync=0 time=0.016677 sec (0.015393 sec)  125.8 MB/sec
mmap read  test : sync=1 time=0.037967 sec (0.037965 sec)   55.2 MB/sec
compare = ok
mmap write test : sync=1 time=0.015359 sec (0.015354 sec)  136.5 MB/sec
mmap read  test : sync=0 time=0.040526 sec (0.037932 sec)   51.7 MB/sec
compare = ok
mmap write test : sync=0 time=0.016673 sec (0.015384 sec)  125.8 MB/sec
mmap read  test : sync=0 time=0.040548 sec (0.037905 sec)   51.7 MB/sec
compare = ok
file write test : sync=1 time=0.012514 sec (0.012511 sec)  167.6 MB/sec
mmap read  test : sync=0 time=0.040502 sec (0.037948 sec)   51.8 MB/sec
compare = ok
file write test : sync=0 time=0.012607 sec (0.011264 sec)  166.3 MB/sec
mmap read  test : sync=0 time=0.040596 sec (0.038118 sec)   51.7 MB/sec
compare = ok
mmap write test : sync=0 time=0.016678 sec (0.015392 sec)  125.7 MB/sec
file read  test : sync=1 time=0.027340 sec (0.027337 sec)   76.7 MB/sec
compare = ok
mmap write test : sync=0 time=0.016668 sec (0.015380 sec)  125.8 MB/sec
file read  test : sync=0 time=0.027377 sec (0.024853 sec)   76.6 MB/sec
compare = ok

##  file-test-udmabuf-ddr-c0
device=udmabuf-ddr-c0
driver_version=5.0.3
size=33554432
phys_addr=0x88000000
dma_coherent=1
mmap write test : sync=1 time=0.326402 sec (0.326397 sec)  102.8 MB/sec
mmap read  test : sync=1 time=0.335157 sec (0.335153 sec)  100.1 MB/sec
compare = ok
mmap write test : sync=0 time=0.326644 sec (0.326557 sec)  102.7 MB/sec
mmap read  test : sync=1 time=0.335646 sec (0.335642 sec)  100.0 MB/sec
compare = ok
mmap write test : sync=1 time=0.326066 sec (0.326063 sec)  102.9 MB/sec
mmap read  test : sync=0 time=0.335145 sec (0.335061 sec)  100.1 MB/sec
compare = ok
mmap write test : sync=0 time=0.325970 sec (0.325883 sec)  102.9 MB/sec
mmap read  test : sync=0 time=0.335155 sec (0.335084 sec)  100.1 MB/sec
compare = ok
file write test : sync=1 time=0.292303 sec (0.292299 sec)  114.8 MB/sec
mmap read  test : sync=0 time=0.335115 sec (0.335029 sec)  100.1 MB/sec
compare = ok
file write test : sync=0 time=0.292203 sec (0.292120 sec)  114.8 MB/sec
mmap read  test : sync=0 time=0.335275 sec (0.335203 sec)  100.1 MB/sec
compare = ok
mmap write test : sync=0 time=0.326377 sec (0.326288 sec)  102.8 MB/sec
file read  test : sync=1 time=0.293119 sec (0.293116 sec)  114.5 MB/sec
compare = ok
mmap write test : sync=0 time=0.326570 sec (0.326483 sec)  102.7 MB/sec
file read  test : sync=0 time=0.293292 sec (0.293234 sec)  114.4 MB/sec
compare = ok

##  file-test-udmabuf-ddr-nc0
device=udmabuf-ddr-nc0
driver_version=5.0.3
size=33554432
phys_addr=0xc8000000
dma_coherent=1
mmap write test : sync=1 time=0.199065 sec (0.199061 sec)  168.6 MB/sec
mmap read  test : sync=1 time=0.570913 sec (0.570910 sec)   58.8 MB/sec
compare = ok
mmap write test : sync=0 time=0.199312 sec (0.199230 sec)  168.4 MB/sec
mmap read  test : sync=1 time=0.570328 sec (0.570324 sec)   58.8 MB/sec
compare = ok
mmap write test : sync=1 time=0.199157 sec (0.199153 sec)  168.5 MB/sec
mmap read  test : sync=0 time=0.570156 sec (0.570073 sec)   58.9 MB/sec
compare = ok
mmap write test : sync=0 time=0.200012 sec (0.199928 sec)  167.8 MB/sec
mmap read  test : sync=0 time=0.569450 sec (0.569380 sec)   58.9 MB/sec
compare = ok
file write test : sync=1 time=0.167946 sec (0.167942 sec)  199.8 MB/sec
mmap read  test : sync=0 time=0.569724 sec (0.569639 sec)   58.9 MB/sec
compare = ok
file write test : sync=0 time=0.169711 sec (0.169632 sec)  197.7 MB/sec
mmap read  test : sync=0 time=0.570536 sec (0.570463 sec)   58.8 MB/sec
compare = ok
mmap write test : sync=0 time=0.198988 sec (0.198905 sec)  168.6 MB/sec
file read  test : sync=1 time=0.401089 sec (0.401085 sec)   83.7 MB/sec
compare = ok
mmap write test : sync=0 time=0.199280 sec (0.199197 sec)  168.4 MB/sec
file read  test : sync=0 time=0.399430 sec (0.399368 sec)   84.0 MB/sec
compare = ok

##  file-test-udmabuf-ddr-nc-wcb0
device=udmabuf-ddr-nc-wcb0
driver_version=5.0.3
size=33554432
phys_addr=0xd8000000
dma_coherent=1
mmap write test : sync=1 time=0.237719 sec (0.237715 sec)  141.2 MB/sec
mmap read  test : sync=1 time=0.569537 sec (0.569533 sec)   58.9 MB/sec
compare = ok
mmap write test : sync=0 time=0.238123 sec (0.238042 sec)  140.9 MB/sec
mmap read  test : sync=1 time=0.570406 sec (0.570403 sec)   58.8 MB/sec
compare = ok
mmap write test : sync=1 time=0.237692 sec (0.237687 sec)  141.2 MB/sec
mmap read  test : sync=0 time=0.569988 sec (0.569901 sec)   58.9 MB/sec
compare = ok
mmap write test : sync=0 time=0.238346 sec (0.238265 sec)  140.8 MB/sec
mmap read  test : sync=0 time=0.569465 sec (0.569400 sec)   58.9 MB/sec
compare = ok
file write test : sync=1 time=0.180402 sec (0.180399 sec)  186.0 MB/sec
mmap read  test : sync=0 time=0.569633 sec (0.569549 sec)   58.9 MB/sec
compare = ok
file write test : sync=0 time=0.181134 sec (0.181057 sec)  185.2 MB/sec
mmap read  test : sync=0 time=0.570051 sec (0.569981 sec)   58.9 MB/sec
compare = ok
mmap write test : sync=0 time=0.238029 sec (0.237945 sec)  141.0 MB/sec
file read  test : sync=1 time=0.401591 sec (0.401587 sec)   83.6 MB/sec
compare = ok
mmap write test : sync=0 time=0.238641 sec (0.238559 sec)  140.6 MB/sec
file read  test : sync=0 time=0.398296 sec (0.398237 sec)   84.2 MB/sec
compare = ok

u-dma-buf read performance results

The table below shows mmap read and file read throughput in MB/sec.

| No. | u-dma-buf name | MSS Address Space | mmap read (MB/sec) | file read (MB/sec) |
|---|---|---|---|---|
| 1 | udmabuf-fabric-low | DDR-Cached Low | 91.0 | 100.0 |
| 2 | udmabuf-fabric-high | DDR-Cached High | 87.0 | 99.2 |
| 3 | udmabuf-soc-high | DDR-Non-Cached High | 55.2 | 76.7 |
| 4 | udmabuf-ddr-c0 | DDR-Cached Low | 100.1 | 114.5 |
| 5 | udmabuf-ddr-nc0 | DDR-Non-Cached Low | 58.9 | 84.0 |
| 6 | udmabuf-ddr-nc-wcb0 | DDR-Non-Cached WCB Low | 58.9 | 84.2 |
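
The MB/sec figures printed in the logs look consistent with buffer size divided by elapsed time, with 1 MB counted as 10^6 bytes. This is inferred from the printed numbers, not taken from the test program's source, so the helper below (`throughput_mb_per_sec` is a name made up for illustration) is only a sanity check under that assumption.

```python
def throughput_mb_per_sec(size_bytes, elapsed_sec):
    """Return throughput in MB/sec, assuming 1 MB = 1,000,000 bytes."""
    return size_bytes / elapsed_sec / 1e6


if __name__ == "__main__":
    # size= value printed in the udmabuf-ddr-c0 log
    size = 33554432
    # time= values from the first write/read pair of the same log
    for label, elapsed in [("mmap write", 0.326402), ("mmap read", 0.335157)]:
        print(f"{label}: {throughput_mb_per_sec(size, elapsed):.1f} MB/sec")
```

With these inputs the helper gives 102.8 and 100.1 MB/sec, matching the log lines for udmabuf-ddr-c0.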

u-dma-buf write performance results

The table below shows mmap write and file write throughput in MB/sec.

| No. | u-dma-buf name | MSS Address Space | mmap write (MB/sec) | file write (MB/sec) |
|---|---|---|---|---|
| 1 | udmabuf-fabric-low | DDR-Cached Low | 100.0 | 112.5 |
| 2 | udmabuf-fabric-high | DDR-Cached High | 100.3 | 111.9 |
| 3 | udmabuf-soc-high | DDR-Non-Cached High | 136.5 | 167.6 |
| 4 | udmabuf-ddr-c0 | DDR-Cached Low | 102.9 | 114.8 |
| 5 | udmabuf-ddr-nc0 | DDR-Non-Cached Low | 168.5 | 199.8 |
| 6 | udmabuf-ddr-nc-wcb0 | DDR-Non-Cached WCB Low | 141.2 | 186.0 |
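
To condense a log like the ones above into one figure per access type, something along these lines could be used. It is a minimal sketch: the regular expression is written against the log format shown in this article, `best_throughput` is a hypothetical helper name, and taking the maximum per category is an assumption about how a summary might be built, not a statement of how the tables were produced.

```python
import re

# Matches lines such as:
#   mmap read  test : sync=1 time=0.570913 sec (0.570910 sec)   58.8 MB/sec
LINE_RE = re.compile(
    r"^(mmap|file) (read|write)\s+test : sync=(\d) "
    r"time=([\d.]+) sec \(([\d.]+) sec\)\s+([\d.]+) MB/sec$"
)


def best_throughput(log_text):
    """Return {(method, direction): highest MB/sec seen} for one log."""
    best = {}
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip "compare = ok", headers, and blank lines
        method, direction, _sync, _total, _inner, mbps = match.groups()
        key = (method, direction)
        best[key] = max(best.get(key, 0.0), float(mbps))
    return best


# A few lines copied from the udmabuf-ddr-nc0 log
SAMPLE = """\
mmap write test : sync=1 time=0.199065 sec (0.199061 sec)  168.6 MB/sec
mmap read  test : sync=1 time=0.570913 sec (0.570910 sec)   58.8 MB/sec
compare = ok
file write test : sync=1 time=0.167946 sec (0.167942 sec)  199.8 MB/sec
mmap read  test : sync=0 time=0.569724 sec (0.569639 sec)   58.9 MB/sec
compare = ok
file read  test : sync=0 time=0.399430 sec (0.399368 sec)   84.0 MB/sec
"""

if __name__ == "__main__":
    for (method, direction), mbps in sorted(best_throughput(SAMPLE).items()):
        print(f"{method} {direction}: {mbps} MB/sec")
```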

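Impressions

Honestly, the results differ from what I expected, and I am somewhat puzzled. There may be a problem with the measurement method.
The points that caught my attention are as follows.

Is the CPU cache working?

For read access, Non-Cached throughput is roughly half that of Cached (about 59 MB/sec versus about 100 MB/sec for mmap read), so the CPU cache is clearly in effect, but the gap feels smaller than I expected.
More puzzling, for write access Non-Cached is faster than Cached. What is going on there?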
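
The size of the gap can be made concrete with a quick ratio. The figures below are copied from the udmabuf-ddr-c0 (Cached) and udmabuf-ddr-nc0 (Non-Cached) logs; `non_cached_ratio` is a helper name made up for this illustration.

```python
# MB/sec figures copied from the udmabuf-ddr-c0 (Cached) and
# udmabuf-ddr-nc0 (Non-Cached) logs in this article.
CACHED = {"mmap read": 100.1, "mmap write": 102.9}
NON_CACHED = {"mmap read": 58.9, "mmap write": 168.5}


def non_cached_ratio(kind):
    """Non-Cached throughput relative to Cached (1.0 means equal)."""
    return NON_CACHED[kind] / CACHED[kind]


if __name__ == "__main__":
    for kind in CACHED:
        print(f"{kind}: Non-Cached / Cached = {non_cached_ratio(kind):.2f}")
```

A ratio below 1 means Cached is faster and a ratio above 1 means Non-Cached is faster. With these numbers the read ratio is a little under 0.6 and the write ratio is a little over 1.6.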

Is the WCB working?

I assumed the WCB (Write Combining Buffer) exists to speed up write access, so why is Non-Cached WCB slower than plain Non-Cached?

Why is file access faster than mmap?

This was also unexpected. I probably need to study the Linux kernel internals a bit more.
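
For readers who want to compare the two access paths themselves, here is a minimal sketch of what differs between them: a file read goes through a read-style system call where the kernel copies data into the caller's buffer, while an mmap read copies out of the mapping in user space. This is not the test program used for the measurements above. The function names are hypothetical, the demo runs against a temporary file so it works anywhere, and pointing it at a device node such as `/dev/udmabuf-ddr-c0` is an assumption about the setup. Python's own overhead also means absolute numbers will not match the logs.

```python
import mmap
import os
import tempfile
import time


def time_file_read(path, size):
    """Read `size` bytes via a pread() system call (kernel-side copy)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        start = time.perf_counter()
        data = os.pread(fd, size, 0)  # may return fewer bytes than requested
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return data, elapsed


def time_mmap_read(path, size):
    """Map the file and copy `size` bytes out of the mapping in user space."""
    fd = os.open(path, os.O_RDONLY)
    try:
        with mmap.mmap(fd, size, access=mmap.ACCESS_READ) as mapping:
            start = time.perf_counter()
            data = mapping[:size]  # slicing copies the bytes out
            elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return data, elapsed


if __name__ == "__main__":
    size = 1 << 20  # 1 MiB demo buffer, not the 32 MiB used in the article
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(size))
        path = tmp.name
    try:
        file_data, file_sec = time_file_read(path, size)
        mmap_data, mmap_sec = time_mmap_read(path, size)
        print("compare =", "ok" if file_data == mmap_data else "ng")
        print(f"file read: {size / file_sec / 1e6:.1f} MB/sec")
        print(f"mmap read: {size / mmap_sec / 1e6:.1f} MB/sec")
    finally:
        os.unlink(path)
```

One plausible factor is that the kernel's copy routine and a user-space copy loop may move data in different chunk sizes, but this sketch does not establish that; it only separates the two paths so they can be timed independently.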

References

MPFS-FPGA-Example-1-DISCO-KIT repository

Qiita articles on Ubuntu 22.04 for MPFS-DISCO-KIT

Ubuntu 22.04 for MPFS-DISCO-KIT repository

Qiita articles on u-dma-buf

u-dma-buf repository
