Introduction
I maintain my own build of Ubuntu 22.04 for the MPFS-DISCO-KIT (Microchip PolarFire SoC FPGA Discovery Kit). I also develop u-dma-buf, a Linux device driver that lets user-space programs and hardware share memory, and publish it on GitHub. This article describes running u-dma-buf on that Ubuntu 22.04 for MPFS-DISCO-KIT. It covers only measurements of the CPU accessing u-dma-buf. Performance and cache coherency when a DMA device accesses u-dma-buf will be covered another time.
The Device Tree sources, programs, and other files introduced in this article are published at the following URL.
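Preparing u-dma-buf
As explained in 『MPFS-DISCO-KIT 向け Ubuntu 22.04 のメモリマップ』 (Memory Map of Ubuntu 22.04 for MPFS-DISCO-KIT), the CPU Core Complex accesses memory in one of the following three modes, and the mode is selected by the physical address in the MSS address space.
- Cached: CPU cache enabled
- Non-Cached: CPU cache disabled and WCB (Write Combining Buffer) also disabled
- Non-Cached WCB: CPU cache disabled but WCB (Write Combining Buffer) enabled
In other words, whether the CPU cache is enabled depends on the address at which the DMA buffer is allocated.
I therefore prepared the following six u-dma-buf configurations.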
No. | u-dma-buf name | memory-region | MSS Address Space |
---|---|---|---|
1 | udmabuf-fabric-low | kernel | DDR-Cached Low |
2 | udmabuf-fabric-high | ddr_cached_high | DDR-Cached High |
3 | udmabuf-soc-high | ddr_non_cached_high | DDR-Non-Cached High |
4 | udmabuf-ddr-c0 | fabricbuf0ddrc | DDR-Cached Low |
5 | udmabuf-ddr-nc0 | fabricbuf1ddrnc | DDR-Non-Cached Low |
6 | udmabuf-ddr-nc-wcb0 | fabricbuf2ddrncwcb | DDR-Non-Cached WCB Low |
This chapter shows the Device Tree used to set up each of these u-dma-buf devices and the result of applying it as an overlay.
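As a rough illustration of how the physical address selects the access mode, the sketch below classifies an address into one of the three modes. The window bases and sizes are assumptions taken from the commonly documented PolarFire SoC MSS memory map, not from this article, and the names mss_window and mss_classify are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* The three access modes listed above, plus "not a DDR window". */
enum mss_access_mode {
    MSS_CACHED,
    MSS_NON_CACHED,
    MSS_NON_CACHED_WCB,
    MSS_NOT_DDR
};

/* One DDR window of the MSS address space: [base, base + size). */
struct mss_window {
    uint64_t             base;
    uint64_t             size;
    enum mss_access_mode mode;
};

/* ASSUMPTION: bases and sizes follow the usual PolarFire SoC memory map.
 * Check them against the MSS documentation for your design. */
static const struct mss_window mss_windows[] = {
    { 0x0080000000ULL, 0x0040000000ULL, MSS_CACHED         }, /* DDR-Cached Low          */
    { 0x00C0000000ULL, 0x0010000000ULL, MSS_NON_CACHED     }, /* DDR-Non-Cached Low      */
    { 0x00D0000000ULL, 0x0010000000ULL, MSS_NON_CACHED_WCB }, /* DDR-Non-Cached WCB Low  */
    { 0x1000000000ULL, 0x0400000000ULL, MSS_CACHED         }, /* DDR-Cached High         */
    { 0x1400000000ULL, 0x0400000000ULL, MSS_NON_CACHED     }, /* DDR-Non-Cached High     */
    { 0x1800000000ULL, 0x0400000000ULL, MSS_NON_CACHED_WCB }, /* DDR-Non-Cached WCB High */
};

/* Return the access mode of the window that contains phys_addr. */
static enum mss_access_mode mss_classify(uint64_t phys_addr)
{
    size_t n = sizeof(mss_windows) / sizeof(mss_windows[0]);
    for (size_t i = 0; i < n; i++) {
        /* Unsigned subtraction wraps for addresses below base, so one compare is enough. */
        if (phys_addr - mss_windows[i].base < mss_windows[i].size)
            return mss_windows[i].mode;
    }
    return MSS_NOT_DDR;
}
```

Under these assumptions, the buffer addresses that appear in the driver logs in this article fall into the modes listed in the table above, for example 0x81a00000 into Cached and 0xc8000000 into Non-Cached.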
udmabuf-fabric-low
udmabuf-fabric-low is created under /fabric-bus@40000000.
The dma-mask property is set to 32 so that the physical address of the allocated DMA buffer fits within 32 bits.
Specifically, prepare the following Device Tree.
/dts-v1/; /plugin/;
/ {
    fragment@0 {
        target-path = "/fabric-bus@40000000";
        #address-cells = <2>;
        #size-cells = <2>;
        __overlay__ {
            #address-cells = <2>;
            #size-cells = <2>;
            udmabuf-fabric-low {
                compatible = "ikwzm,u-dma-buf";
                device-name = "udmabuf-fabric-low";
                size = <0x200000>;
                dma-mask = <32>;
            };
        };
    };
};
Apply this Device Tree as an overlay.
shell$ dtc -I dts -O dtb -o udmabuf-fabric-low.dtb udmabuf-fabric-low.dts
shell$ sudo mkdir /config/device-tree/overlays/udmabuf-fabric-low
shell$ sudo cp udmabuf-fabric-low.dtb /config/device-tree/overlays/udmabuf-fabric-low/dtbo
shell$ dmesg | tail -6
[ 2581.177406] u-dma-buf udmabuf-fabric-low: driver version = 5.0.3
[ 2581.177460] u-dma-buf udmabuf-fabric-low: major number = 244
[ 2581.177484] u-dma-buf udmabuf-fabric-low: minor number = 0
[ 2581.177504] u-dma-buf udmabuf-fabric-low: phys address = 0x0000000081a00000
[ 2581.177528] u-dma-buf udmabuf-fabric-low: buffer size = 2097152
[ 2581.177552] u-dma-buf fabric-bus@40000000:udmabuf-fabric-low: driver installed.
The DMA buffer was allocated at physical address 0x0000_0000_81a0_0000. This means it was placed in the memory region kernel: memory@80000000 described in 『MPFS-DISCO-KIT 向け Ubuntu 22.04 のメモリマップ』 (DDR-Cached Low in the MSS address space).
udmabuf-fabric-high
udmabuf-fabric-high is also created under /fabric-bus@40000000.
This time the dma-mask property is set to 40 so that the DMA buffer may also be allocated at a physical address that does not fit in 32 bits.
Specifically, prepare the following Device Tree.
/dts-v1/; /plugin/;
/ {
    fragment@0 {
        target-path = "/fabric-bus@40000000";
        #address-cells = <2>;
        #size-cells = <2>;
        __overlay__ {
            #address-cells = <2>;
            #size-cells = <2>;
            udmabuf-fabric-high {
                compatible = "ikwzm,u-dma-buf";
                device-name = "udmabuf-fabric-high";
                size = <0x200000>;
                dma-mask = <40>;
            };
        };
    };
};
Apply this Device Tree as an overlay.
shell$ dtc -I dts -O dtb -o udmabuf-fabric-high.dtb udmabuf-fabric-high.dts
shell$ sudo mkdir /config/device-tree/overlays/udmabuf-fabric-high
shell$ sudo cp udmabuf-fabric-high.dtb /config/device-tree/overlays/udmabuf-fabric-high/dtbo
shell$ dmesg | tail -6
[ 3470.093101] u-dma-buf udmabuf-fabric-high: driver version = 5.0.3
[ 3470.093157] u-dma-buf udmabuf-fabric-high: major number = 244
[ 3470.093182] u-dma-buf udmabuf-fabric-high: minor number = 0
[ 3470.093202] u-dma-buf udmabuf-fabric-high: phys address = 0x0000001032600000
[ 3470.093227] u-dma-buf udmabuf-fabric-high: buffer size = 2097152
[ 3470.093252] u-dma-buf fabric-bus@40000000:udmabuf-fabric-high: driver installed.
The DMA buffer was allocated at physical address 0x0000_0010_3260_0000. This means it was placed in the memory region ddr_cached_high: memory@1022000000 described in 『MPFS-DISCO-KIT 向け Ubuntu 22.04 のメモリマップ』 (DDR-Cached High in the MSS address space).
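The effect of dma-mask on these two buffers can be checked with a little arithmetic. The sketch below builds an n-bit mask in the same way as the Linux kernel's DMA_BIT_MASK(n) macro and tests whether a buffer lies entirely below it. The helper names are hypothetical, and the real allocation is decided by the kernel DMA API (including any dma-ranges), so this is only a model of the address constraint.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Address mask for an n-bit DMA mask (1 <= bits <= 64). */
static uint64_t dma_bit_mask(unsigned int bits)
{
    assert(bits >= 1 && bits <= 64);
    return (bits == 64) ? UINT64_MAX : ((UINT64_C(1) << bits) - 1);
}

/* True when the whole buffer [phys_addr, phys_addr + size) is addressable
 * under the given mask. size must be non-zero. */
static bool buffer_fits_dma_mask(uint64_t phys_addr, uint64_t size, unsigned int bits)
{
    assert(size > 0);
    uint64_t last = phys_addr + size - 1;   /* address of the last byte */
    return last >= phys_addr                /* reject wrap-around       */
        && last <= dma_bit_mask(bits);
}
```

With the values from the logs, the 2 MiB buffer at 0x81a00000 fits a 32-bit mask, while the one at 0x1032600000 fits only once the mask is widened, as with the 40 used for udmabuf-fabric-high.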
udmabuf-soc-high
udmabuf-soc-high is created under /soc.
The dma-mask property is set to 40 so that physical addresses that do not fit in 32 bits are allowed. As explained in 『MPFS-DISCO-KIT 向け Ubuntu 22.04 のメモリマップ』, /soc has a dma-ranges property that places DMA buffers at physical address 0x14_0000_0000 or above, so allocating the DMA buffer fails if dma-mask is set below 40.
Specifically, prepare the following Device Tree.
/dts-v1/; /plugin/;
/ {
    fragment@0 {
        target-path = "/soc";
        #address-cells = <2>;
        #size-cells = <2>;
        __overlay__ {
            #address-cells = <2>;
            #size-cells = <2>;
            udmabuf-soc-high {
                compatible = "ikwzm,u-dma-buf";
                device-name = "udmabuf-soc-high";
                size = <0x200000>;
                dma-mask = <40>;
                dma-noncoherent;
            };
        };
    };
};
Apply this Device Tree as an overlay.
shell$ dtc -I dts -O dtb -o udmabuf-soc-high.dtb udmabuf-soc-high.dts
shell$ sudo mkdir /config/device-tree/overlays/udmabuf-soc-high
shell$ sudo cp udmabuf-soc-high.dtb /config/device-tree/overlays/udmabuf-soc-high/dtbo
shell$ dmesg | tail -6
[ 3945.367440] u-dma-buf udmabuf-soc-high: driver version = 5.0.3
[ 3945.367485] u-dma-buf udmabuf-soc-high: major number = 244
[ 3945.367506] u-dma-buf udmabuf-soc-high: minor number = 0
[ 3945.367527] u-dma-buf udmabuf-soc-high: phys address = 0x0000001412200000
[ 3945.367549] u-dma-buf udmabuf-soc-high: buffer size = 2097152
[ 3945.367572] u-dma-buf soc:udmabuf-soc-high: driver installed.
The DMA buffer was allocated at physical address 0x0000_0014_1220_0000. This means it was placed in the memory region ddr_non_cached_high: memory@1412000000 described in 『MPFS-DISCO-KIT 向け Ubuntu 22.04 のメモリマップ』 (DDR-Non-Cached High in the MSS address space).
udmabuf-ddr-c0
udmabuf-ddr-c0 allocates its DMA buffer in the reserved-memory region fabricbuf0ddrc.
As explained in 『MPFS-DISCO-KIT 向け Ubuntu 22.04 のメモリマップ』, fabricbuf0ddrc is a memory area reserved in ddr_non_cached_low, but its address in the MSS address space lies in DDR-Cached Low.
Specifically, prepare the following Device Tree.
/dts-v1/; /plugin/;
/ {
    fragment@0 {
        target-path = "/fabric-bus@40000000";
        #address-cells = <2>;
        #size-cells = <2>;
        __overlay__ {
            #address-cells = <2>;
            #size-cells = <2>;
            udmabuf-ddr-c0 {
                compatible = "ikwzm,u-dma-buf";
                device-name = "udmabuf-ddr-c0";
                memory-region = <&fabricbuf0ddrc>;
                size = <0x2000000>;
            };
        };
    };
};
Apply this Device Tree as an overlay.
shell$ dtc -I dts -O dtb -o udmabuf-ddr-c0.dtb udmabuf-ddr-c0.dts
shell$ sudo mkdir /config/device-tree/overlays/udmabuf-ddr-c0
shell$ sudo cp udmabuf-ddr-c0.dtb /config/device-tree/overlays/udmabuf-ddr-c0/dtbo
shell$ dmesg | tail -6
[ 4939.448074] u-dma-buf udmabuf-ddr-c0: driver version = 5.0.3
[ 4939.448127] u-dma-buf udmabuf-ddr-c0: major number = 244
[ 4939.448149] u-dma-buf udmabuf-ddr-c0: minor number = 0
[ 4939.448168] u-dma-buf udmabuf-ddr-c0: phys address = 0x0000000088000000
[ 4939.448192] u-dma-buf udmabuf-ddr-c0: buffer size = 33554432
[ 4939.448216] u-dma-buf fabric-bus@40000000:udmabuf-ddr-c0: driver installed.
The DMA buffer was allocated at physical address 0x0000_0000_8800_0000. This means it was placed in the reserved memory fabricbuf0ddrc described in 『MPFS-DISCO-KIT 向け Ubuntu 22.04 のメモリマップ』.
udmabuf-ddr-nc0
udmabuf-ddr-nc0 allocates its DMA buffer in the reserved-memory region fabricbuf1ddrnc.
As explained in 『MPFS-DISCO-KIT 向け Ubuntu 22.04 のメモリマップ』, fabricbuf1ddrnc is a memory area reserved in ddr_non_cached_low, and its address in the MSS address space lies in DDR-Non-Cached Low.
Specifically, prepare the following Device Tree.
/dts-v1/; /plugin/;
/ {
    fragment@0 {
        target-path = "/fabric-bus@40000000";
        #address-cells = <2>;
        #size-cells = <2>;
        __overlay__ {
            #address-cells = <2>;
            #size-cells = <2>;
            udmabuf-ddr-nc0 {
                compatible = "ikwzm,u-dma-buf";
                device-name = "udmabuf-ddr-nc0";
                memory-region = <&fabricbuf1ddrnc>;
                size = <0x2000000>;
            };
        };
    };
};
Apply this Device Tree as an overlay.
shell$ dtc -I dts -O dtb -o udmabuf-ddr-nc0.dtb udmabuf-ddr-nc0.dts
shell$ sudo mkdir /config/device-tree/overlays/udmabuf-ddr-nc0
shell$ sudo cp udmabuf-ddr-nc0.dtb /config/device-tree/overlays/udmabuf-ddr-nc0/dtbo
shell$ dmesg | tail -6
[ 6876.210642] u-dma-buf udmabuf-ddr-nc0: driver version = 5.0.3
[ 6876.210689] u-dma-buf udmabuf-ddr-nc0: major number = 244
[ 6876.210713] u-dma-buf udmabuf-ddr-nc0: minor number = 0
[ 6876.210733] u-dma-buf udmabuf-ddr-nc0: phys address = 0x00000000c8000000
[ 6876.210756] u-dma-buf udmabuf-ddr-nc0: buffer size = 33554432
[ 6876.210779] u-dma-buf fabric-bus@40000000:udmabuf-ddr-nc0: driver installed.
The DMA buffer was allocated at physical address 0x0000_0000_c800_0000. This means it was placed in the reserved memory fabricbuf1ddrnc described in 『MPFS-DISCO-KIT 向け Ubuntu 22.04 のメモリマップ』.
udmabuf-ddr-nc-wcb0
udmabuf-ddr-nc-wcb0 allocates its DMA buffer in the reserved-memory region fabricbuf2ddrncwcb.
As explained in 『MPFS-DISCO-KIT 向け Ubuntu 22.04 のメモリマップ』, fabricbuf2ddrncwcb is a memory area reserved in ddr_non_cached_low, and its address in the MSS address space lies in DDR-Non-Cached WCB Low.
Specifically, prepare the following Device Tree.
/dts-v1/; /plugin/;
/ {
    fragment@0 {
        target-path = "/fabric-bus@40000000";
        #address-cells = <2>;
        #size-cells = <2>;
        __overlay__ {
            #address-cells = <2>;
            #size-cells = <2>;
            udmabuf-ddr-nc-wcb0 {
                compatible = "ikwzm,u-dma-buf";
                device-name = "udmabuf-ddr-nc-wcb0";
                memory-region = <&fabricbuf2ddrncwcb>;
                size = <0x2000000>;
            };
        };
    };
};
Apply this Device Tree as an overlay.
shell$ dtc -I dts -O dtb -o udmabuf-ddr-nc-wcb0.dtb udmabuf-ddr-nc-wcb0.dts
shell$ sudo mkdir /config/device-tree/overlays/udmabuf-ddr-nc-wcb0
shell$ sudo cp udmabuf-ddr-nc-wcb0.dtb /config/device-tree/overlays/udmabuf-ddr-nc-wcb0/dtbo
shell$ dmesg | tail -6
[ 7716.790940] u-dma-buf udmabuf-ddr-nc-wcb0: driver version = 5.0.3
[ 7716.790990] u-dma-buf udmabuf-ddr-nc-wcb0: major number = 244
[ 7716.791013] u-dma-buf udmabuf-ddr-nc-wcb0: minor number = 0
[ 7716.791034] u-dma-buf udmabuf-ddr-nc-wcb0: phys address = 0x00000000d8000000
[ 7716.791057] u-dma-buf udmabuf-ddr-nc-wcb0: buffer size = 33554432
[ 7716.791080] u-dma-buf fabric-bus@40000000:udmabuf-ddr-nc-wcb0: driver installed.
The DMA buffer was allocated at physical address 0x0000_0000_d800_0000. This means it was placed in the reserved memory fabricbuf2ddrncwcb described in 『MPFS-DISCO-KIT 向け Ubuntu 22.04 のメモリマップ』.
u-dma-buf performance
Benchmark program
To evaluate the performance of CPU accesses to u-dma-buf, I prepared the following program.
u-dma-buf-file-test.c
#include <stdio.h>
#include <fcntl.h>
#include <string.h>
#include <time.h>
#include <stdlib.h>
#include <unistd.h>
#include <getopt.h>
#include <errno.h>
#include <sys/ioctl.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/utsname.h>
#include <inttypes.h>
struct u_dma_buf
{
    char*    name;
    char*    dev_name;
    char*    sys_path;
    char*    version;
    uint64_t phys_addr;
    size_t   size;
    int      dma_coherent;
    int      sync_for_cpu_file;
    int      sync_for_dev_file;
    char     sync_command[1024];
    int      sync_command_len;
};
const int U_DMA_BUF_READ_WRITE = 0;
const int U_DMA_BUF_WRITE_ONLY = 1;
const int U_DMA_BUF_READ_ONLY = 2;
void u_dma_buf_destroy(struct u_dma_buf* this)
{
    if (this == NULL)
        return;
    if (this->sync_for_cpu_file >= 0) close(this->sync_for_cpu_file);
    if (this->sync_for_dev_file >= 0) close(this->sync_for_dev_file);
    if (this->name != NULL) free(this->name);
    if (this->dev_name != NULL) free(this->dev_name);
    if (this->sys_path != NULL) free(this->sys_path);
    if (this->version != NULL) free(this->version);
    free(this);
}
struct u_dma_buf* u_dma_buf_create(char* name)
{
    struct u_dma_buf* this;
    char              file_name[1024];
    char              attr[1024];
    int               str_len;
    int               fd;
    ssize_t           len;
    if ((this = calloc(1, sizeof(struct u_dma_buf))) == NULL) {
        printf("Can not alloc u_dma_buf\n");
        goto failed;
    }
    this->sync_for_cpu_file = -1;
    this->sync_for_dev_file = -1;
    if ((this->name = strdup(name)) == NULL) {
        printf("Can not alloc this->name\n");
        goto failed;
    }
    str_len = sprintf(file_name, "/dev/%s", this->name);
    if ((this->dev_name = strdup(file_name)) == NULL) {
        printf("Can not alloc this->dev_name\n");
        goto failed;
    }
    str_len = sprintf(file_name, "/sys/class/u-dma-buf/%s", this->name);
    if ((this->sys_path = strdup(file_name)) == NULL) {
        printf("Can not alloc this->sys_path\n");
        goto failed;
    }
    // read() does not NUL-terminate, so each attribute is terminated before parsing
    str_len = sprintf(file_name, "%s/size", this->sys_path);
    if ((fd = open(file_name, O_RDONLY)) != -1) {
        len = read(fd, attr, sizeof(attr) - 1);
        attr[(len > 0) ? len : 0] = '\0';
        sscanf(attr, "%zu", &this->size);
        close(fd);
    } else {
        printf("Can not open %s\n", file_name);
        goto failed;
    }
    str_len = sprintf(file_name, "%s/phys_addr", this->sys_path);
    if ((fd = open(file_name, O_RDONLY)) != -1) {
        len = read(fd, attr, sizeof(attr) - 1);
        attr[(len > 0) ? len : 0] = '\0';
        sscanf(attr, "%" SCNx64, &this->phys_addr);
        close(fd);
    } else {
        printf("Can not open %s\n", file_name);
        goto failed;
    }
    str_len = sprintf(file_name, "%s/driver_version", this->sys_path);
    if ((fd = open(file_name, O_RDONLY)) != -1) {
        len = read(fd, attr, sizeof(attr) - 1);
        attr[(len > 0) ? len : 0] = '\0';
        attr[strcspn(attr, "\n")] = '\0';   // strip the trailing newline
        this->version = strdup(attr);
        close(fd);
    } else {
        printf("Can not open %s\n", file_name);
        goto failed;
    }
    str_len = sprintf(file_name, "%s/dma_coherent", this->sys_path);
    if ((fd = open(file_name, O_RDONLY)) != -1) {
        len = read(fd, attr, sizeof(attr) - 1);
        attr[(len > 0) ? len : 0] = '\0';
        sscanf(attr, "%d", &this->dma_coherent);
        close(fd);
    } else {
        printf("Can not open %s\n", file_name);
        goto failed;
    }
    str_len = sprintf(file_name, "%s/sync_for_cpu", this->sys_path);
    if ((fd = open(file_name, O_RDWR)) != -1) {
        this->sync_for_cpu_file = fd;
    } else {
        printf("Can not open %s\n", file_name);
        goto failed;
    }
    str_len = sprintf(file_name, "%s/sync_for_device", this->sys_path);
    if ((fd = open(file_name, O_RDWR)) != -1) {
        this->sync_for_dev_file = fd;
    } else {
        printf("Can not open %s\n", file_name);
        goto failed;
    }
    return this;
failed:
    u_dma_buf_destroy(this);
    return NULL;
}
int u_dma_buf_open(struct u_dma_buf* this, int flags)
{
    return open(this->dev_name, flags);
}
void u_dma_buf_set_sync_area(struct u_dma_buf* this, unsigned int offset, unsigned int size, int direction)
{
    this->sync_command_len =
        sprintf(this->sync_command, "0x%08X%08X\n",
                offset,
                ((size & 0xFFFFFFF0) | (direction << 2) | 1));
}
size_t u_dma_buf_sync_for_cpu(struct u_dma_buf* this)
{
    if (this->sync_command_len > 0)
        return write(this->sync_for_cpu_file,
                     this->sync_command,
                     this->sync_command_len);
    else
        return 0;
}
size_t u_dma_buf_sync_for_dev(struct u_dma_buf* this)
{
    if (this->sync_command_len > 0)
        return write(this->sync_for_dev_file,
                     this->sync_command,
                     this->sync_command_len);
    else
        return 0;
}
struct test_time
{
    struct timeval main;
    struct timeval sync_for_cpu;
    struct timeval sync_for_dev;
    struct timeval total;
};
static void diff_time(struct timeval* run_time, struct timeval* start_time, struct timeval* end_time)
{
    if (end_time->tv_usec < start_time->tv_usec) {
        run_time->tv_sec  = end_time->tv_sec  - start_time->tv_sec  - 1;
        run_time->tv_usec = end_time->tv_usec - start_time->tv_usec + 1000*1000;
    } else {
        run_time->tv_sec  = end_time->tv_sec  - start_time->tv_sec ;
        run_time->tv_usec = end_time->tv_usec - start_time->tv_usec;
    }
}
int u_dma_buf_mmap_write_test(struct u_dma_buf* this, void* buf, unsigned int size, int sync, struct test_time* time)
{
    int            fd;
    void*          iomem;
    struct timeval test_start_time, test_end_time;
    struct timeval main_start_time, main_end_time;
    if (sync == 0)
        u_dma_buf_set_sync_area(this, 0, size, U_DMA_BUF_WRITE_ONLY);
    if ((fd = u_dma_buf_open(this, O_RDWR | ((sync)?O_SYNC:0))) != -1) {
        iomem = mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
        if (iomem == MAP_FAILED) {
            close(fd);
            return -1;
        }
        gettimeofday(&test_start_time, NULL);
        if (sync == 0)
            u_dma_buf_sync_for_cpu(this);
        gettimeofday(&main_start_time, NULL);
        memcpy(iomem, buf, size);
        gettimeofday(&main_end_time, NULL);
        if (sync == 0)
            u_dma_buf_sync_for_dev(this);
        gettimeofday(&test_end_time , NULL);
        if (time != NULL) {
            diff_time(&time->total       , &test_start_time, &test_end_time  );
            diff_time(&time->sync_for_cpu, &test_start_time, &main_start_time);
            diff_time(&time->main        , &main_start_time, &main_end_time  );
            diff_time(&time->sync_for_dev, &main_end_time  , &test_end_time  );
        }
        munmap(iomem, size);   // release the mapping, outside the timed region
        (void)close(fd);
        return 0;
    } else {
        return -1;
    }
}
int u_dma_buf_mmap_read_test(struct u_dma_buf* this, void* buf, unsigned int size, int sync, struct test_time* time)
{
    int            fd;
    void*          iomem;
    struct timeval test_start_time, test_end_time;
    struct timeval main_start_time, main_end_time;
    if (sync == 0)
        u_dma_buf_set_sync_area(this, 0, size, U_DMA_BUF_READ_ONLY);
    if ((fd = u_dma_buf_open(this, O_RDWR | ((sync)?O_SYNC:0))) != -1) {
        iomem = mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
        if (iomem == MAP_FAILED) {
            close(fd);
            return -1;
        }
        gettimeofday(&test_start_time, NULL);
        if (sync == 0)
            u_dma_buf_sync_for_cpu(this);
        gettimeofday(&main_start_time, NULL);
        memcpy(buf, iomem, size);
        gettimeofday(&main_end_time , NULL);
        if (sync == 0)
            u_dma_buf_sync_for_dev(this);
        gettimeofday(&test_end_time , NULL);
        if (time != NULL) {
            diff_time(&time->total       , &test_start_time, &test_end_time  );
            diff_time(&time->sync_for_cpu, &test_start_time, &main_start_time);
            diff_time(&time->main        , &main_start_time, &main_end_time  );
            diff_time(&time->sync_for_dev, &main_end_time  , &test_end_time  );
        }
        munmap(iomem, size);   // release the mapping, outside the timed region
        close(fd);
        return 0;
    } else {
        return -1;
    }
}
int u_dma_buf_file_write_test(struct u_dma_buf* this, void* buf, unsigned int size, int sync, struct test_time* time)
{
    int            fd;
    int            len;
    char*          ptr;   // char* so that pointer arithmetic is standard C
    struct timeval test_start_time, test_end_time;
    struct timeval main_start_time, main_end_time;
    if (sync == 0)
        u_dma_buf_set_sync_area(this, 0, size, U_DMA_BUF_WRITE_ONLY);
    if ((fd = u_dma_buf_open(this, O_RDWR | ((sync)?O_SYNC:0))) != -1) {
        gettimeofday(&test_start_time, NULL);
        if (sync == 0)
            u_dma_buf_sync_for_cpu(this);
        gettimeofday(&main_start_time, NULL);
        len = size;
        ptr = buf;
        while(len > 0) {
            int count = write(fd, ptr, len);
            if (count <= 0) {   // stop on error or when no progress is made
                break;
            }
            ptr += count;
            len -= count;
        }
        gettimeofday(&main_end_time, NULL);
        if (sync == 0)
            u_dma_buf_sync_for_dev(this);
        gettimeofday(&test_end_time, NULL);
        if (time != NULL) {
            diff_time(&time->total       , &test_start_time, &test_end_time  );
            diff_time(&time->sync_for_cpu, &test_start_time, &main_start_time);
            diff_time(&time->main        , &main_start_time, &main_end_time  );
            diff_time(&time->sync_for_dev, &main_end_time  , &test_end_time  );
        }
        (void)close(fd);
        return 0;
    } else {
        return -1;
    }
}
int u_dma_buf_file_read_test(struct u_dma_buf* this, void* buf, unsigned int size, int sync, struct test_time* time)
{
    int            fd;
    int            len;
    char*          ptr;   // char* so that pointer arithmetic is standard C
    struct timeval test_start_time, test_end_time;
    struct timeval main_start_time, main_end_time;
    if (sync == 0)
        u_dma_buf_set_sync_area(this, 0, size, U_DMA_BUF_READ_ONLY);
    if ((fd = u_dma_buf_open(this, O_RDWR | ((sync)?O_SYNC:0))) != -1) {
        gettimeofday(&test_start_time, NULL);
        if (sync == 0)
            u_dma_buf_sync_for_cpu(this);
        gettimeofday(&main_start_time, NULL);
        len = size;
        ptr = buf;
        while(len > 0) {
            int count = read(fd, ptr, len);
            if (count <= 0) {   // stop on error or at end of file
                break;
            }
            ptr += count;
            len -= count;
        }
        gettimeofday(&main_end_time , NULL);
        if (sync == 0)
            u_dma_buf_sync_for_dev(this);
        gettimeofday(&test_end_time , NULL);
        if (time != NULL) {
            diff_time(&time->total       , &test_start_time, &test_end_time  );
            diff_time(&time->sync_for_cpu, &test_start_time, &main_start_time);
            diff_time(&time->main        , &main_start_time, &main_end_time  );
            diff_time(&time->sync_for_dev, &main_end_time  , &test_end_time  );
        }
        close(fd);
        return 0;
    } else {
        return -1;
    }
}
int main(int argc, char* argv[])
{
    struct u_dma_buf* u_dma_buf = NULL;
    char              device_name[256];
    unsigned int      err_count = 0;
    size_t            buf_size;
    void*             null_buf = NULL;
    void*             src0_buf = NULL;
    void*             src1_buf = NULL;
    void*             temp_buf = NULL;
    int               verbose  = 0;
    int               opt;
    int               optindex;
    struct option     longopts[] = {
        { "name"    , required_argument, NULL, 'n'},
        { "verbose" , no_argument      , NULL, 'v'},
        { NULL      , 0                , NULL,  0 },
    };
    strncpy(device_name, "udmabuf0", sizeof(device_name));
    // 'v' must appear in the short option string, otherwise -v is rejected
    while ((opt = getopt_long(argc, argv, "n:v", longopts, &optindex)) != -1) {
        switch (opt) {
            case 'n':
                strncpy(device_name, optarg, sizeof(device_name) - 1);
                device_name[sizeof(device_name) - 1] = '\0';
                break;
            case 'v':
                verbose = 1;
                break;
            default:
                printf("error options\n");
                break;
        }
    }
    printf("device=%s\n", device_name);
    //
    // u_dma_buf
    //
    if ((u_dma_buf = u_dma_buf_create(device_name)) == NULL) {
        goto done;
    }
    printf("driver_version=%s\n"       , u_dma_buf->version     );
    printf("size=%zu\n"                , u_dma_buf->size        );
    printf("phys_addr=0x%" PRIx64 "\n" , u_dma_buf->phys_addr   );
    printf("dma_coherent=%d\n"         , u_dma_buf->dma_coherent);
    //
    // initialize buffers
    //
    buf_size = u_dma_buf->size;
    if ((null_buf = malloc(buf_size)) == NULL) {
        printf("Can not malloc null_buf\n");
        goto done;
    } else {
        memset(null_buf, 0, buf_size);
    }
    if ((src0_buf = malloc(buf_size)) == NULL) {
        printf("Can not malloc src0_buf\n");
        goto done;
    } else {
        int*   word  = (int *)src0_buf;
        size_t words = buf_size/sizeof(int);
        for(int i = 0; i < words; i++) {
            word[i] = i;
        }
    }
    if ((src1_buf = malloc(buf_size)) == NULL) {
        printf("Can not malloc src1_buf\n");
        goto done;
    } else {
        int*   word  = (int *)src1_buf;
        size_t words = buf_size/sizeof(int);
        for(int i = 0; i < words; i++) {
            word[i] = ~i;
        }
    }
    if ((temp_buf = malloc(buf_size)) == NULL) {
        printf("Can not malloc temp_buf\n");
        goto done;
    } else {
        memset(temp_buf, 0, buf_size);
    }
    //
    // define TEST1()
    //
#define TEST1(w_type,w_sync,r_type,r_sync,src,dst,size) \
    { \
        struct test_time w_time; \
        struct test_time r_time; \
        long long w_total_usec; \
        long long r_total_usec; \
        memset(dst, 0, buf_size); \
        printf(#w_type " write test : sync=%d ", w_sync); \
        u_dma_buf_##w_type##_write_test(u_dma_buf, src, size, w_sync, &w_time); \
        w_total_usec = (long long)w_time.total.tv_sec*(1000*1000)+(long long)w_time.total.tv_usec; \
        printf("time=%ld.%06ld sec (%ld.%06ld sec) ", w_time.total.tv_sec, w_time.total.tv_usec, w_time.main.tv_sec, w_time.main.tv_usec); \
        printf("%6.1f MB/sec\n", (double)size / (double)w_total_usec); \
        printf(#r_type " read test : sync=%d ", r_sync); \
        u_dma_buf_##r_type##_read_test (u_dma_buf, dst, size, r_sync, &r_time); \
        r_total_usec = (long long)r_time.total.tv_sec*(1000*1000)+(long long)r_time.total.tv_usec; \
        printf("time=%ld.%06ld sec (%ld.%06ld sec) ", r_time.total.tv_sec, r_time.total.tv_usec, r_time.main.tv_sec, r_time.main.tv_usec); \
        printf("%6.1f MB/sec\n", (double)size / (double)r_total_usec); \
        if (memcmp(dst, src, size) != 0) { \
            printf("compare = mismatch\n"); \
            err_count++; \
        } else { \
            printf("compare = ok\n"); \
        } \
    }
    TEST1(mmap, 1, mmap, 1, src0_buf, temp_buf, buf_size);
    TEST1(mmap, 0, mmap, 1, src1_buf, temp_buf, buf_size);
    TEST1(mmap, 1, mmap, 0, src0_buf, temp_buf, buf_size);
    TEST1(mmap, 0, mmap, 0, src1_buf, temp_buf, buf_size);
    TEST1(file, 1, mmap, 0, src0_buf, temp_buf, buf_size);
    TEST1(file, 0, mmap, 0, src1_buf, temp_buf, buf_size);
    TEST1(mmap, 0, file, 1, src0_buf, temp_buf, buf_size);
    TEST1(mmap, 0, file, 0, src1_buf, temp_buf, buf_size);
done:
    if (temp_buf != NULL)
        free(temp_buf);
    if (src1_buf != NULL)
        free(src1_buf);
    if (src0_buf != NULL)
        free(src0_buf);
    if (null_buf != NULL)
        free(null_buf);
    if (u_dma_buf != NULL)
        u_dma_buf_destroy(u_dma_buf);
    // report compare mismatches through the exit status
    return (err_count == 0) ? EXIT_SUCCESS : EXIT_FAILURE;
}
This test program measures the performance of mmap read, file read, mmap write, and file write on the u-dma-buf device named on the command line.
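When sync is 0, the program brackets each copy with writes to the sync_for_cpu and sync_for_device attributes. The sketch below isolates how u_dma_buf_set_sync_area in the program encodes that command string: the offset in the upper 32 bits, then the size with its low 4 bits cleared, the direction in bits 3..2, and bit 0 set. The function name format_sync_command and the enum names are hypothetical, and the field meanings are read off the program above rather than from the driver documentation.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/* Direction codes, with the same values as the U_DMA_BUF_* constants above. */
enum { SYNC_DIR_READ_WRITE = 0, SYNC_DIR_WRITE_ONLY = 1, SYNC_DIR_READ_ONLY = 2 };

/* Format the command string into out. Returns its length, or -1 if it does not fit. */
static int format_sync_command(char* out, size_t out_len,
                               uint32_t offset, uint32_t size, unsigned int direction)
{
    assert(out != NULL && direction <= 2);
    /* Lower word: size (16-byte granularity) | direction << 2 | trigger bit. */
    uint32_t low = (size & 0xFFFFFFF0u) | ((uint32_t)direction << 2) | 1u;
    int n = snprintf(out, out_len, "0x%08" PRIX32 "%08" PRIX32 "\n", offset, low);
    return (n > 0 && (size_t)n < out_len) ? n : -1;
}
```

For a 2 MiB write-only area at offset 0 this produces the 19-character string 0x0000000000200005 followed by a newline, which is what the program writes to both attributes.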
Benchmark execution log
The log from running the benchmark program is shown below.
u-dma-buf-file-test.log
## file-test-udmabuf-fabric-low
device=udmabuf-fabric-low
driver_version=5.0.3
size=2097152
phys_addr=0x81a00000
dma_coherent=1
mmap write test : sync=1 time=0.021085 sec (0.021081 sec) 99.5 MB/sec
mmap read test : sync=1 time=0.023062 sec (0.023058 sec) 90.9 MB/sec
compare = ok
mmap write test : sync=0 time=0.020971 sec (0.020886 sec) 100.0 MB/sec
mmap read test : sync=1 time=0.023041 sec (0.023036 sec) 91.0 MB/sec
compare = ok
mmap write test : sync=1 time=0.021026 sec (0.021022 sec) 99.7 MB/sec
mmap read test : sync=0 time=0.023123 sec (0.023038 sec) 90.7 MB/sec
compare = ok
mmap write test : sync=0 time=0.020986 sec (0.020903 sec) 99.9 MB/sec
mmap read test : sync=0 time=0.023063 sec (0.022997 sec) 90.9 MB/sec
compare = ok
file write test : sync=1 time=0.018645 sec (0.018641 sec) 112.5 MB/sec
mmap read test : sync=0 time=0.023189 sec (0.023106 sec) 90.4 MB/sec
compare = ok
file write test : sync=0 time=0.018813 sec (0.018737 sec) 111.5 MB/sec
mmap read test : sync=0 time=0.023196 sec (0.023127 sec) 90.4 MB/sec
compare = ok
mmap write test : sync=0 time=0.020976 sec (0.020891 sec) 100.0 MB/sec
file read test : sync=1 time=0.020546 sec (0.020542 sec) 102.1 MB/sec
compare = ok
mmap write test : sync=0 time=0.020988 sec (0.020902 sec) 99.9 MB/sec
file read test : sync=0 time=0.020360 sec (0.020302 sec) 103.0 MB/sec
compare = ok
## file-test-udmabuf-fabric-high
device=udmabuf-fabric-high
driver_version=5.0.3
size=2097152
phys_addr=0x1032600000
dma_coherent=1
mmap write test : sync=1 time=0.020997 sec (0.020993 sec) 99.9 MB/sec
mmap read test : sync=1 time=0.024140 sec (0.024136 sec) 86.9 MB/sec
compare = ok
mmap write test : sync=0 time=0.021072 sec (0.020989 sec) 99.5 MB/sec
mmap read test : sync=1 time=0.024104 sec (0.024100 sec) 87.0 MB/sec
compare = ok
mmap write test : sync=1 time=0.020901 sec (0.020896 sec) 100.3 MB/sec
mmap read test : sync=0 time=0.024230 sec (0.024146 sec) 86.6 MB/sec
compare = ok
mmap write test : sync=0 time=0.021115 sec (0.021032 sec) 99.3 MB/sec
mmap read test : sync=0 time=0.024211 sec (0.024142 sec) 86.6 MB/sec
compare = ok
file write test : sync=1 time=0.018739 sec (0.018735 sec) 111.9 MB/sec
mmap read test : sync=0 time=0.024265 sec (0.024183 sec) 86.4 MB/sec
compare = ok
file write test : sync=0 time=0.018952 sec (0.018875 sec) 110.7 MB/sec
mmap read test : sync=0 time=0.024374 sec (0.024303 sec) 86.0 MB/sec
compare = ok
mmap write test : sync=0 time=0.021010 sec (0.020928 sec) 99.8 MB/sec
file read test : sync=1 time=0.021169 sec (0.021167 sec) 99.1 MB/sec
compare = ok
mmap write test : sync=0 time=0.021256 sec (0.021050 sec) 98.7 MB/sec
file read test : sync=0 time=0.021142 sec (0.021087 sec) 99.2 MB/sec
compare = ok
## file-test-udmabuf-soc-high
device=udmabuf-soc-high
driver_version=5.0.3
size=2097152
phys_addr=0x1412200000
dma_coherent=0
mmap write test : sync=1 time=0.015515 sec (0.015511 sec) 135.2 MB/sec
mmap read test : sync=1 time=0.038190 sec (0.038187 sec) 54.9 MB/sec
compare = ok
mmap write test : sync=0 time=0.016677 sec (0.015393 sec) 125.8 MB/sec
mmap read test : sync=1 time=0.037967 sec (0.037965 sec) 55.2 MB/sec
compare = ok
mmap write test : sync=1 time=0.015359 sec (0.015354 sec) 136.5 MB/sec
mmap read test : sync=0 time=0.040526 sec (0.037932 sec) 51.7 MB/sec
compare = ok
mmap write test : sync=0 time=0.016673 sec (0.015384 sec) 125.8 MB/sec
mmap read test : sync=0 time=0.040548 sec (0.037905 sec) 51.7 MB/sec
compare = ok
file write test : sync=1 time=0.012514 sec (0.012511 sec) 167.6 MB/sec
mmap read test : sync=0 time=0.040502 sec (0.037948 sec) 51.8 MB/sec
compare = ok
file write test : sync=0 time=0.012607 sec (0.011264 sec) 166.3 MB/sec
mmap read test : sync=0 time=0.040596 sec (0.038118 sec) 51.7 MB/sec
compare = ok
mmap write test : sync=0 time=0.016678 sec (0.015392 sec) 125.7 MB/sec
file read test : sync=1 time=0.027340 sec (0.027337 sec) 76.7 MB/sec
compare = ok
mmap write test : sync=0 time=0.016668 sec (0.015380 sec) 125.8 MB/sec
file read test : sync=0 time=0.027377 sec (0.024853 sec) 76.6 MB/sec
compare = ok
## file-test-udmabuf-ddr-c0
device=udmabuf-ddr-c0
driver_version=5.0.3
size=33554432
phys_addr=0x88000000
dma_coherent=1
mmap write test : sync=1 time=0.326402 sec (0.326397 sec) 102.8 MB/sec
mmap read test : sync=1 time=0.335157 sec (0.335153 sec) 100.1 MB/sec
compare = ok
mmap write test : sync=0 time=0.326644 sec (0.326557 sec) 102.7 MB/sec
mmap read test : sync=1 time=0.335646 sec (0.335642 sec) 100.0 MB/sec
compare = ok
mmap write test : sync=1 time=0.326066 sec (0.326063 sec) 102.9 MB/sec
mmap read test : sync=0 time=0.335145 sec (0.335061 sec) 100.1 MB/sec
compare = ok
mmap write test : sync=0 time=0.325970 sec (0.325883 sec) 102.9 MB/sec
mmap read test : sync=0 time=0.335155 sec (0.335084 sec) 100.1 MB/sec
compare = ok
file write test : sync=1 time=0.292303 sec (0.292299 sec) 114.8 MB/sec
mmap read test : sync=0 time=0.335115 sec (0.335029 sec) 100.1 MB/sec
compare = ok
file write test : sync=0 time=0.292203 sec (0.292120 sec) 114.8 MB/sec
mmap read test : sync=0 time=0.335275 sec (0.335203 sec) 100.1 MB/sec
compare = ok
mmap write test : sync=0 time=0.326377 sec (0.326288 sec) 102.8 MB/sec
file read test : sync=1 time=0.293119 sec (0.293116 sec) 114.5 MB/sec
compare = ok
mmap write test : sync=0 time=0.326570 sec (0.326483 sec) 102.7 MB/sec
file read test : sync=0 time=0.293292 sec (0.293234 sec) 114.4 MB/sec
compare = ok
## file-test-udmabuf-ddr-nc0
device=udmabuf-ddr-nc0
driver_version=5.0.3
size=33554432
phys_addr=0xc8000000
dma_coherent=1
mmap write test : sync=1 time=0.199065 sec (0.199061 sec) 168.6 MB/sec
mmap read test : sync=1 time=0.570913 sec (0.570910 sec) 58.8 MB/sec
compare = ok
mmap write test : sync=0 time=0.199312 sec (0.199230 sec) 168.4 MB/sec
mmap read test : sync=1 time=0.570328 sec (0.570324 sec) 58.8 MB/sec
compare = ok
mmap write test : sync=1 time=0.199157 sec (0.199153 sec) 168.5 MB/sec
mmap read test : sync=0 time=0.570156 sec (0.570073 sec) 58.9 MB/sec
compare = ok
mmap write test : sync=0 time=0.200012 sec (0.199928 sec) 167.8 MB/sec
mmap read test : sync=0 time=0.569450 sec (0.569380 sec) 58.9 MB/sec
compare = ok
file write test : sync=1 time=0.167946 sec (0.167942 sec) 199.8 MB/sec
mmap read test : sync=0 time=0.569724 sec (0.569639 sec) 58.9 MB/sec
compare = ok
file write test : sync=0 time=0.169711 sec (0.169632 sec) 197.7 MB/sec
mmap read test : sync=0 time=0.570536 sec (0.570463 sec) 58.8 MB/sec
compare = ok
mmap write test : sync=0 time=0.198988 sec (0.198905 sec) 168.6 MB/sec
file read test : sync=1 time=0.401089 sec (0.401085 sec) 83.7 MB/sec
compare = ok
mmap write test : sync=0 time=0.199280 sec (0.199197 sec) 168.4 MB/sec
file read test : sync=0 time=0.399430 sec (0.399368 sec) 84.0 MB/sec
compare = ok
## file-test-udmabuf-ddr-nc-wcb0
device=udmabuf-ddr-nc-wcb0
driver_version=5.0.3
size=33554432
phys_addr=0xd8000000
dma_coherent=1
mmap write test : sync=1 time=0.237719 sec (0.237715 sec) 141.2 MB/sec
mmap read test : sync=1 time=0.569537 sec (0.569533 sec) 58.9 MB/sec
compare = ok
mmap write test : sync=0 time=0.238123 sec (0.238042 sec) 140.9 MB/sec
mmap read test : sync=1 time=0.570406 sec (0.570403 sec) 58.8 MB/sec
compare = ok
mmap write test : sync=1 time=0.237692 sec (0.237687 sec) 141.2 MB/sec
mmap read test : sync=0 time=0.569988 sec (0.569901 sec) 58.9 MB/sec
compare = ok
mmap write test : sync=0 time=0.238346 sec (0.238265 sec) 140.8 MB/sec
mmap read test : sync=0 time=0.569465 sec (0.569400 sec) 58.9 MB/sec
compare = ok
file write test : sync=1 time=0.180402 sec (0.180399 sec) 186.0 MB/sec
mmap read test : sync=0 time=0.569633 sec (0.569549 sec) 58.9 MB/sec
compare = ok
file write test : sync=0 time=0.181134 sec (0.181057 sec) 185.2 MB/sec
mmap read test : sync=0 time=0.570051 sec (0.569981 sec) 58.9 MB/sec
compare = ok
mmap write test : sync=0 time=0.238029 sec (0.237945 sec) 141.0 MB/sec
file read test : sync=1 time=0.401591 sec (0.401587 sec) 83.6 MB/sec
compare = ok
mmap write test : sync=0 time=0.238641 sec (0.238559 sec) 140.6 MB/sec
file read test : sync=0 time=0.398296 sec (0.398237 sec) 84.2 MB/sec
compare = ok
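Each result line in the log carries two times: the total, and in parentheses the time of the copy alone. Their difference is the time spent in sync_for_cpu and sync_for_device, which is only called when sync is 0. A minimal sketch for extracting it from a log line follows. The struct and function names are hypothetical, and the parser assumes the exact line format shown above.

```c
#include <assert.h>
#include <stdio.h>

/* One parsed result line of the benchmark log. */
struct log_entry {
    char   access[16];   /* "mmap" or "file"                              */
    char   op[16];       /* "write" or "read"                             */
    int    sync;         /* value printed as sync=N                       */
    double total_sec;    /* time=... including the sync_for_* calls       */
    double main_sec;     /* time in parentheses: the copy alone           */
    double mb_per_sec;   /* throughput as printed                         */
};

/* Parse one "<access> <op> test : sync=N time=T sec (M sec) R MB/sec" line.
 * Returns 0 on success, -1 if the line has a different shape. */
static int parse_log_line(const char* line, struct log_entry* e)
{
    assert(line != NULL && e != NULL);
    int n = sscanf(line, "%15s %15s test : sync=%d time=%lf sec (%lf sec) %lf MB/sec",
                   e->access, e->op, &e->sync,
                   &e->total_sec, &e->main_sec, &e->mb_per_sec);
    return (n == 6) ? 0 : -1;
}

/* Time spent outside the copy, i.e. in the two sync calls. */
static double sync_overhead_sec(const struct log_entry* e)
{
    return e->total_sec - e->main_sec;
}
```

Applied to the udmabuf-soc-high line "mmap write test : sync=0 time=0.016677 sec (0.015393 sec)", the difference is about 1.3 ms, whereas the corresponding udmabuf-fabric-low line differs by less than 0.1 ms.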
u-dma-buf read performance results
The table shows mmap read and file read performance in MB/sec.
No. | u-dma-buf name | MSS Address Space | mmap read | file read |
---|---|---|---|---|
1 | udmabuf-fabric-low | DDR-Cached Low | 91.0 | 103.0 |
2 | udmabuf-fabric-high | DDR-Cached High | 87.0 | 99.2 |
3 | udmabuf-soc-high | DDR-Non-Cached High | 55.2 | 76.7 |
4 | udmabuf-ddr-c0 | DDR-Cached Low | 100.1 | 114.5 |
5 | udmabuf-ddr-nc0 | DDR-Non-Cached Low | 58.9 | 84.0 |
6 | udmabuf-ddr-nc-wcb0 | DDR-Non-Cached WCB Low | 58.9 | 84.2 |
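The MB/sec figures come from the TEST1 macro in the program, which divides the number of bytes by the total time in microseconds, so 1 MB here means 10^6 bytes. A small sketch of the same calculation, with the hypothetical name throughput_mb_per_sec:

```c
#include <assert.h>
#include <stddef.h>

/* Bytes per microsecond, which equals MB/sec with 1 MB = 10^6 bytes. */
static double throughput_mb_per_sec(size_t bytes, long sec, long usec)
{
    long long total_usec = (long long)sec * 1000000LL + usec;
    assert(total_usec > 0);
    return (double)bytes / (double)total_usec;
}
```

For example, the 2097152-byte mmap read of udmabuf-fabric-low that took 0.023041 sec in the log works out to about 91.0 MB/sec.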
u-dma-buf write performance results
The table shows mmap write and file write performance in MB/sec.
No. | u-dma-buf name | MSS Address Space | mmap write | file write |
---|---|---|---|---|
1 | udmabuf-fabric-low | DDR-Cached Low | 100.0 | 112.5 |
2 | udmabuf-fabric-high | DDR-Cached High | 100.3 | 111.9 |
3 | udmabuf-soc-high | DDR-Non-Cached High | 136.5 | 167.6 |
4 | udmabuf-ddr-c0 | DDR-Cached Low | 102.9 | 114.8 |
5 | udmabuf-ddr-nc0 | DDR-Non-Cached Low | 168.5 | 199.8 |
6 | udmabuf-ddr-nc-wcb0 | DDR-Non-Cached WCB Low | 141.2 | 186.0 |
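Observations
To be honest, the results differ from what I expected, and I am puzzled. There may be a problem with the measurement method.
The points that caught my attention are as follows.
Is the CPU cache working?
For read accesses, Non-Cached reaches only about half the throughput of Cached, so the CPU cache is clearly active, but the gap feels smaller than I expected.
More surprising, for write accesses Non-Cached is faster than Cached. Why is that?
Is the WCB working?
I assumed the WCB (Write Combining Buffer) exists to speed up write accesses, so why is it slower than plain Non-Cached?
Why is file access faster than mmap?
This was also unexpected. I need to study the Linux kernel internals a bit more.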
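The three questions can be restated as simple comparisons on the measured numbers. The sketch below uses the figures for the three 32 MiB reserved-memory buffers (No. 4 to 6), taken from the execution log. The struct and helper names are hypothetical, and the code only restates the measurements without explaining them.

```c
#include <assert.h>
#include <stdbool.h>

/* Throughput in MB/sec for one u-dma-buf device. */
struct bench_result {
    const char* name;
    double      mmap_read;
    double      file_read;
    double      mmap_write;
    double      file_write;
};

/* Values for the 32 MiB reserved-memory buffers, from the log. */
static const struct bench_result reserved_results[3] = {
    { "udmabuf-ddr-c0",      100.1, 114.5, 102.9, 114.8 },  /* Cached         */
    { "udmabuf-ddr-nc0",      58.9,  84.0, 168.5, 199.8 },  /* Non-Cached     */
    { "udmabuf-ddr-nc-wcb0",  58.9,  84.2, 141.2, 186.0 },  /* Non-Cached WCB */
};

/* a / b, used to compare one configuration against another. */
static double ratio(double a, double b)
{
    assert(b > 0.0);
    return a / b;
}

/* True when file access beats mmap for both reads and writes. */
static bool file_beats_mmap(const struct bench_result* r)
{
    return r->file_read > r->mmap_read && r->file_write > r->mmap_write;
}
```

With these numbers, Non-Cached mmap read is about 0.59 times Cached, Non-Cached mmap write is about 1.64 times Cached, WCB writes are slower than plain Non-Cached writes, and file access beats mmap on all three buffers.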
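References
MPFS-FPGA-Example-1-DISCO-KIT repository
Qiita articles on Ubuntu 22.04 for MPFS-DISCO-KIT
- 『MPFS-DISCO-KIT 向け Ubuntu 22.04 の構築(イントロ編)』@Qiita (Building Ubuntu 22.04 for MPFS-DISCO-KIT: Introduction)
- 『MPFS-DISCO-KIT 向け Ubuntu 22.04 の構築(HSS編)』@Qiita (Building Ubuntu 22.04 for MPFS-DISCO-KIT: HSS)
- 『MPFS-DISCO-KIT 向け Ubuntu 22.04 の構築(SD-Card 作成編)』@Qiita (Building Ubuntu 22.04 for MPFS-DISCO-KIT: Creating the SD Card)
- 『MPFS-DISCO-KIT 向け Ubuntu 22.04 の構築(SD-Card 起動編)』@Qiita (Building Ubuntu 22.04 for MPFS-DISCO-KIT: Booting from the SD Card)
- 『MPFS-DISCO-KIT 向け Ubuntu 22.04 のメモリマップ』@Qiita (Memory Map of Ubuntu 22.04 for MPFS-DISCO-KIT)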