Trying the Dhrystone [Version C/1.1] speed test

Last updated at 2024-08-14, posted at 2022-12-30

The other day I found the source code of the Dhrystone benchmark on the site of The IBM 1401 Demo Lab and Restoration Project at the Computer History Museum, so let's see whether it still runs.
https://ibm-1401.info/Dhrystone.html

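The source file appears to be dry.c. It is laid out boldly, with the benchmark results written right into the source, and it uses no function prototypes, much like the code in the old K&R book. It also seems to support compilers that lack enum types, so it has the feel of very old source code. It even #defines NULL, and the TRUE and FALSE values it uses in place of a bool type, by itself.
The version is C/1.1, dated 12/01/84 (December 1, 1984). According to the header, the program was updated on January 6, 1986, and the results list on March 31, 1986.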

Compiling it as-is on my Linux (AMD64) machine produced quite a few warnings, so I made a few small changes.

dhrystone1.c

/***** hpda:net.sources / homxb!gemini /  1:58 am  Apr  1, 1986*/
/*	EVERBODY:	Please read "APOLOGY" below. -rick 01/06/85
 *			See introduction in net.arch, or net.micro
 *
 *	"DHRYSTONE" Benchmark Program
 *
 *	Version:	C/1.1, 12/01/84
 *
 *	Date:		PROGRAM updated 01/06/86, RESULTS updated 03/31/86
 *
 *	Author:		Reinhold P. Weicker,  CACM Vol 27, No 10, 10/84 pg. 1013
 *			Translated from ADA by Rick Richardson
 *			Every method to preserve ADA-likeness has been used,
 *			at the expense of C-ness.
 *
 *	Compile:	cc -O dry.c -o drynr			: No registers
 *			cc -O -DREG=register dry.c -o dryr	: Registers
 *
 *	Defines:	Defines are provided for old C compiler's
 *			which don't have enums, and can't assign structures.
 *			The time(2) function is library dependant; Most
 *			return the time in seconds, but beware of some, like
 *			Aztec C, which return other units.
 *			The LOOPS define is initially set for 50000 loops.
 *			If you have a machine with large integers and is
 *			very fast, please change this number to 500000 to
 *			get better accuracy.  Please select the way to
 *			measure the execution time using the TIME define.
 *			For single user machines, time(2) is adequate. For
 *			multi-user machines where you cannot get single-user
 *			access, use the times(2) function.  If you have
 *			neither, use a stopwatch in the dead of night.
 *			Use a "printf" at the point marked "start timer"
 *			to begin your timings. DO NOT use the UNIX "time(1)"
 *			command, as this will measure the total time to
 *			run this program, which will (erroneously) include
 *			the time to malloc(3) storage and to compute the
 *			time it takes to do nothing.
 *
 *	Run:		drynr; dryr
 *
 *	Results:	If you get any new machine/OS results, please send to:
 *
 *				ihnp4!castor!pcrat!rick
 *
 *			and thanks to all that do.  Space prevents listing
 *			the names of those who have provided some of these
 *			results.  I'll be forwarding these results to
 *			Rheinhold Weicker.
 *
 *	Note:		I order the list in increasing performance of the
 *			"with registers" benchmark.  If the compiler doesn't
 *			provide register variables, then the benchmark
 *			is the same for both REG and NOREG.
 *
 *	PLEASE:		Send complete information about the machine type,
 *			clock speed, OS and C manufacturer/version.  If
 *			the machine is modified, tell me what was done.
 *			On UNIX, execute uname -a and cc -V to get this info.
 *
 *	80x8x NOTE:	80x8x benchers: please try to do all memory models
 *			for a particular compiler.
 *
 *	APOLOGY (1/30/86):
 *		Well, I goofed things up!  As pointed out by Haakon Bugge,
 *		the line of code marked "GOOF" below was missing from the
 *		Dhrystone distribution for the last several months.  It
 *		*WAS* in a backup copy I made last winter, so no doubt it
 *		was victimized by sleepy fingers operating vi!
 *
 *		The effect of the line missing is that the reported benchmarks
 *		are 15% too fast (at least on a 80286).  Now, this creates
 *		a dilema - do I throw out ALL the data so far collected
 *		and use only results from this (corrected) version, or
 *		do I just keep collecting data for the old version?
 *
 *		Since the data collected so far *is* valid as long as it
 *		is compared with like data, I have decided to keep
 *		TWO lists- one for the old benchmark, and one for the
 *		new.  This also gives me an opportunity to correct one
 *		other error I made in the instructions for this benchmark.
 *		My experience with C compilers has been mostly with
 *		UNIX 'pcc' derived compilers, where the 'optimizer' simply
 *		fixes sloppy code generation (peephole optimization).
 *		But today, there exist C compiler optimizers that will actually
 *		perform optimization in the Computer Science sense of the word,
 *		by removing, for example, assignments to a variable whose
 *		value is never used.  Dhrystone, unfortunately, provides
 *		lots of opportunities for this sort of optimization.
 *
 *		I request that benchmarkers re-run this new, corrected
 *		version of Dhrystone, turning off or bypassing optimizers
 *		which perform more than peephole optimization.  Please
 *		indicate the version of Dhrystone used when reporting the
 *		results to me.
 *		
 * RESULTS BEGIN HERE
 *
 *----------------DHRYSTONE VERSION 1.1 RESULTS BEGIN--------------------------
 *
 * MACHINE	MICROPROCESSOR	OPERATING	COMPILER	DHRYSTONES/SEC.
 * TYPE				SYSTEM				NO REG	REGS
 * --------------------------	------------	-----------	---------------
 * Apple IIe	65C02-1.02Mhz	DOS 3.3		Aztec CII v1.05i  37	  37
 * -		Z80-2.5Mhz	CPM-80 v2.2	Aztec CII v1.05g  91	  91
 * -		8086-8Mhz	RMX86 V6	Intel C-86 V2.0	 197	 203LM??
 * IBM PC/XT	8088-4.77Mhz	COHERENT 2.3.43	Mark Wiiliams	 259	 275
 * -		8086-8Mhz	RMX86 V6	Intel C-86 V2.0	 287	 304 ??
 * Fortune 32:16 68000-6Mhz	V7+sys3+4.1BSD  cc		 360	 346
 * PDP-11/34A	w/FP-11C	UNIX V7m	cc		 406	 449
 * Macintosh512	68000-7.7Mhz	Mac ROM O/S	DeSmet(C ware)	 625	 625
 * VAX-11/750	w/FPA		UNIX 4.2BSD	cc		 831	 852
 * DataMedia 932 68000-10Mhz	UNIX sysV	cc		 837	 888
 * Plexus P35	68000-12.5Mhz	UNIX sysIII	cc		 835	 894
 * ATT PC7300	68010-10Mhz	UNIX 5.0.3	cc		 973	1034
 * Compaq II	80286-8Mhz	MSDOS 3.1	MS C 3.0 	1086	1140 LM
 * IBM PC/AT    80286-7.5Mhz    Venix/286 SVR2  cc              1159    1254 *15
 * Compaq II	80286-8Mhz	MSDOS 3.1	MS C 3.0 	1190	1282 MM
 * MicroVAX II	-		Mach/4.3	cc		1361	1385
 * DEC uVAX II	-		Ultrix-32m v1.1	cc		1385	1399
 * Compaq II	80286-8Mhz	MSDOS 3.1	MS C 3.0 	1351	1428
 * VAX 11/780	-		UNIX 4.2BSD	cc		1417	1441
 * VAX-780/MA780		Mach/4.3	cc		1428	1470
 * VAX 11/780	-		UNIX 5.0.1	cc 4.1.1.31	1650	1640
 * Ridge 32C V1	-		ROS 3.3		Ridge C (older)	1628	1695
 * Gould PN6005	-		UTX 1.1c+ (4.2)	cc		1732	1884
 * Gould PN9080	custom ECL	UTX-32 1.1C	cc		4745	4992
 * VAX-784	-		Mach/4.3	cc		5263	5555 &4
 * VAX 8600	-		4.3 BSD		cc		6329	6423
 * Amdahl 5860	-		UTS sysV	cc 1.22	       28735   28846
 * IBM3090/200	-		?		?	       31250   31250
 *
 *
 *----------------DHRYSTONE VERSION 1.0 RESULTS BEGIN--------------------------
 *
 * MACHINE	MICROPROCESSOR	OPERATING	COMPILER	DHRYSTONES/SEC.
 * TYPE				SYSTEM				NO REG	REGS
 * --------------------------	------------	-----------	---------------
 * Commodore 64	6510-1MHz	C64 ROM		C Power 2.8	  36	  36
 * HP-110	8086-5.33Mhz	MSDOS 2.11	Lattice 2.14	 284	 284
 * IBM PC/XT	8088-4.77Mhz	PC/IX		cc		 271	 294
 * CCC 3205	-		Xelos(SVR2) 	cc		 558	 592
 * Perq-II	2901 bitslice	Accent S5c 	cc (CMU)	 301	 301
 * IBM PC/XT	8088-4.77Mhz	COHERENT 2.3.43	MarkWilliams cc  296	 317
 * Cosmos	68000-8Mhz	UniSoft		cc		 305	 322
 * IBM PC/XT	8088-4.77Mhz	Venix/86 2.0	cc		 297	 324
 * DEC PRO 350  11/23           Venix/PRO SVR2  cc               299     325
 * IBM PC	8088-4.77Mhz	MSDOS 2.0	b16cc 2.0	 310	 340
 * PDP11/23	11/23           Venix (V7)      cc               320     358
 * Commodore Amiga		?		Lattice 3.02	 368	 371
 * PC/XT        8088-4.77Mhz    Venix/86 SYS V  cc               339     377
 * IBM PC	8088-4.77Mhz	MSDOS 2.0	CI-C86 2.20M	 390	 390
 * IBM PC/XT	8088-4.77Mhz	PCDOS 2.1	Wizard 2.1	 367	 403
 * IBM PC/XT	8088-4.77Mhz	PCDOS 3.1	Lattice 2.15	 403	 403 @
 * Colex DM-6	68010-8Mhz	Unisoft SYSV	cc		 378	 410
 * IBM PC	8088-4.77Mhz	PCDOS 3.1	Datalight 1.10	 416	 416
 * IBM PC	NEC V20-4.77Mhz	MSDOS 3.1	MS 3.1 		 387	 420
 * IBM PC/XT	8088-4.77Mhz	PCDOS 2.1	Microsoft 3.0	 390	 427
 * IBM PC	NEC V20-4.77Mhz	MSDOS 3.1	MS 3.1 (186) 	 393	 427
 * PDP-11/34	-		UNIX V7M	cc		 387	 438
 * IBM PC	8088, 4.77mhz	PC-DOS 2.1	Aztec C v3.2d	 423	 454
 * Tandy 1000	V20, 4.77mhz	MS-DOS 2.11	Aztec C v3.2d	 423	 458
 * Tandy TRS-16B 68000-6Mhz	Xenix 1.3.5	cc		 438	 458
 * PDP-11/34	-		RSTS/E		decus c		 438	 495
 * Onyx C8002	Z8000-4Mhz	IS/1 1.1 (V7)	cc		 476	 511
 * Tandy TRS-16B 68000-6Mhz	Xenix 1.3.5	Green Hills	 609	 617
 * DEC PRO 380  11/73           Venix/PRO SVR2  cc               577     628
 * FHL QT+	68000-10Mhz	Os9/68000	version 1.3	 603	 649 FH
 * Apollo DN550	68010-?Mhz	AegisSR9/IX	cc 3.12		 666	 666
 * HP-110	8086-5.33Mhz	MSDOS 2.11	Aztec-C		 641	 676 
 * ATT PC6300	8086-8Mhz	MSDOS 2.11	b16cc 2.0	 632	 684
 * IBM PC/AT	80286-6Mhz	PCDOS 3.0	CI-C86 2.1	 666	 684
 * Tandy 6000	68000-8Mhz	Xenix 3.0	cc		 694	 694
 * IBM PC/AT	80286-6Mhz	Xenix 3.0	cc		 684	 704 MM
 * Macintosh	68000-7.8Mhz 2M	Mac Rom		Mac C 32 bit int 694	 704
 * Macintosh	68000-7.7Mhz	-		MegaMax C 2.0	 661	 709
 * Macintosh512	68000-7.7Mhz	Mac ROM O/S	DeSmet(C ware)	 714	 714
 * IBM PC/AT	80286-6Mhz	Xenix 3.0	cc		 704	 714 LM
 * Codata 3300	68000-8Mhz	UniPlus+ (v7)	cc		 678	 725
 * WICAT MB	68000-8Mhz	System V	WICAT C 4.1	 585	 731 ~
 * Cadmus 9000	68010-10Mhz	UNIX		cc		 714	 735
 * AT&T 6300    8086-8Mhz       Venix/86 SVR2   cc               668     743
 * Cadmus 9790	68010-10Mhz 1MB	SVR0,Cadmus3.7	cc		 720	 747
 * NEC PC9801F	8086-8Mhz	PCDOS 2.11	Lattice 2.15	 768	  -  @
 * ATT PC6300	8086-8Mhz	MSDOS 2.11	CI-C86 2.20M	 769	 769
 * Burroughs XE550 68010-10Mhz	Centix 2.10	cc		 769	 769 CT1
 * EAGLE/TURBO  8086-8Mhz       Venix/86 SVR2   cc               696     779
 * ALTOS 586	8086-10Mhz	Xenix 3.0b	cc 		 724	 793
 * DEC 11/73	J-11 micro	Ultrix-11 V3.0	cc		 735	 793
 * ATT 3B2/300	WE32000-?Mhz	UNIX 5.0.2	cc		 735	 806
 * Apollo DN320	68010-?Mhz	AegisSR9/IX	cc 3.12		 806	 806
 * IRIS-2400	68010-10Mhz	UNIX System V	cc		 772	 829
 * Atari 520ST  68000-8Mhz      TOS             DigResearch      839     846
 * IBM PC/AT	80286-6Mhz	PCDOS 3.0	MS 3.0(large)	 833	 847 LM
 * WICAT MB	68000-8Mhz	System V	WICAT C 4.1	 675	 853 S~
 * VAX 11/750	-		Ultrix 1.1	4.2BSD cc	 781	 862
 * CCC  7350A	68000-8MHz	UniSoft V.2	cc		 821	 875
 * VAX 11/750	-		UNIX 4.2bsd	cc		 862	 877
 * Fast Mac	68000-7.7Mhz	-		MegaMax C 2.0	 839	 904 +
 * IBM PC/XT	8086-9.54Mhz	PCDOS 3.1	Microsoft 3.0	 833	 909 C1
 * DEC 11/44			Ultrix-11 V3.0	cc		 862	 909
 * Macintosh	68000-7.8Mhz 2M	Mac Rom		Mac C 16 bit int 877	 909 S
 * CCC 3210	-		Xelos R01(SVR2)	cc		 849	 924
 * CCC 3220	-               Ed. 7 v2.3      cc		 892	 925
 * IBM PC/AT	80286-6Mhz	Xenix 3.0	cc -i		 909	 925
 * AT&T 6300	8086, 8mhz	MS-DOS 2.11	Aztec C v3.2d	 862	 943
 * IBM PC/AT	80286-6Mhz	Xenix 3.0	cc		 892	 961
 * VAX 11/750	w/FPA		Eunice 3.2	cc		 914	 976
 * IBM PC/XT	8086-9.54Mhz	PCDOS 3.1	Wizard 2.1	 892	 980 C1
 * IBM PC/XT	8086-9.54Mhz	PCDOS 3.1	Lattice 2.15	 980	 980 C1
 * Plexus P35	68000-10Mhz	UNIX System III cc		 984	 980
 * PDP-11/73	KDJ11-AA 15Mhz	UNIX V7M 2.1	cc		 862     981
 * VAX 11/750	w/FPA		UNIX 4.3bsd	cc		 994	 997
 * IRIS-1400	68010-10Mhz	UNIX System V	cc		 909	1000
 * IBM PC/AT	80286-6Mhz	Venix/86 2.1	cc		 961	1000
 * IBM PC/AT	80286-6Mhz	PCDOS 3.0	b16cc 2.0	 943	1063
 * Zilog S8000/11 Z8001-5.5Mhz	Zeus 3.2	cc		1011	1084
 * NSC ICM-3216 NSC 32016-10Mhz	UNIX SVR2	cc		1041	1084
 * IBM PC/AT	80286-6Mhz	PCDOS 3.0	MS 3.0(small)	1063	1086
 * VAX 11/750	w/FPA		VMS		VAX-11 C 2.0	 958	1091
 * Stride	68000-10Mhz	System-V/68	cc		1041	1111
 * Plexus P/60  MC68000-12.5Mhz	UNIX SYSIII	Plexus		1111	1111
 * ATT PC7300	68010-10Mhz	UNIX 5.0.2	cc		1041	1111
 * CCC 3230	-		Xelos R01(SVR2)	cc		1040	1126
 * Stride	68000-12Mhz	System-V/68	cc		1063	1136
 * IBM PC/AT    80286-6Mhz      Venix/286 SVR2  cc              1056    1149
 * Plexus P/60  MC68000-12.5Mhz	UNIX SYSIII	Plexus		1111	1163 T
 * IBM PC/AT	80286-6Mhz	PCDOS 3.0	Datalight 1.10	1190	1190
 * ATT PC6300+	80286-6Mhz	MSDOS 3.1	b16cc 2.0	1111	1219
 * IBM PC/AT	80286-6Mhz	PCDOS 3.1	Wizard 2.1	1136	1219
 * Sun2/120	68010-10Mhz	Sun 4.2BSD	cc		1136	1219
 * IBM PC/AT	80286-6Mhz	PCDOS 3.0	CI-C86 2.20M	1219	1219
 * WICAT PB	68000-8Mhz	System V	WICAT C 4.1	 998	1226 ~
 * MASSCOMP 500	68010-10MHz	RTU V3.0	cc (V3.2)	1156	1238
 * Alliant FX/8 IP (68012-12Mhz) Concentrix	cc -ip;exec -i 	1170	1243 FX
 * Cyb DataMate	68010-12.5Mhz	Uniplus 5.0	Unisoft cc	1162	1250
 * PDP 11/70	-		UNIX 5.2	cc		1162	1250
 * IBM PC/AT	80286-6Mhz	PCDOS 3.1	Lattice 2.15	1250	1250
 * IBM PC/AT	80286-7.5Mhz	Venix/86 2.1	cc		1190	1315 *15
 * Sun2/120	68010-10Mhz	Standalone	cc		1219	1315
 * Intel 380	80286-8Mhz	Xenix R3.0up1	cc		1250	1315 *16
 * Sequent Balance 8000	NS32032-10MHz	Dynix 2.0	cc	1250	1315 N12
 * IBM PC/DSI-32 32032-10Mhz	MSDOS 3.1	GreenHills 2.14	1282	1315 C3
 * ATT 3B2/400	WE32100-?Mhz	UNIX 5.2	cc		1315	1315
 * CCC 3250XP	-		Xelos R01(SVR2)	cc		1215	1318
 * IBM PC/RT 032 RISC(801?)?Mhz BSD 4.2         cc              1248    1333 RT
 * DG MV4000	-		AOS/VS 5.00	cc		1333	1333
 * IBM PC/AT	80286-8Mhz	Venix/86 2.1	cc		1275	1380 *16
 * IBM PC/AT	80286-6Mhz	MSDOS 3.0	Microsoft 3.0	1250	1388
 * ATT PC6300+	80286-6Mhz	MSDOS 3.1	CI-C86 2.20M	1428	1428
 * COMPAQ/286   80286-8Mhz      Venix/286 SVR2  cc              1326    1443
 * IBM PC/AT    80286-7.5Mhz    Venix/286 SVR2  cc              1333    1449 *15
 * WICAT PB	68000-8Mhz	System V	WICAT C 4.1	1169	1464 S~
 * Tandy II/6000 68000-8Mhz	Xenix 3.0	cc      	1384	1477
 * MicroVAX II	-		Mach/4.3	cc		1513	1536
 * WICAT MB	68000-12.5Mhz	System V	WICAT C 4.1	1246	1537 ~
 * IBM PC/AT    80286-9Mhz      SCO Xenix V     cc              1540    1556 *18
 * Cyb DataMate	68010-12.5Mhz	Uniplus 5.0	Unisoft cc	1470	1562 S
 * VAX 11/780	-		UNIX 5.2	cc		1515	1562
 * MicroVAX-II	-		-		-		1562	1612
 * VAX-780/MA780		Mach/4.3	cc		1587	1612
 * VAX 11/780	-		UNIX 4.3bsd	cc		1646	1662
 * Apollo DN660	-		AegisSR9/IX	cc 3.12		1666	1666
 * ATT 3B20	-		UNIX 5.2	cc		1515	1724
 * NEC PC-98XA	80286-8Mhz	PCDOS 3.1	Lattice 2.15	1724	1724 @
 * HP9000-500	B series CPU	HP-UX 4.02	cc		1724	-
 * Ridge 32C V1	-		ROS 3.3		Ridge C (older)	1776	-
 * IBM PC/STD	80286-8Mhz	MSDOS 3.0 	Microsoft 3.0	1724	1785 C2
 * WICAT MB	68000-12.5Mhz	System V	WICAT C 4.1	1450	1814 S~
 * WICAT PB	68000-12.5Mhz	System V	WICAT C 4.1	1530	1898 ~
 * DEC-2065	KL10-Model B	TOPS-20 6.1FT5	Port. C Comp.	1937	1946
 * Gould PN6005	-		UTX 1.1(4.2BSD)	cc		1675	1964
 * DEC2060	KL-10		TOPS-20		cc		2000	2000 NM
 * Intel 310AP	80286-8Mhz	Xenix 3.0	cc		1893	2009
 * VAX 11/785	-		UNIX 5.2	cc		2083	2083
 * VAX 11/785	-		VMS		VAX-11 C 2.0	2083	2083
 * VAX 11/785	-		UNIX SVR2	cc		2123	2083
 * VAX 11/785   -               ULTRIX-32 1.1   cc		2083    2091 
 * VAX 11/785	-		UNIX 4.3bsd	cc		2135	2136
 * WICAT PB	68000-12.5Mhz	System V	WICAT C 4.1	1780	2233 S~
 * Pyramid 90x	-		OSx 2.3		cc		2272	2272
 * Pyramid 90x	FPA,cache,4Mb	OSx 2.5		cc no -O	2777	2777
 * Pyramid 90x	w/cache		OSx 2.5		cc w/-O		3333	3333
 * IBM-4341-II	-		VM/SP3		Waterloo C 1.2  3333	3333
 * IRIS-2400T	68020-16.67Mhz	UNIX System V	cc		3105	3401
 * Celerity C-1200 ?		UNIX 4.2BSD	cc		3485	3468
 * SUN 3/75	68020-16.67Mhz	SUN 4.2 V3	cc		3333	3571
 * IBM-4341	Model 12	UTS 5.0		?		3685	3685
 * SUN-3/160    68020-16.67Mhz  Sun 4.2 V3.0A   cc		3381    3764
 * Sun 3/180	68020-16.67Mhz	Sun 4.2		cc		3333	3846
 * IBM-4341	Model 12	UTS 5.0		?		3910	3910 MN
 * MC 5400	68020-16.67MHz	RTU V3.0	cc (V4.0)	3952	4054
 * Intel 386/20	80386-12.5Mhz	PMON debugger	Intel C386v0.2	4149	4386
 * NCR Tower32  68020-16.67Mhz  SYS 5.0 Rel 2.0 cc              3846	4545
 * MC 5600/5700	68020-16.67MHz	RTU V3.0	cc (V4.0)	4504	4746 %
 * Intel 386/20	80386-12.5Mhz	PMON debugger	Intel C386v0.2	4534	4794 i1
 * Intel 386/20	80386-16Mhz	PMON debugger	Intel C386v0.2	5304	5607
 * Gould PN9080	custom ECL	UTX-32 1.1C	cc		5369	5676
 * Gould 1460-342 ECL proc      UTX/32 1.1/c    cc              5342    5677 G1
 * VAX-784	-		Mach/4.3	cc		5882	5882 &4
 * Intel 386/20	80386-16Mhz	PMON debugger	Intel C386v0.2	5801	6133 i1
 * VAX 8600	-		UNIX 4.3bsd	cc		7024	7088
 * VAX 8600	-		VMS		VAX-11 C 2.0	7142	7142
 * Alliant FX/8 CE		Concentrix	cc -ce;exec -c 	6952	7655 FX
 * CCI POWER 6/32		COS(SV+4.2)	cc		7500	7800
 * CCI POWER 6/32		POWER 6 UNIX/V	cc		8236	8498
 * CCI POWER 6/32		4.2 Rel. 1.2b	cc		8963	9544
 * Sperry (CCI Power 6)		4.2BSD		cc		9345   10000
 * CRAY-X-MP/12	   105Mhz	COS 1.14	Cray C         10204   10204
 * IBM-3083	-		UTS 5.0 Rel 1	cc	       16666   12500
 * CRAY-1A	    80Mhz	CTSS		Cray C 2.0     12100   13888
 * IBM-3083	-		VM/CMS HPO 3.4	Waterloo C 1.2 13889   13889
 * Amdahl 470 V/8 		UTS/V 5.2       cc v1.23       15560   15560
 * CRAY-X-MP/48	   105Mhz	CTSS		Cray C 2.0     15625   17857
 * Amdahl 580	-		UTS 5.0 Rel 1.2	cc v1.5        23076   23076
 * Amdahl 5860	 		UTS/V 5.2       cc v1.23       28970   28970
 *
 * NOTE
 *   *   Crystal changed from 'stock' to listed value.
 *   +   This Macintosh was upgraded from 128K to 512K in such a way that
 *       the new 384K of memory is not slowed down by video generator accesses.
 *   %   Single processor; MC == MASSCOMP
 *   NM  A version 7 C compiler written at New Mexico Tech.
 *   @   vanilla Lattice compiler used with MicroPro standard library
 *   S   Shorts used instead of ints
 *   T	 with Chris Torek's patches (whatever they are).
 *   ~   For WICAT Systems: MB=MultiBus, PB=Proprietary Bus
 *   LM  Large Memory Model. (Otherwise, all 80x8x results are small model)
 *   MM  Medium Memory Model. (Otherwise, all 80x8x results are small model)
 *   C1  Univation PC TURBO Co-processor; 9.54Mhz 8086, 640K RAM
 *   C2  Seattle Telecom STD-286 board
 *   C3  Definicon DSI-32 coprocessor
 *   C?  Unknown co-processor board?
 *   CT1 Convergent Technologies MegaFrame, 1 processor.
 *   MN  Using Mike Newtons 'optimizer' (see net.sources).
 *   G1  This Gould machine has 2 processors and was able to run 2 dhrystone
 *       Benchmarks in parallel with no slowdown.
 *   FH  FHC == Frank Hogg Labs (Hazelwood Uniquad 2 in an FHL box).
 *   FX  The Alliant FX/8 is a system consisting of 1-8 CEs (computation
 *	 engines) and 1-12 IPs (interactive processors). Note N8 applies.
 *   RT  This is one of the RT's that CMU has been using for awhile.  I'm
 *	 not sure that this is identical to the machine that IBM is selling
 *	 to the public.
 *   i1  Normally, the 386/20 starter kit has a 16k direct mapped cache
 *	 which inserts 2 or 3 wait states on a write thru.  These results
 *	 were obtained by disabling the write-thru, or essentially turning
 *	 the cache into 0 wait state memory.
 *   Nnn This machine has multiple processors, allowing "nn" copies of the
 *	 benchmark to run in the same time as 1 copy.
 *   &nn This machine has "nn" processors, and the benchmark results were
 *	 obtained by having all "nn" processors working on 1 copy of dhrystone.
 *	 (Note, this is different than Nnn. Salesmen like this measure).
 *   ?   I don't trust results marked with '?'.  These were sent to me with
 *       either incomplete info, or with times that just don't make sense.
 *	 ?? means I think the performance is too poor, ?! means too good.
 *       If anybody can confirm these figures, please respond.
 *
 *  ABBREVIATIONS
 *	CCC	Concurrent Computer Corp. (was Perkin-Elmer)
 *	MC	Masscomp
 *
 *--------------------------------RESULTS END----------------------------------
 *
 *	The following program contains statements of a high-level programming
 *	language (C) in a distribution considered representative:
 *
 *	assignments			53%
 *	control statements		32%
 *	procedure, function calls	15%
 *
 *	100 statements are dynamically executed.  The program is balanced with
 *	respect to the three aspects:
 *		- statement type
 *		- operand type (for simple data types)
 *		- operand access
 *			operand global, local, parameter, or constant.
 *
 *	The combination of these three aspects is balanced only approximately.
 *
 *	The program does not compute anything meaningfull, but it is
 *	syntactically and semantically correct.
 *
 */

/* Accuracy of timings and human fatigue controlled by next two lines */
/*#define LOOPS	5000		/* Use this for slow or 16 bit machines */
/*#define LOOPS	50000		/* Use this for slow or 16 bit machines */
/*#define LOOPS	500000		/* Use this for faster machines */
#define LOOPS	50000000	/* High value for i7 QuadCore MacBook Pro */

/* Compiler dependent options */
#undef	NOENUM			/* Define if compiler has no enum's */
#undef	NOSTRUCTASSIGN		/* Define if compiler can't assign structures */

/* define only one of the next three defines */
/*#define GETRUSAGE		/* Use getrusage(2) time function */
/*#define TIMES			/* Use times(2) time function */
#define TIME			/* Use time(2) time function */

/* define the granularity of your times(2) function (when used) */
#define HZ	    60		/* times(2) returns 1/60 second (most) */
/*#define HZ	100		/* times(2) returns 1/100 second (WECo) */

/* for compatibility with goofed up version */
/*#define GOOF			/* Define if you want the goofed up version */

#ifdef GOOF
char	Version[] = "1.0";
#else
char	Version[] = "1.1";
#endif

#ifdef	NOSTRUCTASSIGN
#define	structassign(d, s)	memcpy(&(d), &(s), sizeof(d))
#else
#define	structassign(d, s)	d = s
#endif

#ifdef	NOENUM
#define	Ident1	1
#define	Ident2	2
#define	Ident3	3
#define	Ident4	4
#define	Ident5	5
typedef int	Enumeration;
#else
typedef enum	{Ident1, Ident2, Ident3, Ident4, Ident5} Enumeration;
#endif

typedef int	OneToThirty;
typedef int	OneToFifty;
typedef char	CapitalLetter;
typedef char	String30[31];
typedef int	Array1Dim[51];
typedef int	Array2Dim[51][51];

struct	Record
{
	struct Record		*PtrComp;
	Enumeration		Discr;
	Enumeration		EnumComp;
	OneToFifty		IntComp;
	String30		StringComp;
};

typedef struct Record 	RecordType;
typedef RecordType *	RecordPtr;
typedef int		boolean;

#define	NULL		0
#define	TRUE		1
#define	FALSE		0

#ifndef REG
#define	REG
#endif

#ifdef TIMES
#include 
#include 
#include 
#endif
#ifdef GETRUSAGE
#include 
#include 
#endif

#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>
#include <string.h>


/* Function Prototype */
int Proc0(void);
int Proc1(REG RecordPtr PtrParIn);
int Proc2(OneToFifty *IntParIO);
int Proc3(REG RecordPtr *PtrParOut);
int Proc4(void);
int Proc5(void);
int Proc6(REG Enumeration EnumParIn, REG Enumeration *EnumParOut);
int Proc7(OneToFifty IntParI1, OneToFifty IntParI2, OneToFifty *IntParOut);
int Proc8(Array1Dim Array1Par, Array2Dim Array2Par, OneToFifty IntParI1, OneToFifty IntParI2);


/*
 * Package 1
 */
int		IntGlob;
boolean		BoolGlob;
char		Char1Glob;
char		Char2Glob;
Array1Dim	Array1Glob;
Array2Dim	Array2Glob;
RecordPtr	PtrGlb;
RecordPtr	PtrGlbNext;

Enumeration Func1(CharPar1, CharPar2)
CapitalLetter	CharPar1;
CapitalLetter	CharPar2;
{
	REG CapitalLetter	CharLoc1;
	REG CapitalLetter	CharLoc2;

	CharLoc1 = CharPar1;
	CharLoc2 = CharLoc1;
	if (CharLoc2 != CharPar2)
		return (Ident1);
	else
		return (Ident2);
}

boolean Func2(StrParI1, StrParI2)
String30	StrParI1;
String30	StrParI2;
{
	REG OneToThirty		IntLoc;
	REG CapitalLetter	CharLoc;

	IntLoc = 1;
	while (IntLoc <= 1)
		if (Func1(StrParI1[IntLoc], StrParI2[IntLoc+1]) == Ident1)
		{
			CharLoc = 'A';
			++IntLoc;
		}
	if (CharLoc >= 'W' && CharLoc <= 'Z')
		IntLoc = 7;
	if (CharLoc == 'X')
		return(TRUE);
	else
	{
		if (strcmp(StrParI1, StrParI2) > 0)
		{
			IntLoc += 7;
			return (TRUE);
		}
		else
			return (FALSE);
	}
}

boolean Func3(EnumParIn)
REG Enumeration	EnumParIn;
{
	REG Enumeration	EnumLoc;

	EnumLoc = EnumParIn;
	if (EnumLoc == Ident3) return (TRUE);
	return (FALSE);
}

extern Enumeration	Func1();
extern boolean		Func2();
extern boolean		Func3();

#ifdef	NOSTRUCTASSIGN
memcpy(d, s, l)
register char	*d;
register char	*s;
register int	l;
{
	while (l--) *d++ = *s++;
}
#endif
/* ---------- */


int Proc2(IntParIO)
OneToFifty	*IntParIO;
{
	REG OneToFifty		IntLoc;
	REG Enumeration		EnumLoc;

	IntLoc = *IntParIO + 10;
	for(;;)
	{
		if (Char1Glob == 'A')
		{
			--IntLoc;
			*IntParIO = IntLoc - IntGlob;
			EnumLoc = Ident1;
		}
		if (EnumLoc == Ident1)
			break;
	}
}

int Proc7(IntParI1, IntParI2, IntParOut)
OneToFifty	IntParI1;
OneToFifty	IntParI2;
OneToFifty	*IntParOut;
{
	REG OneToFifty	IntLoc;

	IntLoc = IntParI1 + 2;
	*IntParOut = IntParI2 + IntLoc;
}

int Proc3(REG RecordPtr *PtrParOut)
{
	if (PtrGlb != NULL)
		*PtrParOut = PtrGlb->PtrComp;
	else
		IntGlob = 100;
	Proc7(10, IntGlob, &PtrGlb->IntComp);
}

int Proc4()
{
	REG boolean	BoolLoc;

	BoolLoc = Char1Glob == 'A';
	BoolLoc |= BoolGlob;
	Char2Glob = 'B';
}

int Proc5()
{
	Char1Glob = 'A';
	BoolGlob = FALSE;
}

int Proc6(EnumParIn, EnumParOut)
REG Enumeration	EnumParIn;
REG Enumeration	*EnumParOut;
{
	*EnumParOut = EnumParIn;
	if (! Func3(EnumParIn) )
		*EnumParOut = Ident4;
	switch (EnumParIn)
	{
	case Ident1:	*EnumParOut = Ident1; break;
	case Ident2:	if (IntGlob > 100) *EnumParOut = Ident1;
			else *EnumParOut = Ident4;
			break;
	case Ident3:	*EnumParOut = Ident2; break;
	case Ident4:	break;
	case Ident5:	*EnumParOut = Ident3;
	}
}

int Proc8(Array1Par, Array2Par, IntParI1, IntParI2)
Array1Dim	Array1Par;
Array2Dim	Array2Par;
OneToFifty	IntParI1;
OneToFifty	IntParI2;
{
	REG OneToFifty	IntLoc;
	REG OneToFifty	IntIndex;

	IntLoc = IntParI1 + 5;
	Array1Par[IntLoc] = IntParI2;
	Array1Par[IntLoc+1] = Array1Par[IntLoc];
	Array1Par[IntLoc+30] = IntLoc;
	for (IntIndex = IntLoc; IntIndex <= (IntLoc+1); ++IntIndex)
		Array2Par[IntLoc][IntIndex] = IntLoc;
	++Array2Par[IntLoc][IntLoc-1];
	Array2Par[IntLoc+20][IntLoc] = Array1Par[IntLoc];
	IntGlob = 5;
}


int Proc1(PtrParIn)
REG RecordPtr	PtrParIn;
{
#define	NextRecord	(*(PtrParIn->PtrComp))

	structassign(NextRecord, *PtrGlb);
	PtrParIn->IntComp = 5;
	NextRecord.IntComp = PtrParIn->IntComp;
	NextRecord.PtrComp = PtrParIn->PtrComp;
	Proc3((RecordPtr *)NextRecord.PtrComp);
	if (NextRecord.Discr == Ident1)
	{
		NextRecord.IntComp = 6;
		Proc6(PtrParIn->EnumComp, &NextRecord.EnumComp);
		NextRecord.PtrComp = PtrGlb->PtrComp;
		Proc7(NextRecord.IntComp, 10, &NextRecord.IntComp);
	}
	else
		structassign(*PtrParIn, NextRecord);

#undef	NextRecord
}


int Proc0(void)
{
	OneToFifty		IntLoc1;
	REG OneToFifty		IntLoc2;
	OneToFifty		IntLoc3;
	REG char		CharLoc;
	REG char		CharIndex;
	Enumeration	 	EnumLoc;
	String30		String1Loc;
	String30		String2Loc;
	extern void		*malloc();

	register unsigned int	i;
#ifdef TIME
	long			time();
	long			starttime;
	long			benchtime;
	long			nulltime;

	starttime = time( (long *) 0);
	for (i = 0; i < LOOPS; ++i);
	nulltime = time( (long *) 0) - starttime; /* Computes o'head of loop */
#endif
#ifdef TIMES
	time_t			starttime;
	time_t			benchtime;
	time_t			nulltime;
	struct tms		tms;

	times(&tms); starttime = tms.tms_utime;
	for (i = 0; i < LOOPS; ++i);
	times(&tms);
	nulltime = tms.tms_utime - starttime; /* Computes overhead of looping */
#endif
#ifdef GETRUSAGE
	struct rusage starttime;
	struct rusage endtime;
	struct timeval nulltime;

	getrusage(RUSAGE_SELF, &starttime);
	for (i = 0; i < LOOPS; ++i);
	getrusage(RUSAGE_SELF, &endtime);
	nulltime.tv_sec  = endtime.ru_utime.tv_sec  - starttime.ru_utime.tv_sec;
	nulltime.tv_usec = endtime.ru_utime.tv_usec - starttime.ru_utime.tv_usec;
#endif

	PtrGlbNext = (RecordPtr) malloc(sizeof(RecordType));
	PtrGlb = (RecordPtr) malloc(sizeof(RecordType));
	PtrGlb->PtrComp = PtrGlbNext;
	PtrGlb->Discr = Ident1;
	PtrGlb->EnumComp = Ident3;
	PtrGlb->IntComp = 40;
	strcpy(PtrGlb->StringComp, "DHRYSTONE PROGRAM, SOME STRING");
#ifndef	GOOF
	strcpy(String1Loc, "DHRYSTONE PROGRAM, 1'ST STRING");	/*GOOF*/
#endif
	Array2Glob[8][7] = 10;	/* Was missing in published program */

/*****************
-- Start Timer --
*****************/
#ifdef TIME
	starttime = time( (long *) 0);
#endif
#ifdef TIMES
	times(&tms); starttime = tms.tms_utime;
#endif
#ifdef GETRUSAGE
	getrusage (RUSAGE_SELF, &starttime);
#endif
	for (i = 0; i < LOOPS; ++i)
	{

		Proc5();
		Proc4();
		IntLoc1 = 2;
		IntLoc2 = 3;
		strcpy(String2Loc, "DHRYSTONE PROGRAM, 2'ND STRING");
		EnumLoc = Ident2;
		BoolGlob = ! Func2(String1Loc, String2Loc);
		while (IntLoc1 < IntLoc2)
		{
			IntLoc3 = 5 * IntLoc1 - IntLoc2;
			Proc7(IntLoc1, IntLoc2, &IntLoc3);
			++IntLoc1;
		}
		Proc8(Array1Glob, Array2Glob, IntLoc1, IntLoc3);
		Proc1(PtrGlb);
		for (CharIndex = 'A'; CharIndex <= Char2Glob; ++CharIndex)
			if (EnumLoc == Func1(CharIndex, 'C'))
				Proc6(Ident1, &EnumLoc);
		IntLoc3 = IntLoc2 * IntLoc1;
		IntLoc2 = IntLoc3 / IntLoc1;
		IntLoc2 = 7 * (IntLoc3 - IntLoc2) - IntLoc1;
		Proc2(&IntLoc1);
	}

/*****************
-- Stop Timer --
*****************/

#ifdef TIME
	benchtime = time( (long *) 0) - starttime - nulltime;
	printf("Dhrystone(%s) time for %ld passes = %ld\n",
		Version,
		(long) LOOPS, benchtime);
	
	if(benchtime != 0)
	{
		printf("This machine benchmarks at %ld dhrystones/second\n",
			   ((long) LOOPS) / benchtime);
	}
	else
	{
		printf("This machine is too fast! Increase LOOPS value.\n");
	}
	
#endif
#ifdef TIMES
	times(&tms);
	benchtime = tms.tms_utime - starttime - nulltime;
	printf("Dhrystone(%s) time for %ld passes = %ld\n",
		Version,
		(long) LOOPS, benchtime/HZ);
	printf("This machine benchmarks at %ld dhrystones/second\n",
		((long) LOOPS) * HZ / benchtime);
#endif
#ifdef GETRUSAGE
	getrusage(RUSAGE_SELF, &endtime);
	{
	    double t = (double)(endtime.ru_utime.tv_sec
				- starttime.ru_utime.tv_sec
				- nulltime.tv_sec)
		     + (double)(endtime.ru_utime.tv_usec
				- starttime.ru_utime.tv_usec
				- nulltime.tv_usec) * 1e-6;
	    printf("Dhrystone(%s) time for %ld passes = %.1f\n",
		   Version,
		   (long)LOOPS,
		   t);
	    printf("This machine benchmarks at %.0f dhrystones/second\n",
		   (double)LOOPS / t);
	}
#endif
}

/* main function */
int main()
{
	Proc0();
	exit(0);
}

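First, I added the #include lines so that the library functions the program calls are declared.
I also included malloc.h for the dynamic memory allocation, although standard C declares malloc() in stdlib.h, so malloc.h is probably redundant here.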

include

#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>
#include <string.h>

Added function prototype declarations.

function prototype

/* Function Prototype */
int Proc0(void);
int Proc1(REG RecordPtr PtrParIn);
int Proc2(OneToFifty *IntParIO);
int Proc3(REG RecordPtr *PtrParOut);
int Proc4(void);
int Proc5(void);
int Proc6(REG Enumeration EnumParIn, REG Enumeration *EnumParOut);
int Proc7(OneToFifty IntParI1, OneToFifty IntParI2, OneToFifty *IntParOut);
int Proc8(Array1Dim Array1Par, Array2Dim Array2Par, OneToFifty IntParI1, OneToFifty IntParI2);

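I moved main() to the end and reordered the functions so that each one is defined before it is called. I'm not sure how much that really matters.
The benchmark procedures run from Proc0() to Proc8(), nine in all.
Six of them call no other Proc function: Proc2(), Proc4(), Proc5(), Proc6(), Proc7(), and Proc8().

Proc1() calls Proc3(), Proc6(), and Proc7();
Proc3() calls Proc7();
Proc0() calls Proc5(), Proc4(), Proc7(), Proc8(), Proc1(), Proc6(), and Proc2().
Finally, main() calls Proc0(), the controlling function that does what main() would normally do, so main() goes at the very end.
As I recall, defining callees before their callers makes it easier for the compiler to know each function it is about to call, but unless you are using a one-pass compiler such as Turbo C, this should get sorted out in a later pass anyway. In C, what actually matters is that a declaration is visible before the call, so once the prototypes are in place the order of the definitions no longer affects correctness.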

Proc1()

Proc1(PtrParIn)
REG RecordPtr	PtrParIn;
{
#define	NextRecord	(*(PtrParIn->PtrComp))

	structassign(NextRecord, *PtrGlb);
	PtrParIn->IntComp = 5;
	NextRecord.IntComp = PtrParIn->IntComp;
	NextRecord.PtrComp = PtrParIn->PtrComp;
-	Proc3(NextRecord.PtrComp);
+	Proc3((RecordPtr *)NextRecord.PtrComp);
	if (NextRecord.Discr == Ident1)
	{
		NextRecord.IntComp = 6;
		Proc6(PtrParIn->EnumComp, &NextRecord.EnumComp);
		NextRecord.PtrComp = PtrGlb->PtrComp;
		Proc7(NextRecord.IntComp, 10, &NextRecord.IntComp);
	}
	else
		structassign(*PtrParIn, NextRecord);

#undef	NextRecord
}

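The call to Proc3() produced a warning, so I silenced it with a cast.
I'm not entirely sure this is correct, but the warning is gone. The cast does not change the value that is passed; it only tells the compiler to treat the record pointer as a RecordPtr *, the type Proc3() is declared to take.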

test run

benchmania@testmachine:~$ gcc -O -DREG=register -o dhrystone1 ./dhrystone1.c
benchmania@testmachine:~$ ./dhrystone1
Dhrystone(1.1) time for 50000000 passes = 3
This machine benchmarks at 16666666 dhrystones/second

benchmania@testmachine:~$ gcc -O -o dhrystone1 ./dhrystone1.c
benchmania@testmachine:~$ ./dhrystone1
Dhrystone(1.1) time for 50000000 passes = 2
This machine benchmarks at 25000000 dhrystones/second

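The loop now runs so fast that it's hard to tell whether register allocation makes any difference at all. Timing with time(), which only has one-second resolution, is probably asking too much in the first place: with readings of 2 and 3 seconds, a single tick shifts the score by tens of percent.
Besides time(), the source code also implements timing with times() and with getrusage().
The most precise of the three appears to be getrusage(), which reports resource usage.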
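To show what getrusage() offers over time(), here is a minimal sketch that reads the user CPU time of the current process with microsecond fields. It is not the code in dry.c; `user_cpu_seconds` and `time_busy_loop` are hypothetical helpers, and the busy loop is only a stand-in workload:

```c
#include <sys/time.h>
#include <sys/resource.h>

/* User-mode CPU time consumed so far by this process, in seconds.
 * Returns -1.0 if getrusage() fails. */
double user_cpu_seconds(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1.0;
    return (double)ru.ru_utime.tv_sec
         + (double)ru.ru_utime.tv_usec / 1e6;
}

/* Time a stand-in workload of 'loops' iterations and return the
 * user CPU seconds it consumed. */
double time_busy_loop(unsigned long loops)
{
    volatile unsigned long sink = 0;   /* volatile keeps the loop alive */
    double start = user_cpu_seconds();
    unsigned long i;

    for (i = 0; i < loops; i++)
        sink += i;
    return user_cpu_seconds() - start;
}
```

Dividing the loop count by the returned value gives a rate with fractional seconds in the denominator, which avoids the whole-second rounding seen with time().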

getrusage()

/* define only one of the next three defines */
- /*#define GETRUSAGE		/* Use getrusage(2) time function */
+ #define GETRUSAGE		/* Use getrusage(2) time function */
/*#define TIMES			/* Use times(2) time function */
/*#define TIME			/* Use time(2) time function */

In the original source, the #include lines needed for the timing code were blank, so I filled in the appropriate header files.

include

#ifdef TIMES
#include <sys/times.h>
// #include 
// #include 
#endif
#ifdef GETRUSAGE
#include <time.h> 
#include <sys/resource.h>
#endif

I also increased the loop count tenfold.

LOOP COUNT

/* Accuracy of timings and human fatigue controlled by next two lines */
/*#define LOOPS	5000		/* Use this for slow or 16 bit machines */
/*#define LOOPS	50000		/* Use this for slow or 16 bit machines */
/*#define LOOPS	500000		/* Use this for faster machines */
- #define LOOPS	50000000	/* High value for i7 QuadCore MacBook Pro */
+ /*#define LOOPS	50000000	/* High value for i7 QuadCore MacBook Pro */
+ #define LOOPS	500000000	/* High value for Intel Core2 1.8GHz */

With the changes in place, I ran the benchmark again.

test run-2

benchmania@testmachine:~$ gcc -O -o dhrystone1 ./dhrystone1.c
benchmania@testmachine:~$ ./dhrystone1
Dhrystone(1.1) time for 500000000 passes = 29.6
This machine benchmarks at 16875815 dhrystones/second
benchmania@testmachine:~$ ./dhrystone1
Dhrystone(1.1) time for 500000000 passes = 27.8
This machine benchmarks at 17979742 dhrystones/second
benchmania@testmachine:~$ ./dhrystone1
Dhrystone(1.1) time for 500000000 passes = 101.5
This machine benchmarks at 4923727 dhrystones/second
benchmania@testmachine:~$ ./dhrystone1
Dhrystone(1.1) time for 500000000 passes = 27.7
This machine benchmarks at 18027599 dhrystones/second

benchmania@testmachine:~$ gcc -O -DREG=register -o dhrystone1 ./dhrystone1.c
benchmania@testmachine:~$ ./dhrystone1 
Dhrystone(1.1) time for 500000000 passes = 28.0
This machine benchmarks at 17845225 dhrystones/second
benchmania@testmachine:~$ ./dhrystone1 
Dhrystone(1.1) time for 500000000 passes = 29.9
This machine benchmarks at 16739388 dhrystones/second
benchmania@testmachine:~$ ./dhrystone1 
Dhrystone(1.1) time for 500000000 passes = 29.1
This machine benchmarks at 17189801 dhrystones/second
benchmania@testmachine:~$

It's hard to say whether it got faster or not. I also don't know why an extremely slow result shows up once in a while.
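One common way to keep a single outlier such as the 101.5-second run from dominating the picture is to report the median of several runs. A minimal sketch, assuming the timings are collected by hand into an array; `median_seconds` and `dhrystones_per_second` are hypothetical helpers, not part of dry.c:

```c
#include <stdlib.h>

/* Comparison callback for qsort(): ascending order of doubles. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Median of n timings (n must be at least 1).
 * Note: sorts the caller's array in place. */
double median_seconds(double *t, size_t n)
{
    qsort(t, n, sizeof t[0], cmp_double);
    return (n % 2) ? t[n / 2] : (t[n / 2 - 1] + t[n / 2]) / 2.0;
}

/* Dhrystones per second for a given loop count and elapsed time. */
double dhrystones_per_second(double loops, double seconds)
{
    return loops / seconds;
}
```

Feeding in the four timings from the first build in test run-2 (29.6, 27.8, 101.5, 27.7) gives a median of about 28.7 seconds, so the slow run no longer affects the reported figure.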
Trying different optimization options made a fairly large difference.

test run-3 (Ryzen 5 3600X 3.8GHz)

benchmania@testmachine:~$ gcc -O -o dhrystone1 ./dhrystone1.c
benchmania@testmachine:~$ ./dhrystone1
Dhrystone(1.1) time for 500000000 passes = 10.8
This machine benchmarks at 46376812 dhrystones/second
benchmania@testmachine:~$ gcc ./dhrystone1.c -Wall -Wextra -O2 -march=native -std=gnu11 -o dhrystone1
benchmania@testmachine:~$ ./dhrystone1
Dhrystone(1.1) time for 500000000 passes = 9.6
This machine benchmarks at 51948052 dhrystones/second

benchmania@testmachine:~$ gcc -O -DREG=register -o dhrystone1r ./dhrystone1.c
benchmania@testmachine:~$ ./dhrystone1r
Dhrystone(1.1) time for 500000000 passes = 10.6
This machine benchmarks at 47128130 dhrystones/second
benchmania@testmachine:~$ gcc ./dhrystone1.c -DREG=register -Wall -Wextra -O2 -march=native -std=gnu11 -o dhrystone1r
benchmania@testmachine:~$ ./dhrystone1r
Dhrystone(1.1) time for 500000000 passes = 9.7
This machine benchmarks at 51364366 dhrystones/second
benchmania@testmachine:~$

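I also ran it on a Raspberry Pi.
-O3 optimization is hugely effective, but the option for vectorizing with ARM NEON (asimd) instructions, which I tried afterwards on a whim after reading about it in "NEON命令はどこいった？", showed no visible effect. That said, I've heard the compiler starts using vector instructions aggressively from around -O2, so it may already be using them without being told to. When I have time I'd like to check the assembler output.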
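One cheap check before reading the assembler output is to ask the compiler whether it considers NEON available under a given set of flags. The sketch below relies on the `__ARM_NEON` macro from the Arm C Language Extensions (older compilers spell it `__ARM_NEON__`); `neon_status` is a hypothetical helper, not part of the benchmark:

```c
/* Reports whether the compiler was targeting NEON (Advanced SIMD)
 * when this translation unit was built. The answer is fixed at
 * compile time by the -march/-mfpu options in effect. */
const char *neon_status(void)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    return "NEON enabled";
#else
    return "NEON not enabled";
#endif
}
```

Building this once with and once without `-march=armv8-a+simd` and printing the result shows whether the extra option changes the target at all. It says nothing about whether the optimizer actually emitted vector instructions for the Dhrystone loop.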

test run-4 (Raspberry Pi 400)

benchmania@testmachine:~/src/dhrystone$ ./dhrystone
Dhrystone(1.1) time for 500000000 passes = 124.8
This machine benchmarks at 4006062 dhrystones/second
benchmania@testmachine:~/src/dhrystone$ 

benchmania@testmachine:~/src/dhrystone $ make
gcc -O3 -Wall -I/usr/local/include -o dhrystone dhrystone1.c
benchmania@testmachine:~/src/dhrystone $ ./dhrystone 
Dhrystone(1.1) time for 500000000 passes = 22.2
This machine benchmarks at 22498243 dhrystones/second
benchmania@testmachine:~/src/dhrystone $ make clean
rm -f *.o *~ dhrystone
benchmania@testmachine:~/src/dhrystone $ make
gcc -O3 -Wall -march=armv8-a+simd -I/usr/local/include -o dhrystone dhrystone1.c
benchmania@testmachine:~/src/dhrystone $ ./dhrystone 
Dhrystone(1.1) time for 500000000 passes = 22.2
This machine benchmarks at 22526614 dhrystones/second
benchmania@testmachine:~/src/dhrystone $

test run-5 (Raspberry Pi4 8G)

benchmania@testmachine:~/src/dhrystone $ gcc ./dhrystone1.c -o ./dhrystone1a
benchmania@testmachine:~/src/dhrystone $ ls
dhrystone1.c  dhrystone1.c~  dhrystone1.s  dhrystone1a  dry.c
benchmania@testmachine:~/src/dhrystone $ gcc ./dhrystone1.c -O3 -Wall -march=armv8-a+simd -o ./dhrystone1b
benchmania@testmachine:~/src/dhrystone $ ls
dhrystone1.c  dhrystone1.c~  dhrystone1.s  dhrystone1a  dhrystone1b  dry.c
benchmania@testmachine:~/src/dhrystone $ ./dhrystone1a 
Dhrystone(1.1) time for 500000000 passes = 124.9
This machine benchmarks at 4004525 dhrystones/second
benchmania@testmachine:~/src/dhrystone $ ./dhrystone1b
Dhrystone(1.1) time for 500000000 passes = 22.0
This machine benchmarks at 22687387 dhrystones/second
benchmania@testmachine:~/src/dhrystone $

test run-6 (First Raspberry Pi 256M)

benchmania@testmachine:~/src/dhrystone $ gcc -O2 -Wall -I/usr/local/include dhrystone1.c -o dhrystone1
benchmania@testmachine:~/src/dhrystone $ ./dhrystone1
Dhrystone(1.1) time for 500000000 passes = 243.4
This machine benchmarks at 2054281 dhrystones/second
benchmania@testmachine:~/src/dhrystone $

test run-7 (Core2 T5600 1.83GHz)

benchmania@testmachine:~/src/dhrystone $ gcc -O3 -march=core2 dhrystone1.c -o dhrystone1
benchmania@testmachine:~/src/dhrystone $ ./dhrystone1
Dhrystone(1.1) time for 500000000 passes = 18.9
This machine benchmarks at 26476746 dhrystones/second
benchmania@testmachine:~/src/dhrystone $
