This article is reposted from a WeChat official account; the original is at:
https://mp.weixin.qq.com/s/vNeecRHxSQhIglUSFFT_Pw
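Preface
Performance is a composite experience and is only meaningful for a specific application scenario. There are many benchmark methods aimed at different aspects of a system; this article tries to cover as many of them as possible.
All programs below are compiled on the board itself, after logging in over the serial console or SSH.
CoreMark: the most commonly used CPU benchmark program.
Preparation: download the source files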
wget https://github.com/eembc/coremark/archive/refs/heads/main.zip
Extract:
unzip main.zip
Enter the source directory:
cd coremark-main/
Edit the file:
vi posix/core_portme.h
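Press i to enter insert mode.
Modify the following according to the compiler options actually used:
#define COMPILER_FLAGS \
    FLAGS_STR /* "Please put compiler flags here (e.g. -o3)" */
#endif
Change it to:
#define COMPILER_FLAGS \
    "-O3" /* "Please put compiler flags here (e.g. -o3)" */
#endif
Press Esc.
Type :wq and press Enter to save the changes.
Compile the single-threaded version: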
gcc -O3 -o coremark.1 core_list_join.c core_main.c core_matrix.c core_state.c core_util.c posix/core_portme.c -DPERFORMANCE_RUN=1 -DITERATIONS=1000 -I. -Iposix
Multi-threaded version:
gcc -O3 -o coremark.4 core_list_join.c core_main.c core_matrix.c core_state.c core_util.c posix/core_portme.c -DMULTITHREAD=4 -DUSE_FORK -DPERFORMANCE_RUN=1 -DITERATIONS=1000 -I. -Iposix
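The two builds differ only in the output name and in the -DMULTITHREAD / -DUSE_FORK options, so the command line can be generated from the number of worker processes. The following is a minimal sketch: the function name coremark_build_cmd is hypothetical, the flags are the ones used above, and the function only prints the command so that it can be inspected before it is run on the board.

```shell
#!/bin/sh
# coremark_build_cmd N: print the gcc command for an N-process CoreMark build.
# The flags are copied from this article; the function name is made up.
coremark_build_cmd() {
    n="$1"
    srcs="core_list_join.c core_main.c core_matrix.c core_state.c core_util.c posix/core_portme.c"
    common="-DPERFORMANCE_RUN=1 -DITERATIONS=1000 -I. -Iposix"
    if [ "$n" -gt 1 ]; then
        # More than one worker: enable the fork-based parallel mode.
        echo "gcc -O3 -o coremark.$n $srcs -DMULTITHREAD=$n -DUSE_FORK $common"
    else
        echo "gcc -O3 -o coremark.1 $srcs $common"
    fi
}

coremark_build_cmd 1
coremark_build_cmd 4
```

On the board, the printed command could be piped to sh from inside coremark-main/, for example coremark_build_cmd "$(nproc)" | sh to build one worker per core.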
Single-core run:
root@IMX8-Tronlong:~/coremark-main# ./coremark.1
2K performance run parameters for coremark.
CoreMark Size    : 666
Total ticks      : 20857
Total time (secs): 20.857000
Iterations/Sec   : 5274.008726
Iterations       : 110000
Compiler version : GCC9.2.0
Compiler flags   : -O3
Memory location  : Please put data memory location here (e.g. code in flash, data on heap etc)
seedcrc          : 0xe9f5
[0]crclist       : 0xe714
[0]crcmatrix     : 0x1fd7
[0]crcstate      : 0x8e3a
[0]crcfinal      : 0x33ff
Correct operation validated. See README.md for run and reporting rules.
CoreMark 1.0 : 5274.008726 / GCC9.2.0 -O3 / Heap
root@IMX8-Tronlong:~/coremark-main#
Four-core run:
root@IMX8-Tronlong:~/coremark-main# ./coremark.4
2K performance run parameters for coremark.
CoreMark Size    : 666
Total ticks      : 20923
Total time (secs): 20.923000
Iterations/Sec   : 21029.489079
Iterations       : 440000
Compiler version : GCC9.2.0
Compiler flags   : -O3
Parallel Fork    : 4
Memory location  : Please put data memory location here (e.g. code in flash, data on heap etc)
seedcrc          : 0xe9f5
[0]crclist       : 0xe714
[1]crclist       : 0xe714
[2]crclist       : 0xe714
[3]crclist       : 0xe714
[0]crcmatrix     : 0x1fd7
[1]crcmatrix     : 0x1fd7
[2]crcmatrix     : 0x1fd7
[3]crcmatrix     : 0x1fd7
[0]crcstate      : 0x8e3a
[1]crcstate      : 0x8e3a
[2]crcstate      : 0x8e3a
[3]crcstate      : 0x8e3a
[0]crcfinal      : 0x33ff
[1]crcfinal      : 0x33ff
[2]crcfinal      : 0x33ff
[3]crcfinal      : 0x33ff
Correct operation validated. See README.md for run and reporting rules.
CoreMark 1.0 : 21029.489079 / GCC9.2.0 -O3 / Heap / 4:Fork
root@IMX8-Tronlong:~/coremark-main#
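Open the score web page.
Searching for i.MX finds two records.
Our board runs at 1.6 GHz and scores 21029.489079. That is slightly higher than the record above running at 1.5 GHz and slightly lower than the one running at 1.8 GHz, so the score falls within the expected range.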
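To compare against records measured at other clock speeds, it helps to normalise the score. The sketch below uses only numbers reported above (the two Iterations/Sec values and the 1.6 GHz clock); the helper name ratio is hypothetical, and awk is assumed to be available on the board.

```shell
#!/bin/sh
# ratio A B: print A/B with two decimals (awk does the floating-point math).
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

single=5274.008726    # Iterations/Sec reported by ./coremark.1
quad=21029.489079     # Iterations/Sec reported by ./coremark.4
mhz=1600              # CPU clock stated in the article (1.6 GHz)

echo "speed-up with 4 processes: $(ratio "$quad" "$single")x"
echo "CoreMark/MHz, one core:    $(ratio "$single" "$mhz")"
```

With these inputs the speed-up comes out at about 3.99, so the four forked workers scale almost linearly, and the single-core figure is about 3.30 CoreMark/MHz.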
Dhrystone1: a benchmark of CPU integer performance, version 1.
Preparation: download the source code
wget http://www.roylongbottom.org.uk/classic_benchmarks.tar.gz
Extract:
tar -xvf classic_benchmarks.tar.gz
Enter the source directory:
cd classic_benchmarks/source_code/dhrystone1
Modify the code:
In dhry1.c, add the return type int to main.
In common_64bit/cpuidh.h, add:
int getDetails();
void start_time();
void end_time();
void local_time();
In common_64bit/cpuidc64.c, add:
#include <unistd.h>
The following two functions need an assembly implementation. It is not provided here, so comment the calls out:
//_cpuida();
//_calculateMHz();
Compile:
gcc -O3 -o dhry1 dhry1.c ../common_64bit/cpuidc64.c -I../common_64bit
root@IMX8-Tronlong:~/classic_benchmarks/source_code/dhrystone1# ./dhry1
####################################################
getDetails and MHz
Assembler CPUID and RDTSC
CPU , Features Code 00000000, Model Code 00000000
Measured - Minimum 0 MHz, Maximum 0 MHz
Linux Functions
get_nprocs() - CPUs 4, Configured CPUs 4
get_phys_pages() and size - RAM Size 0.92 GB, Page Size 4096 Bytes
uname() - Linux, IMX8-Tronlong, 5.4.70-g9f85d43
#60 SMP PREEMPT Tue Jul 26 19:25:28 CST 2022, aarch64
##########################################
Dhrystone Benchmark, Version 1.1 (Language: C or C++)
Optimisation Opt 3 64 Bit
   10000 runs 0.00 seconds
  100000 runs 0.01 seconds
 1000000 runs 0.07 seconds
 2000000 runs 0.15 seconds
 4000000 runs 0.29 seconds
 8000000 runs 0.58 seconds
16000000 runs 1.17 seconds
32000000 runs 2.33 seconds
Array2Glob8/7: O.K. 32000010
Microseconds for one run through Dhrystone: 0.07
Dhrystones per Second: 13720097
VAX MIPS rating = 7808.82
Press Enter
Score: 7808.82 VAX MIPS, about the same as the desktop-class Core 2 Duo 1 CP entry, so integer performance is very strong.
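The VAX MIPS rating follows from the Dhrystones-per-second figure: it is that figure divided by 1757, the score of the VAX 11/780 reference machine. A small sketch of the arithmetic, with hypothetical helper names and the 1.6 GHz clock taken from the CoreMark section:

```shell
#!/bin/sh
# dmips DPS: convert Dhrystones per second to VAX MIPS (DMIPS).
# 1757 Dhrystones/s is the VAX 11/780 reference score.
dmips() {
    awk -v dps="$1" 'BEGIN { printf "%.2f\n", dps / 1757 }'
}

# dmips_per_mhz DPS MHZ: the same rating normalised by the clock frequency.
dmips_per_mhz() {
    awk -v dps="$1" -v mhz="$2" 'BEGIN { printf "%.2f\n", dps / 1757 / mhz }'
}

dmips 13720097              # Dhrystone 1 result from the log above
dmips_per_mhz 13720097 1600 # assumes the 1.6 GHz clock stated earlier
```

The first call reproduces the 7808.82 rating printed by the benchmark; the second gives about 4.88 DMIPS/MHz for this -O3 build of Dhrystone version 1.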
Dhrystone2: a benchmark of CPU integer performance, version 2.
Preparation: same as for Dhrystone1.
Compile:
cd classic_benchmarks/source_code/dhrystone2
gcc -O3 -o dhry2 dhry_1.c dhry_2.c ../common_64bit/cpuidc64.c -I../common_64bit
root@IMX8-Tronlong:~/classic_benchmarks/source_code/dhrystone2# ./dhry2
####################################################
getDetails and MHz
Assembler CPUID and RDTSC
CPU , Features Code 00000000, Model Code 00000000
Measured - Minimum 0 MHz, Maximum 0 MHz
Linux Functions
get_nprocs() - CPUs 4, Configured CPUs 4
get_phys_pages() and size - RAM Size 0.92 GB, Page Size 4096 Bytes
uname() - Linux, IMX8-Tronlong, 5.4.70-g9f85d43
#60 SMP PREEMPT Tue Jul 26 19:25:28 CST 2022, aarch64
##########################################
Dhrystone Benchmark, Version 2.1 (Language: C or C++)
Optimisation Opt 3 64 Bit
Register option not selected
   10000 runs 0.00 seconds
  100000 runs 0.02 seconds
 1000000 runs 0.13 seconds
 2000000 runs 0.25 seconds
 4000000 runs 0.50 seconds
 8000000 runs 1.01 seconds
16000000 runs 2.01 seconds
Final values (* implementation-dependent):
Int_Glob: O.K. 5        Bool_Glob: O.K. 1
Ch_1_Glob: O.K. A       Ch_2_Glob: O.K. B
Arr_1_Glob[8]: O.K. 7   Arr_2_Glob8/7: O.K. 16000010
Ptr_Glob-> Ptr_Comp: * -46168944
  Discr: O.K. 0         Enum_Comp: O.K. 2
  Int_Comp: O.K. 17     Str_Comp: O.K. DHRYSTONE PROGRAM, SOME STRING
Next_Ptr_Glob-> Ptr_Comp: * -46168944 same as above
  Discr: O.K. 0         Enum_Comp: O.K. 1
  Int_Comp: O.K. 18     Str_Comp: O.K. DHRYSTONE PROGRAM, SOME STRING
Int_1_Loc: O.K. 5       Int_2_Loc: O.K. 13
Int_3_Loc: O.K. 7       Enum_Loc: O.K. 1
Str_1_Loc: O.K. DHRYSTONE PROGRAM, 1'ST STRING
Str_2_Loc: O.K. DHRYSTONE PROGRAM, 2'ND STRING
Microseconds for one run through Dhrystone: 0.13
Dhrystones per Second: 7943670
VAX MIPS rating = 4521.16
Press Enter
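Score: 4521.16 VAX MIPS, about the same as a desktop-class Athlon 64.
Linpack: a floating-point performance benchmark.
Preparation: same as for Dhrystone1.
Compile:
cd classic_benchmarks/source_code/linpack
gcc -O3 -o linpack linpack.c ../common_64bit/cpuidc64.c -I../common_64bit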
root@IMX8-Tronlong:~/classic_benchmarks/source_code/linpack# ./linpack
####################################################
getDetails and MHz
Assembler CPUID and RDTSC
CPU , Features Code 00000000, Model Code 00000000
Measured - Minimum 0 MHz, Maximum 0 MHz
Linux Functions
get_nprocs() - CPUs 4, Configured CPUs 4
get_phys_pages() and size - RAM Size 0.92 GB, Page Size 4096 Bytes
uname() - Linux, IMX8-Tronlong, 5.4.70-g9f85d43
#60 SMP PREEMPT Tue Jul 26 19:25:28 CST 2022, aarch64
##########################################
Unrolled Double Precision Linpack Benchmark - PC Version in 'C/C++'
Optimisation Opt 3 64 Bit
norm resid   resid            machep           x[0]-1            x[n-1]-1
1.9          8.46778499e-14   2.22044605e-16   -1.11799459e-13   -9.60342916e-14
Times are reported for matrices of order 100
1 pass times for array with leading dimension of 201
dgefa     dgesl     total     Mflops   unit     ratio
0.00240   0.00008   0.00247   277.57   0.0072   0.0442
Calculating matgen overhead
  10 times 0.00 seconds
 100 times 0.02 seconds
1000 times 0.17 seconds
2000 times 0.34 seconds
4000 times 0.68 seconds
8000 times 1.35 seconds
Overhead for 1 matgen 0.00017 seconds
Calculating matgen/dgefa passes for 1 seconds
 10 times 0.02 seconds
100 times 0.16 seconds
200 times 0.32 seconds
400 times 0.64 seconds
800 times 1.29 seconds
Passes used 621
Times for array with leading dimension of 201
dgefa     dgesl     total     Mflops   unit     ratio
0.00144   0.00005   0.00149   462.06   0.0043   0.0265
0.00144   0.00005   0.00149   462.11   0.0043   0.0265
0.00144   0.00005   0.00149   462.12   0.0043   0.0265
0.00144   0.00005   0.00149   461.95   0.0043   0.0265
0.00144   0.00005   0.00149   462.05   0.0043   0.0265
Average 462.06
Calculating matgen2 overhead
Overhead for 1 matgen 0.00017 seconds
Times for array with leading dimension of 200
dgefa     dgesl     total     Mflops   unit     ratio
0.00137   0.00005   0.00141   485.85   0.0041   0.0252
0.00137   0.00005   0.00141   485.85   0.0041   0.0252
0.00137   0.00005   0.00141   485.85   0.0041   0.0252
0.00137   0.00005   0.00141   485.78   0.0041   0.0252
0.00137   0.00005   0.00141   485.77   0.0041   0.0252
Average 485.82
Unrolled Double Precision 462.06 Mflops
Press Enter
Score: 462.06 Mflops, about the same as a Pentium 4.
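The Mflops column can be cross-checked by hand. Linpack counts 2n^3/3 + 2n^2 floating-point operations for factoring and solving a system of order n, and divides that by the total time. The sketch below assumes that operation count (the standard one for this benchmark) and uses the rounded total time printed in the log; the function name linpack_mflops is hypothetical.

```shell
#!/bin/sh
# linpack_mflops N SECONDS: Mflops for solving one order-N system in SECONDS.
linpack_mflops() {
    awk -v n="$1" -v t="$2" 'BEGIN {
        ops = 2 * n * n * n / 3 + 2 * n * n   # dgefa (factor) + dgesl (solve)
        printf "%.2f\n", ops / t / 1e6
    }'
}

linpack_mflops 100 0.00149   # order 100, total time from the log
```

This gives roughly 460.85 rather than the reported 462.06, because the log prints the total time rounded to five decimals while the program divides by the unrounded value.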
Whetstone: a floating-point performance benchmark.
Preparation: same as for Dhrystone1.
Compile:
cd classic_benchmarks/source_code/whetstone
gcc -O3 -o whets whets.c ../common_64bit/cpuidc64.c -I../common_64bit -lm
root@IMX8-Tronlong:~/classic_benchmarks/source_code/whetstone# ./whets
####################################################
getDetails and MHz
Assembler CPUID and RDTSC
CPU , Features Code 00000000, Model Code 00000000
Measured - Minimum 0 MHz, Maximum 0 MHz
Linux Functions
get_nprocs() - CPUs 4, Configured CPUs 4
get_phys_pages() and size - RAM Size 0.92 GB, Page Size 4096 Bytes
uname() - Linux, IMX8-Tronlong, 5.4.70-g9f85d43
#60 SMP PREEMPT Tue Jul 26 19:25:28 CST 2022, aarch64
##########################################
Single Precision C Whetstone Benchmark Opt 3 64 Bit, Fri Jul 8 18:27:13 2022
Calibrate
0.01 Seconds   1 Passes (x 100)
0.03 Seconds   5 Passes (x 100)
0.13 Seconds  25 Passes (x 100)
0.65 Seconds 125 Passes (x 100)
3.23 Seconds 625 Passes (x 100)
Use 1932 passes (x 100)
Single Precision C/C++ Whetstone Benchmark
Loop content        Result                 MFLOPS     MOPS       Seconds
N1 floating point   -1.12475013732910156   448.191               0.083
N2 floating point   -1.12274742126464844   471.630               0.551
N3 if then else      1.00000000000000000                 0.000   0.000
N4 fixed point      12.00000000000000000              1995.731   0.305
N5 sin,cos etc.      0.49911010265350342                38.543   4.171
N6 floating point    0.99999982118606567   383.218               2.719
N7 assignments       3.00000000000000000              1595.638   0.224
N8 exp,sqrt etc.     0.75110864639282227                36.918   1.947
MWIPS                                                 1932.248   9.999
A new results file, whets.txt, will have been created in the same directory as the .EXE files, if one did not already exist.
Press Enter
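Score: 1932.248 MWIPS, about the same as a Core i5 2467M.
Livermore Loops: a numerical benchmark originally designed for parallel computers.
Preparation: same as for Dhrystone1.
Add the return type int to the main function.
Add #include <string.h>
Compile:
cd classic_benchmarks/source_code/livermore_loops
gcc -O0 -o lloops lloops.c ../common_64bit/cpuidc64.c -I../common_64bit -lm
Note that optimization must be turned off here (-O0). Otherwise, in the following loop in the init function:
for ( k=0 ; k<19977 + 34132 ; k++)
{
    if (k == 19977)
    {
        fuzz = (double)(0.0012345);
        buzz = (double) (1.0) + fuzz;
        fizz = (double) (1.1) * fuzz;
    }
    buzz = (one - fuzz) * buzz + fuzz;
    fuzz = - fuzz;
    u[k] = (buzz - fizz) * scaled;
}
the index k runs out of bounds and causes a fault.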
Run:
root@IMX8-Tronlong:~/classic_benchmarks/source_code/livermore_loops# ./lloops
L.L.N.L. 'C' KERNELS: MFLOPS P.C. VERSION 4.0
Optimisation Opt 3 64 Bit
####################################################
getDetails and MHz
Assembler CPUID and RDTSC
CPU , Features Code 00000000, Model Code 00000000
Measured - Minimum 0 MHz, Maximum 0 MHz
Linux Functions
get_nprocs() - CPUs 4, Configured CPUs 4
get_phys_pages() and size - RAM Size 0.92 GB, Page Size 4096 Bytes
uname() - Linux, IMX8-Tronlong, 5.4.70-g9f85d43
#60 SMP PREEMPT Tue Jul 26 19:25:28 CST 2022, aarch64
Calculating outer loop overhead
1000 times 0.00 seconds
10000 times 0.00 seconds
100000 times 0.00 seconds
1000000 times 0.03 seconds
2000000 times 0.06 seconds
4000000 times 0.11 seconds
8000000 times 0.22 seconds
Overhead for each loop 2.7561e-08 seconds
Calibrating part 1 of 3
Loop count 4 0.00 seconds
Loop count 16 0.00 seconds
Loops 200 x 1 x Passes
Kernel Floating Pt ops
No Passes E No Total Secs. MFLOPS Span Checksums OK
------------ -- ------------- ----- ------- ---- ---------------------- --
1 7 x 14 5 9.809800e+07 1.03 94.93 1001 5.114652693224671e+04 16
2 67 x 15 4 7.798800e+07 1.07 73.00 101 1.539721811668385e+03 15
3 9 x 18 2 6.486480e+07 1.02 63.82 1001 1.000742883066363e+01 15
4 14 x 23 2 7.728000e+07 1.00 77.29 1001 5.999250595473891e-01 16
5 10 x 15 2 6.000000e+07 1.03 58.00 1001 4.548871642387267e+03 16
6 3 x 25 2 5.952000e+07 1.02 58.48 64 4.375116344729986e+03 16
7 4 x 11 16 1.400960e+08 1.07 131.00 995 6.104251075174761e+04 16
8 10 x 5 36 7.128000e+07 1.02 70.22 100 1.501268005625795e+05 15
9 36 x 8 17 9.889920e+07 1.11 88.82 101 1.189443609974981e+05 16
10 34 x 6 9 3.708720e+07 1.09 34.09 101 7.310369784325296e+04 16
11 11 x 18 1 3.960000e+07 1.02 38.91 1001 3.342910972650109e+07 16
12 12 x 18 1 4.320000e+07 1.03 41.99 1000 2.907141294167248e-05 16
13 36 x 10 7 3.225600e+07 1.08 29.83 64 1.202533961842805e+11 15
14 2 x 8 11 3.523520e+07 1.01 34.78 1001 3.165553044000335e+09 15
15 1 x 32 33 1.056000e+08 1.01 104.88 101 3.943816690352044e+04 15
16 25 x 21 10 5.565000e+07 1.00 55.38 75 5.650760000000000e+05 16
17 35 x 17 9 1.081710e+08 1.01 107.23 101 1.114641772902486e+03 16
18 2 x 9 44 7.840800e+07 1.09 71.74 100 1.015727037502299e+05 15
19 39 x 14 6 6.617520e+07 1.00 66.35 101 5.421816960147207e+02 16
20 1 x 23 26 1.196000e+08 1.01 118.26 1000 3.040644339351239e+07 16
21 1 x 2 2 5.050000e+07 1.43 35.24 101 1.597308280710199e+08 15
22 11 x 47 17 1.775378e+08 1.02 174.51 101 2.938604376566697e+02 16
23 8 x 10 11 8.712000e+07 1.04 83.59 100 3.549900501563623e+04 16
24 5 x 43 1 4.300000e+07 1.00 42.81 1001 5.000000000000000e+02 16
Maximum Rate 174.51
Average Rate 73.13
Geometric Mean 65.73
Harmonic Mean 59.22
Minimum Rate 29.83
Do Span 471
Calibrating part 2 of 3
Loop count 8 0.00 seconds
Loop count 32 0.00 seconds
Loops 200 x 2 x Passes
Kernel Floating Pt ops
No Passes E No Total Secs. MFLOPS Span Checksums OK
------------ -- ------------- ----- ------- ---- ---------------------- --
1 40 x 12 5 9.696000e+07 1.02 94.83 101 5.253344778937972e+02 16
2 40 x 12 4 7.449600e+07 1.02 73.00 101 1.539721811668385e+03 15
3 53 x 15 2 6.423600e+07 1.01 63.56 101 1.009741436578952e+00 16
4 70 x 22 2 7.392000e+07 1.02 72.44 101 5.999250595473891e-01 16
5 55 x 14 2 6.160000e+07 1.06 57.85 101 4.589031939600982e+01 16
6 7 x 22 2 5.913600e+07 1.04 57.05 32 8.631675645333210e+01 16
7 22 x 10 16 1.422080e+08 1.08 131.34 101 6.345586315784055e+02 16
8 6 x 5 36 8.553600e+07 1.22 70.18 100 1.501268005625795e+05 15
9 21 x 7 17 1.009596e+08 1.14 88.83 101 1.189443609974981e+05 16
10 19 x 5 9 3.454200e+07 1.01 34.09 101 7.310369784325296e+04 16
11 64 x 16 1 4.096000e+07 1.06 38.70 101 3.433560407475758e+04 16
12 68 x 16 1 4.352000e+07 1.04 41.82 100 7.127569130821465e-06 16
13 41 x 9 7 3.306240e+07 1.11 29.83 32 9.816387810944356e+10 15
14 10 x 9 11 3.999600e+07 1.10 36.30 101 3.039983465145392e+07 15
15 1 x 16 33 1.056000e+08 1.01 104.88 101 3.943816690352044e+04 15
16 27 x 19 10 5.745600e+07 1.05 54.80 40 6.480410000000000e+05 16
17 20 x 15 9 1.090800e+08 1.02 107.24 101 1.114641772902486e+03 16
18 1 x 9 44 7.840800e+07 1.09 71.74 100 1.015727037502299e+05 15
19 23 x 12 6 6.690240e+07 1.01 66.35 101 5.421816960147207e+02 16
20 8 x 15 26 1.248000e+08 1.03 121.09 100 3.126205178815431e+04 16
21 1 x 2 2 5.000000e+07 1.37 36.45 50 7.824524877232093e+07 16
22 7 x 37 17 1.778812e+08 1.02 174.53 101 2.938604376566697e+02 16
23 5 x 8 11 8.712000e+07 1.04 83.59 100 3.549900501563623e+04 16
24 31 x 35 1 4.340000e+07 1.02 42.41 101 5.000000000000000e+01 16
Maximum Rate 174.53
Average Rate 73.04
Geometric Mean 65.66
Harmonic Mean 59.26
Minimum Rate 29.83
Do Span 90
Calibrating part 3 of 3
Loop count 32 0.00 seconds
Loop count 128 0.00 seconds
Loops 200 x 8 x Passes
Kernel Floating Pt ops
No Passes E No Total Secs. MFLOPS Span Checksums OK
------------ -- ------------- ----- ------- ---- ---------------------- --
1 28 x 16 5 9.676800e+07 1.03 94.27 27 3.855104502494961e+01 16
2 46 x 21 4 6.800640e+07 1.02 66.38 15 3.953296986903059e+01 16
3 37 x 20 2 6.393600e+07 1.02 62.73 27 2.699309089320672e-01 16
4 38 x 35 2 6.384000e+07 1.03 62.22 27 5.999250595473891e-01 16
5 40 x 18 2 5.990400e+07 1.05 57.29 27 3.182615248447483e+00 16
6 21 x 28 2 4.515840e+07 1.00 45.08 8 1.120309393467088e+00 15
7 20 x 13 16 1.397760e+08 1.07 131.09 21 2.845720217644024e+01 16
8 9 x 6 36 8.087040e+07 1.18 68.76 14 2.960543667875005e+03 15
9 26 x 9 17 9.547200e+07 1.08 88.63 15 2.623968460874250e+03 16
10 25 x 7 9 3.780000e+07 1.09 34.69 15 1.651291227698265e+03 16
11 46 x 20 1 3.827200e+07 1.01 38.01 27 6.551161335845770e+02 16
12 48 x 21 1 4.193280e+07 1.02 41.25 26 1.943435981130448e-06 16
13 31 x 11 7 3.055360e+07 1.03 29.63 8 3.847124199949431e+10 15
14 8 x 10 11 3.801600e+07 1.03 36.83 27 2.923540598672009e+06 15
15 1 x 29 33 1.071840e+08 1.00 106.99 15 1.108997288134785e+03 16
16 14 x 23 10 5.667200e+07 1.01 56.38 15 5.152160000000000e+05 16
17 26 x 20 9 1.123200e+08 1.00 112.05 15 2.947368618589361e+01 16
18 2 x 8 44 7.321600e+07 1.02 71.55 14 9.700646212337041e+02 16
19 28 x 17 6 6.854400e+07 1.05 65.36 15 1.268230698051003e+01 15
20 7 x 16 26 1.211392e+08 1.00 121.54 26 5.987713249475302e+02 16
21 1 x 1 2 4.000000e+07 1.11 36.17 20 5.009945671204667e+07 16
22 8 x 54 17 1.762560e+08 1.01 173.82 15 6.109968728263972e+00 16
23 7 x 11 11 8.808800e+07 1.06 82.76 14 4.850340602749970e+02 16
24 23 x 44 1 4.209920e+07 1.02 41.40 27 1.300000000000000e+01 16
Maximum Rate 173.82
Average Rate 71.87
Geometric Mean 64.28
Harmonic Mean 57.94
Minimum Rate 29.63
Do Span 19
Overall
Part 1 weight 1
Part 2 weight 2
Part 3 weight 1
Maximum Rate 174.53
Average Rate 72.77
Geometric Mean 65.33
Harmonic Mean 58.92
Minimum Rate 29.63
Do Span 167
Press Enter
Score: Average Rate 72.77 MFLOPS, about the same as a Celeron A.
STREAM: memory bandwidth test.
Download the source code:
wget https://github.com/qinyunti/STREAM/archive/refs/heads/master.zip
Extract:
unzip master.zip
Enter the working directory:
cd STREAM-master/
Compile:
gcc -O3 -DSTREAM_ARRAY_SIZE=5000000 stream.c -o stream.5M
Segmentation fault
./stream.5M
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 5000000 (elements), Offset = 0 (elements)
Memory per array = 38.1 MiB (= 0.0 GiB).
Total memory required = 114.4 MiB (= 0.1 GiB).
Each kernel will be executed 10 times.
The *best* time for each kernel (excluding the first iteration)
will be used to compute the reported bandwidth.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 31512 microseconds.
(= 31512 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time   Min time   Max time
Copy:       3419.4          0.023552   0.023396   0.024346
Scale:      3352.3          0.024450   0.023864   0.026488
Add:        2739.5          0.043843   0.043803   0.043899
Triad:      2689.1          0.044729   0.044624   0.044947
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
root@IMX8-Tronlong:~/STREAM-master#
The Copy rate reaches 3419.4 MB/s.
Reference: https://www.cs.virginia.edu/stream/ref.html
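The "Best Rate" column follows directly from the array size and the minimum time. Copy and Scale each touch two arrays per iteration, Add and Triad touch three, and every element is 8 bytes, so the rate is arrays x 8 x elements / min time, in units of 10^6 bytes per second. A small sketch of that arithmetic, with a hypothetical function name:

```shell
#!/bin/sh
# stream_mbps ARRAYS ELEMENTS MIN_TIME: bandwidth in MB/s (10^6 bytes/s).
# ARRAYS is 2 for Copy/Scale and 3 for Add/Triad; elements are 8-byte doubles.
stream_mbps() {
    awk -v k="$1" -v n="$2" -v t="$3" 'BEGIN { printf "%.1f\n", k * 8 * n / t / 1e6 }'
}

stream_mbps 2 5000000 0.023396   # Copy,  min time from the log
stream_mbps 3 5000000 0.044624   # Triad, min time from the log
```

Both calls reproduce the reported figures (3419.4 and 2689.1 MB/s), which is a quick way to confirm that the numbers in a pasted log belong to the array size used for the build.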
RAM stress test (memtester)
Preparation: download the source code
wget https://pyropus.ca./software/memtester/old-versions/memtester-4.6.0.tar.gz
Extract the source:
tar -xvf memtester-4.6.0.tar.gz
Enter the working directory:
cd memtester-4.6.0/
Compile:
gcc -O3 memtester.c tests.c -o memtester
Run:
./memtester 128M 1
128M is the amount of RAM to test.
1 is the number of test loops, here a single pass.
root@IMX8-Tronlong:~/memtester-4.6.0# ./memtester 128M 1
memtester version 4.6.0 (64-bit)
Copyright (C) 2001-2020 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).
pagesize is 4096
pagesizemask is 0xfffffffffffff000
want 128MB (134217728 bytes)
got 128MB (134217728 bytes), trying mlock ...locked.
Loop 1/1:
  Stuck Address       : ok
  Random Value        : ok
  Compare XOR         : ok
  Compare SUB         : ok
  Compare MUL         : ok
  Compare DIV         : ok
  Compare OR          : ok
  Compare AND         : ok
  Sequential Increment: ok
  Solid Bits          : ok
  Block Sequential    : ok
  Checkerboard        : ok
  Bit Spread          : ok
  Bit Flip            : ok
  Walking Ones        : ok
  Walking Zeroes      : ok
Done.
root@IMX8-Tronlong:~/memtester-4.6.0#
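The run above tests a fixed 128M. To cover more of the roughly 0.92 GB of RAM without starving the rest of the system, the size can be derived from the MemAvailable field of /proc/meminfo. This is only a sketch: the function name memtest_size_mb and the 50% default are assumptions, not something the article or memtester prescribes.

```shell
#!/bin/sh
# memtest_size_mb [PERCENT]: read /proc/meminfo-style text on stdin and print
# PERCENT (default 50) of MemAvailable, in whole MB, for use as memtester's size.
memtest_size_mb() {
    awk -v pct="${1:-50}" '/^MemAvailable:/ { printf "%d\n", $2 * pct / 100 / 1024 }'
}

# Example with a made-up meminfo excerpt (values are in kB):
printf 'MemTotal: 965400 kB\nMemAvailable: 819200 kB\n' | memtest_size_mb 50

# On the board this could be used as (not run here):
#   ./memtester "$(memtest_size_mb 50 < /proc/meminfo)M" 1
```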
Network test (iPerf)
Download the Windows version of iPerf from https://iperf.fr/iperf-download.php#windows
In this setup:
the PC's IP address is 192.168.137.1
the board's IP address is 192.168.137.84
On the PC:
.\iperf3.exe -s -i 2
On the board:
iperf3 -c 192.168.137.1 -i 1 -t 10
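where 192.168.137.1 is the server IP
-i 1 : report results every 1 second
-t 10 : total test duration of 10 seconds
The throughput reaches the rate of the 1000M (gigabit) Ethernet port.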
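The iperf3 output itself is not reproduced here, so as a reference point for what "reaching gigabit rate" looks like in a TCP test: with a 1500-byte MTU, every full-size frame occupies 1538 bytes on the wire, and only the TCP payload counts toward the reported throughput. The sketch below works out that ceiling; the function name is hypothetical, and the header sizes assume IPv4 without IP options.

```shell
#!/bin/sh
# tcp_goodput_mbps LINE_RATE_MBPS PAYLOAD_BYTES
# Each full-size frame occupies 1538 bytes on the wire:
# 1500 (MTU) + 14 (Ethernet header) + 4 (FCS) + 8 (preamble) + 12 (inter-frame gap).
tcp_goodput_mbps() {
    awk -v rate="$1" -v payload="$2" 'BEGIN { printf "%.1f\n", rate * payload / 1538 }'
}

tcp_goodput_mbps 1000 1460   # 1500 - 20-byte IPv4 header - 20-byte TCP header
tcp_goodput_mbps 1000 1448   # additionally 12 bytes of TCP timestamp option
```

A TCP result in the region of 940 to 950 Mbits/sec therefore already means the gigabit link is saturated.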
UDP test
On the PC:
.\iperf3.exe -s -i 2
On the board:
iperf3 -u -c 192.168.137.1 -i 1 -t 10 -b 1000M
where
-u : run in UDP mode
-c 192.168.137.1 : server IP
-i 1 : report results every 1 second
-t 10 : total test duration of 10 seconds
-b 1000M : set the UDP target bandwidth to 1000 Mbps
iperf3 -u -c 192.168.137.1 -i 1 -t 10 -b 900M
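eMMC and SD card
Check the kernel log: the eMMC runs in HS200 mode,
and the SD card is detected as an ultra high speed SDR104 SDHC card (both controllers report CQHCI version 5.10).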
root@IMX8-Tronlong:~/memtester-4.6.0# dmesg | grep mmc
[ 0.000000] Kernel command line: console=ttymxc1,115200 root=/dev/mmcblk1p2 rootwait rw clk_ignore_unused
[ 1.738489] mmc0: CQHCI version 5.10
[ 1.773850] mmc0: SDHCI controller on 30b40000.mmc [30b40000.mmc] using ADMA
[ 1.781321] mmc1: CQHCI version 5.10
[ 1.904485] mmc0: Command Queue Engine enabled
[ 1.908974] mmc0: new HS200 MMC card at address 0001
[ 1.915089] mmcblk0: mmc0:0001 S40004 3.64 GiB
[ 1.919880] mmcblk0boot0: mmc0:0001 S40004 partition 1 4.00 MiB
[ 1.929142] mmcblk0boot1: mmc0:0001 S40004 partition 2 4.00 MiB
[ 1.935944] mmcblk0rpmb: mmc0:0001 S40004 partition 3 4.00 MiB, chardev (237:0)
[ 1.948443] mmcblk0: p1 p2
[ 2.635755] mmc1: CQHCI version 5.10
[ 2.639385] sdhci-esdhc-imx 30b50000.mmc: Got CD GPIO
[ 2.676805] mmc1: SDHCI controller on 30b50000.mmc [30b50000.mmc] using ADMA
[ 2.834214] mmc1: host does not support reading read-only switch, assuming write-enable
[ 2.978345] mmc1: new ultra high speed SDR104 SDHC card at address 5048
[ 2.985757] mmcblk1: mmc1:5048 SD32G 29.7 GiB
[ 2.991825] mmcblk1: p1 p2 p3
[ 3.798776] EXT4-fs (mmcblk1p2): recovery complete
[ 3.805335] EXT4-fs (mmcblk1p2): mounted filesystem with ordered data mode. Opts: (null)
[ 4.862350] EXT4-fs (mmcblk1p2): re-mounted. Opts: (null)
[ 7.348978] EXT4-fs (mmcblk1p3): mounted filesystem with ordered data mode. Opts: (null)
[ 7.552327] EXT4-fs (mmcblk0p2): mounted filesystem with ordered data mode. Opts: (null)
root@IMX8-Tronlong:~/memtester-4.6.0#
This shows that:
mmc0 is the 4 GB eMMC (reported as 3.64 GiB)
mmc1 is the 32 GB SD card (reported as 29.7 GiB)
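The bus mode of each card is on the "new ... card at address" lines of the kernel log, so it can be pulled out without reading the whole listing. A minimal sketch, assuming the message format shown in the log above; the function name mmc_cards is hypothetical.

```shell
#!/bin/sh
# mmc_cards: read kernel log text on stdin and print "<host>: <card description>"
# for every "mmcN: new ... card at address ..." line.
mmc_cards() {
    sed -n 's/.*\(mmc[0-9]*\): new \(.*\) card at address.*/\1: \2/p'
}

# Two lines taken from the log above, plus one that should be ignored:
printf '%s\n' \
    '[ 1.738489] mmc0: CQHCI version 5.10' \
    '[ 1.908974] mmc0: new HS200 MMC card at address 0001' \
    '[ 2.978345] mmc1: new ultra high speed SDR104 SDHC card at address 5048' \
    | mmc_cards

# On the board (not run here):  dmesg | mmc_cards
```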
Check the mount points; the eMMC is tested via /run/media/mmcblk0p2.
root@IMX8-Tronlong:~/memtester-4.6.0# df
Filesystem      1K-blocks    Used Available Use% Mounted on
/dev/root         6053568 2055148   3671204  36% /
devtmpfs           317420       4    317416   1% /dev
tmpfs              482700       0    482700   0% /dev/shm
tmpfs              482700    9168    473532   2% /run
tmpfs              482700       0    482700   0% /sys/fs/cgroup
tmpfs              482700       0    482700   0% /tmp
tmpfs              482700     228    482472   1% /var/volatile
/dev/mmcblk1p3    6053568 2044652   3681700  36% /run/media/mmcblk1p3
tmpfs               96540       4     96536   1% /run/user/0
/dev/mmcblk1p1    3086432   29268   3057164   1% /run/media/mmcblk1p1
/dev/mmcblk0p1     126015   29242     96774  24% /run/media/mmcblk0p1
/dev/mmcblk0p2    2409840 1719892    547820  76% /run/media/mmcblk0p2
root@IMX8-Tronlong:~/memtester-4.6.0#
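| Direction | bs/count (512MB) | Command | Result |
| Read | 16k/262144 | time dd if=/run/media/mmcblk0p2/test.bin of=/dev/null bs=16k count=262144 | 124 MB/s |
| Read | 4k/ | | |
| Read | 1k/ | | |
| Write | 16k/262144 | time dd if=/dev/zero of=/run/media/mmcblk0p2/test.bin bs=16k count=262144 | 13.1 MB/s |
| Write | 4k/ | | |
| Write | 1k/ | | |
Note that bs=16k with count=262144 requests 4 GiB rather than 512 MB. On this partition the write stops early with "No space left on device", and the read then covers the 690 MB that were actually written.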
root@IMX8-Tronlong:~/memtester-4.6.0# time dd if=/dev/zero of=/run/media/mmcblk0p2/test.bin bs=16k count=262144
dd: error writing '/run/media/mmcblk0p2/test.bin': No space left on device
42098+0 records in
42097+0 records out
689725440 bytes (690 MB, 658 MiB) copied, 52.7914 s, 13.1 MB/s
real 0m52.816s
user 0m0.017s
sys 0m3.086s
root@IMX8-Tronlong:~/memtester-4.6.0# time dd if=/run/media/mmcblk0p2/test.bin of=/dev/null bs=16k count=262144
42097+1 records in
42097+1 records out
689725440 bytes (690 MB, 658 MiB) copied, 5.57505 s, 124 MB/s
real 0m5.579s
user 0m0.027s
sys 0m1.194s
root@IMX8-Tronlong:~/memtester-4.6.0#
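The table is headed 512MB, but the bs/count pair that was run asks for far more than that, which is why the write ends with "No space left on device". The sketch below computes how many bytes a bs/count pair requests and which counts would give 512 MiB for the three block sizes listed in the table. The function names are hypothetical, and the counts are a suggestion rather than what was actually run.

```shell
#!/bin/sh
# dd_total_bytes BS_KIB COUNT: bytes requested by a given bs/count pair.
dd_total_bytes() {
    echo $(( $1 * 1024 * $2 ))
}

# dd_count SIZE_MIB BS_KIB: number of blocks needed to transfer SIZE_MIB.
dd_count() {
    echo $(( $1 * 1024 / $2 ))
}

dd_total_bytes 16 262144          # the pair used above: 4294967296 bytes = 4 GiB
for bs in 16 4 1; do
    echo "bs=${bs}k count=$(dd_count 512 "$bs")"
done
```

When repeating the measurement, it may also be worth adding conv=fsync to the write command and running echo 3 > /proc/sys/vm/drop_caches before the read, so that the page cache does not inflate the figures; neither step was part of the runs shown here.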
SD card: same procedure as for the eMMC.
The SD card is tested via /run/media/mmcblk1p1.
Test
| Direction | bs/count (512MB) | Command | Result |
| Read | 16k/262144 | time dd if=/run/media/mmcblk1p1/test.bin of=/dev/null bs=16k count=262144 | 71.2 MB/s |
| Read | 4k/ | | |
| Read | 1k/ | | |
| Write | 16k/262144 | time dd if=/dev/zero of=/run/media/mmcblk1p1/test.bin bs=16k count=262144 | 20.1 MB/s |
| Write | 4k/ | | |
| Write | 1k/ | | |
root@IMX8-Tronlong:~/memtester-4.6.0# time dd if=/dev/zero of=/run/media/mmcblk1p1/test.bin bs=16k count=262144
dd: error writing '/run/media/mmcblk1p1/test.bin': No space left on device
191073+0 records in
191072+0 records out
3130535936 bytes (3.1 GB, 2.9 GiB) copied, 155.828 s, 20.1 MB/s
real 2m36.103s
user 0m0.189s
sys 0m16.717s
root@IMX8-Tronlong:~/memtester-4.6.0# time dd if=/run/media/mmcblk1p1/test.bin of=/dev/null bs=16k count=262144
191072+1 records in
191072+1 records out
3130535936 bytes (3.1 GB, 2.9 GiB) copied, 43.9761 s, 71.2 MB/s
real 0m43.999s
user 0m0.214s
sys 0m11.490s
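The MB/s figure that dd prints is simply bytes copied divided by elapsed seconds, in units of 10^6 bytes. When collecting several runs, the summary line can be parsed to recompute it consistently. A minimal sketch, assuming the GNU-style summary line shown above; the function name dd_rate is hypothetical.

```shell
#!/bin/sh
# dd_rate: read dd's summary line on stdin and print the throughput in MB/s
# (10^6 bytes per second), computed from the byte count and elapsed seconds.
dd_rate() {
    awk '/ bytes .* copied, / {
        bytes = $1
        for (i = 2; i <= NF; i++)
            if ($i == "copied,") secs = $(i + 1)   # field right after "copied,"
        printf "%.1f\n", bytes / secs / 1e6
    }'
}

# SD card write and read summary lines from the log above:
echo "3130535936 bytes (3.1 GB, 2.9 GiB) copied, 155.828 s, 20.1 MB/s" | dd_rate
echo "3130535936 bytes (3.1 GB, 2.9 GiB) copied, 43.9761 s, 71.2 MB/s" | dd_rate
```

Since dd writes its summary to standard error, a live run would need something like dd ... 2>&1 | dd_rate.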
SQLite database test
Download the source code:
wget https://www.sqlite.org/2022/sqlite-amalgamation-3400000.zip
Extract:
unzip sqlite-amalgamation-3400000.zip
Enter the source directory:
cd sqlite-amalgamation-3400000/
Compile:
gcc sqlite3.c shell.c -o sqlite -lpthread -ldl
root@IMX8-Tronlong:~/sqlite-amalgamation-3400000# ./sqlite
SQLite version 3.40.0 2022-11-16 12:10:08
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite>
Get the full source tree, which contains the test program:
git clone https://github.com/sqlite/sqlite.git
The test/speedtest1.c program estimates the performance of SQLite under a typical workload.
Copy speedtest1.c into the source directory above and compile it together with sqlite3.c:
gcc sqlite3.c speedtest1.c -o speedtest1 -lpthread -ldl
Run the test:
root@IMX8-Tronlong:~/sqlite-amalgamation-3400000# ./speedtest1
-- Speedtest1 for SQLite 3.40.0 2022-11-16 12:10:08 89c459e766ea7e9165d0beeb1247
 100 - 50000 INSERTs into table with no index...................... 0.462s
 110 - 50000 ordered INSERTS with one index/PK..................... 0.770s
 120 - 50000 unordered INSERTS with one index/PK................... 0.907s
 130 - 25 SELECTS, numeric BETWEEN, unindexed...................... 1.003s
 140 - 10 SELECTS, LIKE, unindexed................................. 1.084s
 142 - 10 SELECTS w/ORDER BY, unindexed............................ 1.715s
 145 - 10 SELECTS w/ORDER BY and LIMIT, unindexed.................. 0.907s
 150 - CREATE INDEX five times..................................... 1.279s
 160 - 10000 SELECTS, numeric BETWEEN, indexed..................... 0.741s
 161 - 10000 SELECTS, numeric BETWEEN, PK.......................... 0.750s
 170 - 10000 SELECTS, text BETWEEN, indexed........................ 1.632s
 180 - 50000 INSERTS with three indexes............................ 1.443s
 190 - DELETE and REFILL one table................................. 1.469s
 200 - VACUUM...................................................... 1.175s
 210 - ALTER TABLE ADD COLUMN, and query........................... 0.048s
 230 - 10000 UPDATES, numeric BETWEEN, indexed..................... 0.721s
 240 - 50000 UPDATES of individual rows............................ 1.189s
 250 - One big UPDATE of the whole 50000-row table................. 0.164s
 260 - Query added column after filling............................ 0.050s
 270 - 10000 DELETEs, numeric BETWEEN, indexed..................... 2.142s
 280 - 50000 DELETEs of individual rows............................ 1.492s
 290 - Refill two 50000-row tables using REPLACE................... 3.321s
 300 - Refill a 50000-row table using (b&1)==(a&1)................. 1.426s
 310 - 10000 four-ways joins....................................... 3.378s
 320 - subquery in result set...................................... 6.256s
 400 - 70000 REPLACE ops on an IPK................................. 0.968s
 410 - 70000 SELECTS on an IPK..................................... 0.645s
 500 - 70000 REPLACE on TEXT PK.................................... 1.051s
 510 - 70000 SELECTS on a TEXT PK.................................. 0.996s
 520 - 70000 SELECT DISTINCT....................................... 0.618s
 980 - PRAGMA integrity_check...................................... 3.308s
 990 - ANALYZE..................................................... 0.418s
       TOTAL....................................................... 43.528s
root@IMX8-Tronlong:~/sqlite-amalgamation-3400000#
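When comparing speedtest1 runs across boards or compiler flags, it can be handy to re-add the per-test times, for instance to total only a subset of the tests. A minimal sketch, assuming the output format shown above (one test per line, time as the last field with an "s" suffix); the function name speedtest_sum is hypothetical.

```shell
#!/bin/sh
# speedtest_sum: read speedtest1 output on stdin and print the sum of the
# per-test times in seconds, ignoring the TOTAL line and the banner.
speedtest_sum() {
    awk '$NF ~ /^[0-9]+\.[0-9]+s$/ && $0 !~ /TOTAL/ {
        sub(/s$/, "", $NF)   # strip the trailing "s" from the time field
        sum += $NF
    }
    END { printf "%.3f\n", sum }'
}

# Example with the first three tests from the log above:
printf '%s\n' \
    ' 100 - 50000 INSERTs into table with no index...................... 0.462s' \
    ' 110 - 50000 ordered INSERTS with one index/PK..................... 0.770s' \
    ' 120 - 50000 unordered INSERTS with one index/PK................... 0.907s' \
    '       TOTAL....................................................... 43.528s' \
    | speedtest_sum
```

Fed the complete listing, for example with ./speedtest1 | speedtest_sum, the sum should agree with the TOTAL line; adding the 32 times above by hand gives 43.528.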
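The tests above cover the CPU (several benchmarks), storage (RAM, eMMC, SD card), I/O (Ethernet) and a database (sqlite3). They show that the board performs very well: on the classic benchmarks its scores are in the range of the desktop PC processors used for comparison (Core 2 Duo, Athlon 64, Pentium 4, Celeron A).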