2016-3-21 17:36
By Qin Hai (秦海), Toradex

When a new project calls for a new ARM-based embedded processor platform, the platform's performance has to be evaluated first. There are several ways to do this. One is to rely on the performance figures published by the ARM chip vendor, but most of these describe theoretical conditions. Another is to buy a development board for the same platform, port the application, and measure it directly; this gives the most direct answer but can cost considerable time and effort. A third, compromise approach is to run targeted benchmark software on the target platform. The results are a useful reference, but they depend not only on the hardware but also on the BSP and on software settings. The more consistent these settings are across platforms, the more meaningful the comparison; otherwise the results may be misleading or even point in the opposite direction.

Following this approach, this article uses Toradex industrial-grade ARM computer modules with the latest officially released Linux BSP, V2.5 Beta3, as the common test baseline, and keeps the CPU clock and the display output resolution, both of which strongly affect the results, as consistent as possible. The test samples are the Colibri T20 512M (NVIDIA Tegra 2), the Colibri i.MX6DL 512M (NXP i.MX6DL), and the Colibri VF61 256M (NXP Vybrid). The first two have dual-core Cortex-A9 CPUs; the third is a heterogeneous dual-core part with one Cortex-A5 and one Cortex-M4, of which only the A5 is tested here.

1). Hardware platform, test items and tools

a). Hardware platform
The three pin-compatible Colibri ARM computer modules listed above and one Colibri Eva Board

b). Test items and corresponding tools
- CPU test: nbench
- Memory test: stream
- Storage test: dd, hdparm
- Ethernet test: iperf
- CPU stress test: stress
- GPU stress test: glmark2
Note: all of these tools except glmark2 are preinstalled in the BSP.

2). Test procedure and results

a). Setup
DVFS (dynamic voltage and frequency scaling) is disabled on the two A9 platforms (see here), with the Colibri T20 CPU clock set to 1 GHz and the Colibri i.MX6DL CPU clock set to 800 MHz. The Colibri VF61 does not support DVFS, so no setting is needed. The display resolution is set to the default 640x480 on all platforms.

b). CPU test
Go to /usr/bin and run the following command
# nbench

- Colibri T20 results
=============LINUX DATA BELOW=============
CPU : Dual
L2 Cache :
OS : Linux 3.1.10-V2.5b3+gc8ead50
C compiler : arm-angstrom-linux-gnueabi-gcc
libc : static
MEMORY INDEX : 5.042
INTEGER INDEX : 5.245
FLOATING-POINT INDEX: 6.401
Baseline (LINUX) : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
------------------------------------------

- Colibri i.MX6DL results
=============LINUX DATA BELOW=============
CPU : Dual ARMv7 Processor rev 10 (v7l)
L2 Cache :
OS : Linux 3.14.28-V2.5b3+g0632def
C compiler : arm-angstrom-linux-gnueabi-gcc
libc : static
MEMORY INDEX : 4.028
INTEGER INDEX : 4.177
FLOATING-POINT INDEX: 5.137
Baseline (LINUX) : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
------------------------------------------

- Colibri VF61 results
=============LINUX DATA BELOW=============
CPU : ARMv7 Processor rev 1 (v7l)
L2 Cache :
OS : Linux 4.1.15-v2.5b3+ge6d111c
C compiler : arm-angstrom-linux-gnueabi-gcc
libc : static
MEMORY INDEX : 1.896
INTEGER INDEX : 2.337
FLOATING-POINT INDEX: 2.139
Baseline (LINUX) : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
------------------------------------------

nbench measures the performance of a single CPU. The T20 and the i.MX6DL, both built on A9 cores, are at roughly the same level, with the T20 ahead because of its higher clock (1 GHz versus 800 MHz). The VF61, with an A5 core and a lower clock, scores lower. The system also ships another CPU test tool, lmbench, which is not covered here.

c). Memory test
Run the following command
# stream

- Colibri T20 results
==================================
STREAM copy latency: 33.38 nanoseconds
STREAM copy bandwidth: 479.33 MB/sec
STREAM scale latency: 35.58 nanoseconds
STREAM scale bandwidth: 449.65 MB/sec
STREAM add latency: 41.73 nanoseconds
STREAM add bandwidth: 575.10 MB/sec
STREAM triad latency: 42.90 nanoseconds
STREAM triad bandwidth: 559.44 MB/sec
----------------------------------

- Colibri i.MX6DL results
==================================
STREAM copy latency: 18.33 nanoseconds
STREAM copy bandwidth: 873.08 MB/sec
STREAM scale latency: 23.45 nanoseconds
STREAM scale bandwidth: 682.30 MB/sec
STREAM add latency: 26.90 nanoseconds
STREAM add bandwidth: 892.26 MB/sec
STREAM triad latency: 25.58 nanoseconds
STREAM triad bandwidth: 938.16 MB/sec
----------------------------------

- Colibri VF61 results
==================================
STREAM copy latency: 30.53 nanoseconds
STREAM copy bandwidth: 524.09 MB/sec
STREAM scale latency: 30.78 nanoseconds
STREAM scale bandwidth: 519.82 MB/sec
STREAM add latency: 134.66 nanoseconds
STREAM add bandwidth: 178.23 MB/sec
STREAM triad latency: 149.24 nanoseconds
STREAM triad bandwidth: 160.81 MB/sec
----------------------------------

d). Storage test
./ Because the T20 and the VF61 use NAND flash directly, hdparm cannot be used on them, so dd is used on all three modules to test the onboard flash storage.
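Since the dd runs in this section are wrapped in time, the throughput has to be derived from the amount of data moved and the "real" elapsed time. The following is a minimal Python sketch of that arithmetic, assuming decimal megabytes (1 MB = 10^6 bytes). The function name dd_throughput_mb_s is hypothetical and is not part of any tool used in this article, and the 10-second elapsed time in the example is made up for illustration.

```python
def dd_throughput_mb_s(block_size: int, count: int, real_seconds: float) -> float:
    """Return throughput in MB/s (1 MB = 10**6 bytes) for one dd run."""
    if real_seconds <= 0:
        raise ValueError("real_seconds must be positive")
    total_bytes = block_size * count  # bytes written or read by dd
    return total_bytes / 1e6 / real_seconds


# bs=1024 count=100000 moves 102,400,000 bytes (102.4 MB).
# Hypothetical elapsed time of 10 seconds, for illustration only:
print(round(dd_throughput_mb_s(1024, 100000, 10.0), 2))  # 10.24
```

The approximate figures quoted with the results below appear consistent with rounding the transfer to 100 MB, so they come out a little over 2% lower than this calculation would give.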
Run the following commands
sync;time -p bash -c "(dd if=/dev/zero bs=1024 count=100000 of=/test.file;sync)" // write speed test
echo 3 > /proc/sys/vm/drop_caches ;time dd if=/test.file of=/dev/null bs=1024 // read speed test

- Colibri T20 results
Read test, about 14.7 MB/sec
========================
100000+0 records in
100000+0 records out
real 0m6.795s
user 0m0.030s
sys 0m1.830s
------------------------
Write test, about 9 MB/sec
========================
100000+0 records in
100000+0 records out
real 11.08
user 0.01
sys 2.19
------------------------

- Colibri i.MX6DL results
Read test, about 43.5 MB/sec
========================
100000+0 records in
100000+0 records out
real 0m2.306s
user 0m0.020s
sys 0m0.680s
------------------------
Write test, about 10 MB/sec
========================
100000+0 records in
100000+0 records out
real 10.07
user 0.09
sys 3.64
------------------------

- Colibri VF61 results
Read test, about 24 MB/sec
========================
sh (407): drop_caches: 3
100000+0 records in
100000+0 records out
real 0m4.161s
user 0m0.100s
sys 0m3.180s
------------------------
Write test, about 12.8 MB/sec
========================
100000+0 records in
100000+0 records out
real 7.78
user 0.13
sys 3.85
------------------------

./ Use hdparm to test the read speed of an external 8 GB SD card
Run the following command (the SD card device node differs between platforms; it is /dev/mmcblk0p1 on the T20 and the VF61, as the outputs show)
hdparm -t /dev/mmcblk1p1

- Colibri T20 results
========================
/dev/mmcblk0p1:
Timing buffered disk reads: 52 MB in 3.02 seconds = 17.22 MB/sec
------------------------

- Colibri i.MX6DL results
========================
/dev/mmcblk1p1:
Timing buffered disk reads: 56 MB in 3.09 seconds = 18.13 MB/sec
------------------------

- Colibri VF61 results
========================
/dev/mmcblk0p1:
Timing buffered disk reads: 54 MB in 3.07 seconds = 17.60 MB/sec
------------------------

e). Ethernet test
Connect the target board and a Linux host to the same LAN. The target board has a 100 Mbit/s Ethernet port.
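For context, a 100 Mbit/s link cannot carry 100 Mbit/s of TCP payload, because every frame also carries header and framing overhead. The sketch below estimates that ceiling under stated assumptions: a standard 1500-byte MTU, IPv4, no VLAN tag, and either no TCP options or the 12-byte TCP timestamp option that Linux hosts commonly enable. These are assumptions about the network, not something measured in this article, and the function name tcp_goodput_mbit_s is hypothetical.

```python
def tcp_goodput_mbit_s(link_mbit_s: float = 100.0, mtu: int = 1500,
                       tcp_options: int = 12) -> float:
    """Estimate the maximum TCP payload rate on an Ethernet link."""
    # Per-frame bytes on the wire that are not part of the IP packet:
    # preamble + SFD (8), Ethernet header (14), FCS (4), inter-frame gap (12).
    eth_overhead = 8 + 14 + 4 + 12
    ip_header = 20   # IPv4 header without options
    tcp_header = 20  # TCP header without options
    payload = mtu - ip_header - tcp_header - tcp_options
    wire_bytes = mtu + eth_overhead
    return link_mbit_s * payload / wire_bytes


print(f"{tcp_goodput_mbit_s():.2f}")               # 94.15 (with TCP timestamps)
print(f"{tcp_goodput_mbit_s(tcp_options=0):.2f}")  # 94.93 (no TCP options)
```

Under these assumptions the ceiling is roughly 94 to 95 Mbit/s, so a result in that range indicates that the port is running at close to wire speed.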
On the Linux host, run the following command (TCP is tested here as an example; other tests can be run by changing the parameters)
iperf -s
On the target board, run the following command
iperf -c $hostip -t 60 -P 8

- Colibri T20 results
========================
0.0-60.1 sec 676 MBytes 94.3 Mbits/sec
------------------------

- Colibri i.MX6DL results
========================
0.0-60.2 sec 677 MBytes 94.4 Mbits/sec
------------------------

- Colibri VF61 results
========================
0.0-60.1 sec 674 MBytes 94.2 Mbits/sec
------------------------

f). CPU stress test
Run the following command on each of the three platforms
# stress -c 2
In another terminal, use the "top" command to check CPU usage; both CPUs are fully loaded (on the VF61 only the single A5 core runs Linux).

g). GPU stress test
glmark2 has to be installed first. The ipk packages used here were built with the Toradex OpenEmbedded environment (see here for the environment setup). The Colibri i.MX6 platform is used as the example.

Installation
opkg install libpng12_1.2.51-r0_armv7at2hf-vfp-neon.ipk
opkg install glmark2_2014.03-r0_armv7at2hf-vfp-neon-mx6qdl.ipk

Run
glmark2-es2
=======================================================
glmark2 2014.03
=======================================================
OpenGL Information
GL_VENDOR: Vivante Corporation
GL_RENDERER: Vivante GC880
GL_VERSION: OpenGL ES 3.0 V5.0.11.p4.25762
=======================================================
use-vbo=false: FPS: 495 FrameTime: 2.020 ms
use-vbo=true: FPS: 908 FrameTime: 1.101 ms
texture-filter=nearest: FPS: 702 FrameTime: 1.425 ms
texture-filter=linear: FPS: 664 FrameTime: 1.506 ms
texture-filter=mipmap: FPS: 704 FrameTime: 1.420 ms
shading=gouraud: FPS: 485 FrameTime: 2.062 ms
shading=blinn-phong-inf: FPS: 248 FrameTime: 4.032 ms
shading=phong: FPS: 151 FrameTime: 6.623 ms
shading=cel: FPS: 114 FrameTime: 8.772 ms
bump-render=high-poly: FPS: 159 FrameTime: 6.289 ms
bump-render=normals: FPS: 426 FrameTime: 2.347 ms
bump-render=height: FPS: 340 FrameTime: 2.941 ms
kernel=0,1,0;1,-4,1;0,1,0;: FPS: 104 FrameTime: 9.615 ms
kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;: FPS: 37 FrameTime: 27.027 ms
light=false:quads=5:texture=false: FPS: 601 FrameTime: 1.664 ms
blur-radius=5:effect=blur:passes=1:separable=true:windows=4: FPS: 52 FrameTime: 19.231 ms
effect=shadow:windows=4: FPS: 212 FrameTime: 4.717 ms
columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 52 FrameTime: 19.231 ms
columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata: FPS: 51 FrameTime: 19.608 ms
columns=200:interleave=true:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 62 FrameTime: 16.129 ms
speed=duration: FPS: 46 FrameTime: 21.739 ms
default: FPS: 89 FrameTime: 11.236 ms
default: FPS: 4 FrameTime: 250.000 ms
default: FPS: 175 FrameTime: 5.714 ms
default: FPS: 27 FrameTime: 37.037 ms
fragment-steps=0:vertex-steps=0: FPS: 427 FrameTime: 2.342 ms
fragment-steps=5:vertex-steps=0: FPS: 93 FrameTime: 10.753 ms
fragment-steps=0:vertex-steps=5: FPS: 383 FrameTime: 2.611 ms
fragment-complexity=low:fragment-steps=5: FPS: 173 FrameTime: 5.780 ms
fragment-complexity=medium:fragment-steps=5: FPS: 51 FrameTime: 19.608 ms
fragment-loop=false:fragment-steps=5:vertex-steps=5: FPS: 156 FrameTime: 6.410 ms
fragment-steps=5:fragment-uniform=false:vertex-steps=5: FPS: 156 FrameTime: 6.410 ms
fragment-steps=5:fragment-uniform=true:vertex-steps=5: FPS: 82 FrameTime: 12.195 ms
=======================================================
glmark2 Score: 255
=======================================================
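In the glmark2 output above, each FrameTime value equals 1000 / FPS in milliseconds (for example, 1000 / 495 is about 2.020). That relationship gives a simple way to sanity-check a captured log. The following is a minimal Python sketch; the function name check_frame_times and the tolerance are hypothetical choices for illustration, and the sample lines are copied from the output above.

```python
import re

# Matches the "FPS: <int> FrameTime: <float> ms" part of a glmark2 result line.
LINE_RE = re.compile(r"FPS:\s*(\d+)\s+FrameTime:\s*([\d.]+)\s*ms")


def check_frame_times(text: str, tolerance_ms: float = 0.001):
    """Yield (fps, frame_time_ms, consistent) for each result found in text."""
    for match in LINE_RE.finditer(text):
        fps = int(match.group(1))
        frame_time = float(match.group(2))
        if fps == 0:
            # Avoid dividing by zero; a zero FPS cannot be checked this way.
            yield fps, frame_time, False
            continue
        expected = 1000.0 / fps
        yield fps, frame_time, abs(expected - frame_time) <= tolerance_ms


sample = """\
use-vbo=false: FPS: 495 FrameTime: 2.020 ms
use-vbo=true: FPS: 908 FrameTime: 1.101 ms
default: FPS: 4 FrameTime: 250.000 ms
"""

for fps, frame_time, ok in check_frame_times(sample):
    print(fps, frame_time, "ok" if ok else "mismatch")
```

A line reported as a mismatch usually points to a copy or line-wrapping error in the saved log rather than to a problem with the benchmark itself.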