Tag: performance

Related blog posts
  • 2023-12-22 10:51
    QCA9880: DR900VX vs. DR600VX, the performance choice that leads the future (802.11ac)
    In the digital age, the QCA9880 chip has emerged as a star in the field of wireless communications. Beyond integrating advanced technology, it has inspired a series of innovative products that bring users an unprecedented experience. The two wireless network modules compared here are both based on the Qualcomm Atheros QCA9880 chip. They offer excellent performance and several advanced technologies, providing strong support for wireless communication equipment. Their main features are as follows.
    High-performance version: DR900VX
    - Output power: up to 26 dBm at 2.4 GHz and up to 25 dBm at 5 GHz
    - IEEE 802.11ac compliant and backward compatible with 802.11a/b/g/n
    - 3x3 MIMO, up to 1.3 Gbps
    - Mini PCI Express edge connector
    - RoHS compliant, ensuring a high level of protection for human health and the environment from risks posed by chemicals
    - Supports spatial multiplexing, Cyclic Delay Diversity (CDD), Low-Density Parity-Check (LDPC) codes, Maximal Ratio Combining (MRC) and Space-Time Block Coding (STBC)
    - Supports the IEEE 802.11d, e, h, i, k, r, v (time stamp) and w standards
    - Supports Dynamic Frequency Selection (DFS)
    - FCC, CE and IC certified
    Economical version: DR600VX
    - Output power: up to 24 dBm at 2.4 GHz and up to 23 dBm at 5 GHz
    - IEEE 802.11ac compliant and backward compatible with 802.11a/b/g/n
    - 2x2 MIMO, up to 867 Mbps
    - Mini PCI Express edge connector
    - Supports the 4920 MHz to 4980 MHz frequency range
    - RoHS compliant, ensuring a high level of protection for human health and the environment from risks posed by chemicals
    - Supports spatial multiplexing, Cyclic Delay Diversity (CDD), Low-Density Parity-Check (LDPC) codes, Maximal Ratio Combining (MRC) and Space-Time Block Coding (STBC)
    - Supports the IEEE 802.11d, e, h, i, k, r, v (time stamp) and w standards
    - Supports Dynamic Frequency Selection (DFS)
    The two products are based on the same chip but differ in output power, MIMO configuration and maximum data rate. This makes them suitable for different application scenarios: one meets users' need for high performance, the other their need for economy. The QCA9880 chip brings new development opportunities to wireless communications. Its performance not only drives product innovation but also brings users a better experience. As technology continues to evolve, products based on the QCA9880 chip will continue to lead the industry trend and help create a better, smarter future.
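    The "up to 1.3 Gbps" and "up to 867 Mbps" figures are peak 802.11ac PHY rates, not application throughput. The Python sketch below reproduces them under an assumption the post does not state: an 80 MHz channel, MCS 9 (256-QAM, coding rate 5/6) and the short guard interval. It also converts the quoted output powers from dBm to milliwatts. The function and constant names are illustrative only.

```python
from fractions import Fraction

# Assumed 802.11ac (VHT) parameters for the peak rate (not stated in the post):
DATA_SUBCARRIERS_80MHZ = 234                # data subcarriers in an 80 MHz channel
BITS_PER_SUBCARRIER_256QAM = 8              # 256-QAM carries 8 coded bits per subcarrier
CODING_RATE_MCS9 = Fraction(5, 6)           # MCS 9 code rate
SYMBOL_TIME_SHORT_GI_US = Fraction(36, 10)  # 3.2 us symbol + 0.4 us short guard interval


def vht80_mcs9_rate_mbps(spatial_streams: int) -> float:
    """Peak PHY rate in Mbit/s (data bits per microsecond)."""
    data_bits_per_symbol = (DATA_SUBCARRIERS_80MHZ * BITS_PER_SUBCARRIER_256QAM
                            * CODING_RATE_MCS9 * spatial_streams)
    return float(data_bits_per_symbol / SYMBOL_TIME_SHORT_GI_US)


def dbm_to_mw(dbm: float) -> float:
    """Convert a power level in dBm to milliwatts."""
    return 10 ** (dbm / 10)


if __name__ == "__main__":
    for name, streams in (("DR900VX (3x3)", 3), ("DR600VX (2x2)", 2)):
        print(f"{name}: {vht80_mcs9_rate_mbps(streams):.1f} Mbit/s")
    for dbm in (26, 25, 24, 23):
        print(f"{dbm} dBm = {dbm_to_mw(dbm):.0f} mW")
```

    Under these assumptions, the 3x3 DR900VX comes out at 1300 Mbit/s and the 2x2 DR600VX at about 866.7 Mbit/s. The 2 dB difference in output power (for example 26 dBm vs. 24 dBm at 2.4 GHz) is a factor of about 1.6 in milliwatts, roughly 398 mW vs. 251 mW.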
  • 2016-3-21 17:36
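    By Qin Hai (秦海), Toradex
    When a new project needs a new ARM-based embedded processor platform, the new platform's performance inevitably has to be evaluated. There are several possible approaches. The first is to rely on the performance figures published by the ARM chip vendor, but these are mostly theoretical. The second is to buy a development board for the same platform, port the application and measure it directly; this gives the most direct result but can cost considerable effort and time. A third approach, a compromise between the two, is to run targeted benchmark software on the target platform. Its results are a useful reference. However, benchmark results depend not only on the hardware itself but also on the BSP and the software settings. The results are therefore only as meaningful as those settings are consistent across the platforms being compared; otherwise the results may not be what you expect, or may even point in the opposite direction.
    Following this approach, this article uses Toradex industrial-grade ARM computer modules running the latest official Linux BSP V2.5Beta3 as the common test baseline. It also keeps the CPU clock frequency and the display output resolution, which strongly affect the results, as consistent as possible. The test samples are:
    - the Colibri T20 512MB, based on NVIDIA Tegra 2;
    - the Colibri i.MX6DL 512MB, based on NXP i.MX6DL;
    - the Colibri VF61 256MB, based on NXP Vybrid.
    The first two have dual Cortex-A9 cores. The third is a heterogeneous dual-core Cortex-A5 plus Cortex-M4 part, and only the A5 is tested here.
    1). Hardware platforms, test items and tools used in this article
    a). Hardware: the three interface-compatible Colibri ARM computer modules above and a Colibri Eva Board
    b). Test items and corresponding tools
    - CPU test: nbench
    - Memory test: stream
    - Storage test: dd, hdparm
    - Ethernet test: iperf
    - CPU stress test: stress
    - GPU stress test: glmark2
    Note: all of these tools except glmark2 are preinstalled in the BSP.
    2). Test procedure and results
    a). Presets
    Following the referenced guide, DVFS (dynamic voltage and frequency scaling) is disabled on the two Cortex-A9 platforms. The Colibri T20 CPU clock is fixed at 1 GHz and the Colibri i.MX6DL CPU clock at 800 MHz. The Colibri VF61 does not support DVFS, so no setting is needed. The display resolution on all platforms is set to the default, 640x480.
    b).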
    CPU test
    Change to /usr/bin and run the following command:
    # nbench
    - Colibri T20 results:
    =============LINUX DATA BELOW=============
    CPU                 : Dual
    L2 Cache            :
    OS                  : Linux 3.1.10-V2.5b3+gc8ead50
    C compiler          : arm-angstrom-linux-gnueabi-gcc
    libc                : static
    MEMORY INDEX        : 5.042
    INTEGER INDEX       : 5.245
    FLOATING-POINT INDEX: 6.401
    Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
    --------------------------------------------------------------------------
    - Colibri i.MX6DL results:
    ==============LINUX DATA BELOW=========
    CPU                 : Dual ARMv7 Processor rev 10 (v7l)
    L2 Cache            :
    OS                  : Linux 3.14.28-V2.5b3+g0632def
    C compiler          : arm-angstrom-linux-gnueabi-gcc
    libc                : static
    MEMORY INDEX        : 4.028
    INTEGER INDEX       : 4.177
    FLOATING-POINT INDEX: 5.137
    Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
    --------------------------------------------------------------------------
    - Colibri VF61 results:
    ==============LINUX DATA BELOW=========
    CPU                 : ARMv7 Processor rev 1 (v7l)
    L2 Cache            :
    OS                  : Linux 4.1.15-v2.5b3+ge6d111c
    C compiler          : arm-angstrom-linux-gnueabi-gcc
    libc                : static
    MEMORY INDEX        : 1.896
    INTEGER INDEX       : 2.337
    FLOATING-POINT INDEX: 2.139
    Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
    --------------------------------------------------------------------------
    The nbench results reflect single-CPU performance. The T20 and the i.MX6DL, both Cortex-A9 parts, are essentially at the same level, with the T20 somewhat ahead because of its higher clock (1 GHz vs. 800 MHz). The VF61, with a Cortex-A5 core and a lower clock, scores noticeably lower. The system also includes another CPU benchmark, lmbench, which is not covered here.
    c).
    Memory test
    Run the following command:
    # stream
    - Colibri T20 results:
    ==================================
    STREAM copy latency: 33.38 nanoseconds
    STREAM copy bandwidth: 479.33 MB/sec
    STREAM scale latency: 35.58 nanoseconds
    STREAM scale bandwidth: 449.65 MB/sec
    STREAM add latency: 41.73 nanoseconds
    STREAM add bandwidth: 575.10 MB/sec
    STREAM triad latency: 42.90 nanoseconds
    STREAM triad bandwidth: 559.44 MB/sec
    ---------------------------------------------------------
    - Colibri i.MX6DL results:
    =================================
    STREAM copy latency: 18.33 nanoseconds
    STREAM copy bandwidth: 873.08 MB/sec
    STREAM scale latency: 23.45 nanoseconds
    STREAM scale bandwidth: 682.30 MB/sec
    STREAM add latency: 26.90 nanoseconds
    STREAM add bandwidth: 892.26 MB/sec
    STREAM triad latency: 25.58 nanoseconds
    STREAM triad bandwidth: 938.16 MB/sec
    ---------------------------------------------------------
    - Colibri VF61 results:
    =================================
    STREAM copy latency: 30.53 nanoseconds
    STREAM copy bandwidth: 524.09 MB/sec
    STREAM scale latency: 30.78 nanoseconds
    STREAM scale bandwidth: 519.82 MB/sec
    STREAM add latency: 134.66 nanoseconds
    STREAM add bandwidth: 178.23 MB/sec
    STREAM triad latency: 149.24 nanoseconds
    STREAM triad bandwidth: 160.81 MB/sec
    ---------------------------------------------------------
    d). Storage test
    ./ The T20 and VF61 use raw NAND flash directly, so hdparm cannot be used on them. We therefore use dd on all three modules to test the on-module flash storage.
    Run the following commands:
    sync; time -p bash -c "(dd if=/dev/zero bs=1024 count=100000 of=/test.file; sync)"   # write speed test
    echo 3 > /proc/sys/vm/drop_caches; time dd if=/test.file of=/dev/null bs=1024   # read speed test
    - Colibri T20 results:
    Read test, about 14.7 MB/sec
    =======================
    100000+0 records in
    100000+0 records out
    real    0m6.795s
    user    0m0.030s
    sys     0m1.830s
    -----------------------------------------
    Write test, about 9 MB/sec
    ========================
    100000+0 records in
    100000+0 records out
    real 11.08
    user 0.01
    sys 2.19
    -----------------------------------------
    - Colibri i.MX6DL results:
    Read test, about 43.5 MB/sec
    ========================
    100000+0 records in
    100000+0 records out
    real    0m2.306s
    user    0m0.020s
    sys     0m0.680s
    -----------------------------------------
    Write test, about 10 MB/sec
    =========================
    100000+0 records in
    100000+0 records out
    real 10.07
    user 0.09
    sys 3.64
    -----------------------------------------
    - Colibri VF61 results:
    Read test, about 24 MB/sec
    ========================
    sh (407): drop_caches: 3
    100000+0 records in
    100000+0 records out
    real    0m4.161s
    user    0m0.100s
    sys     0m3.180s
    -----------------------------------------
    Write test, about 12.8 MB/sec
    ========================
    100000+0 records in
    100000+0 records out
    real 7.78
    user 0.13
    sys 3.85
    -----------------------------------------
    ./ Using hdparm to test the read speed of an external 8GB SD card
    Run the following command. The device name differs between modules; as the outputs below show, the card appears as /dev/mmcblk0p1 on the T20 and VF61:
    hdparm -t /dev/mmcblk1p1
    - Colibri T20 results:
    ====================
    /dev/mmcblk0p1:
     Timing buffered disk reads:  52 MB in  3.02 seconds =  17.22 MB/sec
    ------------------------------------
    - Colibri i.MX6DL results:
    ====================
    /dev/mmcblk1p1:
     Timing buffered disk reads:  56 MB in  3.09 seconds =  18.13 MB/sec
    ------------------------------------
    - Colibri VF61 results:
    =====================
    /dev/mmcblk0p1:
     Timing buffered disk reads:  54 MB in  3.07 seconds =  17.60 MB/sec
    -------------------------------------
    e). Ethernet test
    Connect the target board and a Linux host to the same LAN; the target board has a 100 Mbit Ethernet port.
    On the Linux host, run the following command (a TCP test is shown; change the parameters to run other tests):
    iperf -s
    On the target board, run:
    iperf -c $hostip -t 60 -P 8
    - Colibri T20 results:
    =========================
      0.0-60.1 sec   676 MBytes  94.3 Mbits/sec
    ------------------------------------------
    - Colibri i.MX6DL results:
    =======================
      0.0-60.2 sec   677 MBytes  94.4 Mbits/sec
    ---------------------------------------
    - Colibri VF61 results:
    =======================
      0.0-60.1 sec   674 MBytes  94.2 Mbits/sec
    ---------------------------------------
    f). CPU stress test
    Run the following command on each of the three platforms:
    # stress -c 2
    In another terminal, check CPU usage with "top": the CPU is fully loaded (both cores on the dual-core modules).
    g). GPU stress test
    First install the glmark2 tool. Here the ipk packages were built with the Toradex OpenEmbedded environment (see the referenced guide for the environment setup). The Colibri i.MX6 platform is used as the example.
    Installation:
    opkg install libpng12_1.2.51-r0_armv7at2hf-vfp-neon.ipk
    opkg install glmark2_2014.03-r0_armv7at2hf-vfp-neon-mx6qdl.ipk
    Run:
    glmark2-es2
    =======================================================
        glmark2 2014.03
    =======================================================
        OpenGL Information
        GL_VENDOR:     Vivante Corporation
        GL_RENDERER:   Vivante GC880
        GL_VERSION:    OpenGL ES 3.0 V5.0.11.p4.25762
    =======================================================
    use-vbo=false: FPS: 495 FrameTime: 2.020 ms
    use-vbo=true: FPS: 908 FrameTime: 1.101 ms
    texture-filter=nearest: FPS: 702 FrameTime: 1.425 ms
    texture-filter=linear: FPS: 664 FrameTime: 1.506 ms
    texture-filter=mipmap: FPS: 704 FrameTime: 1.420 ms
    shading=gouraud: FPS: 485 FrameTime: 2.062 ms
    shading=blinn-phong-inf: FPS: 248 FrameTime: 4.032 ms
    shading=phong: FPS: 151 FrameTime: 6.623 ms
    shading=cel: FPS: 114 FrameTime: 8.772 ms
    bump-render=high-poly: FPS: 159 FrameTime: 6.289 ms
    bump-render=normals: FPS: 426 FrameTime: 2.347 ms
    bump-render=height: FPS: 340 FrameTime: 2.941 ms
    kernel=0,1,0;1,-4,1;0,1,0;: FPS: 104 FrameTime: 9.615 ms
    kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;: FPS: 37 FrameTime: 27.027 ms
    light=false:quads=5:texture=false: FPS: 601 FrameTime: 1.664 ms
    blur-radius=5:effect=blur:passes=1:separable=true:windows=4: FPS: 52 FrameTime: 19.231 ms
    effect=shadow:windows=4: FPS: 212 FrameTime: 4.717 ms
    columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 52 FrameTime: 19.231 ms
    columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata: FPS: 51 FrameTime: 19.608 ms
    columns=200:interleave=true:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 62 FrameTime: 16.129 ms
    speed=duration: FPS: 46 FrameTime: 21.739 ms
    default: FPS: 89 FrameTime: 11.236 ms
    default: FPS: 4 FrameTime: 250.000 ms
    default: FPS: 175 FrameTime: 5.714 ms
    default: FPS: 27 FrameTime: 37.037 ms
    fragment-steps=0:vertex-steps=0: FPS: 427 FrameTime: 2.342 ms
    fragment-steps=5:vertex-steps=0: FPS: 93 FrameTime: 10.753 ms
    fragment-steps=0:vertex-steps=5: FPS: 383 FrameTime: 2.611 ms
    fragment-complexity=low:fragment-steps=5: FPS: 173 FrameTime: 5.780 ms
    fragment-complexity=medium:fragment-steps=5: FPS: 51 FrameTime: 19.608 ms
    fragment-loop=false:fragment-steps=5:vertex-steps=5: FPS: 156 FrameTime: 6.410 ms
    fragment-steps=5:fragment-uniform=false:vertex-steps=5: FPS: 156 FrameTime: 6.410 ms
    fragment-steps=5:fragment-uniform=true:vertex-steps=5: FPS: 82 FrameTime: 12.195 ms
    =======================================================
                                      glmark2 Score: 255
    =======================================================
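    The post reports raw tool output only. As a sketch of how those numbers might be post-processed, the hypothetical Python helpers below (parse_nbench, per_ghz and dd_rate_bytes_per_s are illustrative names, not part of the BSP) do three things. First, they extract the three indices from the LINUX DATA section of nbench output. Second, they divide those indices by the clock frequencies fixed in the presets (1 GHz for the T20, 800 MHz for the i.MX6DL). Third, they turn a dd run's block size, block count and real time into a transfer rate. The VF61 is left out of the normalization because the post does not state its clock.

```python
import re

# nbench index lines look like "INTEGER INDEX       : 5.245"; the
# FLOATING-POINT line has no space before the colon, hence \s*.
INDEX_RE = re.compile(r"(MEMORY|INTEGER|FLOATING-POINT) INDEX\s*:\s*([0-9.]+)")


def parse_nbench(text: str) -> dict[str, float]:
    """Return the MEMORY/INTEGER/FLOATING-POINT indices from nbench output.

    Only the part after "LINUX DATA BELOW" is parsed, so the indices of the
    earlier "ORIGINAL BYTEMARK RESULTS" section are ignored.
    """
    linux_section = text.split("LINUX DATA BELOW")[-1]
    return {name: float(value) for name, value in INDEX_RE.findall(linux_section)}


def per_ghz(indices: dict[str, float], clock_mhz: float) -> dict[str, float]:
    """Divide each index by the CPU clock in GHz (a crude clock normalization)."""
    ghz = clock_mhz / 1000
    return {name: value / ghz for name, value in indices.items()}


def dd_rate_bytes_per_s(block_size: int, count: int, real_s: float) -> float:
    """Transfer rate implied by a dd run of `count` blocks of `block_size` bytes."""
    return block_size * count / real_s


if __name__ == "__main__":
    # Index values and clocks taken from the results and presets in this post.
    t20 = {"MEMORY": 5.042, "INTEGER": 5.245, "FLOATING-POINT": 6.401}
    imx6dl = {"MEMORY": 4.028, "INTEGER": 4.177, "FLOATING-POINT": 5.137}
    for name, idx, mhz in (("Colibri T20", t20, 1000), ("Colibri i.MX6DL", imx6dl, 800)):
        scaled = per_ghz(idx, mhz)
        print(name, {k: round(v, 3) for k, v in scaled.items()})

    rate = dd_rate_bytes_per_s(1024, 100000, 6.795)  # T20 read test
    print(f"T20 read: {rate / 1024:.0f} KiB/s = {rate / 1e6:.1f} MB/s")
```

    With the indices reported above, the i.MX6DL's per-GHz values (about 5.04, 5.22 and 6.42) are almost identical to the T20's (5.042, 5.245 and 6.401). This is consistent with the post's conclusion that the two Cortex-A9 parts differ mainly in clock speed. For dd, 100000 blocks of 1024 bytes read in 6.795 s is about 14 717 KiB/s, which appears to be how the post's "about 14.7 MB/sec" was derived. In decimal megabytes the same run is about 15.1 MB/s.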
  • 2011-12-30 20:25
    I have seen a number of articles mentioning SuperSpeed IP sales and various performance numbers (all of them theoretical). I would like to use this platform to tell users the actual speed achieved on the USB 3.0 development board by the IP designers at SLS. The core's efficiency depends on a number of factors, such as 8b/10b encoding, packet structure and framing, link-level flow control and protocol overhead. At the 5 Gbps signalling rate with 8b/10b encoding, the raw throughput is 500 MBps. Once link flow control, packet framing and protocol overhead are taken into account, it is realistic for 400 MBps or more to be delivered to an application. Various tests performed with the SLS SuperSpeed USB core on a GigaByte A75 motherboard have shown performance of about 2.1 Gbps with the mass-storage interface and about 2.7 Gbps with the raw interface. Any other experiences with the speed/performance of SuperSpeed cores are most welcome!
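    The arithmetic behind "5 Gbps with 8b/10b encoding gives 500 MBps" can be written out in a few lines. The Python sketch below does that and then expresses the measured SLS figures in the same units. The function names are illustrative, and decimal units are assumed throughout (1 MB = 10^6 bytes).

```python
# Assumed figures: USB 3.0 SuperSpeed signalling rate and 8b/10b line coding,
# as described in the post; decimal units (1 MB = 10**6 bytes) throughout.
SIGNALLING_RATE_GBPS = 5.0
CODING_EFFICIENCY = 8 / 10  # 8b/10b: every 8 data bits are sent as 10 line bits


def gbps_to_mbyte_s(gbps: float) -> float:
    """Convert gigabits per second to megabytes per second."""
    return gbps * 1000 / 8


def raw_throughput_mbyte_s(signalling_gbps: float = SIGNALLING_RATE_GBPS) -> float:
    """Throughput left after 8b/10b coding, before any protocol overhead."""
    return gbps_to_mbyte_s(signalling_gbps * CODING_EFFICIENCY)


def fraction_of_raw(measured_gbps: float,
                    signalling_gbps: float = SIGNALLING_RATE_GBPS) -> float:
    """Measured rate as a fraction of the post-coding bit rate."""
    return measured_gbps / (signalling_gbps * CODING_EFFICIENCY)


if __name__ == "__main__":
    print(f"raw: {raw_throughput_mbyte_s():.0f} MB/s")
    for label, gbps in (("mass storage", 2.1), ("raw interface", 2.7)):
        print(f"{label}: {gbps_to_mbyte_s(gbps):.1f} MB/s, "
              f"{fraction_of_raw(gbps):.1%} of raw")
```

    By this arithmetic, the measured 2.1 Gbps and 2.7 Gbps correspond to 262.5 MB/s and 337.5 MB/s. That is 52.5% and 67.5% of the 500 MBps left after 8b/10b coding.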
  • 2010-5-7 17:42
    Recently, scientists from North Carolina State University (NCSU) discovered a technique that improves application performance on a multicore system by up to 20%, without requiring any change to existing code. To understand its significance, we should first answer the question "Why is it difficult to parallelize a desktop application?" Take a word processor as an example. It has a loop waiting for a key to be pressed. When a key is pressed, it applies the formatting/styles to the character, displays the character on screen and returns to the wait loop. It is difficult to split this logic into pieces of code that can execute in parallel, because the whole code is one sequential flow in which each step waits for the previous one to complete. Such single-threaded applications have little scope for parallelization and have not been able to benefit from multiple cores. That is why the news from the NCSU scientists is exciting. They have found a simple, effective technique that lets existing programs run faster on a multicore system, and without any change to the code. What is their magic? Most applications use dynamic memory, allocating and freeing chunks of memory as they need them. The free() function returns no value, so a program never has to wait for its result. The NCSU scientists simply moved the free() work from the main core to another core, which lets free() run in parallel with the main code. Since applications call malloc() and free() extensively, moving just the free() work to another core improves performance by up to 20%! This "simple drop-in replacement" is achieved by linking applications with a new memory management library, and it needs no change to the application code. It is low-hanging fruit for performance improvement, and it is surprising that it went unnoticed for so long!
    Simple techniques like this will help desktop users get visible, tangible performance improvements from their existing applications on multicore systems!
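    To make the idea concrete, here is a small Python sketch that offloads free() to a worker thread. It is not the NCSU library, which the post describes as a drop-in replacement that needs no code changes. Instead, this sketch changes the calling code explicitly: pointers obtained from the C library's malloc() are handed to a background thread, which calls free() on them while the main thread carries on. The sketch assumes a POSIX system where ctypes can load the C library. DeferredFreer is a hypothetical name. ctypes releases the GIL around calls into a CDLL, so the worker's free() calls can run on another core concurrently with the main thread.

```python
import ctypes
import ctypes.util
import queue
import threading

# Assumption: a POSIX system where ctypes can load the C library.
_libc_path = ctypes.util.find_library("c")
libc = ctypes.CDLL(_libc_path) if _libc_path else ctypes.CDLL(None)
libc.malloc.restype = ctypes.c_void_p
libc.malloc.argtypes = [ctypes.c_size_t]
libc.free.restype = None
libc.free.argtypes = [ctypes.c_void_p]


class DeferredFreer:
    """Hypothetical helper: free() runs on a worker thread, not the caller's."""

    _STOP = object()  # sentinel that cannot be confused with a pointer or NULL

    def __init__(self) -> None:
        self._pending: queue.Queue = queue.Queue()
        self.freed = 0  # only updated by the worker thread
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self) -> None:
        while True:
            ptr = self._pending.get()
            if ptr is self._STOP:
                return
            libc.free(ptr)  # the GIL is released during this foreign call
            self.freed += 1

    def free(self, ptr) -> None:
        """Queue ptr for release and return immediately, like a void free()."""
        self._pending.put(ptr)

    def close(self) -> None:
        """Let the worker drain the queue, then stop it and wait for it."""
        self._pending.put(self._STOP)
        self._worker.join()


if __name__ == "__main__":
    freer = DeferredFreer()
    for _ in range(10_000):
        buf = libc.malloc(256)
        # ... the main thread would use buf here ...
        freer.free(buf)
    freer.close()
    print("freed", freer.freed, "blocks on the worker thread")
```

    The Python-level queue adds overhead of its own, so this sketch shows the structure of the technique rather than a speedup. A library-level version would presumably do the hand-off inside its own free(), which is what lets applications benefit without changes.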
  • 2010-1-22 11:12
    Multicore processors are processors with more than one core inside. Today it is not uncommon for desktops to have two or four cores. This trend is picking up and will only accelerate in the coming years. There is a background to the rising popularity of multicore technology. Even more important is the impact multicore is going to have on mainstream programming. In the semiconductor industry, Moore's law has held sway over the last two decades. It still holds, but there has been a perceptible slowdown recently. Moore's law says that the number of transistors doubles roughly every 18 months to two years. While the transistor count keeps doubling, performance is not keeping pace. Performance did keep pace until 2002, thanks to techniques like pipelining, caching and superscalar designs. After that a gap started to appear, as these techniques began to yield diminishing returns. For example, CPU clock speeds increased tenfold between 1993 and 1999. The first 1 GHz CPU was released in 2000, but in the nine years since, clock speeds have risen only to 3.3 GHz, considerably slower growth than in the previous six years. Today a 5 GHz or 6 GHz processor looks remote. Power is another factor behind the slowdown. Dynamic power grows linearly with clock frequency and with the square of the supply voltage, and higher frequencies generally need a higher voltage, so raising the frequency drains a great deal of power. The challenges of designing adequate heat sinks and airflow in servers and desktops become pronounced as frequency increases. This is referred to as the "power wall". Having exploited many optimisation techniques and hit the power wall, semiconductor companies have run out of steam for increasing CPU speed. Instead of making processors faster, they are now packing more cores into them. This has profound implications for the industry. As long as CPU speeds were on an upward curve, software vendors got a free lunch of improved performance without doing anything.
    If Windows performance was poor on a desktop, one could move to a faster desktop and applications would run faster. Now this free ride has ended. One cannot get a faster processor; instead, one gets two cores in place of one! In effect, the semiconductor companies have shifted the onus of improving performance from hardware to software. The big question is whether the software world is prepared for it. Unfortunately, the software world is not fully prepared for these changes and has been caught on the wrong foot. It will not be as easy for software to gain performance as it was in the past. But many efforts are under way in research labs, universities and elsewhere to understand the nature of the problem, and many solutions are being tried out. Start-ups and venture capitalists are focusing on the area, and a lot of innovation is happening. How the software community adjusts to the new realities of multicore, and how programming gets reshaped in the coming years, will be interesting to watch! In the coming weeks, I will discuss these software challenges in detail, along with how the software community is trying to address them. I look forward to your valuable comments and feedback!