Tag: cortex

Related blog posts
  • Popularity 21
    2015-10-2 18:07
    1868 views |
    0 comments
    I’m a huge fan of the Cortex-M family of MCUs. A wide variety of vendors have brought 32-bit processing to the low end of the market, sometimes at astonishing sub-dollar prices. In some cases these parts even have hardware floating point and SIMD instructions.
    But there is a dark secret to some of these devices. They all sport flash, and programs can run from that memory. But flash is generally slow. Consider ST’s STM32F4, which can run at 168 MHz: that’s about 6 ns per clock cycle! With some parts (as we’ll see, not the ST device) increasing the clock rate means adding wait states when running from flash, so there may be little performance gained by cranking up the clock.
    The STM32F4 user manual shows the problem:
    One partial solution that’s often employed is cache. The most recently used instructions are stored in a zero-wait-state cache, so they are immediately available when needed. But that kind of memory is real-estate hungry, so it is always tiny. Figure on a few KB to 32 KB or so. Most real programs are constantly banging into cache misses, each of which stalls the CPU as it injects wait states while pausing for the slow flash to catch up. Cache also makes execution time much harder to predict, which is a major concern for many real-time systems.
    ST takes a unique approach. First, each flash read is 128 bits wide: a single fetch brings in 16 bytes of memory. Given that instructions are 16 or 32 bits, the average read brings in around 6 instructions, all at the cost of one set of wait states (from the table above). Absent this approach, figure on around six times as many wait states. That’s a pretty startling speedup.
    But wait, there’s more. The MCU does have a cache, but not your father’s version. It stores 64 of those 128-bit "words," netting an effective cache size of 1 KB, which is pretty tiny. Unlike most other caches, though, it doesn’t store the most recently fetched instructions. Instead, only the 16 bytes at the destination of a recent branch are saved. The first branch to a particular address is slowed by the wait states, but subsequent branches to the same address aren’t (unless the cache fills, in which case a least-recently-used algorithm kicks in). With 64 entries, a lot of branch destination code is stored. Wait states will occur infrequently. (It’s not clear to me if interrupts count, but one would think they do. If so, that could be a huge improvement in ISR performance for frequently invoked interrupts.)
    What about the instructions after those 16 bytes? They run at zero wait states, because the MCU prefetches another 128 bits while executing the cached instructions.
    This is the architecture:
    "ART" is ST’s acronym for Adaptive Real-Time Memory Accelerator.
    The efficacy of this very nice approach will be correlated with the density of branches. More distinct branch targets mean the cache fills faster and CPU stalls increase.
    Here’s a snippet of the compiled code from a program I wrote for a Cortex-M3 some time ago. It’s a little strange-looking as it was designed to test the speed of the MCU in doing certain kinds of math:
    This is doing a bit of floating-point math, and you can see a lot of branches to the floating-point library. But there are only a few distinct calls; all of those to __aeabi_fmul will run very fast, at zero wait states, after the first call to it. The cache will fill relatively slowly. And the STM32F4 has hardware floating point, so the compiler will insert instructions in-line rather than calls to the library.
    Obviously, different code will exhibit differing branch densities. But this is an interesting way to overcome slow flash.
    What about your processor? How does it deal with flash’s inherent slowness?
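    To get a feel for how branch density affects such a cache, here is a minimal Python sketch. It assumes the behavior described above (64 lines of 16 bytes, least-recently-used replacement, one line stored per branch destination) and counts how many branches in a trace miss the cache and therefore pay the flash wait states. The function name and both address traces are made up for illustration; this is not ST’s implementation.

```python
from collections import OrderedDict

LINE_BYTES = 16    # one 128-bit flash read
CACHE_LINES = 64   # 64 lines x 16 bytes = 1 KB

def count_branch_misses(targets, lines=CACHE_LINES):
    """Count branch targets that miss an LRU cache of 16-byte flash lines."""
    cache = OrderedDict()   # keys are line numbers, oldest entry first
    misses = 0
    for addr in targets:
        line = addr // LINE_BYTES
        if line in cache:
            cache.move_to_end(line)        # mark as most recently used
        else:
            misses += 1                    # this branch pays the wait states
            cache[line] = True
            if len(cache) > lines:
                cache.popitem(last=False)  # evict the least recently used line
    return misses

# Hypothetical trace: 1000 calls spread over 4 library routines.
few_targets = [0x0800_1000 + 0x100 * (i % 4) for i in range(1000)]
# Hypothetical trace: 1000 branches cycling through 100 distinct targets.
many_targets = [0x0800_1000 + 0x100 * (i % 100) for i in range(1000)]

print(count_branch_misses(few_targets))    # 4: only the first call to each routine stalls
print(count_branch_misses(many_targets))   # 1000: cycling through 100 targets thrashes 64 lines
```

    The second trace is a deliberate worst case: a loop over more distinct targets than the cache holds defeats least-recently-used replacement entirely. Real code falls somewhere between the two.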
  • Popularity 24
    2011-11-21 21:04
    1617 views |
    0 comments
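    This post is a user's account of the A9 model in KITARM's Kopad tablet series. Original post: http://blog.jianghu.taobao.com/u/Njc1NTI5NjE=/blog/blog_detail.htm?aid=48888999
    A few days ago I bought an Android tablet. After several days of use it has turned out really well, much better than I expected. However powerful the hardware, what matters is how the device feels to the user; if basic use isn't smooth, good hardware is pointless. So here I give my opinions based mainly on my own experience; this is not a professional review. If you need one, search Baidu for “瑞萨评测” (Renesas review); there are plenty, and they are detailed. Opinions differ, so if you don't like this post, please don't flame it, just move along.
    I. Brief introduction.
    1. The seller is a very nice person, a professional engineer by background. Talking with him I sensed a scholarly temperament, not a merchant's profit-at-any-cost attitude. He is also sincere; it was because of his straight talk that I was willing to wait nearly a month to buy this tablet. He also gave me a generous discount, for which I thank him here.
    2. Appearance: the casing is much like most casings found on Taobao, and it is about as thick as an ordinary phone. (If you insist on comparing it with an ultra-thin phone, I can't help you.) The highlight is the scratch-resistant piano-lacquer shell; the metallic flecks showing through the black give it a quality feel. The workmanship is fine and solid, the gaps are small, and pressing hard reveals no looseness; it is definitely presentable. This is what satisfies me most. (It is better than my HTC phone.) It does pick up fingerprints easily, and I am considering buying carbon-fiber stickers from Taobao to protect it. That may be overkill, haha.
    All the accessories.
    The connectors: well made.
    3. Hardware: the hardware is the same as the 海纳M9 (Haina M9). (Even the casing is much the same; the difference is in the buttons. This machine is a little cheaper than the Haina M9, though.) The CPU is a dual-core Renesas A9. The GPU is a PowerVR SGX530, not the Mali-400 MP mentioned in the shop's English description; the seller should correct that. Memory is 512 MB of DDR2, with 4 GB of NAND storage. The display is a 7-inch 800×480 capacitive screen, which the listing describes as 16:9 (800×480 is strictly 5:3); I prefer 4:3, but those are hard to find. The capacitive screen supports two-point touch. All of this is again the same as the Haina.
    II. Usage impressions. (This is the main point.)
    1. Boot time. From pressing the power button to the home screen appearing takes 21 seconds, measured with a stopwatch. The 安卓优化大师 (Android Optimization Master) app recorded 18.826 seconds. That is quite quick.
    2. Screen. The screen surface is acrylic and does not scratch easily, so the seller advised me not to bother with a screen protector. In use I also feel none is needed: the screen is so sensitive that it responds to the lightest touch, so you never have to swipe hard. The drawback is heavy reflection in strong sunlight; even at maximum brightness you can clearly see your own face. I strongly advise against viewing any tablet or phone in sunlight anyway; it is torture for the eyes. Images look fine and sharp, with no blurring at the edges, and the resolution is adequate. Viewed head-on it looks very good; viewing angles from above and from the right are decent, while those from the left and from below are less than ideal. But nobody likes to look at a screen from the side all the time.
    3. Video playback. HD video is no problem. I tested Ice Age in rmvb format: sharp, smooth, no issues at all. When I played Transformers 3 in rmvb format, my colleagues were impressed. Unfortunately I don't know how to capture screenshots of video.
    4. Web browsing is smooth. Pages open quickly with no stutter. With flasher11.0 installed you can watch Tudou and Youku online, but only in standard definition; HD stutters somewhat, and network speed is not the cause. HD through PPS, however, does not stutter at all, so the problem is probably in the software.
    5. Games. Angry Birds, Fruit Ninja, Plants vs. Zombies, Assassin's Creed and others all run smoothly. I don't like wasting time on games, so I tested only a few common ones. For more on games, see the Haina M9 listings on Taobao or other Renesas review articles.
    Assassin's Creed: this is as far as I got; I couldn't get past it, haha.
    6. GPS satellite acquisition is strong. On my balcony my HTC 6850 finds only 3 or 4 satellites; in the same spot the tablet finds 6 or 7. Acquisition is slower than on my phone, though, taking about 50 seconds, whereas my phone gets a fix within seconds, admittedly with up-to-date satellite data downloaded in advance. I don't know whether Android also has software that pre-downloads such data to assist GPS positioning.
    7. Wi-Fi signal is strong. On the same floor, 6 meters away through two walls, there are still 3 bars; it connects quickly and speed is barely affected. Any farther and the connection becomes intermittent and unusable. I put the wireless router in my neighbor's home on the second floor; in the room directly below, the signal was full and download speed did not drop, reaching 100K on the 512M ADSL line. The USB wireless adapter on my computer, in the same position, gets only 3 bars, though its download speed is the same. In the next room, about 3 meters in a straight line from the router and separated from it by the ceiling and one wall, there are still 3 bars and speed seems unaffected. A little farther and the signal comes and goes.
    8. Battery life. I had ordered the 5000 mAh battery from the seller, so I was able to test it. With Wi-Fi and GPS left on, I played movies continuously, pausing for less than an hour over lunch. After 7 hours 20 minutes of playback it showed 8% remaining, but at that point the display became garbled for some unknown reason; the machine was not hot to the touch. I didn't dare continue. That is already impressive. At the weekend I took my son to a classmate's wedding. I used navigation for about 1 hour 10 minutes on the way there, my son played games (mainly Plants vs. Zombies and Fruit Ninja) for over an hour in between, and I used navigation for another hour on the way back; on arriving home it showed 51% remaining. The battery is certainly sufficient for navigation, and with a car charger as well there is no need to worry about running out halfway.
    9. Thoughtful design: the power indicator is user-friendly. It is blue when the device is on and in use, red while charging, and it turns off automatically once charging is complete. This is useful: a glance at the light's color tells you the machine's state. The automatic switch-off on full charge is especially practical; you can see at once whether charging has finished.
    10. A small suggestion. The four function keys to the left of the screen are awkward to use in the dark: you know only roughly where they are, often miss, and sometimes trigger the wrong action. It would be nice to have touch-sensitive keys like those on the Nokia N75 and N85, with a capacitive sensor so that a light touch turns on the key backlight. That would make the tablet easier to use in dark places. It is only a small suggestion.
    III. Benchmarks. I picked a few apps at random and ran them; this is not my focus, so treat the results as a reference only.
    Frankly, I have never cared about benchmark scores: results from different versions of an app are not comparable, and results from the same version are not necessarily trustworthy. Benchmark software is about the least reliable thing there is; the same machine scores better "optimized" than not. It can only be a reference. The app I used at first was not the latest version, and the ranking came out fairly high. The results here were run with the current latest versions; compare the prices of the machines ranked just above and below it and you will see the value for money. Of course, if you care more about brand than value, that is another matter.
    安卓跑分 (Android benchmark) result: 1280 points.
    Quadrant Advanced 1.16: 1084 points.
    超级兔子 (Super Rabbit) 2.3.1: 3110 points.
    The results are about the same as those of other Renesas-based tablets reported online; if you are interested, have a look at the reviews done by professional sites.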
  • Popularity 21
    2011-11-3 17:49
    1985 views |
    0 comments
    According to Peter Greenhalgh, an ARM engineer who served as the lead designer on the A7, the key to ARM Holdings plc's "big-little" dual core scheme is the ability to switch as fast and seamlessly as possible between processing on the Cortex-A15 MPCore and the new Cortex-A7 core.
    "We've designed them so that you can transition really seamlessly and quickly and got the software to do it in a seamless way," Greenhalgh said in an interview in San Francisco.
    Last month, ARM (Cambridge, England) rolled out the power-efficient A7 core and also described the big-little scheme, where the A7 is implemented alongside ARM's high-end A15 as part of a heterogeneous power-driven multicore strategy. Big-little enables a smartphone to rapidly switch between one core and the other, depending on the task load, to ensure optimal power efficiency. The Cortex-A7, on its own, will enable sub-$100 entry-level smartphones, according to ARM.
    The biggest challenge associated with the design project, according to Greenhalgh, was to devise a scheme that would let big-little run existing software. But, he said, smartphones have for years had various operating points, and it was possible for the ARM team to leverage the existing power infrastructure framework to determine when the handset should switch from one core to the other.
    "You've already got all of the software on the OS which is able to save operating points, all you do is that when you get to the lowest level on the A15, invoke the switching software," Greenhalgh said. "The key is having both processors with identical instruction sets so the same things will run on them and you can switch back and forth very quickly."
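    The switching rule Greenhalgh describes, stepping down through the A15's operating points and invoking the switch at the lowest one, might be sketched as follows. The frequency tables, the `step` function, and the symmetric rule for switching back up from the A7 are assumptions made for illustration; the article gives no actual operating points or software interfaces.

```python
# Hypothetical operating points in MHz, lowest first; not ARM's real tables.
OPERATING_POINTS = {
    "A7": [200, 400, 600, 800],
    "A15": [1000, 1200, 1500],
}

def step(core, level, direction):
    """Move one operating point up (+1) or down (-1), switching cores at the ends."""
    points = OPERATING_POINTS[core]
    new_level = level + direction
    if core == "A15" and new_level < 0:
        # Already at the A15's lowest point: hand over to the A7's top point.
        return "A7", len(OPERATING_POINTS["A7"]) - 1
    if core == "A7" and new_level >= len(points):
        # Already at the A7's highest point: hand over to the A15's bottom point.
        return "A15", 0
    # Otherwise stay on the same core, clamped to its valid range.
    return core, max(0, min(new_level, len(points) - 1))

state = ("A15", 1)
state = step(*state, -1)   # still on the A15, now at its lowest point
state = step(*state, -1)   # load keeps falling: switch to the A7
print(state)               # ('A7', 3)
```

    Because both cores run the same instruction set, a policy like this needs no knowledge of the software being run; it only extends the existing ladder of operating points across two cores.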
    According to Mike Inglis, executive vice president and general manager of ARM's processor division, a typical smartphone can leverage the A7 core for most applications, such as phone calls, data access and casual gaming, but switch to the A15 core for more demanding applications such as HD gaming and rich web services.
    ARM says the A7 provides up to 70 percent power savings compared to the A15 on common workloads. But, according to Greenhalgh, the actual power savings are highly dependent on the application being used. "If you are just running games, it won't be better," he said. But in a typical workflow, involving gaming, listening to audio files, running GPS navigation and other applications, the 70 percent savings is achievable, he said.
    But, Greenhalgh added, 70 percent power savings in the processor won't translate into 70 percent improvement in battery life. A processor typically accounts for only about 30 percent of a smartphone's total power budget.
    Greenhalgh said big-little is similar in some respects to the approach that ARM licensee Nvidia Corp. has undertaken with its next-generation Tegra processor, codenamed Kal-El. He described Nvidia's work as "a good first step," but said that big-little takes the approach further.
    "We find it very interesting to see what Nvidia has done," Greenhalgh said. "It's fantastic, and it validates big-little. You can use all of those techniques and you can lay the microarchitecture on there as well."
    Greenhalgh said the many licensees in the ARM ecosystem bolster the innovation on ARM processors by building upon them. "Everyone has their own take," Greenhalgh said. "You still allow partners to differentiate in power and performance."
    Greenhalgh, who has worked at ARM for 10 years, said the A7 project was similar to other design projects he has worked on in the past. He declined to describe the size of the team involved in the project.
    The A7 and ARM's big-little concept have generated a good deal of excitement. But Greenhalgh has already moved on to his next project, which he described as exciting but could not talk much about.
    Dylan McGrath, EE Times
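    Greenhalgh's point that a 70 percent processor saving does not mean 70 percent more battery life can be checked with quick arithmetic. The sketch below uses the two figures quoted in the article (a 30 percent processor share and a 70 percent saving) and assumes, for simplicity, that the rest of the phone's power draw is unchanged and the battery holds a fixed amount of energy; the function name is made up for illustration.

```python
def battery_life_gain(cpu_share, cpu_saving):
    """Fractional battery-life increase when only the CPU's power draw shrinks."""
    total_saving = cpu_share * cpu_saving     # fraction of total power removed
    return 1.0 / (1.0 - total_saving) - 1.0   # the same energy now lasts longer

total = 0.30 * 0.70
gain = battery_life_gain(0.30, 0.70)
print(f"total power saving: {total:.0%}")   # 21%
print(f"battery life gain: {gain:.1%}")     # 26.6%
```

    Under these assumptions the whole-phone saving is about 21 percent, which stretches battery life by roughly a quarter.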
  • Popularity 30
    2011-7-11 12:05
    3724 views |
    1 comment
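    Last weekend a board built around the TI OMAP3 DM3730 (Cortex-A8, 1 GHz) came back from fabrication, and the system is already running stably. The board is extremely small and thin, with all of its functions integrated on a single board; the mainboard needs only a 4-layer or even a 2-layer PCB to provide the required functions. Here are some high-resolution photos. Images from: http://www.kitarm.com/news/330-kitarm-release-tablet-solution-with-omap3-dm3730-cortex-a8--dsp-dual-core.html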
  • Popularity 27
    2011-6-25 15:52
    6176 views |
    1 comment
    Samsung Exynos 4210 ARM Cortex-A9 dual core SoC EVM board
    General Description
    Exynos 4210 is a system-on-a-chip (SoC) based on a 32-bit RISC processor for the smartphone, tablet PC, and netbook markets. Exynos 4210 provides the best performance features such as dual core CPU, highest memory bandwidth, world's first native triple display, 1080p video decode and encode hardware, 3D graphics hardware, and high-speed interfaces such as SATA and USB.
    Exynos 4210 uses the Cortex-A9 dual core, which is 25% faster in DMIPS than the Cortex-A8 core. It provides 6.4GB/s memory bandwidth for heavy traffic operations such as 1080p video en/decoding, 3D graphics display, and native triple display. The application processor supports dynamic virtual address mapping. This feature will help software engineers fully utilize the memory resources with ease.
    Exynos 4210 provides the best 3D graphics performance and native triple display. The native triple display, in particular, simultaneously supports two main LCD displays at WSVGA resolution and a 1080p HDTV display through HDMI. This is possible because Exynos 4210 supports separate post-processing pipelines.
    Exynos 4210 lowers the Bill of Materials (BOM) by integrating the following IPs: world's first DDR3 interfaces that will prepare bit cross with DDR2; 8 channels of I2C for a variety of sensors; SATA2; the GPS baseband; and a variety of USB derivatives (USB Host 2.0, Device 2.0, and HSIC interfaces with PHY transceivers to be connected with 802.11n, Ethernet, HSPA+, and 4G LTE modem). The application processor also supports the industry's first DDR-based eMMC 4.4 interfaces to increase the file system's performance.
    Exynos 4210 is available as FCMSP Package on Package (PoP), which has a 0.45mm ball pitch with LPDDR2 configuration. The MCP will depend upon the customer's requirement.
    Features
    ARM Cortex-A9 dual core subsystem with 64-/128-bit SIMD NEON
    - 32KB (Instruction)/32KB (Data) L1 Cache and 1MB L2 Cache
    - 1.2GHz and 1.0GHz Core Frequency: Voltage 1.2V
    64-bit Multi-layered bus architecture
    Internal ROM and RAM for secure booting, security, and general purposes
    Memory Subsystem:
    - SRAM/ROM/NOR/NAND Interface with x8 or x16 data bus
    - OneNAND Interface with x16 data bus
    - 2-ports 32-bit 800Mbps LPDDR2/DDR2/DDR3 Interfaces
    8-bit ITU 601/656 Camera Interface
    Multi-format Video Hardware Codec: 1080p 30fps (capable of decoding and encoding MPEG-4/H.263/H.264) and 1080p 30fps (capable of decoding MPEG-2/VC1)
    JPEG Hardware Codec
    3D and 2D graphics hardware, supporting OpenGL ES 1.1/2.0, and OpenVG 1.1
    LCD single or dual display, supporting 24bpp RGB, MIPI
    Native triple display, supporting WSVGA LCD dual display and 1080p HDMI, simultaneously
    Composite TV-out and HDMI 1.3a interfaces
    GPS baseband integration with GPS RF interface
    2-ports (4-lanes and 2-lanes) MIPI DSI and MIPI CSI interfaces
    1-channel AC-97, 2-channel PCM, and 3-channel 24-bit I2S audio interface, supporting 5.1 channel audio
    1-channel S/PDIF interface support for digital audio
    8-channel I2C interface support for PMIC, HDMI, and general-purpose multi-master
    3-channel high-speed SPI
    4-channel high-speed UART (up to 3Mbps data rate for Bluetooth 2.1 EDR and IrDA 1.0 SIR)
    USB 2.0 Device 1-channel, supporting FS/HS (12Mbps/480Mbps) with on-chip PHY
    USB 2.0 Host 1-channel, supporting LS/FS/HS (1.5Mbps/12Mbps/480Mbps) with on-chip PHY
    USB HSIC 2-channel, supporting (480Mbps) with on-chip PHY
    Asynchronous Direct Modem Interface with 16KB DPSRAM
    4-channel SD/MMC interface, supporting SD 2.0, HS-MMC 4.3, and 1ch HS-MMC 4.4 DDR 4-bit interface muxed with HS-MMC 4.3
    SATA AHCI 1-channel, supporting SATA1 (1.5Gbps) and SATA2 (3.0Gbps) with on-chip PHY
    32-channel DMA Controller
    14x8 keypad support
    10-channel 12-bit multiplexed ADCs
    Configurable GPIOs
    Real time clock, PLLs, timer with PWM, and watchdog timer
    Another development board
    Along with that CPU, the Origen board also has a Mali-400 GPU and 1GB of DDR3 RAM. The board will cost devs […] and has a wealth of connectivity options. Those options include interfaces for HDMI, SD cards, WiFi, Bluetooth, stereo sound, LCD, JTAG debug, and a camera. Software for the system will be offered by Linaro, which will provide its Linaro Evaluation Builds of Android and Ubuntu directly to devs from its website. The goal of the new Origen board is to help developers speed the time to market for new smartphones, tablets, and connected screens. The Exynos processor and DDR3 RAM are both on a small daughter board that will allow for future upgrades to the Origen board.
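    The 6.4GB/s memory bandwidth quoted in the general description is consistent with the "2-ports 32-bit 800Mbps" memory interface in the feature list, as a quick calculation shows. The sketch assumes that 800Mbps is the data rate per data pin and that GB means 10^9 bytes; the function name is made up for illustration.

```python
def peak_bandwidth_gb_s(ports, bus_bits, mbps_per_pin):
    """Peak memory bandwidth in GB/s (decimal), assuming every pin runs at full rate."""
    megabits_per_s = ports * bus_bits * mbps_per_pin
    return megabits_per_s / 8 / 1000   # bits to bytes, then MB to GB

print(peak_bandwidth_gb_s(2, 32, 800))   # 6.4
```

    This is a theoretical peak; sustained bandwidth depends on access patterns and memory timing.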
Related resources