    2015-10-17 20:44
Get ready -- embedded vision is getting here faster than you might expect. All that is required is the economics to make it affordable and the computing power to make it feasible. It looks like all of the pieces are now in play, and we can expect to see an explosion of embedded vision-enabled systems appearing on the scene as early as 2016.

Can you remember when the first cell phone equipped with a digital camera appeared on the market circa 2000? At that time, many people expressed the opinion that they had no use for such a beast -- all they wanted to do with their cell phone was to make phone calls. Today, the same naysayers can’t imagine life without camera-equipped smartphones.

Originally, smartphones -- and, later, tablet computers -- had only one camera on the back to take pictures of things other than the user. It wasn't long before we moved to have one camera on the back and another on the front, where the one on the front is used to take "selfies" and for video chatting.

Now, some smartphones and tablets have three cameras -- one on the back to take pictures of other things, and two on the front to augment traditional capabilities with stereoscopic processing for things like gesture recognition.

Other systems keep one camera on the front, but have two on the back. Why? Well, suppose you are in a furniture store, for example. You could take a picture of a piece of furniture using your smartphone, which -- using the stereoscopic capabilities provided by its dual cameras -- could then automatically determine the size of the furniture. Later, when you return home, you could use a special app to see how that object would look (and fit) in your home.

Trust me. It won’t be long before even a humble smartphone boasts at least four cameras -- two in the back and two in the front -- that it uses to perform tasks like face detection (and recognition in the future), people detection, gesture detection, object detection (and recognition in the future), motion detection, and... the list goes on.

The thing is that it's hard enough to do this sort of thing with one camera. The amount of computationally-intensive processing required to perform even relatively rudimentary machine vision tasks staggers the imagination, so implementing things like stereoscopic gesture detection and object detection with four-plus cameras boggles the mind.

The trick is to offload the main application processor and to process the video streams in real-time using a specialized vision processor. All of this explains why Cadence Design Systems has just announced the Tensilica Vision P5 digital signal processor (DSP). According to the press release:

"This imaging and vision DSP core offers up to 13X performance boost, with an average of 5X less energy usage on vision tasks compared to the previous generation IVP-EP imaging and video DSP. The Tensilica Vision P5 DSP is built from the ground up for applications requiring ultra-high memory and operation parallelism to support complex vision processing at high resolution and high frame rates. As such, it is ideal for off-loading vision and imaging functions from the main CPU to increase throughput and reduce power. End-user applications that can benefit from the DSP’s capabilities include image and video enhancement, stereo and 3D imaging, depth map processing, robotic vision, face detection and authentication, augmented reality, object tracking, object avoidance and advanced noise reduction.

"The Tensilica Vision P5 DSP core includes a significantly expanded and optimized Instruction Set Architecture (ISA) targeting mobile, automotive advanced driver assistance systems (or ADAS, which includes pedestrian detection, traffic sign recognition, lane tracking, adaptive cruise control, and accident avoidance) and Internet of Things (IoT) vision systems. The advances in the Tensilica Vision P5 DSP further improve the ease of software development and porting, with comprehensive support for integer, fixed-point and floating-point data types and an advanced toolchain with a proven, auto-vectorizing C compiler. The software environment also features complete support of standard OpenCV and OpenVX libraries for fast, high-level migration of existing imaging/vision applications with over 800 library functions."

I think we are poised to experience some very interesting times. What do you think about all of this? In the meantime, click here for more information on the Tensilica Vision P5 DSP core.
    2013-5-8 15:17
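This post is put together from two diary entries I wrote in March and August 2011. I am sharing it casually, so please bear with me where I get things wrong.

Multiple cores, multiple threads, and vector processor cores: SDR processors for communications seem to be arriving at this same design by different routes.

Look around and the SDR processors that CEVA, Tensilica, Cognovo, and Sandbridge pitch for 4G all share these basic traits. (For intellectual-property reasons I will not post their system or core architecture diagrams here. Most of these companies have since been acquired, and I do not know whether the brands will survive, but let us respect their IP and not reveal too much technical detail.)

These, of course, are the ones whose details are visible. There are others whose details are not easy to see, such as Qualcomm's.

This architectural trend comes mainly from analyzing the underlying 4G system, that is, the basic processing flow of an OFDM system. OFDM lends itself to highly parallel processing, and that is what has steered SDR core architectures in this direction.

Company: Cognovo
    System architecture: HW: MCE (ARM, sequencer, dual VSP, turbo, system RAM, HARQ RAM, RFIF); SW: SDM OS, PHY kernel library
    Tool: kernel SDK, system SDK
    Comment: 128 MAC per cycle for each VSP

Company: Tensilica
    System architecture: HW: multiple ConnX DPUs (BBE16, SSP16, BSP3, Turbo16, PIF), customized ISA and user-defined interface, easy HW integration; SW: kernel library
    Tool: TIE language, XPRES compiler, processor and software developer toolkit
    Comment: 16*3 (BBE16) = 48 MAC per cycle

Company: CEVA
    System architecture: HW: XC321 single core; SW: LTE software kit
    Tool: optimized C compiler, IDE
    Comment: 32 MAC per cycle

Company: Sandbridge
    System architecture: HW: SB3500 (3 SBX nodes, ARM9, HAB bridge, peripherals, DigRF)
    Tool: (none listed)
    Comment: 48 MAC per cycle

I do, after all, have years of engineering background in signal processing for communication systems, plus a shallow acquaintance with chip architecture picked up at a chip design company. So I can appreciate both the twists and the fun in designing an SDR processor, designing its instruction set, and writing its algorithms.

First, consider who actually does this work.

The core of the chip is ultimately built by core designers, who are surely extremely smart people.

The communication system analysis has to come from communication signal-processing people, who are no slouches either.

The two fields are not entirely unrelated, but they are still worlds apart, and only insiders know how deep the water is. Each side has to understand the other's domain, critique it, adjust to it, think it over, and then rethink its own work. This loop repeats again and again, and each generation of SDR processor we see evolves out of it.

Getting this truly right costs a great deal. Take a large company like TI: it has so many experts and masters in every area, yet its share of the DSP chip market keeps shrinking. Then there is the SB3500, which styles itself the originator of multithreaded processor architecture. Its FFT, surprisingly, does not account for scaling between stages. My guess is that their communication experts did not communicate well with the people doing the core, the instruction set, and the algorithm library. Or perhaps there were no communication signal-processing experts at all, just a group who believed they fully understood the system requirements and were amusing themselves.

A DSP core, with its specialized processor structure, really does follow from system analysis and is strongly tied to the system it targets. Different people understand the system differently, so DSPs differ widely in their detailed design. Expecting a compiler to generate highly efficient code automatically therefore strikes me as laughable, and good code portability across different DSPs is also quite hard to achieve. Things like intrinsics already reflect, to a large extent, the compiler masters' effort to reduce the user's porting workload. Even so, too many details still force DSP software engineers to wrestle forever with the specifics of each DSP. Then again, is that not itself a way for DSP designers and DSP software engineers to communicate?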
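To make the FFT scaling complaint above concrete, here is a minimal sketch, assuming a radix-2 decimation-in-time FFT on 16-bit fixed-point data. Halving the butterfly inputs at every stage is one common scheme; it is not necessarily what the SB3500 library should have done, and the names `fixed_point_fft` and `bit_reverse` are made up for illustration. The sketch only tracks the largest value produced, so it shows where a 16-bit register would overflow rather than emulating the overflow itself.

```python
import cmath

Q15_MAX = 32767  # largest value a signed 16-bit register can hold


def bit_reverse(values):
    """Reorder a power-of-two-length list into bit-reversed index order."""
    n = len(values)
    bits = n.bit_length() - 1
    out = [None] * n
    for i, v in enumerate(values):
        out[int(format(i, f"0{bits}b")[::-1], 2)] = v
    return out


def fixed_point_fft(samples, scale_each_stage):
    """Radix-2 DIT FFT on integer (re, im) pairs.

    Returns (spectrum, peak), where peak is the largest absolute value
    written by any butterfly. With scale_each_stage=True the butterfly
    inputs are halved first, so the result is the DFT divided by N.
    """
    n = len(samples)
    data = bit_reverse([(int(re), int(im)) for re, im in samples])
    peak = 0
    half = 1
    while half < n:
        for start in range(0, n, 2 * half):
            for k in range(half):
                # Twiddle factor in Q15. Python integers do not overflow,
                # so 1.0 is kept as 32768 here; real hardware would
                # saturate it to 32767.
                w = cmath.exp(-2j * cmath.pi * k / (2 * half))
                wr, wi = round(w.real * 32768), round(w.imag * 32768)
                ar, ai = data[start + k]
                br, bi = data[start + k + half]
                if scale_each_stage:
                    ar, ai, br, bi = ar >> 1, ai >> 1, br >> 1, bi >> 1
                tr = (br * wr - bi * wi) >> 15
                ti = (br * wi + bi * wr) >> 15
                top = (ar + tr, ai + ti)
                bottom = (ar - tr, ai - ti)
                data[start + k] = top
                data[start + k + half] = bottom
                peak = max(peak, *(abs(v) for v in top + bottom))
        half *= 2
    return data, peak


# A constant input near full scale: all of the energy lands in bin 0.
dc_input = [(20000, 0)] * 64
_, peak_unscaled = fixed_point_fft(dc_input, scale_each_stage=False)
spectrum, peak_scaled = fixed_point_fft(dc_input, scale_each_stage=True)
print("unscaled fits in 16 bits:", peak_unscaled <= Q15_MAX)
print("scaled fits in 16 bits:  ", peak_scaled <= Q15_MAX)
```

Without scaling, the result can grow by up to a factor of two per stage, so a 64-point FFT needs up to 6 extra bits of headroom. Scaling every stage trades that headroom for a little precision, which is exactly the kind of decision the system people and the core people have to make together.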
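Speaking of intrinsics, let me ramble a bit more.

If I say intrinsics are a con, the compiler guys will jump up and curse me out. Heh.

Intrinsics just look a little nicer: a DSP program written with them still appears to follow the rules of C. But for a real DSP, say one with a powerful vector engine, the hope that you write standard C and the compiler turns it into parallel instructions is only a legend. So DSP engineers must learn intrinsics, which are really assembly wrapped in a C-like skin. To write DSP algorithms well, you cannot get by without knowing how many kinds of registers the core has and how many of each, without mastering the instruction set, or without mastering every intrinsic. Expecting the compiler to do all of this for you is a fairy tale.

Intrinsics do have their merits. They wrap the assembly instructions in one layer, so instruction-set upgrades and maintenance become somewhat easier. But with an instruction-set upgrade, the real work stems from why it is being upgraded. Ha, right: the DSP core must have changed, for example from 8 MACs to 16, or even 64 or 128. In that case you will want your existing algorithm code to exploit the extra parallelism, so you still have to use the new intrinsics. Ha, so the problem comes full circle. This is where all DSP engineers struggle, and also where they find their fun: no longer writing assembly, but writing intrinsics with an assembly mindset and assembly technique.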
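A rough sketch of that last point, in plain Python rather than any real DSP toolchain. `vmac` is a hypothetical stand-in for a vector multiply-accumulate intrinsic, not an actual API from any vendor, and `lanes` plays the role of the MAC width. The loop is written the way intrinsic code usually is: the lane count is baked into the loop structure, so a wider core only pays off once the code is rewritten for the new width.

```python
def vmac(acc, a, b):
    """Hypothetical vector MAC: one 'instruction' updates every lane at once."""
    return [x + y * z for x, y, z in zip(acc, a, b)]


def dot_product(a, b, lanes):
    """Dot product in intrinsic style; returns (result, vector MACs issued)."""
    assert len(a) == len(b) and len(a) % lanes == 0
    acc = [0] * lanes
    issued = 0
    for i in range(0, len(a), lanes):
        acc = vmac(acc, a[i:i + lanes], b[i:i + lanes])
        issued += 1
    # Final horizontal reduction across the lanes.
    return sum(acc), issued


a = list(range(1024))
b = [2] * 1024
result_8, issued_8 = dot_product(a, b, lanes=8)
result_16, issued_16 = dot_product(a, b, lanes=16)
print(result_8 == result_16, issued_8, issued_16)
```

The answer is the same either way, but the 16-lane version issues half as many vector MACs. Getting there meant changing the stride, the accumulator shape, and the final reduction, which is the rewriting work that comes back every time the core gets wider.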
    2012-10-15 20:11
The recent news that Tensilica's licensees have shipped 2 billion IP cores is obviously a reason for rejoicing at the company's world-wide command center. So, if you have had occasion to stroll past Tensilica's corporate headquarters and R&D center in Santa Clara, California, recently, you may have wondered at the "For Lease" sign planted firmly in front of the building.

What? How can this be? What is going on? In these days of doom and gloom, it's common to see this sort of thing and assume a worst-case scenario. But fear not, my braves, because there is nothing to fear but fear itself (as my dear old dad used to tell me). A little bird tells me that things are going so swimmingly well at Tensilica that they've been hiring furiously, with the result that the current building is bursting at the seams (I hear dire tales of people working out of conference rooms and crammed into closets). Thus, on 20 October 2012 (just a few days' time as I pen these words), we will be seeing a mass exodus from Tensilica's current building as the little scamps all move down the road into a brand-spanking new facility.

If only I had the time, on moving day I would take a lawn chair and some cold drinks and ensconce myself on the lawn outside the new building and cheer them on their way. Sad to relate, however, I am up to my armpits in alligators fighting fires without a paddle (I never metaphor I didn't like), so I will just have to wish them all the best from the comfort of the pleasure dome (my office).
    2009-2-10 18:09
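At Mobile World Congress in Barcelona, February 16-19, Tensilica will showcase its industry-leading audio and video DSP cores and its next-generation baseband DSP cores for wireless mobile devices and base-station systems. Major system and semiconductor vendors will show products built on Tensilica technology, including designs based on Tensilica configurable processors for 4G/LTE, picocells and femtocells, WiFi, mobile digital audio, mobile digital TV, and baseband communication SoCs.

Demo I: Baseband DSP

Tensilica's Xtensa® customizable dataplane processors are widely used in customers' baseband DSP products. Dataplane processor units (DPUs) are small and flexible and can be customized quickly for the best speed, power, and performance, which makes them ideal building blocks for next-generation high-speed wireless radio. DPUs fit wherever a general-purpose DSP is too slow and custom logic built from RTL blocks lacks the needed flexibility, extends verification requirements, and raises design risk.

iBiquity Digital, one of Tensilica's baseband DSP customers, will demonstrate its low-power, portable terrestrial HD Radio receiver in the form of a mobile phone accessory. iBiquity used Tensilica processors for both its baseband DSP and its audio DSP designs.

Demo II: Leading audio DSP for portable devices

Tensilica's market-proven HiFi audio DSP has been designed into a large number of mobile phones and mobile consumer electronics devices. As the lowest-power licensable audio DSP, with the most complete library of optimized audio application software, the HiFi audio DSP is widely used in mobile and digital TV products by five of the world's top ten semiconductor vendors.

At this year's Mobile World Congress in Barcelona, Tensilica will demonstrate sound-enhancement post-processing software for the HiFi audio DSP from its partners SRS Labs and AM3D.

Demo III: Standard-definition video for portable products

Tensilica's 388VDO DSP is a proven multi-format standard-definition video DSP used in several portable devices. It is fully software programmable and flexibly supports multiple video standards. Tensilica's Xtensa DSPs are tailored for pre-processing mobile phone camera images and for video post-processing that enhances images on LCD displays and mobile phone projectors. With the latest generation of smartphones and the arrival of app stores, firmware upgrades to a device's video codecs are becoming increasingly common. Full software programmability for video gives handset makers a new way to add value and further improve customer satisfaction.

At this year's Mobile World Congress in Barcelona, Tensilica will demonstrate a production video card that uses the 388VDO for programmable multi-format standard-definition video.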
    2009-2-1 14:18
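Carbon Design Systems is a leading supplier of tools for automatically creating, validating, and deploying system-level simulation models. Tensilica and Carbon have now integrated Tensilica processor models into Carbon's SoC Designer platform. With the processors fully integrated into SoC Designer, users can perform accurate architectural analysis and pre-silicon firmware development.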