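Chapter One: Google Deploys a Cunning Formation; the Transformer Bursts Forth and Settles the Realm

A verse says:
On and on the code rolls east like a river, its spray washing the heroes away.
Recurrence and convolution turn to nothing at a glance; the parameters remain, while the sunset reddens time and again.
White-haired scholars upon their chips have long grown used to autumn moons and spring breezes.
Over a pot of coffee they meet with joy, and all the affairs of AI dissolve into laughter and talk.

The story goes: the way of the world is that what is long divided must unite, and what is long united must divide. Ever since the old immortal Hinton unified the martial world with the backpropagation algorithm, the sects of deep learning have stood apart. The Recurrent Gate (RNN) held the territory of text by its secret art of time sequences, and the Convolution Sect (CNN) carved out the lands of images with its spatial mastery. The two contended for decades, each with wins and losses, yet neither could break the deadlock of "long-range forgetting" and "vanishing gradients".

Then one day Google brought out a peerless manual, "Attention Is All You Need". Behold:
The self-attention formation is subtle: a Query in the left hand, a Key sword in the right, and the Value banner slung across the back.
Positional encodings in a thousand layers are arrayed like the stars, and multi-head mechanisms in ten thousand strands are like the Eight Gates Golden Lock formation.
However distant the link between words, a hundred paces apart or more, it is all reckoned clearly in a snap of the fingers.

Once the formation appeared, Elder LSTM of the Recurrent Gate spat three pints of blood: "I have guarded the sequence fortress for thirty years, and never knew that global dependencies could be connected in an instant!"
Master ResNet of the Convolution Sect sighed to heaven: "We stack a hundred layers of convolution and do not match half the clarity of this formation!"
From then on the Transformer ruled the martial world, and history calls it the "architecture revolution".

Chapter Two: GPT Gathers Its Band and Rises Again; Language Models Seize Half the Realm

Now, Sam Altman, chief of the OpenAI clan, observed the might of the Transformer and harbored great ambitions. He gave secret orders to three champions under his command:
Radford mastered the "Heart Sutra of Unsupervised Multitask Learning", gathered a corpus of eight hundred billion, and forged the golden body of GPT-3, whose 175 billion parameters shook the world.
Brown fathomed the "Mysteries of Chain-of-Thought", solving mathematical riddles by "step-by-step deduction" and deciphering the Nine Chapters on the Mathematical Art.
Sutskever laid out the "Human Feedback Reinforcement Formation", so that ChatGPT spoke in lotus blossoms and a million scholars bowed before it.

For a time the language-model faction blazed. BERT, Guardian of the Left, held the encoding fortress; T5, Vanguard of the Right, commanded the translation pass. Yet their foundation lay in the world of text, and they showed flaws whenever they met the laws of physics. A disciple once asked: "Where does water come from?" GPT answered: "Springs well up between the lines." All were in an uproar.

Chapter Three: World Models Raise the Banner of Revolt; Physical Law Battles Illusion

Now the old French marshal Yann LeCun had long seen the hollowness of language models. He raised his arm and cried:
"You see only the phantoms of text. Do you not know that the real world lies in the sensors? We must forge world models and strike straight at the physical source!"
Elon Musk, hierarch of Tesla, rose in answer and displayed his treasure, FSD V12. This device:
Swallows eight million driving videos and emits steering and braking commands
Knows roads under rain, snow, ice, and frost, and tacitly accords with the true canon of Newtonian mechanics
Moreover, Jensen Huang of NVIDIA offered the Omniverse illusion realm, where troops drill in a blend of the real and the virtual

Geoffrey Hinton, one of the three giants of deep learning, clapped and laughed: "The backpropagation I came to understand twenty years ago has at this moment finally proven the great Way!"

Chapter Four: Small Models Cross at Chencang in Secret; the Efficiency Revolution Startles Court and Commons

Just as the giants battled with hundreds of billions of parameters, a surprise force burst forth:
The French hermit Mistral refined the Mixture-of-Experts formation (MoE), its 4.5 billion parameters dancing like a dragon
Microsoft's Order of Tensor Knights grasped the LoRA method, releasing seventy percent of GPU memory in an instant
Stanford's Lightning Gate (FlashAttention) broke the shackles of time and space, tripling computation speed

Hugging Face, guardian of the language-model faction, sighed: "Once it took eight cards in parallel; today one card can drive it. Such is the turning of heaven's wheel!"

Chapter Five: Papers Light the Road Like Stars; the Heroes Chase the Deer for the Throne

Behold the map of today's martial world:

Manual                          Founding master               Signature art
"Attention Is All You Need"     Google's Eight Knights        Self-Attention Cosmos Formation
"Scaling Laws"                  Kaplan                        Compute Power-Law Divination
"Chain-of-Thought"              Google's Hall of Reasoning    Nine-Turn Soul-Returning Chain of Thought
"PaLM-E"                        Google's Machine Pavilion     Embodied-Intelligence Human-Machine Fusion

Further, the rising star DeepMind brought out AlphaTensor, pointing straight at the source of mathematics; MIT grasped a physical reasoning network that gauges gravity from a single frame. The winds rise again in the martial world, and none knows who will take the deer.

Epilogue

This contest of large models had long since startled the Mystic Lady of the Nine Heavens. A four-line prophecy appeared in the clouds:
The illusions of text must come to an end
The road to physical truth is far from over
Efficiency is king, who could have foreseen
Human-machine symbiosis is the great change of ages

Hearing this, some heroes grew thoughtful and some forlorn. Truly: say not that parameters block the view; intelligence lives in the mortal world. To learn what happens next, wait for quantum computing to break through the sky!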
In the late 70s, there was a TV programme called WKRP in Cincinnati. The station manager was the son of the station owner. He mismanaged the station and could not be fired, because, after all, "What mother would fire her own son?" I worked for such a station as a contract engineer. It was smaller, and the manager was younger and less competent than the guy in charge at WKRP.

The station transmitter was located in a riverbed, which provided an excellent ground and no structures within nearly a thousand feet of the tower. The building was made of concrete and contained a Gates BC-1T one-kilowatt transmitter, along with some audio processing and monitoring equipment. The single tower was next to the building, and the matching network was just inside the building. Because of the concern of flooding, there was a large sump pump and a plank forming a tall door jamb at the bottom of the only door. Just as long as the power did not fail, the pump could easily remove any water that seeped through the floor or concrete walls.

One day, I got a call from the owner's son that the station was off the air. I rushed out to the transmitter. When I arrived, I discovered that what had been an open field only days ago was now a large lake with a tower in the middle. I asked him what happened. "I went out to see if the transmitter was OK, and when I opened the door, the transmitter turned off," he said.

He had some waist-high boots for me. I yanked them on, and we went out to the transmitter shack. It was scary walking through two feet of strong river current to the transmitter several hundred feet from the shore. By this time, the water level was below the plank, and we went into the building. The sump pump had removed most of the water. I opened the transmitter and made an assessment. Every transformer was damaged except the modulation transformer, because it was well sealed. The transmitter was dead.
Every minute that a station is off the air means lost revenue, and this particular station was so poorly managed that it was losing money even when it was on the air. I had to work quickly. I decided that I needed to replace the main power transformer (which was 220-6,000V), the intermediate power (IPA) transformer (which was 220-1,000V), and the modulation choke. I contacted a friend who lived nearby and asked if he had any transformers lying around. He said he had a few television power transformers. I said I'd take them. He said that there was a big transformer lying around at an FM station some 40 miles away, but it weighed a ton. I was sure that his statement was hyperbole, but I did not want to take any chances. I got every man who worked at the station into my little Toyota, and we went to the distant station. By the time we arrived, it was dark. The distant FM station had a graveyard of old transmitter parts in the basement. I dug through the parts until I found a big transformer. The trip to the distant station was not bad, but after we loaded the transformer into my trunk, the Toyota did not handle too well. When we got back to the station, the river had acquired a layer of ice. We all worked to get the transformer from my car to the transmitter shack and placed it next to the transmitter. Everybody, except me, went home. I watched them cross the semi-frozen river in the darkness, and then they disappeared at the shore of this strange new lake. I stared into the darkness and heard the eerie cracking of the ice as the water receded beneath it. I suddenly realised that I was alone. I was all alone. I felt like the last living cell in a dying corpse. I closed the door to preserve the heat. When I turned around, I saw directly in front of me a dead transmitter. I wondered if it was really possible to take old discarded parts to reanimate a dead transmitter. But I knew that I had until sunrise to find out. 
I wired the two television transformers in series to get about 800V centre tapped, more or less. I ran some long wires from where the power transformer was originally mounted to the temporary transformer. The next problem was the modulation choke. The modulator used a push-pull amplifier consisting of two 833A power triodes. The balance prevented saturation of the transformer. Running the RF amplifier plate current through the transformer would have saturated the core and reduced the fidelity, so the designers bypassed the RF plate current through a choke. Since the choke was unusable, I ran the RF plate current through the secondary of the modulation transformer. I decided not to run the transmitter at full power until I could replace the modulation choke.

After many hours of work, I stood back and looked at the transmitter. Would it come back to life? Or would it explode? With great trepidation, I reached for the plate switch. I wondered if a giant spark would jump out of the transmitter and zap me. With all the courage that I could muster, I pressed the plate switch. I looked at the meters, and they had proper readings. I screamed, "It's alive! It's aliiive!" I knew that the villagers would have their little pitchforks ready to eat bacon and eggs with their morning news and music as the sun rose over the receding river. I had not disappointed them.

Before going home, I brought the damaged power transformer and the modulation choke to a local motor repair shop. Then I ordered a replacement IPA transformer. Finally, I went home and got some sleep. Within a week, the transmitter was completely repaired and back to full power.

This story was submitted by Frank Karkota for Frankenstein's Fix, a design contest hosted by EE Times (US). Frank Karkota started work in broadcasting in the late 1960s as an engineer and subsequently worked as a contract/consultant broadcast engineer.
From 1968 to 1970, he worked with a team that maintained a tactical troposcatter system in Vietnam. He later operated a small company, ComPol Inc., that manufactured SCA receivers.
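It is well known that the visual system is essential for understanding and reasoning about the compositional properties of visual scenes. The challenges in this field lie in the complex relationships among objects, their positions, ambiguity, and the variation found in real-world environments. As humans, we easily understand and perceive the world through many modalities, including but not limited to vision, language, and sound. Today, with key techniques such as the Transformer, research directions that once seemed independent have become tightly linked, forming the concept of "multimodality".

Versatile: achieved by introducing a flexible prompt engine that accepts points, boxes, scribbles, masks, text, and a referred region of another image.
Compositional: achieved by learning a joint visual-semantic space, so that visual and text prompts can be composed into queries on the fly, as shown in Figure 1.
Interactive: achieved by incorporating learnable memory prompts, which retain dialogue history through mask-guided cross-attention.
Semantic-aware: achieved by using a text encoder to encode text queries and mask labels, giving semantic awareness for open-vocabulary segmentation.

A very-large-scale general visual perception model consists of very large image and text backbone networks plus a multi-task-compatible decoder network. It is pretrained on a large-scale dataset built from massive image and text data and is used to handle many different image and image-text tasks. In addition, knowledge transfer techniques make it possible to deploy small models on the business side.

Challenges faced by very-large-scale general visual perception models:
(1) The number of network parameters is huge, usually over a billion, so training stability, convergence, and overfitting are much harder problems than for small networks.
(2) The raw dataset contains billions of heterogeneous, low-quality images and massive amounts of text. Multi-step training is needed to exploit the heterogeneous multimodal, multi-task data; the pipeline is complex, suffers from catastrophic forgetting, and makes accuracy problems hard to localize.
(3) Experiments are expensive, typically requiring more than a thousand GPUs training in parallel for several weeks, so researchers need sharp analytical skills and a solid knowledge base.
(4) There are many engineering challenges: throughput for massive data, parallel algorithms on large GPU clusters, and memory management for models with very large parameter counts.

Prompt engineering

Most vision datasets consist of images and corresponding text labels. To process vision datasets with vision-language models, some work has used template-based prompt engineering:

    text_descriptions = [f"This is a photo of a {label}" for label in cifar100.classes]
    text_tokens = clip.tokenize(text_descriptions).cuda()

Besides such large vision-language foundation models, some research also aims to develop large foundation models that can be prompted with visual inputs. For example, SAM, recently released by Meta, performs class-agnostic segmentation: given an image and a visual prompt (such as a box, point, or mask), the prompt specifies what to segment in the image. Such models adapt easily to specific downstream tasks such as medical image segmentation, video object segmentation, robotics, and remote sensing.

Across model training, model distribution, and model commercialization, Meitu systematically builds a model ecosystem together with creators and developers:
(1) Model training: provides secondary training capability and ongoing services for creators, including training courses, a community, and model creation contests.
(2) Model distribution: models built jointly by creators and developers can be distributed inside Meitu's products and are continuously optimized during distribution.
(3) Model commercialization: industry customers can use the models commercially through MiracleVision's API and SDK, and creators and developers earn income through commercial partnerships.

Foundation models for general vision-language learning

UNITER: combines generative objectives (for example masked language modeling and masked region modeling) with contrastive objectives (for example image-text matching and word-region alignment), and is suitable for heterogeneous vision-language tasks.
Pix2Seq v2: unifies four core vision tasks into a pixel-to-sequence interface, trained with an encoder-decoder architecture.
Vision-Language: uses pretrained encoder-decoder language models such as BART or T5 to learn different computer vision tasks.

In its overall structure, the model abandons the CNN and transfers the original BERT-style Transformer to classification tasks out of the box; when trained on a large-scale training set it achieves excellent results. Likewise, a model pretrained on a large-scale dataset can outperform CNNs after being transferred to classification tasks on medium or small datasets. The overall structure is shown in the figure below. It uses the original BERT Transformer structure unchanged; the main work is converting the image into something like tokens. The paper introduces the concept of a patch: the image is first divided into patches, and each patch is then mapped to an embedding (the linear projection layer in the figure), which turns the input into a BERT-like input structure. A position embedding is then added (the position here is 1D), and finally a learnable classification token is placed at the front of the sequence. Classification is performed by an MLP.

Here we use RAM to extract semantic tags from the image, feed the tags into Grounding-DINO for open-world detection, and finally use the detections as prompts for SAM to segment everything.

Vision foundation models can currently be grouped roughly into three classes:
textually prompted models, e.g., contrastive, generative, hybrid, and conversational;
visually prompted models, e.g., SAM, SegGPT;
heterogeneous modalities-based models, e.g., ImageBind, Valley.

By simply treating all labels as text, CoCa is pretrained end to end from scratch on web-scale alt-text and annotated images, seamlessly unifying natural language supervision for representation learning. As a result, CoCa achieves state-of-the-art performance on a broad range of downstream tasks with zero-shot transfer or minimal task-specific adaptation, spanning visual recognition (ImageNet, Kinetics-400/600/700, Moments-in-Time), cross-modal retrieval (MSCOCO, Flickr30K, MSR-VTT), multimodal understanding (VQA, SNLI-VE, NLVR2), and image captioning (MSCOCO, NoCaps). On ImageNet classification, CoCa obtains 86.3% zero-shot top-1 accuracy, 90.6% with a frozen encoder and a fine-tuned classifier, and 91.0% with a fine-tuned encoder.

Many large-scale models covering NLP, CV, and multimodality have been released in China and abroad, but their practical deployment still needs further exploration. Among the better-deployed examples, Huawei's Pangu is used in the power grid and in finance; BAAI's Wudao series is widely used for poetry, images, and text, for instance helping students write from pictures or generating illustrations from text; and Baidu's Wenxin has also announced applications in finance. Even so, the practical use of large models is not yet ideal. The original aim of large models was to replace a pile of workshop-style small models, each trained for a different task, with one pretrained large model, and then, through techniques such as distillation and knowledge transfer, to exceed the original small models' performance on small models using only a small amount of data.

One difficulty in applying large CV models is connecting them with real applications. The vision-related deep learning models most used in society today are mainly object detection, face recognition, and (to some extent) defect detection, which is far less usage than NLP models see in practice. Combining CV models with real production to discover more application scenarios is therefore critical. The other difficulty is how to use distillation and knowledge transfer quickly and efficiently to improve downstream task performance. Both problems urgently need solving for large CV models to be used in practice.

In summary, applying large models to higher-resolution downstream vision tasks brings the following benefits: stronger perception, better localization accuracy, better semantic understanding, better detail preservation and edge sharpness, greater robustness and generalization, and a push to research progress. These benefits let large models produce more accurate, more detailed, and more realistic results on high-resolution images. As deep learning and computing resources continue to develop, we can expect more advanced large models and related techniques that further advance computer vision on high-resolution image tasks.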
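As a concrete illustration of the patch step in the ViT-style model described earlier in this section (divide the image into patches, then map each patch to an embedding), here is a minimal Go sketch. The function name `patchify`, the single-channel image, and the 4×4 toy input are assumptions made for illustration only; the linear projection and the position embedding are left out.

```go
package main

import "fmt"

// patchify splits an H×W single-channel image into non-overlapping p×p
// patches and flattens each patch row by row. Rows or columns that do not
// fill a whole patch are dropped (an assumption of this sketch).
func patchify(img [][]float64, p int) [][]float64 {
	h, w := len(img), len(img[0])
	patches := make([][]float64, 0, (h/p)*(w/p))
	for r := 0; r+p <= h; r += p {
		for c := 0; c+p <= w; c += p {
			patch := make([]float64, 0, p*p)
			for dr := 0; dr < p; dr++ {
				patch = append(patch, img[r+dr][c:c+p]...)
			}
			patches = append(patches, patch)
		}
	}
	return patches
}

func main() {
	img := [][]float64{
		{0, 1, 2, 3},
		{4, 5, 6, 7},
		{8, 9, 10, 11},
		{12, 13, 14, 15},
	}
	patches := patchify(img, 2)
	// One extra slot is reserved for the learnable classification token
	// that is placed at the front of the sequence.
	fmt.Println("patches:", len(patches), "sequence length:", len(patches)+1)
	fmt.Println("first patch:", patches[0])
}
```

In a real model each flattened patch would next be multiplied by a learned projection matrix; that step is omitted here because the source gives no dimensions for it.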
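Autonomous driving is a safety-critical application that needs high-performance, highly reliable deep learning models, and the Vision Transformer is an ideal choice. Mainstream perception algorithms for autonomous driving now almost all use Vision Transformer techniques, for example in segmentation, 2D/3D detection, and the recently popular large models (such as SAM); Vision Transformers are being deployed throughout the autonomous driving field. On the other hand, in interviews for autonomous driving or image processing algorithm positions, the Vision Transformer is a standard topic: candidates need a deep understanding of the theory and real project experience with the technique.

The Transformer comes from the paper "Attention Is All You Need", published by Google in 2017. It was first used for machine translation, with very good results. Since then the Transformer has shone not only in NLP but has also performed well in CV, RS, and other fields. The year 2020 in particular can fairly be called the first year of the Transformer era: in CV, for example, Transformer-based models topped many leaderboards and outperformed CNN-based models. Why does the Transformer perform so well? What is its principle? What are the keys to its success? This article briefly answers these questions.

The Transformer was originally used for machine translation, where the input is a sentence in one language and the output is a sentence in another language.

    // Sizes of nil values of several types (values shown are for a 64-bit platform).
    package main

    import (
        "fmt"
        "unsafe"
    )

    func main() {
        var i *int = nil
        fmt.Println("i.size:", unsafe.Sizeof(i)) // 8
        var i8 *int8 = nil
        fmt.Println("i8.size:", unsafe.Sizeof(i8)) // 8
        var s *string = nil
        fmt.Println("s.size:", unsafe.Sizeof(s)) // 8
        var ps *struct{} = nil
        fmt.Println("ps.size:", unsafe.Sizeof(ps)) // 8
        var si []int = nil
        fmt.Println("si.size:", unsafe.Sizeof(si)) // 24
        var ii interface{} = nil
        fmt.Println("ii.size:", unsafe.Sizeof(ii)) // 16
    }

We explain the generation process with the example of translating "我, 爱, 机器, 学习" into <bos>, i, love, machine, learning, <eos>.

Training: embed "我/爱/机器/学习" and feed it into the encoder. The last encoder layer produces outputs of shape [10, 512] (assuming an embedding length of 512 and batch size = 1). These outputs, multiplied by new parameter matrices, serve as the K and V used in every decoder layer. Then:
Feed <bos> as the initial decoder input, and compute the cross-entropy error between the decoder's highest-probability output A1 and "i".
Feed <bos>, "i" as the decoder input, and compute the cross-entropy error between the highest-probability output A2 and "love".
Feed <bos>, "i", "love" as the decoder input, and compute the cross-entropy error between the highest-probability output A3 and "machine".
Feed <bos>, "i", "love", "machine" as the decoder input, and compute the cross-entropy error between the highest-probability output A4 and "learning".
Feed <bos>, "i", "love", "machine", "learning" as the decoder input, and compute the cross-entropy error between the highest-probability output A5 and the end token.

How is this done in parallel? A mask matrix, here called the seq mask, is used. When the decoder encodes the target sequence, the mask hides from each word being generated the information of the words that come after it.

    // Which modifications to a slice parameter are visible to the caller.
    package main

    import "fmt"

    func main() {
        s := []string{"a", "b", "c"}
        fmt.Println("s:origin", s)
        changes1(s)
        fmt.Println("s:f1", s)
        changes2(s)
        fmt.Println("s:f2", s)
        changes3(s)
        fmt.Println("s:f3", s)
    }

    // changes1 rebinds the local slice header only; the caller's slice is unchanged.
    func changes1(s []string) {
        var tmp = []string{"x", "y", "z"}
        s = tmp
    }

    func changes2(s []string) {
        // item is only a copy, so assigning to it cannot change the elements of s
        for i, item := range s {
            item = "d"
            fmt.Printf("item=%s;s[%d]=%s\n", item, i, s[i])
        }
    }

    // changes3 writes through the index, so the caller sees the change.
    func changes3(s []string) {
        for i := range s {
            s[i] = "d"
        }
    }

First, we create three vectors for each input vector (that is, each word vector), called Query, Key, and Value. How are they created? We multiply the input word vector by three matrices to obtain the Q, K, and V vectors; the parameters of these three matrices are learned during training. Note that the Q, K, and V vectors have the same dimension as one another, but that dimension can be smaller than the input word vector's, for example 64. This is not strictly necessary; it is chosen mainly to stay consistent with the Multi-head attention mechanism used later (with 8-head attention, each head handles 512/8 = 64 dimensions, so the Q, K, and V vectors concatenated across the heads match the input word vector's dimension). We assume the input sequence is the English phrase "Thinking Machines".

To understand the Attention mechanism in depth, it helps to know the background in which it arose, the class of problems that produced it, and the problem it was first meant to solve. Start with a review of how models evolved in machine translation. Machine translation entered the neural machine translation era with RNNs, and the important stages are: Simple RNN, Contextualized RNN, Contextualized RNN with attention, and Transformer (2017), introduced one by one below.

"Simple RNN": in this encoder-decoder structure, the encoder compresses the entire source sequence (regardless of length) into a single vector (the encoder output). The only link between the source information and the decoder is that the encoder output is used as the input to the decoder's initial states. The obvious problem is that as the decoder length grows, the information from the encoder output decays.

    // A channel passed to a function refers to the same underlying channel.
    package main

    import "fmt"

    func main() {
        var c = make(chan int)
        fmt.Printf("c.pointer=%p\n", c) // e.g. c.pointer=0xc000022180
        go func() {
            c <- 1
            addChannel(c)
            close(c)
        }()
        for item := range c {
            // item: 1
            // item: 2
            fmt.Println("item:", item)
        }
    }

    func addChannel(done chan int) {
        done <- 2
        fmt.Printf("done.pointer=%p\n", done) // same address as c, e.g. done.pointer=0xc000022180
    }

Test: when testing the model, the decoder has no labels and outputs one word at a time autoregressively. The Chinese sentence to be translated is fed into the encoder in parallel as usual (the same as in training) to obtain the embedding of each word. The decoder is first given the vocabulary id of bos and produces the first translated word; it then continues autoregressively, looping until the prediction reaches the eos stop token.

    // Excerpt from Go's package reflect (deepequal.go). It relies on unexported
    // identifiers of that package and does not compile on its own.
    type visit struct {
        a1  unsafe.Pointer
        a2  unsafe.Pointer
        typ Type
    }

    func deepValueEqual(v1, v2 Value, visited map[visit]bool) bool {
        if !v1.IsValid() || !v2.IsValid() {
            return v1.IsValid() == v2.IsValid()
        }
        if v1.Type() != v2.Type() {
            return false
        }

        // We want to avoid putting more in the visited map than we need to.
        // For any possible reference cycle that might be encountered,
        // hard(v1, v2) needs to return true for at least one of the types in the cycle,
        // and it's safe and valid to get Value's internal pointer.
        hard := func(v1, v2 Value) bool {
            switch v1.Kind() {
            case Pointer:
                if v1.typ.ptrdata == 0 {
                    // not-in-heap pointers can't be cyclic.
                    // At least, all of our current uses of runtime/internal/sys.NotInHeap
                    // have that property. The runtime ones aren't cyclic (and we don't use
                    // DeepEqual on them anyway), and the cgo-generated ones are
                    // all empty structs.
                    return false
                }
                fallthrough
            case Map, Slice, Interface:
                // Nil pointers cannot be cyclic. Avoid putting them in the visited map.
                return !v1.IsNil() && !v2.IsNil()
            }
            return false
        }

        if hard(v1, v2) {
            // For a Pointer or Map value, we need to check flagIndir,
            // which we do by calling the pointer method.
            // For Slice or Interface, flagIndir is always set,
            // and using v.ptr suffices.
            ptrval := func(v Value) unsafe.Pointer {
                switch v.Kind() {
                case Pointer, Map:
                    return v.pointer()
                default:
                    return v.ptr
                }
            }
            addr1 := ptrval(v1)
            addr2 := ptrval(v2)
            if uintptr(addr1) > uintptr(addr2) {
                // Canonicalize order to reduce number of entries in visited.
                // Assumes non-moving garbage collector.
                addr1, addr2 = addr2, addr1
            }

            // Short circuit if references are already seen.
            typ := v1.Type()
            v := visit{addr1, addr2, typ}
            if visited[v] {
                return true
            }

            // Remember for later.
            visited[v] = true
        }

        switch v1.Kind() {
        case Array:
            for i := 0; i < v1.Len(); i++ {
                if !deepValueEqual(v1.Index(i), v2.Index(i), visited) {
                    return false
                }
            }
            return true
        case Slice:
            if v1.IsNil() != v2.IsNil() {
                return false
            }
            if v1.Len() != v2.Len() {
                return false
            }
            if v1.UnsafePointer() == v2.UnsafePointer() {
                return true
            }
            // Special case for []byte, which is common.
            if v1.Type().Elem().Kind() == Uint8 {
                return bytealg.Equal(v1.Bytes(), v2.Bytes())
            }
            for i := 0; i < v1.Len(); i++ {
                if !deepValueEqual(v1.Index(i), v2.Index(i), visited) {
                    return false
                }
            }
            return true
        case Interface:
            if v1.IsNil() || v2.IsNil() {
                return v1.IsNil() == v2.IsNil()
            }
            return deepValueEqual(v1.Elem(), v2.Elem(), visited)
        case Pointer:
            if v1.UnsafePointer() == v2.UnsafePointer() {
                return true
            }
            return deepValueEqual(v1.Elem(), v2.Elem(), visited)
        case Struct:
            for i, n := 0, v1.NumField(); i < n; i++ {
                if !deepValueEqual(v1.Field(i), v2.Field(i), visited) {
                    return false
                }
            }
            return true
        case Map:
            if v1.IsNil() != v2.IsNil() {
                return false
            }
            if v1.Len() != v2.Len() {
                return false
            }
            if v1.UnsafePointer() == v2.UnsafePointer() {
                return true
            }
            for _, k := range v1.MapKeys() {
                val1 := v1.MapIndex(k)
                val2 := v2.MapIndex(k)
                if !val1.IsValid() || !val2.IsValid() || !deepValueEqual(val1, val2, visited) {
                    return false
                }
            }
            return true
        case Func:
            if v1.IsNil() && v2.IsNil() {
                return true
            }
            // Can't do better than this:
            return false
        case Int, Int8, Int16, Int32, Int64:
            return v1.Int() == v2.Int()
        case Uint, Uint8, Uint16, Uint32, Uint64, Uintptr:
            return v1.Uint() == v2.Uint()
        case String:
            return v1.String() == v2.String()
        case Bool:
            return v1.Bool() == v2.Bool()
        case Float32, Float64:
            return v1.Float() == v2.Float()
        case Complex64, Complex128:
            return v1.Complex() == v2.Complex()
        default:
            // Normal equality suffices
            return valueInterface(v1, false) == valueInterface(v2, false)
        }
    }

This is the overall computation flow of the encoder. The Transformer stacks several such encoders, simply connecting each one's output to the next one's input, which is standard practice. Finally, a code implementation of the Transformer is attached; interested readers can follow along and reproduce the Transformer model's code themselves.

    package main

    import (
        "log"
        "sync"
    )

    func init() {
        log.SetFlags(log.Lshortfile)
    }

    func main() {
        lock := sync.Mutex{}
        // Added in Go 1.18: a non-blocking attempt to acquire the lock.
        // TryLock() simply returns true or false to report whether the lock
        // was acquired. Use it when failing to get the lock should not stop
        // execution and the code can go on to other logic instead.
        log.Println("TryLock:", lock.TryLock())
        // The mutex is already held via TryLock(), so it cannot be locked again:
        // a second TryLock() returns false, and a blocking lock.Lock() here
        // would deadlock.
        log.Println("TryLock again:", lock.TryLock())
        lock.Unlock()
    }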
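The Q, K, V computation and the sequence mask described above can be sketched in a few lines of Go. This is a minimal single-head illustration under stated assumptions, not a full Transformer: the function names `softmax` and `causalAttention` and the toy input values are invented here, the Q, K, and V rows are assumed to be already projected, and the mask is applied by setting the scores of later positions to negative infinity before the softmax.

```go
package main

import (
	"fmt"
	"math"
)

// softmax returns a numerically stable softmax of xs.
func softmax(xs []float64) []float64 {
	maxv := math.Inf(-1)
	for _, x := range xs {
		if x > maxv {
			maxv = x
		}
	}
	out := make([]float64, len(xs))
	sum := 0.0
	for i, x := range xs {
		out[i] = math.Exp(x - maxv)
		sum += out[i]
	}
	for i := range out {
		out[i] /= sum
	}
	return out
}

// causalAttention computes softmax(Q·K^T / sqrt(d) + mask)·V for one head.
// The mask hides every position j > i from position i, which is the role
// the text assigns to the seq mask.
func causalAttention(q, k, v [][]float64) [][]float64 {
	d := float64(len(q[0]))
	out := make([][]float64, len(q))
	for i := range q {
		scores := make([]float64, len(k))
		for j := range k {
			if j > i {
				scores[j] = math.Inf(-1) // masked: a later token is invisible
				continue
			}
			dot := 0.0
			for t := range q[i] {
				dot += q[i][t] * k[j][t]
			}
			scores[j] = dot / math.Sqrt(d)
		}
		weights := softmax(scores)
		out[i] = make([]float64, len(v[0]))
		for j, w := range weights {
			for t := range v[j] {
				out[i][t] += w * v[j][t]
			}
		}
	}
	return out
}

func main() {
	// Toy inputs: 3 tokens, dimension 2 (values are illustrative only).
	q := [][]float64{{1, 0}, {0, 1}, {1, 1}}
	k := [][]float64{{1, 0}, {0, 1}, {1, 1}}
	v := [][]float64{{1, 2}, {3, 4}, {5, 6}}
	for i, row := range causalAttention(q, k, v) {
		fmt.Printf("position %d -> %.3f\n", i, row)
	}
}
```

Because the first position can only attend to itself, its output equals the first row of V, which is a quick sanity check that the mask works. Multi-head attention would run this routine once per head on 64-dimensional slices and concatenate the results.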
