2015-3-27 14:56
The "Things" in our world are becoming increasingly complicated, and the failure modes a whole lot more interesting ("interesting" as in the old curse, "may you live in interesting times"). In the world of IoT, what is fixing broken things going to be like? Let's look at where we are, and extrapolate. My "check engine soon" light came on when I was driving home from the (first) visit to the auto shop (an oxygen sensor problem, supposedly fixed). I took it back so they could look at it again and reset it -- it came on again on the way home from the second trip, as I'm writing this, so it goes back for a third visit now. This reminds me of my high-tech, high-efficiency, computerized heating/air-conditioning unit, which has been down for almost a month, with six tech visits so far, interleaved with ordering parts from the manufacturer. It's still not working. On the sixth visit they brought in an expert on this unit, and the manager is coming out with him for visit seven after ordering enough parts to rebuild the unit if necessary, and the manager swears he won't leave until we're happy. OK, we have an extra bedroom, I'm ready. But as makers of complex stuff, how can we all do our part to keep this sort of thing from happening with our own products? Well, education is important, but it has to be the right education. I didn't think any of the techs who worked on my A/C were incompetent, but they were under time pressure to get in and get out, didn't want to install new parts when old ones were fine, and made a couple of basic debug process errors. I'm sure that all these guys know lots more than I ever will about A/C, but some somewhat-understandable (and a few "huh?") mistakes were made that led to extra trips. The first was that the first six visits were by different techs, a scheduling process failure. This was a hard failure, the best kind when identifying a problem. What if the problem was intermittent, or had only shown up when the unit was controlled over the Internet or by the power company's mesh radio network, and the problem involved interaction among networking, security, radios, and software? And maybe even malware? How many trips would that take? The length of time this is taking brings attention to the required practical debugging skills needed to minimize fix-time for low-to-medium-tech electronics-controlled devices. Those skills aren't ubiquitous. And worse, even high-efficiency residential heat pumps are relatively simple, compared to networked devices with multiple processors, multiple radios, a few million (or tens of million) lines of code, and chips the size of your fingernail that are so dense that if you de-cap them they don't even look interesting any more like chips used to. (The Museum of Modern Art in NYC once had an exhibit of blow-ups of complex chips -- they were exquisite modern art, many of them signed by their artists right there on the silicon). So now, let's talk about IoT -- WHO'S GOING TO FIX THIS SMART STUFF WHEN IT BREAKS? Think about what fixing broken IoT devices will be like. The more responsibility you give to electronics, the more annoying a failure can be -- with great power comes great responsibility, as Spiderman says. Downtime can affect lives in annoying and even profound ways. Defect opportunities increase exponentially with product complexity, and while increasing reliability of individual components and good modularity helps offset that, exponentials are hard to beat in the fullness of time. 
And many of these babies are complex, and connected to other things that are complex. What new skills will be needed to diagnose and fix them? We need to be thinking about *which* fundamentals and *which* debugging skills are needed in a world of IoT, and how to train people on them. And the designers of IoT products have to design them with service in mind. If the field-replaceable unit is cheap, fine, put in a new one -- but what if it's built into your house? Or your factory? Or it's your refrigerator? Or the problem is somewhere else in the network?

It's a simple fact that some field techs won't know Ohm's Law or the difference between WPA-PSK and WEP, that malware may be part of a problem, that most techs can't pull out a protocol analyzer and diagnose a failed handshake on an encrypted link, and so on. So they might have little choice but to start replacing components until the problem goes away, despite the cost and repair time -- when in fact the root cause might be a mis-typed port number or password, or that the router vendor once shipped software revs with DLNA turned off and THIS customer happens to be running one of them (or the owner, aware of DLNA's security problems, turned it off manually). Sometimes the problem gets fixed totally by accident, without anyone ever learning what it was so it can be kept from happening again.

I don't know about you, but I don't like that future much. Come to think of it, I'm living in it now, and it's getting pretty hot in the house.

The spread of IoT devices into society will go much slower than the people who sell them would like unless we think ahead to the support structure it's going to take to fix these things when they break. Otherwise, the backlash from field problems will slow adoption of new tech for everyone (fear is a great demotivator). How do you quickly fix your radio-controlled, Internet-connected door lock product that has trapped someone in their garage? Or help someone when your Internet-connected smart refrigerator sometimes forgets to order milk? Think ahead to how your product support and a tech will handle such problems. Remote diagnosis, self-diagnosis, mail-in programs, and online self-diagnosis charts to reduce truck rolls are certainly requirements, when they work, but they won't always work (bad power supply, loose connection, interfering radios next door, ...). And never forget security -- if you make bypassing security easy for techs ("For the super-user password, see p. 2 of the service manual"), you might make it easy for the bad guys. (More venting on that subject another day!)

The more complicated things are, the more skilled the diagnostic/repair people have to be, and/or the easier it has to be to service a failure. A big part of the fix is to teach fundamental skills well, so techs have the basics of how things actually work to fall back on when they're figuring out something they've never seen before -- not just "how to fix device x", but "basic things that anything that does what x does MUST do right". And you have to build escalation into the service process -- you can't send Superman to open every hard-to-open jar of pickles, because there's only one Superman, there are a lot of pickle jars, and maybe he has other, more pressing jobs. You need to know how and when to escalate, and how to do it fast.

So what does the IoT equivalent of teaching basic physics like refrigeration cycles and Ohm's Law to mid-level techs look like?
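Part of the answer is surely the ability to run, and read, a basic network triage. As a purely illustrative sketch (the addresses, host name, and port below are hypothetical, not tied to any real product), something like this, run from a tech's laptop or by the device itself, turns "it won't connect" into a specific, reportable fact:

import socket
import subprocess
import sys

GATEWAY = "192.168.1.1"           # hypothetical home-router address
CLOUD_HOST = "iot.example.com"    # hypothetical vendor service host
CLOUD_PORT = 8883                 # hypothetical service port (MQTT over TLS is a common choice)

def ping_ok(host):
    # One ICMP echo via the system 'ping' (flags shown are the Linux form).
    return subprocess.call(["ping", "-c", "1", host],
                           stdout=subprocess.DEVNULL,
                           stderr=subprocess.DEVNULL) == 0

def dns_ok(name):
    # Does the name resolve at all?
    try:
        socket.getaddrinfo(name, None)
        return True
    except socket.gaierror:
        return False

def port_ok(host, port, timeout=3.0):
    # Does a plain TCP connection to host:port succeed?
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    checks = [
        ("gateway reachable", lambda: ping_ok(GATEWAY)),
        ("vendor host resolves", lambda: dns_ok(CLOUD_HOST)),
        ("vendor port connects", lambda: port_ok(CLOUD_HOST, CLOUD_PORT)),
    ]
    all_ok = True
    for label, check in checks:
        ok = check()
        print("%-22s %s" % (label, "OK" if ok else "FAIL"))
        all_ok = all_ok and ok
    sys.exit(0 if all_ok else 1)

A script like this won't find a bad power supply, but it will catch the mis-typed port number before anyone starts swapping boards, and "DNS resolves but the service port refuses connections" is the kind of fact a second-level engineer can act on over the phone.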
Whatever the full answer looks like, it certainly involves a more-than-cursory understanding of software, security, and networking -- far beyond just knowing about specific implementations (Linux, IP, ...), though that too. What else?

And what does a serviceable IoT device look like? Perhaps it exposes test points that can quickly explain why it's not talking (LEDs and test points are cheap; phone calls and truck rolls are expensive). When it can talk, it self-diagnoses and tells the tech where it hurts, in language that a tech with basic skills can understand even without knowing that particular device. It calls in sick, if it can -- maybe you can even fix it before the failure affects the customer. It provides the key information that makes it easy to decide when to escalate to the next level up in the service pyramid. It lets authorized parties diagnose it, ideally remotely, without becoming insecure. What else?

Postscript

Though we're still offline, the A/C problem has been tentatively traced to a power supply that failed in a way that over-voltaged multiple FRUs. Although each part had been replaced with a known-good unit, with no resulting change in behavior, the entire set of failed units was never replaced with known-good units *at the same time*. The techs were following common practice (modulo a couple of goofs), trying to isolate and correct a single failure at the lowest cost. If the current diagnosis is correct, then instead of seven visits the repair would optimally have taken four if the parts weren't stocked locally, or two if they were and the expert had made the second visit with parts in hand. From my selfish viewpoint, 1-2 weeks instead of 3-4.

Steve Bunch is a CS/EE software architect with experience in operating systems, system architecture, networking, radio, and software security. He was responsible for the creation of the embedded software platform used in several hundred million cellphones, and is currently involved in a startup operating in the networking space.