Debugging and troubleshooting are major parts of the engineer's world. I've always regarded them as the most difficult of engineering challenges because they take a combination of factors: technical expertise, experience and practice, lessons learned from mistakes, the ability to think in both big-picture and detail modes, a willingness to question everything, the ability to think outside the box, a dose of luck, and more.
Troubleshooting a once-working product is a humbling, constant reminder that even "simple" can be tricky. Obviously, some troubleshooting challenges are more difficult than others. (I define "troubleshooting" as finding out what's wrong with something that has worked well and been in the field for a while, while "debugging" is trying to get a prototype or pilot-run unit working to spec.) Intermittent problems are often the worst, not only because they are hard to track down, but also because you are often not sure that you really found the source: did you fix it, or did the problem just go away for a while on its own?
I was reminded of this lesson when my flashlight started acting strange. It's a quality Maglite unit from Mag Instrument Inc., not some cheap throwaway, and it became frustratingly intermittent. The incandescent bulb would flicker, or go off and then back on, when jostled or even when just resting on the counter. I figured this would be an easy problem to resolve. After all, it's just a flashlight: it has no software, no active components, nothing except a bulb, an alternate-action sealed switch, and two batteries.
I checked for corrosion of the contacts (there was none), and even rigged up a separate circuit so I could check the bulb for an internal intermittent by tapping the glass enclosure (it was solid). I tested the sealed on/off switch with a continuity checker and that was fine. My summary: every component was fine, but the assembled product was not. Adding to the confusion was that at one point, I thought the unit was off, but it wasn't.
Later, it came back on by itself and drained the batteries, so I had the added problem of now-dead batteries, which compounded the confusion while I was collecting my data and evidence. Still, I was determined not to be defeated by a mere flashlight.
Long story short: I looked at the end cap that screws onto the body (see Figure), which is the current path from the negative side of the battery to the body and then up to the switch. It seemed to screw down solidly, but I had to check everything. I cleaned the machined external threads of the end cap and the internal threads of the body, sprinkled some powdered graphite onto them both to enhance conductivity and to "grease" them for smoother torque-down, and felt a nice, solid "thunk" as the end cap seated on the body when I threaded it on. (The instructor at the Loctite mini-course I took on threaded fasteners emphasized that dirt on threads not only results in a poor final assembly, it also fools you into thinking the nut is on tight because it increases the apparent torque.)
That did it. The flashlight’s performance was now steady, no intermittent behavior, and the bulb was also brighter. Problem solved—and the diagnosis was obvious, in retrospect: the screw-on end cap was not making good contact. That's really no surprise, since so many intermittents in electronic products are due to mechanical issues. In fact, the first rules of troubleshooting are to check the power source and all connections, and I should have known better!
While there are a lot of case studies on debugging and troubleshooting, there isn't much technical literature about them as a discipline, and it's easy to understand why: the subject is tough to get your hands and mind around. The only resource I know of is the excellent book Debugging: The 9 Indispensable Rules for Finding Even the Most Elusive Software and Hardware Problems by David Agans. I strongly recommend you take a look at it, even if you are an experienced engineer. (Don't be put off because it is published by the American Management Association; it's a down-in-the-trenches book, not some high-level abstract treatise.)
Have you ever had a simple product with a vexing problem that took a significant amount of time and mental energy to troubleshoot? Did you feel relieved, foolish, or some other emotion when you finally found and resolved the problem—that is, assuming you did?