2014-9-29 18:13
We are all familiar with the saying "the devil is in the details." All too often those details are hidden deep within a datasheet, where you can easily overlook them. When a datasheet reference circuit is copied into a product, the designer must still be fully aware of how the circuit functions and anticipate unexpected problems that might arise from slight deviations.

Take a recent case of an LT1640 hot-swap controller IC, often used in a hot-plug telecom fan tray. I was asked to reverse-engineer this one so our technicians would know how to power it on the bench without using a chassis. Nothing complicated about it, just the usual slow turn-on of a pass MOSFET in series with the load, thereby slowing the dV/dt and limiting the inrush current to the load input-filter capacitors.

After drawing some schematics, I connected it to my -48 V power supply and a resistive-only load, hit it a few times with a grabber clip at -48 V to emulate true metallic contact bounce, and saw the nasty little surprise shown in Figure 1.

Figure 1: What caused this negative-going glitch in a hot-swap circuit?

For a circuit whose main purpose is to prevent sudden surges upon power-up, this one failed miserably. Now what? Well, maybe that's why the customer sent this unit in for repair. An inrush-current power-suckout on hot-insertion can cause a momentary voltage sag that results in an entire system reset. I could easily imagine how a technician plugged in this fan tray and the entire shelf came crashing down. There must be a problem with the fan tray, right?

Unfortunately, because we engineers are such experts in fault-fixing, our esteemed-and-mighty management does not require our customers to include such mundane details as actually describing the failure mode of whatever they send in for repair. So, we were forced to guess.

A close-up of the premature MOSFET turn-on is shown in Figure 2.
On power-up the series-pass MOSFET conducts for 800 µs, plenty of time to wreak havoc on the rest of the system.

Figure 2: A MOSFET conducts for 800 µs (low-going part of blue trace). That's enough to cause a system reset.

It so happened that this card included an identical and totally isolated slow-start circuit for the usual redundant second -48 V supply. It too failed in exactly the same way.

Figure 3 shows the recommended slow-start circuit copied from the Linear Technology LT1640 Hot Swap Controller datasheet.

Figure 3: The hot-swap circuit published by Linear Technology in its LT1640 datasheet.

No discharge path

On startup, after the undervoltage (UV) input is satisfied, capacitor C1 is slowly charged by a 45 µA current source from the LT1640. Any event that requires turning off the pass MOSFET Q1 causes the LT1640 GATE pin to discharge C1 and the MOSFET CGS with a 50 mA current sink. This is clearly explained in the datasheet electrical specifications. The customer's unit included a small capacitor between the UV pin and VEE, as suggested in the datasheet.

Note that there is no discharge path for C1 other than the LT1640 gate current sink. When the fan tray is removed from the shelf, it loses VDD and the LT1640 can no longer sink current. With sufficient capacitance at VDD and C1 at 150 nF, this is not a problem: the LT1640 should discharge C1 in its last dying gasp (unfortunately, this aspect is not discussed in the datasheet). So, the original designer of the customer's product included only an EMI input filter with minimal capacitance.

To verify my failure-mode assumption, I measured the Q1 gate-source voltage before and after power-up (Figure 4). Note that the oscilloscope ground is now moved to the MOSFET source, because I really hate trying to think in terms of negative voltages.

Figure 4: Because C1 doesn't discharge after initial power-up, it caused the glitch in Figure 1.

Just as I thought.
After a first power-up, C1 and the MOSFET gate remain charged when the fan tray is unplugged from the shelf. So on the next power-up, C1 is still charged and the MOSFET is still conducting. C1 should have discharged. That's exactly what the LT1640 is trying to do: discharge C1 at the new power-up, right?

But take a close look at the time: about 1.5 ms to discharge with a 50 mA current sink. Seems kind of long for a 150 nF C1, doesn't it? Some back-of-the-napkin calculation (I = C dV/dt) indicates C1 is more like 10 µF.

My bench DMM indicated that the customer's C1 was about 8.5 µF, measured in-circuit. I didn't trust that reading because the DMM measures capacitance at 1 V and could be biasing some junctions on in the LT1640, giving a false reading. I mean, really, why would a design engineer stick a 10 µF capacitor where the datasheet reference circuit called for 0.15 µF?

Pulling out the heavy artillery, I drove the DUT C1 in-circuit through a 1 kΩ resistor with a ±400 mV square wave from a function generator. That voltage is low enough not to bias any silicon junctions in the LT1640, and it displays a convenient eight divisions peak-to-peak on the oscilloscope scale. With this trick, the five-division point on the rising edge is 5/8 = 62.5% of the full swing, close enough to the standard 63% that marks one RC time constant. The cursors mark 8 ms for the rise to five divisions. Knowing the resistance, some more back-of-the-napkin calculation (C = t/R) makes C1 about 8 µF, just as determined before.

Yes, the design engineer wanted a slower MOSFET turn-on time and had inserted a 10 µF, ±20% capacitor into a circuit that called for 0.15 µF, completely unaware of the problem it would cause. The datasheet didn't warn against this, nor did it warn of the possibility that the MOSFET gate could remain charged when the unit was unplugged from the system.
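For readers who want to check the napkin math, here is a minimal Python sketch of the two capacitance estimates. The currents, times, and resistance come from the measurements described above. The gate voltage swing `GATE_SWING_V` is an assumption (the article does not state it; roughly 10 V is used here purely for illustration), and the function names are hypothetical. The model treats C1 as an ideal capacitor and ignores CGS and the rest of the gate network.

```python
import math

def cap_from_constant_current(i_amps, dt_s, dv_volts):
    """C = I * dt / dV, rearranged from I = C dV/dt."""
    return i_amps * dt_s / dv_volts

def cap_from_rc_rise(r_ohms, t_s, fraction):
    """C from the time an RC step response takes to reach `fraction`
    of its final value: t = -R * C * ln(1 - fraction)."""
    return t_s / (r_ohms * -math.log(1.0 - fraction))

def ramp_time(c_farads, dv_volts, i_amps):
    """Time for a constant current to move a capacitor by dv_volts."""
    return c_farads * dv_volts / i_amps

# ASSUMPTION: gate voltage swing, not given in the article.
GATE_SWING_V = 10.0

# Estimate 1: 50 mA sink took about 1.5 ms to discharge C1.
c_sink = cap_from_constant_current(50e-3, 1.5e-3, GATE_SWING_V)

# Estimate 2: 1 kOhm drive, 8 ms to reach 5 of 8 divisions (62.5%).
c_rc = cap_from_rc_rise(1e3, 8e-3, 5 / 8)

# What the datasheet value of 150 nF would have needed at 50 mA.
t_expected = ramp_time(150e-9, GATE_SWING_V, 50e-3)

print(f"C1 from current-sink discharge: {c_sink * 1e6:.1f} uF")
print(f"C1 from RC rise time:           {c_rc * 1e6:.1f} uF")
print(f"150 nF discharge at 50 mA:      {t_expected * 1e6:.0f} us")
```

Under the assumed 10 V swing, the sink-based estimate comes out near 7.5 µF and the RC estimate near 8.2 µF, both within the ±20% tolerance of a nominal 10 µF part. A 150 nF capacitor would have discharged in about 30 µs, some fifty times faster than the 1.5 ms observed.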
With only a tiny leakage discharge current for C1, the MOSFET could remain in the conducting state for hours or days afterwards (kind of like DRAM), just waiting for another unsuspecting technician to hot-plug it into a system and bring it crashing down. With a capacitor this large, the designer should have included a bleeder resistor.

I'm really surprised that this design flaw wasn't discovered by the fan tray OEM during product testing prior to production release. My only theory is that their management laid off the contract designer as soon as the PCB was laid out and figured they didn't need help to sort out any resulting bugs. Or maybe they just ignored the problem, hoping it would go away by itself.

Just for fun, I intend to try simulating this in LTspice. It will be interesting to see if this problem can be virtually reproduced.

Glen Chenier
Engineer