When I got out of university, my first job was as a member of a team designing CPUs for mainframe computers. Fortunately, integrated circuits were already in use by this time (it would have been such a pain designing a mainframe computer at the transistor level), but their geometries were huge compared to today's deep-submicron devices.
Take the case of RAM and ROM memory chips, for example. Today, when I'm wandering around a store like Staples or Best Buy and I see a USB Flash memory stick containing say 16GB for a relatively small amount of money I think "Ho Hum" and carry on my merry way. I must have 10 or more such memory sticks in my backpack and I cannot say how many are scattered around my office (at conferences companies put presentations on them and give them away).
It's important to note that we're talking about gigabytes here, where each gigabyte is a thousand megabytes and each megabyte is a million bytes. If someone had mentioned a term like 16GB to me back in 1980, I would have laughed my head off – even a single megabyte seemed to be a HUGE amount of memory at that time.
How embarrassing...
I remember once being instructed to create a "quick and dirty" test for a memory cabinet. Remember that we're talking mainframe computers here. A single memory cabinet was the size of one of today's large fridge-freezers. In the upper portion of the cabinet were a bunch of very large circuit boards, each perhaps 2 feet x 2 feet square and each carrying hundreds of memory chips (the chips themselves contained relatively small amounts of memory). The lower half of the cabinet contained the power supplies for that unit.
I was working in an engineering environment in which we were working on multiple cabinets (CPU, memory, peripheral devices...) and pulling boards out and plugging them in all the time. All that was required of me was to create a really simple test that could be run to check that the memory was functioning at the most rudimentary level (i.e. all of the boards were plugged in, powered up, and working).
So I set to with gusto and abandon. Purely for the sake of this discussion, let's say that the cabinet contained 100,000 words of memory, where each word was 64 bits wide (truth to tell, I can no longer recall the nitty-gritty details).
You have to remember that I was new to all of this, so I just did what seemed to make sense. First of all I wrote zeros into every word. Next, for each word in the memory I wrote a pattern of 01010101...0101 and read it back; then I wrote a pattern of 10101010...1010 and read that back; then I moved on to the next word.
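The procedure above is simple enough to sketch in a few lines. The following is a minimal reconstruction in Python, where the word count, word width, and the `write_word`/`read_word` access functions are hypothetical stand-ins for the mainframe's actual memory interface:

```python
# A sketch of the "quick and dirty" test described above. The sizes and
# the write_word/read_word interface are illustrative assumptions, not
# the original mainframe's access mechanism.

WORDS = 100_000   # words in the cabinet (per the discussion above)
WIDTH = 64        # bits per word

PAT_01 = int("01" * (WIDTH // 2), 2)   # 0101...0101
PAT_10 = int("10" * (WIDTH // 2), 2)   # 1010...1010

def quick_test(write_word, read_word):
    # First pass: clear every word.
    for addr in range(WORDS):
        write_word(addr, 0)
    # Second pass: for each word, write each alternating pattern
    # and immediately read it back.
    for addr in range(WORDS):
        for pattern in (PAT_01, PAT_10):
            write_word(addr, pattern)
            if read_word(addr) != pattern:
                return f"Fault at address {addr:#x}"
    return "Main Memory is OK"
```

Note that each word is written and read back in isolation, which is exactly what makes this test so "quick and dirty" – a point the rest of this story turns on.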
Creating this test didn't take long. When my boss passed by I called him over and (using my command line interface) proudly executed the test, which soon completed and displayed something like Main Memory is OK on the screen.
My boss pondered this for a moment and then said: "That's very interesting. And the thing that's really interesting about it is that this memory cabinet is empty – the rack containing all of the memory boards is currently downstairs in the service department."
I opened the access panel to the upper portion of the cabinet and he was right – it was totally empty. Good grief, I felt like an idiot (but where could we find one at that time of the day ... sorry, I couldn't help myself).
So how was it that my tests passed when there wasn't even any memory in the cabinet? (The answer is given at the end of this column).
Of course I've learned a lot of "stuff" since those days of yore; for example...
Whose fault is it anyway?
For the purpose of these discussions, we shall take the term fault to refer to a physical failure mechanism such as a broken wire. Meanwhile, the term fault-effect refers to the way in which a fault manifests itself to the outside world.
In the case of memory devices, faults can be categorized as being either functional or dynamic. Functional faults include bad memory cells or bad access to these cells, while dynamic faults refer to timing failures.
One set of functional faults is predominantly associated with the interconnect (both on the circuit board and in the device). The majority of these will be stuck-at, bridging, or open faults. A stuck-at fault is a short between a signal and a ground or power plane, so (assuming we're working with Positive Logic) these are referred to as stuck-at-0 and stuck-at-1 faults, respectively.
Bridging faults are similar to stuck-ats in that they share common mechanisms (such as solder splashes at the board level or internal shorts at the device level). In the case of a bridging fault, however, the unwanted connection is between two or more signals rather than between a signal and a power plane.
Finally, an open fault refers to the lack of a desired connection, such as a broken track or a bad solder joint at the board level or a disconnected bonding wire at the device level. Open faults are referenced as open-0, open-1, or open-Z depending on the way in which they manifest themselves (where "Z" indicates a high-impedance value). For example, an open-0 fault indicates that a signal or input has become disconnected from its driving device(s), and that this signal or input will consequently "float" to a weak logic 0 value.
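To make the three fault classes concrete, here is a toy model of how each one distorts the value seen at the receiving end of a wire. The function names and the three-valued encoding ('0', '1', 'Z' for high impedance, 'X' for unknown) are my own illustrative choices, not part of any standard API:

```python
# Toy fault-effect model for a single wire. Values are the strings
# '0', '1', 'Z' (high impedance), or 'X' (unknown/conflict).

def observed(driven, fault=None):
    """Value seen at the far end of one wire, given a fault model."""
    if fault == "stuck-at-0":
        return "0"      # signal shorted to the ground plane
    if fault == "stuck-at-1":
        return "1"      # signal shorted to the power plane
    if fault == "open-0":
        return "0"      # disconnected; floats to a weak logic 0
    if fault == "open-1":
        return "1"      # disconnected; floats to a weak logic 1
    if fault == "open-Z":
        return "Z"      # disconnected; presents a high-impedance value
    return driven       # fault-free wire passes the driven value through

def bridged(a, b):
    """Two wires shorted together: agreeing values survive,
    conflicting values yield unknowns."""
    return (a, b) if a == b else ("X", "X")
```

The key observation for test generation is visible in `bridged`: a bridging fault is only detectable when the two shorted wires are driven to *opposite* values, which motivates the test sequences discussed next.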
The "nameless" test sequence
Assuming for the moment that we're interested in a single RAM device (either in isolation or embedded in the middle of a circuit), the first thing we need to do is to test our access to the device in the form of its address and data busses. The reason we perform these tests first is that they are relatively quick and painless, and it's only after we've proved that we can actually "talk" to the device that we would wish to proceed to the time-consuming process of verifying its internal structures.
Before we look at the tests themselves, let's first consider a group of eight wires named a through h. Let's assume that we can drive signals into one end of these wires and monitor the results at the other end. Our task is to determine the minimum number of test patterns that are required to detect every possible stuck-at, bridging, and open fault on these wires as illustrated in Figure 1.
Figure 1: What is the minimum number of test patterns that are required to detect every possible stuck-at, bridge, and open fault on eight wires?
First of all, we know that we must check that each wire can be driven to a logic 0 and a logic 1. This will ensure that there are no stuck-at faults and, ignoring any weird capacitive effects, no open faults. To do this we could use just two test patterns, 00000000₂ and 11111111₂, but this would not reveal any bridging faults. In order to detect bridging faults we have to ensure that every wire can carry the opposite logic value to every other wire.
One of the simplest test sequences is the "walking ones," in which each wire is driven with a logic 1 while all of the other wires are driven with logic 0s. Thus, for n wires this sequence requires n test patterns, which, at a first glance, doesn't appear to be an unduly excessive requirement as illustrated in Figure 2(a).
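Generating the walking-ones patterns is a one-liner. The sketch below (my own illustration, treating each pattern as an integer bit mask) produces the n patterns for n wires:

```python
def walking_ones(n):
    """The n 'walking ones' patterns for n wires.

    Each pattern drives exactly one wire to logic 1 and all the
    others to logic 0, so every wire is exercised at both values and
    every pair of wires carries opposite values in some pattern.
    """
    return [1 << i for i in range(n)]

for p in walking_ones(8):
    print(f"{p:08b}")   # 00000001, 00000010, ..., 10000000
```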
Figure 2: The "nameless" sequence requires fewer tests than a "walking ones".
For a variety of reasons, however, we often wish to use the smallest possible test sequence that we can. An alternative test sequence that I call the "nameless" sequence (because I made it up myself and had never actually seen it documented anywhere until after I'd penned this piece) commences by dividing the wires into two groups. We start by driving the "left-hand" group with logic 1s and the "right-hand" group with logic 0s; then we proceed to divide each group into two sub-groups, and to drive each "left-hand" sub-group with logic 1s and each right-hand sub-group with logic 0s. This continues until we have alternating logic 1s and logic 0s on each wire, at which point we terminate the sequence by simply inverting all of the wires as illustrated in Figure 2(b).
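The halve-and-invert construction described above can be sketched as follows. This is my own rendering of the sequence, assuming the number of wires is a power of two; for 8 wires it yields just 4 patterns (11110000, 11001100, 10101010, 01010101) versus 8 for walking ones:

```python
import math

def nameless(n):
    """Patterns for the halve-and-invert ("nameless") sequence on
    n wires, where n is a power of two. Produces log2(n) + 1 patterns."""
    steps = int(math.log2(n))
    pats = []
    for s in range(1, steps + 1):
        # Block size halves at each step: n/2, n/4, ..., 1, giving
        # alternating runs of 1s and 0s of that length.
        block = n >> s
        bits = ""
        while len(bits) < n:
            bits += "1" * block + "0" * block
        pats.append(bits)
    # Terminate by inverting the final alternating pattern.
    pats.append(pats[-1].translate(str.maketrans("01", "10")))
    return pats
```

Because any two distinct wires fall into different halves at some step of the recursive split, every pair of wires carries opposite values in at least one pattern (covering bridging faults), and the final inversion ensures that every wire sees both a 0 and a 1 (covering stuck-at and open faults).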
[To be continued at How it used to be: Testing RAMs and ROMs (Part 2)]