 

用户1016725

文章：23 阅读：63262 评论：2 赞：85

 好友  私信个人主页

文章 23
原创 0
阅读 63262
评论 2
赞 85

原创 verification for the opensparc t1 ddr2 controller

 2008-7-17 16:39  12773 3 3 分类: 汽车电子

From OskiWiki

Jump to: navigation, search

This article discusses a formal test-planning process and associated verification strategy for a non-trivial open-source example (the Sun OpenSPARC? T1 - DDR2 controller). The principles presented here can be applied to DDR2 controllers in particular and generalized for other designs.

[edit]

Introduction

Model checking (referred to as Formal verification elsewhere in this article) usage has been increasing in recent times. This is partly due to the increase in performance and capacity of formal engines and partly due to the standardization of property specification languages. However, formal verification can still not lay claim to mainstream acceptance as a necessary part of verification sign-off. This can be attributed to a dearth of published material on three key aspects of effective formal verification - structured test-planning, re-usable verification IP and good verification strategy.

Recently, there have been several attempts made to address the test-planning aspects (Foster et al [1]). Formal verification IP re-use has also been reasonably well-documented [2]. However, there remains a need to disseminate effective verification strategies and methods to simplify complexity without necessarily compromising the completeness of the desired proof. Without good strategy most formal verification projects will be overwhelmed by complexity issues [3].

In this article we use the open-source OpenSPARC T1 DDR2 controller design [4] as a context to demonstrate formal verification planning and implementation. The design is of comparable complexity to many modern memory controllers which should render the presented ideas applicable to the verification of similar designs.

We discuss the re-use of formal verification IP to shorten schedules, use of interesting design abstractions and explain how cut-points can be a powerful compositional reasoning technique. The verification strategy employed to handle the design complexity is described in detail. Most memory controllers have common design strategies, even though they have been designed independently. This is partly because common design techniques permeate the industry over time and partly because of the requirements imposed by the standard specifications [5]. Therefore, we believe that the complexity issues described here are pertinent to most DDR2/DDR controller implementations. By extension, the techniques used to tackle the complexity are applicable to the whole class of memory controller designs and can be leveraged for other design types.

[edit]

OpenSPARC Design

The OpenSPARC-T1 processor is a highly integrated processor that implements the 64-bit SPARC V9 architecture. The processor, codenamed “Niagara” has been made freely available [6] by Sun to promote participation in SPARC processor architecture and application design. The design is representative of cutting edge processor architecture with multi-threading, multi-CPU and other advanced features.

Our focus area comprises the four on-chip dynamic random access memory (DRAM) controllers directly interfacing to the double data rate-synchronous DRAM (DDR2 SDRAM) modules. The controllers implement 144-bit interface per channel with 25 GB/sec peak total bandwidth. The source download package [7] for the chip contains the Verilog RTL source, documentation and a comprehensive simulation-based regression environment.

[edit]

DDR2 Controller Review

The functional block diagram of the DDR2 controller is shown below.

Fig. 1 DDR2 Controller Functional Diagram

The shaded region in the figure (sourced from "OpenSPARC T1 Micro-Architecture Specification Document"[8]) above shows the address and control logic block inside the controller. There exists high concurrency in the control path due to the multiple independent sources of command words to the DDR2 DIMMs. In order of arbitration priority, these sources are –

Periodic Refresh requests.
Pending CAS Requests.
Scrub Row-Address-Strobe (RAS) requests.
Incoming Read RAS requests.
Incoming Write RAS requests.

The read-write priority can change depending upon per-bank write-starvation counters and handling of write-before-read hazards.

This concurrency in the design makes the control logic in the shaded box a good candidate for formal verification. Also, the control logic is encapsulated as a separate Verilog module which means that we need make no effort to isolate the target logic.

The data-path contains elements like ECC-based error correction which due to the arithmetic operations involving large data-paths is not ideal for formal analysis. Suitable abstractions, such as black-boxing the ECC module and applying data-tagging patterns would make the data-path more amenable to analysis. However, for the purposes of this paper, we shall restrict the discussion to the control logic in the shaded area in Figure 1. above. The control logic module can be easily isolated in the OpenSPARC source (block name - dram_que).

[edit]

Formal Test-planning

The initial planning stages of a 7-step verification plan [9] are shown in Figure 2.

Fig. 2 Formal Verification Planning Sequence

The intuitive steps impose a structure upon the planning effort. We briefly outline the preliminary steps for the DDR2 controller verification.

Identification involves determination of design components that are in the formal verification “sweet-spot”. Blocks that have concurrency with limited sequential depth are good candidates. We have already identified the DDR2 control path logic as our target for the reasons mentioned in the previous section.

Description involves a high-level functional specification of the target block. This can be derived from internal implementation specifications as well as the standard protocols that the target implements. In our case, the JEDEC DDR2 protocol specification serves as the interface description.

Interface Description refers to the documentation of the inputs and outputs of the target block. The DRAM section of the OpenSPARC micro-architecture document describes the interfaces in detail. The point of the interface listing is to identify missing constraints (are all my inputs constrained appropriately?) and missing assertions (do I have assertions on all my outputs, wherever applicable?).

Generating Requirements refers to the creation of a set of English language properties derived from the high-level design description. Typically, these are classified as interface and end-to-end requirements. We shall restrict our verification scope to the DDR2 interface.

A set of English requirements for the control logic might read as follows –

Activate cannot be issued to a non-Idle bank.
Read and write commands can only be issued to active banks.
Check that the minimum Write to Read turnaround time is (CL-1) + (BL/2) + T_WTR …

The next step is to translate the requirements to a Formal Description i.e. a formal property specification language. It is here that a set of canned, pre-verified set of properties for a standard protocol interface like DDR2 can significantly reduce implementation time.

[edit]

Formal Verification IP

The DDR2 protocol is an industry standard specification for SDRAM memory modules [4]. Compliant interface implementations must obey the protocol rules in the specification. A set of DDR2 properties must therefore be re-usable in all DDR2 verification. The re-use of proven, canned sets of properties is an important element of formal verification planning and re-use. It can significantly shorten verification schedules with better ROI since less time is spent implementing properties and verifying correctness.

A DDR2 verification IP can be packaged as a Verilog module with an interface signature derived from the DDR2 specification. This is similar to packaging simulation assertions/checkers/monitors in modules that have the same ports as the observed interface, except that all the ports are inputs.

In this particular case, we had a set of pre-verified DDR2 properties available to us. It is instructive to go over the form of the VIP module. A small subset of the properties demonstrating the VIP construction can be downloaded from the links at the bottom of the page.

The DDR2 Formal VIP is articulated into module-level and per-bank properties with minimum auxiliary code sharing between different properties. There are two approaches to implementing property sets for non-trivial protocols that require supporting RTL logic to track design states. One approach is to code one or more monolithic FSMs and then writing properties that are derived from the states (or state transitions) in the large FSMs. The other approach is to minimize the shared logic between the different properties. The former approach is considered better for implementing constraints and the latter for assertions.

Since the DDR2 control path has only outputs, all our properties are assertions. The approach was to come up with a set of properties that each target distinct design attributes, but together cover all of the protocol. This minimizes code sharing and duplication. It keeps the logic cone of the assertions pared down to the minimum relevant state elements.

The basic structure of the VIP is shown below with some sample SVA properties.

[edit]

Verification Strategy

In this section we present a step-by-step discussion of the verification strategy. Note that while some of the elements, like initialization and exploiting symmetry, were anticipated a priori, the other implementation-specific abstractions were refined as we gained familiarity with the target design. This is fairly typical of the verification process.

[edit]

CSR Value Selection and Initialization

The DDR2 protocol specifies a large set of timing parameters. These parameters are typically stored in Controller Core CSRs accessible through a register programming interface. These (and other) CSRs need to be initialized by software before normal operation can commence. The programming of the registers is well-defined and typically done during boot-up. This highly sequential operation of significant depth is best done in simulation. We use the OpenSPARC simulation environment to take the design through initialization to an initial “normal” operation state from which point the formal engines take over.

Fig. 3 Simulation-based Initialization

Once initialized, it is important to block off the register write interface so that the CSRs do not change during formal analysis. Suitable constraints are written on the design inputs to disable register writes.

Further, the DDR2 protocol defines an enumeration sequence of command words to initialize the DDR2 SDRAM module. A DDR2 controller cycles through this sequence after boot-up. Once again, this sequential “hump” needs to be crossed before we can begin formal analysis of the controller design. We again turn to simulation to take the design over the enumeration sequence. The initial state for our formal analysis is the state of the design after CSR initialization and DDR2 enumeration sequence is complete.

[edit]

Identifying Formal Sweet-spots

Figure 4 shows the DDR2 protocol state-diagram. The states and transitions outside the shaded area involve significant sequential depth that can compromise formal analysis. The design inputs were constrained such that these state transitions are avoided.

Fig. 4 DDR2 Formal Sweet-spots

Simulation-based techniques are very good at exercising well-defined state transitions with high sequential depth that are avoided in formal analysis. This demonstrates how the two verification techniques are complimentary to each other.

[edit]

Exploiting Design Symmetry

Design symmetry can be used to reduce analysis complexity. For instance, a 32-bit data-path can be reduced to a single bit if the data-path is manipulated atomically (no bit-slice manipulation). In this case, the symmetry of the data-path allows us to consider a single bit in lieu of the 32 bits.

The DDR2 controller is capable of accessing four DDR2 modules concurrently with a 4-bit chip-select. The symmetry of the implementation means that if we verify the correctness for a single module (one bit of chip-select) while allowing the other modules to be accessed (but not monitored) then we have reasonable confidence in the completeness of the proof.

Further, the DDR2 controller implements eight banks. Similar to the chip-selects, there is symmetry in the per-bank queue management logic. Verifying the operation for a single bank would be equivalent to verifying each bank separately. Note that to exploit this symmetry the formal VIP must have corresponding symmetry in its implementation of per-bank properties.

[edit]

Abstracting Large Counters

Counters can add significant sequential depth to formal analysis. The DDR2 controller has a surfeit of counters. Most of these are small but the following are large enough to significantly impact performance –

Refresh Interval – The DDR2 controller issues periodic Refresh commands to the DDR2 modules. The period is established by a 13-bit counter.
Scrub Interval – The controller issues internal scrub commands at periodic intervals. The period is established by a 12-bit counter (see elaboration below).
Write Starvation Counters – The controller gives higher priority to Read as opposed to Write. To prevent write starvation, a 6-bit counter begins counting after the Write Queue is full. A pending Write has priority over Read when the counter is full.
CAS Latency Counters – Each bank has an 8-bit CAS latency counter that blocks further RAS picks if a CAS has been pending for more than 64 clock cycles.

These counters can add substantial amount of sequential depth (diameter) to the analysis. As a result, the analysis might not converge. We show how cut-points can serve as equivalent abstractions. Consider the scrub-interval decision logic –

assign que_scrb_read_en = que_scrb_time & 
    ~(que_scrb_write_valid | que_prev_scrb_wr_pending);

assign que_scrb_time = 
    (que_scrb_cnt[11:0] >=  chip_config_reg[20:9])  &    
    que_data_scrub_enabled & ~que_hw_selfrsh;

The listing shows how the decision about the scrub timing interval is dependent on the value of the que_scrb_time net. This net is driven by an equation involving the comparison of the interval counter with a programmed interval in a CSR. If we place a cut-point on this net i.e. we direct the formal engines to treat this net as if it were a primary input, then we remove the interval counter from the logic cone of (relevant) assertions on the controller outputs. This increases the likelihood of analysis convergence.

We have replaced the counter-driven logic by a simpler abstraction (an arbitrary un-driven net) that eliminates the sequential depth imposed by the counter. An analysis failure might arise due to the cut-point, which is easy to debug. On the other hand, a proof obtained with the cut-point is applicable to the original design as well. This is because we have replaced the specific counter logic that drives (constrains) the net, with an abstraction that under-constrains the analysis. For each of the counters listed earlier, a similar cut-pointing was done to eliminate them from the analysis.

[edit]

Handling Address Comparisons

The controller assigns higher priority to Read over Write unless there is a Read after Write (RAW) hazard. It handles RAW hazards by comparing an incoming Read address with all the write addresses currently pending in the write queue. The corresponding decision logic is shown below –

assign que_b0_wr_index_pend = (
        ( que_rd_addr_picked[31:0] == {writeqbank0addr0[35],   
                    writeqbank0addr0[33], writeqbank0addr0[31:5], 
	   writeqbank0addr0[2:0]}) & writeqbank0vld0_arb 
           | …
           |  (que_rd_addr_picked[31:0] == {writeqbank0addr7[35],                  
                     writeqbank0addr7[33], writeqbank0addr7[31:5],
	    writeqbank0addr7[2:0]}) & writeqbank0vld7_arb)
       & que_wr_addr_picked_vld & que_rd_addr_picked_vld ;

The net que_b0_wr_index_pend is asserted when a RAW hazard is detected. The 32-bit address comparison pulls in all the 32-bits of the address space, not to mention the recorded address bits in the Write queues. Consider the modified equation -

wire free;   // Undriven net considered primary input

assign que_b0_wr_index_pend = (
           writeqbank0vld0_arb | writeqbank0vld1_arb | …
           writeqbank0vld6_arb | writeqbank0vld7_arb) 
          &  que_wr_addr_picked_vld & que_rd_addr_picked_vld 
          &  free;

We have removed the address comparison terms from the equation driving the net. Instead we introduce a free variable that can be arbitrarily asserted by the formal analysis to introduce a hazard condition. This simpler abstraction does not rule out any hazard condition that is admitted by the original equation.

Therefore, a proof with the simpler equation is applicable to the original design. We have accomplished two things with this abstraction – First, we have eliminated the 32-bit address words stored in the write queue from the logic cone of relevant assertions. Second, we have reduced the need to monitor the 32-bit incoming read address. The free net removes the address logic from the simplified hazard equation.

Fig. 5 Eliminating Address-bits from COI

Address comparisons of this type are common. For instance, code that checks whether an address falls within an acceptable range. This technique can be judiciously used in these situations to eliminate complexity without compromising the quality of the proof.

[edit]

Applying Verification Patterns

Patterns can be defined as generalized solutions in a given engineering domain that find recurring application to the problems of that domain. We illustrate the use of one such verification pattern, the Floating Pulse, which was used to implement a DDR2 SVA property that uncovered a potential bug in the control-path. The pattern expresses intent to –

Specify that a single bit value can be asserted for only one cycle in any cycle of an infinite sequence.

We use the pattern to implement the following DDR2 property –

No more than 4 ACTIVATE commands may be issued to the DDR2 SDRAM within a window of T_FAW clock cycles.

We can use the Floating Pulse pattern here to fix the lower bound of our sampling window of width T_FAW clock cycles. Then a counter that counts off the T_FAW cycles is used to define the sampling window within which we check if more than 4 ACTIVATEs were issued.

We first setup a pulse element through the following assumption. This is an SVA implementation of the Floating Pulse pattern.

 assume_floating_pulse_pattern : 
    assume property (   
        @(posedge clk)  
         pulse ##0 (1'b1)[*0:$] |=> !pulse
    );

We bind the pulse to (any and all) occurrence of an ACTIVATE command. The pulse assertion marks the start of our sampling window. We then load a counter at the start of the window and begin counting down.

always @ (posedge CK or negedge RST_N) begin 
    if (!RST_N) 
        num_banks_activated <= 4'd0; 
    else if(pulse && activate_issued)
        num_banks_activated <= 4'd1; 
    else if (activate_issued && !num_banks_activated ) 
         num_banks_activated <= num_banks_activated + 4'd1; 
end

always @ (posedge CK or negedge RST_N) begin 
    if (!RST_N) tfaw_counter <= 4'd0; 
    else if(tfaw_counter!= 4'd0) 
        tfaw_counter <= tfaw_counter - 4'd1; 
    else if(pulse && activate_issued) // start of window 
        tfaw_counter <= T_FAW - 4'b1; 
end

The following assertion ties everything together.

property p_tFAW_rolling_window; 
    @(posedge CK) disable iff (!RST_N)  
    !((tfaw_counter != 0) && (num_banks_activated > 4'd4));
endproperty

This property failed for the OpenSPARC T1 DDR2 controller (control-path) block. The controller issued five ACTIVATE commands within a T_FAW window violating the protocol. It is possible that some element outside the design controls the ACTIVATE commands but purely in the context of the controller design this can be construed as an error of omission. Figure 6 shows the property violation on the last clock cycle.

Fig.6 tFAW Window Violation

[edit]

Summary

In this paper we presented elements of verification strategy for a DDR2 controller design. We showed how simulation-assisted initialization, exploiting design symmetry, using cut-points, design abstractions and verification patterns can be used to reduce complexity. The techniques and abstractions presented have been found to be equally effective in the verification of other classes of designs. The work presented is in the public domain. The design and verification material can be downloaded below.

[edit]