End-of-Unit Problems: System Integration

Work through these problems to reinforce your understanding of top-down design, datapath-controller partitioning, timing analysis, pipelining, and system-level design trade-offs.

Section A: Top-Down Design and Modularity (4 problems)

Problem 1

A digital combination lock accepts a 4-digit code (each digit 0-9) entered sequentially. Decompose the system using top-down design into hierarchical modules. Identify each module, its inputs, outputs, and its role in the system.

Show Solution

**Level 0 — Top-level system:**

┌─────────────────────────────────────────────┐
│            Combination Lock System           │
│                                              │
│  Keypad ──→ [Controller] ──→ Lock Actuator   │
│              ↕                                │
│         [Code Memory]                        │
└─────────────────────────────────────────────┘

**Level 1 — Subsystem decomposition:** | Module | Inputs | Outputs | Function | |--------|--------|---------|----------| | Keypad Encoder | 10 key lines | 4-bit BCD digit, KEY_VALID | Encodes pressed key to binary | | Digit Comparator | entered digit, stored digit | MATCH | Compares one digit against stored code | | Sequence Counter | KEY_VALID, RESET | digit_index[1:0] | Tracks which digit (0-3) is being entered | | Code Memory | digit_index | stored_digit[3:0] | Stores the 4-digit secret code | | Main Controller (FSM) | MATCH, digit_index, KEY_VALID | UNLOCK, RESET, ERROR | Orchestrates the verification sequence | | Lock Actuator | UNLOCK | physical lock signal | Drives the solenoid/motor | | Timeout Timer | CLK, RESET | TIMEOUT | Resets system after inactivity | **Level 2 — Detailed module breakdown:** - **Keypad Encoder:** 4x3 matrix scanner + priority encoder + debounce circuit - **Main Controller FSM states:** IDLE, CHECK_D0, CHECK_D1, CHECK_D2, CHECK_D3, UNLOCK, ERROR_LOCKOUT - **Code Memory:** 4 x 4-bit register file (can be ROM for fixed code or RAM for programmable code) **Design hierarchy:**

Combination Lock System
├── Keypad Encoder
│   ├── Matrix Scanner
│   ├── Priority Encoder
│   └── Debounce Circuit
├── Main Controller FSM
├── Sequence Counter (2-bit)
├── Code Memory (4 x 4-bit)
├── Digit Comparator (4-bit equality)
├── Timeout Timer (counter)
└── Lock Actuator Driver

**Key design principle:** Each module has a well-defined interface. The controller FSM coordinates all modules without knowing their internal implementation.

Problem 2

Explain the difference between structural and behavioral decomposition in top-down design. For a 4-bit ALU that supports ADD, SUB, AND, OR, give both decompositions.

Show Solution

**Structural decomposition** breaks the system into physical components and their interconnections. **Behavioral decomposition** breaks the system into functional operations without specifying implementation. **Behavioral decomposition of 4-bit ALU:**

4-bit ALU
├── Addition operation (A + B)
├── Subtraction operation (A - B)
├── Bitwise AND operation (A & B)
└── Bitwise OR operation (A | B)

Each operation is described by what it does, not how it is built. **Structural decomposition of 4-bit ALU:**

4-bit ALU
├── 4-bit Adder/Subtractor
│   ├── XOR array (4 gates, for B complement)
│   ├── 4-bit Ripple Carry Adder
│   │   ├── Full Adder bit 0
│   │   ├── Full Adder bit 1
│   │   ├── Full Adder bit 2
│   │   └── Full Adder bit 3
│   └── Carry-in = SUB control
├── AND array (4 AND gates)
├── OR array (4 OR gates)
└── Output MUX (4-bit, 4-to-1)
    └── Select = Op[1:0]

**Block diagram (structural):**

A[3:0] ──┬──────────────────────┬──────┬──────┐
         │                      │      │      │
B[3:0] ──┼──[XOR w/ SUB]──[Adder]  [AND]  [OR]│
         │        │           │      │      │  │
         │        └───────────┼──────┼──────┼──┘
         │                    ↓      ↓      ↓
         │              ┌─────────────────────┐
Op[1:0]──┴─────────────→│    4-to-1 MUX       │
                        └─────────┬───────────┘
                                  ↓
                              Result[3:0]

| Op[1:0] | Operation | MUX selects | |---------|-----------|-------------| | 00 | ADD | Adder output | | 01 | SUB | Adder output (B complemented, Cin=1) | | 10 | AND | AND array output | | 11 | OR | OR array output | **Key difference:** Behavioral decomposition is technology-independent and focuses on specification. Structural decomposition commits to a specific implementation architecture.

Problem 3

A digital system must be designed with the following specifications. Identify all modules needed and draw the top-level block diagram showing interconnections.

Read temperature from an 8-bit ADC every 100 ms
Compare temperature against a programmable threshold
Activate a cooling fan if temperature exceeds the threshold
Display current temperature on a 2-digit 7-segment display

Show Solution

**Module identification:** | Module | Type | Function | |--------|------|----------| | Clock Divider | Counter | Generates 10 Hz sample signal from system clock | | ADC Interface | Controller | Manages ADC read timing and handshake | | Threshold Register | Register | Stores programmable 8-bit threshold | | Comparator | Combinational | Compares ADC value > threshold | | Fan Controller | FSM | Manages fan on/off with hysteresis | | Binary-to-BCD | Combinational | Converts 8-bit binary to 2-digit BCD | | 7-Segment Decoder | Combinational | Converts BCD to segment patterns | | Display MUX | Counter + MUX | Time-multiplexes two digit displays | | Main Controller | FSM | Coordinates sampling and data flow | **Top-level block diagram:**

System CLK ──→ [Clock Divider] ──→ SAMPLE_TICK
                                       │
                    ┌──────────────────┘
                    ↓
ADC_DATA[7:0] ←── [ADC Interface] ←── ADC_BUSY
ADC_START     ──→       │
                        ↓
                 TEMP[7:0] ──┬──→ [Comparator] ──→ OVER_TEMP
                             │         ↑
                             │    [Threshold Reg] ←── THRESH_IN[7:0]
                             │                    ←── LOAD_THRESH
                             │
                             │    [Fan Controller] ←── OVER_TEMP
                             │         │
                             │         └──→ FAN_ON
                             │
                             └──→ [Bin-to-BCD] ──→ BCD_TENS[3:0]
                                       │           BCD_ONES[3:0]
                                       ↓
                                 [7-Seg Decode] ──→ SEGMENTS[6:0]
                                       │
                                 [Display MUX] ──→ DIGIT_SEL

**Main controller FSM states:** - IDLE: Wait for SAMPLE_TICK - START_ADC: Assert ADC_START - WAIT_ADC: Wait for ADC_BUSY to deassert - READ_DATA: Latch ADC_DATA into TEMP register - UPDATE: Trigger comparison and display update - Return to IDLE **Interface signals between modules are clearly defined,** allowing each module to be designed and tested independently.

Problem 4

What are the three key principles of modular design? For each principle, give a digital design example showing how violating it leads to problems.

Show Solution

**The three key principles:** **1. Well-defined interfaces** Each module communicates only through declared input/output ports with specified data types and timing. - **Good:** ALU has inputs A[7:0], B[7:0], Op[1:0] and output Result[7:0], Carry_out - **Violation:** Module A directly reads internal flip-flop values of Module B rather than using output ports - **Problem:** Changing Module B's internal implementation breaks Module A. No isolation between modules. **2. Encapsulation (information hiding)** Internal implementation details are hidden; only the interface is visible to other modules. - **Good:** A counter module exports COUNT[3:0] and TC (terminal count); its internal carry chain is hidden - **Violation:** The top-level design routes carry signals directly between internal full adders of separate counter modules - **Problem:** Cannot replace the ripple counter with a faster carry-lookahead counter without redesigning the entire system. **3. Single responsibility** Each module performs one clearly defined function. - **Good:** Separate modules for keyboard decoding, FSM control, and display driving - **Violation:** One monolithic FSM handles keyboard scanning, sequence checking, display multiplexing, and timeout simultaneously - **Problem:** The FSM state space explodes combinatorially. A 4-state keyboard scanner x 5-state lock controller x 3-state display driver = 60 states in a single FSM instead of three small FSMs (4 + 5 + 3 = 12 states total). **Quantitative example of the state explosion:** | Design approach | FSM states | Flip-flops needed | |----------------|-----------|-------------------| | Single monolithic FSM | $4 \times 5 \times 3 = 60$ | $\lceil\log_2 60\rceil = 6$ | | Three separate FSMs | $4 + 5 + 3 = 12$ | $2 + 3 + 2 = 7$ flip-flops, but far simpler logic | Although the monolithic FSM uses one fewer flip-flop, its next-state logic is vastly more complex and nearly impossible to debug.

Section B: Datapath and Controller Design (4 problems)

Problem 5

Explain the datapath-controller partitioning approach. For a circuit that computes $Y = A \times B + C$ where A, B, C are 8-bit inputs, design the datapath and controller separately.

Show Solution

**Datapath-controller partitioning:** - **Datapath:** Contains the components that store, transform, and route data (registers, ALUs, multiplexers, buses) - **Controller:** An FSM that generates control signals to orchestrate datapath operations each clock cycle - **Separation:** The controller does not process data; the datapath does not make sequencing decisions **Datapath design for $Y = A \times B + C$:**

A[7:0] ──→ [Reg_A] ──→ ┐
                        ├──→ [8x8 Multiplier] ──→ [Reg_P] ──→ ┐
B[7:0] ──→ [Reg_B] ──→ ┘                                      ├──→ [16-bit Adder] ──→ [Reg_Y]
                                                               │                        │
C[7:0] ──→ [Reg_C] ──────────────→ (zero-extended to 16 bits) ┘                        ↓
                                                                                    Y[15:0]

**Datapath control signals:** | Signal | Function | |--------|----------| | LD_A | Load register A | | LD_B | Load register B | | LD_C | Load register C | | LD_P | Load product register | | LD_Y | Load result register | **Controller FSM:**

┌──────────────────────────────────────────┐
│  S0: IDLE          (wait for START)      │
│  ──→ if START: go to S1                  │
├──────────────────────────────────────────┤
│  S1: LOAD          assert LD_A,LD_B,LD_C │
│  ──→ go to S2                            │
├──────────────────────────────────────────┤
│  S2: MULTIPLY      (wait for mult done)  │
│  ──→ if MULT_DONE: assert LD_P, go to S3│
├──────────────────────────────────────────┤
│  S3: ADD           assert LD_Y           │
│  ──→ go to S4                            │
├──────────────────────────────────────────┤
│  S4: DONE          assert RESULT_VALID   │
│  ──→ go to S0                            │
└──────────────────────────────────────────┘

**Control signal table:** | State | LD_A | LD_B | LD_C | LD_P | LD_Y | RESULT_VALID | |-------|------|------|------|------|------|-------------| | S0 | 0 | 0 | 0 | 0 | 0 | 0 | | S1 | 1 | 1 | 1 | 0 | 0 | 0 | | S2 | 0 | 0 | 0 | 1* | 0 | 0 | | S3 | 0 | 0 | 0 | 0 | 1 | 0 | | S4 | 0 | 0 | 0 | 0 | 0 | 1 | *LD_P asserted only when MULT_DONE = 1 **Latency:** 4 clock cycles (load + multiply + add + done)

Problem 6

Design the datapath and controller for a circuit that finds the maximum of three 8-bit unsigned numbers A, B, and C. Show the ASM (Algorithmic State Machine) chart.

Show Solution

**Algorithm:** 1. Load A, B, C 2. Compare A and B; keep the larger as TEMP 3. Compare TEMP and C; keep the larger as MAX **Datapath:**

A[7:0]──→[Reg_A]──→┐                      ┌──→[Reg_MAX]──→ MAX[7:0]
                    ├──→[8-bit Comparator]──┤
B[7:0]──→[Reg_B]──→┤   + 2-to-1 MUX       │
                    │        ↑              │
C[7:0]──→[Reg_C]───┘    SEL_AB, SEL_MC     │
                                            │
                [2:1 MUX] ←─────────────────┘
                   ↑
                SEL_MC

**Refined datapath with two comparisons:**

A ──→ [Reg_A] ──→ ┐
                  ├──→ [CMP1: A>B?] ──→ A_GT_B (status to controller)
B ──→ [Reg_B] ──→ ┘
                  ┌──→ [MUX1] ──→ TEMP
Reg_A output ─────┤      ↑
Reg_B output ─────┘   SEL_AB (from controller)

TEMP ─────────────┐
                  ├──→ [CMP2: TEMP>C?] ──→ T_GT_C (status to controller)
C ──→ [Reg_C] ──→ ┘
                  ┌──→ [MUX2] ──→ [Reg_MAX]
TEMP ─────────────┤      ↑
Reg_C output ─────┘   SEL_MC (from controller)

**ASM Chart:**

┌──────────┐
│ S0: IDLE │ ──── START=0? ──→ (stay)
└────┬─────┘
     │ START=1
     ↓
┌──────────┐
│ S1: LOAD │  LD_A=1, LD_B=1, LD_C=1
└────┬─────┘
     ↓
┌──────────┐
│ S2: CMP1 │  Compare A, B
└────┬─────┘
     │
    ◇ A_GT_B?
   / \
  Y   N
  │   │
  ↓   ↓
SEL_AB=0  SEL_AB=1
(pick A)  (pick B)
  │   │
  └─┬─┘
    ↓
┌──────────┐
│ S3: CMP2 │  Compare TEMP, C
└────┬─────┘
     │
    ◇ T_GT_C?
   / \
  Y   N
  │   │
  ↓   ↓
SEL_MC=0  SEL_MC=1
(pick T)  (pick C)
  │   │
  └─┬─┘
    ↓
┌──────────┐
│ S4: DONE │  LD_MAX=1, VALID=1
└────┬─────┘
     ↓
   (go to S0)

**Controller outputs per state:** | State | LD_A | LD_B | LD_C | SEL_AB | SEL_MC | LD_MAX | VALID | |-------|------|------|------|--------|--------|--------|-------| | S0 | 0 | 0 | 0 | - | - | 0 | 0 | | S1 | 1 | 1 | 1 | - | - | 0 | 0 | | S2 | 0 | 0 | 0 | A_GT_B' | - | 0 | 0 | | S3 | 0 | 0 | 0 | - | T_GT_C' | 0 | 0 | | S4 | 0 | 0 | 0 | - | - | 1 | 1 | **Total latency:** 4 clock cycles from START to VALID.

Problem 7

A datapath for a simple accumulator processor includes: an 8-bit register (ACC), an 8-bit adder, and a 2-to-1 MUX. The processor supports three operations: LOAD (ACC = input), ADD (ACC = ACC + input), CLEAR (ACC = 0). Design the datapath and the controller.

Show Solution

**Datapath:**

DATA_IN[7:0] ──→ ┐
                 ├──→ [2:1 MUX] ──→ [8-bit Adder] ──→ [ACC Register]──┬──→ ACC_OUT[7:0]
0x00 ────────────┘        ↑              ↑                │             │
                       MUX_SEL        B input             │             │
                                         │                └─────────────┘
                                         └────────────────(feedback from ACC)

**Refined datapath with all three operations:**

                    MUX_A_SEL
                       ↓
DATA_IN ──→ [MUX_A] ──→ A ──→ [8-bit Adder] ──→ SUM ──→ [MUX_R] ──→ [ACC]──→ ACC_OUT
0x00 ──────→           B ←──────────────────────── ACC     ↑
                                                        MUX_R_SEL
                                                        LD_ACC

**Control signal encoding:** | Operation | MUX_A_SEL | MUX_R_SEL | LD_ACC | Effect | |-----------|-----------|-----------|--------|--------| | NOP | X | X | 0 | ACC unchanged | | LOAD | - | 0 | 1 | ACC = DATA_IN (MUX_R selects DATA_IN directly) | | ADD | 0 | 1 | 1 | ACC = ACC + DATA_IN (MUX_R selects adder output) | | CLEAR | 1 | 1 | 1 | ACC = ACC + 0 ... | **Simpler approach — single MUX at ACC input:**

DATA_IN[7:0]──────────────────→ ┐
(ACC + DATA_IN) from adder ───→ ├──→ [4:1 MUX] ──→ D ──→ [ACC Reg] ──→ ACC_OUT
0x00 (constant zero) ─────────→ ┤        ↑                    │
ACC_OUT (hold) ───────────────→ ┘     SEL[1:0]                │
                                                               │
ACC_OUT ──→ [Adder] ←── DATA_IN                               │
              │                                                │
              └────────(ACC + DATA_IN)                         │
                                                               │
              └────────────────────────────────────────────────┘

| SEL[1:0] | MUX Output | Operation | |----------|------------|-----------| | 00 | DATA_IN | LOAD | | 01 | ACC + DATA_IN | ADD | | 10 | 0x00 | CLEAR | | 11 | ACC_OUT | NOP (hold) | **Controller FSM:**

┌────────┐    OP = LOAD     ┌──────────┐
│  IDLE  │ ──────────────→  │  EXECUTE  │
│        │    OP = ADD      │           │
│        │ ──────────────→  │  SEL =    │
│        │    OP = CLR      │  f(OP)    │
│        │ ──────────────→  │  LD_ACC=1 │
└────────┘                  └─────┬─────┘
     ↑                            │
     └────────────────────────────┘

For a single-cycle implementation, the controller is purely combinational: - SEL[1:0] = OP[1:0] - LD_ACC = (OP != NOP)

Problem 8

Design the datapath and controller for a binary search circuit that finds whether a value KEY exists in a sorted 8-element array stored in a register file. Each element is 8 bits. Show the datapath components, control signals, and FSM state diagram.

Show Solution

**Algorithm (binary search):** 1. Set LOW = 0, HIGH = 7 2. Compute MID = (LOW + HIGH) / 2 3. Read ARRAY[MID] 4. If ARRAY[MID] = KEY: FOUND 5. If ARRAY[MID] < KEY: LOW = MID + 1 6. If ARRAY[MID] > KEY: HIGH = MID - 1 7. If LOW > HIGH: NOT_FOUND 8. Repeat from step 2 **Datapath components:**

KEY[7:0] ──→ [Reg_KEY]
                               ┌──→ [Comparator] ──→ EQ, LT, GT
[Reg File: 8 x 8-bit] ──→ DATA_OUT ──┘        ↑
     ↑ ADDR[2:0]                           Reg_KEY output
     │
[MID Register] ──→ ADDR
     ↑
[Adder/Shifter] ──→ (LOW + HIGH) >> 1
     ↑         ↑
[Reg_LOW]   [Reg_HIGH]
     ↑         ↑
MID+1 ──┘   MID-1 ──┘  (via adder with +1/-1)

**Datapath block diagram:**

┌───────────────────────────────────────────────────┐
│                   DATAPATH                        │
│                                                   │
│  [Reg_LOW] ──┬──→ [Adder] ──→ [>>1] ──→ [Reg_MID]│
│  [Reg_HIGH]──┘      ↑                      │     │
│       ↑  ↑        LOW+HIGH               ADDR    │
│     LD_L LD_H                              ↓     │
│       ↑  ↑                          [Reg File]   │
│    MID+1 MID-1                          │        │
│                                    DATA_OUT      │
│                                         ↓        │
│  [Reg_KEY] ──→ [Comparator] ←── DATA_OUT         │
│                  │  │  │                          │
│                 EQ  LT GT  (status to controller) │
│                                                   │
│  [Comparator2: LOW > HIGH?] ──→ DONE_NF          │
└───────────────────────────────────────────────────┘

**Controller FSM:** | State | Actions | Next State | |-------|---------|------------| | S0: INIT | LD_L (LOW=0), LD_H (HIGH=7), LD_KEY | S1 | | S1: CALC_MID | MID = (LOW+HIGH)>>1, LD_MID | S2 | | S2: READ | Read Reg_File[MID] | S3 | | S3: COMPARE | Check comparator outputs | S4 or S5 or S6 or S7 | | S4: FOUND | Assert FOUND=1 | S0 | | S5: GO_HIGH | LOW = MID+1, LD_L | S7 | | S6: GO_LOW | HIGH = MID-1, LD_H | S7 | | S7: CHECK_DONE | If LOW > HIGH: NOT_FOUND, else go to S1 | S1 or S8 | | S8: NOT_FOUND | Assert NOT_FOUND=1 | S0 | **State diagram:**

S0 ──→ S1 ──→ S2 ──→ S3 ──→ EQ? ──→ S4 (FOUND)
                           │
                    LT? ──→ S5 ──→ S7 ──→ LOW>HIGH? ──→ S8 (NOT_FOUND)
                           │              │
                    GT? ──→ S6 ──→ S7     └──→ S1 (repeat)

**Worst-case latency:** $\lceil\log_2 8\rceil = 3$ iterations, each 4-5 clock cycles, so approximately **12-15 clock cycles**.

Section C: Timing Analysis (4 problems)

Problem 9

A synchronous circuit uses D flip-flops with the following parameters:

Clock-to-Q delay: $T_{cq}$ = 0.5 ns
Setup time: $T_{setup}$ = 0.3 ns
Hold time: $T_{hold}$ = 0.1 ns
Combinational logic delay: $T_{comb}$ = 2.8 ns

Calculate the maximum clock frequency $f_{max}$ and verify the hold time constraint.

Show Solution

**Setup time constraint (determines $f_{max}$):** The data must arrive at the next flip-flop's D input at least $T_{setup}$ before the next clock edge. $$T_{clk} \geq T_{cq} + T_{comb} + T_{setup}$$ $$T_{clk} \geq 0.5 + 2.8 + 0.3 = 3.6 \text{ ns}$$ $$f_{max} = \frac{1}{T_{clk,min}} = \frac{1}{3.6 \text{ ns}} = 277.8 \text{ MHz}$$ **Hold time constraint:** The data at the D input must remain stable for at least $T_{hold}$ after the clock edge. $$T_{cq} + T_{comb,min} \geq T_{hold}$$ The minimum combinational delay path (shortest path between flip-flops): $$T_{comb,min} \geq T_{hold} - T_{cq} = 0.1 - 0.5 = -0.4 \text{ ns}$$ Since $T_{comb,min}$ is always $\geq 0$, the hold constraint is **automatically satisfied** (any non-negative combinational delay meets the requirement). **Timing diagram:**

CLK  ──┐     ┌──────┐     ┌──────┐
       │     │      │     │      │
       └─────┘      └─────┘      └─────
       |←──── Tclk = 3.6 ns ────→|

FF1 Q  ──────┤←Tcq→├──────────────────
             0.5ns  │←── Tcomb ──→│
                    │   2.8 ns    │
FF2 D  ─────────────┤            ├────
                                 │←Tsu→│
                                  0.3ns

**Summary:** - $f_{max} = 277.8$ MHz - Hold time: satisfied (margin = 0.5 + 0 - 0.1 = 0.4 ns minimum)

Problem 10

A digital system has three pipeline stages with the following delays:

Stage 1: $T_{cq}$ + 4.0 ns combinational
Stage 2: $T_{cq}$ + 6.5 ns combinational
Stage 3: $T_{cq}$ + 3.0 ns combinational

Flip-flop parameters: $T_{cq}$ = 0.4 ns, $T_{setup}$ = 0.3 ns.

(a) What is $f_{max}$ for the pipelined design? (b) What would $f_{max}$ be without pipelining (all logic in one stage)? (c) Calculate the throughput improvement from pipelining.

Show Solution

**(a) Pipelined $f_{max}$:** The clock frequency is limited by the **slowest stage**: $$T_{clk} \geq T_{cq} + T_{comb,max} + T_{setup}$$ Stage delays (including $T_{cq}$ and $T_{setup}$): - Stage 1: $0.4 + 4.0 + 0.3 = 4.7$ ns - Stage 2: $0.4 + 6.5 + 0.3 = 7.2$ ns (bottleneck) - Stage 3: $0.4 + 3.0 + 0.3 = 3.7$ ns $$T_{clk,min} = 7.2 \text{ ns}$$ $$f_{max,pipelined} = \frac{1}{7.2 \text{ ns}} = 138.9 \text{ MHz}$$ **(b) Non-pipelined $f_{max}$:** Total combinational delay = $4.0 + 6.5 + 3.0 = 13.5$ ns $$T_{clk} \geq T_{cq} + 13.5 + T_{setup} = 0.4 + 13.5 + 0.3 = 14.2 \text{ ns}$$ $$f_{max,unpipelined} = \frac{1}{14.2 \text{ ns}} = 70.4 \text{ MHz}$$ **(c) Throughput improvement:** Throughput is proportional to clock frequency (one result per clock at steady state): $$\text{Throughput ratio} = \frac{f_{max,pipelined}}{f_{max,unpipelined}} = \frac{138.9}{70.4} = 1.97\times$$ **Pipelining nearly doubles the throughput.** However, latency increases: - Non-pipelined latency: 14.2 ns (1 cycle) - Pipelined latency: $3 \times 7.2 = 21.6$ ns (3 cycles) **Latency increased by a factor of** $21.6 / 14.2 = 1.52\times$. | Metric | Unpipelined | Pipelined | |--------|-------------|-----------| | $f_{max}$ | 70.4 MHz | 138.9 MHz | | Latency | 14.2 ns | 21.6 ns | | Throughput | 70.4 M results/s | 138.9 M results/s |

Problem 11

A system has the following timing parameters:

Clock period: 10 ns
$T_{cq}$ = 0.6 ns
$T_{setup}$ = 0.4 ns
$T_{hold}$ = 0.2 ns
Clock skew between source and destination flip-flop: $\delta$ = 0.5 ns (destination clock arrives late)

(a) What is the maximum allowable combinational delay? (b) Is hold time violated? What is the hold margin?

Show Solution

**(a) Maximum combinational delay with clock skew:** When the destination clock arrives late by $\delta$, the data has more time to propagate (positive skew helps setup): Wait — we must be careful about sign convention. If the destination clock arrives **late**, the data has $\delta$ extra time, which **helps** setup but **hurts** hold. **Setup constraint with skew:** $$T_{cq} + T_{comb} + T_{setup} \leq T_{clk} + \delta$$ (The destination flip-flop samples $\delta$ later, giving more time.) $$T_{comb,max} = T_{clk} + \delta - T_{cq} - T_{setup}$$ $$T_{comb,max} = 10 + 0.5 - 0.6 - 0.4 = 9.5 \text{ ns}$$ **Without skew:** $T_{comb,max} = 10 - 0.6 - 0.4 = 9.0$ ns The positive skew at the destination gives 0.5 ns more margin for combinational logic. **(b) Hold time analysis with skew:** **Hold constraint with skew:** $$T_{cq} + T_{comb,min} \geq T_{hold} + \delta$$ (The destination clock arriving late means the new data from the current clock edge could arrive before the destination has finished sampling the old value.) $$T_{comb,min} \geq T_{hold} + \delta - T_{cq} = 0.2 + 0.5 - 0.6 = 0.1 \text{ ns}$$ **Hold margin** = $T_{cq} + T_{comb,min} - T_{hold} - \delta$ If the minimum combinational path delay is 0 ns: $$\text{Hold margin} = 0.6 + 0 - 0.2 - 0.5 = -0.1 \text{ ns}$$ **Hold time IS violated** if there is any direct path with $T_{comb,min} < 0.1$ ns. **Fix:** Insert a buffer (delay element) in the shortest path to add at least 0.1 ns of delay, or reduce clock skew. | Constraint | Without Skew | With Skew ($\delta$ = 0.5 ns) | |-----------|-------------|-------------------------------| | $T_{comb,max}$ | 9.0 ns | 9.5 ns (more margin) | | $T_{comb,min}$ | $\geq$ 0 ns (no issue) | $\geq$ 0.1 ns (potential issue!) | | Setup margin | Helped | Helped | | Hold margin | Fine | Reduced (may violate) |

Problem 12

Calculate the maximum clock frequency for a 16-bit ripple carry adder used in a registered pipeline stage. Each full adder has a gate delay of 2 ns (two gates on the carry path). The flip-flop has $T_{cq}$ = 0.5 ns and $T_{setup}$ = 0.3 ns.

Then calculate $f_{max}$ if the design is changed to a 4-bit carry lookahead adder with four 4-bit CLA blocks and a second-level carry lookahead unit (2 gate levels for CLA + 2 gate levels for second-level group carry).

Show Solution

**Ripple Carry Adder:** Each full adder's carry path delay = 2 ns (two gate levels per bit). For 16 bits, the carry must ripple through all 16 full adders: $$T_{comb,ripple} = 16 \times 2 = 32 \text{ ns}$$ $$T_{clk,min} = T_{cq} + T_{comb} + T_{setup} = 0.5 + 32 + 0.3 = 32.8 \text{ ns}$$ $$f_{max,ripple} = \frac{1}{32.8 \text{ ns}} = 30.5 \text{ MHz}$$ **Carry Lookahead Adder (2-level):** Structure: Four 4-bit CLA blocks + one second-level lookahead unit. Delay breakdown: - Generate G and P from inputs: 1 gate level = 1 ns - First-level CLA (4-bit group G, P): 2 gate levels = 2 ns - Second-level CLA (group carry): 2 gate levels = 2 ns - Carries back to first level for sum: 1 gate level = 1 ns - Final XOR for sum bits: 1 gate level = 1 ns Wait — let me be more precise with the given parameters (2 ns per two gates = 1 ns per gate level): | Stage | Gate levels | Delay | |-------|------------|-------| | Generate $g_i$, $p_i$ from $a_i$, $b_i$ | 1 | 1 ns | | First-level CLA: group G, P | 2 | 2 ns | | Second-level CLA: group carries $C_4, C_8, C_{12}$ | 2 | 2 ns | | Carries distributed back into CLA blocks | 2 | 2 ns | | Final sum XOR | 1 | 1 ns | $$T_{comb,CLA} = 1 + 2 + 2 + 2 + 1 = 8 \text{ ns}$$ $$T_{clk,min} = 0.5 + 8 + 0.3 = 8.8 \text{ ns}$$ $$f_{max,CLA} = \frac{1}{8.8 \text{ ns}} = 113.6 \text{ MHz}$$ **Comparison:** | Adder Type | $T_{comb}$ | $f_{max}$ | Speedup | |-----------|-----------|----------|---------| | 16-bit Ripple | 32.0 ns | 30.5 MHz | 1.0x | | 16-bit 2-level CLA | 8.0 ns | 113.6 MHz | 3.7x | **The CLA is 3.7x faster at the cost of significantly more hardware (carry lookahead logic).**

Section D: Pipelining and Optimization (4 problems)

Problem 13

A non-pipelined combinational circuit has four logic blocks in series with delays: 3 ns, 5 ns, 4 ns, 2 ns. Flip-flop overhead is $T_{cq}$ = 0.3 ns and $T_{setup}$ = 0.2 ns.

(a) Calculate throughput without pipelining. (b) Pipeline the design into 4 stages (one block per stage). Calculate the new throughput and latency. (c) Pipeline into 2 stages by grouping blocks optimally. Calculate throughput and latency.

Show Solution

**Original delays:** Block A = 3 ns, Block B = 5 ns, Block C = 4 ns, Block D = 2 ns **Register overhead per stage:** $T_{cq} + T_{setup} = 0.3 + 0.2 = 0.5$ ns **(a) No pipelining:** $$T_{clk} = T_{cq} + (3 + 5 + 4 + 2) + T_{setup} = 0.3 + 14 + 0.2 = 14.5 \text{ ns}$$ $$\text{Throughput} = \frac{1}{14.5 \text{ ns}} = 69.0 \text{ M results/s}$$ $$\text{Latency} = 14.5 \text{ ns (1 cycle)}$$ **(b) 4-stage pipeline:** Stage delays (each includes register overhead): - Stage 1: $0.3 + 3 + 0.2 = 3.5$ ns - Stage 2: $0.3 + 5 + 0.2 = 5.5$ ns (bottleneck) - Stage 3: $0.3 + 4 + 0.2 = 4.5$ ns - Stage 4: $0.3 + 2 + 0.2 = 2.5$ ns $$T_{clk} = 5.5 \text{ ns (limited by slowest stage)}$$ $$\text{Throughput} = \frac{1}{5.5 \text{ ns}} = 181.8 \text{ M results/s}$$ $$\text{Latency} = 4 \times 5.5 = 22.0 \text{ ns}$$ **Speedup:** $181.8 / 69.0 = 2.63\times$ (not $4\times$ due to unbalanced stages) **(c) 2-stage pipeline (optimal grouping):** We want to split the four blocks into two groups with roughly equal total delay. Total logic delay = 14 ns, ideal split = 7 ns each. | Grouping | Stage 1 | Stage 2 | Bottleneck | |----------|---------|---------|------------| | {A,B} + {C,D} | 3+5 = 8 | 4+2 = 6 | 8.5 ns | | {A} + {B,C,D} | 3 | 5+4+2 = 11 | 11.5 ns | | {A,B,C} + {D} | 3+5+4 = 12 | 2 | 12.5 ns | **Optimal: {A, B} | {C, D}** - Stage 1: $0.3 + 8 + 0.2 = 8.5$ ns - Stage 2: $0.3 + 6 + 0.2 = 6.5$ ns $$T_{clk} = 8.5 \text{ ns}$$ $$\text{Throughput} = \frac{1}{8.5 \text{ ns}} = 117.6 \text{ M results/s}$$ $$\text{Latency} = 2 \times 8.5 = 17.0 \text{ ns}$$ **Summary:** | Configuration | $T_{clk}$ | Throughput | Latency | Speedup | |---------------|----------|------------|---------|---------| | No pipeline | 14.5 ns | 69.0 M/s | 14.5 ns | 1.0x | | 2-stage | 8.5 ns | 117.6 M/s | 17.0 ns | 1.7x | | 4-stage | 5.5 ns | 181.8 M/s | 22.0 ns | 2.6x | **Key insight:** Pipelining improves throughput but increases latency. Unbalanced stages reduce the theoretical speedup (4-stage gives only 2.6x, not 4x).

Problem 14

A 32-bit multiplier has a combinational delay of 20 ns. Registers have $T_{cq}$ = 0.4 ns and $T_{setup}$ = 0.3 ns.

(a) What is $f_{max}$ without pipelining? (b) If the multiplier is split into 4 equal pipeline stages, what is $f_{max}$? (c) What is the pipeline latency? (d) If a feedback path exists (output feeds back to input), how does this affect pipeline throughput?

Show Solution

**(a) Without pipelining:** $$T_{clk} = T_{cq} + 20 + T_{setup} = 0.4 + 20 + 0.3 = 20.7 \text{ ns}$$ $$f_{max} = \frac{1}{20.7 \text{ ns}} = 48.3 \text{ MHz}$$ **(b) With 4 pipeline stages:** Each stage combinational delay: $20 / 4 = 5$ ns $$T_{clk} = T_{cq} + 5 + T_{setup} = 0.4 + 5 + 0.3 = 5.7 \text{ ns}$$ $$f_{max} = \frac{1}{5.7 \text{ ns}} = 175.4 \text{ MHz}$$ **Speedup:** $175.4 / 48.3 = 3.63\times$ **(c) Pipeline latency:** $$\text{Latency} = 4 \times T_{clk} = 4 \times 5.7 = 22.8 \text{ ns}$$ Compare with non-pipelined latency of 20.7 ns. Pipelining adds $22.8 - 20.7 = 2.1$ ns of latency overhead from the 3 extra pipeline registers ($3 \times 0.7 = 2.1$ ns, where 0.7 = $T_{cq} + T_{setup}$ for each added register). **(d) Feedback path impact:**

Input ──→ [Stage1] ──→ [Stage2] ──→ [Stage3] ──→ [Stage4] ──→ Output
                                                                 │
             ┌───────────────────────────────────────────────────┘
             ↓ (feedback)
          [MUX] ←── New Input
             │
             ↓
Input ──→ [Stage1] ...

With feedback, the next computation that depends on the current result must wait for the full pipeline latency (4 cycles) before it can start. This creates a **data hazard**. **Effective throughput with feedback:** - Without feedback: 1 result per cycle = 175.4 M results/s - With feedback (every result depends on previous): 1 result per 4 cycles = $175.4 / 4 = 43.9$ M results/s **The feedback loop completely negates the pipelining benefit for dependent computations.** The throughput with feedback ($43.9$ MHz) is actually slightly worse than the non-pipelined design ($48.3$ MHz) due to pipeline register overhead. **Solution approaches:** Loop unrolling, bypassing/forwarding (if partial results suffice), or accepting the latency penalty.

Problem 15

Compare the area, speed, and power trade-offs for three implementations of an 8-bit adder:

Ripple carry adder (RCA)
Carry lookahead adder (CLA)
Carry select adder (CSLA)

Assume a unit gate delay of 1 ns and each gate costs 1 area unit and consumes 1 power unit per switching event.

Show Solution

**1. Ripple Carry Adder (RCA):** - **Structure:** 8 cascaded full adders - **Delay:** Carry propagates through 8 stages, 2 gate delays each = $8 \times 2 = 16$ gate delays = **16 ns** - **Area:** 8 full adders, each ~5 gates = **40 area units** - **Power:** Only gates on the active carry path switch = low dynamic power. Approximately **40 power units** worst case (all gates switch once) **2. Carry Lookahead Adder (CLA):** - **Structure:** Generate/propagate logic + lookahead carry unit - **Delay:** G,P generation (1) + CLA carry (2) + sum XOR (1) = **4 gate delays = 4 ns** - **Area:** 8 G/P generators (16 gates) + CLA unit (~30 gates for 8-bit) + 8 XOR sum gates = **~54 area units** - **Power:** All carry bits computed simultaneously, more gates switch = **~54 power units** worst case **3. Carry Select Adder (CSLA):** - **Structure:** Two 4-bit RCAs (for carry-in = 0 and 1) + MUX - **Delay:** 4-bit RCA ($4 \times 2 = 8$ delays) + MUX (1 delay) = **9 gate delays = 9 ns** - **Area:** 3 x 4-bit RCA = 60 gates + 4-bit MUX = 12 gates = **~72 area units** - **Power:** Two RCAs compute redundantly = **~72 power units** worst case **Comparison table:** | Metric | RCA | CLA | CSLA | |--------|-----|-----|------| | Delay | 16 ns | 4 ns | 9 ns | | Area | 40 units | 54 units | 72 units | | Power | 40 units | 54 units | 72 units | | Speed rank | 3rd | 1st | 2nd | | Area rank | 1st | 2nd | 3rd | | Power rank | 1st | 2nd | 3rd | **Area-Delay Product (ADP):** A combined metric for efficiency: | Design | ADP | |--------|-----| | RCA | $40 \times 16 = 640$ | | CLA | $54 \times 4 = 216$ | | CSLA | $72 \times 9 = 648$ | **CLA has the best area-delay product,** making it the most efficient trade-off. RCA is smallest but slowest. CSLA is largest but offers a middle-ground speed. **When to use each:** - **RCA:** Area-critical, low-speed designs - **CLA:** Speed-critical designs where area overhead is acceptable - **CSLA:** Moderate speed improvement needed with simpler design than full CLA

Problem 16

A design must process 200 million samples per second. The combinational logic takes 12 ns. Registers have $T_{cq}$ = 0.3 ns and $T_{setup}$ = 0.2 ns. Determine the minimum number of pipeline stages required and show how to partition the logic.

Show Solution

**Required throughput:** 200 M samples/s $$T_{clk,required} = \frac{1}{200 \times 10^6} = 5.0 \text{ ns}$$ **Register overhead per stage:** $T_{cq} + T_{setup} = 0.3 + 0.2 = 0.5$ ns **Single stage (no pipelining):** $$T_{clk} = 0.3 + 12 + 0.2 = 12.5 \text{ ns} \Rightarrow 80 \text{ MHz (too slow)}$$ **Minimum pipeline stages calculation:** For $N$ stages, assuming perfectly balanced logic: $$T_{clk} = T_{cq} + \frac{12}{N} + T_{setup} = 0.5 + \frac{12}{N}$$ Set $T_{clk} \leq 5.0$ ns: $$0.5 + \frac{12}{N} \leq 5.0$$ $$\frac{12}{N} \leq 4.5$$ $$N \geq \frac{12}{4.5} = 2.67$$ **Minimum stages: $N = 3$** **Verification with 3 stages:** Each stage logic delay: $12 / 3 = 4.0$ ns $$T_{clk} = 0.3 + 4.0 + 0.2 = 4.5 \text{ ns}$$ $$f_{max} = \frac{1}{4.5 \text{ ns}} = 222.2 \text{ MHz} > 200 \text{ MHz} \checkmark$$ **Check $N = 2$:** $$T_{clk} = 0.5 + 6.0 = 6.5 \text{ ns} \Rightarrow 153.8 \text{ MHz} < 200 \text{ MHz} \times$$ Two stages is insufficient. **Partitioning the logic:** If the 12 ns logic consists of sub-blocks, we partition into 3 groups each totaling 4 ns:

Input ──→ [Reg] ──→ [Logic: 4ns] ──→ [Reg] ──→ [Logic: 4ns] ──→ [Reg] ──→ [Logic: 4ns] ──→ [Reg] ──→ Output
          Stage 1                    Stage 2                    Stage 3

**Pipeline metrics:** | Metric | Value | |--------|-------| | Clock period | 4.5 ns | | $f_{max}$ | 222.2 MHz | | Throughput | 222.2 M samples/s | | Latency | $3 \times 4.5 = 13.5$ ns | | Pipeline registers added | 2 (between stages) | **Note:** If the logic cannot be split into exactly equal groups, the bottleneck stage determines $T_{clk}$. In that case, more stages may be needed to meet the 200 MHz target with unbalanced partitioning.

Section E: System Design Problems (4 problems)

Problem 17

Design a UART transmitter that sends 8-bit data with 1 start bit, 8 data bits (LSB first), and 1 stop bit. The baud rate is 9600. The system clock is 50 MHz. Show the controller FSM and datapath.

Show Solution

**UART frame format:**

IDLE(1) | START(0) | D0 | D1 | D2 | D3 | D4 | D5 | D6 | D7 | STOP(1) | IDLE(1)...

**Baud rate clock generation:** Clock divider count: $\frac{50 \times 10^6}{9600} = 5208.33 \approx 5208$ A counter counts from 0 to 5207 and generates a BAUD_TICK pulse. **Datapath:**

DATA_IN[7:0] ──→ [Shift Register (8-bit, PISO)] ──→ TX_BIT
                      ↑         ↑                       ↑
                    LOAD      SHIFT                   [MUX]
                                                      ↑   ↑
                                                 TX_SEL[1:0]
                                                    │
Sources: 0 (start), shift_out (data), 1 (stop/idle)

**Datapath components:** | Component | Size | Function | |-----------|------|----------| | Baud counter | 13-bit | Divides 50 MHz to 9600 Hz | | Bit counter | 4-bit | Counts 0-9 (start + 8 data + stop) | | Shift register | 8-bit PISO | Holds data, shifts out LSB first | | Output MUX | 3-to-1 | Selects start/data/stop bit | **Controller FSM:**

┌──────────┐
│   IDLE   │ ←─────────────────────────────────┐
│  TX = 1  │                                    │
└────┬─────┘                                    │
     │ SEND = 1                                 │
     ↓                                          │
┌──────────┐                                    │
│  START   │  TX = 0, LOAD shift reg            │
│          │  Wait for BAUD_TICK                 │
└────┬─────┘                                    │
     │ BAUD_TICK                                │
     ↓                                          │
┌──────────┐                                    │
│   DATA   │  TX = shift_out                    │
│          │  On BAUD_TICK: shift, bit_cnt++    │
│          │  When bit_cnt = 8: go to STOP      │
└────┬─────┘                                    │
     │ bit_cnt = 8 AND BAUD_TICK                │
     ↓                                          │
┌──────────┐                                    │
│   STOP   │  TX = 1                            │
│          │  Wait for BAUD_TICK                 │
└────┬─────┘                                    │
     │ BAUD_TICK                                │
     └──────────────────────────────────────────┘

**State encoding and control signals:** | State | TX_SEL | LOAD | SHIFT | COUNT_EN | BUSY | |-------|--------|------|-------|----------|------| | IDLE | 10 (high) | 0 | 0 | 0 | 0 | | START | 00 (low) | 1 | 0 | 0 | 1 | | DATA | 01 (shift_out) | 0 | BAUD_TICK | BAUD_TICK | 1 | | STOP | 10 (high) | 0 | 0 | 0 | 1 | **Timing for one byte at 9600 baud:** $$T_{byte} = 10 \text{ bits} \times \frac{1}{9600} = 1.042 \text{ ms}$$ Maximum data rate: $9600 / 10 = 960$ bytes/s.

Problem 18

Design a simple vending machine controller that:

Accepts nickels (5 cents) and dimes (10 cents)
Item costs 25 cents
Dispenses when exact amount or overpayment is reached
Returns to idle after dispensing (no change given)

Provide the complete state diagram, state table, state encoding, and next-state equations.

Show Solution

**States (based on amount collected):** | State | Amount | Encoding | |-------|--------|----------| | S0 | 0 cents | 000 | | S5 | 5 cents | 001 | | S10 | 10 cents | 010 | | S15 | 15 cents | 011 | | S20 | 20 cents | 100 | | S25 | 25+ cents (dispense) | 101 | **Inputs:** N (nickel inserted), D (dime inserted). Assume N and D are mutually exclusive pulses. **Output:** DISP (dispense item) **State diagram:**

S0 ──N──→ S5 ──N──→ S10 ──N──→ S15 ──N──→ S20 ──N──→ S25
│         │          │          │          │
└──D──→ S10   └──D──→ S15  └──D──→ S20  └──D──→ S25  └──D──→ S25
                                                        │
S25: DISP=1, then return to S0 ←────────────────────────┘

**Complete state table:** | Present | N | D | Next | DISP | |---------|---|---|------|------| | S0 (000) | 0 | 0 | S0 | 0 | | S0 | 1 | 0 | S5 | 0 | | S0 | 0 | 1 | S10 | 0 | | S5 (001) | 0 | 0 | S5 | 0 | | S5 | 1 | 0 | S10 | 0 | | S5 | 0 | 1 | S15 | 0 | | S10 (010) | 0 | 0 | S10 | 0 | | S10 | 1 | 0 | S15 | 0 | | S10 | 0 | 1 | S20 | 0 | | S15 (011) | 0 | 0 | S15 | 0 | | S15 | 1 | 0 | S20 | 0 | | S15 | 0 | 1 | S25 | 0 | | S20 (100) | 0 | 0 | S20 | 0 | | S20 | 1 | 0 | S25 | 0 | | S20 | 0 | 1 | S25 | 0 | | S25 (101) | X | X | S0 | 1 | **Next-state equations (from K-maps):** Let the state bits be $Q_2 Q_1 Q_0$ and don't-care states are 110, 111. $D_2 = Q_1 Q_0 N + Q_1 Q_0 D + Q_2 \overline{Q_0} \overline{N} \overline{D} + Q_2 \overline{Q_0} N$ Simplified (using don't cares for 110, 111): $$D_2 = Q_1 Q_0 (N + D) + Q_2 \overline{Q_0}$$ $$D_1 = \overline{Q_2} \overline{Q_1} Q_0 N + \overline{Q_2} Q_1 \overline{Q_0} \overline{N}\overline{D} + \overline{Q_2} \overline{Q_1} D + \overline{Q_2} Q_1 \overline{Q_0} N + \overline{Q_2} Q_1 Q_0 \overline{N}\overline{D}$$ Simplified: $$D_1 = \overline{Q_2}(\overline{Q_1}Q_0 N + \overline{Q_1}D + Q_1\overline{N}\overline{D} + Q_1\overline{Q_0}N)$$ $$D_0 = \overline{Q_2}\overline{Q_0}N + \overline{Q_2}Q_0\overline{N}\overline{D} + \overline{Q_2}\overline{Q_1}Q_0 D$$ Simplified: $$D_0 = \overline{Q_2}(N \oplus Q_0) \cdot \text{(with correction for dime cases)}$$ **Output:** $\text{DISP} = Q_2 Q_0$ (DISP = 1 only in state S25 = 101)

Problem 19

A self-checking testbench must verify a 4-bit counter. Describe the testbench architecture, the test vectors needed, and how to implement automatic checking in a hardware description approach.

Show Solution

**Testbench architecture:**

┌──────────────────────────────────────────────────┐
│                  TESTBENCH                        │
│                                                  │
│  [Stimulus       [Device        [Response         │
│   Generator] ──→  Under    ──→  Checker]          │
│       │          Test (DUT)]       │              │
│       │           4-bit           │              │
│       │          counter          │              │
│  CLK, RST,                   Expected vs         │
│  EN signals                  Actual comparison    │
│                                   │              │
│                              PASS/FAIL           │
└──────────────────────────────────────────────────┘

**Test categories and vectors:** **1. Reset test:** | Test | CLK | RST | EN | Expected Q | |------|-----|-----|----|-----------| | Assert reset | rising | 1 | X | 0000 | | Hold reset | rising | 1 | 1 | 0000 | | Release reset | rising | 0 | 0 | 0000 | **2. Normal counting (16 cycles):** | Cycle | RST | EN | Expected Q | |-------|-----|----|-----------| | 1 | 0 | 1 | 0001 | | 2 | 0 | 1 | 0010 | | ... | ... | ... | ... | | 15 | 0 | 1 | 1111 | | 16 | 0 | 1 | 0000 (rollover) | **3. Enable control:** | Test | RST | EN | Expected | |------|-----|----|----------| | Count to 5 | 0 | 1 | 0101 | | Disable | 0 | 0 | 0101 (hold) | | Disable again | 0 | 0 | 0101 (hold) | | Re-enable | 0 | 1 | 0110 (resume) | **4. Reset during count:** | Test | RST | EN | Expected | |------|-----|----|----------| | Count to 7 | 0 | 1 | 0111 | | Assert reset | 1 | 1 | 0000 | | Release, count | 0 | 1 | 0001 | **Self-checking mechanism (pseudocode):**

expected_count = 0

for each test_cycle:
    apply(CLK, RST, EN)
    wait(clock_edge)

    // Compute expected value
    if (RST == 1):
        expected_count = 0
    else if (EN == 1):
        expected_count = (expected_count + 1) mod 16

    // Automatic check
    if (DUT.Q != expected_count):
        report ERROR at cycle number
        error_count++

// Final report
if (error_count == 0):
    report "ALL TESTS PASSED"
else:
    report "FAILED: {error_count} errors"

**Comprehensive test vector count:** | Category | Vectors | Purpose | |----------|---------|---------| | Reset | 3 | Verify reset behavior | | Full count | 17 | All states + rollover | | Enable toggle | 6 | Hold and resume | | Mid-count reset | 4 | Reset from non-zero | | Corner cases | 4 | Rollover with EN toggle | | **Total** | **~34** | **Complete coverage** | **Design for testability tips:** - Include a counter output observation point (no buried states) - Add a synchronous reset (easier to test than async) - Include a terminal count (TC) output and verify it at state 1111 - Test at least two full counting sequences to verify rollover

Problem 20

A complete digital system design flows from specification to implementation. List and describe each step in the design flow. For a 4-bit ALU with ADD, SUB, AND, OR, XOR operations, give a concrete example of what happens at each step.

Show Solution

**Digital design flow — 8 steps:** **Step 1: Specification** Define what the system does, its inputs, outputs, and constraints. *4-bit ALU example:* - Inputs: A[3:0], B[3:0], Op[2:0] - Outputs: Result[3:0], Carry_out, Zero_flag - Operations: ADD (000), SUB (001), AND (010), OR (011), XOR (100) - Constraint: Combinational, max delay < 15 ns **Step 2: Architecture / Top-down decomposition** Break the system into modules and define their interfaces. *4-bit ALU example:*

ALU
├── Adder/Subtractor unit (handles ADD, SUB)
│   ├── XOR array (B complement for SUB)
│   └── 4-bit adder (with carry-in for SUB)
├── Logic unit (handles AND, OR, XOR)
│   ├── AND array
│   ├── OR array
│   └── XOR array
├── Output MUX (selects result based on Op)
└── Flag generator (Zero, Carry)

**Step 3: Detailed design / RTL coding** Implement each module at the register-transfer level. *Example — Output MUX logic:* | Op[2:0] | Result | |---------|--------| | 000 | A + B | | 001 | A - B | | 010 | A AND B | | 011 | A OR B | | 100 | A XOR B | Zero_flag = (Result == 0000) **Step 4: Functional simulation** Verify the design produces correct outputs for all operations. *Example test vectors:* | A | B | Op | Expected Result | Expected Carry | |---|---|----|-----------------|----------------| | 0011 | 0001 | ADD | 0100 | 0 | | 0011 | 0001 | SUB | 0010 | 0 | | 1111 | 0001 | ADD | 0000 | 1 | | 1010 | 1100 | AND | 1000 | - | | 1010 | 1100 | OR | 1110 | - | | 1010 | 1100 | XOR | 0110 | - | | 0000 | 0000 | ADD | 0000 (Z=1) | 0 | **Step 5: Synthesis** Convert RTL to gate-level netlist targeting specific technology. *Example output:* The adder synthesizes to a ripple carry adder using AND, OR, XOR gates from the target library. Total: ~35 gates. **Step 6: Static timing analysis** Verify all paths meet timing constraints. *Example:* - Critical path: A input through carry chain to Result[3] - Path delay: 4 XOR + 4 AND + 3 OR = 11 gate delays - At 1 ns/gate: 11 ns < 15 ns requirement. **PASS.** **Step 7: Place and route (for FPGA/ASIC)** Map gates to physical locations and create interconnect wiring. *Example:* The ALU uses 22 LUTs and 0 flip-flops on an FPGA. Routing adds 2 ns interconnect delay. Total delay: 13 ns < 15 ns. **PASS.** **Step 8: Post-layout verification and testing** Final simulation with actual delays, then physical testing. *Example:* Post-layout simulation confirms all 112 test vectors ($7 \text{ ops} \times 16 \text{ input combos}$) pass with worst-case delay of 13.4 ns. Design is programmed onto FPGA and verified with logic analyzer. **Summary of flow:** | Step | Activity | Output | |------|----------|--------| | 1 | Specification | Requirements document | | 2 | Architecture | Block diagram, interfaces | | 3 | RTL Design | HDL code or schematic | | 4 | Functional Simulation | Test results (pass/fail) | | 5 | Synthesis | Gate-level netlist | | 6 | Timing Analysis | Timing report | | 7 | Place and Route | Layout / bitstream | | 8 | Verification and Test | Working hardware |

Problems Summary

Section	Topics Covered	Problem Count
A	Top-Down Design and Modularity	4
B	Datapath and Controller Design	4
C	Timing Analysis	4
D	Pipelining and Optimization	4
E	System Design Problems	4
Total		20