Skip to content

End-of-Unit Problems: System Integration

Work through these problems to reinforce your understanding of top-down design, datapath-controller partitioning, timing analysis, pipelining, and system-level design trade-offs.


Section A: Top-Down Design and Modularity (4 problems)

Problem 1

A digital combination lock accepts a 4-digit code (each digit 0-9) entered sequentially. Decompose the system using top-down design into hierarchical modules. Identify each module, its inputs, outputs, and its role in the system.

Show Solution
**Level 0 — Top-level system:**
┌─────────────────────────────────────────────┐
│            Combination Lock System           │
│                                              │
│  Keypad ──→ [Controller] ──→ Lock Actuator   │
│              ↕                                │
│         [Code Memory]                        │
└─────────────────────────────────────────────┘
**Level 1 — Subsystem decomposition:** | Module | Inputs | Outputs | Function | |--------|--------|---------|----------| | Keypad Encoder | 10 key lines | 4-bit BCD digit, KEY_VALID | Encodes pressed key to binary | | Digit Comparator | entered digit, stored digit | MATCH | Compares one digit against stored code | | Sequence Counter | KEY_VALID, RESET | digit_index[1:0] | Tracks which digit (0-3) is being entered | | Code Memory | digit_index | stored_digit[3:0] | Stores the 4-digit secret code | | Main Controller (FSM) | MATCH, digit_index, KEY_VALID | UNLOCK, RESET, ERROR | Orchestrates the verification sequence | | Lock Actuator | UNLOCK | physical lock signal | Drives the solenoid/motor | | Timeout Timer | CLK, RESET | TIMEOUT | Resets system after inactivity | **Level 2 — Detailed module breakdown:** - **Keypad Encoder:** 4x3 matrix scanner + priority encoder + debounce circuit - **Main Controller FSM states:** IDLE, CHECK_D0, CHECK_D1, CHECK_D2, CHECK_D3, UNLOCK, ERROR_LOCKOUT - **Code Memory:** 4 x 4-bit register file (can be ROM for fixed code or RAM for programmable code) **Design hierarchy:**
Combination Lock System
├── Keypad Encoder
│   ├── Matrix Scanner
│   ├── Priority Encoder
│   └── Debounce Circuit
├── Main Controller FSM
├── Sequence Counter (2-bit)
├── Code Memory (4 x 4-bit)
├── Digit Comparator (4-bit equality)
├── Timeout Timer (counter)
└── Lock Actuator Driver
**Key design principle:** Each module has a well-defined interface. The controller FSM coordinates all modules without knowing their internal implementation.

Problem 2

Explain the difference between structural and behavioral decomposition in top-down design. For a 4-bit ALU that supports ADD, SUB, AND, OR, give both decompositions.

Show Solution
**Structural decomposition** breaks the system into physical components and their interconnections. **Behavioral decomposition** breaks the system into functional operations without specifying implementation. **Behavioral decomposition of 4-bit ALU:**
4-bit ALU
├── Addition operation (A + B)
├── Subtraction operation (A - B)
├── Bitwise AND operation (A & B)
└── Bitwise OR operation (A | B)
Each operation is described by what it does, not how it is built. **Structural decomposition of 4-bit ALU:**
4-bit ALU
├── 4-bit Adder/Subtractor
│   ├── XOR array (4 gates, for B complement)
│   ├── 4-bit Ripple Carry Adder
│   │   ├── Full Adder bit 0
│   │   ├── Full Adder bit 1
│   │   ├── Full Adder bit 2
│   │   └── Full Adder bit 3
│   └── Carry-in = SUB control
├── AND array (4 AND gates)
├── OR array (4 OR gates)
└── Output MUX (4-bit, 4-to-1)
    └── Select = Op[1:0]
**Block diagram (structural):**
A[3:0] ──┬──────────────────────┬──────┬──────┐
         │                      │      │      │
B[3:0] ──┼──[XOR w/ SUB]──[Adder]  [AND]  [OR]│
         │        │           │      │      │  │
         │        └───────────┼──────┼──────┼──┘
         │                    ↓      ↓      ↓
         │              ┌─────────────────────┐
Op[1:0]──┴─────────────→│    4-to-1 MUX       │
                        └─────────┬───────────┘
                                  ↓
                              Result[3:0]
| Op[1:0] | Operation | MUX selects | |---------|-----------|-------------| | 00 | ADD | Adder output | | 01 | SUB | Adder output (B complemented, Cin=1) | | 10 | AND | AND array output | | 11 | OR | OR array output | **Key difference:** Behavioral decomposition is technology-independent and focuses on specification. Structural decomposition commits to a specific implementation architecture.

Problem 3

A digital system must be designed with the following specifications. Identify all modules needed and draw the top-level block diagram showing interconnections.

  • Read temperature from an 8-bit ADC every 100 ms
  • Compare temperature against a programmable threshold
  • Activate a cooling fan if temperature exceeds the threshold
  • Display current temperature on a 2-digit 7-segment display
Show Solution
**Module identification:** | Module | Type | Function | |--------|------|----------| | Clock Divider | Counter | Generates 10 Hz sample signal from system clock | | ADC Interface | Controller | Manages ADC read timing and handshake | | Threshold Register | Register | Stores programmable 8-bit threshold | | Comparator | Combinational | Compares ADC value > threshold | | Fan Controller | FSM | Manages fan on/off with hysteresis | | Binary-to-BCD | Combinational | Converts 8-bit binary to 2-digit BCD | | 7-Segment Decoder | Combinational | Converts BCD to segment patterns | | Display MUX | Counter + MUX | Time-multiplexes two digit displays | | Main Controller | FSM | Coordinates sampling and data flow | **Top-level block diagram:**
System CLK ──→ [Clock Divider] ──→ SAMPLE_TICK
                                       │
                    ┌──────────────────┘
                    ↓
ADC_DATA[7:0] ←── [ADC Interface] ←── ADC_BUSY
ADC_START     ──→       │
                        ↓
                 TEMP[7:0] ──┬──→ [Comparator] ──→ OVER_TEMP
                             │         ↑
                             │    [Threshold Reg] ←── THRESH_IN[7:0]
                             │                    ←── LOAD_THRESH
                             │
                             │    [Fan Controller] ←── OVER_TEMP
                             │         │
                             │         └──→ FAN_ON
                             │
                             └──→ [Bin-to-BCD] ──→ BCD_TENS[3:0]
                                       │           BCD_ONES[3:0]
                                       ↓
                                 [7-Seg Decode] ──→ SEGMENTS[6:0]
                                       │
                                 [Display MUX] ──→ DIGIT_SEL
**Main controller FSM states:** - IDLE: Wait for SAMPLE_TICK - START_ADC: Assert ADC_START - WAIT_ADC: Wait for ADC_BUSY to deassert - READ_DATA: Latch ADC_DATA into TEMP register - UPDATE: Trigger comparison and display update - Return to IDLE **Interface signals between modules are clearly defined,** allowing each module to be designed and tested independently.

Problem 4

What are the three key principles of modular design? For each principle, give a digital design example showing how violating it leads to problems.

Show Solution
**The three key principles:** **1. Well-defined interfaces** Each module communicates only through declared input/output ports with specified data types and timing. - **Good:** ALU has inputs A[7:0], B[7:0], Op[1:0] and output Result[7:0], Carry_out - **Violation:** Module A directly reads internal flip-flop values of Module B rather than using output ports - **Problem:** Changing Module B's internal implementation breaks Module A. No isolation between modules. **2. Encapsulation (information hiding)** Internal implementation details are hidden; only the interface is visible to other modules. - **Good:** A counter module exports COUNT[3:0] and TC (terminal count); its internal carry chain is hidden - **Violation:** The top-level design routes carry signals directly between internal full adders of separate counter modules - **Problem:** Cannot replace the ripple counter with a faster carry-lookahead counter without redesigning the entire system. **3. Single responsibility** Each module performs one clearly defined function. - **Good:** Separate modules for keyboard decoding, FSM control, and display driving - **Violation:** One monolithic FSM handles keyboard scanning, sequence checking, display multiplexing, and timeout simultaneously - **Problem:** The FSM state space explodes combinatorially. A 4-state keyboard scanner x 5-state lock controller x 3-state display driver = 60 states in a single FSM instead of three small FSMs (4 + 5 + 3 = 12 states total). **Quantitative example of the state explosion:** | Design approach | FSM states | Flip-flops needed | |----------------|-----------|-------------------| | Single monolithic FSM | $4 \times 5 \times 3 = 60$ | $\lceil\log_2 60\rceil = 6$ | | Three separate FSMs | $4 + 5 + 3 = 12$ | $2 + 3 + 2 = 7$ flip-flops, but far simpler logic | Although the monolithic FSM uses one fewer flip-flop, its next-state logic is vastly more complex and nearly impossible to debug.

Section B: Datapath and Controller Design (4 problems)

Problem 5

Explain the datapath-controller partitioning approach. For a circuit that computes \(Y = A \times B + C\) where A, B, C are 8-bit inputs, design the datapath and controller separately.

Show Solution
**Datapath-controller partitioning:** - **Datapath:** Contains the components that store, transform, and route data (registers, ALUs, multiplexers, buses) - **Controller:** An FSM that generates control signals to orchestrate datapath operations each clock cycle - **Separation:** The controller does not process data; the datapath does not make sequencing decisions **Datapath design for $Y = A \times B + C$:**
A[7:0] ──→ [Reg_A] ──→ ┐
                        ├──→ [8x8 Multiplier] ──→ [Reg_P] ──→ ┐
B[7:0] ──→ [Reg_B] ──→ ┘                                      ├──→ [16-bit Adder] ──→ [Reg_Y]
                                                               │                        │
C[7:0] ──→ [Reg_C] ──────────────→ (zero-extended to 16 bits) ┘                        ↓
                                                                                    Y[15:0]
**Datapath control signals:** | Signal | Function | |--------|----------| | LD_A | Load register A | | LD_B | Load register B | | LD_C | Load register C | | LD_P | Load product register | | LD_Y | Load result register | **Controller FSM:**
┌──────────────────────────────────────────┐
│  S0: IDLE          (wait for START)      │
│  ──→ if START: go to S1                  │
├──────────────────────────────────────────┤
│  S1: LOAD          assert LD_A,LD_B,LD_C │
│  ──→ go to S2                            │
├──────────────────────────────────────────┤
│  S2: MULTIPLY      (wait for mult done)  │
│  ──→ if MULT_DONE: assert LD_P, go to S3│
├──────────────────────────────────────────┤
│  S3: ADD           assert LD_Y           │
│  ──→ go to S4                            │
├──────────────────────────────────────────┤
│  S4: DONE          assert RESULT_VALID   │
│  ──→ go to S0                            │
└──────────────────────────────────────────┘
**Control signal table:** | State | LD_A | LD_B | LD_C | LD_P | LD_Y | RESULT_VALID | |-------|------|------|------|------|------|-------------| | S0 | 0 | 0 | 0 | 0 | 0 | 0 | | S1 | 1 | 1 | 1 | 0 | 0 | 0 | | S2 | 0 | 0 | 0 | 1* | 0 | 0 | | S3 | 0 | 0 | 0 | 0 | 1 | 0 | | S4 | 0 | 0 | 0 | 0 | 0 | 1 | *LD_P asserted only when MULT_DONE = 1 **Latency:** 4 clock cycles (load + multiply + add + done)

Problem 6

Design the datapath and controller for a circuit that finds the maximum of three 8-bit unsigned numbers A, B, and C. Show the ASM (Algorithmic State Machine) chart.

Show Solution
**Algorithm:** 1. Load A, B, C 2. Compare A and B; keep the larger as TEMP 3. Compare TEMP and C; keep the larger as MAX **Datapath:**
A[7:0]──→[Reg_A]──→┐                      ┌──→[Reg_MAX]──→ MAX[7:0]
                    ├──→[8-bit Comparator]──┤
B[7:0]──→[Reg_B]──→┤   + 2-to-1 MUX       │
                    │        ↑              │
C[7:0]──→[Reg_C]───┘    SEL_AB, SEL_MC     │
                                            │
                [2:1 MUX] ←─────────────────┘
                   ↑
                SEL_MC
**Refined datapath with two comparisons:**
A ──→ [Reg_A] ──→ ┐
                  ├──→ [CMP1: A>B?] ──→ A_GT_B (status to controller)
B ──→ [Reg_B] ──→ ┘
                  ┌──→ [MUX1] ──→ TEMP
Reg_A output ─────┤      ↑
Reg_B output ─────┘   SEL_AB (from controller)

TEMP ─────────────┐
                  ├──→ [CMP2: TEMP>C?] ──→ T_GT_C (status to controller)
C ──→ [Reg_C] ──→ ┘
                  ┌──→ [MUX2] ──→ [Reg_MAX]
TEMP ─────────────┤      ↑
Reg_C output ─────┘   SEL_MC (from controller)
**ASM Chart:**
┌──────────┐
│ S0: IDLE │ ──── START=0? ──→ (stay)
└────┬─────┘
     │ START=1
     ↓
┌──────────┐
│ S1: LOAD │  LD_A=1, LD_B=1, LD_C=1
└────┬─────┘
     ↓
┌──────────┐
│ S2: CMP1 │  Compare A, B
└────┬─────┘
     │
    ◇ A_GT_B?
   / \
  Y   N
  │   │
  ↓   ↓
SEL_AB=0  SEL_AB=1
(pick A)  (pick B)
  │   │
  └─┬─┘
    ↓
┌──────────┐
│ S3: CMP2 │  Compare TEMP, C
└────┬─────┘
     │
    ◇ T_GT_C?
   / \
  Y   N
  │   │
  ↓   ↓
SEL_MC=0  SEL_MC=1
(pick T)  (pick C)
  │   │
  └─┬─┘
    ↓
┌──────────┐
│ S4: DONE │  LD_MAX=1, VALID=1
└────┬─────┘
     ↓
   (go to S0)
**Controller outputs per state:** | State | LD_A | LD_B | LD_C | SEL_AB | SEL_MC | LD_MAX | VALID | |-------|------|------|------|--------|--------|--------|-------| | S0 | 0 | 0 | 0 | - | - | 0 | 0 | | S1 | 1 | 1 | 1 | - | - | 0 | 0 | | S2 | 0 | 0 | 0 | A_GT_B' | - | 0 | 0 | | S3 | 0 | 0 | 0 | - | T_GT_C' | 0 | 0 | | S4 | 0 | 0 | 0 | - | - | 1 | 1 | **Total latency:** 4 clock cycles from START to VALID.

Problem 7

A datapath for a simple accumulator processor includes: an 8-bit register (ACC), an 8-bit adder, and a 2-to-1 MUX. The processor supports three operations: LOAD (ACC = input), ADD (ACC = ACC + input), CLEAR (ACC = 0). Design the datapath and the controller.

Show Solution
**Datapath:**
DATA_IN[7:0] ──→ ┐
                 ├──→ [2:1 MUX] ──→ [8-bit Adder] ──→ [ACC Register]──┬──→ ACC_OUT[7:0]
0x00 ────────────┘        ↑              ↑                │             │
                       MUX_SEL        B input             │             │
                                         │                └─────────────┘
                                         └────────────────(feedback from ACC)
**Refined datapath with all three operations:**
                    MUX_A_SEL
                       ↓
DATA_IN ──→ [MUX_A] ──→ A ──→ [8-bit Adder] ──→ SUM ──→ [MUX_R] ──→ [ACC]──→ ACC_OUT
0x00 ──────→           B ←──────────────────────── ACC     ↑
                                                        MUX_R_SEL
                                                        LD_ACC
**Control signal encoding:** | Operation | MUX_A_SEL | MUX_R_SEL | LD_ACC | Effect | |-----------|-----------|-----------|--------|--------| | NOP | X | X | 0 | ACC unchanged | | LOAD | - | 0 | 1 | ACC = DATA_IN (MUX_R selects DATA_IN directly) | | ADD | 0 | 1 | 1 | ACC = ACC + DATA_IN (MUX_R selects adder output) | | CLEAR | 1 | 1 | 1 | ACC = ACC + 0 ... | **Simpler approach — single MUX at ACC input:**
DATA_IN[7:0]──────────────────→ ┐
(ACC + DATA_IN) from adder ───→ ├──→ [4:1 MUX] ──→ D ──→ [ACC Reg] ──→ ACC_OUT
0x00 (constant zero) ─────────→ ┤        ↑                    │
ACC_OUT (hold) ───────────────→ ┘     SEL[1:0]                │
                                                               │
ACC_OUT ──→ [Adder] ←── DATA_IN                               │
              │                                                │
              └────────(ACC + DATA_IN)                         │
                                                               │
              └────────────────────────────────────────────────┘
| SEL[1:0] | MUX Output | Operation | |----------|------------|-----------| | 00 | DATA_IN | LOAD | | 01 | ACC + DATA_IN | ADD | | 10 | 0x00 | CLEAR | | 11 | ACC_OUT | NOP (hold) | **Controller FSM:**
┌────────┐    OP = LOAD     ┌──────────┐
│  IDLE  │ ──────────────→  │  EXECUTE  │
│        │    OP = ADD      │           │
│        │ ──────────────→  │  SEL =    │
│        │    OP = CLR      │  f(OP)    │
│        │ ──────────────→  │  LD_ACC=1 │
└────────┘                  └─────┬─────┘
     ↑                            │
     └────────────────────────────┘
For a single-cycle implementation, the controller is purely combinational: - SEL[1:0] = OP[1:0] - LD_ACC = (OP != NOP)

Problem 8

Design the datapath and controller for a binary search circuit that finds whether a value KEY exists in a sorted 8-element array stored in a register file. Each element is 8 bits. Show the datapath components, control signals, and FSM state diagram.

Show Solution
**Algorithm (binary search):** 1. Set LOW = 0, HIGH = 7 2. Compute MID = (LOW + HIGH) / 2 3. Read ARRAY[MID] 4. If ARRAY[MID] = KEY: FOUND 5. If ARRAY[MID] < KEY: LOW = MID + 1 6. If ARRAY[MID] > KEY: HIGH = MID - 1 7. If LOW > HIGH: NOT_FOUND 8. Repeat from step 2 **Datapath components:**
KEY[7:0] ──→ [Reg_KEY]
                               ┌──→ [Comparator] ──→ EQ, LT, GT
[Reg File: 8 x 8-bit] ──→ DATA_OUT ──┘        ↑
     ↑ ADDR[2:0]                           Reg_KEY output
     │
[MID Register] ──→ ADDR
     ↑
[Adder/Shifter] ──→ (LOW + HIGH) >> 1
     ↑         ↑
[Reg_LOW]   [Reg_HIGH]
     ↑         ↑
MID+1 ──┘   MID-1 ──┘  (via adder with +1/-1)
**Datapath block diagram:**
┌───────────────────────────────────────────────────┐
│                   DATAPATH                        │
│                                                   │
│  [Reg_LOW] ──┬──→ [Adder] ──→ [>>1] ──→ [Reg_MID]│
│  [Reg_HIGH]──┘      ↑                      │     │
│       ↑  ↑        LOW+HIGH               ADDR    │
│     LD_L LD_H                              ↓     │
│       ↑  ↑                          [Reg File]   │
│    MID+1 MID-1                          │        │
│                                    DATA_OUT      │
│                                         ↓        │
│  [Reg_KEY] ──→ [Comparator] ←── DATA_OUT         │
│                  │  │  │                          │
│                 EQ  LT GT  (status to controller) │
│                                                   │
│  [Comparator2: LOW > HIGH?] ──→ DONE_NF          │
└───────────────────────────────────────────────────┘
**Controller FSM:** | State | Actions | Next State | |-------|---------|------------| | S0: INIT | LD_L (LOW=0), LD_H (HIGH=7), LD_KEY | S1 | | S1: CALC_MID | MID = (LOW+HIGH)>>1, LD_MID | S2 | | S2: READ | Read Reg_File[MID] | S3 | | S3: COMPARE | Check comparator outputs | S4 or S5 or S6 or S7 | | S4: FOUND | Assert FOUND=1 | S0 | | S5: GO_HIGH | LOW = MID+1, LD_L | S7 | | S6: GO_LOW | HIGH = MID-1, LD_H | S7 | | S7: CHECK_DONE | If LOW > HIGH: NOT_FOUND, else go to S1 | S1 or S8 | | S8: NOT_FOUND | Assert NOT_FOUND=1 | S0 | **State diagram:**
S0 ──→ S1 ──→ S2 ──→ S3 ──→ EQ? ──→ S4 (FOUND)
                           │
                    LT? ──→ S5 ──→ S7 ──→ LOW>HIGH? ──→ S8 (NOT_FOUND)
                           │              │
                    GT? ──→ S6 ──→ S7     └──→ S1 (repeat)
**Worst-case latency:** $\lceil\log_2 8\rceil = 3$ iterations, each 4-5 clock cycles, so approximately **12-15 clock cycles**.

Section C: Timing Analysis (4 problems)

Problem 9

A synchronous circuit uses D flip-flops with the following parameters:

  • Clock-to-Q delay: \(T_{cq}\) = 0.5 ns
  • Setup time: \(T_{setup}\) = 0.3 ns
  • Hold time: \(T_{hold}\) = 0.1 ns
  • Combinational logic delay: \(T_{comb}\) = 2.8 ns

Calculate the maximum clock frequency \(f_{max}\) and verify the hold time constraint.

Show Solution
**Setup time constraint (determines $f_{max}$):** The data must arrive at the next flip-flop's D input at least $T_{setup}$ before the next clock edge. $$T_{clk} \geq T_{cq} + T_{comb} + T_{setup}$$ $$T_{clk} \geq 0.5 + 2.8 + 0.3 = 3.6 \text{ ns}$$ $$f_{max} = \frac{1}{T_{clk,min}} = \frac{1}{3.6 \text{ ns}} = 277.8 \text{ MHz}$$ **Hold time constraint:** The data at the D input must remain stable for at least $T_{hold}$ after the clock edge. $$T_{cq} + T_{comb,min} \geq T_{hold}$$ The minimum combinational delay path (shortest path between flip-flops): $$T_{comb,min} \geq T_{hold} - T_{cq} = 0.1 - 0.5 = -0.4 \text{ ns}$$ Since $T_{comb,min}$ is always $\geq 0$, the hold constraint is **automatically satisfied** (any non-negative combinational delay meets the requirement). **Timing diagram:**
CLK  ──┐     ┌──────┐     ┌──────┐
       │     │      │     │      │
       └─────┘      └─────┘      └─────
       |←──── Tclk = 3.6 ns ────→|

FF1 Q  ──────┤←Tcq→├──────────────────
             0.5ns  │←── Tcomb ──→│
                    │   2.8 ns    │
FF2 D  ─────────────┤            ├────
                                 │←Tsu→│
                                  0.3ns
**Summary:** - $f_{max} = 277.8$ MHz - Hold time: satisfied (margin = 0.5 + 0 - 0.1 = 0.4 ns minimum)

Problem 10

A digital system has three pipeline stages with the following delays:

  • Stage 1: \(T_{cq}\) + 4.0 ns combinational
  • Stage 2: \(T_{cq}\) + 6.5 ns combinational
  • Stage 3: \(T_{cq}\) + 3.0 ns combinational

Flip-flop parameters: \(T_{cq}\) = 0.4 ns, \(T_{setup}\) = 0.3 ns.

(a) What is \(f_{max}\) for the pipelined design? (b) What would \(f_{max}\) be without pipelining (all logic in one stage)? (c) Calculate the throughput improvement from pipelining.

Show Solution
**(a) Pipelined $f_{max}$:** The clock frequency is limited by the **slowest stage**: $$T_{clk} \geq T_{cq} + T_{comb,max} + T_{setup}$$ Stage delays (including $T_{cq}$ and $T_{setup}$): - Stage 1: $0.4 + 4.0 + 0.3 = 4.7$ ns - Stage 2: $0.4 + 6.5 + 0.3 = 7.2$ ns (bottleneck) - Stage 3: $0.4 + 3.0 + 0.3 = 3.7$ ns $$T_{clk,min} = 7.2 \text{ ns}$$ $$f_{max,pipelined} = \frac{1}{7.2 \text{ ns}} = 138.9 \text{ MHz}$$ **(b) Non-pipelined $f_{max}$:** Total combinational delay = $4.0 + 6.5 + 3.0 = 13.5$ ns $$T_{clk} \geq T_{cq} + 13.5 + T_{setup} = 0.4 + 13.5 + 0.3 = 14.2 \text{ ns}$$ $$f_{max,unpipelined} = \frac{1}{14.2 \text{ ns}} = 70.4 \text{ MHz}$$ **(c) Throughput improvement:** Throughput is proportional to clock frequency (one result per clock at steady state): $$\text{Throughput ratio} = \frac{f_{max,pipelined}}{f_{max,unpipelined}} = \frac{138.9}{70.4} = 1.97\times$$ **Pipelining nearly doubles the throughput.** However, latency increases: - Non-pipelined latency: 14.2 ns (1 cycle) - Pipelined latency: $3 \times 7.2 = 21.6$ ns (3 cycles) **Latency increased by a factor of** $21.6 / 14.2 = 1.52\times$. | Metric | Unpipelined | Pipelined | |--------|-------------|-----------| | $f_{max}$ | 70.4 MHz | 138.9 MHz | | Latency | 14.2 ns | 21.6 ns | | Throughput | 70.4 M results/s | 138.9 M results/s |

Problem 11

A system has the following timing parameters:

  • Clock period: 10 ns
  • \(T_{cq}\) = 0.6 ns
  • \(T_{setup}\) = 0.4 ns
  • \(T_{hold}\) = 0.2 ns
  • Clock skew between source and destination flip-flop: \(\delta\) = 0.5 ns (destination clock arrives late)

(a) What is the maximum allowable combinational delay? (b) Is hold time violated? What is the hold margin?

Show Solution
**(a) Maximum combinational delay with clock skew:** When the destination clock arrives late by $\delta$, the data has more time to propagate (positive skew helps setup): Wait — we must be careful about sign convention. If the destination clock arrives **late**, the data has $\delta$ extra time, which **helps** setup but **hurts** hold. **Setup constraint with skew:** $$T_{cq} + T_{comb} + T_{setup} \leq T_{clk} + \delta$$ (The destination flip-flop samples $\delta$ later, giving more time.) $$T_{comb,max} = T_{clk} + \delta - T_{cq} - T_{setup}$$ $$T_{comb,max} = 10 + 0.5 - 0.6 - 0.4 = 9.5 \text{ ns}$$ **Without skew:** $T_{comb,max} = 10 - 0.6 - 0.4 = 9.0$ ns The positive skew at the destination gives 0.5 ns more margin for combinational logic. **(b) Hold time analysis with skew:** **Hold constraint with skew:** $$T_{cq} + T_{comb,min} \geq T_{hold} + \delta$$ (The destination clock arriving late means the new data from the current clock edge could arrive before the destination has finished sampling the old value.) $$T_{comb,min} \geq T_{hold} + \delta - T_{cq} = 0.2 + 0.5 - 0.6 = 0.1 \text{ ns}$$ **Hold margin** = $T_{cq} + T_{comb,min} - T_{hold} - \delta$ If the minimum combinational path delay is 0 ns: $$\text{Hold margin} = 0.6 + 0 - 0.2 - 0.5 = -0.1 \text{ ns}$$ **Hold time IS violated** if there is any direct path with $T_{comb,min} < 0.1$ ns. **Fix:** Insert a buffer (delay element) in the shortest path to add at least 0.1 ns of delay, or reduce clock skew. | Constraint | Without Skew | With Skew ($\delta$ = 0.5 ns) | |-----------|-------------|-------------------------------| | $T_{comb,max}$ | 9.0 ns | 9.5 ns (more margin) | | $T_{comb,min}$ | $\geq$ 0 ns (no issue) | $\geq$ 0.1 ns (potential issue!) | | Setup margin | Helped | Helped | | Hold margin | Fine | Reduced (may violate) |

Problem 12

Calculate the maximum clock frequency for a 16-bit ripple carry adder used in a registered pipeline stage. Each full adder has a gate delay of 2 ns (two gates on the carry path). The flip-flop has \(T_{cq}\) = 0.5 ns and \(T_{setup}\) = 0.3 ns.

Then calculate \(f_{max}\) if the design is changed to a 4-bit carry lookahead adder with four 4-bit CLA blocks and a second-level carry lookahead unit (2 gate levels for CLA + 2 gate levels for second-level group carry).

Show Solution
**Ripple Carry Adder:** Each full adder's carry path delay = 2 ns (two gate levels per bit). For 16 bits, the carry must ripple through all 16 full adders: $$T_{comb,ripple} = 16 \times 2 = 32 \text{ ns}$$ $$T_{clk,min} = T_{cq} + T_{comb} + T_{setup} = 0.5 + 32 + 0.3 = 32.8 \text{ ns}$$ $$f_{max,ripple} = \frac{1}{32.8 \text{ ns}} = 30.5 \text{ MHz}$$ **Carry Lookahead Adder (2-level):** Structure: Four 4-bit CLA blocks + one second-level lookahead unit. Delay breakdown: - Generate G and P from inputs: 1 gate level = 1 ns - First-level CLA (4-bit group G, P): 2 gate levels = 2 ns - Second-level CLA (group carry): 2 gate levels = 2 ns - Carries back to first level for sum: 1 gate level = 1 ns - Final XOR for sum bits: 1 gate level = 1 ns Wait — let me be more precise with the given parameters (2 ns per two gates = 1 ns per gate level): | Stage | Gate levels | Delay | |-------|------------|-------| | Generate $g_i$, $p_i$ from $a_i$, $b_i$ | 1 | 1 ns | | First-level CLA: group G, P | 2 | 2 ns | | Second-level CLA: group carries $C_4, C_8, C_{12}$ | 2 | 2 ns | | Carries distributed back into CLA blocks | 2 | 2 ns | | Final sum XOR | 1 | 1 ns | $$T_{comb,CLA} = 1 + 2 + 2 + 2 + 1 = 8 \text{ ns}$$ $$T_{clk,min} = 0.5 + 8 + 0.3 = 8.8 \text{ ns}$$ $$f_{max,CLA} = \frac{1}{8.8 \text{ ns}} = 113.6 \text{ MHz}$$ **Comparison:** | Adder Type | $T_{comb}$ | $f_{max}$ | Speedup | |-----------|-----------|----------|---------| | 16-bit Ripple | 32.0 ns | 30.5 MHz | 1.0x | | 16-bit 2-level CLA | 8.0 ns | 113.6 MHz | 3.7x | **The CLA is 3.7x faster at the cost of significantly more hardware (carry lookahead logic).**

Section D: Pipelining and Optimization (4 problems)

Problem 13

A non-pipelined combinational circuit has four logic blocks in series with delays: 3 ns, 5 ns, 4 ns, 2 ns. Flip-flop overhead is \(T_{cq}\) = 0.3 ns and \(T_{setup}\) = 0.2 ns.

(a) Calculate throughput without pipelining. (b) Pipeline the design into 4 stages (one block per stage). Calculate the new throughput and latency. (c) Pipeline into 2 stages by grouping blocks optimally. Calculate throughput and latency.

Show Solution
**Original delays:** Block A = 3 ns, Block B = 5 ns, Block C = 4 ns, Block D = 2 ns **Register overhead per stage:** $T_{cq} + T_{setup} = 0.3 + 0.2 = 0.5$ ns **(a) No pipelining:** $$T_{clk} = T_{cq} + (3 + 5 + 4 + 2) + T_{setup} = 0.3 + 14 + 0.2 = 14.5 \text{ ns}$$ $$\text{Throughput} = \frac{1}{14.5 \text{ ns}} = 69.0 \text{ M results/s}$$ $$\text{Latency} = 14.5 \text{ ns (1 cycle)}$$ **(b) 4-stage pipeline:** Stage delays (each includes register overhead): - Stage 1: $0.3 + 3 + 0.2 = 3.5$ ns - Stage 2: $0.3 + 5 + 0.2 = 5.5$ ns (bottleneck) - Stage 3: $0.3 + 4 + 0.2 = 4.5$ ns - Stage 4: $0.3 + 2 + 0.2 = 2.5$ ns $$T_{clk} = 5.5 \text{ ns (limited by slowest stage)}$$ $$\text{Throughput} = \frac{1}{5.5 \text{ ns}} = 181.8 \text{ M results/s}$$ $$\text{Latency} = 4 \times 5.5 = 22.0 \text{ ns}$$ **Speedup:** $181.8 / 69.0 = 2.63\times$ (not $4\times$ due to unbalanced stages) **(c) 2-stage pipeline (optimal grouping):** We want to split the four blocks into two groups with roughly equal total delay. Total logic delay = 14 ns, ideal split = 7 ns each. | Grouping | Stage 1 | Stage 2 | Bottleneck | |----------|---------|---------|------------| | {A,B} + {C,D} | 3+5 = 8 | 4+2 = 6 | 8.5 ns | | {A} + {B,C,D} | 3 | 5+4+2 = 11 | 11.5 ns | | {A,B,C} + {D} | 3+5+4 = 12 | 2 | 12.5 ns | **Optimal: {A, B} | {C, D}** - Stage 1: $0.3 + 8 + 0.2 = 8.5$ ns - Stage 2: $0.3 + 6 + 0.2 = 6.5$ ns $$T_{clk} = 8.5 \text{ ns}$$ $$\text{Throughput} = \frac{1}{8.5 \text{ ns}} = 117.6 \text{ M results/s}$$ $$\text{Latency} = 2 \times 8.5 = 17.0 \text{ ns}$$ **Summary:** | Configuration | $T_{clk}$ | Throughput | Latency | Speedup | |---------------|----------|------------|---------|---------| | No pipeline | 14.5 ns | 69.0 M/s | 14.5 ns | 1.0x | | 2-stage | 8.5 ns | 117.6 M/s | 17.0 ns | 1.7x | | 4-stage | 5.5 ns | 181.8 M/s | 22.0 ns | 2.6x | **Key insight:** Pipelining improves throughput but increases latency. Unbalanced stages reduce the theoretical speedup (4-stage gives only 2.6x, not 4x).

Problem 14

A 32-bit multiplier has a combinational delay of 20 ns. Registers have \(T_{cq}\) = 0.4 ns and \(T_{setup}\) = 0.3 ns.

(a) What is \(f_{max}\) without pipelining? (b) If the multiplier is split into 4 equal pipeline stages, what is \(f_{max}\)? (c) What is the pipeline latency? (d) If a feedback path exists (output feeds back to input), how does this affect pipeline throughput?

Show Solution
**(a) Without pipelining:** $$T_{clk} = T_{cq} + 20 + T_{setup} = 0.4 + 20 + 0.3 = 20.7 \text{ ns}$$ $$f_{max} = \frac{1}{20.7 \text{ ns}} = 48.3 \text{ MHz}$$ **(b) With 4 pipeline stages:** Each stage combinational delay: $20 / 4 = 5$ ns $$T_{clk} = T_{cq} + 5 + T_{setup} = 0.4 + 5 + 0.3 = 5.7 \text{ ns}$$ $$f_{max} = \frac{1}{5.7 \text{ ns}} = 175.4 \text{ MHz}$$ **Speedup:** $175.4 / 48.3 = 3.63\times$ **(c) Pipeline latency:** $$\text{Latency} = 4 \times T_{clk} = 4 \times 5.7 = 22.8 \text{ ns}$$ Compare with non-pipelined latency of 20.7 ns. Pipelining adds $22.8 - 20.7 = 2.1$ ns of latency overhead from the 3 extra pipeline registers ($3 \times 0.7 = 2.1$ ns, where 0.7 = $T_{cq} + T_{setup}$ for each added register). **(d) Feedback path impact:**
Input ──→ [Stage1] ──→ [Stage2] ──→ [Stage3] ──→ [Stage4] ──→ Output
                                                                 │
             ┌───────────────────────────────────────────────────┘
             ↓ (feedback)
          [MUX] ←── New Input
             │
             ↓
Input ──→ [Stage1] ...
With feedback, the next computation that depends on the current result must wait for the full pipeline latency (4 cycles) before it can start. This creates a **data hazard**. **Effective throughput with feedback:** - Without feedback: 1 result per cycle = 175.4 M results/s - With feedback (every result depends on previous): 1 result per 4 cycles = $175.4 / 4 = 43.9$ M results/s **The feedback loop completely negates the pipelining benefit for dependent computations.** The throughput with feedback ($43.9$ MHz) is actually slightly worse than the non-pipelined design ($48.3$ MHz) due to pipeline register overhead. **Solution approaches:** Loop unrolling, bypassing/forwarding (if partial results suffice), or accepting the latency penalty.

Problem 15

Compare the area, speed, and power trade-offs for three implementations of an 8-bit adder:

  1. Ripple carry adder (RCA)
  2. Carry lookahead adder (CLA)
  3. Carry select adder (CSLA)

Assume a unit gate delay of 1 ns and each gate costs 1 area unit and consumes 1 power unit per switching event.

Show Solution
**1. Ripple Carry Adder (RCA):** - **Structure:** 8 cascaded full adders - **Delay:** Carry propagates through 8 stages, 2 gate delays each = $8 \times 2 = 16$ gate delays = **16 ns** - **Area:** 8 full adders, each ~5 gates = **40 area units** - **Power:** Only gates on the active carry path switch = low dynamic power. Approximately **40 power units** worst case (all gates switch once) **2. Carry Lookahead Adder (CLA):** - **Structure:** Generate/propagate logic + lookahead carry unit - **Delay:** G,P generation (1) + CLA carry (2) + sum XOR (1) = **4 gate delays = 4 ns** - **Area:** 8 G/P generators (16 gates) + CLA unit (~30 gates for 8-bit) + 8 XOR sum gates = **~54 area units** - **Power:** All carry bits computed simultaneously, more gates switch = **~54 power units** worst case **3. Carry Select Adder (CSLA):** - **Structure:** Two 4-bit RCAs (for carry-in = 0 and 1) + MUX - **Delay:** 4-bit RCA ($4 \times 2 = 8$ delays) + MUX (1 delay) = **9 gate delays = 9 ns** - **Area:** 3 x 4-bit RCA = 60 gates + 4-bit MUX = 12 gates = **~72 area units** - **Power:** Two RCAs compute redundantly = **~72 power units** worst case **Comparison table:** | Metric | RCA | CLA | CSLA | |--------|-----|-----|------| | Delay | 16 ns | 4 ns | 9 ns | | Area | 40 units | 54 units | 72 units | | Power | 40 units | 54 units | 72 units | | Speed rank | 3rd | 1st | 2nd | | Area rank | 1st | 2nd | 3rd | | Power rank | 1st | 2nd | 3rd | **Area-Delay Product (ADP):** A combined metric for efficiency: | Design | ADP | |--------|-----| | RCA | $40 \times 16 = 640$ | | CLA | $54 \times 4 = 216$ | | CSLA | $72 \times 9 = 648$ | **CLA has the best area-delay product,** making it the most efficient trade-off. RCA is smallest but slowest. CSLA is largest but offers a middle-ground speed. **When to use each:** - **RCA:** Area-critical, low-speed designs - **CLA:** Speed-critical designs where area overhead is acceptable - **CSLA:** Moderate speed improvement needed with simpler design than full CLA

Problem 16

A design must process 200 million samples per second. The combinational logic takes 12 ns. Registers have \(T_{cq}\) = 0.3 ns and \(T_{setup}\) = 0.2 ns. Determine the minimum number of pipeline stages required and show how to partition the logic.

Show Solution
**Required throughput:** 200 M samples/s $$T_{clk,required} = \frac{1}{200 \times 10^6} = 5.0 \text{ ns}$$ **Register overhead per stage:** $T_{cq} + T_{setup} = 0.3 + 0.2 = 0.5$ ns **Single stage (no pipelining):** $$T_{clk} = 0.3 + 12 + 0.2 = 12.5 \text{ ns} \Rightarrow 80 \text{ MHz (too slow)}$$ **Minimum pipeline stages calculation:** For $N$ stages, assuming perfectly balanced logic: $$T_{clk} = T_{cq} + \frac{12}{N} + T_{setup} = 0.5 + \frac{12}{N}$$ Set $T_{clk} \leq 5.0$ ns: $$0.5 + \frac{12}{N} \leq 5.0$$ $$\frac{12}{N} \leq 4.5$$ $$N \geq \frac{12}{4.5} = 2.67$$ **Minimum stages: $N = 3$** **Verification with 3 stages:** Each stage logic delay: $12 / 3 = 4.0$ ns $$T_{clk} = 0.3 + 4.0 + 0.2 = 4.5 \text{ ns}$$ $$f_{max} = \frac{1}{4.5 \text{ ns}} = 222.2 \text{ MHz} > 200 \text{ MHz} \checkmark$$ **Check $N = 2$:** $$T_{clk} = 0.5 + 6.0 = 6.5 \text{ ns} \Rightarrow 153.8 \text{ MHz} < 200 \text{ MHz} \times$$ Two stages is insufficient. **Partitioning the logic:** If the 12 ns logic consists of sub-blocks, we partition into 3 groups each totaling 4 ns:
Input ──→ [Reg] ──→ [Logic: 4ns] ──→ [Reg] ──→ [Logic: 4ns] ──→ [Reg] ──→ [Logic: 4ns] ──→ [Reg] ──→ Output
          Stage 1                    Stage 2                    Stage 3
**Pipeline metrics:** | Metric | Value | |--------|-------| | Clock period | 4.5 ns | | $f_{max}$ | 222.2 MHz | | Throughput | 222.2 M samples/s | | Latency | $3 \times 4.5 = 13.5$ ns | | Pipeline registers added | 2 (between stages) | **Note:** If the logic cannot be split into exactly equal groups, the bottleneck stage determines $T_{clk}$. In that case, more stages may be needed to meet the 200 MHz target with unbalanced partitioning.

Section E: System Design Problems (4 problems)

Problem 17

Design a UART transmitter that sends 8-bit data with 1 start bit, 8 data bits (LSB first), and 1 stop bit. The baud rate is 9600. The system clock is 50 MHz. Show the controller FSM and datapath.

Show Solution
**UART frame format:**
IDLE(1) | START(0) | D0 | D1 | D2 | D3 | D4 | D5 | D6 | D7 | STOP(1) | IDLE(1)...
**Baud rate clock generation:** Clock divider count: $\frac{50 \times 10^6}{9600} = 5208.33 \approx 5208$ A counter counts from 0 to 5207 and generates a BAUD_TICK pulse. **Datapath:**
DATA_IN[7:0] ──→ [Shift Register (8-bit, PISO)] ──→ TX_BIT
                      ↑         ↑                       ↑
                    LOAD      SHIFT                   [MUX]
                                                      ↑   ↑
                                                 TX_SEL[1:0]
                                                    │
Sources: 0 (start), shift_out (data), 1 (stop/idle)
**Datapath components:** | Component | Size | Function | |-----------|------|----------| | Baud counter | 13-bit | Divides 50 MHz to 9600 Hz | | Bit counter | 4-bit | Counts 0-9 (start + 8 data + stop) | | Shift register | 8-bit PISO | Holds data, shifts out LSB first | | Output MUX | 3-to-1 | Selects start/data/stop bit | **Controller FSM:**
┌──────────┐
│   IDLE   │ ←─────────────────────────────────┐
│  TX = 1  │                                    │
└────┬─────┘                                    │
     │ SEND = 1                                 │
     ↓                                          │
┌──────────┐                                    │
│  START   │  TX = 0, LOAD shift reg            │
│          │  Wait for BAUD_TICK                 │
└────┬─────┘                                    │
     │ BAUD_TICK                                │
     ↓                                          │
┌──────────┐                                    │
│   DATA   │  TX = shift_out                    │
│          │  On BAUD_TICK: shift, bit_cnt++    │
│          │  When bit_cnt = 8: go to STOP      │
└────┬─────┘                                    │
     │ bit_cnt = 8 AND BAUD_TICK                │
     ↓                                          │
┌──────────┐                                    │
│   STOP   │  TX = 1                            │
│          │  Wait for BAUD_TICK                 │
└────┬─────┘                                    │
     │ BAUD_TICK                                │
     └──────────────────────────────────────────┘
**State encoding and control signals:** | State | TX_SEL | LOAD | SHIFT | COUNT_EN | BUSY | |-------|--------|------|-------|----------|------| | IDLE | 10 (high) | 0 | 0 | 0 | 0 | | START | 00 (low) | 1 | 0 | 0 | 1 | | DATA | 01 (shift_out) | 0 | BAUD_TICK | BAUD_TICK | 1 | | STOP | 10 (high) | 0 | 0 | 0 | 1 | **Timing for one byte at 9600 baud:** $$T_{byte} = 10 \text{ bits} \times \frac{1}{9600} = 1.042 \text{ ms}$$ Maximum data rate: $9600 / 10 = 960$ bytes/s.

Problem 18

Design a simple vending machine controller that:

  • Accepts nickels (5 cents) and dimes (10 cents)
  • Item costs 25 cents
  • Dispenses when exact amount or overpayment is reached
  • Returns to idle after dispensing (no change given)

Provide the complete state diagram, state table, state encoding, and next-state equations.

Show Solution
**States (based on amount collected):** | State | Amount | Encoding | |-------|--------|----------| | S0 | 0 cents | 000 | | S5 | 5 cents | 001 | | S10 | 10 cents | 010 | | S15 | 15 cents | 011 | | S20 | 20 cents | 100 | | S25 | 25+ cents (dispense) | 101 | **Inputs:** N (nickel inserted), D (dime inserted). Assume N and D are mutually exclusive pulses. **Output:** DISP (dispense item) **State diagram:**
S0 ──N──→ S5 ──N──→ S10 ──N──→ S15 ──N──→ S20 ──N──→ S25
│         │          │          │          │
└──D──→ S10   └──D──→ S15  └──D──→ S20  └──D──→ S25  └──D──→ S25
                                                        │
S25: DISP=1, then return to S0 ←────────────────────────┘
**Complete state table:** | Present | N | D | Next | DISP | |---------|---|---|------|------| | S0 (000) | 0 | 0 | S0 | 0 | | S0 | 1 | 0 | S5 | 0 | | S0 | 0 | 1 | S10 | 0 | | S5 (001) | 0 | 0 | S5 | 0 | | S5 | 1 | 0 | S10 | 0 | | S5 | 0 | 1 | S15 | 0 | | S10 (010) | 0 | 0 | S10 | 0 | | S10 | 1 | 0 | S15 | 0 | | S10 | 0 | 1 | S20 | 0 | | S15 (011) | 0 | 0 | S15 | 0 | | S15 | 1 | 0 | S20 | 0 | | S15 | 0 | 1 | S25 | 0 | | S20 (100) | 0 | 0 | S20 | 0 | | S20 | 1 | 0 | S25 | 0 | | S20 | 0 | 1 | S25 | 0 | | S25 (101) | X | X | S0 | 1 | **Next-state equations (from K-maps):** Let the state bits be $Q_2 Q_1 Q_0$ and don't-care states are 110, 111. $D_2 = Q_1 Q_0 N + Q_1 Q_0 D + Q_2 \overline{Q_0} \overline{N} \overline{D} + Q_2 \overline{Q_0} N$ Simplified (using don't cares for 110, 111): $$D_2 = Q_1 Q_0 (N + D) + Q_2 \overline{Q_0}$$ $$D_1 = \overline{Q_2} \overline{Q_1} Q_0 N + \overline{Q_2} Q_1 \overline{Q_0} \overline{N}\overline{D} + \overline{Q_2} \overline{Q_1} D + \overline{Q_2} Q_1 \overline{Q_0} N + \overline{Q_2} Q_1 Q_0 \overline{N}\overline{D}$$ Simplified: $$D_1 = \overline{Q_2}(\overline{Q_1}Q_0 N + \overline{Q_1}D + Q_1\overline{N}\overline{D} + Q_1\overline{Q_0}N)$$ $$D_0 = \overline{Q_2}\overline{Q_0}N + \overline{Q_2}Q_0\overline{N}\overline{D} + \overline{Q_2}\overline{Q_1}Q_0 D$$ Simplified: $$D_0 = \overline{Q_2}(N \oplus Q_0) \cdot \text{(with correction for dime cases)}$$ **Output:** $\text{DISP} = Q_2 Q_0$ (DISP = 1 only in state S25 = 101)

Problem 19

A self-checking testbench must verify a 4-bit counter. Describe the testbench architecture, the test vectors needed, and how to implement automatic checking in a hardware description approach.

Show Solution
**Testbench architecture:**
┌──────────────────────────────────────────────────┐
│                  TESTBENCH                        │
│                                                  │
│  [Stimulus       [Device        [Response         │
│   Generator] ──→  Under    ──→  Checker]          │
│       │          Test (DUT)]       │              │
│       │           4-bit           │              │
│       │          counter          │              │
│  CLK, RST,                   Expected vs         │
│  EN signals                  Actual comparison    │
│                                   │              │
│                              PASS/FAIL           │
└──────────────────────────────────────────────────┘
**Test categories and vectors:** **1. Reset test:** | Test | CLK | RST | EN | Expected Q | |------|-----|-----|----|-----------| | Assert reset | rising | 1 | X | 0000 | | Hold reset | rising | 1 | 1 | 0000 | | Release reset | rising | 0 | 0 | 0000 | **2. Normal counting (16 cycles):** | Cycle | RST | EN | Expected Q | |-------|-----|----|-----------| | 1 | 0 | 1 | 0001 | | 2 | 0 | 1 | 0010 | | ... | ... | ... | ... | | 15 | 0 | 1 | 1111 | | 16 | 0 | 1 | 0000 (rollover) | **3. Enable control:** | Test | RST | EN | Expected | |------|-----|----|----------| | Count to 5 | 0 | 1 | 0101 | | Disable | 0 | 0 | 0101 (hold) | | Disable again | 0 | 0 | 0101 (hold) | | Re-enable | 0 | 1 | 0110 (resume) | **4. Reset during count:** | Test | RST | EN | Expected | |------|-----|----|----------| | Count to 7 | 0 | 1 | 0111 | | Assert reset | 1 | 1 | 0000 | | Release, count | 0 | 1 | 0001 | **Self-checking mechanism (pseudocode):**
expected_count = 0

for each test_cycle:
    apply(CLK, RST, EN)
    wait(clock_edge)

    // Compute expected value
    if (RST == 1):
        expected_count = 0
    else if (EN == 1):
        expected_count = (expected_count + 1) mod 16

    // Automatic check
    if (DUT.Q != expected_count):
        report ERROR at cycle number
        error_count++

// Final report
if (error_count == 0):
    report "ALL TESTS PASSED"
else:
    report "FAILED: {error_count} errors"
**Comprehensive test vector count:** | Category | Vectors | Purpose | |----------|---------|---------| | Reset | 3 | Verify reset behavior | | Full count | 17 | All states + rollover | | Enable toggle | 6 | Hold and resume | | Mid-count reset | 4 | Reset from non-zero | | Corner cases | 4 | Rollover with EN toggle | | **Total** | **~34** | **Complete coverage** | **Design for testability tips:** - Include a counter output observation point (no buried states) - Add a synchronous reset (easier to test than async) - Include a terminal count (TC) output and verify it at state 1111 - Test at least two full counting sequences to verify rollover

Problem 20

A complete digital system design flows from specification to implementation. List and describe each step in the design flow. For a 4-bit ALU with ADD, SUB, AND, OR, XOR operations, give a concrete example of what happens at each step.

Show Solution
**Digital design flow — 8 steps:** **Step 1: Specification** Define what the system does, its inputs, outputs, and constraints. *4-bit ALU example:* - Inputs: A[3:0], B[3:0], Op[2:0] - Outputs: Result[3:0], Carry_out, Zero_flag - Operations: ADD (000), SUB (001), AND (010), OR (011), XOR (100) - Constraint: Combinational, max delay < 15 ns **Step 2: Architecture / Top-down decomposition** Break the system into modules and define their interfaces. *4-bit ALU example:*
ALU
├── Adder/Subtractor unit (handles ADD, SUB)
│   ├── XOR array (B complement for SUB)
│   └── 4-bit adder (with carry-in for SUB)
├── Logic unit (handles AND, OR, XOR)
│   ├── AND array
│   ├── OR array
│   └── XOR array
├── Output MUX (selects result based on Op)
└── Flag generator (Zero, Carry)
**Step 3: Detailed design / RTL coding** Implement each module at the register-transfer level. *Example — Output MUX logic:* | Op[2:0] | Result | |---------|--------| | 000 | A + B | | 001 | A - B | | 010 | A AND B | | 011 | A OR B | | 100 | A XOR B | Zero_flag = (Result == 0000) **Step 4: Functional simulation** Verify the design produces correct outputs for all operations. *Example test vectors:* | A | B | Op | Expected Result | Expected Carry | |---|---|----|-----------------|----------------| | 0011 | 0001 | ADD | 0100 | 0 | | 0011 | 0001 | SUB | 0010 | 0 | | 1111 | 0001 | ADD | 0000 | 1 | | 1010 | 1100 | AND | 1000 | - | | 1010 | 1100 | OR | 1110 | - | | 1010 | 1100 | XOR | 0110 | - | | 0000 | 0000 | ADD | 0000 (Z=1) | 0 | **Step 5: Synthesis** Convert RTL to gate-level netlist targeting specific technology. *Example output:* The adder synthesizes to a ripple carry adder using AND, OR, XOR gates from the target library. Total: ~35 gates. **Step 6: Static timing analysis** Verify all paths meet timing constraints. *Example:* - Critical path: A input through carry chain to Result[3] - Path delay: 4 XOR + 4 AND + 3 OR = 11 gate delays - At 1 ns/gate: 11 ns < 15 ns requirement. **PASS.** **Step 7: Place and route (for FPGA/ASIC)** Map gates to physical locations and create interconnect wiring. *Example:* The ALU uses 22 LUTs and 0 flip-flops on an FPGA. Routing adds 2 ns interconnect delay. Total delay: 13 ns < 15 ns. **PASS.** **Step 8: Post-layout verification and testing** Final simulation with actual delays, then physical testing. *Example:* Post-layout simulation confirms all 112 test vectors ($7 \text{ ops} \times 16 \text{ input combos}$) pass with worst-case delay of 13.4 ns. Design is programmed onto FPGA and verified with logic analyzer. **Summary of flow:** | Step | Activity | Output | |------|----------|--------| | 1 | Specification | Requirements document | | 2 | Architecture | Block diagram, interfaces | | 3 | RTL Design | HDL code or schematic | | 4 | Functional Simulation | Test results (pass/fail) | | 5 | Synthesis | Gate-level netlist | | 6 | Timing Analysis | Timing report | | 7 | Place and Route | Layout / bitstream | | 8 | Verification and Test | Working hardware |

Problems Summary

Section Topics Covered Problem Count
A Top-Down Design and Modularity 4
B Datapath and Controller Design 4
C Timing Analysis 4
D Pipelining and Optimization 4
E System Design Problems 4
Total 20