Unit 13: System Integration and Design Projects

Unit Overview

Welcome to Unit 13, the capstone of this course. Everything you have learned, from Boolean algebra and logic gates through sequential design, programmable devices, and VHDL, comes together here as we tackle complete system integration and real design projects.

The starting point is top-down design methodology. Instead of jumping straight to gates and flip-flops, you begin with a high-level block diagram, then progressively refine each block until you reach the implementation level. A critical part of this process is separating a design into a datapath and a control unit. The datapath contains registers, arithmetic units, and buses that process data. The control unit is a finite state machine that generates the signals telling the datapath what to do and when.

Once your design is described in VHDL, verification becomes paramount. You will write testbenches that systematically exercise your design. Static timing analysis helps you identify the critical path (the longest delay path), which determines the maximum operating frequency.

Real-world design always involves trade-offs. Making a circuit faster often requires more area or consumes more power. These trade-offs determine whether your product meets its battery life target, fits in its package, or stays within budget. To make all of this concrete, we work through several real-world examples, including a digital combination lock, an arithmetic logic unit, a UART serial communication controller, and a vending machine controller.

Key Takeaways

1. Top-down design methodology and the separation of datapath from control unit are the standard approaches for managing complexity in real digital systems.
2. Verification through testbenches and static timing analysis of the critical path are essential steps that ensure a design is both functionally correct and fast enough.
3. Real-world digital design requires navigating trade-offs among speed, area, and power, and capstone projects bring all course concepts together into practical, complete systems.

Summary

This capstone unit brings together every concept from the course—number systems, Boolean algebra, combinational logic, sequential circuits, programmable devices, and VHDL—into complete, integrated digital system designs. Students will learn the top-down design methodology for managing complexity, master systematic verification techniques including testbench design and timing analysis, and apply their skills to realistic design projects. The unit emphasizes the engineering judgment needed to make design trade-offs (speed vs area vs power), partition systems into manageable subsystems, and verify that all parts work together correctly. By completing this unit, students will have the skills to design, describe, simulate, and implement non-trivial digital systems on programmable logic devices.

Concepts Covered

Top-Down Design Methodology
Design Hierarchy and Modularity
System Partitioning
Interface Specification
Datapath and Control Unit Separation
Datapath Design
Control Unit Design
Datapath-Controller Integration
Design Documentation
Verification Planning
Functional Verification
Testbench Architecture
Self-Checking Testbenches
Test Vector Generation
Code Coverage Concepts
Static Timing Analysis
Critical Path Identification
Setup and Hold Time Budgeting
Clock Frequency Determination
Pipelining for Performance
Design Trade-offs: Area vs Speed vs Power
Resource Sharing and Scheduling
Design for Testability
System-Level Example: Digital Lock
System-Level Example: ALU Design
System-Level Example: Serial Communication
System-Level Example: Vending Machine Controller
Design Review and Optimization
From Specification to Silicon
Career Paths in Digital Design

Prerequisites

Before studying this unit, students should be familiar with:

All combinational logic concepts (Units 2-8)
Sequential circuit design: flip-flops, registers, counters, FSMs (Units 9-10)
Programmable logic devices, especially FPGAs (Unit 11)
VHDL fundamentals: entities, architectures, processes, testbenches (Unit 12)

13.1 The Need for System-Level Design

The circuits designed in prior units—adders, decoders, counters, state machines—are building blocks. Real digital systems combine dozens or hundreds of these blocks into coordinated, multi-component designs. A simple embedded controller might include:

An arithmetic logic unit (ALU) for computation
Registers for data storage
A finite state machine for control sequencing
Multiplexers for data routing
I/O interfaces for communication

Designing such a system by immediately writing Boolean equations or drawing gate-level schematics would be overwhelming. Instead, engineers use a top-down design methodology that manages complexity by working from abstract specifications down to detailed implementations.

Design Level	Description	Tools
System specification	What the system must do (requirements)	Natural language, timing diagrams
Architectural design	How subsystems are organized	Block diagrams, dataflow diagrams
Register-Transfer Level (RTL)	How data moves between registers each clock cycle	VHDL/Verilog, state diagrams
Gate level	Which gates implement each function	Synthesis output, netlists
Physical level	Where gates are placed and routed	FPGA implementation tools

13.2 Top-Down Design Methodology

The top-down approach decomposes a complex problem into progressively smaller, more manageable pieces:

Step 1: Specification. Define what the system must do—inputs, outputs, timing requirements, performance targets, constraints. A clear specification is the foundation; ambiguity here causes errors that propagate through the entire design.

Step 2: Architecture. Partition the system into major subsystems and define how they communicate. Identify the datapath (the components that process data) and the control unit (the FSM that sequences operations).

Step 3: Detailed design. Design each subsystem individually, using the techniques from prior units. Describe each in VHDL at the RTL level.

Step 4: Integration. Connect subsystems using structural VHDL (component instantiation). Verify interfaces match.

Step 5: Verification. Simulate the integrated design with comprehensive testbenches. Verify timing constraints.

Step 6: Implementation. Synthesize, place, route, and program onto the target FPGA. Verify in hardware.

Diagram: Top-Down Design Methodology

flowchart TD
    A["<b>System Specification</b><br/><i>Define inputs, outputs,<br/>timing, constraints</i>"]
    B["<b>Partition into Subsystems</b><br/><i>Identify datapath &amp; controller,<br/>define interfaces</i>"]
    C["<b>Design Each Module</b><br/><i>RTL design in VHDL,<br/>unit-level simulation</i>"]
    D["<b>Integrate</b><br/><i>Connect subsystems,<br/>structural VHDL</i>"]
    E["<b>Verify</b><br/><i>Integration testbench,<br/>timing analysis</i>"]
    F["<b>Implement</b><br/><i>Synthesize, place &amp; route,<br/>program FPGA</i>"]

    A --> B --> C --> D --> E --> F
    E -- "<i>Timing violation<br/>or bug found</i>" --> C
    F -- "<i>Requirements<br/>change</i>" --> A

    style A fill:#7E57C2,stroke:#5A3EED,color:#fff
    style B fill:#8E6AC8,stroke:#5A3EED,color:#fff
    style C fill:#9C7DD0,stroke:#5A3EED,color:#fff
    style D fill:#AA90D8,stroke:#5A3EED,color:#fff
    style E fill:#B8A3E0,stroke:#5A3EED,color:#333
    style F fill:#C6B6E8,stroke:#5A3EED,color:#333


13.3 Design Hierarchy and Modularity

Hierarchy is the primary tool for managing complexity. A hierarchical design consists of modules at multiple levels, where each module:

Has a well-defined interface (entity in VHDL)
Performs a single, clear function
Can be designed and tested independently
Can be reused in other designs

Example: Hierarchical ALU Design


Each leaf module is simple enough to design with the techniques from prior units. Integration assembles them structurally.

Module Size Rule of Thumb: Each module should be small enough to understand at a glance—typically 50–200 lines of VHDL. If a module grows larger, it probably should be split into sub-modules. Conversely, modules that are too small (a single gate) add unnecessary hierarchy.

Example: 4-Function Calculator Decomposition

Consider a simple calculator that performs addition, subtraction, AND, and OR on two 8-bit numbers, displaying results on a seven-segment display. A top-down decomposition yields:

Calculator (top level)
├── Input Interface
│   ├── Keypad Decoder (4×4 matrix → 4-bit BCD)
│   └── Input Registers (2 × 8-bit, load-enabled)
├── ALU Module
│   ├── 8-bit Adder-Subtractor (Unit 3)
│   ├── Bitwise Logic Unit (AND/OR)
│   └── Output MUX (selects operation result)
├── Control FSM
│   ├── State: ENTER_A → ENTER_B → SELECT_OP → DISPLAY
│   └── Generates: load_A, load_B, alu_sel, display_en
└── Output Interface
    ├── Binary-to-BCD Converter
    └── Seven-Segment Decoder (Unit 3)

Each leaf module is a component studied in earlier units. The hierarchy manages complexity: the top-level design sees only four blocks with well-defined interfaces, not the hundreds of gates inside them. If the ALU later needs to support multiplication, the changes stay mostly inside the ALU Module subtree: the Control FSM must generate one new opcode (and op_sel must widen beyond 2 bits, since all four codes are already assigned), but the Input and Output Interfaces remain untouched. This is the power of modular hierarchy.

Interface for the ALU Module:

Port	Direction	Width	Description
`a`, `b`	in	8 bits	Operands from input registers
`op_sel`	in	2 bits	Operation: 00=Add, 01=Sub, 10=AND, 11=OR
`result`	out	8 bits	Computation result
`flags`	out	3 bits	Zero, Carry, Negative
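The port table maps directly onto a VHDL entity. The sketch below is one possible implementation, assuming unsigned operands, a Carry flag that reports the carry-out for Add (and the borrow for Sub), and a flags bit order of Zero, Carry, Negative from bit 2 down to bit 0; the entity name calc_alu is hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity for the calculator's ALU Module (ports from the table above)
entity calc_alu is
    port (
        a, b   : in  std_logic_vector(7 downto 0);  -- operands from input registers
        op_sel : in  std_logic_vector(1 downto 0);  -- 00=Add, 01=Sub, 10=AND, 11=OR
        result : out std_logic_vector(7 downto 0);  -- computation result
        flags  : out std_logic_vector(2 downto 0)   -- (2)=Zero, (1)=Carry, (0)=Negative
    );
end entity calc_alu;

architecture rtl of calc_alu is
    signal wide : unsigned(8 downto 0);              -- bit 8 holds carry/borrow
begin
    -- Purely combinational: the select acts as the Output MUX
    with op_sel select
        wide <= ('0' & unsigned(a)) + ('0' & unsigned(b)) when "00",
                ('0' & unsigned(a)) - ('0' & unsigned(b)) when "01",
                '0' & unsigned(a and b)                   when "10",
                '0' & unsigned(a or b)                    when others;

    result   <= std_logic_vector(wide(7 downto 0));
    flags(2) <= '1' when wide(7 downto 0) = 0 else '0';  -- Zero
    flags(1) <= wide(8);                                  -- Carry (borrow for Sub)
    flags(0) <= wide(7);                                  -- Negative (MSB of result)
end architecture rtl;
```

The 9-bit intermediate makes the carry available without a separate adder, and the logic operations simply leave bit 8 at '0'.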

13.4 Datapath and Control Unit Separation

Most digital systems can be decomposed into two complementary parts:

Datapath: The components that store, transport, and transform data. This includes registers, ALUs, multiplexers, shifters, and buses. The datapath performs the "work" of the system.

Control Unit: A finite state machine (or set of FSMs) that generates the control signals directing the datapath. The control unit determines when and how data moves through the datapath, based on the current state and input conditions.

This separation is powerful because:

The datapath can be designed using combinational and sequential building blocks from Units 3–10
The control unit is a standard FSM design problem from Unit 10
Changes to the control sequence do not require redesigning the datapath hardware
The same datapath can be reused with different control units for different operations

Diagram: Datapath-Controller Architecture

flowchart TD
    subgraph CTRL["<b>Controller (FSM)</b>"]
        FSM["State Register +<br/>Next-State Logic<br/><i>Generates: MUX_sel, Reg_load, ALU_op</i>"]
    end

    subgraph DP["<b>Datapath</b>"]
        REG["<b>Registers</b><br/><i>R0, R1, ..., Rn</i>"]
        MUX["<b>MUX</b><br/><i>Data Select</i>"]
        ALU["<b>ALU</b><br/><i>Add, Sub,<br/>AND, OR</i>"]
        REG --> MUX --> ALU
        ALU -- "Result" --> REG
    end

    FSM -- "<b>Control Signals</b><br/><i>MUX_sel, Reg_load,<br/>ALU_op, Shift_en</i>" --> DP
    DP -- "<b>Status Signals</b><br/><i>Zero, Carry,<br/>Overflow, Sign</i>" --> FSM

    style FSM fill:#7E57C2,stroke:#5A3EED,color:#fff
    style CTRL fill:#F0ECFF,stroke:#5A3EED,color:#333
    style DP fill:#F0ECFF,stroke:#5A3EED,color:#333
    style REG fill:#EEF4FF,stroke:#A8C8FF,color:#333
    style MUX fill:#EEF4FF,stroke:#A8C8FF,color:#333
    style ALU fill:#EEF4FF,stroke:#A8C8FF,color:#333


13.5 Interface Specification

When multiple engineers (or multiple modules) must connect, interface specifications prevent integration errors. An interface specification for each module boundary includes:

Port names and types (the VHDL entity)
Timing relationships (when signals are valid relative to the clock)
Protocol (handshake signals, ready/valid, request/acknowledge)
Data encoding (unsigned, signed, BCD, one-hot)
Reset behavior (what state does the module enter on reset?)

A simple but effective interface pattern is the ready/valid handshake:

Producer asserts 'valid' when data is available
Consumer asserts 'ready' when it can accept data
Data transfers only when BOTH valid AND ready are HIGH

This pattern decouples the timing of producer and consumer, preventing data loss or duplication.
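One register stage that follows the ready/valid rule can be sketched as below, assuming a hypothetical 8-bit payload and the entity name rv_stage. The stage accepts a word when it is empty, or when its held word leaves in the same cycle. Note that in_ready depends combinationally on out_ready; skid-buffer designs remove that path at the cost of a second data register.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical single-entry ready/valid pipeline register
entity rv_stage is
    port (
        clk, rst  : in  std_logic;
        in_data   : in  std_logic_vector(7 downto 0);
        in_valid  : in  std_logic;   -- producer has data
        in_ready  : out std_logic;   -- this stage can accept data
        out_data  : out std_logic_vector(7 downto 0);
        out_valid : out std_logic;   -- this stage holds data
        out_ready : in  std_logic    -- consumer can accept data
    );
end entity rv_stage;

architecture rtl of rv_stage is
    signal full       : std_logic;
    signal data       : std_logic_vector(7 downto 0);
    signal can_accept : std_logic;
begin
    -- Accept when empty, or when the held word is taken in this same cycle
    can_accept <= (not full) or out_ready;
    in_ready   <= can_accept;
    out_valid  <= full;
    out_data   <= data;

    process(clk)
    begin
        if rising_edge(clk) then
            if rst = '1' then
                full <= '0';
            elsif can_accept = '1' then
                -- An input transfer happens only when in_valid AND in_ready are high
                full <= in_valid;
                if in_valid = '1' then
                    data <= in_data;
                end if;
            end if;
            -- When can_accept = '0' the stage is full and stalled: data is held
        end if;
    end process;
end architecture rtl;
```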

Example: ALU Module Interface Specification

A complete interface specification for an 8-bit ALU module documents everything another engineer needs to connect to it:

Port Specification:

Port	Direction	Type	Description
`clk`	in	`std_logic`	System clock (rising-edge active)
`rst`	in	`std_logic`	Synchronous reset (active high)
`a`	in	`std_logic_vector(7 downto 0)`	First operand
`b`	in	`std_logic_vector(7 downto 0)`	Second operand
`op`	in	`std_logic_vector(2 downto 0)`	Operation select (see table in 13.12)
`start`	in	`std_logic`	Pulse to begin operation
`result`	out	`std_logic_vector(7 downto 0)`	Computation result
`done`	out	`std_logic`	High for one cycle when result is valid
`flags`	out	`std_logic_vector(3 downto 0)`	{Carry, Zero, Negative, Overflow}

Timing: Operands a, b, and op must be stable before the rising clock edge when start is asserted. The result and flags outputs are valid on the same clock edge that done is asserted (one cycle after start for single-cycle operations).

Reset behavior: On rst = '1', result is driven to "00000000", all flags are cleared, and done is deasserted.

Why This Matters: This level of detail prevents integration bugs. Without it, one engineer might assume the result is available immediately (combinational), while another expects it one cycle later (registered) — a mismatch that causes intermittent failures.

13.6 Verification Planning

Professional digital design devotes more effort to verification than to design itself; verification-to-design effort splits of 60/40 or even 70/30 are typical. A verification plan defines:

What to test: Every input combination? Every state transition? Every boundary condition?
How to test: Manual stimulus? Random stimulus? Formal verification?
Pass/fail criteria: How does the testbench determine correctness automatically?
Coverage goals: What percentage of code, states, and transitions must be exercised?

For the designs in this course, verification focuses on:

Functional correctness: Does the design produce correct outputs for all relevant inputs?
Timing correctness: Does the design meet setup and hold time requirements at the target clock frequency?
Reset behavior: Does the design initialize correctly?
Edge cases: Does the design handle boundary conditions (overflow, maximum count, all-zeros, all-ones)?

Example: Verification Plan for the Digital Lock

A structured verification plan for the digital combination lock (Section 13.11) organizes tests by category:

Category	Test	Input Sequence	Expected Result
Normal	Correct code on first try	3→7→1→9	Unlock asserted
Normal	Wrong first digit, then correct	5→(reset)→3→7→1→9	Unlock after retry
Boundary	All zeros entered	0→0→0→0	Remain locked, attempt +1
Boundary	Correct code at attempt 3	2 wrong + correct	Unlock (just in time)
Error	Three wrong attempts	Wrong × 3	Lockout for 30 seconds
Error	Enter pressed with no digit	Enter (no BCD change)	No state change
Reset	Reset during code entry	3→7→Reset	Return to IDLE, counters cleared
Reset	Reset during lockout	Lockout → Reset	Return to IDLE
Timing	Rapid button presses	2 enters within 1 clock	Only one digit registered
Timing	Lockout timer accuracy	Enter lockout → wait	Timeout at exactly 30s ± 1 cycle

Key Insight: Each test maps to specific assert statements in the testbench. A verification plan like this one is meant to catch most functional bugs in simulation, before hardware testing; the bugs that escape often involve physical timing or interface issues that only appear on real hardware.

13.7 Testbench Architecture

Unit 12 introduced basic testbenches. For system-level verification, testbenches become more sophisticated:

Self-Checking Testbench

A self-checking testbench automatically compares the DUT's outputs against expected values, reporting errors without requiring manual waveform inspection:

-- Self-checking testbench for 4-bit adder
verify: process
begin
    -- Test case 1: 3 + 5 = 8
    a_tb <= "0011"; b_tb <= "0101"; cin_tb <= '0';
    wait for 10 ns;
    assert (sum_tb = "1000" and cout_tb = '0')
        report "FAIL: 3 + 5 should equal 8"
        severity error;

    -- Test case 2: 15 + 1 = 0 with carry
    a_tb <= "1111"; b_tb <= "0001"; cin_tb <= '0';
    wait for 10 ns;
    assert (sum_tb = "0000" and cout_tb = '1')
        report "FAIL: 15 + 1 should produce carry"
        severity error;

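    -- More test cases...
    -- Note: severity error does not stop simulation by default, so this message
    -- appears even if a check above failed; search the log for "FAIL".
    report "Test sequence complete" severity note;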
    wait;
end process verify;

The assert statement checks a condition. When the condition is FALSE, it prints the message and raises the specified severity level. This automates the tedious process of manually checking waveforms.
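A 4-bit adder has only 2^9 = 512 input combinations (a, b, and cin), so the hand-written cases can be replaced by an exhaustive loop that compares every output against a reference computed with integers. A sketch, assuming a behavioral adder4 entity as a hypothetical stand-in for the real design under test so that the block compiles on its own:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical DUT: behavioral 4-bit adder standing in for the design under test
entity adder4 is
    port (a, b : in  std_logic_vector(3 downto 0);
          cin  : in  std_logic;
          sum  : out std_logic_vector(3 downto 0);
          cout : out std_logic);
end entity adder4;

architecture behav of adder4 is
    signal total : unsigned(4 downto 0);
begin
    total <= ('0' & unsigned(a)) + ('0' & unsigned(b)) + unsigned'("0000" & cin);
    sum   <= std_logic_vector(total(3 downto 0));
    cout  <= total(4);
end architecture behav;

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity adder4_exhaustive_tb is
end entity adder4_exhaustive_tb;

architecture sim of adder4_exhaustive_tb is
    signal a_tb, b_tb, sum_tb : std_logic_vector(3 downto 0);
    signal cin_tb, cout_tb    : std_logic;
begin
    dut : entity work.adder4
        port map (a => a_tb, b => b_tb, cin => cin_tb, sum => sum_tb, cout => cout_tb);

    verify : process
        variable expected : natural;
        variable errors   : natural := 0;
    begin
        for ia in 0 to 15 loop
            for ib in 0 to 15 loop
                for ic in 0 to 1 loop
                    a_tb <= std_logic_vector(to_unsigned(ia, 4));
                    b_tb <= std_logic_vector(to_unsigned(ib, 4));
                    if ic = 1 then cin_tb <= '1'; else cin_tb <= '0'; end if;
                    wait for 10 ns;
                    expected := ia + ib + ic;              -- reference model
                    if to_integer(unsigned(cout_tb & sum_tb)) /= expected then
                        errors := errors + 1;
                        report "FAIL: " & integer'image(ia) & " + " & integer'image(ib)
                             & " + " & integer'image(ic) severity error;
                    end if;
                end loop;
            end loop;
        end loop;
        assert errors = 0 report "Adder test failed" severity failure;
        report "All 512 adder cases passed" severity note;
        wait;
    end process verify;
end architecture sim;
```

The final assert uses severity failure, which stops the simulation when any case failed, so the closing note appears only when every check passed.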

Testbench with File I/O

For designs with many test vectors, reading stimulus from a file is more practical than hardcoding values:

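-- Read test vectors from file
-- Assumes "use std.textio.all;" in the testbench and VHDL-2008, which provides
-- read() and to_string() for std_logic_vector.
-- Each line of test_vectors.txt holds three 4-bit values: a b expected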
file_reader: process
    file test_file : text open read_mode is "test_vectors.txt";
    variable line_v : line;
    variable a_v, b_v, expected_v : std_logic_vector(3 downto 0);
begin
    while not endfile(test_file) loop
        readline(test_file, line_v);
        read(line_v, a_v); read(line_v, b_v); read(line_v, expected_v);
        a_tb <= a_v; b_tb <= b_v;
        wait for 10 ns;
        assert (result_tb = expected_v)
            report "FAIL at inputs: " & to_string(a_v) & ", " & to_string(b_v)
            severity error;
    end loop;
    wait;
end process file_reader;

13.8 Static Timing Analysis

After synthesis and place-and-route, the FPGA tools perform static timing analysis (STA) to verify that the design operates correctly at the target clock frequency.

The Timing Model

Every signal path from a flip-flop output through combinational logic to the next flip-flop input must satisfy:

\[T_{clk} \geq T_{cq} + T_{comb} + T_{setup}\]

Where:

\(T_{clk}\) = clock period
\(T_{cq}\) = clock-to-Q delay of the source flip-flop (from Unit 9)
\(T_{comb}\) = worst-case propagation delay through combinational logic
\(T_{setup}\) = setup time of the destination flip-flop

The critical path is the path with the largest \(T_{cq} + T_{comb} + T_{setup}\). It determines the maximum clock frequency the design can achieve:

\[f_{max} = \frac{1}{T_{cq} + T_{comb,max} + T_{setup}}\]
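Rearranging the setup inequality gives the combinational delay budget available at a target frequency. As an illustration with assumed values \(T_{cq} = 2\) ns and \(T_{setup} = 1\) ns and a 100 MHz target (\(T_{clk} = 10\) ns):

\[T_{comb,max} \leq T_{clk} - T_{cq} - T_{setup} = 10 - 2 - 1 = 7 \text{ ns}\]

Any path with more than 7 ns of combinational delay fails timing at 100 MHz and must be shortened, for example by restructuring the logic or pipelining it.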

Hold Time Check

Additionally, every path must satisfy the hold time requirement:

\[T_{cq} + T_{comb,min} \geq T_{hold}\]

This ensures that data doesn't change too quickly after the clock edge. Hold time violations are independent of clock frequency and must be fixed by adding delay to the path.
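A numerical illustration with assumed values \(T_{cq} = 0.5\) ns, \(T_{comb,min} = 0.1\) ns, and \(T_{hold} = 0.8\) ns (not taken from any specific device):

\[T_{cq} + T_{comb,min} = 0.5 + 0.1 = 0.6 \text{ ns} < T_{hold} = 0.8 \text{ ns}\]

The path misses the hold requirement by 0.2 ns, so at least 0.2 ns of delay must be added to it. \(T_{clk}\) does not appear in the check, which is why lowering the clock frequency cannot fix a hold violation.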

Diagram: Timing Analysis Visualizer

13.9 Pipelining for Performance

When the critical path limits clock frequency to an unacceptable level, pipelining breaks the critical path by inserting registers at intermediate points. This trades latency (more clock cycles to complete one operation) for throughput (higher clock frequency, more operations per second).

Without pipelining:

Critical path delay: \(T_{comb} = 20\) ns
With \(T_{cq} = 2\) ns, \(T_{setup} = 1\) ns
\(f_{max} = 1/(2 + 20 + 1) = 43.5\) MHz
Throughput: 43.5 million operations/second

With one pipeline stage (splitting the combinational logic in half):

Each stage: \(T_{comb} = 10\) ns
\(f_{max} = 1/(2 + 10 + 1) = 76.9\) MHz
Latency: 2 clock cycles per result
Throughput: 76.9 million operations/second (1.77× improvement)

The pipeline stage adds one clock cycle of delay but nearly doubles the throughput. This is the same principle used in modern processors, which may have 10–20+ pipeline stages.

Pipeline Design Considerations: Pipelining is not free. It adds flip-flops (area and power), increases latency, and requires all parallel data paths to be pipelined to the same depth to maintain synchronization. The designer must weigh these costs against the frequency improvement.

Example: 3-Stage Multiply-Accumulate Pipeline

Consider a multiply-accumulate (MAC) unit: \(R = R + A \times B\). Without pipelining, the critical path includes the multiplier delay (15 ns) plus the adder delay (8 ns) plus register overhead:

\[f_{max} = \frac{1}{2 + 23 + 1} = 38.5 \text{ MHz}\]

Splitting into three pipeline stages:

Stage	Operation	Delay
Stage 1	Partial products (first half of multiply)	8 ns
Stage 2	Complete multiply (second half)	7 ns
Stage 3	Accumulate (add to running total)	8 ns

\[f_{max} = \frac{1}{2 + 8 + 1} = 90.9 \text{ MHz}\]

Results comparison:

Metric	Unpipelined	3-Stage Pipeline
Clock frequency	38.5 MHz	90.9 MHz
Latency per result	1 cycle (26 ns)	3 cycles (33 ns)
Throughput	38.5 M ops/sec	90.9 M ops/sec
Added flip-flops	—	2 × register width

The pipeline achieves 2.36× throughput at the cost of 2 extra cycles of latency and additional flip-flops. In a DSP application processing a continuous stream of audio samples, the extra latency is negligible (33 ns vs 26 ns), but the throughput gain allows processing 2.36× more channels.
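The three stages can be sketched in RTL as follows, assuming unsigned 8-bit operands and a hypothetical 24-bit accumulator. Splitting B into two 4-bit halves is one way to realize the "first half" and "second half" of the multiply in the table; the stage delays quoted above are illustrative and are not derived from this code. A valid bit travels alongside each sample so that the accumulator adds only real data.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical 3-stage MAC: R = R + A x B for unsigned 8-bit samples
entity mac_pipe3 is
    port (
        clk, rst : in  std_logic;
        valid_in : in  std_logic;                  -- a and b carry a new sample
        a, b     : in  unsigned(7 downto 0);
        acc      : out unsigned(23 downto 0)       -- running total R
    );
end entity mac_pipe3;

architecture rtl of mac_pipe3 is
    signal pp_lo, pp_hi : unsigned(11 downto 0);   -- stage 1 pipeline register
    signal product      : unsigned(15 downto 0);   -- stage 2 pipeline register
    signal v1, v2       : std_logic;               -- valid bits travel with the data
    signal acc_r        : unsigned(23 downto 0);   -- accumulator R
begin
    process(clk)
    begin
        if rising_edge(clk) then
            if rst = '1' then
                v1    <= '0';
                v2    <= '0';
                acc_r <= (others => '0');
            else
                -- Stage 1: two 8 x 4 partial products (low and high nibble of b)
                pp_lo <= a * b(3 downto 0);
                pp_hi <= a * b(7 downto 4);
                v1    <= valid_in;
                -- Stage 2: product = pp_lo + pp_hi * 16
                product <= resize(pp_lo, 16) + shift_left(resize(pp_hi, 16), 4);
                v2      <= v1;
                -- Stage 3: accumulate into R (read, add, and write back in one cycle)
                if v2 = '1' then
                    acc_r <= acc_r + resize(product, 24);
                end if;
            end if;
        end if;
    end process;

    acc <= acc_r;
end architecture rtl;
```

The register banks pp_lo/pp_hi and product are the added flip-flops listed in the comparison table; each sample's product reaches acc on the third rising edge after it is presented.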

Pipeline Hazard: In the MAC example, Stage 3 reads the accumulator register \(R\) and writes the new sum back to it within a single cycle, so consecutive products can be accumulated back to back. A data hazard appears if the accumulate step is itself split across stages (for example, to shorten its 8 ns delay): a new addition would then read \(R\) before the previous sum has been written back. Resolving it requires either forwarding logic (a bypass MUX that feeds the in-flight sum directly to the adder) or a pipeline stall (bubble) that waits for the write to complete. Hazard resolution adds complexity but is essential for correctness.

Diagram: Pipeline Stages

flowchart LR
    IN["<b>Input</b>"]
    S1["<b>Stage 1</b><br/><i>Combinational<br/>Logic A</i>"]
    R1["<b>Reg</b><br/><i>Pipeline<br/>Register</i>"]
    S2["<b>Stage 2</b><br/><i>Combinational<br/>Logic B</i>"]
    R2["<b>Reg</b><br/><i>Pipeline<br/>Register</i>"]
    S3["<b>Stage 3</b><br/><i>Combinational<br/>Logic C</i>"]
    OUT["<b>Output</b>"]

    IN --> S1 --> R1 --> S2 --> R2 --> S3 --> OUT

    style IN fill:#7E57C2,stroke:#5A3EED,color:#fff
    style S1 fill:#EEF4FF,stroke:#A8C8FF,color:#333
    style R1 fill:#FFD700,stroke:#DAA520,color:#333
    style S2 fill:#EEF4FF,stroke:#A8C8FF,color:#333
    style R2 fill:#FFD700,stroke:#DAA520,color:#333
    style S3 fill:#EEF4FF,stroke:#A8C8FF,color:#333
    style OUT fill:#7E57C2,stroke:#5A3EED,color:#fff

*Without pipelining, the critical path spans all three stages. With pipeline registers (gold), each stage runs independently at a higher clock frequency. Throughput increases at the cost of added latency and register area.*

13.10 Design Trade-offs

Every design decision involves trade-offs among three fundamental metrics:

Area (resource usage): How many LUTs, flip-flops, and routing resources does the design consume?
Speed (clock frequency): How fast can the design operate?
Power (energy consumption): How much power does the design dissipate?

These metrics are interrelated:

Optimization	Effect on Area	Effect on Speed	Effect on Power
Pipelining	Increases (more FFs)	Increases (shorter critical path)	Increases (more switching)
Resource sharing	Decreases (fewer units)	Decreases (MUX overhead)	Mixed
Parallel execution	Increases (duplicated units)	Increases (more work per cycle)	Increases
Logic minimization	Decreases (fewer gates)	Increases (shorter paths)	Decreases
Clock gating	No change	No change	Decreases

Resource sharing reuses a single hardware unit for multiple operations by time-multiplexing it with a control FSM and multiplexers:

Instead of using two separate adders for \(R = A + B\) and \(S = C + D\), use one adder with a MUX:

Cycle 1: MUX selects A,B → Adder → Store in R
Cycle 2: MUX selects C,D → Adder → Store in S

This halves the adder count but doubles the execution time and adds MUX area. Resource sharing is valuable when area is constrained and the operations don't need to happen simultaneously.
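The two-cycle schedule above can be sketched as follows, assuming hypothetical 8-bit unsigned operands and a start pulse. A three-state controller drives the operand MUXes through its phase signal and decides which result register captures the shared adder's output.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example: one adder time-shared between R = A + B and S = C + D
entity shared_adder is
    port (
        clk, rst   : in  std_logic;
        start      : in  std_logic;                 -- begin a two-cycle computation
        a, b, c, d : in  unsigned(7 downto 0);
        r, s       : out unsigned(7 downto 0);
        done       : out std_logic                  -- high for one cycle when S is written
    );
end entity shared_adder;

architecture rtl of shared_adder is
    type phase_t is (IDLE, CYCLE1, CYCLE2);
    signal phase        : phase_t;
    signal x, y, sum    : unsigned(7 downto 0);
    signal r_reg, s_reg : unsigned(7 downto 0);
begin
    -- Datapath: operand MUXes feeding the single shared adder
    x   <= c when phase = CYCLE2 else a;
    y   <= d when phase = CYCLE2 else b;
    sum <= x + y;

    -- Control FSM plus result registers
    process(clk)
    begin
        if rising_edge(clk) then
            done <= '0';
            if rst = '1' then
                phase <= IDLE;
            else
                case phase is
                    when IDLE   => if start = '1' then phase <= CYCLE1; end if;
                    when CYCLE1 => r_reg <= sum; phase <= CYCLE2;   -- MUX selects A, B
                    when CYCLE2 => s_reg <= sum; phase <= IDLE;     -- MUX selects C, D
                                   done  <= '1';
                end case;
            end if;
        end if;
    end process;

    r <= r_reg;
    s <= s_reg;
end architecture rtl;
```

Compared with two dedicated adders, this version spends an extra clock cycle and two 2:1 operand MUXes to save one 8-bit adder.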

Clock Gating for Power Reduction

Dynamic power in CMOS circuits is governed by:

\[P_{dynamic} = \alpha C V_{DD}^2 f\]

Where \(\alpha\) is the switching activity (fraction of nodes toggling per cycle), \(C\) is the total switched capacitance, \(V_{DD}\) is the supply voltage, and \(f\) is the clock frequency. Even when a module is idle, its clock tree and flip-flop clock inputs still switch on every edge, consuming power with no useful work.
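Because \(V_{DD}\) enters the formula squared, supply voltage has the strongest per-unit effect. For example, with \(\alpha\), \(C\), and \(f\) held fixed, lowering the supply from an assumed 1.2 V to 1.0 V gives

\[\frac{P_{1.0\,V}}{P_{1.2\,V}} = \left(\frac{1.0}{1.2}\right)^2 \approx 0.69\]

a reduction of about 31%, while halving \(f\) alone only halves dynamic power.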

Clock gating disables the clock to unused modules, eliminating their switching activity entirely:

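-- Naive clock gating in VHDL (illustration only: if module_enable changes
-- while clk is high, this AND gate can glitch the gated clock)
gated_clk <= clk and module_enable;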

Example: In the 4-function calculator, the ALU is idle while the user enters operands (ENTER_A and ENTER_B states). Clock gating the ALU during these states eliminates its switching power — roughly 40% of total dynamic power if the ALU is the largest module.

On FPGAs, vendor-specific clock buffer primitives (Xilinx BUFGCE, Intel ALTCLKCTRL) implement clock gating safely, avoiding the glitches that a simple AND gate would produce on the clock signal. The FPGA synthesis tools can also automatically infer clock enables from VHDL if statements when the enable covers an entire process.
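On FPGAs, a common alternative to gating the clock is to leave the clock free-running and add a clock enable, which synthesis maps to the flip-flops' CE pins. The enable stops an idle module's registers (and the logic they drive) from switching, although the clock tree itself keeps toggling. A sketch with a hypothetical accumulator register named ce_register:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical accumulator that holds its state while module_enable = '0'
entity ce_register is
    port (
        clk           : in  std_logic;
        module_enable : in  std_logic;
        d             : in  unsigned(7 downto 0);
        q             : out unsigned(7 downto 0)
    );
end entity ce_register;

architecture rtl of ce_register is
    signal q_reg : unsigned(7 downto 0) := (others => '0');
begin
    process(clk)                          -- the clock itself is never gated
    begin
        if rising_edge(clk) then
            if module_enable = '1' then   -- inferred as the flip-flops' clock enable
                q_reg <= q_reg + d;
            end if;
        end if;
    end process;
    q <= q_reg;
end architecture rtl;
```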

13.11 System-Level Example: Digital Combination Lock

This example integrates concepts from across the course into a complete system.

Specification:

4-digit combination lock (each digit 0–9)
Input: 4-bit BCD digit, "Enter" button, "Reset" button
Output: "Unlock" signal, 2-digit display showing entry progress
The correct combination is hardcoded (e.g., 3-7-1-9)
Lock allows 3 attempts before a 30-second lockout

Architecture

The digital combination lock employs a datapath-controller architecture. The controller — a five-state finite state machine (FSM) — orchestrates all sequencing decisions, while the datapath performs digit storage, comparison, and counting under the controller's direction. Four subsystems are organized in a pipeline from input conditioning through output indication.

Input Subsystem

The input subsystem conditions raw external signals into clean, synchronous events for the datapath and controller.

Block	Type	Function
Debouncer	Sequential	Filters mechanical switch bounce using a shift-register majority detector
Edge Detector	Sequential	Converts the debounced Enter signal into a single-clock-cycle pulse (`enter_pulse`)
BCD Input Register	Sequential	4-bit register; latches the current BCD digit when the controller asserts `load_digit`
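The edge detector amounts to one flip-flop and an AND gate: it compares the debounced Enter level with its value one clock earlier. A sketch, assuming the debouncer's output is already synchronized to clk; the entity name and the input name enter_db are hypothetical, while enter_pulse follows the text.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical rising-edge detector: one-cycle pulse per Enter press
entity enter_edge_detect is
    port (
        clk, rst    : in  std_logic;
        enter_db    : in  std_logic;    -- debounced, synchronized Enter level
        enter_pulse : out std_logic     -- high for exactly one clock cycle
    );
end entity enter_edge_detect;

architecture rtl of enter_edge_detect is
    signal enter_prev : std_logic;      -- Enter level one clock earlier
begin
    process(clk, rst)
    begin
        if rst = '1' then
            enter_prev <= '0';
        elsif rising_edge(clk) then
            enter_prev <= enter_db;
        end if;
    end process;

    -- High only in the first cycle after the input goes from 0 to 1
    enter_pulse <= enter_db and not enter_prev;
end architecture rtl;
```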

Datapath

The datapath contains both combinational comparison logic and sequential counters. The controller drives all counter enables and clears.

Block	Type	Function
Combination ROM	Combinational	4 × 4-bit look-up table storing the correct code; addressed by `digit_count`
BCD Comparator	Combinational	Produces `match = 1` when the input register equals the ROM output at the current address
Digit Counter	Sequential	2-bit up-counter (0–3); incremented by `inc_digit_ctr`, cleared by `clr_digit_ctr`
Attempt Counter	Sequential	2-bit up-counter (0–3); incremented by `inc_attempt`, cleared by `clr_attempt`
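The combinational half of this datapath is small enough to sketch directly. Assuming the example code 3-7-1-9 from the specification, the Combination ROM becomes a constant array addressed by digit_count and the BCD Comparator a single equality test; the entity name lock_compare and the input name digit_reg are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical combination ROM plus BCD comparator for the lock datapath
entity lock_compare is
    port (
        digit_reg   : in  std_logic_vector(3 downto 0);  -- latched BCD digit
        digit_count : in  unsigned(1 downto 0);          -- position 0..3
        match       : out std_logic
    );
end entity lock_compare;

architecture rtl of lock_compare is
    type rom_t is array (0 to 3) of std_logic_vector(3 downto 0);
    constant CODE_ROM : rom_t := ("0011", "0111", "0001", "1001");  -- 3-7-1-9
begin
    match <= '1' when digit_reg = CODE_ROM(to_integer(digit_count)) else '0';
end architecture rtl;
```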

Control Unit (FSM)

The controller is a Mealy/Moore hybrid FSM with five states. It reads status signals from the datapath and issues control signals in return.

States: IDLE → WAIT_DIGIT → CHECK → UNLOCK or LOCKOUT

Signal Direction	Signals
Inputs (from datapath/input)	`enter_pulse`, `match`, `digit_done` (digit counter = 3), `max_attempts` (attempt counter = 3), `timeout`
Outputs (to datapath/output)	`load_digit`, `inc_digit_ctr`, `clr_digit_ctr`, `inc_attempt`, `clr_attempt`, `start_timer`, `unlock_en`, `lockout_en`

The FSM also contains a Lockout Timer — a sequential down-counter that generates timeout after 30 seconds, returning the system from LOCKOUT to IDLE.

Output Subsystem

Block	Type	Function
Unlock Register	Sequential	Set/reset flip-flop; set by `unlock_en`, cleared on reset
Progress Display	Combinational	Decodes `digit_count` to show entered positions (e.g., "3 7 _ _")
Status LEDs	Combinational	Green = unlocked, Red = locked, Flashing Red = lockout active

System Operation

On power-up the FSM enters IDLE and clears all counters. When the user presses a BCD digit and hits Enter, the edge detector produces enter_pulse, advancing the FSM to CHECK. The controller asserts load_digit, latching the digit into the input register. The comparator evaluates the latched digit against the ROM value at the address selected by the digit counter and drives match accordingly. On a match the FSM asserts inc_digit_ctr and returns to WAIT_DIGIT; once the digit counter reaches 3 (digit_done), the FSM transitions to UNLOCK and asserts unlock_en. On a mismatch the FSM asserts inc_attempt and clr_digit_ctr, restarting the sequence. If the attempt counter saturates (max_attempts), the FSM enters LOCKOUT, asserts start_timer, and waits for timeout before returning to IDLE.

Diagram: Digital Lock System Architecture

VHDL Implementation (Simplified Control Unit)

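-- Simplified lock controller (architecture declarations + process).
-- enter_pulse and digit_match (the comparator's match) come from the datapath;
-- tick_1ms is an assumed one-cycle enable that pulses once per millisecond.
type lock_state is (IDLE, WAIT_DIGIT, CHECK, UNLOCK, LOCKOUT);
signal state         : lock_state;
signal digit_pos     : unsigned(1 downto 0);    -- 0 to 3
signal attempts      : unsigned(1 downto 0);    -- 0 to 3
signal lockout_timer : unsigned(14 downto 0);   -- counts down 30,000 ms = 30 s
signal unlock        : std_logic;
constant LOCKOUT_TICKS : natural := 30000;

-- Control FSM
process(clk, rst)
begin
    if rst = '1' then
        state         <= IDLE;
        digit_pos     <= "00";
        attempts      <= "00";
        lockout_timer <= (others => '0');
        unlock        <= '0';
    elsif rising_edge(clk) then
        case state is
            when IDLE =>
                digit_pos <= "00";
                state <= WAIT_DIGIT;

            when WAIT_DIGIT =>
                if enter_pulse = '1' then
                    state <= CHECK;
                end if;

            when CHECK =>
                if digit_match = '1' then
                    if digit_pos = 3 then
                        state <= UNLOCK;
                    else
                        digit_pos <= digit_pos + 1;
                        state <= WAIT_DIGIT;
                    end if;
                else
                    attempts <= attempts + 1;
                    if attempts = 2 then          -- this is the third failed attempt
                        lockout_timer <= to_unsigned(LOCKOUT_TICKS, lockout_timer'length);
                        state <= LOCKOUT;
                    else
                        state <= IDLE;
                    end if;
                end if;

            when UNLOCK =>
                unlock <= '1';                    -- held until the asynchronous rst clears it

            when LOCKOUT =>
                if lockout_timer = 0 then
                    attempts <= "00";
                    state <= IDLE;
                elsif tick_1ms = '1' then
                    lockout_timer <= lockout_timer - 1;
                end if;
        end case;
    end if;
end process;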

13.12 System-Level Example: 8-Bit ALU

An Arithmetic Logic Unit (ALU) is the computational heart of any processor. This example designs an 8-bit ALU that performs the operations enabled by the circuits studied throughout the course.

ALU Operations

Operation Code	Operation	Unit Reference
000	Addition (\(A + B\))	Unit 3 (Full Adder)
001	Subtraction (\(A - B\))	Unit 3 (Adder-Subtractor)
010	Bitwise AND (\(A \cdot B\))	Unit 2 (AND Gate)
011	Bitwise OR (\(A + B\))	Unit 2 (OR Gate)
100	Bitwise XOR (\(A \oplus B\))	Unit 2 (XOR Gate)
101	Bitwise NOT (\(\bar{A}\))	Unit 2 (NOT Gate)
110	Shift Left (\(A \ll 1\))	Unit 10 (Shift Register)
111	Shift Right (\(A \gg 1\))	Unit 10 (Shift Register)

VHDL Implementation

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity alu8 is
    port (
        a, b   : in  std_logic_vector(7 downto 0);
        op     : in  std_logic_vector(2 downto 0);
        result : out std_logic_vector(7 downto 0);
        zero   : out std_logic;
        carry  : out std_logic;
        neg    : out std_logic
    );
end entity alu8;

architecture rtl of alu8 is
    signal temp : unsigned(8 downto 0);  -- 9 bits for carry
begin
    process(a, b, op)
    begin
        temp <= (others => '0');
        case op is
            when "000" =>  -- ADD
                temp <= ('0' & unsigned(a)) + ('0' & unsigned(b));
            when "001" =>  -- SUB (carry bit = borrow: '1' when A < B)
                temp <= ('0' & unsigned(a)) - ('0' & unsigned(b));
            when "010" =>  -- AND
                temp(7 downto 0) <= unsigned(a and b);
            when "011" =>  -- OR
                temp(7 downto 0) <= unsigned(a or b);
            when "100" =>  -- XOR
                temp(7 downto 0) <= unsigned(a xor b);
            when "101" =>  -- NOT
                temp(7 downto 0) <= unsigned(not a);
            when "110" =>  -- Shift Left
                temp(7 downto 0) <= unsigned(a(6 downto 0) & '0');
            when "111" =>  -- Shift Right
                temp(7 downto 0) <= unsigned('0' & a(7 downto 1));
            when others =>
                temp <= (others => '0');
        end case;
    end process;

    result <= std_logic_vector(temp(7 downto 0));
    carry  <= std_logic(temp(8));
    zero   <= '1' when temp(7 downto 0) = 0 else '0';
    neg    <= std_logic(temp(7));
end architecture rtl;

This ALU directly applies:

Binary addition and subtraction (Unit 1, Unit 3)
Boolean operations (Unit 2)
Shift operations (Unit 10)
Multiplexer-like selection via case statement (Unit 8)
Status flag generation (Units 1, 3)
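The self-checking testbench style from Section 13.7 applies directly to this ALU. The sketch below is illustrative and not part of the original design. The entity name tb_alu8 and the chosen vectors are assumptions. The expected outputs were worked out by hand from the 9-bit temp arithmetic. For example, 200 + 100 = 300 = 0x12C, so result = 0x2C and carry = 1. Each vector stores its inputs and expected outputs in one record, so adding a test case means adding one line.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical self-checking testbench for alu8 (illustrative vectors)
entity tb_alu8 is
end entity tb_alu8;

architecture sim of tb_alu8 is
    signal a, b, result     : std_logic_vector(7 downto 0);
    signal op               : std_logic_vector(2 downto 0);
    signal zero, carry, neg : std_logic;

    -- One test vector: inputs followed by expected outputs
    type vec_t is record
        a, b             : std_logic_vector(7 downto 0);
        op               : std_logic_vector(2 downto 0);
        result           : std_logic_vector(7 downto 0);
        zero, carry, neg : std_logic;
    end record;
    type vec_array is array (natural range <>) of vec_t;

    constant vectors : vec_array := (
        (x"C8", x"64", "000", x"2C", '0', '1', '0'),  -- 200 + 100 = 300: carry out
        (x"05", x"05", "001", x"00", '1', '0', '0'),  -- 5 - 5 = 0: zero flag
        (x"03", x"05", "001", x"FE", '0', '1', '1'),  -- 3 - 5: borrow, bit 7 set
        (x"F0", x"3C", "010", x"30", '0', '0', '0'),  -- AND
        (x"F0", x"3C", "011", x"FC", '0', '0', '1'),  -- OR
        (x"F0", x"3C", "100", x"CC", '0', '0', '1'),  -- XOR
        (x"FF", x"00", "101", x"00", '1', '0', '0'),  -- NOT
        (x"81", x"00", "110", x"02", '0', '0', '0'),  -- shift left drops bit 7
        (x"81", x"00", "111", x"40", '0', '0', '0')   -- shift right inserts 0
    );
begin
    dut : entity work.alu8
        port map (a => a, b => b, op => op, result => result,
                  zero => zero, carry => carry, neg => neg);

    stimulus : process
    begin
        for i in vectors'range loop
            a  <= vectors(i).a;
            b  <= vectors(i).b;
            op <= vectors(i).op;
            wait for 10 ns;  -- let the combinational outputs settle
            assert result = vectors(i).result and zero = vectors(i).zero
                   and carry = vectors(i).carry and neg = vectors(i).neg
                report "ALU mismatch in vector " & integer'image(i)
                severity error;
        end loop;
        report "ALU test complete";
        wait;  -- stop the process
    end process;
end architecture sim;
```

Compile alu8 first, then simulate tb_alu8. If no error reports appear before the final note, every vector matched.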

13.13 System-Level Example: UART Transmitter

A Universal Asynchronous Receiver/Transmitter (UART) is one of the most common serial communication interfaces. The transmitter converts a parallel byte into a serial bitstream. This example uses the common 8N1 frame (8 data bits, no parity bit, 1 stop bit):

Protocol:

Idle state: line held HIGH
Start bit: line goes LOW for one bit period
Data bits: 8 bits transmitted LSB first
Stop bit: line returns HIGH for one bit period
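As a worked example, assume a 9600-baud link. The rate is chosen only for illustration. Sending the ASCII character 'A' (0x41 = 0100 0001) puts the following levels on the line, with each level lasting one bit period. One frame occupies 10 bit periods (start + 8 data + stop):

\[
\begin{aligned}
&\text{line: } 1\ (\text{idle}),\ 0\ (\text{start}),\ \underbrace{1,0,0,0,0,0,1,0}_{d_0 \,\dots\, d_7},\ 1\ (\text{stop}) \\
&T_{bit} = \frac{1}{9600}\ \text{s} \approx 104.2\ \mu\text{s}, \qquad
T_{frame} = 10\,T_{bit} \approx 1.042\ \text{ms}, \qquad
\frac{9600}{10} = 960\ \text{bytes/s}
\end{aligned}
\]

The start and stop bits take 2 of every 10 bit periods, so 20% of the line time is framing overhead.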

Architecture

UART Transmitter
├── Baud Rate Generator (counter that divides clock to baud rate)
├── Shift Register (parallel-to-serial conversion)
├── Bit Counter (tracks which bit is being transmitted)
└── Control FSM (sequences: idle → start → data × 8 → stop → idle)

This design combines:

Counter design (Unit 10) for baud rate generation
Shift register (Unit 10) for parallel-to-serial conversion
FSM (Unit 10) for transmission control
All implemented in VHDL (Unit 12)
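The four blocks above can be sketched in a single clocked VHDL process. This is a minimal illustration, not the course's reference design. The entity name uart_tx, its port names, and the CLKS_PER_BIT generic (clock frequency divided by baud rate) are assumptions. In the sketch, baud_cnt acts as the baud rate generator, shreg as the shift register, and bit_idx as the bit counter. The case statement is the control FSM.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical 8N1 UART transmitter sketch
entity uart_tx is
    generic (CLKS_PER_BIT : positive := 434);  -- assumed: 50 MHz / 115200 baud
    port (
        clk, rst : in  std_logic;
        tx_start : in  std_logic;                     -- pulse: send tx_data
        tx_data  : in  std_logic_vector(7 downto 0);
        tx       : out std_logic;                     -- serial line, idles HIGH
        tx_busy  : out std_logic
    );
end entity uart_tx;

architecture rtl of uart_tx is
    type tx_state is (IDLE, START, DATA, STOP);
    signal state    : tx_state := IDLE;
    signal baud_cnt : natural range 0 to CLKS_PER_BIT - 1 := 0;    -- baud rate generator
    signal bit_idx  : natural range 0 to 7 := 0;                   -- bit counter
    signal shreg    : std_logic_vector(7 downto 0) := (others => '0');  -- shift register
begin
    process(clk)
    begin
        if rising_edge(clk) then
            if rst = '1' then
                state    <= IDLE;
                baud_cnt <= 0;
                bit_idx  <= 0;
                tx       <= '1';
            else
                case state is
                    when IDLE =>
                        tx       <= '1';          -- idle line is HIGH
                        baud_cnt <= 0;
                        if tx_start = '1' then
                            shreg <= tx_data;     -- parallel load
                            state <= START;
                        end if;
                    when START =>
                        tx <= '0';                -- start bit is LOW
                        if baud_cnt = CLKS_PER_BIT - 1 then
                            baud_cnt <= 0;
                            bit_idx  <= 0;
                            state    <= DATA;
                        else
                            baud_cnt <= baud_cnt + 1;
                        end if;
                    when DATA =>
                        tx <= shreg(0);           -- LSB first
                        if baud_cnt = CLKS_PER_BIT - 1 then
                            baud_cnt <= 0;
                            shreg    <= '0' & shreg(7 downto 1);  -- shift right
                            if bit_idx = 7 then
                                state <= STOP;
                            else
                                bit_idx <= bit_idx + 1;
                            end if;
                        else
                            baud_cnt <= baud_cnt + 1;
                        end if;
                    when STOP =>
                        tx <= '1';                -- stop bit is HIGH
                        if baud_cnt = CLKS_PER_BIT - 1 then
                            baud_cnt <= 0;
                            state    <= IDLE;
                        else
                            baud_cnt <= baud_cnt + 1;
                        end if;
                end case;
            end if;
        end if;
    end process;

    tx_busy <= '0' when state = IDLE else '1';
end architecture rtl;
```

Because tx is registered, the line changes one clock after the state does. At a few hundred clocks per bit, this one-cycle offset has no practical effect on the bit timing.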

Diagram: UART Transmission Protocol and Architecture

Diagram: UART Transceiver (Transmitter + Receiver)

13.14 System-Level Example: Vending Machine Controller

A vending machine controller is a classic FSM design problem that exercises many concepts:

Specification:

Accepts nickels (5¢), dimes (10¢), and quarters (25¢)
Item costs 30¢
Must make change if overpaid
Two inputs: coin_type (2 bits), coin_inserted (pulse)
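Two outputs: dispense (pulse), change_amount (4 bits; the largest change, 20¢, exceeds 15, so the value is naturally expressed in 5¢ units, e.g. 20¢ = 4)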

State Diagram

The FSM tracks the accumulated amount, and each coin advances the state by that coin's value. When the total reaches or exceeds 30¢, the machine dispenses the item, outputs any change, and returns to the idle state, S0.

This design applies:

FSM design methodology (Unit 10)
Binary arithmetic for change calculation (Unit 1)
BCD representation for display (Unit 3)
State encoding options: binary, one-hot (Unit 10)

State Transition Table

The vending machine FSM uses states representing the accumulated credit in 5¢ increments:

Current State	Coin Input	Next State	Dispense	Change
S0 (0¢)	Nickel	S5	0	0¢
S0 (0¢)	Dime	S10	0	0¢
S0 (0¢)	Quarter	S25	0	0¢
S5 (5¢)	Nickel	S10	0	0¢
S5 (5¢)	Dime	S15	0	0¢
S5 (5¢)	Quarter	S0	1	0¢
S10 (10¢)	Nickel	S15	0	0¢
S10 (10¢)	Dime	S20	0	0¢
S10 (10¢)	Quarter	S0	1	5¢
S15 (15¢)	Nickel	S20	0	0¢
S15 (15¢)	Dime	S25	0	0¢
S15 (15¢)	Quarter	S0	1	10¢
S20 (20¢)	Nickel	S25	0	0¢
S20 (20¢)	Dime	S0	1	0¢
S20 (20¢)	Quarter	S0	1	15¢
S25 (25¢)	Nickel	S0	1	0¢
S25 (25¢)	Dime	S0	1	5¢
S25 (25¢)	Quarter	S0	1	20¢

State Encoding and VHDL

With 6 states, two encoding options are common:

Binary encoding: 3 bits (000–101). Uses fewer flip-flops (3 FFs) but requires more combinational logic for next-state decoding.
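One-hot encoding: 6 bits (one FF per state). Uses more flip-flops (6 FFs) but simpler next-state logic, since each next-state bit depends only on the few states and inputs that lead into that state. FPGAs usually favor one-hot because every logic cell already contains a flip-flop, so the extra registers cost little. The simpler next-state logic also fits into fewer LUT levels and therefore runs faster.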

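-- One-hot state encoding (FPGA-preferred).
-- enum_encoding lists one bit pattern per literal, in declaration order
-- (IEEE 1076.6 style). Tool support varies; vendors also provide their own
-- attributes, e.g. fsm_encoding (Vivado) or syn_encoding (Quartus).
type vend_state is (S0, S5, S10, S15, S20, S25);
attribute enum_encoding : string;
attribute enum_encoding of vend_state : type is
    "000001 000010 000100 001000 010000 100000";
signal state : vend_state := S0;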
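One way to turn the transition table into VHDL is sketched below. Rather than coding all 18 rows by hand, the sketch relies on the states being declared in order of credit. As a result, vend_state'pos(state) gives the current credit in 5¢ units, and the coin's value is added to it. Several details are assumptions because the specification leaves them open: the entity name vending_fsm, the coin codes on coin_type (00 = nickel, 01 = dime, 10 = quarter), and the 5¢ unit for change_amount. Outputs are registered, so dispense and change_amount appear one clock after the coin pulse, not in the same cycle.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical vending machine FSM derived from the transition table
entity vending_fsm is
    port (
        clk, rst      : in  std_logic;
        coin_inserted : in  std_logic;                     -- one-cycle pulse
        coin_type     : in  std_logic_vector(1 downto 0);  -- assumed: 00=5c, 01=10c, 10=25c
        dispense      : out std_logic;                     -- one-cycle pulse
        change_amount : out unsigned(3 downto 0)           -- assumed: units of 5c
    );
end entity vending_fsm;

architecture rtl of vending_fsm is
    type vend_state is (S0, S5, S10, S15, S20, S25);  -- ordered by credit
    signal state : vend_state := S0;
begin
    process(clk)
        variable coin   : natural range 0 to 5;   -- coin value in 5c units
        variable credit : natural range 0 to 10;  -- 10 = 50c (S25 + quarter)
    begin
        if rising_edge(clk) then
            dispense      <= '0';                  -- default: no output pulse
            change_amount <= (others => '0');
            if rst = '1' then
                state <= S0;
            elsif coin_inserted = '1' then
                case coin_type is
                    when "00"   => coin := 1;      -- nickel
                    when "01"   => coin := 2;      -- dime
                    when "10"   => coin := 5;      -- quarter
                    when others => coin := 0;      -- invalid code: no change
                end case;
                credit := vend_state'pos(state) + coin;  -- S0..S25 map to 0..5
                if credit >= 6 then                       -- 30c or more
                    dispense      <= '1';
                    change_amount <= to_unsigned(credit - 6, 4);
                    state         <= S0;
                else
                    state <= vend_state'val(credit);
                end if;
            end if;
        end if;
    end process;
end architecture rtl;
```

Check two rows of the table against this arithmetic. S20 plus a dime gives 4 + 2 = 6, so the machine dispenses with zero change. S25 plus a quarter gives 5 + 5 = 10, so change_amount = 4, which is 20¢. The encoding attribute above could be added to this type declaration unchanged.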

13.15 Design for Testability

Design for Testability (DFT) adds features that make a design's internal signals easier to control and observe. This simplifies both production testing of the final hardware and debugging:

Scan chains: Convert flip-flops into a shift register chain for loading and observing internal state
Built-In Self-Test (BIST): Include test pattern generators and response checkers on-chip
Observation points: Bring internal signals to test pins or debug registers
Controllability: Ensure every internal node can be forced to both 0 and 1 during testing

For FPGA designs, DFT includes:

ChipScope/SignalTap: Vendor tools that embed logic analyzers inside the FPGA to observe internal signals in real time
Debug ports: Dedicated I/O pins that output key internal signals
Readable registers: Allow a host processor to read FSM state, counter values, and status flags

Design for Debug: Add a status register that captures the FSM state, error flags, and counter values. This register can be read through a simple interface (SPI, JTAG, or dedicated pins) and dramatically speeds up debugging when the design doesn't work in hardware.

How Scan Chains Work

A scan chain converts normal flip-flops into a shift register that can be loaded and read externally. Each flip-flop gets a multiplexer controlled by a scan_enable signal:

Normal mode (scan_enable = 0): The flip-flop loads data from the combinational logic (normal circuit operation).
Shift mode (scan_enable = 1): The flip-flop loads data from the previous flip-flop in the chain, forming a long shift register.

To test a circuit with scan chains: (1) shift a test pattern into all flip-flops via the scan chain, (2) switch to normal mode for one clock cycle so the combinational logic responds, (3) switch back to shift mode and shift out the results for comparison. This gives full observability and controllability of every internal register.
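The mux-in-front-of-each-flip-flop structure can be sketched behaviorally as an N-bit register. The entity name scan_reg and its port names are hypothetical. With scan_enable = 1, the register becomes a serial shift register from scan_in to scan_out. With scan_enable = 0, it captures d from the combinational logic. In ASIC flows, DFT tools normally insert these scan muxes automatically, so this sketch only illustrates the idea.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical N-flip-flop scan chain: a 2:1 mux selects each FF's D input
entity scan_reg is
    generic (N : positive := 8);
    port (
        clk         : in  std_logic;
        scan_enable : in  std_logic;                       -- 0 = normal, 1 = shift
        scan_in     : in  std_logic;                       -- serial test pattern in
        d           : in  std_logic_vector(N-1 downto 0);  -- from combinational logic
        q           : out std_logic_vector(N-1 downto 0);  -- to combinational logic
        scan_out    : out std_logic                        -- serial response out
    );
end entity scan_reg;

architecture rtl of scan_reg is
    signal ff : std_logic_vector(N-1 downto 0) := (others => '0');
begin
    process(clk)
    begin
        if rising_edge(clk) then
            if scan_enable = '1' then
                -- shift mode: FF(0) loads scan_in, FF(i) loads FF(i-1)
                ff <= ff(N-2 downto 0) & scan_in;
            else
                -- normal mode: capture the combinational logic's outputs
                ff <= d;
            end if;
        end if;
    end process;

    q        <= ff;
    scan_out <= ff(N-1);  -- last FF in the chain drives the scan output
end architecture rtl;
```

The three-step test procedure maps onto this sketch as follows. Hold scan_enable high for N clocks to shift a pattern in. Drop it low for one capture clock. Raise it again for N clocks to shift the captured response out through scan_out.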

FPGA Debug with Embedded Logic Analyzers

FPGA vendors provide embedded logic analyzer tools — SignalTap (Intel/Altera) and ChipScope/ILA (Xilinx/AMD) — that insert a small logic analyzer IP core inside the FPGA alongside the user design. The analyzer samples selected internal signals at the system clock rate and stores them in on-chip block RAM. A trigger condition (e.g., "FSM enters LOCKOUT state") starts the capture, and the recorded waveforms are uploaded to the PC via JTAG for inspection.

-- Debug register for the digital lock (readable via JTAG or SPI)
debug_reg <= state_encoding & std_logic_vector(digit_pos)
           & std_logic_vector(attempts) & match & enter_pulse
           & timeout & unlock;

Key Point: This 12-bit debug register captures the complete system status in a single read. Debugging is widely reported to consume a large share of FPGA development time, with figures as high as 80% often quoted. DFT and debug infrastructure are therefore investments that typically repay their cost many times over.

13.16 From Specification to Silicon

The complete journey from idea to working hardware follows a well-defined path. This table connects each phase to the units where the relevant skills were developed:

Phase	Activity	Course Unit
Specification	Define inputs, outputs, behavior	All units (truth tables, state diagrams)
Number system selection	Choose binary, BCD, signed representation	Unit 1
Boolean logic design	Derive equations, simplify	Units 2, 4, 5, 6
Combinational module selection	Choose adders, MUX, decoders	Units 3, 7, 8
Sequential design	Design registers, counters, FSMs	Units 9, 10
Device selection	Choose CPLD or FPGA	Unit 11
HDL coding	Write VHDL	Unit 12
Verification	Write testbenches, simulate	Units 12, 13
Implementation	Synthesize, place, route	Units 11, 13
Hardware test	Program FPGA, test with real signals	Unit 13

13.17 Career Paths in Digital Design

The skills developed in this course open doors to diverse engineering career paths:

Digital Design Engineer: Designs ASICs and FPGAs for consumer electronics, telecommunications, and computing
Verification Engineer: Develops testbenches, writes assertions, and performs formal verification; one of the most in-demand roles in the semiconductor industry
FPGA Engineer: Implements designs on FPGAs for defense, telecommunications, data centers, and embedded systems
Embedded Systems Engineer: Combines digital hardware with software for IoT devices, automotive systems, and industrial control
Computer Architect: Designs processors, memory systems, and interconnects—building on the ALU and FSM concepts from this course
Test Engineer: Develops production test programs for manufactured chips, applying DFT concepts
Hardware Security Engineer: Analyzes and protects digital designs against side-channel attacks and hardware trojans

Each path builds directly on the foundation established in this course: Boolean algebra for logic optimization, sequential design for state machines, VHDL for implementation, and verification for quality assurance.

13.18 Summary and Key Takeaways

Top-down design manages complexity by decomposing systems into hierarchical, modular subsystems with well-defined interfaces.
Datapath-controller separation divides a system into components that process data (registers, ALU, MUX) and a control FSM that sequences operations.
Verification typically consumes more effort than design: self-checking testbenches, comprehensive test vectors, and timing analysis are essential for correct implementations.
Static timing analysis determines maximum clock frequency by finding the critical path: \(f_{max} = 1/(T_{cq} + T_{comb,max} + T_{setup})\).
Pipelining increases throughput by inserting registers to break long combinational paths, trading latency for clock speed.
Design trade-offs among area, speed, and power are fundamental to every design decision—there is no single "best" design, only the best design for given constraints.
Real digital systems—locks, ALUs, UART transmitters, vending machines—are composed from the building blocks of prior units: gates, adders, MUXes, flip-flops, registers, counters, and FSMs.
Design for testability and design for debug are not afterthoughts—they should be planned from the beginning.
The journey from specification to working silicon follows a systematic flow of design, verification, and implementation that applies everything learned in this course.

Self-Check: Why does pipelining increase throughput even though it adds latency?

Pipelining divides a long combinational path into shorter stages separated by registers. Each stage has less delay, allowing a higher clock frequency. Although a single result takes more clock cycles to complete (increased latency), a new result is produced every clock cycle once the pipeline is full. The higher clock frequency means more results per second (higher throughput), even though each individual result takes longer. It is analogous to an assembly line: each car takes longer to pass through all stations, but the factory produces more cars per hour.
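Here is a numeric illustration of the same argument, using the \(f_{max}\) formula from the summary. The delays are assumed: \(T_{cq} = T_{setup} = 1\) ns, and a 12 ns combinational path is split into three perfectly balanced 4 ns stages.

\[
\begin{aligned}
\text{Unpipelined: } & T_{clk} = 1 + 12 + 1 = 14\ \text{ns} \;\Rightarrow\; f_{max} \approx 71.4\ \text{MHz}, \quad \text{latency} = 14\ \text{ns} \\
\text{3-stage pipeline: } & T_{clk} = 1 + 4 + 1 = 6\ \text{ns} \;\Rightarrow\; f_{max} \approx 166.7\ \text{MHz}, \quad \text{latency} = 3 \times 6 = 18\ \text{ns}
\end{aligned}
\]

Latency grows from 14 ns to 18 ns because each stage register adds its own \(T_{cq} + T_{setup}\) overhead. Once the pipeline is full, however, a result emerges every 6 ns instead of every 14 ns, which is about 2.3 times the throughput. Real stages are rarely perfectly balanced, so the achievable gain is usually somewhat smaller.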

