Unit 13: System Integration and Design Projects
Unit Overview
Welcome to Unit 13, the capstone of this course. Everything you have learned, from Boolean algebra and logic gates through sequential design, programmable devices, and VHDL, comes together here as we tackle complete system integration and real design projects.
The starting point is top-down design methodology. Instead of jumping straight to gates and flip-flops, you begin with a high-level block diagram, then progressively refine each block until you reach the implementation level. A critical part is the separation of a design into a datapath and a control unit. The datapath contains registers, arithmetic units, and buses that process data. The control unit is a finite state machine that generates the signals telling the datapath what to do and when.
Once your design is described in VHDL, verification becomes paramount. You will write testbenches that systematically exercise your design. Static timing analysis helps you identify the critical path (the longest delay path), which determines the maximum operating frequency.
Real-world design always involves trade-offs. Making a circuit faster often requires more area or consumes more power. These trade-offs determine whether your product meets its battery life target, fits in its package, or stays within budget.
To make all of this concrete, we work through several real-world examples, including a digital combination lock, an arithmetic logic unit, a UART serial communication controller, and a vending machine controller.
**Key Takeaways**
1. Top-down design methodology and the separation of datapath from control unit are the standard approaches for managing complexity in real digital systems.
2. Verification through testbenches and static timing analysis of the critical path are essential steps that ensure a design is both functionally correct and fast enough.
3. Real-world digital design requires navigating trade-offs among speed, area, and power, and capstone projects bring all course concepts together into practical, complete systems.
Summary
This capstone unit brings together every concept from the course—number systems, Boolean algebra, combinational logic, sequential circuits, programmable devices, and VHDL—into complete, integrated digital system designs. Students will learn the top-down design methodology for managing complexity, master systematic verification techniques including testbench design and timing analysis, and apply their skills to realistic design projects. The unit emphasizes the engineering judgment needed to make design trade-offs (speed vs area vs power), partition systems into manageable subsystems, and verify that all parts work together correctly. By completing this unit, students will have the skills to design, describe, simulate, and implement non-trivial digital systems on programmable logic devices.
Concepts Covered
- Top-Down Design Methodology
- Design Hierarchy and Modularity
- System Partitioning
- Interface Specification
- Datapath and Control Unit Separation
- Datapath Design
- Control Unit Design
- Datapath-Controller Integration
- Design Documentation
- Verification Planning
- Functional Verification
- Testbench Architecture
- Self-Checking Testbenches
- Test Vector Generation
- Code Coverage Concepts
- Static Timing Analysis
- Critical Path Identification
- Setup and Hold Time Budgeting
- Clock Frequency Determination
- Pipelining for Performance
- Design Trade-offs: Area vs Speed vs Power
- Resource Sharing and Scheduling
- Design for Testability
- System-Level Example: Digital Lock
- System-Level Example: ALU Design
- System-Level Example: Serial Communication
- System-Level Example: Vending Machine Controller
- Design Review and Optimization
- From Specification to Silicon
- Career Paths in Digital Design
Prerequisites
Before studying this unit, students should be familiar with:
- All combinational logic concepts (Units 2-8)
- Sequential circuit design: flip-flops, registers, counters, FSMs (Units 9-10)
- Programmable logic devices, especially FPGAs (Unit 11)
- VHDL fundamentals: entities, architectures, processes, testbenches (Unit 12)
13.1 The Need for System-Level Design
The circuits designed in prior units—adders, decoders, counters, state machines—are building blocks. Real digital systems combine dozens or hundreds of these blocks into coordinated, multi-component designs. A simple embedded controller might include:
- An arithmetic logic unit (ALU) for computation
- Registers for data storage
- A finite state machine for control sequencing
- Multiplexers for data routing
- I/O interfaces for communication
Designing such a system by immediately writing Boolean equations or drawing gate-level schematics would be overwhelming. Instead, engineers use a top-down design methodology that manages complexity by working from abstract specifications down to detailed implementations.
| Design Level | Description | Tools |
|---|---|---|
| System specification | What the system must do (requirements) | Natural language, timing diagrams |
| Architectural design | How subsystems are organized | Block diagrams, dataflow diagrams |
| Register-Transfer Level (RTL) | How data moves between registers each clock cycle | VHDL/Verilog, state diagrams |
| Gate level | Which gates implement each function | Synthesis output, netlists |
| Physical level | Where gates are placed and routed | FPGA implementation tools |
13.2 Top-Down Design Methodology
The top-down approach decomposes a complex problem into progressively smaller, more manageable pieces:
Step 1: Specification. Define what the system must do—inputs, outputs, timing requirements, performance targets, constraints. A clear specification is the foundation; ambiguity here causes errors that propagate through the entire design.
Step 2: Architecture. Partition the system into major subsystems and define how they communicate. Identify the datapath (the components that process data) and the control unit (the FSM that sequences operations).
Step 3: Detailed design. Design each subsystem individually, using the techniques from prior units. Describe each in VHDL at the RTL level.
Step 4: Integration. Connect subsystems using structural VHDL (component instantiation). Verify interfaces match.
Step 5: Verification. Simulate the integrated design with comprehensive testbenches. Verify timing constraints.
Step 6: Implementation. Synthesize, place, route, and program onto the target FPGA. Verify in hardware.
Diagram: Top-Down Design Methodology
flowchart TD
A["<b>System Specification</b><br/><i>Define inputs, outputs,<br/>timing, constraints</i>"]
B["<b>Partition into Subsystems</b><br/><i>Identify datapath & controller,<br/>define interfaces</i>"]
C["<b>Design Each Module</b><br/><i>RTL design in VHDL,<br/>unit-level simulation</i>"]
D["<b>Integrate</b><br/><i>Connect subsystems,<br/>structural VHDL</i>"]
E["<b>Verify</b><br/><i>Integration testbench,<br/>timing analysis</i>"]
F["<b>Implement</b><br/><i>Synthesize, place & route,<br/>program FPGA</i>"]
A --> B --> C --> D --> E --> F
E -- "<i>Timing violation<br/>or bug found</i>" --> C
F -- "<i>Requirements<br/>change</i>" --> A
style A fill:#7E57C2,stroke:#5A3EED,color:#fff
style B fill:#8E6AC8,stroke:#5A3EED,color:#fff
style C fill:#9C7DD0,stroke:#5A3EED,color:#fff
style D fill:#AA90D8,stroke:#5A3EED,color:#fff
style E fill:#B8A3E0,stroke:#5A3EED,color:#333
style F fill:#C6B6E8,stroke:#5A3EED,color:#333
Interactive Diagram: Top-Down Design Flow
13.3 Design Hierarchy and Modularity
Hierarchy is the primary tool for managing complexity. A hierarchical design consists of modules at multiple levels, where each module:
- Has a well-defined interface (entity in VHDL)
- Performs a single, clear function
- Can be designed and tested independently
- Can be reused in other designs
Example: Hierarchical ALU Design
Each leaf module is simple enough to design with the techniques from prior units. Integration assembles them structurally.
Example: 4-Function Calculator Decomposition
Consider a simple calculator that performs addition, subtraction, AND, and OR on two 8-bit numbers, displaying results on a seven-segment display. A top-down decomposition yields:
Calculator (top level)
├── Input Interface
│   ├── Keypad Decoder (4×4 matrix → 4-bit BCD)
│   └── Input Registers (2 × 8-bit, load-enabled)
├── ALU Module
│   ├── 8-bit Adder-Subtractor (Unit 3)
│   ├── Bitwise Logic Unit (AND/OR)
│   └── Output MUX (selects operation result)
├── Control FSM
│   ├── State: ENTER_A → ENTER_B → SELECT_OP → DISPLAY
│   └── Generates: load_A, load_B, alu_sel, display_en
└── Output Interface
    ├── Binary-to-BCD Converter
    └── Seven-Segment Decoder (Unit 3)
Each leaf module is a component studied in earlier units. The hierarchy manages complexity: the top-level design sees only four blocks with well-defined interfaces, not the hundreds of gates inside them. If the ALU needs to support multiplication later, only the ALU Module subtree changes — the Control FSM needs a new opcode, but the Input and Output Interfaces remain untouched. This is the power of modular hierarchy.
Interface for the ALU Module:
| Port | Direction | Width | Description |
|---|---|---|---|
| a, b | in | 8 bits | Operands from input registers |
| op_sel | in | 2 bits | Operation: 00=Add, 01=Sub, 10=AND, 11=OR |
| result | out | 8 bits | Computation result |
| flags | out | 3 bits | Zero, Carry, Negative |
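The port table above maps almost line for line onto a VHDL entity. The following is a minimal sketch of one possible rendering, not the course's reference design: the entity name `calc_alu`, the internal signal `wide`, and the bit order chosen for `flags` are assumptions made for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity calc_alu is  -- hypothetical name
    port (
        a, b   : in  std_logic_vector(7 downto 0);  -- operands from input registers
        op_sel : in  std_logic_vector(1 downto 0);  -- 00=Add, 01=Sub, 10=AND, 11=OR
        result : out std_logic_vector(7 downto 0);  -- computation result
        flags  : out std_logic_vector(2 downto 0)   -- assumed order: Zero, Carry, Negative
    );
end entity calc_alu;

architecture rtl of calc_alu is
    signal wide : unsigned(8 downto 0);  -- bit 8 holds the carry (borrow for Sub)
begin
    -- Output MUX: the case statement selects which unit drives the result
    process(a, b, op_sel)
    begin
        case op_sel is
            when "00"   => wide <= ('0' & unsigned(a)) + ('0' & unsigned(b));
            when "01"   => wide <= ('0' & unsigned(a)) - ('0' & unsigned(b));
            when "10"   => wide <= '0' & unsigned(a and b);
            when others => wide <= '0' & unsigned(a or b);
        end case;
    end process;

    result   <= std_logic_vector(wide(7 downto 0));
    flags(2) <= '1' when wide(7 downto 0) = 0 else '0';  -- Zero
    flags(1) <= wide(8);                                 -- Carry / borrow
    flags(0) <= wide(7);                                 -- Negative (MSB of result)
end architecture rtl;
```

Because the top level sees only this entity, the architecture could later be swapped for a different implementation without touching the other three blocks, provided the ports stay the same.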
13.4 Datapath and Control Unit Separation
Most digital systems can be decomposed into two complementary parts:
Datapath: The components that store, transport, and transform data. This includes registers, ALUs, multiplexers, shifters, and buses. The datapath performs the "work" of the system.
Control Unit: A finite state machine (or set of FSMs) that generates the control signals directing the datapath. The control unit determines when and how data moves through the datapath, based on the current state and input conditions.
This separation is powerful because:
- The datapath can be designed using combinational and sequential building blocks from Units 3–10
- The control unit is a standard FSM design problem from Unit 10
- Changes to the control sequence do not require redesigning the datapath hardware
- The same datapath can be reused with different control units for different operations
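To make the split concrete, here is a small sketch of a countdown unit written as two cooperating parts. It is a hypothetical example rather than one of the unit's projects: the entity `countdown`, the control signals `load` and `dec`, and the status signal `is_zero` are all names invented for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity countdown is  -- hypothetical example
    port (
        clk, rst : in  std_logic;                     -- rst: synchronous, active high
        start    : in  std_logic;
        n        : in  std_logic_vector(7 downto 0);  -- value to count down from
        done     : out std_logic                      -- high for one cycle at the end
    );
end entity countdown;

architecture rtl of countdown is
    type state_t is (IDLE, RUN, FINISHED);
    signal state     : state_t;
    signal count     : unsigned(7 downto 0) := (others => '0');
    signal load, dec : std_logic;  -- control signals: controller -> datapath
    signal is_zero   : std_logic;  -- status signal:   datapath -> controller
begin
    -- Datapath: one register with load and decrement enables
    datapath : process(clk)
    begin
        if rising_edge(clk) then
            if load = '1' then
                count <= unsigned(n);
            elsif dec = '1' then
                count <= count - 1;
            end if;
        end if;
    end process datapath;

    is_zero <= '1' when count = 0 else '0';

    -- Controller: state register and next-state logic
    controller : process(clk)
    begin
        if rising_edge(clk) then
            if rst = '1' then
                state <= IDLE;
            else
                case state is
                    when IDLE =>
                        if start = '1' then state <= RUN; end if;
                    when RUN =>
                        if is_zero = '1' then state <= FINISHED; end if;
                    when FINISHED =>
                        state <= IDLE;
                end case;
            end if;
        end if;
    end process controller;

    -- Controller outputs derived from the state (and inputs, for load)
    load <= '1' when state = IDLE and start = '1' else '0';
    dec  <= '1' when state = RUN and is_zero = '0' else '0';
    done <= '1' when state = FINISHED else '0';
end architecture rtl;
```

Changing the sequencing (for example, pausing the count) would touch only the controller process and its output equations; the register and decrementer stay as they are.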
Diagram: Datapath-Controller Architecture
flowchart TD
subgraph CTRL["<b>Controller (FSM)</b>"]
FSM["State Register +<br/>Next-State Logic<br/><i>Generates: MUX_sel, Reg_load, ALU_op</i>"]
end
subgraph DP["<b>Datapath</b>"]
REG["<b>Registers</b><br/><i>R0, R1, ..., Rn</i>"]
MUX["<b>MUX</b><br/><i>Data Select</i>"]
ALU["<b>ALU</b><br/><i>Add, Sub,<br/>AND, OR</i>"]
REG --> MUX --> ALU
ALU -- "Result" --> REG
end
FSM -- "<b>Control Signals</b><br/><i>MUX_sel, Reg_load,<br/>ALU_op, Shift_en</i>" --> DP
DP -- "<b>Status Signals</b><br/><i>Zero, Carry,<br/>Overflow, Sign</i>" --> FSM
style FSM fill:#7E57C2,stroke:#5A3EED,color:#fff
style CTRL fill:#F0ECFF,stroke:#5A3EED,color:#333
style DP fill:#F0ECFF,stroke:#5A3EED,color:#333
style REG fill:#EEF4FF,stroke:#A8C8FF,color:#333
style MUX fill:#EEF4FF,stroke:#A8C8FF,color:#333
style ALU fill:#EEF4FF,stroke:#A8C8FF,color:#333
Interactive Diagram: Datapath-Controller Architecture
13.5 Interface Specification
When multiple engineers (or multiple modules) must connect, interface specifications prevent integration errors. An interface specification for each module boundary includes:
- Port names and types (the VHDL entity)
- Timing relationships (when signals are valid relative to the clock)
- Protocol (handshake signals, ready/valid, request/acknowledge)
- Data encoding (unsigned, signed, BCD, one-hot)
- Reset behavior (what state does the module enter on reset?)
A simple but effective interface pattern is the ready/valid handshake:
Producer asserts 'valid' when data is available
Consumer asserts 'ready' when it can accept data
Data transfers only when BOTH valid AND ready are HIGH
This pattern decouples the timing of producer and consumer, preventing data loss or duplication.
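The three handshake rules can be sketched from the consumer's side as follows. This is an illustrative sketch only: the entity `rv_sink` and the ports `busy` and `captured` are hypothetical names, and `busy` stands in for whatever condition prevents the consumer from accepting data.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity rv_sink is  -- hypothetical consumer
    port (
        clk      : in  std_logic;
        rst      : in  std_logic;                     -- synchronous, active high
        valid    : in  std_logic;                     -- producer: data is available
        ready    : out std_logic;                     -- consumer: can accept data
        data     : in  std_logic_vector(7 downto 0);
        busy     : in  std_logic;                     -- assumed: '1' while the consumer is occupied
        captured : out std_logic_vector(7 downto 0)   -- last word accepted
    );
end entity rv_sink;

architecture rtl of rv_sink is
    signal ready_i : std_logic;
begin
    ready_i <= not busy;
    ready   <= ready_i;

    process(clk)
    begin
        if rising_edge(clk) then
            if rst = '1' then
                captured <= (others => '0');
            elsif valid = '1' and ready_i = '1' then
                captured <= data;  -- a transfer happens only when both are high
            end if;
        end if;
    end process;
end architecture rtl;
```

The producer applies the same test on its side: it may move on to the next word only in a cycle where it sees both `valid` and `ready` high, which is what keeps words from being dropped or sent twice.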
Example: ALU Module Interface Specification
A complete interface specification for an 8-bit ALU module documents everything another engineer needs to connect to it:
Port Specification:
| Port | Direction | Type | Description |
|---|---|---|---|
| clk | in | std_logic | System clock (rising-edge active) |
| rst | in | std_logic | Synchronous reset (active high) |
| a | in | std_logic_vector(7 downto 0) | First operand |
| b | in | std_logic_vector(7 downto 0) | Second operand |
| op | in | std_logic_vector(2 downto 0) | Operation select (see table in 13.12) |
| start | in | std_logic | Pulse to begin operation |
| result | out | std_logic_vector(7 downto 0) | Computation result |
| done | out | std_logic | High for one cycle when result is valid |
| flags | out | std_logic_vector(3 downto 0) | {Carry, Zero, Negative, Overflow} |
Timing: Operands a, b, and op must be stable before the rising clock edge when start is asserted. The result and flags outputs are valid on the same clock edge that done is asserted (one cycle after start for single-cycle operations).
Reset behavior: On rst = '1', result is driven to "00000000", all flags are cleared, and done is deasserted.
13.6 Verification Planning
Professional digital design devotes more effort to verification than to design itself—typically a 60/40 or even 70/30 split. A verification plan defines:
- What to test: Every input combination? Every state transition? Every boundary condition?
- How to test: Manual stimulus? Random stimulus? Formal verification?
- Pass/fail criteria: How does the testbench determine correctness automatically?
- Coverage goals: What percentage of code, states, and transitions must be exercised?
For the designs in this course, verification focuses on:
- Functional correctness: Does the design produce correct outputs for all relevant inputs?
- Timing correctness: Does the design meet setup and hold time requirements at the target clock frequency?
- Reset behavior: Does the design initialize correctly?
- Edge cases: Does the design handle boundary conditions (overflow, maximum count, all-zeros, all-ones)?
Example: Verification Plan for the Digital Lock
A structured verification plan for the digital combination lock (Section 13.11) organizes tests by category:
| Category | Test | Input Sequence | Expected Result |
|---|---|---|---|
| Normal | Correct code on first try | 3→7→1→9 | Unlock asserted |
| Normal | Wrong first digit, then correct | 5→(reset)→3→7→1→9 | Unlock after retry |
| Boundary | All zeros entered | 0→0→0→0 | Remain locked, attempt +1 |
| Boundary | Correct code at attempt 3 | 2 wrong + correct | Unlock (just in time) |
| Error | Three wrong attempts | Wrong × 3 | Lockout for 30 seconds |
| Error | Enter pressed with no digit | Enter (no BCD change) | No state change |
| Reset | Reset during code entry | 3→7→Reset | Return to IDLE, counters cleared |
| Reset | Reset during lockout | Lockout → Reset | Return to IDLE |
| Timing | Rapid button presses | 2 enters within 1 clock | Only one digit registered |
| Timing | Lockout timer accuracy | Enter lockout → wait | Timeout at exactly 30s ± 1 cycle |
Each row of the plan translates into one or more assert statements in the testbench. A well-designed verification plan like this typically catches about 90% of bugs before hardware testing; the remaining 10% often involve physical timing issues that only appear on real hardware.
13.7 Testbench Architecture
Unit 12 introduced basic testbenches. For system-level verification, testbenches become more sophisticated:
Self-Checking Testbench
A self-checking testbench automatically compares the DUT's outputs against expected values, reporting errors without requiring manual waveform inspection:
-- Self-checking testbench for 4-bit adder
verify: process
begin
    -- Test case 1: 3 + 5 = 8
    a_tb <= "0011"; b_tb <= "0101"; cin_tb <= '0';
    wait for 10 ns;
    assert (sum_tb = "1000" and cout_tb = '0')
        report "FAIL: 3 + 5 should equal 8"
        severity error;
    -- Test case 2: 15 + 1 = 0 with carry
    a_tb <= "1111"; b_tb <= "0001"; cin_tb <= '0';
    wait for 10 ns;
    assert (sum_tb = "0000" and cout_tb = '1')
        report "FAIL: 15 + 1 should produce carry"
        severity error;
    -- More test cases...
    -- This note prints unconditionally, so it must not claim success
    report "Test sequence complete; any failures are listed above." severity note;
    wait;  -- suspend forever so the process does not restart
end process verify;
The assert statement checks a condition. When the condition is FALSE, it prints the message and raises the specified severity level. This automates the tedious process of manually checking waveforms.
Testbench with File I/O
For designs with many test vectors, reading stimulus from a file is more practical than hardcoding values:
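-- Read test vectors from file.
-- Requires "use std.textio.all;" in the context clause. Reading and printing
-- std_logic_vector values relies on VHDL-2008 (read and to_string); older
-- tools need ieee.std_logic_textio for read and a user-written to_string.
file_reader: process
    file test_file : text open read_mode is "test_vectors.txt";
    variable line_v : line;
    variable a_v, b_v, expected_v : std_logic_vector(3 downto 0);
begin
    while not endfile(test_file) loop
        readline(test_file, line_v);
        -- Each line holds three 4-bit binary fields: a, b, expected result
        read(line_v, a_v); read(line_v, b_v); read(line_v, expected_v);
        a_tb <= a_v; b_tb <= b_v;
        wait for 10 ns;
        assert (result_tb = expected_v)
            report "FAIL at inputs: " & to_string(a_v) & ", " & to_string(b_v)
            severity error;
    end loop;
    wait;  -- all vectors applied; suspend forever
end process file_reader;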
13.8 Static Timing Analysis
After synthesis and place-and-route, the FPGA tools perform static timing analysis (STA) to verify that the design operates correctly at the target clock frequency.
The Timing Model
Every signal path from a flip-flop output through combinational logic to the next flip-flop input must satisfy:
\[T_{clk} \ge T_{cq} + T_{comb} + T_{setup}\]
Where:
- \(T_{clk}\) = clock period
- \(T_{cq}\) = clock-to-Q delay of the source flip-flop (from Unit 9)
- \(T_{comb}\) = worst-case propagation delay through combinational logic
- \(T_{setup}\) = setup time of the destination flip-flop
The critical path is the path with the largest \(T_{cq} + T_{comb} + T_{setup}\). It determines the maximum clock frequency the design can achieve:
\[f_{max} = \frac{1}{T_{cq} + T_{comb} + T_{setup}}\]
Hold Time Check
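Additionally, every path must satisfy the hold time requirement:
\[T_{cq(min)} + T_{comb(min)} \ge T_{hold}\]
Here \(T_{cq(min)}\) and \(T_{comb(min)}\) are the shortest clock-to-Q and combinational delays on the path, and \(T_{hold}\) is the hold time of the destination flip-flop. This ensures that data doesn't change too quickly after the clock edge. Hold time violations are independent of clock frequency and must be fixed by adding delay to the path.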
Diagram: Timing Analysis Visualizer
13.9 Pipelining for Performance
When the critical path limits clock frequency to an unacceptable level, pipelining breaks the critical path by inserting registers at intermediate points. This trades latency (more clock cycles to complete one operation) for throughput (higher clock frequency, more operations per second).
Without pipelining:
- Critical path delay: \(T_{comb} = 20\) ns
- With \(T_{cq} = 2\) ns, \(T_{setup} = 1\) ns
- \(f_{max} = 1/(2 + 20 + 1) = 43.5\) MHz
- Throughput: 43.5 million operations/second
With one pipeline stage (splitting the combinational logic in half):
- Each stage: \(T_{comb} = 10\) ns
- \(f_{max} = 1/(2 + 10 + 1) = 76.9\) MHz
- Latency: 2 clock cycles per result
- Throughput: 76.9 million operations/second (1.77× improvement)
The pipeline stage adds one clock cycle of delay but nearly doubles the throughput. This is the same principle used in modern processors, which may have 10–20+ pipeline stages.
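In VHDL, inserting a pipeline register amounts to storing an intermediate result in a clocked signal instead of passing it straight to the next operation. The sketch below illustrates this on a hypothetical three-input adder; the entity `add3_pipe` and its signal names are assumptions, not part of the unit's examples.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity add3_pipe is  -- hypothetical example
    port (
        clk     : in  std_logic;
        a, b, c : in  unsigned(7 downto 0);
        y       : out unsigned(9 downto 0)  -- a + b + c, two clock edges after the inputs
    );
end entity add3_pipe;

architecture rtl of add3_pipe is
    signal ab_reg : unsigned(8 downto 0);  -- pipeline register: result of stage 1
    signal c_reg  : unsigned(7 downto 0);  -- c delayed one cycle to stay aligned with ab_reg
begin
    process(clk)
    begin
        if rising_edge(clk) then
            -- Stage 1: first adder, captured in the pipeline register
            ab_reg <= resize(a, 9) + resize(b, 9);
            c_reg  <= c;
            -- Stage 2: second adder uses the values stored on the previous edge
            y <= resize(ab_reg, 10) + resize(c_reg, 10);
        end if;
    end process;
end architecture rtl;
```

Each register-to-register path now contains a single adder rather than two in series, so the critical path is shorter. A new set of operands can enter on every clock edge even though each result takes two edges to emerge.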
Example: 3-Stage Multiply-Accumulate Pipeline
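Consider a multiply-accumulate (MAC) unit: \(R = R + A \times B\). Without pipelining, the critical path includes the multiplier delay (15 ns) plus the adder delay (8 ns) plus register overhead (\(T_{cq} = 2\) ns and \(T_{setup} = 1\) ns, as in the previous example):
\[T_{clk} = T_{cq} + T_{mult} + T_{add} + T_{setup} = 2 + 15 + 8 + 1 = 26\ \text{ns}, \qquad f_{max} = \frac{1}{26\ \text{ns}} \approx 38.5\ \text{MHz}\]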
Splitting into three pipeline stages:
| Stage | Operation | Delay |
|---|---|---|
| Stage 1 | Partial products (first half of multiply) | 8 ns |
| Stage 2 | Complete multiply (second half) | 7 ns |
| Stage 3 | Accumulate (add to running total) | 8 ns |
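Results comparison (the pipelined clock period is set by the slowest stage: 8 ns of logic plus 3 ns of register overhead gives 11 ns, or 90.9 MHz):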
| Metric | Unpipelined | 3-Stage Pipeline |
|---|---|---|
| Clock frequency | 38.5 MHz | 90.9 MHz |
| Latency per result | 1 cycle (26 ns) | 3 cycles (33 ns) |
| Throughput | 38.5 M ops/sec | 90.9 M ops/sec |
| Added flip-flops | — | 2 × register width |
The pipeline achieves 2.36× throughput at the cost of 2 extra cycles of latency and additional flip-flops. In a DSP application processing a continuous stream of audio samples, the extra latency is negligible (33 ns vs 26 ns), but the throughput gain allows processing 2.36× more channels.
Diagram: Pipeline Stages
flowchart LR
IN["<b>Input</b>"]
S1["<b>Stage 1</b><br/><i>Combinational<br/>Logic A</i>"]
R1["<b>Reg</b><br/><i>Pipeline<br/>Register</i>"]
S2["<b>Stage 2</b><br/><i>Combinational<br/>Logic B</i>"]
R2["<b>Reg</b><br/><i>Pipeline<br/>Register</i>"]
S3["<b>Stage 3</b><br/><i>Combinational<br/>Logic C</i>"]
OUT["<b>Output</b>"]
IN --> S1 --> R1 --> S2 --> R2 --> S3 --> OUT
style IN fill:#7E57C2,stroke:#5A3EED,color:#fff
style S1 fill:#EEF4FF,stroke:#A8C8FF,color:#333
style R1 fill:#FFD700,stroke:#DAA520,color:#333
style S2 fill:#EEF4FF,stroke:#A8C8FF,color:#333
style R2 fill:#FFD700,stroke:#DAA520,color:#333
style S3 fill:#EEF4FF,stroke:#A8C8FF,color:#333
style OUT fill:#7E57C2,stroke:#5A3EED,color:#fff
*Without pipelining, the critical path spans all three stages. With pipeline registers (gold), each stage runs independently at a higher clock frequency. Throughput increases at the cost of added latency and register area.*
13.10 Design Trade-offs
Every design decision involves trade-offs among three fundamental metrics:
- Area (resource usage): How many LUTs, flip-flops, and routing resources does the design consume?
- Speed (clock frequency): How fast can the design operate?
- Power (energy consumption): How much power does the design dissipate?
These metrics are interrelated:
| Optimization | Effect on Area | Effect on Speed | Effect on Power |
|---|---|---|---|
| Pipelining | Increases (more FFs) | Increases (shorter critical path) | Increases (more switching) |
| Resource sharing | Decreases (fewer units) | Decreases (MUX overhead) | Mixed |
| Parallel execution | Increases (duplicated units) | Increases (more work per cycle) | Increases |
| Logic minimization | Decreases (fewer gates) | Increases (shorter paths) | Decreases |
| Clock gating | No change | No change | Decreases |
Resource Sharing
Resource sharing reuses a single hardware unit for multiple operations by time-multiplexing it with a control FSM and multiplexers:
Instead of using two separate adders for \(R = A + B\) and \(S = C + D\), use one adder with a MUX:
Cycle 1: MUX selects A,B → Adder → Store in R
Cycle 2: MUX selects C,D → Adder → Store in S
This halves the adder count but doubles the execution time and adds MUX area. Resource sharing is valuable when area is constrained and the operations don't need to happen simultaneously.
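The two-cycle schedule above can be sketched in VHDL with one adder, two input multiplexers, and a one-bit control signal. This is an illustrative sketch under assumed names (`shared_adder`, `sel`, `x`, `y_op`, `sum`); results are kept at 8 bits, so any carry out is discarded.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity shared_adder is  -- hypothetical example
    port (
        clk, rst   : in  std_logic;            -- rst: synchronous, active high
        a, b, c, d : in  unsigned(7 downto 0);
        r, s       : out unsigned(7 downto 0)  -- r = a + b, s = c + d (carry dropped)
    );
end entity shared_adder;

architecture rtl of shared_adder is
    signal sel          : std_logic;           -- '0': cycle 1 (A,B)   '1': cycle 2 (C,D)
    signal x, y_op, sum : unsigned(7 downto 0);
begin
    -- Input multiplexers steer one operand pair to the single adder
    x    <= a when sel = '0' else c;
    y_op <= b when sel = '0' else d;
    sum  <= x + y_op;                          -- the only adder in the design

    process(clk)
    begin
        if rising_edge(clk) then
            if rst = '1' then
                sel <= '0';
                r   <= (others => '0');
                s   <= (others => '0');
            else
                if sel = '0' then
                    r <= sum;                  -- Cycle 1: store A + B in R
                else
                    s <= sum;                  -- Cycle 2: store C + D in S
                end if;
                sel <= not sel;                -- one-bit control FSM alternates cycles
            end if;
        end if;
    end process;
end architecture rtl;
```

The `sel` toggle plays the role of the control FSM: a larger schedule would replace it with a multi-state machine and wider multiplexers.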
Clock Gating for Power Reduction
Dynamic power in CMOS circuits is governed by:
\[P_{dynamic} = \alpha \, C \, V_{DD}^2 \, f\]
Where \(\alpha\) is the switching activity (fraction of gates toggling per cycle), \(C\) is total capacitance, \(V_{DD}\) is supply voltage, and \(f\) is clock frequency. Even when a module is idle, its clock network and the clock inputs of its flip-flops still toggle on every clock edge, consuming power with no useful work.
Clock gating disables the clock to unused modules, eliminating their switching activity entirely:
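-- Conceptual clock gating in VHDL: a plain AND gate on the clock.
-- Glitch-prone if module_enable changes while clk is high; for illustration only.
gated_clk <= clk and module_enable;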
On FPGAs, vendor-specific clock buffer primitives (Xilinx BUFGCE, Intel ALTCLKCTRL) implement clock gating safely, avoiding the glitches that a simple AND gate would produce on the clock signal. The FPGA synthesis tools can also automatically infer clock enables from VHDL if statements when the enable covers an entire process.
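The clock-enable style mentioned above looks like the following sketch. The entity name `enabled_reg` and the 8-bit width are assumptions for illustration; the point is that the enable appears as an `if` inside the clocked process while `clk` itself is left untouched.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity enabled_reg is  -- hypothetical example
    port (
        clk           : in  std_logic;
        module_enable : in  std_logic;
        d             : in  std_logic_vector(7 downto 0);
        q             : out std_logic_vector(7 downto 0)
    );
end entity enabled_reg;

architecture rtl of enabled_reg is
begin
    process(clk)
    begin
        if rising_edge(clk) then            -- the clock is never combined with logic
            if module_enable = '1' then     -- enable wraps every assignment in the process
                q <= d;
            end if;                         -- otherwise q holds its value
        end if;
    end process;
end architecture rtl;
```

Synthesis tools typically map this pattern onto the dedicated clock-enable input of the flip-flops, which avoids the glitch risk of a gated clock. Whether the tool goes further and gates the clock buffer is tool- and device-dependent.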
13.11 System-Level Example: Digital Combination Lock
This example integrates concepts from across the course into a complete system.
Specification:
- 4-digit combination lock (each digit 0–9)
- Input: 4-bit BCD digit, "Enter" button, "Reset" button
- Output: "Unlock" signal, 2-digit display showing entry progress
- The correct combination is hardcoded (e.g., 3-7-1-9)
- Lock allows 3 attempts before a 30-second lockout
Architecture
The digital combination lock employs a datapath-controller architecture. The controller — a five-state finite state machine (FSM) — orchestrates all sequencing decisions, while the datapath performs digit storage, comparison, and counting under the controller's direction. Four subsystems are organized in a pipeline from input conditioning through output indication.
Input Subsystem
The input subsystem conditions raw external signals into clean, synchronous events for the datapath and controller.
| Block | Type | Function |
|---|---|---|
| Debouncer | Sequential | Filters mechanical switch bounce using a shift-register majority detector |
| Edge Detector | Sequential | Converts the debounced Enter signal into a single-clock-cycle pulse (enter_pulse) |
| BCD Input Register | Sequential | 4-bit register; latches the current BCD digit when the controller asserts load_digit |
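As an illustration of the middle row, a rising-edge detector needs only one flip-flop and one gate. The sketch below is a common way to build one; the entity name `edge_detect` and the input name `enter_clean` (the debounced, clock-synchronous Enter signal) are assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity edge_detect is  -- hypothetical name
    port (
        clk         : in  std_logic;
        enter_clean : in  std_logic;  -- debounced Enter, synchronous to clk (assumed name)
        enter_pulse : out std_logic   -- high for one clock cycle per press
    );
end entity edge_detect;

architecture rtl of edge_detect is
    signal prev : std_logic := '0';   -- value of enter_clean one cycle ago
begin
    process(clk)
    begin
        if rising_edge(clk) then
            prev <= enter_clean;
        end if;
    end process;

    -- Pulse only in the cycle where the input is high but was low one cycle earlier
    enter_pulse <= enter_clean and not prev;
end architecture rtl;
```

However long the button is held, `enter_pulse` is asserted for a single cycle, so the controller registers one digit per press.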
Datapath
The datapath contains both combinational comparison logic and sequential counters. The controller drives all counter enables and clears.
| Block | Type | Function |
|---|---|---|
| Combination ROM | Combinational | 4 × 4-bit look-up table storing the correct code; addressed by digit_count |
| BCD Comparator | Combinational | Produces match = 1 when the input register equals the ROM output at the current address |
| Digit Counter | Sequential | 2-bit up-counter (0–3); incremented by inc_digit_ctr, cleared by clr_digit_ctr |
| Attempt Counter | Sequential | 2-bit up-counter (0–3); incremented by inc_attempt, cleared by clr_attempt |
Control Unit (FSM)
The controller is a Mealy/Moore hybrid FSM with five states. It reads status signals from the datapath and issues control signals in return.
States: IDLE → WAIT_DIGIT → CHECK → UNLOCK or LOCKOUT
| Signal Direction | Signals |
|---|---|
| Inputs (from datapath/input) | enter_pulse, match, digit_done (digit counter = 3), max_attempts (attempt counter = 3), timeout |
| Outputs (to datapath/output) | load_digit, inc_digit_ctr, clr_digit_ctr, inc_attempt, clr_attempt, start_timer, unlock_en, lockout_en |
The FSM also contains a Lockout Timer — a sequential down-counter that generates timeout after 30 seconds, returning the system from LOCKOUT to IDLE.
Output Subsystem
| Block | Type | Function |
|---|---|---|
| Unlock Register | Sequential | SR latch; set by unlock_en, cleared on reset |
| Progress Display | Combinational | Decodes digit_count to show entered positions (e.g., "3 7 _ _") |
| Status LEDs | Combinational | Green = unlocked, Red = locked, Flashing Red = lockout active |
System Operation
On power-up the FSM enters IDLE and clears all counters, then proceeds to WAIT_DIGIT. When the user sets a BCD digit and presses Enter, the edge detector produces enter_pulse, advancing the FSM from WAIT_DIGIT to CHECK. The controller asserts load_digit, latching the digit into the input register. The comparator evaluates the latched digit against the ROM value at the address selected by the digit counter and drives match accordingly. On a match the FSM asserts inc_digit_ctr and returns to WAIT_DIGIT; if the match occurs while the digit counter is already at 3 (digit_done), all four digits are correct, so the FSM instead transitions to UNLOCK and asserts unlock_en. On a mismatch the FSM asserts inc_attempt and clr_digit_ctr, restarting the sequence. If the attempt counter saturates (max_attempts), the FSM enters LOCKOUT, asserts start_timer, and waits for timeout before returning to IDLE.
Diagram: Digital Lock System Architecture
VHDL Implementation (Simplified Control Unit)
type lock_state is (IDLE, WAIT_DIGIT, CHECK, UNLOCK, LOCKOUT);
signal state : lock_state;
signal digit_pos : unsigned(1 downto 0); -- 0 to 3
signal attempts : unsigned(1 downto 0); -- 0 to 3
signal lockout_timer : unsigned(14 downto 0); -- counts down 30 s of clock ticks
-- Control FSM (asynchronous reset, rising-edge clock).
-- enter_edge and digit_match correspond to enter_pulse and match in the tables above.
process(clk, rst)
begin
    if rst = '1' then
        state <= IDLE;
        digit_pos <= "00";
        attempts <= "00";
        lockout_timer <= (others => '0');
        unlock <= '0';
    elsif rising_edge(clk) then
        case state is
            when IDLE =>
                digit_pos <= "00";
                unlock <= '0';
                state <= WAIT_DIGIT;
            when WAIT_DIGIT =>
                if enter_edge = '1' then
                    state <= CHECK;
                end if;
            when CHECK =>
                if digit_match = '1' then
                    if digit_pos = 3 then      -- fourth digit matched
                        state <= UNLOCK;
                    else
                        digit_pos <= digit_pos + 1;
                        state <= WAIT_DIGIT;
                    end if;
                else
                    attempts <= attempts + 1;
                    if attempts = 2 then       -- old value 2: this is the third failure
                        -- Assumption: 1 kHz clock, so 30000 ticks = 30 s; adjust to the real clock
                        lockout_timer <= to_unsigned(30000, 15);
                        state <= LOCKOUT;
                    else
                        state <= IDLE;
                    end if;
                end if;
            when UNLOCK =>
                unlock <= '1';                 -- stays unlocked until rst
            when LOCKOUT =>
                if lockout_timer = 0 then
                    state <= IDLE;
                    attempts <= "00";
                else
                    lockout_timer <= lockout_timer - 1;
                end if;
        end case;
    end if;
end process;
13.12 System-Level Example: 8-Bit ALU
An Arithmetic Logic Unit (ALU) is the computational heart of any processor. This example designs an 8-bit ALU that performs the operations enabled by the circuits studied throughout the course.
ALU Operations
| Operation Code | Operation | Unit Reference |
|---|---|---|
| 000 | Addition (\(A + B\)) | Unit 3 (Full Adder) |
| 001 | Subtraction (\(A - B\)) | Unit 3 (Adder-Subtractor) |
| 010 | Bitwise AND (\(A \cdot B\)) | Unit 2 (AND Gate) |
| 011 | Bitwise OR (\(A + B\)) | Unit 2 (OR Gate) |
| 100 | Bitwise XOR (\(A \oplus B\)) | Unit 2 (XOR Gate) |
| 101 | Bitwise NOT (\(\bar{A}\)) | Unit 2 (NOT Gate) |
| 110 | Shift Left (\(A \ll 1\)) | Unit 10 (Shift Register) |
| 111 | Shift Right (\(A \gg 1\)) | Unit 10 (Shift Register) |
VHDL Implementation
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity alu8 is
    port (
        a, b : in std_logic_vector(7 downto 0);
        op : in std_logic_vector(2 downto 0);
        result : out std_logic_vector(7 downto 0);
        zero : out std_logic;
        carry : out std_logic;
        neg : out std_logic
    );
end entity alu8;
architecture rtl of alu8 is
    signal temp : unsigned(8 downto 0); -- 9 bits: bit 8 captures the carry
begin
    process(a, b, op)
    begin
        temp <= (others => '0'); -- default; keeps bit 8 at '0' for logic and shift ops
        case op is
            when "000" => -- ADD
                temp <= ('0' & unsigned(a)) + ('0' & unsigned(b));
            when "001" => -- SUB (bit 8 = '1' signals a borrow, i.e. A < B)
                temp <= ('0' & unsigned(a)) - ('0' & unsigned(b));
            when "010" => -- AND
                temp(7 downto 0) <= unsigned(a and b);
            when "011" => -- OR
                temp(7 downto 0) <= unsigned(a or b);
            when "100" => -- XOR
                temp(7 downto 0) <= unsigned(a xor b);
            when "101" => -- NOT
                temp(7 downto 0) <= unsigned(not a);
            when "110" => -- Shift Left (the bit shifted out is discarded)
                temp(7 downto 0) <= unsigned(a(6 downto 0) & '0');
            when "111" => -- Shift Right (logical, zero fill)
                temp(7 downto 0) <= unsigned('0' & a(7 downto 1));
            when others => -- covers metavalues such as 'U' and 'X' in simulation
                temp <= (others => '0');
        end case;
    end process;
    result <= std_logic_vector(temp(7 downto 0));
    carry <= temp(8);
    zero <= '1' when temp(7 downto 0) = 0 else '0';
    neg <= temp(7);
end architecture rtl;
This ALU draws on concepts from across the course:
- Binary addition and subtraction (Unit 1, Unit 3)
- Boolean operations (Unit 2)
- Shift operations (Unit 10)
- Multiplexer-like selection via case statement (Unit 8)
- Status flag generation (Units 1, 3)
13.13 System-Level Example: UART Transmitter
A Universal Asynchronous Receiver/Transmitter (UART) is one of the most common serial communication interfaces. The transmitter converts parallel data into a serial bitstream:
Protocol:
- Idle state: line held HIGH
- Start bit: line goes LOW for one bit period
- Data bits: 8 bits transmitted LSB first
- Stop bit: line returns HIGH for one bit period
Architecture
UART Transmitter
├── Baud Rate Generator (counter that divides clock to baud rate)
├── Shift Register (parallel-to-serial conversion)
├── Bit Counter (tracks which bit is being transmitted)
└── Control FSM (sequences: idle → start → data × 8 → stop → idle)
The transmitter draws on concepts from across the course:
- Counter design (Unit 10) for baud rate generation
- Shift register (Unit 10) for parallel-to-serial conversion
- FSM (Unit 10) for transmission control
- All implemented in VHDL (Unit 12)
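The four architecture blocks can be combined into a single clocked process, as in the sketch below. It is one possible arrangement under stated assumptions, not a reference implementation: the entity and port names are invented, the reset is assumed synchronous, and the default `CLKS_PER_BIT` of 868 assumes a 100 MHz clock and 115200 baud.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity uart_tx is  -- hypothetical sketch
    generic (
        CLKS_PER_BIT : positive := 868  -- assumption: 100 MHz / 115200 baud, about 868
    );
    port (
        clk, rst : in  std_logic;                     -- rst: synchronous, active high
        start    : in  std_logic;                     -- pulse to send data_in
        data_in  : in  std_logic_vector(7 downto 0);
        tx       : out std_logic;                     -- serial line, idles high
        busy     : out std_logic
    );
end entity uart_tx;

architecture rtl of uart_tx is
    type state_t is (IDLE, START_BIT, DATA_BITS, STOP_BIT);
    signal state    : state_t;
    signal baud_cnt : natural range 0 to CLKS_PER_BIT - 1;  -- baud rate generator
    signal bit_cnt  : natural range 0 to 7;                 -- bit counter
    signal shreg    : std_logic_vector(7 downto 0);         -- shift register
begin
    process(clk)
    begin
        if rising_edge(clk) then
            if rst = '1' then
                state <= IDLE; baud_cnt <= 0; bit_cnt <= 0; tx <= '1';
            else
                case state is
                    when IDLE =>
                        tx <= '1'; baud_cnt <= 0; bit_cnt <= 0;
                        if start = '1' then
                            shreg <= data_in;
                            tx    <= '0';              -- start bit goes out
                            state <= START_BIT;
                        end if;
                    when START_BIT =>
                        if baud_cnt = CLKS_PER_BIT - 1 then
                            baud_cnt <= 0;
                            tx    <= shreg(0);         -- LSB first
                            shreg <= '0' & shreg(7 downto 1);
                            state <= DATA_BITS;
                        else
                            baud_cnt <= baud_cnt + 1;
                        end if;
                    when DATA_BITS =>
                        if baud_cnt = CLKS_PER_BIT - 1 then
                            baud_cnt <= 0;
                            if bit_cnt = 7 then
                                tx    <= '1';          -- all 8 bits sent: stop bit
                                state <= STOP_BIT;
                            else
                                bit_cnt <= bit_cnt + 1;
                                tx    <= shreg(0);
                                shreg <= '0' & shreg(7 downto 1);
                            end if;
                        else
                            baud_cnt <= baud_cnt + 1;
                        end if;
                    when STOP_BIT =>
                        if baud_cnt = CLKS_PER_BIT - 1 then
                            baud_cnt <= 0;
                            state <= IDLE;
                        else
                            baud_cnt <= baud_cnt + 1;
                        end if;
                end case;
            end if;
        end if;
    end process;

    busy <= '0' when state = IDLE else '1';
end architecture rtl;
```

Each bit, including the start and stop bits, stays on the line for exactly `CLKS_PER_BIT` clock cycles. In a hierarchical design the baud counter, shift register, and FSM would more likely be separate entities joined structurally, as the architecture tree suggests.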
Diagram: UART Transmission Protocol and Architecture
Diagram: UART Transceiver (Transmitter + Receiver)
13.14 System-Level Example: Vending Machine Controller
A vending machine controller is a classic FSM design problem that exercises many concepts:
Specification:
- Accepts nickels (5¢), dimes (10¢), and quarters (25¢)
- Item costs 30¢
- Must make change if overpaid
- Two inputs: coin_type (2 bits), coin_inserted (pulse)
- Two outputs: dispense (pulse), change_amount (4 bits)
State Diagram
The FSM tracks the accumulated amount. When the total reaches or exceeds 30¢, the machine dispenses the item, outputs change, and returns to the idle state (S0).
The controller draws on concepts from earlier units:
- FSM design methodology (Unit 10)
- Binary arithmetic for change calculation (Unit 1)
- BCD representation for display (Unit 3)
- State encoding options: binary, one-hot (Unit 10)
State Transition Table
The vending machine FSM uses states representing the accumulated credit in 5¢ increments:
| Current State | Coin Input | Next State | Dispense | Change |
|---|---|---|---|---|
| S0 (0¢) | Nickel | S5 | 0 | 0¢ |
| S0 (0¢) | Dime | S10 | 0 | 0¢ |
| S0 (0¢) | Quarter | S25 | 0 | 0¢ |
| S5 (5¢) | Nickel | S10 | 0 | 0¢ |
| S5 (5¢) | Dime | S15 | 0 | 0¢ |
| S5 (5¢) | Quarter | S0 | 1 | 0¢ |
| S10 (10¢) | Nickel | S15 | 0 | 0¢ |
| S10 (10¢) | Dime | S20 | 0 | 0¢ |
| S10 (10¢) | Quarter | S0 | 1 | 5¢ |
| S15 (15¢) | Nickel | S20 | 0 | 0¢ |
| S15 (15¢) | Dime | S25 | 0 | 0¢ |
| S15 (15¢) | Quarter | S0 | 1 | 10¢ |
| S20 (20¢) | Nickel | S25 | 0 | 0¢ |
| S20 (20¢) | Dime | S0 | 1 | 0¢ |
| S20 (20¢) | Quarter | S0 | 1 | 15¢ |
| S25 (25¢) | Nickel | S0 | 1 | 0¢ |
| S25 (25¢) | Dime | S0 | 1 | 5¢ |
| S25 (25¢) | Quarter | S0 | 1 | 20¢ |
State Encoding and VHDL
With 6 states, two encoding options are common:
- Binary encoding: 3 bits (000–101). Uses fewer flip-flops (3 FFs) but requires more combinational logic for next-state decoding.
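- One-hot encoding: 6 bits (one FF per state). Uses more flip-flops (6 FFs) but simpler next-state logic, because each state bit depends only on a few predecessor state bits and the coin inputs. FPGAs favor one-hot because flip-flops are plentiful (every logic element includes one) and the shallow next-state logic fits in few LUT levels.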
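-- One-hot state encoding (FPGA-preferred)
-- Note: the attribute name and the values accepted for encoding hints are
-- tool-specific; check your synthesis tool's documentation.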
type vend_state is (S0, S5, S10, S15, S20, S25);
attribute enum_encoding : string;
attribute enum_encoding of vend_state : type is "one-hot";
signal state : vend_state := S0;
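The state transition table above can be sketched as a clocked process that reuses the vend_state type. This is an illustrative sketch, not the course's reference design. Several details are assumptions because the specification does not fix them: the coin_type encoding ("00" nickel, "01" dime, "10" quarter), change_amount expressed in multiples of 5¢, the clk and rst ports, and the registered outputs, which appear one clock cycle after the coin pulse. The sketch relies on each state's position in the enumeration equalling its credit in 5¢ units.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity vending_fsm is
  port (
    clk           : in  std_logic;
    rst           : in  std_logic;                     -- synchronous, active high
    coin_type     : in  std_logic_vector(1 downto 0);  -- assumed: "00" nickel, "01" dime, "10" quarter
    coin_inserted : in  std_logic;                     -- one-cycle pulse per coin
    dispense      : out std_logic;                     -- one-cycle pulse
    change_amount : out std_logic_vector(3 downto 0)   -- assumed: multiples of 5 cents
  );
end entity vending_fsm;

architecture rtl of vending_fsm is
  type vend_state is (S0, S5, S10, S15, S20, S25);
  signal state : vend_state := S0;
  constant PRICE : natural := 6;  -- 30 cents in 5-cent units
begin
  process (clk)
    variable coin  : natural range 0 to 5;   -- coin value in 5-cent units
    variable total : natural range 0 to 10;  -- credit plus coin, in 5-cent units
  begin
    if rising_edge(clk) then
      -- Defaults: outputs are single-cycle pulses
      dispense      <= '0';
      change_amount <= (others => '0');
      if rst = '1' then
        state <= S0;
      elsif coin_inserted = '1' then
        case coin_type is
          when "00"   => coin := 1;  -- nickel
          when "01"   => coin := 2;  -- dime
          when "10"   => coin := 5;  -- quarter
          when others => coin := 0;  -- unused code: ignore
        end case;
        -- The position of each state in the enumeration equals its credit / 5
        total := vend_state'pos(state) + coin;
        if total >= PRICE then
          dispense      <= '1';
          change_amount <= std_logic_vector(to_unsigned(total - PRICE, 4));
          state         <= S0;
        else
          state <= vend_state'val(total);
        end if;
      end if;
    end if;
  end process;
end architecture rtl;
```

For instance, a quarter inserted in S15 gives total = 3 + 5 = 8, so the machine dispenses and reports a change of 2 units (10¢), matching the table row. Some synthesis tools may handle an explicit case statement over (state, coin) better than the 'pos and 'val attributes used here for brevity.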
13.15 Design for Testability
Design for Testability (DFT) adds features that make verification easier, both in simulation and in the final hardware:
- Scan chains: Convert flip-flops into a shift register chain for loading and observing internal state
- Built-In Self-Test (BIST): Include test pattern generators and response checkers on-chip
- Observation points: Bring internal signals to test pins or debug registers
- Controllability: Ensure every internal state can be forced to both 0 and 1 during testing
For FPGA designs, DFT includes:
- ChipScope/SignalTap: Vendor tools that embed logic analyzers inside the FPGA to observe internal signals in real time
- Debug ports: Dedicated I/O pins that output key internal signals
- Readable registers: Allow a host processor to read FSM state, counter values, and status flags
How Scan Chains Work
A scan chain converts normal flip-flops into a shift register that can be loaded and read externally. Each flip-flop gets a multiplexer controlled by a scan_enable signal:
- Normal mode (scan_enable = 0): The flip-flop loads data from the combinational logic (normal circuit operation).
- Shift mode (scan_enable = 1): The flip-flop loads data from the previous flip-flop in the chain, forming a long shift register.
To test a circuit with scan chains: (1) shift a test pattern into all flip-flops via the scan chain, (2) switch to normal mode for one clock cycle so the flip-flops capture the combinational logic's response, (3) switch back to shift mode and shift out the captured values for comparison against the expected response. This makes every flip-flop in the chain both controllable and observable.
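The multiplexed flip-flop arrangement described above can be sketched as a small scan register. This is a simplified behavioral sketch, not how a production DFT tool inserts scan cells: the entity name scan_reg, the generic WIDTH, and the ports scan_in, scan_out, d, and q are hypothetical names introduced for illustration, and the code assumes WIDTH is at least 2.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity scan_reg is
  generic (
    WIDTH : positive := 4  -- assumed chain length (must be >= 2 in this sketch)
  );
  port (
    clk         : in  std_logic;
    scan_enable : in  std_logic;                             -- '0' = normal, '1' = shift
    scan_in     : in  std_logic;                             -- serial test data in
    d           : in  std_logic_vector(WIDTH - 1 downto 0);  -- from combinational logic
    q           : out std_logic_vector(WIDTH - 1 downto 0);  -- to combinational logic
    scan_out    : out std_logic                              -- serial test data out
  );
end entity scan_reg;

architecture rtl of scan_reg is
  signal ff : std_logic_vector(WIDTH - 1 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if scan_enable = '1' then
        -- Shift mode: each flip-flop loads its predecessor; bit 0 loads scan_in
        ff <= ff(WIDTH - 2 downto 0) & scan_in;
      else
        -- Normal mode: capture the combinational logic outputs
        ff <= d;
      end if;
    end if;
  end process;

  q        <= ff;
  scan_out <= ff(WIDTH - 1);  -- last flip-flop in the chain
end architecture rtl;
```

With this sketch, loading a pattern takes WIDTH clock cycles with scan_enable = 1, the capture step takes one cycle with scan_enable = 0, and the captured values leave through scan_out over the next WIDTH shift cycles.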
FPGA Debug with Embedded Logic Analyzers
FPGA vendors provide embedded logic analyzer tools — SignalTap (Intel/Altera) and ChipScope/ILA (Xilinx/AMD) — that insert a small logic analyzer IP core inside the FPGA alongside the user design. The analyzer samples selected internal signals at the system clock rate and stores them in on-chip block RAM. A trigger condition (e.g., "FSM enters LOCKOUT state") starts the capture, and the recorded waveforms are uploaded to the PC via JTAG for inspection.
-- Debug register for the digital lock (readable via JTAG or SPI)
debug_reg <= state_encoding & std_logic_vector(digit_pos)
& std_logic_vector(attempts) & match & enter_pulse
& timeout & unlock;
13.16 From Specification to Silicon
The complete journey from idea to working hardware follows a well-defined path. This table connects each phase to the units where the relevant skills were developed:
| Phase | Activity | Course Unit |
|---|---|---|
| Specification | Define inputs, outputs, behavior | All units (truth tables, state diagrams) |
| Number system selection | Choose binary, BCD, signed representation | Unit 1 |
| Boolean logic design | Derive equations, simplify | Units 2, 4, 5, 6 |
| Combinational module selection | Choose adders, MUX, decoders | Units 3, 7, 8 |
| Sequential design | Design registers, counters, FSMs | Units 9, 10 |
| Device selection | Choose CPLD or FPGA | Unit 11 |
| HDL coding | Write VHDL | Unit 12 |
| Verification | Write testbenches, simulate | Units 12, 13 |
| Implementation | Synthesize, place, route | Units 11, 13 |
| Hardware test | Program FPGA, test with real signals | Unit 13 |
13.17 Career Paths in Digital Design
The skills developed in this course open doors to diverse engineering career paths:
- Digital Design Engineer: Designs ASICs and FPGAs for consumer electronics, telecommunications, and computing
- Verification Engineer: Develops testbenches, writes assertions, and performs formal verification. Because verification often takes more effort than the design itself, this role is in consistently high demand in the semiconductor industry
- FPGA Engineer: Implements designs on FPGAs for defense, telecommunications, data centers, and embedded systems
- Embedded Systems Engineer: Combines digital hardware with software for IoT devices, automotive systems, and industrial control
- Computer Architect: Designs processors, memory systems, and interconnects—building on the ALU and FSM concepts from this course
- Test Engineer: Develops production test programs for manufactured chips, applying DFT concepts
- Hardware Security Engineer: Analyzes and protects digital designs against side-channel attacks and hardware trojans
Each path builds directly on the foundation established in this course: Boolean algebra for logic optimization, sequential design for state machines, VHDL for implementation, and verification for quality assurance.
13.18 Summary and Key Takeaways
- Top-down design manages complexity by decomposing systems into hierarchical, modular subsystems with well-defined interfaces.
- Datapath-controller separation divides a system into components that process data (registers, ALU, MUX) and a control FSM that sequences operations.
- Verification often consumes more effort than design, so self-checking testbenches, comprehensive test vectors, and timing analysis are essential for correct implementations.
- Static timing analysis determines maximum clock frequency by finding the critical path: \(f_{max} = 1/(T_{cq} + T_{comb,max} + T_{setup})\).
- Pipelining increases throughput by inserting registers to break long combinational paths, trading latency for clock speed.
- Design trade-offs among area, speed, and power are fundamental to every design decision—there is no single "best" design, only the best design for given constraints.
- Real digital systems—locks, ALUs, UART transmitters, vending machines—are composed from the building blocks of prior units: gates, adders, MUXes, flip-flops, registers, counters, and FSMs.
- Design for testability and design for debug are not afterthoughts—they should be planned from the beginning.
- The journey from specification to working silicon follows a systematic flow of design, verification, and implementation that applies everything learned in this course.
Self-Check: Why does pipelining increase throughput even though it adds latency?
Pipelining divides a long combinational path into shorter stages separated by registers. Each stage has less delay, allowing a higher clock frequency. Although a single result takes more clock cycles to complete (increased latency), a new result is produced every clock cycle once the pipeline is full. The higher clock frequency means more results per second (higher throughput), even though each individual result takes longer. It is analogous to an assembly line: each car takes longer to pass through all stations, but the factory produces more cars per hour.
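A worked example with assumed numbers makes the trade-off concrete. The delay values below (1 ns clock-to-Q, 1 ns setup, 8 ns of combinational logic that splits evenly into two 4 ns stages) are illustrative only and use the same timing relation as the summary above:

```latex
\begin{aligned}
\text{Unpipelined: } T_{clk} &= T_{cq} + T_{comb,max} + T_{setup} = 1 + 8 + 1 = 10\ \text{ns},
  \quad f_{max} = 100\ \text{MHz} \\
\text{Two stages: } T_{clk} &= 1 + 4 + 1 = 6\ \text{ns},
  \quad f_{max} \approx 166.7\ \text{MHz} \\
\text{Latency: } & 1 \times 10\ \text{ns} = 10\ \text{ns}
  \;\rightarrow\; 2 \times 6\ \text{ns} = 12\ \text{ns} \\
\text{Throughput: } & 100 \times 10^{6}\ \text{results/s}
  \;\rightarrow\; \approx 166.7 \times 10^{6}\ \text{results/s}
\end{aligned}
```

Throughput does not quite double in this example, because the register overhead (clock-to-Q plus setup) is paid in every stage and is not reduced by splitting the logic.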
Interactive Walkthrough
Design a datapath-controller system step-by-step, connecting registers, ALU, and MUX with an FSM controller: