Rust Patterns & Engineering How-Tos / Rust 模式与工程实践手册

Speaker Intro / 讲师简介

  • Principal Firmware Architect in Microsoft SCHIE (Silicon and Cloud Hardware Infrastructure Engineering) team / Microsoft SCHIE(Silicon and Cloud Hardware Infrastructure Engineering)团队首席固件架构师
  • Industry veteran with expertise in security, systems programming (firmware, operating systems, hypervisors), CPU and platform architecture, and C++ systems / 在安全、系统编程(固件、操作系统、虚拟机监控器)、CPU 与平台架构以及 C++ 系统方面经验丰富
  • Started programming in Rust in 2017 (@AWS EC2), and have been in love with the language ever since / 2017 年在 AWS EC2 开始使用 Rust,此后长期深度投入

A practical guide to intermediate-and-above Rust patterns that arise in real codebases. This is not a language tutorial - it assumes you can write basic Rust and want to level up. Each chapter isolates one concept, explains when and why to use it, and provides compilable examples with inline exercises.

这是一本聚焦真实代码库中常见中高级 Rust 模式的实用指南。它不是语言入门教程,而是假设你已经能编写基础 Rust,并希望继续进阶。每章聚焦一个概念,解释何时使用、为何使用,并提供可编译示例与内联练习。

Who This Is For / 适合谁阅读

  • Developers who have finished The Rust Programming Language but struggle with “how do I actually design this?” / 已经读完 The Rust Programming Language,但仍困惑“真实系统到底该怎么设计”的开发者
  • C++/C# engineers translating production systems into Rust / 正在把生产系统从 C++/C# 迁移到 Rust 的工程师
  • Anyone who has hit a wall with generics, trait bounds, or lifetime errors and wants a systematic toolkit / 被泛型、trait 约束或生命周期报错卡住,希望建立系统化工具箱的人

Prerequisites / 前置知识

Before starting, you should be comfortable with:

开始之前,你应当熟悉以下内容:

  • Ownership, borrowing, and lifetimes (basic level) / 所有权、借用与生命周期(基础层面)
  • Enums, pattern matching, and Option/Result / 枚举、模式匹配以及 Option/Result
  • Structs, methods, and basic traits (Display, Debug, Clone) / 结构体、方法与基础 trait(如 Display、Debug、Clone)
  • Cargo basics: cargo build, cargo test, cargo run / Cargo 基础:cargo build、cargo test、cargo run

How to Use This Book / 如何使用本书

Difficulty Legend / 难度说明

Each chapter is tagged with a difficulty level:

每章都带有难度标记:

| Symbol / 标记 | Level / 等级 | Meaning / 含义 |
|---|---|---|
| 🟢 | Fundamentals / 基础 | Core concepts every Rust developer needs / 每个 Rust 开发者都需要掌握的核心概念 |
| 🟡 | Intermediate / 中级 | Patterns used in production codebases / 生产代码中常见的模式 |
| 🔶 | Advanced / 高级 | Deep language mechanics - revisit as needed / 深入语言机制,建议按需反复回看 |

Pacing Guide / 学习节奏建议

| Chapters / 章节 | Topic / 主题 | Suggested Time / 建议时间 | Checkpoint / 检查点 |
|---|---|---|---|
| Part I: Type-Level Patterns / 类型层模式 | | | |
| 1. Generics 🟢 | Monomorphization, const generics, const fn / 单态化、const 泛型、const fn | 1-2 hours / 1-2 小时 | Can explain when dyn Trait beats generics / 能解释何时 dyn Trait 比泛型更合适 |
| 2. Traits 🟡 | Associated types, GATs, blanket impls, vtables / 关联类型、GAT、blanket impl、vtable | 3-4 hours / 3-4 小时 | Can design a trait with associated types / 能设计带关联类型的 trait |
| 3. Newtype & Type-State 🟡 | Zero-cost safety, compile-time FSMs / 零成本安全、编译期有限状态机 | 2-3 hours / 2-3 小时 | Can build a type-state builder pattern / 能实现 type-state builder 模式 |
| 4. PhantomData 🔶 | Lifetime branding, variance, drop check / 生命周期品牌化、变型、drop check | 2-3 hours / 2-3 小时 | Can explain why PhantomData<fn(T)> differs from PhantomData<T> / 能解释 PhantomData<fn(T)> 与 PhantomData<T> 的区别 |
| Part II: Concurrency & Runtime / 并发与运行时 | | | |
| 5. Channels 🟢 | mpsc, crossbeam, select!, actors / mpsc、crossbeam、select!、actor | 1-2 hours / 1-2 小时 | Can implement a channel-based worker pool / 能实现基于 channel 的 worker pool |
| 6. Concurrency 🟡 | Threads, rayon, Mutex, RwLock, atomics / 线程、rayon、Mutex、RwLock、原子类型 | 2-3 hours / 2-3 小时 | Can pick the right sync primitive for a scenario / 能为具体场景选择合适的同步原语 |
| 7. Closures 🟢 | Fn, FnMut, FnOnce, higher-order functions / Fn、FnMut、FnOnce、高阶函数 | 1-2 hours / 1-2 小时 | Can explain the three closure traits / 能解释三种闭包 trait |
| 8. Functional vs Imperative 🟡 | Iterator combinators, adapters, side-effect management / 迭代器组合器、适配器、副作用管理 | 1-2 hours / 1-2 小时 | Can explain when to use .fold() vs for loop / 能解释何时使用 .fold() 而不是 for 循环 |
| 9. Smart Pointers 🟡 | Box, Rc, Arc, RefCell, Cow, Pin / Box、Rc、Arc、RefCell、Cow、Pin | 2-3 hours / 2-3 小时 | Can explain when to use each smart pointer / 能解释每种智能指针的适用场景 |
| Part III: Systems & Production / 系统与生产实践 | | | |
| 10. Error Handling 🟢 | thiserror, anyhow, ? operator / thiserror、anyhow、? 操作符 | 1-2 hours / 1-2 小时 | Can design an error type hierarchy / 能设计错误类型层次结构 |
| 11. Serialization 🟡 | serde, zero-copy, binary data / serde、零拷贝、二进制数据 | 2-3 hours / 2-3 小时 | Can write a custom serde deserializer / 能写自定义 serde 反序列化器 |
| 12. Unsafe 🔶 | Superpowers, FFI, UB pitfalls, allocators / 五种“超能力”、FFI、UB 陷阱、分配器 | 2-3 hours / 2-3 小时 | Can wrap unsafe code in a sound safe API / 能把 unsafe 代码封装成健全的安全 API |
| 13. Macros 🟡 | macro_rules!, proc macros, syn/quote / macro_rules!、过程宏、syn/quote | 2-3 hours / 2-3 小时 | Can write a declarative macro with tt munching / 能写出使用 tt munching 的声明式宏 |
| 14. Testing 🟢 | Unit/integration/doc tests, proptest, criterion / 单元测试、集成测试、文档测试、proptest、criterion | 1-2 hours / 1-2 小时 | Can set up property-based tests / 能搭建属性测试 |
| 15. API Design 🟡 | Module layout, ergonomic APIs, feature flags / 模块布局、易用 API、feature 标志 | 2-3 hours / 2-3 小时 | Can apply the “parse, don’t validate” pattern / 能应用“先解析,不要事后校验”的模式 |
| 16. Async 🔶 | Futures, Tokio, common pitfalls / Future、Tokio、常见陷阱 | 1-2 hours / 1-2 小时 | Can identify async anti-patterns / 能识别 async 反模式 |
| 17. Exercises 🟢 | Comprehensive practice problems / 综合练习题 | 4-8 hours / 4-8 小时 | Can solve common Rust pattern puzzles / 能解决常见的 Rust 模式谜题 |
| Appendices / 附录 | | | |
| 18. Reference Card / 速查卡 | Quick-look trait bounds, lifetimes, patterns / trait 约束、生命周期、模式速查 | As needed / 按需查阅 | - |
| 19. Capstone Project / 综合项目 | Type-safe task scheduler / 类型安全任务调度器 | 4-6 hours / 4-6 小时 | Submit a working implementation / 完成一个可运行实现 |

Total estimated time: 30-45 hours for thorough study with exercises.

预计总时长:若完整学习并完成练习,大约需要 30-45 小时。

Working Through Exercises / 练习建议

Every chapter ends with a hands-on exercise. For maximum learning:

每章末尾都有动手练习。为了获得最佳学习效果:

  1. Try it yourself first - spend at least 15 minutes before opening the solution / 先自己尝试,至少坚持 15 分钟再打开答案
  2. Type the code - don’t copy-paste; typing builds muscle memory / 手敲代码,不要复制粘贴,输入本身会强化记忆
  3. Modify the solution - add a feature, change a constraint, break something on purpose / 修改答案,比如增加功能、调整约束,或者故意改坏再修复
  4. Check cross-references - most exercises combine patterns from multiple chapters / 查看交叉引用,大多数练习都会结合多章模式

The capstone project (Appendix) ties together patterns from across the book into a single, production-quality system.

综合项目(附录)会把整本书中的多个模式整合成一个具有生产质量的完整系统。

Table of Contents / 目录

Part I: Type-Level Patterns / 第一部分:类型层模式

1. Generics - The Full Picture / 1. 泛型全景图 🟢
Monomorphization, code bloat trade-offs, generics vs enums vs trait objects, const generics, const fn.
单态化、代码膨胀权衡、泛型与枚举和 trait 对象的取舍、const 泛型、const fn。

2. Traits In Depth / 2. Trait 深入解析 🟡
Associated types, GATs, blanket impls, marker traits, vtables, HRTBs, extension traits, enum dispatch.
关联类型、GAT、blanket impl、标记 trait、vtable、HRTB、扩展 trait、枚举分发。

3. The Newtype and Type-State Patterns / 3. Newtype 与 Type-State 模式 🟡
Zero-cost type safety, compile-time state machines, builder patterns, config traits.
零成本类型安全、编译期状态机、builder 模式、配置 trait。

4. PhantomData - Types That Carry No Data / 4. PhantomData:不承载数据的类型 🔶
Lifetime branding, unit-of-measure pattern, drop check, variance.
生命周期品牌化、计量单位模式、drop check 与变型。

Part II: Concurrency & Runtime / 第二部分:并发与运行时

5. Channels and Message Passing / 5. Channel 与消息传递 🟢
std::sync::mpsc, crossbeam, select!, backpressure, actor pattern.
std::sync::mpsc、crossbeam、select!、背压与 actor 模式。

6. Concurrency vs Parallelism vs Threads / 6. 并发、并行与线程 🟡
OS threads, scoped threads, rayon, Mutex/RwLock/Atomics, Condvar, OnceLock, lock-free patterns.
操作系统线程、作用域线程、rayon、Mutex/RwLock/原子类型、Condvar、OnceLock 与无锁模式。

7. Closures and Higher-Order Functions / 7. 闭包与高阶函数 🟢
Fn/FnMut/FnOnce, closures as parameters/return values, combinators, higher-order APIs.
Fn/FnMut/FnOnce、闭包作为参数和返回值、组合器与高阶 API。

8. Functional vs. Imperative: When Elegance Wins / 8. 函数式与命令式:优雅何时胜出 🟢
Iterator combinators, adapters, side-effect management, functional vs imperative performance.
迭代器组合器、适配器、副作用管理、函数式与命令式的性能权衡。

9. Smart Pointers and Interior Mutability / 9. 智能指针与内部可变性 🟡
Box, Rc, Arc, Weak, Cell/RefCell, Cow, Pin, ManuallyDrop.
Box、Rc、Arc、Weak、Cell/RefCell、Cow、Pin、ManuallyDrop。

Part III: Systems & Production / 第三部分:系统与生产实践

10. Error Handling Patterns / 10. 错误处理模式 🟢
thiserror vs anyhow, #[from], .context(), ? operator, panics.
thiserror 与 anyhow 的对比、#[from]、.context()、? 操作符与 panic。

11. Serialization, Zero-Copy, and Binary Data / 11. 序列化、零拷贝与二进制数据 🟡
serde fundamentals, enum representations, zero-copy deserialization, repr(C), bytes::Bytes.
serde 基础、枚举表示方式、零拷贝反序列化、repr(C)、bytes::Bytes。

12. Unsafe Rust - Controlled Danger / 12. Unsafe Rust:受控的危险 🔶
Five superpowers, sound abstractions, FFI, UB pitfalls, arena/slab allocators.
五种“超能力”、健全抽象、FFI、UB 陷阱、arena/slab 分配器。

13. Macros - Code That Writes Code / 13. 宏:生成代码的代码 🟡
macro_rules!, when (not) to use macros, proc macros, derive macros, syn/quote.
macro_rules!、宏的适用与不适用场景、过程宏、派生宏、syn/quote。

14. Testing and Benchmarking Patterns / 14. 测试与基准模式 🟢
Unit/integration/doc tests, proptest, criterion, mocking strategies.
单元测试、集成测试、文档测试、proptest、criterion 与 mock 策略。

15. Crate Architecture and API Design / 15. Crate 架构与 API 设计 🟡
Module layout, API design checklist, ergonomic parameters, feature flags, workspaces.
模块布局、API 设计清单、易用参数设计、feature 标志与工作区。

16. Async/Await Essentials / 16. Async/Await 核心要点 🔶
Futures, Tokio quick-start, common pitfalls. (For deep async coverage, see our Async Rust Training.)
Future、Tokio 快速入门与常见陷阱。(若需深入异步内容,请参考 Async Rust Training。)

17. Exercises / 17. 练习 🟢
Comprehensive practice problems and challenges.
全书知识点综合练习与挑战。

Appendices / 附录

Summary and Reference Card / 总结与速查卡
Pattern decision guide, trait bounds cheat sheet, lifetime elision rules, further reading.
模式决策指南、trait 约束速查、生命周期省略规则与延伸阅读。

Capstone Project: Type-Safe Task Scheduler / 综合项目:类型安全任务调度器
Integrate generics, traits, typestate, channels, error handling, and testing into a complete system.
将泛型、trait、typestate、channel、错误处理与测试整合为完整系统。


1. Generics — The Full Picture / 泛型全景图 🟢

What you’ll learn / 你将学到:

  • How monomorphization gives zero-cost generics — and when it causes code bloat / 单态化如何实现零成本泛型 —— 以及何时会引发代码膨胀
  • The decision framework: generics vs enums vs trait objects / 决策框架:泛型 vs 枚举 vs trait 对象
  • Const generics for compile-time array sizes and const fn for compile-time evaluation / 用于编译期数组大小的 const 泛型,以及用于编译期计算的 const fn
  • When to trade static dispatch for dynamic dispatch on cold paths / 何时在冷代码路径上将静态分发换为动态分发

Monomorphization and Zero Cost / 单态化与零成本

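Generics in Rust are monomorphized: the compiler generates a specialized copy of each generic function for every concrete type it is used with. This contrasts with Java, where generic type arguments are erased and a single compiled body serves every instantiation. (C# keeps generic type information at runtime and lets the JIT specialize code for value types, so it sits between the two.)

Rust 中的泛型是 单态化(monomorphized) 的:编译器会为每个使用的具体类型生成一份该泛型函数的专门副本。这与 Java 不同:Java 的泛型类型参数会被擦除,所有实例化共用同一份编译后的函数体。(C# 会在运行时保留泛型类型信息,并由 JIT 为值类型生成专门代码,介于两者之间。)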

fn max_of<T: PartialOrd>(a: T, b: T) -> T {
    if a >= b { a } else { b }
}

fn main() {
    max_of(3_i32, 5_i32);     // Compiler generates max_of_i32 / 编译器生成 max_of_i32
    max_of(2.0_f64, 7.0_f64); // Compiler generates max_of_f64 / 编译器生成 max_of_f64
    max_of("a", "z");         // Compiler generates max_of_str / 编译器生成 max_of_str
}

What the compiler actually produces / 编译器实际生成的内容 (conceptually / 概念上):

#![allow(unused)]
fn main() {
// Three separate functions — no runtime dispatch, no vtable:
// 三个独立的函数 —— 没有运行时分发,没有 vtable:
fn max_of_i32(a: i32, b: i32) -> i32 { if a >= b { a } else { b } }
fn max_of_f64(a: f64, b: f64) -> f64 { if a >= b { a } else { b } }
fn max_of_str<'a>(a: &'a str, b: &'a str) -> &'a str { if a >= b { a } else { b } }
}

Why does max_of_str need <'a> but max_of_i32 doesn’t? i32 and f64 contain no references, so the function simply returns an owned value. But &str is a reference, so the signature must say how long the returned reference stays valid. The <'a> annotation ties the three together: the result may be used only while both inputs are still alive.

为什么 max_of_str 需要 <'a> 而 max_of_i32 不需要? i32 和 f64 不包含引用,因此函数返回的只是拥有所有权的值。但 &str 是引用,因此签名必须说明返回的引用在多长时间内有效。<'a> 标注把三者绑定在一起:只有在两个输入都仍然存活时,才能使用返回结果。
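
A small sketch of how that shared lifetime plays out at a call site. The variable names `longer_lived` and `short` are illustrative and not from the book; `max_of` is repeated so the block stands alone.

```rust
fn max_of<T: PartialOrd>(a: T, b: T) -> T {
    if a >= b { a } else { b }
}

fn main() {
    let longer_lived = String::from("apple");
    {
        let short = String::from("zebra");
        // T = &str here; both borrows are shrunk to one common lifetime.
        let winner = max_of(longer_lived.as_str(), short.as_str());
        println!("{winner}"); // fine: used while both strings are alive
    }
    // Moving `winner` out of the inner block would be rejected by the
    // borrow checker: it might point into `short`, which is gone by now.
}
```

The compiler does not track which argument was actually returned, so it has to assume the result may borrow from either one.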

Advantages / 优点:Zero runtime cost — identical to hand-written specialized code. The optimizer can inline, vectorize, and specialize each copy independently.

优点:零运行时开销 —— 与手写的针对特定类型的代码完全一致。优化器可以独立地对每个副本进行内联、向量化和专门优化。
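
One way to see that each call site is resolved to a concrete type at compile time is to ask the compiler for the name of `T`. This is only a sketch: `instantiated_for` is a hypothetical helper, and the exact strings returned by `std::any::type_name` are intended for diagnostics and are not guaranteed to be stable.

```rust
use std::any::type_name;

// One generic definition; the compiler emits a separate copy per concrete T,
// and each copy has its own `T` baked in.
fn instantiated_for<T>(_value: T) -> &'static str {
    type_name::<T>()
}

fn main() {
    println!("{}", instantiated_for(3_i32));
    println!("{}", instantiated_for(2.0_f64));
    println!("{}", instantiated_for("a"));
}
```

No type information is passed at runtime here; each instantiation simply returns a constant that was fixed when it was compiled.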

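Comparison with C++ / 与 C++ 的比较:Rust generics work like C++ templates, with one crucial difference: trait bounds are checked where the generic is defined, not where it is instantiated. A generic body may use only what its bounds promise, so a mistake is reported once at the definition rather than at each call site.

与 C++ 的比较:Rust 泛型的工作原理类似于 C++ 模板,但有一个关键区别:trait 约束在泛型定义处检查,而不是在实例化处检查。泛型函数体只能使用约束所承诺的能力,因此错误会在定义处报告一次,而不是在每个调用点报告。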

#![allow(unused)]
fn main() {
// Rust: error at definition site — "T doesn't implement Display"
// Rust:在定义位置报错 —— “T 未实现 Display”
fn broken<T>(val: T) {
    println!("{val}"); // ❌ Error: T doesn't implement Display
}

// Fix: add the bound / 修复:添加约束
fn fixed<T: std::fmt::Display>(val: T) {
    println!("{val}"); // ✅
}
}

When Generics Hurt: Code Bloat / 泛型的副作用:代码膨胀

Monomorphization has a cost — binary size. Each unique instantiation duplicates the function body:

单态化是有代价的 —— 即二进制文件体积。每个唯一的实例化都会复制一份函数体:

#![allow(unused)]
fn main() {
// This innocent function... / 这个看似无辜的函数……
fn serialize<T: serde::Serialize>(value: &T) -> Vec<u8> {
    serde_json::to_vec(value).unwrap()
}

// ...used with 50 different types → 50 copies in the binary.
// ……如果用于 50 种不同的类型 → 二进制文件中就会有 50 份副本。
}

Mitigation strategies / 缓解策略

#![allow(unused)]
fn main() {
// 1. Extract the non-generic core ("outline" pattern)
// 1. 提取非泛型核心(“轮廓”模式)
fn serialize<T: serde::Serialize>(value: &T) -> Result<Vec<u8>, serde_json::Error> {
    let json_value = serde_json::to_value(value)?;
    serialize_value(json_value)
}

fn serialize_value(value: serde_json::Value) -> Result<Vec<u8>, serde_json::Error> {
    // This function exists only ONCE in the binary
    // 此函数在二进制文件中只存在一份
    serde_json::to_vec(&value)
}

// 2. Use trait objects (dynamic dispatch) / 2. 使用 trait 对象(动态分发)
fn log_item(item: &dyn std::fmt::Display) {
    // One copy — uses vtable for dispatch / 只有一份拷贝 —— 使用 vtable 进行分发
    println!("[LOG] {item}");
}
}
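
The same "outline" idea can be sketched with the standard library alone. `count_words` and `count_words_inner` are hypothetical names chosen for this illustration: the generic shim is duplicated per type but contains almost nothing, while the real work lives in a single non-generic function.

```rust
// Thin generic shim: duplicated per S, but it is only a conversion and a call.
fn count_words<S: AsRef<str>>(text: S) -> usize {
    count_words_inner(text.as_ref())
}

// Non-generic core: compiled once and shared by every instantiation above.
fn count_words_inner(text: &str) -> usize {
    text.split_whitespace().count()
}

fn main() {
    println!("{}", count_words("a b c"));
    println!("{}", count_words(String::from("hello world")));
}
```

The larger the core relative to the shim, the more binary size this layout tends to save.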

Generics vs Enums vs Trait Objects — Decision Guide / 决策指南

| Approach / 方式 | Dispatch / 分发 | Known at / 确定时机 | Extensible? / 可扩展? | Overhead / 开销 |
|---|---|---|---|---|
| Generics (impl Trait / <T: Trait>) | Static (静态) | Compile time (编译期) | ✅ (open set / 开放集合) | Zero - inlined (零 - 内联) |
| Enum | Match arm | Compile time (编译期) | ❌ (closed set / 封闭集合) | Zero (零) |
| Trait object (dyn Trait) | Dynamic (动态) | Runtime (运行时) | ✅ (open set / 开放集合) | Vtable overhead (vtable 开销) |

flowchart TD
    A["Do you know ALL<br>possible types at<br>compile time?"]
    A -->|"Yes, small<br>closed set"| B["Enum"]
    A -->|"Yes, but set<br>is open"| C["Generics<br>(monomorphized)"]
    A -->|"No — types<br>determined at runtime"| D["dyn Trait"]

    C --> E{"Hot path?<br>(millions of calls)"}
    E -->|Yes| F["Generics<br>(inlineable)"]
    E -->|No| G["dyn Trait<br>is fine"]

    D --> H{"Need mixed types<br>in one collection?"}
    H -->|Yes| I["Vec&lt;Box&lt;dyn Trait&gt;&gt;"]
    H -->|No| C

    style A fill:#e8f4f8,stroke:#2980b9,color:#000
    style B fill:#d4efdf,stroke:#27ae60,color:#000
    style C fill:#d4efdf,stroke:#27ae60,color:#000
    style D fill:#fdebd0,stroke:#e67e22,color:#000
    style F fill:#d4efdf,stroke:#27ae60,color:#000
    style G fill:#fdebd0,stroke:#e67e22,color:#000
    style I fill:#fdebd0,stroke:#e67e22,color:#000
    style E fill:#fef9e7,stroke:#f1c40f,color:#000
    style H fill:#fef9e7,stroke:#f1c40f,color:#000
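
To make the three branches of the guide concrete, here is a minimal sketch that computes an area all three ways. The `Shape` trait, the two structs, and the function names are assumptions made for this illustration.

```rust
use std::f64::consts::PI;

trait Shape { fn area(&self) -> f64; }

struct Circle(f64);
struct Square(f64);

impl Shape for Circle { fn area(&self) -> f64 { PI * self.0 * self.0 } }
impl Shape for Square { fn area(&self) -> f64 { self.0 * self.0 } }

// Closed set: an enum, dispatched with `match`.
enum AnyShape { Circle(Circle), Square(Square) }

fn area_enum(s: &AnyShape) -> f64 {
    match s {
        AnyShape::Circle(c) => c.area(),
        AnyShape::Square(q) => q.area(),
    }
}

// Open set, static dispatch: one compiled copy per concrete S.
fn area_generic<S: Shape>(s: &S) -> f64 { s.area() }

// Open set, dynamic dispatch: one compiled copy, mixed types in one slice.
fn total_area(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

fn main() {
    let tagged = [AnyShape::Circle(Circle(1.0)), AnyShape::Square(Square(2.0))];
    let boxed: Vec<Box<dyn Shape>> = vec![Box::new(Circle(1.0)), Box::new(Square(2.0))];
    println!("{:.2}", tagged.iter().map(area_enum).sum::<f64>());
    println!("{:.2}", area_generic(&Circle(1.0)));
    println!("{:.2}", total_area(&boxed));
}
```

Adding a third shape means editing `AnyShape` and every `match` in the enum version, while the generic and `dyn` versions only need a new `impl Shape`.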

Const Generics / Const 泛型

Since Rust 1.51, you can parameterize types and functions over constant values, not just types:

从 Rust 1.51 开始,你可以针对 常量值 而不仅仅是类型来对类型和函数进行参数化:

#![allow(unused)]
fn main() {
// Array wrapper parameterized over size / 针对大小进行参数化的数组包装器
struct Matrix<const ROWS: usize, const COLS: usize> {
    data: [[f64; COLS]; ROWS],
}

impl<const ROWS: usize, const COLS: usize> Matrix<ROWS, COLS> {
    fn new() -> Self {
        Matrix { data: [[0.0; COLS]; ROWS] }
    }

    fn transpose(&self) -> Matrix<COLS, ROWS> {
        let mut result = Matrix::<COLS, ROWS>::new();
        for r in 0..ROWS {
            for c in 0..COLS {
                result.data[c][r] = self.data[r][c];
            }
        }
        result
    }
}
}
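
Const generics also work on free functions. The sketch below uses a hypothetical `dot` helper to show the practical payoff: because the length is part of the type, a length mismatch is a compile error rather than a runtime check.

```rust
// N is part of the type, so both arrays must have the same length.
fn dot<const N: usize>(a: &[f64; N], b: &[f64; N]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn main() {
    let a = [1.0, 2.0, 3.0];
    let b = [4.0, 5.0, 6.0];
    println!("{}", dot(&a, &b));
    // dot(&a, &[1.0, 2.0]); // rejected at compile time: lengths 3 and 2 differ
}
```

The same property is what lets `transpose` above return a `Matrix<COLS, ROWS>` whose dimensions are checked by the type system.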

Const Functions (const fn) / Const 函数

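const fn marks a function as callable in const contexts, where it is evaluated at compile time. It is roughly Rust’s counterpart to C++ constexpr, and like constexpr it can still be called with runtime values as an ordinary function.

const fn 将函数标记为可在 const 上下文中调用,并在那里于编译期求值。它大致相当于 C++ 的 constexpr;与 constexpr 一样,它也可以作为普通函数在运行时被调用。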

#![allow(unused)]
fn main() {
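// Basic const fn: evaluated at compile time when used in a const context
// 基础 const fn:在 const 上下文中使用时于编译期求值
// Note: floating-point arithmetic in const fn requires Rust 1.82+
// 注意:const fn 中的浮点运算需要 Rust 1.82+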
const fn celsius_to_fahrenheit(c: f64) -> f64 {
    c * 9.0 / 5.0 + 32.0
}

const BOILING_F: f64 = celsius_to_fahrenheit(100.0); // Computed at compile time / 编译期计算
}
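
An integer-only sketch shows both sides of const fn and also builds on older toolchains. The helper `kib` and the static `SCRATCH` are hypothetical names for this example.

```rust
// Callable in const contexts and at runtime.
const fn kib(n: usize) -> usize {
    n * 1024
}

// Const context: an array length must be known at compile time.
static SCRATCH: [u8; kib(4)] = [0; kib(4)];

fn main() {
    println!("{}", SCRATCH.len());
    // Runtime call: same function, ordinary evaluation.
    let n = std::env::args().count();
    println!("{}", kib(n));
}
```

Only the uses in the `static` declaration are guaranteed to be computed by the compiler; the call in `main` is a normal function call that the optimizer may or may not fold.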

Key Takeaways — Generics / 关键要点:泛型

  • Monomorphization gives zero-cost abstractions but can cause code bloat / 单态化提供了零成本抽象,但可能导致代码膨胀
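  • Const generics (e.g. [T; N] with const N: usize) cover what C++ does with non-type template parameters / Const 泛型(例如带 const N: usize 的 [T; N])对应 C++ 的非类型模板参数
  • const fn removes the need for lazy_static! when a value can be computed at compile time / 当值可以在编译期计算时,const fn 可以取代 lazy_static!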

Exercise: Generic Cache with Eviction / 练习:带逐出机制的泛型缓存 ★★

Build a generic Cache<K, V> struct that stores key-value pairs with a configurable maximum capacity. When full, the oldest entry is evicted (FIFO).

构建一个泛型 Cache<K, V> 结构体,用于存储键值对,并具有可配置的最大容量。当缓存满时,将逐出最旧的条目(FIFO)。

🔑 Solution / 参考答案
#![allow(unused)]
fn main() {
use std::collections::{HashMap, VecDeque};
use std::hash::Hash;

struct Cache<K, V> {
    map: HashMap<K, V>,
    order: VecDeque<K>,
    capacity: usize,
}

impl<K: Eq + Hash + Clone, V> Cache<K, V> {
    fn new(capacity: usize) -> Self {
        Cache {
            map: HashMap::with_capacity(capacity),
            order: VecDeque::with_capacity(capacity),
            capacity,
        }
    }

    fn insert(&mut self, key: K, value: V) {
        if self.map.contains_key(&key) {
            self.map.insert(key, value);
            return;
        }
        if self.capacity == 0 {
            return; // Zero capacity: store nothing / 容量为 0:不存储任何内容
        }
        if self.map.len() >= self.capacity {
            if let Some(oldest) = self.order.pop_front() {
                self.map.remove(&oldest);
            }
        }
        self.order.push_back(key.clone());
        self.map.insert(key, value);
    }

    fn get(&self, key: &K) -> Option<&V> {
        self.map.get(key)
    }

    fn len(&self) -> usize {
        self.map.len()
    }
}
}

2. Traits In Depth / 2. Trait 深入解析 🟡

What you’ll learn / 你将学到:

  • Associated types vs generic parameters — and when to use each / 关联类型 vs 泛型参数 —— 以及何时使用它们
  • GATs, blanket impls, marker traits, and trait object safety rules / GAT、blanket impl、标记 trait 以及 trait 对象安全规则
  • How vtables and fat pointers work under the hood / vtable 和脂肪指针的底层工作原理
  • Extension traits, enum dispatch, and typed command patterns / 扩展 trait、枚举分发以及类型化命令模式

Associated Types vs Generic Parameters / 关联类型 vs 泛型参数

Both let a trait work with different types, but they serve different purposes:

二者都能让 trait 处理不同的类型,但它们的用途不同:

#![allow(unused)]
fn main() {
// --- ASSOCIATED TYPE: One implementation per type ---
// --- 关联类型:每个类型只有一个实现 ---
trait Iterator {
    type Item; // Each iterator produces exactly ONE kind of item / 每个迭代器只产生一种类型的项

    fn next(&mut self) -> Option<Self::Item>;
}

// A custom iterator that always yields i32 — there's no choice
// 一个总是产生 i32 的自定义迭代器 —— 没有其他选择
struct Counter { max: i32, current: i32 }

impl Iterator for Counter {
    type Item = i32; // Exactly one Item type per implementation / 每个实现只有一个 Item 类型
    fn next(&mut self) -> Option<i32> {
        if self.current < self.max {
            self.current += 1;
            Some(self.current)
        } else {
            None
        }
    }
}

// --- GENERIC PARAMETER: Multiple implementations per type ---
// --- 泛型参数:每个类型可以有多个实现 ---
trait Convert<T> {
    fn convert(&self) -> T;
}

// A single type can implement Convert for MANY target types:
// 一个类型可以为多种目标类型实现 Convert:
impl Convert<f64> for i32 {
    fn convert(&self) -> f64 { *self as f64 }
}
impl Convert<String> for i32 {
    fn convert(&self) -> String { self.to_string() }
}
}

When to use which / 何时该用哪一个

| Use / 使用 | When / 何时 |
|---|---|
| Associated type / 关联类型 | There’s exactly ONE natural output/result per implementing type (e.g., Iterator::Item). / 每个实现类型恰好有一个自然的输出/结果(例如 Iterator::Item)。 |
| Generic parameter / 泛型参数 | A type can meaningfully implement the trait for MANY different types (e.g., From<T>). / 一个类型可以有意义地为许多不同的类型实现该 trait(例如 From<T>)。 |

Intuition / 直觉解析:If it makes sense to ask “what is the Item of this iterator?”, use an associated type. If it makes sense to ask “can this convert to f64? to String? to bool?”, use a generic parameter.

直觉解析:如果问“这个迭代器的 Item 是什么?”是有意义的,请使用关联类型。如果问“这个类型能转换成 f64 吗?转换成 String 吗?转换成 bool 吗?”是有意义的,请使用泛型参数。

#![allow(unused)]
fn main() {
// Real-world example: std::ops::Add / 现实世界示例:std::ops::Add
trait Add<Rhs = Self> {
    type Output; // Associated type — addition has ONE result type / 关联类型 —— 加法只有一个结果类型
    fn add(self, rhs: Rhs) -> Self::Output;
}

// Rhs is a generic parameter — you can add different types to Meters:
// Rhs 是一个泛型参数 —— 你可以向 Meters 添加不同的类型:
struct Meters(f64);
struct Centimeters(f64);

impl Add<Meters> for Meters {
    type Output = Meters;
    fn add(self, rhs: Meters) -> Meters { Meters(self.0 + rhs.0) }
}
impl Add<Centimeters> for Meters {
    type Output = Meters;
    fn add(self, rhs: Centimeters) -> Meters { Meters(self.0 + rhs.0 / 100.0) }
}
}
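
The choice also shows up at the call site. In this sketch (the `Convert` trait is repeated from the earlier example so the block stands alone), the generic parameter forces the caller to name a target type, while the associated type is fixed by the impl and needs no annotation.

```rust
trait Convert<T> {
    fn convert(&self) -> T;
}

impl Convert<f64> for i32 {
    fn convert(&self) -> f64 { *self as f64 }
}
impl Convert<String> for i32 {
    fn convert(&self) -> String { self.to_string() }
}

fn main() {
    let n: i32 = 42;
    // Generic parameter: several impls exist, so the caller picks one.
    let as_float: f64 = n.convert();
    let as_text: String = n.convert();
    println!("{as_float} {as_text}");

    // Associated type: fixed by the impl, so there is nothing to annotate.
    let mut it = vec![1, 2, 3].into_iter();
    let first = it.next(); // Option<i32>, inferred from Iterator::Item
    println!("{first:?}");
}
```

Removing either type annotation on the `convert` calls leaves the compiler unable to choose between the two impls.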

Generic Associated Types (GATs) / 泛型关联类型 (GAT)

Since Rust 1.65, associated types can have generic parameters of their own. This enables lending iterators — iterators that return references tied to the iterator rather than to the underlying collection:

从 Rust 1.65 开始,关联类型可以拥有自己的泛型参数。这使得 借用迭代器(lending iterators) 成为可能 —— 这种迭代器返回的引用绑定到迭代器本身,而不是底层的集合:

#![allow(unused)]
fn main() {
// Without GATs — impossible to express a lending iterator:
// 没有 GAT —— 无法表达借用迭代器:
// trait LendingIterator {
//     type Item<'a>;  // ← This was rejected before 1.65 / 1.65 之前被拒绝
// }

// With GATs (Rust 1.65+):
// 使用 GAT (Rust 1.65+):
trait LendingIterator {
    type Item<'a> where Self: 'a;

    fn next(&mut self) -> Option<Self::Item<'_>>;
}

// Example: an iterator that yields overlapping windows
// 示例:一个产生重叠窗口的迭代器
struct WindowIter<'data> {
    data: &'data [u8],
    pos: usize,
    window_size: usize,
}

impl<'data> LendingIterator for WindowIter<'data> {
    type Item<'a> = &'a [u8] where Self: 'a;

    fn next(&mut self) -> Option<&[u8]> {
        if self.pos + self.window_size <= self.data.len() {
            let window = &self.data[self.pos..self.pos + self.window_size];
            self.pos += 1;
            Some(window)
        } else {
            None
        }
    }
}
}

When you need GATs / 何时需要 GAT:Lending iterators, streaming parsers, or any trait where the associated type’s lifetime depends on the &self borrow. For most code, plain associated types are sufficient.

何时需要 GAT:借用迭代器、流式解析器,或者任何关联类型的生命周期依赖于 &self 借用的 trait。对于大多数代码,普通的关联类型就足够了。
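
A case that the standard Iterator trait cannot express is overlapping mutable windows, since two live `&mut` windows would alias. The sketch below is one possible way to write it with a lending iterator; `WindowsMut` and its fields are hypothetical names, and the trait is repeated so the block stands alone.

```rust
trait LendingIterator {
    type Item<'a> where Self: 'a;
    fn next(&mut self) -> Option<Self::Item<'_>>;
}

struct WindowsMut<'d> {
    data: &'d mut [i32],
    pos: usize,
    size: usize,
}

impl<'d> LendingIterator for WindowsMut<'d> {
    // Each window borrows from the iterator itself, not from 'd.
    type Item<'a> = &'a mut [i32] where Self: 'a;

    fn next(&mut self) -> Option<Self::Item<'_>> {
        let start = self.pos;
        let end = start.checked_add(self.size)?;
        if end > self.data.len() {
            return None;
        }
        self.pos += 1;
        Some(&mut self.data[start..end])
    }
}

fn main() {
    let mut data = [1, 2, 3, 4];
    let mut it = WindowsMut { data: &mut data, pos: 0, size: 2 };
    // Running sum: each window must be released before the next is handed out.
    while let Some(w) = it.next() {
        w[1] += w[0];
    }
    println!("{data:?}");
}
```

Because each item borrows the iterator mutably, the borrow checker guarantees that only one window is alive at a time.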

Supertraits and Trait Hierarchies / Supertrait 与 Trait 层次结构

Traits can require other traits as prerequisites, forming hierarchies:

Trait 可以要求其他 trait 作为先决条件,从而形成层次结构:

graph BT
    Display["Display"]
    Debug["Debug"]
    Error["Error"]
    Clone["Clone"]
    Copy["Copy"]
    PartialEq["PartialEq"]
    Eq["Eq"]
    PartialOrd["PartialOrd"]
    Ord["Ord"]

    Error --> Display
    Error --> Debug
    Copy --> Clone
    Eq --> PartialEq
    Ord --> Eq
    Ord --> PartialOrd
    PartialOrd --> PartialEq

    style Display fill:#e8f4f8,stroke:#2980b9,color:#000
    style Debug fill:#e8f4f8,stroke:#2980b9,color:#000
    style Error fill:#fdebd0,stroke:#e67e22,color:#000
    style Clone fill:#d4efdf,stroke:#27ae60,color:#000
    style Copy fill:#d4efdf,stroke:#27ae60,color:#000
    style PartialEq fill:#fef9e7,stroke:#f1c40f,color:#000
    style Eq fill:#fef9e7,stroke:#f1c40f,color:#000
    style PartialOrd fill:#fef9e7,stroke:#f1c40f,color:#000
    style Ord fill:#fef9e7,stroke:#f1c40f,color:#000

Arrows point from subtrait to supertrait: implementing Error requires Display + Debug.

箭头从子 trait 指向 supertrait:实现 Error 需要同时实现 Display 和 Debug。

A trait can require that implementors also implement other traits:

Trait 可以要求实现者同时实现其他 trait:

#![allow(unused)]
fn main() {
use std::fmt;

// Display is a supertrait of Error / Display 是 Error 的 supertrait
trait Error: fmt::Display + fmt::Debug {
    fn source(&self) -> Option<&(dyn Error + 'static)> { None }
}
// Any type implementing Error MUST also implement Display and Debug
// 任何实现 Error 的类型也必须实现 Display 和 Debug

// Build your own hierarchies / 构建你自己的层次结构:
trait Identifiable {
    fn id(&self) -> u64;
}

trait Timestamped {
    fn created_at(&self) -> chrono::DateTime<chrono::Utc>;
}

// Entity requires both / Entity 同时需要这两者:
trait Entity: Identifiable + Timestamped {
    fn is_active(&self) -> bool;
}

// Implementing Entity forces you to implement all three:
// 实现 Entity 会强制你实现全部这三个 trait:
struct User { id: u64, name: String, created: chrono::DateTime<chrono::Utc> }

impl Identifiable for User {
    fn id(&self) -> u64 { self.id }
}
impl Timestamped for User {
    fn created_at(&self) -> chrono::DateTime<chrono::Utc> { self.created }
}
impl Entity for User {
    fn is_active(&self) -> bool { true }
}
}
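
A dependency-free sketch of what a supertrait buys you in practice: code bounded only by the subtrait can use everything the supertrait provides. `Alert`, `DiskFull`, and `headline` are hypothetical names for this illustration.

```rust
use std::fmt;

// Hypothetical trait: anything printable that also has a severity.
trait Alert: fmt::Display {
    fn severity(&self) -> u8;

    // A default method may rely on Display because it is a supertrait.
    fn render(&self) -> String {
        format!("[sev {}] {}", self.severity(), self)
    }
}

struct DiskFull { percent: u8 }

impl fmt::Display for DiskFull {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "disk {}% full", self.percent)
    }
}

impl Alert for DiskFull {
    fn severity(&self) -> u8 { if self.percent > 90 { 2 } else { 1 } }
}

// Only `Alert` appears in the bound; Display methods come along implicitly.
fn headline<A: Alert>(a: &A) -> String {
    a.to_string().to_uppercase()
}

fn main() {
    let alert = DiskFull { percent: 95 };
    println!("{}", alert.render());
    println!("{}", headline(&alert));
}
```

Without the `: fmt::Display` supertrait, both `render` and `headline` would need an explicit `Display` bound of their own.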

Blanket Implementations / Blanket 实现

Implement a trait for ALL types that satisfy some bound:

为满足某些约束的所有类型实现一个 trait:

#![allow(unused)]
fn main() {
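// std does this (shown for illustration; redeclaring it in your own crate will not compile):
// any type that implements Display automatically gets ToString
// 标准库的工作方式(仅作示意,在自己的 crate 中重复声明无法通过编译):
// 任何实现了 Display 的类型都会自动获得 ToString
impl<T: std::fmt::Display + ?Sized> ToString for T {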
    fn to_string(&self) -> String {
        format!("{self}")
    }
}
// Now i32, &str, your custom types — anything with Display — gets to_string() for free.
// 现在 i32、&str 以及你的自定义类型 —— 只要有 Display,就能免费获得 to_string()。

// Your own blanket impl / 你自己的 blanket 实现:
trait Loggable {
    fn log(&self);
}

// Every Debug type is automatically Loggable / 每个 Debug 类型都会自动成为 Loggable:
impl<T: std::fmt::Debug> Loggable for T {
    fn log(&self) {
        eprintln!("[LOG] {self:?}");
    }
}

// Now ANY Debug type has .log() / 现在任何 Debug 类型都有了 .log() 方法:
// 42.log();              // [LOG] 42
// "hello".log();         // [LOG] "hello"
// vec![1, 2, 3].log();   // [LOG] [1, 2, 3]
}

Caution / 注意:Blanket impls are powerful but hard to undo: once a blanket impl covers a type, you cannot add a more specific impl for it, because the coherence rules forbid overlapping impls and specialization is not available on stable Rust. Design them carefully.

注意:Blanket 实现非常强大,但很难撤回:一旦某个类型被 blanket 实现覆盖,你就不能再为它添加更具体的实现,因为一致性规则禁止重叠的实现,而特化(specialization)在稳定版 Rust 中尚不可用。请谨慎设计。
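
The restriction is easy to reproduce. In this sketch (`Describe` and `Point` are hypothetical names), the blanket impl works for every Debug type, and the commented-out tailored impl is the one the compiler would reject as conflicting.

```rust
use std::fmt::Debug;

trait Describe {
    fn describe(&self) -> String;
}

// Blanket impl: covers every Debug type, present and future.
impl<T: Debug> Describe for T {
    fn describe(&self) -> String {
        format!("<{self:?}>")
    }
}

#[derive(Debug)]
struct Point { x: i32, y: i32 }

// A tailored impl overlaps with the blanket impl above and is rejected
// as a conflicting implementation (error E0119):
// impl Describe for Point {
//     fn describe(&self) -> String { format!("({}, {})", self.x, self.y) }
// }

fn main() {
    let p = Point { x: 1, y: 2 };
    println!("{}", 42_i32.describe());
    println!("{} (x + y = {})", p.describe(), p.x + p.y);
}
```

If per-type customization is likely, a narrower bound or a dedicated trait with explicit impls keeps that door open.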

Marker Traits / 标记 Trait

Traits with no methods — they mark a type as having some property:

不包含任何方法的 trait —— 它们将某个类型标记为具有特定属性:

#![allow(unused)]
fn main() {
// Standard library marker traits / 标准库中的标记 trait:
// Send    — safe to transfer between threads / 可以安全地在线程间转移
// Sync    — safe to share (&T) between threads / 可以安全地在线程间共享 (&T)
// Unpin   — safe to move after pinning / pin 后仍然可以安全移动
// Sized   — has a known size at compile time / 编译时具有已知大小
// Copy    — can be duplicated with memcpy / 可以通过 memcpy 复制

// Your own marker trait / 你自己的标记 trait:
/// Marker: this sensor has been factory-calibrated / 标记:该传感器已通过工厂校准
trait Calibrated {}

struct RawSensor { reading: f64 }
struct CalibratedSensor { reading: f64 }

impl Calibrated for CalibratedSensor {}

// Only calibrated sensors can be used in production:
// 只有经过校准的传感器才能在生产环境中使用:
fn record_measurement<S: Calibrated>(sensor: &S) {
    // ...
}
// record_measurement(&RawSensor { reading: 0.0 }); // ❌ Compile error / 编译错误
// record_measurement(&CalibratedSensor { reading: 0.0 }); // ✅
}

This connects directly to the type-state pattern in Chapter 3.

这与第 3 章中的 状态类型模式 (type-state pattern) 直接相关。

Trait Object Safety Rules / Trait 对象安全规则

Not every trait can be used as dyn Trait. A trait is object-safe only if:

并非所有的 trait 都可以作为 dyn Trait 使用。一个 trait 只有在满足以下条件时才是 对象安全(object-safe) 的:

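  1. No Self: Sized bound on the trait itself / trait 本身没有 Self: Sized 约束
  2. No generic type parameters on methods (lifetime parameters are fine) / 方法上没有泛型类型参数(生命周期参数可以)
  3. No use of Self in parameter or return types other than the receiver; wrapping it as Box<Self> does not help, so return Box<dyn Trait> instead / 除接收者之外,参数和返回类型中不得使用 Self;包成 Box<Self> 也不行,应改为返回 Box<dyn Trait>
  4. No associated functions without a receiver and no associated constants (methods take &self, &mut self, self, or a pointer receiver such as Box<Self>) / 没有不带接收者的关联函数,也没有关联常量(方法的接收者为 &self、&mut self、self,或 Box<Self> 这类指针接收者)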
#![allow(unused)]
fn main() {
// ✅ Object-safe — can be used as dyn Drawable
// ✅ 对象安全 —— 可以用作 dyn Drawable
trait Drawable {
    fn draw(&self);
    fn bounding_box(&self) -> (f64, f64, f64, f64);
}

let shapes: Vec<Box<dyn Drawable>> = vec![/* ... */]; // ✅ Works / 行得通

// ❌ NOT object-safe — uses Self in return position
// ❌ 不对象安全 —— 在返回位置使用了 Self
trait Cloneable {
    fn clone_self(&self) -> Self;
    //                       ^^^^ The caller cannot know the concrete type or its size at compile time / 调用方在编译期无法知道具体类型及其大小
}
// let items: Vec<Box<dyn Cloneable>> = ...; // ❌ Compile error / 编译错误

// ❌ NOT object-safe — generic method
// ❌ 不对象安全 —— 泛型方法
trait Converter {
    fn convert<T>(&self) -> T;
    //        ^^^ The vtable can't contain infinite monomorphizations
}

// ❌ NOT object-safe — associated function (no self)
trait Factory {
    fn create() -> Self;
    // No &self — how would you call this through a trait object?
}
}

Workarounds:

#![allow(unused)]
fn main() {
// Add `where Self: Sized` to exclude a method from the vtable:
trait MyTrait {
    fn regular_method(&self); // Included in vtable

    fn generic_method<T>(&self) -> T
    where
        Self: Sized; // Excluded from vtable — can't be called via dyn MyTrait
}

// Now dyn MyTrait is valid, but generic_method can only be called
// when the concrete type is known.
// 现在 dyn MyTrait 是有效的,但 generic_method 只能在具体类型已知时调用。
}

Rule of thumb / 经验法则:If you plan to use dyn Trait, keep methods simple: no generic type parameters, no Self outside the receiver, and no Self: Sized bound on the trait itself. When in doubt, write let _: Box<dyn YourTrait>; and let the compiler tell you.

经验法则:如果你计划使用 dyn Trait,请保持方法简单:不要使用泛型类型参数,不要在接收者之外使用 Self,也不要在 trait 本身上添加 Self: Sized 约束。如果不确定,试着写一行 let _: Box<dyn YourTrait>;,让编译器告诉你答案。
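
Two common workarounds can be sketched side by side: returning a boxed trait object where you would have returned Self, and opting a generic method out of the vtable with `where Self: Sized`. The `Shape` trait, `clone_box`, and `scaled_by` are hypothetical names for this example.

```rust
trait Shape {
    fn area(&self) -> f64;

    // Returns a trait object instead of `Self`, so it stays in the vtable.
    fn clone_box(&self) -> Box<dyn Shape>;

    // Generic method: excluded from the vtable by `Self: Sized`.
    fn scaled_by<T: Into<f64>>(&self, factor: T) -> f64
    where
        Self: Sized,
    {
        self.area() * factor.into()
    }
}

#[derive(Clone)]
struct Square(f64);

impl Shape for Square {
    fn area(&self) -> f64 { self.0 * self.0 }
    fn clone_box(&self) -> Box<dyn Shape> { Box::new(self.clone()) }
}

fn main() {
    let boxed: Box<dyn Shape> = Box::new(Square(2.0));
    let copy = boxed.clone_box(); // works through dyn
    println!("{}", copy.area());
    println!("{}", Square(2.0).scaled_by(3_i32)); // concrete type only
    // boxed.scaled_by(3_i32); // rejected: the method requires `Self: Sized`
}
```

The trade-off is that `scaled_by` is simply unavailable on `dyn Shape`, so anything callers need through a trait object has to be expressed without generics.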

Trait Objects Under the Hood — vtables and Fat Pointers / Trait 对象底层原理 —— vtable 与脂肪指针

A &dyn Trait (or Box<dyn Trait>) is a fat pointer — two machine words:

&dyn Trait(或 Box<dyn Trait>)是一个 脂肪指针(fat pointer) —— 包含两个机器字(machine words):

┌──────────────────────────────────────────────────┐
│  &dyn Drawable (on 64-bit: 16 bytes total)       │
│  &dyn Drawable (在 64 位系统上:总共 16 字节)      │
├──────────────┬───────────────────────────────────┤
│  data_ptr    │  vtable_ptr                       │
│  (8 bytes)   │  (8 bytes)                        │
│  ↓           │  ↓                                │
│  ┌─────────┐ │  ┌──────────────────────────────┐ │
│  │ Circle  │ │  │ vtable for <Circle as        │ │
│  │ {       │ │  │           Drawable>          │ │
│  │  r: 5.0 │ │  │ <Circle as Drawable> 的 vtable │ │
│  │ }       │ │  │                              │ │
│  │         │ │  │  drop_in_place: 0x7f...a0    │ │
│  └─────────┘ │  │  size:           8           │ │
│              │  │  align:          8           │ │
│              │  │  draw:          0x7f...b4    │ │
│              │  │  bounding_box:  0x7f...c8    │ │
│              │  └──────────────────────────────┘ │
└──────────────┴───────────────────────────────────┘

How a vtable call works / vtable 调用是如何工作的 (e.g., shape.draw()):

  1. Load vtable_ptr from the fat pointer (second word) / 从脂肪指针中加载 vtable_ptr(第二个字)
  2. Index into the vtable to find the draw function pointer / 在 vtable 中索引以找到 draw 函数指针
  3. Call it, passing data_ptr as the self argument / 调用它,并将 data_ptr 作为 self 参数传递

This is similar to C++ virtual dispatch in cost (one pointer indirection per call), but Rust stores the vtable pointer in the fat pointer rather than inside the object — so a plain Circle on the stack carries no vtable pointer at all.

这在开销上与 C++ 的虚函数分发类似(每次调用一次指针间接跳转),但 Rust 将 vtable 指针存储在脂肪指针中,而不是对象内部 —— 因此栈上普通的 Circle 根本不携带 vtable 指针。

use std::mem::size_of; // In the prelude since Rust 1.80; imported explicitly for older toolchains

trait Drawable {
    fn draw(&self);
    fn area(&self) -> f64;
}

struct Circle { radius: f64 }

impl Drawable for Circle {
    fn draw(&self) { println!("Drawing circle r={}", self.radius); }
    fn area(&self) -> f64 { std::f64::consts::PI * self.radius * self.radius }
}

struct Square { side: f64 }

impl Drawable for Square {
    fn draw(&self) { println!("Drawing square s={}", self.side); }
    fn area(&self) -> f64 { self.side * self.side }
}

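// `size_of` is in the prelude since Rust 1.80; the explicit import keeps older toolchains working.
use std::mem::size_of;

fn main() {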
    let shapes: Vec<Box<dyn Drawable>> = vec![
        Box::new(Circle { radius: 5.0 }),
        Box::new(Square { side: 3.0 }),
    ];

    // Each element is a fat pointer: (data_ptr, vtable_ptr)
    // The vtable for Circle and Square are DIFFERENT
    // 每个元素都是一个脂肪指针:(data_ptr, vtable_ptr)
    // Circle 和 Square 的 vtable 是不同的
    for shape in &shapes {
        shape.draw();  // vtable dispatch → Circle::draw or Square::draw
        println!("  area = {:.2}", shape.area());
    }

    // Size comparison / 大小比较:
    println!("size_of::<&Circle>()        = {}", size_of::<&Circle>());
    // → 8 bytes (one pointer — the compiler knows the type) / 8 字节(一个指针 —— 编译器知道具体类型)
    println!("size_of::<&dyn Drawable>()  = {}", size_of::<&dyn Drawable>());
    // → 16 bytes (data_ptr + vtable_ptr) / 16 字节 (data_ptr + vtable_ptr)
}
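The `size` and `align` slots in the vtable diagram can be observed from safe code. The sketch below is illustrative only: `Pixel`, `describe`, and `fat_pointer_words` are hypothetical names, and `draw` returns a `String` here so the result can be checked. `size_of_val` and `align_of_val` accept a `&dyn Drawable` and read the answer from the vtable at runtime, since the concrete type is not known statically.

```rust
use std::mem::{align_of_val, size_of, size_of_val};

trait Drawable {
    fn draw(&self) -> String;
}

struct Circle { radius: f64 }
struct Pixel; // zero-sized type: its vtable records size 0, align 1

impl Drawable for Circle {
    fn draw(&self) -> String { format!("circle r={}", self.radius) }
}
impl Drawable for Pixel {
    fn draw(&self) -> String { String::from("pixel") }
}

/// Returns (size, align, drawing) using only the trait object.
/// All three values are looked up through the vtable pointer.
fn describe(shape: &dyn Drawable) -> (usize, usize, String) {
    (size_of_val(shape), align_of_val(shape), shape.draw())
}

/// Width of `&dyn Drawable` in machine words (data_ptr + vtable_ptr).
fn fat_pointer_words() -> usize {
    size_of::<&dyn Drawable>() / size_of::<usize>()
}
```

Calling `describe(&Circle { radius: 5.0 })` and `describe(&Pixel)` goes through the same function body but two different vtables, which is why the reported sizes differ.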

Performance cost model / 性能代价模型

| Aspect / 方面 | Static dispatch / 静态分发 (`impl Trait` / generics) | Dynamic dispatch / 动态分发 (`dyn Trait`) |
|---|---|---|
| Call overhead / 调用开销 | Zero: inlined by LLVM / 零:由 LLVM 内联 | One pointer indirection per call / 每次调用一次指针间接跳转 |
| Inlining / 内联 | ✅ Compiler can inline / 编译器可以内联 | ❌ Opaque function pointer / 不透明的函数指针 |
| Binary size / 二进制大小 | Larger (one copy per type) / 较大(每个类型一份副本) | Smaller (one shared function) / 较小(一个共享函数) |
| Pointer size / 指针大小 | Thin (1 word) / 细指针(1 个字) | Fat (2 words) / 脂肪指针(2 个字) |
| Heterogeneous collections / 异构集合 | ❌ | ✅ `Vec<Box<dyn Trait>>` |

When vtable cost matters / 何时需要考虑 vtable 开销:In tight loops calling a trait method millions of times, the indirection and inability to inline can be significant (2-10× slower). For cold paths, configuration, or plugin architectures, the flexibility of dyn Trait is worth the small cost.

何时需要考虑 vtable 开销:在数百万次调用 trait 方法的紧凑循环中,间接跳转和无法内联的影响可能非常显著(慢 2-10 倍)。对于冷代码路径、配置或插件架构,dyn Trait 的灵活性值得这点微小的开销。
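To make the two columns of the cost model concrete, here is a minimal sketch of the same trait consumed both ways. The names `Shape`, `Rect`, `total_area_static`, and `total_area_dyn` are hypothetical. The generic version is compiled once per element type and can be inlined; the `dyn` version is compiled once and dispatches through the vtable, but it accepts mixed element types.

```rust
trait Shape {
    fn area(&self) -> f64;
}

struct Square { side: f64 }
struct Rect { w: f64, h: f64 }

impl Shape for Square {
    fn area(&self) -> f64 { self.side * self.side }
}
impl Shape for Rect {
    fn area(&self) -> f64 { self.w * self.h }
}

/// Static dispatch: one monomorphized copy per `S`; every element has the same type.
fn total_area_static<S: Shape>(shapes: &[S]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

/// Dynamic dispatch: a single compiled copy; elements may be different types.
fn total_area_dyn(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}
```

Whether the difference is measurable depends on the loop body and the target; benchmarking both forms on the real workload is the only reliable guide.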

Higher-Ranked Trait Bounds (HRTBs) / 高阶 Trait 约束 (HRTB)

Sometimes you need a function that works with references of any lifetime, not a specific one. This is where for<'a> syntax appears:

有时你需要一个能处理 任何 生命周期的引用的函数,而不仅仅是某个特定生命周期。这就是 for<'a> 语法的用武之地:

// Problem: this function needs a closure that can process
// references with ANY lifetime, not just one specific lifetime.
// 问题:该函数需要一个能够处理任何生命周期引用的闭包。

// ❌ This is too restrictive — 'a is fixed by the caller:
// fn apply<'a, F: Fn(&'a str) -> &'a str>(f: F, data: &'a str) -> &'a str

// ✅ HRTB: F must work for ALL possible lifetimes:
fn apply<F>(f: F, data: &str) -> &str
where
    F: for<'a> Fn(&'a str) -> &'a str,
{
    f(data)
}

fn main() {
    let result = apply(|s| s.trim(), "  hello  ");
    println!("{result}"); // "hello"
}
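A case where the higher-ranked bound is genuinely required is a function that calls the closure on data it creates itself. In this sketch (`trimmed_len` is a hypothetical name), the borrowed string lives only inside the function body, so no lifetime parameter chosen by the caller could describe it. `for<'a>` is what lets the closure accept that short-lived borrow.

```rust
/// Calls `f` on a string owned by this function and returns the length of the result.
fn trimmed_len<F>(f: F) -> usize
where
    // `f` must accept a borrow of ANY lifetime, including one that
    // starts and ends inside this function body.
    F: for<'a> Fn(&'a str) -> &'a str,
{
    let owned = String::from("  local  ");
    f(owned.as_str()).len()
}
```

Both a closure such as `|s| s.trim()` and a plain function path such as `str::trim_start` satisfy the bound, because each works for every input lifetime.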

When you encounter HRTBs / 何时会遇到 HRTB

  • Fn(&T) -> &U traits — the compiler infers for<'a> automatically in most cases / Fn(&T) -> &U trait —— 编译器在大多数情况下会自动推断 for<'a>
  • Custom trait implementations that must work across different borrows / 必须跨不同借用工作的自定义 trait 实现。
  • Deserialization with serde: for<'de> Deserialize<'de> / 使用 serde 进行反序列化:for<'de> Deserialize<'de>
// serde's DeserializeOwned is defined as:
// trait DeserializeOwned: for<'de> Deserialize<'de> {}
// Meaning: "can be deserialized from data with ANY lifetime"
// (i.e., the result doesn't borrow from the input)
// 含义:“可以从具有任何生命周期的数据中反序列化”
// (即:结果不从输入中借用)

use serde::de::DeserializeOwned;

fn parse_json<T: DeserializeOwned>(input: &str) -> T {
    serde_json::from_str(input).unwrap()
}

Practical advice / 建议:You’ll rarely write for<'a> yourself. It mostly appears in trait bounds on closure parameters, where the compiler handles it implicitly. But recognizing it in error messages (“expected a for<'a> Fn(&'a ...) bound”) helps you understand what the compiler is asking for.

你很少需要亲手编写 for<'a>。它大多出现在闭包参数的 trait 约束中,编译器会隐式处理。但在错误消息中识别出它(“expected a for<'a> Fn(&'a ...) bound”)有助于你理解编译器的要求。

### `impl Trait` — Argument Position vs Return Position / `impl Trait` —— 参数位置 vs 返回位置

`impl Trait` appears in two positions with **different semantics**:

`impl Trait` 出现在两个位置,具有 **不同的语义**:

```rust
// --- Argument-Position impl Trait (APIT) ---
// --- 参数位置的 impl Trait (APIT) ---
// "Caller chooses the type" — syntactic sugar for a generic parameter
// “调用者选择类型” —— 泛型参数的语法糖
fn print_all(items: impl Iterator<Item = i32>) {
    for item in items { println!("{item}"); }
}
// Equivalent to / 等同于:
fn print_all_verbose<I: Iterator<Item = i32>>(items: I) {
    for item in items { println!("{item}"); }
}
// Caller decides / 调用者决定:
// print_all(vec![1,2,3].into_iter())
// print_all(0..10)

// --- Return-Position impl Trait (RPIT) ---
// --- 返回位置的 impl Trait (RPIT) ---
// "Callee chooses the type" — the function picks one concrete type
// “被调用者(函数)选择类型” —— 函数选择一个具体的类型
fn evens(limit: i32) -> impl Iterator<Item = i32> {
    (0..limit).filter(|x| x % 2 == 0)
    // The concrete type is Filter<Range<i32>, Closure>
    // but the caller only sees "some Iterator<Item = i32>"
    // 具体类型是 Filter<Range<i32>, Closure>
    // 但调用者只看到“某种 Iterator<Item = i32>”
}
```

Key difference / 关键点比较

| | APIT (`fn foo(x: impl T)`) | RPIT (`fn foo() -> impl T`) |
|---|---|---|
| Who picks the type? / 谁来选择类型? | Caller (调用者) | Callee (函数体) |
| Monomorphized? / 是否单态化? | Yes, one copy per type / 是,每个类型一份副本 | Yes, one concrete type / 是,对应一个具体类型 |
| Turbofish? / Turbo 鱼语法? | No (`foo::<X>()` not allowed) / 否(不允许 `foo::<X>()`) | N/A |
| Equivalent to / 类似于 | `fn foo<X: T>(x: X)` | Existential type / 存在类型 |
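One practical consequence of "callee chooses one concrete type" is that a return-position `impl Trait` function cannot return different iterator types from different branches. The sketch below uses hypothetical names (`sum_all`, `evens`, `evens_or_all`) and falls back to `Box<dyn Iterator>` for the branching case.

```rust
/// APIT: the caller decides which iterator type to pass in.
fn sum_all(items: impl Iterator<Item = u32>) -> u32 {
    items.sum()
}

/// RPIT: the function body fixes exactly one concrete (unnameable) type.
fn evens(limit: u32) -> impl Iterator<Item = u32> {
    (0..limit).filter(|x| x % 2 == 0)
}

/// The two branches produce different types (`Filter<..>` vs `Range<u32>`),
/// so `-> impl Iterator` would not compile here. Boxing erases the difference
/// at the price of one allocation and dynamic dispatch.
fn evens_or_all(limit: u32, only_evens: bool) -> Box<dyn Iterator<Item = u32>> {
    if only_evens {
        Box::new((0..limit).filter(|x| x % 2 == 0))
    } else {
        Box::new(0..limit)
    }
}
```

Because `Box<dyn Iterator<Item = u32>>` itself implements `Iterator`, the boxed result can still be handed to `sum_all`.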

RPIT in Trait Definitions (RPITIT) / Trait 定义中的 RPIT (RPITIT)

Since Rust 1.75, you can use -> impl Trait directly in trait definitions:

从 Rust 1.75 开始,你可以直接在 trait 定义中使用 -> impl Trait

#![allow(unused)]
fn main() {
trait Container {
    fn items(&self) -> impl Iterator<Item = &str>;
    //                 ^^^^ Each implementor returns its own concrete type
    //                 ^^^^ 每个实现者返回其自己的具体类型
}

struct CsvRow {
    fields: Vec<String>,
}

impl Container for CsvRow {
    fn items(&self) -> impl Iterator<Item = &str> {
        self.fields.iter().map(String::as_str)
    }
}

struct FixedFields;

impl Container for FixedFields {
    fn items(&self) -> impl Iterator<Item = &str> {
        ["host", "port", "timeout"].into_iter()
    }
}
}

Before Rust 1.75, a trait method like this had to return Box<dyn Iterator> (a heap allocation plus dynamic dispatch) or expose an associated type that every implementor must name. RPITIT avoids both.

在 Rust 1.75 之前,这样的 trait 方法必须返回 Box<dyn Iterator>(一次堆分配加动态分发),或者暴露一个每个实现者都必须命名的关联类型。RPITIT 同时避免了这两点。
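A trait that uses `-> impl Trait` is consumed through generics, so each call site still resolves to the implementor's concrete iterator. The sketch below assumes Rust 1.75 or newer; `Words`, `Lines`, and `longest` are hypothetical names used only for illustration.

```rust
trait Container {
    fn items(&self) -> impl Iterator<Item = &str>;
}

struct Words(String);
struct Lines(Vec<String>);

impl Container for Words {
    // Concrete type: std::str::SplitWhitespace
    fn items(&self) -> impl Iterator<Item = &str> {
        self.0.split_whitespace()
    }
}

impl Container for Lines {
    // Concrete type: a Map over a slice iterator
    fn items(&self) -> impl Iterator<Item = &str> {
        self.0.iter().map(String::as_str)
    }
}

/// Generic consumer: monomorphized per container, no boxing involved.
fn longest<C: Container>(c: &C) -> Option<&str> {
    c.items().max_by_key(|s| s.len())
}
```

One trade-off worth knowing: a trait with such a method cannot be used as `dyn Container`, so code that needs trait objects still has to return a boxed iterator.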

impl Trait vs dyn Trait — Decision Guide / impl Trait vs dyn Trait —— 决策指南

Do you know the concrete type at compile time? / 编译时是否知道具体类型?
├── YES (是) → Use impl Trait or generics (zero cost, inlinable) / 使用 impl Trait 或泛型(零开销,可内联)
└── NO (否)  → Do you need a heterogeneous collection? / 是否需要异构集合?
     ├── YES (是) → Use dyn Trait (Box<dyn T>, &dyn T)
     └── NO (否)  → Do you need the SAME trait object across an API boundary? / 是否需要在 API 边界使用相同的 trait 对象?
          ├── YES (是) → Use dyn Trait
          └── NO (否)  → Use generics / impl Trait

| Feature / 特性 | `impl Trait` | `dyn Trait` |
|---|---|---|
| Dispatch / 分发 | Static (monomorphized) / 静态 (单态化) | Dynamic (vtable) / 动态 (vtable) |
| Performance / 性能 | Best, inlinable / 极佳,可内联 | One indirection per call / 每次调用一次间接跳转 |
| Heterogeneous collections / 异构集合 | ❌ | ✅ |
| Binary size per type / 每个类型的二进制大小 | One copy each / 每个类型一份副本 | Shared code / 代码共享 |
| Trait must be object-safe? / Trait 必须对象安全? | No / 否 | Yes / 是 |
| Works in trait definitions / 在 trait 定义中可用 | ✅ (Rust 1.75+) | Always / 始终可用 |

Type Erasure with Any and TypeId / 使用 Any 和 TypeId 进行类型擦除

Sometimes you need to store values of unknown types and downcast them later — a pattern familiar from void* in C or object in C#. Rust provides this through std::any::Any:

有时你需要存储 未知 类型的值并在稍后进行向下转型 (downcast) —— 这种模式在 C 语言的 void* 或 C# 的 object 中很常见。Rust 通过 std::any::Any 提供此功能:

use std::any::Any;

// Store heterogeneous values / 存储异构值:
fn log_value(value: &dyn Any) {
    if let Some(s) = value.downcast_ref::<String>() {
        println!("String: {s}");
    } else if let Some(n) = value.downcast_ref::<i32>() {
        println!("i32: {n}");
    } else {
        // TypeId lets you inspect the type at runtime / TypeId 允许在运行时检查类型:
        println!("Unknown type: {:?}", value.type_id());
    }
}

// Useful for plugin systems, event buses, or ECS-style architectures:
// 对插件系统、事件总线或 ECS 风格的架构非常有用:
struct AnyMap(std::collections::HashMap<std::any::TypeId, Box<dyn Any + Send>>);

impl AnyMap {
    fn new() -> Self { AnyMap(std::collections::HashMap::new()) }

    fn insert<T: Any + Send + 'static>(&mut self, value: T) {
        self.0.insert(std::any::TypeId::of::<T>(), Box::new(value));
    }

    fn get<T: Any + Send + 'static>(&self) -> Option<&T> {
        self.0.get(&std::any::TypeId::of::<T>())?
            .downcast_ref()
    }
}

fn main() {
    let mut map = AnyMap::new();
    map.insert(42_i32);
    map.insert(String::from("hello"));

    assert_eq!(map.get::<i32>(), Some(&42));
    assert_eq!(map.get::<String>().map(|s| s.as_str()), Some("hello"));
    assert_eq!(map.get::<f64>(), None); // Never inserted
}

When to use Any / 何时使用 Any:Plugin/extension systems, type-indexed maps (typemap), error downcasting (anyhow::Error::downcast_ref). Prefer generics or trait objects when the set of types is known at compile time — Any is a last resort that trades compile-time safety for flexibility.

何时使用 Any:插件/扩展系统、类型索引映射 (typemap)、错误向下转型 (anyhow::Error::downcast_ref)。如果在编译时已知类型集,请优先使用泛型或 trait 对象 —— Any 是最后的手段,它牺牲了编译时安全性以换取灵活性。
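Error downcasting does not require `anyhow`; the standard library offers the same mechanism on `dyn Error + 'static`. The following sketch uses two hypothetical functions, `parse_port` and `classify`: the first erases two different error types into `Box<dyn Error>`, the second recovers the concrete type with `downcast_ref`.

```rust
use std::error::Error;
use std::num::{ParseIntError, TryFromIntError};

/// Parses a TCP port. Two different error types are erased into `Box<dyn Error>`.
fn parse_port(s: &str) -> Result<u16, Box<dyn Error>> {
    let n: u32 = s.trim().parse()?; // may fail with ParseIntError
    let port = u16::try_from(n)?;   // may fail with TryFromIntError
    Ok(port)
}

/// Recovers the concrete error type at runtime.
fn classify(err: &(dyn Error + 'static)) -> &'static str {
    if err.downcast_ref::<ParseIntError>().is_some() {
        "not a number"
    } else if err.downcast_ref::<TryFromIntError>().is_some() {
        "out of range"
    } else {
        "other"
    }
}
```

A caller holding a `Box<dyn Error>` named `e` can pass `&*e` to `classify`. As with `Any`, an unexpected type simply falls through to the last branch, so the compiler cannot check that every case is handled.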


Extension Traits — Adding Methods to Types You Don’t Own / 扩展 Trait —— 为你不拥有的类型添加方法

Rust’s orphan rule prevents you from implementing a foreign trait on a foreign type. Extension traits are the standard workaround: define a new trait in your crate whose methods have a blanket implementation for any type that meets a bound. The caller imports the trait and the new methods appear on existing types.

Rust 的孤儿规则(orphan rule)阻止你在不属于你的类型上实现不属于你的 trait。扩展 trait 是标准的解决方法:在你的 crate 中定义一个 新 trait,其方法为满足特定约束的任何类型提供 blanket 实现。调用者只需导入该 trait,新方法就会出现在现有类型上。

This pattern is pervasive in the Rust ecosystem: itertools::Itertools, futures::StreamExt, tokio::io::AsyncReadExt, tower::ServiceExt.

这种模式在 Rust 生态系统中非常普遍:itertools::Itertools、futures::StreamExt、tokio::io::AsyncReadExt、tower::ServiceExt。

The Problem / 问题所在

// We want to add a .mean() method to all iterators that yield f64.
// Iterator is defined in std, so we cannot edit it, and inherent impls are only
// allowed on types defined in your own crate. `I` below is a bare type parameter:
// 我们想为所有产生 f64 的迭代器添加一个 .mean() 方法。
// 但 Iterator 定义在标准库中,而固有实现只能写在你自己 crate 定义的类型上。下面的 `I` 只是一个类型参数:
//
// impl<I: Iterator<Item = f64>> I {   // ❌ Rejected: no local type to attach inherent methods to
//                                     // ❌ 被拒绝:没有可以挂载固有方法的本地类型
//     fn mean(self) -> f64 { ... }
// }

The Solution: An Extension Trait / 解决方案:扩展 Trait

#![allow(unused)]
fn main() {
/// Extension methods for iterators over numeric values / 数值类型迭代器的扩展方法。
pub trait IteratorExt: Iterator {
    /// Computes the arithmetic mean. Returns `None` for empty iterators.
    /// 计算算术平均值。如果是空迭代器则返回 `None`。
    fn mean(self) -> Option<f64>
    where
        Self: Sized,
        Self::Item: Into<f64>;
}

// Blanket implementation — automatically applies to ALL iterators
// Blanket 实现 —— 自动应用于所有迭代器
impl<I: Iterator> IteratorExt for I {
    fn mean(self) -> Option<f64>
    where
        Self: Sized,
        Self::Item: Into<f64>,
    {
        let mut sum: f64 = 0.0;
        let mut count: u64 = 0;
        for item in self {
            sum += item.into();
            count += 1;
        }
        if count == 0 { None } else { Some(sum / count as f64) }
    }
}

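// Usage: in any OTHER module, import the trait and the method appears on all iterators.
// (In the module that defines the trait it is already in scope; re-importing it there is an error.)
// 使用:在其他模块中导入该 trait,该方法就会出现在所有迭代器上(定义该 trait 的模块中它已在作用域内)。
// use crate::IteratorExt;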

fn analyze_temperatures(readings: &[f64]) -> Option<f64> {
    readings.iter().copied().mean()  // .mean() is now available! / .mean() 现在可用了!
}

fn analyze_sensor_data(data: &[i32]) -> Option<f64> {
    data.iter().copied().mean()  // Works on i32 too (i32: Into<f64>) / 对 i32 同样有效
}
}

Real-World Example: Diagnostic Result Extensions / 现实世界示例:诊断结果扩展

#![allow(unused)]
fn main() {
use std::collections::HashMap;

struct DiagResult {
    component: String,
    passed: bool,
    message: String,
}

/// Extension trait for Vec<DiagResult> — adds domain-specific analysis methods.
/// Vec<DiagResult> 的扩展 trait —— 添加特定领域分析方法。
pub trait DiagResultsExt {
    fn passed_count(&self) -> usize;
    fn failed_count(&self) -> usize;
    fn overall_pass(&self) -> bool;
    fn failures_by_component(&self) -> HashMap<String, Vec<&DiagResult>>;
}

impl DiagResultsExt for Vec<DiagResult> {
    fn passed_count(&self) -> usize {
        self.iter().filter(|r| r.passed).count()
    }

    fn failed_count(&self) -> usize {
        self.iter().filter(|r| !r.passed).count()
    }

    fn overall_pass(&self) -> bool {
        self.iter().all(|r| r.passed)
    }

    fn failures_by_component(&self) -> HashMap<String, Vec<&DiagResult>> {
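        // Annotate the map: `.or_default().push(..)` needs the value type known up front.
        let mut map: HashMap<String, Vec<&DiagResult>> = HashMap::new();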
        for r in self.iter().filter(|r| !r.passed) {
            map.entry(r.component.clone()).or_default().push(r);
        }
        map
    }
}

// Now any Vec<DiagResult> has these methods / 现在任何 Vec<DiagResult> 都有了这些方法:
fn report(results: Vec<DiagResult>) {
    if !results.overall_pass() {
        let failures = results.failures_by_component();
        for (component, fails) in &failures {
            eprintln!("{component}: {} failures", fails.len());
        }
    }
}
}
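A common refinement is to implement the extension trait for the slice type `[DiagResult]` instead of `Vec<DiagResult>`. Method lookup auto-dereferences, so the methods then work on vectors, arrays, and sub-slices alike. This is a sketch under assumptions: `DiagSliceExt`, `failed_components`, and the `result` helper are hypothetical, and `DiagResult` is reduced to two fields.

```rust
struct DiagResult {
    component: String,
    passed: bool,
}

/// Extension trait implemented on the slice type rather than on `Vec`.
trait DiagSliceExt {
    fn overall_pass(&self) -> bool;
    fn failed_components(&self) -> Vec<&str>;
}

impl DiagSliceExt for [DiagResult] {
    fn overall_pass(&self) -> bool {
        self.iter().all(|r| r.passed)
    }

    fn failed_components(&self) -> Vec<&str> {
        self.iter()
            .filter(|r| !r.passed)
            .map(|r| r.component.as_str())
            .collect()
    }
}

/// Small constructor used only to keep the examples short.
fn result(component: &str, passed: bool) -> DiagResult {
    DiagResult { component: component.to_string(), passed }
}
```

With this in scope, `results.overall_pass()` compiles whether `results` is a `Vec<DiagResult>`, a `[DiagResult; N]`, or a `&[DiagResult]`.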

Naming Convention / 命名规范

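The Rust ecosystem commonly uses an Ext suffix, with a few well-known exceptions such as Itertools / Rust 生态系统通常使用 Ext 后缀(Itertools 等少数例外):

| Crate / 包 | Extension Trait / 扩展 Trait | Extends / 扩展了 |
|---|---|---|
| itertools | `Itertools` | `Iterator` |
| futures | `StreamExt`, `FutureExt` | `Stream`, `Future` |
| tokio | `AsyncReadExt`, `AsyncWriteExt` | `AsyncRead`, `AsyncWrite` |
| tower | `ServiceExt` | `Service` |
| bytes | `BufMut` (partial) | `&mut [u8]` |
| Your crate / 你的包 | `DiagResultsExt` | `Vec<DiagResult>` |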

When to Use / 何时使用

| Situation / 场景 | Use Extension Trait? / 是否使用扩展 Trait? |
|---|---|
| Adding convenience methods to a foreign type / 向外部类型添加便利方法 | ✅ |
| Grouping domain-specific logic on generic collections / 在泛型集合上分组特定领域逻辑 | ✅ |
| The method needs access to private fields / 方法需要访问私有字段 | ❌ (use a wrapper/newtype / 使用包装/newtype) |
| The method logically belongs on a new type you control / 方法在逻辑上属于你控制的新类型 | ❌ (just add it to your type / 直接加到你的类型上) |
| You want the method available without any import / 希望无需导入即可使用方法 | ❌ (inherent methods only / 仅限固有方法) |

Enum Dispatch — Static Polymorphism Without dyn / 枚举分发 —— 无 dyn 的静态多态

When you have a closed set of types implementing a trait, you can replace dyn Trait with an enum whose variants hold the concrete types. This eliminates the vtable indirection and heap allocation while preserving the same caller-facing interface.

当你有一组 封闭集合 的类型实现某个 trait 时,可以用一个变体持有具体类型的枚举来替换 dyn Trait。这消除了 vtable 间接跳转和堆分配,同时保留了相同的面向调用者的接口。

The Problem with dyn Trait / dyn Trait 的问题

#![allow(unused)]
fn main() {
trait Sensor {
    fn read(&self) -> f64;
    fn name(&self) -> &str;
}

struct Gps { lat: f64, lon: f64 }
struct Thermometer { temp_c: f64 }
struct Accelerometer { g_force: f64 }

impl Sensor for Gps {
    fn read(&self) -> f64 { self.lat }
    fn name(&self) -> &str { "GPS" }
}
impl Sensor for Thermometer {
    fn read(&self) -> f64 { self.temp_c }
    fn name(&self) -> &str { "Thermometer" }
}
impl Sensor for Accelerometer {
    fn read(&self) -> f64 { self.g_force }
    fn name(&self) -> &str { "Accelerometer" }
}

// Heterogeneous collection with dyn — works, but has costs:
// 使用 dyn 的异构集合 —— 虽然可行,但有开销:
fn read_all_dyn(sensors: &[Box<dyn Sensor>]) -> Vec<f64> {
    sensors.iter().map(|s| s.read()).collect()
    // Each .read() goes through a vtable indirection / 每次 .read() 都经过 vtable 间接跳转
    // Each Box allocates on the heap / 每个 Box 都在堆上分配
}
}

The Enum Dispatch Solution / 枚举分发解决方案

// Replace the trait object with an enum / 用枚举替换 trait 对象:
enum AnySensor {
    Gps(Gps),
    Thermometer(Thermometer),
    Accelerometer(Accelerometer),
}

impl AnySensor {
    fn read(&self) -> f64 {
        match self {
            AnySensor::Gps(s) => s.read(),
            AnySensor::Thermometer(s) => s.read(),
            AnySensor::Accelerometer(s) => s.read(),
        }
    }

    fn name(&self) -> &str {
        match self {
            AnySensor::Gps(s) => s.name(),
            AnySensor::Thermometer(s) => s.name(),
            AnySensor::Accelerometer(s) => s.name(),
        }
    }
}

// Now: no heap allocation, no vtable, stored inline
// 现在:没有堆分配,没有 vtable,内联存储
fn read_all(sensors: &[AnySensor]) -> Vec<f64> {
    sensors.iter().map(|s| s.read()).collect()
    // Each .read() is a match branch — compiler can inline everything
    // 每个 .read() 都是一个 match 分支 —— 编译器可以内联所有内容
}

fn main() {
    let sensors = vec![
        AnySensor::Gps(Gps { lat: 47.6, lon: -122.3 }),
        AnySensor::Thermometer(Thermometer { temp_c: 72.5 }),
        AnySensor::Accelerometer(Accelerometer { g_force: 1.02 }),
    ];

    for sensor in &sensors {
        println!("{}: {:.2}", sensor.name(), sensor.read());
    }
}

Implement the Trait on the Enum / 在枚举上实现 Trait

For interoperability, you can implement the original trait on the enum itself:

为了实现互操作性,你可以直接在枚举上实现原始 trait:

#![allow(unused)]
fn main() {
impl Sensor for AnySensor {
    fn read(&self) -> f64 {
        match self {
            AnySensor::Gps(s) => s.read(),
            AnySensor::Thermometer(s) => s.read(),
            AnySensor::Accelerometer(s) => s.read(),
        }
    }

    fn name(&self) -> &str {
        match self {
            AnySensor::Gps(s) => s.name(),
            AnySensor::Thermometer(s) => s.name(),
            AnySensor::Accelerometer(s) => s.name(),
        }
    }
}

// Now AnySensor works anywhere a Sensor is expected via generics:
// 现在 AnySensor 可以像泛型一样在任何需要 Sensor 的地方使用:
fn report<S: Sensor>(s: &S) {
    println!("{}: {:.2}", s.name(), s.read());
}
}

Reducing Boilerplate with a Macro / 使用宏减少样板代码

The match-arm delegation is repetitive. A macro eliminates it:

match 分支的委托是重复性的。使用宏可以消除这一点:

macro_rules! dispatch_sensor {
    ($self:expr, $method:ident $(, $arg:expr)*) => {
        match $self {
            AnySensor::Gps(s) => s.$method($($arg),*),
            AnySensor::Thermometer(s) => s.$method($($arg),*),
            AnySensor::Accelerometer(s) => s.$method($($arg),*),
        }
    };
}

impl Sensor for AnySensor {
    fn read(&self) -> f64     { dispatch_sensor!(self, read) }
    fn name(&self) -> &str    { dispatch_sensor!(self, name) }
}
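The `$(, $arg:expr)*` part of the macro pattern forwards method arguments, which the two zero-argument methods above never exercise. Here is a reduced, self-contained sketch with a hypothetical `read_scaled(factor)` method and only two variants, showing an argument passed through the same macro.

```rust
trait Sensor {
    fn read(&self) -> f64;
    /// Hypothetical method that takes an argument, to exercise `$arg` forwarding.
    fn read_scaled(&self, factor: f64) -> f64 {
        self.read() * factor
    }
}

struct Thermometer { temp_c: f64 }
struct Accelerometer { g_force: f64 }

impl Sensor for Thermometer {
    fn read(&self) -> f64 { self.temp_c }
}
impl Sensor for Accelerometer {
    fn read(&self) -> f64 { self.g_force }
}

enum AnySensor {
    Thermometer(Thermometer),
    Accelerometer(Accelerometer),
}

// The macro must be defined before the impl that uses it.
macro_rules! dispatch_sensor {
    ($self:expr, $method:ident $(, $arg:expr)*) => {
        match $self {
            AnySensor::Thermometer(s) => s.$method($($arg),*),
            AnySensor::Accelerometer(s) => s.$method($($arg),*),
        }
    };
}

impl Sensor for AnySensor {
    fn read(&self) -> f64 { dispatch_sensor!(self, read) }
    // `factor` is forwarded to the variant's own implementation.
    fn read_scaled(&self, factor: f64) -> f64 { dispatch_sensor!(self, read_scaled, factor) }
}
```

Adding a variant still means touching the enum and the macro body, but each trait method stays a one-liner.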

For larger projects, the enum_dispatch crate automates this entirely:

对于较大的项目,enum_dispatch crate 可以将此过程完全自动化:

use enum_dispatch::enum_dispatch;

#[enum_dispatch]
trait Sensor {
    fn read(&self) -> f64;
    fn name(&self) -> &str;
}

#[enum_dispatch(Sensor)]
enum AnySensor {
    Gps,
    Thermometer,
    Accelerometer,
}
// All delegation code is generated automatically / 所有的委托代码都会自动生成

dyn Trait vs Enum Dispatch — Decision Guide / dyn Trait vs 枚举分发 —— 决策指南

Is the set of types closed (known at compile time)? / 类型集是封闭的吗(在编译时已知)?
├── YES (是) → Prefer enum dispatch (faster, no heap allocation) / 优先选择枚举分发(更快、无堆分配)
│         ├── Few variants (< ~20)?     → Manual enum / 变体较少(< ~20)? → 手动编写枚举
│         └── Many variants or growing? → enum_dispatch crate / 变体较多或不断增加? → 使用 enum_dispatch
└── NO (否)  → Must use dyn Trait (plugins, user-provided types) / 必须使用 dyn Trait(插件、用户提供的类型)

| Property / 属性 | `dyn Trait` | Enum Dispatch / 枚举分发 |
|---|---|---|
| Dispatch cost / 分发开销 | Vtable indirection (~2ns) / vtable 间接跳转 (~2ns) | Branch prediction (~0.3ns) / 分支预测 (~0.3ns) |
| Heap allocation / 堆分配 | Usually (`Box`) / 通常需要 (`Box`) | None (inline) / 无 (内联) |
| Cache-friendly / 缓存友好性 | No (pointer chasing) / 否 (指针追逐) | Yes (contiguous) / 是 (连续存储) |
| Open to new types / 是否支持新类型 | ✅ (anyone can impl) / 是 (任何人均可实现) | ❌ (closed set) / 否 (封闭集合) |
| Code size / 代码大小 | Shared / 共享 | One copy per variant / 每个变体一份副本 |
| Trait must be object-safe? / 是否必须对象安全? | Yes / 是 | No / 否 |
| Adding a variant / 添加变体 | No code changes / 无需修改代码 | Update enum + match arms / 需要更新枚举和 match 分支 |
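The "Heap allocation" and "Cache-friendly" rows come with a trade-off the comparison does not spell out: an enum value is at least as large as its largest variant, so a collection of small sensors pays for the biggest one. The sketch below is illustrative only; `layout_report` and `total` are hypothetical, the sensor set is reduced to two variants, and the GPS reading is a placeholder.

```rust
use std::mem::size_of;

trait Sensor {
    fn read(&self) -> f64;
}

struct Gps { lat: f64, lon: f64 }
struct Thermometer { temp_c: f64 }

impl Sensor for Gps {
    fn read(&self) -> f64 { self.lat + self.lon } // placeholder reading
}
impl Sensor for Thermometer {
    fn read(&self) -> f64 { self.temp_c }
}

enum AnySensor {
    Gps(Gps),
    Thermometer(Thermometer),
}

impl Sensor for AnySensor {
    fn read(&self) -> f64 {
        match self {
            AnySensor::Gps(s) => s.read(),
            AnySensor::Thermometer(s) => s.read(),
        }
    }
}

/// (size of largest variant, size of the enum, size of a boxed trait object)
fn layout_report() -> (usize, usize, usize) {
    (size_of::<Gps>(), size_of::<AnySensor>(), size_of::<Box<dyn Sensor>>())
}

/// Elements sit next to each other in the slice; no pointer is followed per element.
fn total(sensors: &[AnySensor]) -> f64 {
    sensors.iter().map(|s| s.read()).sum()
}
```

If one variant is much larger than the rest, boxing just that variant's payload is a common way to keep the enum small while leaving the other variants inline.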

When to Use Enum Dispatch / 何时使用枚举分发

| Scenario / 场景 | Recommendation / 建议 |
|---|---|
| Diagnostic test types (CPU, GPU, NIC, Memory, …) / 诊断测试类型 | ✅ Enum dispatch: closed set, known at compile time / 封闭集合,编译时已知 |
| Bus protocols (SPI, I2C, UART, …) / 总线协议 | ✅ Enum dispatch or Config trait / 枚举分发或 Config trait |
| Plugin system (user loads .so at runtime) / 插件系统(运行时加载) | ❌ Use `dyn Trait` / 使用 `dyn Trait` |
| 2-3 variants / 2-3 个变体 | ✅ Manual enum dispatch / 手动枚举分发 |
| 10+ variants with many methods / 10 个以上变体且具有多个方法 | ✅ `enum_dispatch` crate |
| Performance-critical inner loop / 性能关键的内层循环 | ✅ Enum dispatch (eliminates vtable) / 消除 vtable |

Capability Mixins — Associated Types as Zero-Cost Composition / 能力注入 (Capability Mixins) —— 作为零成本组合的关联类型

Ruby developers compose behaviour with mixins: include SomeModule injects methods into a class. Rust traits with associated types + default methods + blanket impls produce the same result, except:

Ruby 开发者使用 mixins 来组合行为 —— include SomeModule 会将方法注入到类中。具有 关联类型 (associated types) + 默认方法 (default methods) + blanket 实现 (blanket impls) 的 Rust trait 也能达到同样的效果,但有以下不同:

  • Everything resolves at compile time — no method-missing surprises / 所有的内容都在 编译时 解析 —— 不会有“方法缺失”的意外
  • Each associated type is a knob that changes what the default methods produce / 每个关联类型都是一个 控制旋钮,用来改变默认方法产生的结果
  • The compiler monomorphises each combination — zero vtable overhead / 编译器 单态化 每种组合 —— 零 vtable 开销

The Problem: Cross-Cutting Bus Dependencies / 问题:横向切割的总线依赖

Hardware diagnostic routines share common operations — read an IPMI sensor, toggle a GPIO rail, sample a temperature over SPI — but different diagnostics need different combinations. Inheritance hierarchies don’t exist in Rust. Passing every bus handle as a function argument creates unwieldy signatures. We need a way to mix in bus capabilities à la carte.

硬件诊断程序共享一些常见操作 —— 读取 IPMI 传感器、切换 GPIO 栏、通过 SPI 采样温度 —— 但不同的诊断需要不同的组合。Rust 中不存在继承层次结构。将每个总线句柄作为函数参数传递会产生笨重的签名。我们需要一种能够按需 注入 (mix in) 总线能力的方法。

Step 1 — Define “Ingredient” Traits / 第 1 步 —— 定义“组件” Trait

Each ingredient provides one hardware capability via an associated type:

每个组件通过关联类型提供一种硬件能力:

#![allow(unused)]
fn main() {
use std::io;

// ── Bus abstractions (traits the hardware team provides) ──────────
// ── 总线抽象(硬件团队提供的 trait) ──────────
pub trait SpiBus {
    fn spi_transfer(&self, tx: &[u8], rx: &mut [u8]) -> io::Result<()>;
}

pub trait I2cBus {
    fn i2c_read(&self, addr: u8, reg: u8, buf: &mut [u8]) -> io::Result<()>;
    fn i2c_write(&self, addr: u8, reg: u8, data: &[u8]) -> io::Result<()>;
}

pub trait GpioPin {
    fn set_high(&self) -> io::Result<()>;
    fn set_low(&self) -> io::Result<()>;
    fn read_level(&self) -> io::Result<bool>;
}

pub trait IpmiBmc {
    fn raw_command(&self, net_fn: u8, cmd: u8, data: &[u8]) -> io::Result<Vec<u8>>;
    fn read_sensor(&self, sensor_id: u8) -> io::Result<f64>;
}

// ── Ingredient traits — one per bus, carries an associated type ───
// ── 组件 trait —— 每个总线一个,携带一个关联类型 ───
pub trait HasSpi {
    type Spi: SpiBus;
    fn spi(&self) -> &Self::Spi;
}

pub trait HasI2c {
    type I2c: I2cBus;
    fn i2c(&self) -> &Self::I2c;
}

pub trait HasGpio {
    type Gpio: GpioPin;
    fn gpio(&self) -> &Self::Gpio;
}

pub trait HasIpmi {
    type Ipmi: IpmiBmc;
    fn ipmi(&self) -> &Self::Ipmi;
}
}

Each ingredient is tiny, generic, and testable in isolation.

每个组件都非常微小、通用,且可以独立测试。

Step 2 — Define “Mixin” Traits / 第 2 步 —— 定义“注入 (Mixin)” Trait

A mixin trait declares its required ingredients as supertraits, then provides all its methods via defaults — implementors get them for free:

注入 trait 将其所需的组件声明为 supertrait,然后通过 默认方法 (defaults) 提供其所有方法 —— 实现者可以免费获得它们:

#![allow(unused)]
fn main() {
/// Mixin: fan diagnostics — needs I2C (tachometer) + GPIO (PWM enable)
/// 注入:风扇诊断 —— 需要 I2C(转速计)+ GPIO(PWM 使能)
pub trait FanDiagMixin: HasI2c + HasGpio {
    /// Read fan RPM from the tachometer IC over I2C.
    /// 通过 I2C 从转速计 IC 读取风扇转速 (RPM)。
    fn read_fan_rpm(&self, fan_id: u8) -> io::Result<u32> {
        let mut buf = [0u8; 2];
        self.i2c().i2c_read(0x48 + fan_id, 0x00, &mut buf)?;
        Ok(u16::from_be_bytes(buf) as u32 * 60) // tach counts → RPM
    }

    /// Enable or disable the fan PWM output via GPIO.
    /// 通过 GPIO 启用或禁用风扇 PWM 输出。
    fn set_fan_pwm(&self, enable: bool) -> io::Result<()> {
        if enable { self.gpio().set_high() }
        else      { self.gpio().set_low() }
    }

    /// Full fan health check — read RPM + verify within threshold.
    /// 完整的风扇健康检查 —— 读取 RPM + 验证是否在阈值内。
    fn check_fan_health(&self, fan_id: u8, min_rpm: u32) -> io::Result<bool> {
        let rpm = self.read_fan_rpm(fan_id)?;
        Ok(rpm >= min_rpm)
    }
}

/// Mixin: temperature monitoring — needs SPI (thermocouple ADC) + IPMI (BMC sensors)
/// 注入:温度监测 —— 需要 SPI(热电偶 ADC)+ IPMI(BMC 传感器)
pub trait TempMonitorMixin: HasSpi + HasIpmi {
    /// Read a thermocouple via the SPI ADC (e.g. MAX31855).
    /// 通过 SPI ADC(例如 MAX31855)读取热电偶。
    fn read_thermocouple(&self) -> io::Result<f64> {
        let mut rx = [0u8; 4];
        self.spi().spi_transfer(&[0x00; 4], &mut rx)?;
        let raw = i32::from_be_bytes(rx) >> 18; // 14-bit signed
        Ok(raw as f64 * 0.25)
    }

    /// Read a BMC-managed temperature sensor via IPMI.
    /// 通过 IPMI 读取 BMC 管理的温度传感器。
    fn read_bmc_temp(&self, sensor_id: u8) -> io::Result<f64> {
        self.ipmi().read_sensor(sensor_id)
    }

    /// Cross-validate: thermocouple vs BMC must agree within delta.
    /// 交叉验证:热电偶与 BMC 的读数必须在 delta 范围内一致。
    fn validate_temps(&self, sensor_id: u8, max_delta: f64) -> io::Result<bool> {
        let tc = self.read_thermocouple()?;
        let bmc = self.read_bmc_temp(sensor_id)?;
        Ok((tc - bmc).abs() <= max_delta)
    }
}

/// Mixin: power sequencing — needs GPIO (rail enable) + IPMI (event logging)
/// 注入:电源时序 —— 需要 GPIO(导轨使能)+ IPMI(事件日志记录)
pub trait PowerSeqMixin: HasGpio + HasIpmi {
    /// Assert the power-good GPIO and verify via IPMI sensor.
    /// 断言电源良好 (power-good) GPIO 并通过 IPMI 传感器进行验证。
    fn enable_power_rail(&self, sensor_id: u8) -> io::Result<bool> {
        self.gpio().set_high()?;
        std::thread::sleep(std::time::Duration::from_millis(50));
        let voltage = self.ipmi().read_sensor(sensor_id)?;
        Ok(voltage > 0.8) // above 80% nominal = good
    }

    /// De-assert power and log shutdown via IPMI OEM command.
    /// 取消断言电源并通过 IPMI OEM 命令记录关机。
    fn disable_power_rail(&self) -> io::Result<()> {
        self.gpio().set_low()?;
        // Log OEM "power rail disabled" event to BMC
        self.ipmi().raw_command(0x2E, 0x01, &[0x00, 0x01])?;
        Ok(())
    }
}
}

Step 3 — Blanket Impls Make It Truly “Mixin” / 第 3 步 —— Blanket 实现让它成为真正的“注入 (Mixin)”

The magic line — provide the ingredients, get the methods:

神奇的一行 —— 提供组件,获得方法:

impl<T: HasI2c + HasGpio>  FanDiagMixin    for T {}
impl<T: HasSpi  + HasIpmi>  TempMonitorMixin for T {}
impl<T: HasGpio + HasIpmi>  PowerSeqMixin   for T {}

Any struct that implements the right ingredient traits automatically gains every mixin method — no boilerplate, no forwarding, no inheritance.

任何实现了正确组件 trait 的结构体都会 自动 获得每个注入方法 —— 无样板代码、无转发、无继承。

Step 4 — Wire Up Production / 第 4 步 —— 连接生产环境

#![allow(unused)]
fn main() {
// ── Concrete bus implementations (Linux platform) ────────────────
// ── 具体的总线实现(Linux 平台) ────────────────
struct LinuxSpi  { dev: String }
struct LinuxI2c  { dev: String }
struct SysfsGpio { pin: u32 }
struct IpmiTool  { timeout_secs: u32 }

impl SpiBus for LinuxSpi {
    fn spi_transfer(&self, _tx: &[u8], _rx: &mut [u8]) -> io::Result<()> {
        // spidev ioctl — omitted for brevity / 为了简洁起见省略
        Ok(())
    }
}
impl I2cBus for LinuxI2c {
    fn i2c_read(&self, _addr: u8, _reg: u8, _buf: &mut [u8]) -> io::Result<()> {
        // i2c-dev ioctl — omitted for brevity / 为了简洁起见省略
        Ok(())
    }
    fn i2c_write(&self, _addr: u8, _reg: u8, _data: &[u8]) -> io::Result<()> { Ok(()) }
}
impl GpioPin for SysfsGpio {
    fn set_high(&self) -> io::Result<()>  { /* /sys/class/gpio */ Ok(()) }
    fn set_low(&self) -> io::Result<()>   { Ok(()) }
    fn read_level(&self) -> io::Result<bool> { Ok(true) }
}
impl IpmiBmc for IpmiTool {
    fn raw_command(&self, _nf: u8, _cmd: u8, _data: &[u8]) -> io::Result<Vec<u8>> {
        // shells out to ipmitool — omitted for brevity / 为了简洁起见省略
        Ok(vec![])
    }
    fn read_sensor(&self, _id: u8) -> io::Result<f64> { Ok(25.0) }
}

// ── Production platform — all four buses ─────────────────────────
// ── 生产平台 —— 包含所有四个总线 ─────────────────────────
struct DiagPlatform {
    spi:  LinuxSpi,
    i2c:  LinuxI2c,
    gpio: SysfsGpio,
    ipmi: IpmiTool,
}

impl HasSpi  for DiagPlatform { type Spi  = LinuxSpi;  fn spi(&self)  -> &LinuxSpi  { &self.spi  } }
impl HasI2c  for DiagPlatform { type I2c  = LinuxI2c;  fn i2c(&self)  -> &LinuxI2c  { &self.i2c  } }
impl HasGpio for DiagPlatform { type Gpio = SysfsGpio; fn gpio(&self) -> &SysfsGpio { &self.gpio } }
impl HasIpmi for DiagPlatform { type Ipmi = IpmiTool;  fn ipmi(&self) -> &IpmiTool  { &self.ipmi } }

// DiagPlatform now has ALL mixin methods / DiagPlatform 现在拥有了所有注入方法:
fn production_diagnostics(platform: &DiagPlatform) -> io::Result<()> {
    let rpm = platform.read_fan_rpm(0)?;       // from FanDiagMixin / 来自 FanDiagMixin
    let tc  = platform.read_thermocouple()?;   // from TempMonitorMixin / 来自 TempMonitorMixin
    let ok  = platform.enable_power_rail(42)?;  // from PowerSeqMixin / 来自 PowerSeqMixin
    println!("Fan: {rpm} RPM, Temp: {tc}°C, Power: {ok}");
    Ok(())
}
}

Step 5 — Test With Mocks (No Hardware Required) / 第 5 步 —— 使用 Mock 进行测试(无需硬件)

#![allow(unused)]
fn main() {
#[cfg(test)]
mod tests {
    use super::*;
    use std::cell::Cell;

    struct MockSpi  { temp: Cell<f64> }
    struct MockI2c  { rpm: Cell<u32> }
    struct MockGpio { level: Cell<bool> }
    struct MockIpmi { sensor_val: Cell<f64> }

    impl SpiBus for MockSpi {
        fn spi_transfer(&self, _tx: &[u8], rx: &mut [u8]) -> io::Result<()> {
            // Encode mock temp as MAX31855 format
            // 将模拟温度编码为 MAX31855 格式
            let raw = ((self.temp.get() / 0.25) as i32) << 18;
            rx.copy_from_slice(&raw.to_be_bytes());
            Ok(())
        }
    }
    impl I2cBus for MockI2c {
        fn i2c_read(&self, _addr: u8, _reg: u8, buf: &mut [u8]) -> io::Result<()> {
            let tach = (self.rpm.get() / 60) as u16;
            buf.copy_from_slice(&tach.to_be_bytes());
            Ok(())
        }
        fn i2c_write(&self, _: u8, _: u8, _: &[u8]) -> io::Result<()> { Ok(()) }
    }
    impl GpioPin for MockGpio {
        fn set_high(&self)  -> io::Result<()>   { self.level.set(true);  Ok(()) }
        fn set_low(&self)   -> io::Result<()>   { self.level.set(false); Ok(()) }
        fn read_level(&self) -> io::Result<bool> { Ok(self.level.get()) }
    }
    impl IpmiBmc for MockIpmi {
        fn raw_command(&self, _: u8, _: u8, _: &[u8]) -> io::Result<Vec<u8>> { Ok(vec![]) }
        fn read_sensor(&self, _: u8) -> io::Result<f64> { Ok(self.sensor_val.get()) }
    }

    // ── Partial platform: only fan-related buses ─────────────────
    // ── 部分平台:仅包含风扇相关的总线 ─────────────────
    struct FanTestRig {
        i2c:  MockI2c,
        gpio: MockGpio,
    }
    impl HasI2c  for FanTestRig { type I2c  = MockI2c;  fn i2c(&self)  -> &MockI2c  { &self.i2c  } }
    impl HasGpio for FanTestRig { type Gpio = MockGpio; fn gpio(&self) -> &MockGpio { &self.gpio } }
    // FanTestRig gets FanDiagMixin but NOT TempMonitorMixin or PowerSeqMixin
    // FanTestRig 获得了 FanDiagMixin,但没有获得 TempMonitorMixin 或 PowerSeqMixin

    #[test]
    fn fan_health_check_passes_above_threshold() {
        let rig = FanTestRig {
            i2c:  MockI2c  { rpm: Cell::new(6000) },
            gpio: MockGpio { level: Cell::new(false) },
        };
        assert!(rig.check_fan_health(0, 4000).unwrap());
    }

    #[test]
    fn fan_health_check_fails_below_threshold() {
        let rig = FanTestRig {
            i2c:  MockI2c  { rpm: Cell::new(2000) },
            gpio: MockGpio { level: Cell::new(false) },
        };
        assert!(!rig.check_fan_health(0, 4000).unwrap());
    }
}
}

Notice that FanTestRig only implements HasI2c + HasGpio — it gets FanDiagMixin automatically, but the compiler refuses rig.read_thermocouple() because HasSpi is not satisfied. This is mixin scoping enforced at compile time.

注意,FanTestRig 仅实现了 HasI2c + HasGpio —— 它自动获得了 FanDiagMixin,但编译器会 拒绝 rig.read_thermocouple(),因为 HasSpi 未被满足。这是在编译时强制执行的注入作用域。

Conditional Methods — Beyond What Ruby Can Do / 条件方法 —— 超越 Ruby 的能力

Add where bounds to individual default methods. The method only exists when the associated type satisfies the extra bound:

可以向单个默认方法添加 where 约束。方法仅在关联类型满足额外约束时才存在:

#![allow(unused)]
fn main() {
/// Capability trait for DMA-capable SPI controllers (it adds a method, so it is more than a marker) / 支持 DMA 的 SPI 控制器的能力 trait
pub trait DmaCapable: SpiBus {
    fn dma_transfer(&self, tx: &[u8], rx: &mut [u8]) -> io::Result<()>;
}

/// Capability trait for interrupt-capable GPIO pins (it adds a method, so it is more than a marker) / 支持中断的 GPIO 引脚的能力 trait
pub trait InterruptCapable: GpioPin {
    fn wait_for_edge(&self, timeout_ms: u32) -> io::Result<bool>;
}

pub trait AdvancedDiagMixin: HasSpi + HasGpio {
    // Always available / 始终可用
    fn basic_probe(&self) -> io::Result<bool> {
        let mut rx = [0u8; 1];
        self.spi().spi_transfer(&[0xFF], &mut rx)?;
        Ok(rx[0] != 0x00)
    }

    // Only exists when the SPI controller supports DMA
    // 仅在 SPI 控制器支持 DMA 时存在
    fn bulk_sensor_read(&self, buf: &mut [u8]) -> io::Result<()>
    where
        Self::Spi: DmaCapable,
    {
        self.spi().dma_transfer(&vec![0x00; buf.len()], buf)
    }

    // Only exists when the GPIO pin supports interrupts
    // 仅在 GPIO 引脚支持中断时存在
    fn wait_for_fault_signal(&self, timeout_ms: u32) -> io::Result<bool>
    where
        Self::Gpio: InterruptCapable,
    {
        self.gpio().wait_for_edge(timeout_ms)
    }
}

impl<T: HasSpi + HasGpio> AdvancedDiagMixin for T {}
}

If your platform’s SPI doesn’t support DMA, calling bulk_sensor_read() is a compile error, not a runtime crash. Ruby’s respond_to? check is the closest equivalent, but it happens at runtime, not compile time.

如果你的平台的 SPI 不支持 DMA,调用 bulk_sensor_read() 将会导致编译错误,而不是运行时崩溃。Ruby 的 respond_to? 检查是最接近的等效项,但它发生在运行时,而不是编译时。
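To see the conditional method from the caller's side, here is a minimal self-contained sketch with two platforms. The trait names follow the chapter, but the trait bodies are reduced, and `BulkReadMixin`, `DmaSpi`, `PlainSpi`, `ServerBoard`, `BudgetBoard`, and `read_four` are hypothetical names introduced only for this illustration.

```rust
use std::io;

// Reduced stand-ins for the chapter's traits (assumed shapes).
trait SpiBus {
    fn spi_transfer(&self, tx: &[u8], rx: &mut [u8]) -> io::Result<()>;
}
trait DmaCapable: SpiBus {
    fn dma_transfer(&self, tx: &[u8], rx: &mut [u8]) -> io::Result<()>;
}
trait HasSpi {
    type Spi: SpiBus;
    fn spi(&self) -> &Self::Spi;
}

trait BulkReadMixin: HasSpi {
    // Callable only when the platform's SPI type is DmaCapable.
    fn bulk_sensor_read(&self, buf: &mut [u8]) -> io::Result<()>
    where
        Self::Spi: DmaCapable,
    {
        let tx = vec![0x00; buf.len()];
        self.spi().dma_transfer(&tx, buf)
    }
}
impl<T: HasSpi> BulkReadMixin for T {}

// Hypothetical controllers: one with DMA, one without.
struct DmaSpi;
struct PlainSpi;

impl SpiBus for DmaSpi {
    fn spi_transfer(&self, _tx: &[u8], rx: &mut [u8]) -> io::Result<()> {
        rx.fill(0x01);
        Ok(())
    }
}
impl DmaCapable for DmaSpi {
    fn dma_transfer(&self, _tx: &[u8], rx: &mut [u8]) -> io::Result<()> {
        rx.fill(0xAB); // fake DMA payload for the sketch
        Ok(())
    }
}
impl SpiBus for PlainSpi {
    fn spi_transfer(&self, _tx: &[u8], rx: &mut [u8]) -> io::Result<()> {
        rx.fill(0x01);
        Ok(())
    }
}

struct ServerBoard { spi: DmaSpi }
struct BudgetBoard { spi: PlainSpi }
impl HasSpi for ServerBoard { type Spi = DmaSpi;   fn spi(&self) -> &DmaSpi   { &self.spi } }
impl HasSpi for BudgetBoard { type Spi = PlainSpi; fn spi(&self) -> &PlainSpi { &self.spi } }

fn read_four(board: &ServerBoard) -> io::Result<[u8; 4]> {
    let mut buf = [0u8; 4];
    board.bulk_sensor_read(&mut buf)?;
    Ok(buf)
}

fn main() -> io::Result<()> {
    let buf = read_four(&ServerBoard { spi: DmaSpi })?;
    println!("{:02X?}", buf);

    let _budget = BudgetBoard { spi: PlainSpi };
    // _budget.bulk_sensor_read(&mut [0u8; 4]);
    // Uncommenting the call is rejected: `PlainSpi: DmaCapable` is not satisfied.
    Ok(())
}
```

Both boards receive the blanket impl, so the mixin trait itself is always present; only the individual method is gated by the extra bound.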

Composability: Stacking Mixins / 组合性:堆叠注入

Multiple mixins can share the same ingredient — no diamond problem:

多个注入可以共享相同的组件 —— 没有菱形继承问题:

┌─────────────┐    ┌───────────┐    ┌──────────────┐
│ FanDiagMixin│    │TempMonitor│    │ PowerSeqMixin│
│  (I2C+GPIO) │    │ (SPI+IPMI)│    │  (GPIO+IPMI) │
└──────┬──────┘    └─────┬─────┘    └──────┬───────┘
       │                 │                 │
       │   ┌─────────────┴─────────────┐   │
       └──►│      DiagPlatform         │◄──┘
           │ HasSpi+HasI2c+HasGpio     │
           │        +HasIpmi           │
           └───────────────────────────┘

DiagPlatform implements HasGpio once, and both FanDiagMixin and PowerSeqMixin use the same self.gpio(). In Ruby, this would be two modules both calling self.gpio_pin — but if they expected different pin numbers, you’d discover the conflict at runtime. In Rust, you can disambiguate at the type level.

DiagPlatform 只需实现一次 HasGpio,而 FanDiagMixin 和 PowerSeqMixin 都会使用同一个 self.gpio()。在 Ruby 中,这将是两个模块都调用 self.gpio_pin,但如果它们期望不同的引脚编号,你只会在运行时发现冲突。而在 Rust 中,你可以在类型级别消除歧义。
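The sharing described above can be sketched in a few lines. This is a reduced illustration, not the chapter's full definitions: here both mixins depend only on `HasGpio`, the `GpioPin` methods return plain values instead of `io::Result`, and `assert_power_good`, `power_good_seen`, and `Platform` are hypothetical names.

```rust
use std::cell::Cell;

// Reduced ingredient (assumed shape; the chapter's GpioPin returns io::Result).
trait GpioPin {
    fn set_high(&self);
    fn read_level(&self) -> bool;
}
trait HasGpio {
    type Gpio: GpioPin;
    fn gpio(&self) -> &Self::Gpio;
}

// Two independent mixins that both build on the same ingredient.
trait PowerSeqMixin: HasGpio {
    fn assert_power_good(&self) {
        self.gpio().set_high();
    }
}
trait FanDiagMixin: HasGpio {
    fn power_good_seen(&self) -> bool {
        self.gpio().read_level()
    }
}
impl<T: HasGpio> PowerSeqMixin for T {}
impl<T: HasGpio> FanDiagMixin for T {}

struct MockGpio { level: Cell<bool> }
impl GpioPin for MockGpio {
    fn set_high(&self) { self.level.set(true); }
    fn read_level(&self) -> bool { self.level.get() }
}

// HasGpio is implemented once; both mixins resolve to this accessor.
struct Platform { gpio: MockGpio }
impl HasGpio for Platform {
    type Gpio = MockGpio;
    fn gpio(&self) -> &MockGpio { &self.gpio }
}

fn main() {
    let p = Platform { gpio: MockGpio { level: Cell::new(false) } };
    p.assert_power_good();                // written through PowerSeqMixin
    println!("{}", p.power_good_seen());  // observed through FanDiagMixin
}
```

Because there is exactly one `gpio()` accessor per platform type, there is no second copy of the ingredient for the two mixins to disagree about.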

Comparison: Ruby Mixins vs Rust Capability Mixins / 比较:Ruby Mixin vs Rust 能力注入

| Dimension / 维度 | Ruby Mixins | Rust Capability Mixins / 能力注入 |
| --- | --- | --- |
| Dispatch / 分发 | Runtime (method table lookup) / 运行时(方法表查找) | Compile-time (monomorphised) / 编译时(单态化) |
| Safe composition / 安全组合 | MRO linearisation hides conflicts / MRO 线性化隐藏了冲突 | Compiler rejects ambiguity / 编译器拒绝歧义 |
| Conditional methods / 条件方法 | respond_to? at runtime / 运行时 respond_to? | where bounds at compile time / 编译时 where 约束 |
| Overhead / 开销 | Method dispatch + GC / 方法分发 + GC | Zero-cost (inlined) / 零成本(内联) |
| Testability / 可测试性 | Stub/mock via metaprogramming / 通过元编程进行 Stub/mock | Generic over mock types / 对 Mock 类型泛型化 |
| Adding new buses / 添加新总线 | include at runtime / 运行时 include | Add ingredient trait, recompile / 添加组件 trait,重新编译 |
| Runtime flexibility / 运行时灵活性 | extend, prepend, open classes | None (fully static) / 无(完全静态) |

When to Use Capability Mixins / 何时使用能力注入

| Scenario / 场景 | Use Mixins? / 是否使用注入? |
| --- | --- |
| Multiple diagnostics share bus-reading logic / 多个诊断程序共享总线读取逻辑 | ✅ |
| Test harness needs different bus subsets / 测试夹具需要不同的总线子集 | ✅ (partial ingredient structs / 部分组件结构体) |
| Methods only valid for certain bus capabilities (DMA, IRQ) / 仅对特定总线能力有效的方法 | ✅ (conditional where bounds / 条件 where 约束) |
| You need runtime module loading (plugins) / 需要运行时模块加载(插件) | ❌ (use dyn Trait or enum dispatch / 使用 dyn Trait 或枚举分发) |
| Single struct with one bus, no sharing needed / 只有一个总线的单个结构体 | ❌ (keep it simple / 保持简单) |
| Cross-crate ingredients with coherence issues / 存在一致性问题的跨 crate 组件 | ⚠️ (use newtype wrappers / 使用 newtype 包装) |

Key Takeaways — Capability Mixins / 核心要点 —— 能力注入

  1. Ingredient trait = associated type + accessor method (e.g., HasSpi) / 组件 trait = 关联类型 + 访问器方法(例如 HasSpi)
  2. Mixin trait = supertrait bounds on ingredients + default method bodies / 注入 trait = 组件上的 supertrait 约束 + 默认方法体
  3. Blanket impl = impl<T: HasX + HasY> Mixin for T {} — auto-injects methods / Blanket 实现 = impl<T: HasX + HasY> Mixin for T {} —— 自动注入方法
  4. Conditional methods = where Self::Spi: DmaCapable on individual defaults / 条件方法 = 在单个默认方法上使用 where Self::Spi: DmaCapable
  5. Partial platforms = test structs that only impl the needed ingredients / 部分平台 = 仅实现所需组件的测试结构体
  6. No runtime cost — the compiler generates specialised code for each platform type / 无运行时开销 —— 编译器会为每个平台类型生成专门的代码

Typed Commands — GADT-Style Return Type Safety / 类型化命令 —— GADT 风格的返回类型安全性

In Haskell, Generalised Algebraic Data Types (GADTs) let each constructor of a data type refine the type parameter — so Expr Int and Expr Bool are enforced by the type checker. Rust has no direct GADT syntax, but traits with associated types achieve the same guarantee: the command type determines the response type, and mixing them up is a compile error.

在 Haskell 中,广义代数数据类型 (GADTs) 允许数据类型的每个构造函数细化类型参数,这样 Expr Int 和 Expr Bool 就能由类型检查器强制执行。Rust 没有直接的 GADT 语法,但具有关联类型的 trait 可以实现同样的保证:命令类型决定了响应类型,而混淆它们会导致编译错误。

This pattern is particularly powerful for hardware diagnostics, where IPMI commands, register reads, and sensor queries each return different physical quantities that should never be confused.

这种模式对于硬件诊断特别强大,在这种场景下,IPMI 命令、寄存器读取和传感器查询各自返回不同的物理量,且永远不应该被混淆。

The Problem: The Untyped Vec<u8> Swamp / 问题:无类型的 Vec<u8> 沼泽

Most C/C++ IPMI stacks — and naïve Rust ports — use raw bytes everywhere:

大多数 C/C++ IPMI 栈 —— 以及幼稚的 Rust 移植版本 —— 到处都在使用原始字节:

#![allow(unused)]
fn main() {
use std::io;

struct BmcConnectionUntyped { timeout_secs: u32 }

impl BmcConnectionUntyped {
    fn raw_command(&self, net_fn: u8, cmd: u8, data: &[u8]) -> io::Result<Vec<u8>> {
        // ... shells out to ipmitool ...
        Ok(vec![0x00, 0x19, 0x00]) // stub
    }
}

fn diagnose_thermal_untyped(bmc: &BmcConnectionUntyped) -> io::Result<()> {
    // Read CPU temperature — sensor ID 0x20
    // 读取 CPU 温度 —— 传感器 ID 0x20
    let raw = bmc.raw_command(0x04, 0x2D, &[0x20])?;
    let cpu_temp = raw[0] as f64;  // 🤞 hope byte 0 is the reading / 🤞 希望第 0 个字节就是读数

    // Read fan speed — sensor ID 0x30
    // 读取风扇速度 —— 传感器 ID 0x30
    let raw = bmc.raw_command(0x04, 0x2D, &[0x30])?;
    let fan_rpm = raw[0] as u32;  // 🐛 BUG: fan speed is 2 bytes LE / 🐛 错误:风扇速度是 2 字节小端序

    // Read inlet voltage — sensor ID 0x40
    // 读取入口电压 —— 传感器 ID 0x40
    let raw = bmc.raw_command(0x04, 0x2D, &[0x40])?;
    let voltage = raw[0] as f64;  // 🐛 BUG: need to divide by 1000 / 🐛 错误:需要除以 1000

    // 🐛 Comparing °C to RPM — compiles, but nonsensical
    // 🐛 比较摄氏度与 RPM —— 虽然能编译通过,但毫无意义
    if cpu_temp > fan_rpm as f64 {
        println!("uh oh");
    }

    // 🐛 Passing Volts as temperature — compiles fine
    // 🐛 将电压作为温度传递 —— 编译正常
    log_temp_untyped(voltage);
    log_volts_untyped(cpu_temp);

    Ok(())
}

fn log_temp_untyped(t: f64)  { println!("Temp: {t}°C"); }
fn log_volts_untyped(v: f64) { println!("Voltage: {v}V"); }
}

Every reading is a bare f64 or u32, so the compiler has no idea that one is a temperature, another is RPM, and another is voltage. Four distinct bugs compile without warning:

每个读数都只是裸的 f64 或 u32,编译器不知道一个是温度、另一个是 RPM、还有一个是电压。四个不同的错误在没有任何警告的情况下通过了编译:

| # | Bug / 错误 | Consequence / 后果 | Discovered / 何时被发现 |
| --- | --- | --- | --- |
| 1 | Fan RPM parsed as 1 byte instead of 2 / 风扇 RPM 解析为 1 字节而非 2 字节 | Reads only the low byte: 0 RPM instead of 6400 (0x1900, little-endian) / 只读到低字节:读数为 0 RPM 而非 6400(0x1900,小端序) | Production, 3 AM fan-failure flood / 生产环境凌晨三点的风扇故障报警潮 |
| 2 | Voltage not divided by 1000 / 电压没有除以 1000 | 12000V instead of 12.0V / 读数为 12000V 而非 12.0V | Threshold check flags every PSU / 阈值检查标记了每个电源 |
| 3 | Comparing °C to RPM / 比较摄氏度与 RPM | Meaningless boolean / 毫无意义的布尔值 | Possibly never / 可能永远不会发现 |
| 4 | Voltage passed to log_temp_untyped() / 电压被传递给 log_temp_untyped() | Silent data corruption in logs / 日志中无声的数据损坏 | 6 months later, reading history / 6 个月后查阅历史记录时 |

The Solution: Typed Commands via Associated Types / 解决方案:通过关联类型实现类型化命令

Step 1 — Domain newtypes / 第 1 步 —— 领域 Newtype

#![allow(unused)]
fn main() {
#[derive(Debug, Clone, Copy, PartialEq, PartialOrd)]
struct Celsius(f64);

#[derive(Debug, Clone, Copy, PartialEq, PartialOrd)]
struct Rpm(u32);

#[derive(Debug, Clone, Copy, PartialEq, PartialOrd)]
struct Volts(f64);

#[derive(Debug, Clone, Copy, PartialEq, PartialOrd)]
struct Watts(f64);
}

Step 2 — The command trait (the GADT equivalent) / 第 2 步 —— 命令 Trait(GADT 的等效实现)

The associated type Response is the key — it binds each command to its return type:

关联类型 Response 是关键 —— 它将每个命令与其返回类型绑定在一起:

#![allow(unused)]
fn main() {
trait IpmiCmd {
    /// The GADT "index" — determines what execute() returns.
    /// GADT “索引” —— 决定了 execute() 的返回值。
    type Response;

    fn net_fn(&self) -> u8;
    fn cmd_byte(&self) -> u8;
    fn payload(&self) -> Vec<u8>;

    /// Parsing is encapsulated HERE — each command knows its own byte layout.
    /// 解析被封装在这里 —— 每个命令都知道自己的字节布局。
    fn parse_response(&self, raw: &[u8]) -> io::Result<Self::Response>;
}
}

Step 3 — One struct per command, parsing written once / 第 3 步 —— 每个命令一个结构体,解析代码只需写一次

#![allow(unused)]
fn main() {
struct ReadTemp { sensor_id: u8 }
impl IpmiCmd for ReadTemp {
    type Response = Celsius;  // ← "this command returns a temperature" / ← “该命令返回温度”
    fn net_fn(&self) -> u8 { 0x04 }
    fn cmd_byte(&self) -> u8 { 0x2D }
    fn payload(&self) -> Vec<u8> { vec![self.sensor_id] }
    fn parse_response(&self, raw: &[u8]) -> io::Result<Celsius> {
        // Signed byte per IPMI SDR — written once, tested once
        // 根据 IPMI SDR 定义的有符号字节 —— 编写一次,测试一次
        Ok(Celsius(raw[0] as i8 as f64))
    }
}

struct ReadFanSpeed { fan_id: u8 }
impl IpmiCmd for ReadFanSpeed {
    type Response = Rpm;     // ← "this command returns RPM" / ← “该命令返回 RPM”
    fn net_fn(&self) -> u8 { 0x04 }
    fn cmd_byte(&self) -> u8 { 0x2D }
    fn payload(&self) -> Vec<u8> { vec![self.fan_id] }
    fn parse_response(&self, raw: &[u8]) -> io::Result<Rpm> {
        // 2-byte LE — the correct layout, encoded once
        // 2 字节小端序 —— 正确的布局,编码一次即可
        Ok(Rpm(u16::from_le_bytes([raw[0], raw[1]]) as u32))
    }
}

struct ReadVoltage { rail: u8 }
impl IpmiCmd for ReadVoltage {
    type Response = Volts;   // ← "this command returns voltage" / ← “该命令返回电压”
    fn net_fn(&self) -> u8 { 0x04 }
    fn cmd_byte(&self) -> u8 { 0x2D }
    fn payload(&self) -> Vec<u8> { vec![self.rail] }
    fn parse_response(&self, raw: &[u8]) -> io::Result<Volts> {
        // Millivolts → Volts, always correct
        // 毫伏 → 伏特,始终正确
        Ok(Volts(u16::from_le_bytes([raw[0], raw[1]]) as f64 / 1000.0))
    }
}

struct ReadFru { fru_id: u8 }
impl IpmiCmd for ReadFru {
    type Response = String;
    fn net_fn(&self) -> u8 { 0x0A }
    fn cmd_byte(&self) -> u8 { 0x11 }
    fn payload(&self) -> Vec<u8> { vec![self.fru_id, 0x00, 0x00, 0xFF] }
    fn parse_response(&self, raw: &[u8]) -> io::Result<String> {
        Ok(String::from_utf8_lossy(raw).to_string())
    }
}
}
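One caveat about the parsers above: indexing `raw[0]` or `raw[1]` panics if the BMC returns fewer bytes than expected. A minimal sketch of a length-checked variant follows, using the same 2-byte little-endian fan layout. `take_prefix` and `parse_fan_rpm` are hypothetical helpers introduced here, and mapping a short reply to `io::ErrorKind::InvalidData` is an assumption rather than something the chapter specifies.

```rust
use std::io;

#[derive(Debug, Clone, Copy, PartialEq)]
struct Rpm(u32);

/// Hypothetical helper (not part of the chapter's code): copies the first
/// N bytes of `raw` into an array, or reports InvalidData if the reply is short.
fn take_prefix<const N: usize>(raw: &[u8]) -> io::Result<[u8; N]> {
    match raw.get(..N) {
        Some(prefix) => {
            let mut out = [0u8; N];
            out.copy_from_slice(prefix);
            Ok(out)
        }
        None => Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("expected at least {} bytes, got {}", N, raw.len()),
        )),
    }
}

/// Same 2-byte little-endian layout as ReadFanSpeed::parse_response,
/// but a short reply yields Err instead of a panic.
fn parse_fan_rpm(raw: &[u8]) -> io::Result<Rpm> {
    let bytes: [u8; 2] = take_prefix(raw)?;
    Ok(Rpm(u32::from(u16::from_le_bytes(bytes))))
}

fn main() {
    println!("{:?}", parse_fan_rpm(&[0x00, 0x19]));
    println!("{:?}", parse_fan_rpm(&[0x00]).map_err(|e| e.kind()));
}
```

Inside an `IpmiCmd` impl, the body of `parse_response` could delegate to a function like this, so the bounds check is also written and tested once.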

Step 4 — The executor (zero dyn, monomorphised) / 第 4 步 —— 执行器(零 dyn,单态化)

#![allow(unused)]
fn main() {
struct BmcConnection { timeout_secs: u32 }

impl BmcConnection {
    /// Generic over any command — compiler generates one version per command type.
    /// 对任何命令泛型化 —— 编译器为每种命令类型生成一个版本。
    fn execute<C: IpmiCmd>(&self, cmd: &C) -> io::Result<C::Response> {
        let raw = self.raw_send(cmd.net_fn(), cmd.cmd_byte(), &cmd.payload())?;
        cmd.parse_response(&raw)
    }

    fn raw_send(&self, _nf: u8, _cmd: u8, _data: &[u8]) -> io::Result<Vec<u8>> {
        Ok(vec![0x19, 0x00]) // stub — real impl calls ipmitool / 存根 —— 实际实现会调用 ipmitool
    }
}
}

Step 5 — Caller code: all four bugs are ruled out at compile time / 第 5 步 —— 调用者代码:四个错误全部在编译期被排除

#![allow(unused)]
fn main() {
fn diagnose_thermal(bmc: &BmcConnection) -> io::Result<()> {
    let cpu_temp: Celsius = bmc.execute(&ReadTemp { sensor_id: 0x20 })?;
    let fan_rpm:  Rpm     = bmc.execute(&ReadFanSpeed { fan_id: 0x30 })?;
    let voltage:  Volts   = bmc.execute(&ReadVoltage { rail: 0x40 })?;

    // Bug #1 — IMPOSSIBLE: parsing lives in ReadFanSpeed::parse_response
    // 错误 #1 —— 不可能发生:解析逻辑在 ReadFanSpeed::parse_response 中
    // Bug #2 — IMPOSSIBLE: scaling lives in ReadVoltage::parse_response
    // 错误 #2 —— 不可能发生:缩放逻辑在 ReadVoltage::parse_response 中

    // Bug #3 — COMPILE ERROR / 错误 #3 —— 编译错误:
    // if cpu_temp > fan_rpm { }
    //    ^^^^^^^^   ^^^^^^^
    //    Celsius    Rpm      → "mismatched types" ❌

    // Bug #4 — COMPILE ERROR / 错误 #4 —— 编译错误:
    // log_temperature(voltage);
    //                 ^^^^^^^  Volts, expected Celsius ❌

    // Only correct comparisons compile / 只有正确的比较才能编译:
    if cpu_temp > Celsius(85.0) {
        println!("CPU overheating: {:?}", cpu_temp);
    }
    if fan_rpm < Rpm(4000) {
        println!("Fan too slow: {:?}", fan_rpm);
    }

    Ok(())
}

fn log_temperature(t: Celsius) { println!("Temp: {:?}", t); }
fn log_voltage(v: Volts)       { println!("Voltage: {:?}", v); }
}

Macro DSL for Diagnostic Scripts / 诊断脚本的宏 DSL

For large diagnostic routines that run many commands in sequence, a macro gives concise declarative syntax while preserving full type safety:

对于需要按顺序运行许多命令的大型诊断程序,宏可以提供简明的声明式语法,同时保留完整的类型安全性:

#![allow(unused)]
fn main() {
/// Execute a series of typed IPMI commands, returning a tuple of results.
/// Each element of the tuple has the command's own Response type.
/// 执行一系列类型化的 IPMI 命令,返回一个结果元组。
/// 元组的每个元素都具有命令自己的 Response 类型。
macro_rules! diag_script {
    ($bmc:expr; $($cmd:expr),+ $(,)?) => {{
        ( $( $bmc.execute(&$cmd)?, )+ )
    }};
}

fn full_pre_flight(bmc: &BmcConnection) -> io::Result<()> {
    // Expands to: (Celsius, Rpm, Volts, String) — every type tracked
    // 展开为:(Celsius, Rpm, Volts, String) —— 每个类型都被追踪
    let (temp, rpm, volts, board_pn) = diag_script!(bmc;
        ReadTemp     { sensor_id: 0x20 },
        ReadFanSpeed { fan_id:    0x30 },
        ReadVoltage  { rail:      0x40 },
        ReadFru      { fru_id:    0x00 },
    );

    println!("Board: {:?}", board_pn);
    println!("CPU: {:?}, Fan: {:?}, 12V: {:?}", temp, rpm, volts);

    // Type-safe threshold checks / 类型安全的阈值检查:
    assert!(temp  < Celsius(95.0), "CPU too hot");
    assert!(rpm   > Rpm(3000),     "Fan too slow");
    assert!(volts > Volts(11.4),   "12V rail sagging");

    Ok(())
}
}

The macro is just syntactic sugar: the tuple type (Celsius, Rpm, Volts, String) is fully inferred by the compiler. Swap two commands and the bindings take the swapped types, so the typed threshold checks that follow (for example temp < Celsius(95.0)) fail at compile time, not at runtime.

宏仅仅是语法糖:元组类型 (Celsius, Rpm, Volts, String) 完全由编译器推导。交换两个命令后,各绑定会得到交换后的类型,后续的类型化阈值检查(例如 temp < Celsius(95.0))将在编译时报错,而不是在运行时。

Enum Dispatch for Heterogeneous Command Lists / 异构命令列表的枚举分发

When you need a Vec of mixed commands (e.g., a configurable script loaded from JSON), use enum dispatch to stay dyn-free:

当你需要一个混合命令的 Vec(例如从 JSON 加载的可配置脚本)时,请使用枚举分发以保持无 dyn:

#![allow(unused)]
fn main() {
enum AnyReading {
    Temp(Celsius),
    Rpm(Rpm),
    Volt(Volts),
    Text(String),
}

enum AnyCmd {
    Temp(ReadTemp),
    Fan(ReadFanSpeed),
    Voltage(ReadVoltage),
    Fru(ReadFru),
}

impl AnyCmd {
    fn execute(&self, bmc: &BmcConnection) -> io::Result<AnyReading> {
        match self {
            AnyCmd::Temp(c)    => Ok(AnyReading::Temp(bmc.execute(c)?)),
            AnyCmd::Fan(c)     => Ok(AnyReading::Rpm(bmc.execute(c)?)),
            AnyCmd::Voltage(c) => Ok(AnyReading::Volt(bmc.execute(c)?)),
            AnyCmd::Fru(c)     => Ok(AnyReading::Text(bmc.execute(c)?)),
        }
    }
}

/// Dynamic diagnostic script — commands loaded at runtime
/// 动态诊断脚本 —— 在运行时加载命令
fn run_script(bmc: &BmcConnection, script: &[AnyCmd]) -> io::Result<Vec<AnyReading>> {
    script.iter().map(|cmd| cmd.execute(bmc)).collect()
}
}

You lose per-element type tracking (everything is AnyReading), but you gain runtime flexibility — and the parsing is still encapsulated in each IpmiCmd impl.

你失去了对每个元素的类型追踪(所有内容都是 AnyReading),但你获得了运行时的灵活性 —— 并且解析逻辑仍然封装在每个 IpmiCmd 实现中。
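Once readings are erased to AnyReading, a caller recovers the unit by matching on the variant. The sketch below is self-contained, so it repeats the newtypes. `within_limits` and its thresholds are illustrative assumptions, reusing the numbers from the pre-flight example.

```rust
#[derive(Debug, Clone, Copy, PartialEq, PartialOrd)]
struct Celsius(f64);
#[derive(Debug, Clone, Copy, PartialEq, PartialOrd)]
struct Rpm(u32);
#[derive(Debug, Clone, Copy, PartialEq, PartialOrd)]
struct Volts(f64);

#[derive(Debug)]
enum AnyReading {
    Temp(Celsius),
    Rpm(Rpm),
    Volt(Volts),
    Text(String),
}

/// Hypothetical limits check; the thresholds are illustrative only.
fn within_limits(reading: &AnyReading) -> bool {
    // Each arm sees the concrete newtype again, so a comparison against
    // the wrong unit would still be a compile error inside the arm.
    match reading {
        AnyReading::Temp(t) => *t < Celsius(95.0),
        AnyReading::Rpm(r) => *r > Rpm(3000),
        AnyReading::Volt(v) => *v > Volts(11.4),
        AnyReading::Text(s) => !s.is_empty(),
    }
}

fn main() {
    let readings = vec![
        AnyReading::Temp(Celsius(25.0)),
        AnyReading::Rpm(Rpm(6400)),
        AnyReading::Volt(Volts(12.008)),
        AnyReading::Text("BOARD-PN".to_string()),
    ];
    let failures: Vec<&AnyReading> = readings
        .iter()
        .filter(|r| !within_limits(r))
        .collect();
    println!("{} reading(s) out of limits: {:?}", failures.len(), failures);
}
```

The match is exhaustive, so adding a fifth variant to AnyReading forces every consumer like this one to handle it.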

Testing Typed Commands / 测试类型化命令

#![allow(unused)]
fn main() {
#[cfg(test)]
mod tests {
    use super::*;

    struct StubBmc {
        responses: std::collections::HashMap<u8, Vec<u8>>,
    }

    impl StubBmc {
        fn execute<C: IpmiCmd>(&self, cmd: &C) -> io::Result<C::Response> {
            let key = cmd.payload()[0]; // sensor ID as key / 传感器 ID 作为 key
            let raw = self.responses.get(&key)
                .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, "no stub"))?;
            cmd.parse_response(raw)
        }
    }

    #[test]
    fn read_temp_parses_signed_byte() {
        let bmc = StubBmc {
            responses: [( 0x20, vec![0xE7] )].into() // -25 as i8 = 0xE7
        };
        let temp = bmc.execute(&ReadTemp { sensor_id: 0x20 }).unwrap();
        assert_eq!(temp, Celsius(-25.0));
    }

    #[test]
    fn read_fan_parses_two_byte_le() {
        let bmc = StubBmc {
            responses: [( 0x30, vec![0x00, 0x19] )].into() // 0x1900 = 6400
        };
        let rpm = bmc.execute(&ReadFanSpeed { fan_id: 0x30 }).unwrap();
        assert_eq!(rpm, Rpm(6400));
    }

    #[test]
    fn read_voltage_scales_millivolts() {
        let bmc = StubBmc {
            responses: [( 0x40, vec![0xE8, 0x2E] )].into() // 0x2EE8 = 12008 mV
        };
        let v = bmc.execute(&ReadVoltage { rail: 0x40 }).unwrap();
        assert!((v.0 - 12.008).abs() < 0.001);
    }
}
}

Each command’s parsing is tested independently. If ReadFanSpeed changes from 2-byte LE to 4-byte BE in a new IPMI spec revision, you update one parse_response and the test catches regressions.

每个命令的解析都是独立测试的。如果在新的 IPMI 规范修订版中,ReadFanSpeed 从 2 字节小端序变为 4 字节大端序,你只需更新 一个 parse_response,测试即可捕获回归。

How This Maps to Haskell GADTs / 这与 Haskell GADT 的映射关系

| Haskell GADT | Rust Equivalent / Rust 等效实现 |
| --- | --- |
| data Cmd a where | trait IpmiCmd { type Response; ... } |
| ReadTemp :: SensorId -> Cmd Temp | struct ReadTemp { .. } |
| ReadFan :: FanId -> Cmd Rpm | struct ReadFanSpeed { .. } |
| eval :: Cmd a -> IO a | fn execute<C: IpmiCmd>(&self, cmd: &C) -> io::Result<C::Response> |
| Type refinement in case branches / case 分支中的类型细化 | Monomorphisation: compiler generates concrete versions / 单态化:编译器生成具体版本 |

Both guarantee: the command determines the return type. Rust achieves it through generic monomorphisation instead of type-level case analysis — same safety, zero runtime cost.

两者都保证:命令决定了返回类型。Rust 通过泛型单态化而不是类型级的 case 分析来实现这一点 —— 安全性相同,且零运行时成本。

Before vs After Summary / 修改前 vs 修改后总结

| Dimension / 维度 | Untyped (Vec<u8>) / 无类型 (Vec<u8>) | Typed Commands / 类型化命令 |
| --- | --- | --- |
| Lines per sensor / 每个传感器的代码行数 | ~3 (duplicated at every call site / 每个调用处重复) | ~15 (written and tested once / 编写和测试一次) |
| Parsing errors possible / 可能出现解析错误 | At every call site / 在每个调用处 | In one parse_response impl / 在一个 parse_response 实现中 |
| Unit confusion bugs / 单位混淆错误 | Unlimited / 无限制 | Zero (compile error) / 零(编译错误) |
| Adding a new sensor / 添加新传感器 | Touch N files, copy-paste parsing / 修改 N 个文件,复制粘贴解析逻辑 | Add 1 struct + 1 impl / 添加 1 个结构体 + 1 个实现 |
| Runtime cost / 运行时开销 | Identical / 相同 | Identical (monomorphised) / 相同(单态化) |
| IDE autocomplete / IDE 自动补全 | f64 everywhere | Celsius, Rpm, Volts (self-documenting) |
| Code review burden / 代码审查负担 | Must verify every raw byte parse / 必须核实每一个原始字节解析 | Verify one parse_response per sensor / 每个传感器核实一次 |
| Macro DSL | N/A | diag_script!(bmc; ReadTemp{..}, ReadFanSpeed{..}) → (Celsius, Rpm) |
| Dynamic scripts / 动态脚本 | Manual dispatch / 手动分发 | AnyCmd enum, still dyn-free / 仍然无 dyn |

When to Use Typed Commands / 何时使用类型化命令

| Scenario / 场景 | Recommendation / 建议 |
| --- | --- |
| IPMI sensor reads with distinct physical units / 具有不同物理单位的 IPMI 传感器读取 | ✅ Typed commands |
| Register map with different-width fields / 具有不同宽度字段的寄存器映射 | ✅ Typed commands |
| Network protocol messages (request → response) / 网络协议消息(请求 → 响应) | ✅ Typed commands |
| Single command type with one return format / 只有一种返回格式的单个命令类型 | ❌ Overkill: just return the type directly / 过度设计,直接返回类型即可 |
| Prototyping / exploring an unknown device / 原型设计 / 探索未知设备 | ❌ Raw bytes first, type later / 先使用原始字节,稍后再类型化 |
| Plugin system where commands aren’t known at compile time / 编译时未知的插件系统 | ⚠️ Use AnyCmd enum dispatch / 使用 AnyCmd 枚举分发 |

Key Takeaways — Traits / 核心要点 —— Trait

  • Associated types = one impl per type; generic parameters = many impls per type / 关联类型 = 每个类型一个实现;泛型参数 = 每个类型多个实现
  • GATs unlock lending iterators and async-in-traits patterns / GAT 开启了 lending iterator 和 async-in-traits 模式
  • Use enum dispatch for closed sets (fast); dyn Trait for open sets (flexible) / 对封闭集使用枚举分发(快);对开放集使用 dyn Trait(灵活)
  • Any + TypeId is the escape hatch when compile-time types are unknown / 当编译时类型未知时,Any + TypeId 是逃生舱
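The last takeaway can be made concrete with a short sketch of the escape hatch. It uses only `std::any`; `describe` and `is_exactly` are hypothetical helper names, and the `Celsius` newtype is repeated so the block stands alone.

```rust
use std::any::{Any, TypeId};

#[derive(Debug, PartialEq)]
struct Celsius(f64);

/// Describe a value whose concrete type is only known at runtime.
fn describe(value: &dyn Any) -> String {
    if let Some(c) = value.downcast_ref::<Celsius>() {
        format!("temperature {:?}", c)
    } else if let Some(n) = value.downcast_ref::<u32>() {
        format!("count {}", n)
    } else {
        String::from("unknown type")
    }
}

/// TypeId comparison: true when the erased value is exactly a `T`.
fn is_exactly<T: Any>(value: &dyn Any) -> bool {
    value.type_id() == TypeId::of::<T>()
}

fn main() {
    let items: Vec<Box<dyn Any>> = vec![
        Box::new(Celsius(25.0)),
        Box::new(42u32),
        Box::new("free text"),
    ];
    for item in &items {
        // `&**item` reborrows the boxed value as `&dyn Any`; calling
        // type_id() on the Box itself would report the Box's type instead.
        println!("{}", describe(&**item));
    }
    println!("{}", is_exactly::<u32>(&*items[1]));
}
```

Every downcast is a runtime check that can fail, which is why the enum-dispatch and `dyn Trait` options are listed first.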

See also / 另请参阅: Ch 1 — Generics for monomorphization and when generics cause code bloat. Ch 3 — Newtype & Type-State for using traits with the config trait pattern.

查看 Ch 1 —— 泛型 了解单态化以及泛型何时会导致代码膨胀。查看 Ch 3 —— Newtype 与类型状态模式 了解如何将 trait 与 config trait 模式结合使用。


Exercise: Repository with Associated Types ★★★ (~40 min) / 练习:具有关联类型的 Repository ★★★(约 40 分钟)

Design a Repository trait with associated Error, Id, and Item types. Implement it for an in-memory store and demonstrate compile-time type safety.

设计一个具有关联类型 Error、Id 和 Item 的 Repository trait。为一个内存存储(in-memory store)实现它,并演示编译时类型安全性。

🔑 Solution / 🔑 解决方案
use std::collections::HashMap;

trait Repository {
    type Item;
    type Id;
    type Error;

    fn get(&self, id: &Self::Id) -> Result<Option<&Self::Item>, Self::Error>;
    fn insert(&mut self, item: Self::Item) -> Result<Self::Id, Self::Error>;
    fn delete(&mut self, id: &Self::Id) -> Result<bool, Self::Error>;
}

#[derive(Debug, Clone)]
struct User {
    name: String,
    email: String,
}

struct InMemoryUserRepo {
    data: HashMap<u64, User>,
    next_id: u64,
}

impl InMemoryUserRepo {
    fn new() -> Self {
        InMemoryUserRepo { data: HashMap::new(), next_id: 1 }
    }
}

impl Repository for InMemoryUserRepo {
    type Item = User;
    type Id = u64;
    type Error = std::convert::Infallible;

    fn get(&self, id: &u64) -> Result<Option<&User>, Self::Error> {
        Ok(self.data.get(id))
    }

    fn insert(&mut self, item: User) -> Result<u64, Self::Error> {
        let id = self.next_id;
        self.next_id += 1;
        self.data.insert(id, item);
        Ok(id)
    }

    fn delete(&mut self, id: &u64) -> Result<bool, Self::Error> {
        Ok(self.data.remove(id).is_some())
    }
}

fn create_and_fetch<R: Repository>(repo: &mut R, item: R::Item) -> Result<(), R::Error>
where
    R::Item: std::fmt::Debug,
    R::Id: std::fmt::Debug,
{
    let id = repo.insert(item)?;
    println!("Inserted with id: {id:?}");
    let retrieved = repo.get(&id)?;
    println!("Retrieved: {retrieved:?}");
    Ok(())
}

fn main() {
    let mut repo = InMemoryUserRepo::new();
    create_and_fetch(&mut repo, User {
        name: "Alice".into(),
        email: "alice@example.com".into(),
    }).unwrap();
}

3. The Newtype and Type-State Patterns / 3. Newtype 与类型状态 (Type-State) 模式 🟡

What you’ll learn / 你将学到:

  • The newtype pattern for zero-cost compile-time type safety / 用于零成本编译时类型安全性的 Newtype 模式
  • Type-state pattern: making illegal state transitions unrepresentable / 类型状态模式:使非法的状态转换变得无法表示
  • Builder pattern with type states for compile-time–enforced construction / 结合类型状态的 Builder 模式,用于编译时强制执行的构建过程
  • Config trait pattern for taming generic parameter explosion / 用于治理泛型参数爆炸的 Config trait 模式

Newtype: Zero-Cost Type Safety / Newtype:零成本类型安全

The newtype pattern wraps an existing type in a single-field tuple struct to create a distinct type with zero runtime overhead:

Newtype 模式将现有类型包装在单字段元组结构体中,以创建一种具有零运行时开销的独特类型:

#![allow(unused)]
fn main() {
// Without newtypes — easy to mix up:
// 不使用 Newtype —— 很容易混淆:
// (Named create_user_untyped so it can share a scope with the newtype version below.)
fn create_user_untyped(name: String, email: String, age: u32, employee_id: u32) { }
// create_user_untyped(name, email, age, id);  // intended call
// create_user_untyped(name, email, id, age);  // age and id swapped: COMPILES FINE, BUG
// 如果交换了 age 和 id,代码仍能正常编译,但存在 BUG

// With newtypes — the compiler catches mistakes:
// 使用 Newtype —— 编译器会捕获错误:
struct UserName(String);
struct Email(String);
struct Age(u32);
struct EmployeeId(u32);

fn create_user(name: UserName, email: Email, age: Age, id: EmployeeId) { }
// create_user(name, email, EmployeeId(42), Age(30));
// ❌ Compile error: expected Age, got EmployeeId
// ❌ 编译错误:期望 Age 类型,得到的是 EmployeeId 类型
}
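The "zero runtime overhead" claim can be checked directly with `std::mem::size_of`. In this sketch the `#[repr(transparent)]` attribute is an addition not used in the chapter's code; it guarantees that the wrapper has the same layout as its single field. The `describe` function is a hypothetical consumer added for illustration.

```rust
use std::mem::size_of;

// repr(transparent) guarantees the same layout as the wrapped field.
#[repr(transparent)]
struct Age(u32);

#[repr(transparent)]
struct EmployeeId(u32);

/// Hypothetical consumer: the two u32 arguments cannot be swapped by accident.
fn describe(age: Age, id: EmployeeId) -> String {
    format!("employee {} is {} years old", id.0, age.0)
}

fn main() {
    // The wrappers occupy exactly as many bytes as the u32 they hold.
    assert_eq!(size_of::<Age>(), size_of::<u32>());
    assert_eq!(size_of::<EmployeeId>(), size_of::<u32>());

    println!("{}", describe(Age(30), EmployeeId(42)));
    // describe(EmployeeId(42), Age(30)); // does not compile: mismatched types
}
```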

impl Deref for Newtypes — Power and Pitfalls / 为 Newtype 实现 Deref —— 威力与陷阱

Implementing Deref on a newtype lets it auto-coerce to the inner type’s reference, giving you all of the inner type’s methods “for free”:

在 Newtype 上实现 Deref 可以让它自动强制转换为内部类型的引用,从而让你“免费”获得内部类型的所有方法:

#![allow(unused)]
fn main() {
use std::ops::Deref;

struct Email(String);

impl Email {
    fn new(raw: &str) -> Result<Self, &'static str> {
        if raw.contains('@') {
            Ok(Email(raw.to_string()))
        } else {
            Err("invalid email: missing @")
        }
    }
}

impl Deref for Email {
    type Target = str;
    fn deref(&self) -> &str { &self.0 }
}

// Now Email auto-derefs to &str:
// 现在 Email 会自动解引用为 &str:
let email = Email::new("user@example.com").unwrap();
println!("Length: {}", email.len()); // Uses str::len via Deref / 通过 Deref 使用 str::len
}

This is convenient — but it effectively punches a hole through your newtype’s abstraction boundary because every method on the target type becomes callable on your wrapper.

这很方便 —— 但它实际上在你的 Newtype 抽象边界上 打了一个洞,因为目标类型上的 每一个 方法在你的包装类型上都变得可调用了。

When Deref IS appropriate / 何时使用 Deref 是合适的

| Scenario / 场景 | Example / 示例 | Why it’s fine / 为什么没问题 |
| --- | --- | --- |
| Smart-pointer wrappers / 智能指针包装器 | Box<T>, Arc<T>, MutexGuard<T> | The wrapper’s whole purpose is to behave like T / 包装器的全部目的就是表现得像 T |
| Transparent “thin” wrappers / 透明的“薄”包装器 | String → str, PathBuf → Path, Vec<T> → [T] | The wrapper IS-A superset of the target / 包装器是目标类型的超集 |
| Your newtype genuinely IS the inner type / 你的 Newtype 确实就是内部类型 | struct Hostname(String) where you always want full string ops | Restricting the API would add no value / 限制 API 不会带来任何价值 |

When Deref is an anti-pattern / 何时 Deref 是一种反模式

| Scenario / 场景 | Problem / 问题 |
| --- | --- |
| Domain types with invariants / 具有不变性的领域类型 | Email derefs to &str, so callers can call .split_at(), .trim(), etc., none of which preserve the “must contain @” invariant. If someone stores the trimmed &str and reconstructs, the invariant is lost. / Email 解引用为 &str,因此调用者可以调用 .split_at()、.trim() 等,这些都无法保证“必须包含 @”的不变性。如果有人存储了裁剪后的 &str 并重新构建,不变性就会丢失。 |
| Types where you want a restricted API / 想要限制 API 的类型 | struct Password(String) with Deref<Target = str> leaks .as_bytes(), .chars(), Debug output: exactly what you’re trying to hide. / struct Password(String) 如果实现了 Deref<Target = str>,会泄露 .as_bytes()、.chars()、Debug 输出,而这正是你想要隐藏的。 |
| Fake inheritance / 伪继承 | Using Deref to make ManagerWidget auto-deref to Widget simulates OOP inheritance. This is explicitly discouraged; see the Rust API Guidelines (C-DEREF). / 使用 Deref 让 ManagerWidget 自动解引用为 Widget 以模拟 OOP 继承。这在 Rust 中是明确不鼓励的,参见《Rust API 指南》(C-DEREF)。 |

Rule of thumb: If your newtype exists to add type safety or restrict the API, don’t implement Deref. If it exists to add capabilities while keeping the inner type’s full surface (like a smart pointer), Deref is the right choice.

经验法则:如果你的 Newtype 存在是为了增加类型安全或限制 API,请不要实现 Deref。如果它的存在是为了在保持内部类型完整表面的同时增加能力(如智能指针),那么 Deref 是正确的选择。

DerefMut — doubles the risk / DerefMut —— 双重风险

If you also implement DerefMut, callers can mutate the inner value directly, bypassing any validation in your constructors:

如果你还实现了 DerefMut,调用者可以直接 修改 内部值,绕过构造函数中的任何验证:

#![allow(unused)]
fn main() {
use std::ops::{Deref, DerefMut};

struct PortNumber(u16);

impl Deref for PortNumber {
    type Target = u16;
    fn deref(&self) -> &u16 { &self.0 }
}

impl DerefMut for PortNumber {
    fn deref_mut(&mut self) -> &mut u16 { &mut self.0 }
}

let mut port = PortNumber(443);
*port = 0; // Bypasses any validation — now an invalid port
           // 绕过了任何验证 —— 现在是一个无效的端口
}

Only implement DerefMut when the inner type has no invariants to protect.

仅当内部类型没有不变性需要保护时,才实现 DerefMut。
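When the inner value does carry an invariant, mutation can go through a method that re-runs the validation. A minimal sketch follows, assuming for illustration that port 0 is the invalid value. The methods `new`, `get`, and `set` are hypothetical names, and in real code the type would live in its own module so the field stays private to callers.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct PortNumber(u16);

impl PortNumber {
    /// Assumed invariant for this sketch: port 0 is not allowed.
    fn new(raw: u16) -> Result<Self, &'static str> {
        if raw == 0 {
            Err("port must be non-zero")
        } else {
            Ok(PortNumber(raw))
        }
    }

    /// Read access by value; no reference to the inner field escapes.
    fn get(self) -> u16 {
        self.0
    }

    /// Mutation re-runs validation; on error the old value is kept.
    fn set(&mut self, raw: u16) -> Result<(), &'static str> {
        *self = PortNumber::new(raw)?;
        Ok(())
    }
}

fn main() {
    let mut port = PortNumber::new(443).expect("443 is valid");
    println!("{:?}", port.set(0)); // rejected by validation
    println!("{}", port.get());    // value left unchanged by the failed set
}
```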

Prefer explicit delegation instead / 优先选择显式委托

When you want only some of the inner type’s methods, delegate explicitly:

当你只想使用内部类型的 某些 方法时,请进行显式委托:

#![allow(unused)]
fn main() {
struct Email(String);

impl Email {
    fn new(raw: &str) -> Result<Self, &'static str> {
        if raw.contains('@') { Ok(Email(raw.to_string())) }
        else { Err("missing @") }
    }

    // Expose only what makes sense:
    // 仅暴露有意义的部分:
    pub fn as_str(&self) -> &str { &self.0 }
    pub fn len(&self) -> usize { self.0.len() }
    pub fn domain(&self) -> &str {
        self.0.split('@').nth(1).unwrap_or("")
    }
    // .split_at(), .trim(), .replace() — NOT exposed
    // .split_at(), .trim(), .replace() —— 不暴露
}
}

Clippy and the ecosystem / Clippy 与生态系统

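  • Method resolution through Deref can surprise you: if you meant to shadow is_empty() on the wrapper but never defined it, calls silently resolve to the inner type’s version. Do not rely on clippy::wrong_self_convention to catch this; that lint only checks that method-name prefixes such as as_, to_, into_, and is_ match the kind of self the method takes. / 通过 Deref 进行的方法解析可能出人意料:如果你本想在包装类型上遮蔽 is_empty() 却没有定义它,调用会悄悄解析为内部类型的版本。不要指望 clippy::wrong_self_convention 能发现这一点;该 lint 只检查 as_、to_、into_、is_ 等方法名前缀与方法接收的 self 形式是否匹配。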
  • The Rust API Guidelines (C-DEREF) state: “only smart pointers should implement Deref.” Treat this as a strong default; deviate only with clear justification. / 《Rust API 指南》(C-DEREF) 指出:“只有智能指针应该实现 Deref。” 请将其视为一个强有力的默认规则;仅在有明确理由时才偏离它。
  • If you need trait compatibility (e.g., passing Email to functions expecting &str), consider implementing AsRef<str> and Borrow<str> instead; they’re explicit conversions without auto-coercion surprises. / 如果你需要 trait 兼容性(例如,将 Email 传递给期望 &str 的函数),请考虑实现 AsRef<str> 和 Borrow<str>;它们是显式转换,没有自动强制转换带来的意外。
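A minimal sketch of the `AsRef<str>` and `Borrow<str>` route for the Email newtype follows. `char_count` is a hypothetical consumer introduced for illustration. `Borrow` additionally requires that `Eq` and `Hash` behave the same on the wrapper and on the borrowed form; deriving them on a single String field satisfies that here.

```rust
use std::borrow::Borrow;
use std::collections::HashSet;

#[derive(Debug, PartialEq, Eq, Hash)]
struct Email(String);

impl Email {
    fn new(raw: &str) -> Result<Self, &'static str> {
        if raw.contains('@') {
            Ok(Email(raw.to_string()))
        } else {
            Err("missing @")
        }
    }
}

// Explicit, opt-in view as a string slice; no auto-deref involved.
impl AsRef<str> for Email {
    fn as_ref(&self) -> &str {
        &self.0
    }
}

// Lets hashed collections keyed by Email be queried with a plain &str.
impl Borrow<str> for Email {
    fn borrow(&self) -> &str {
        &self.0
    }
}

/// Hypothetical consumer that accepts anything viewable as a string slice.
fn char_count(s: impl AsRef<str>) -> usize {
    s.as_ref().chars().count()
}

fn main() {
    let email = Email::new("user@example.com").expect("valid address");
    println!("{}", char_count(&email));

    let mut seen: HashSet<Email> = HashSet::new();
    seen.insert(email);
    // Lookup by &str, without constructing a second Email.
    println!("{}", seen.contains("user@example.com"));
}
```

Unlike Deref, neither trait makes str methods callable on Email directly; the caller has to ask for the conversion.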

Decision matrix / 决策矩阵

Do you want ALL methods of the inner type to be callable? / 你是否希望内部类型的所有方法都可调用?
  ├─ YES (是) → Does your type enforce invariants or restrict the API? / 你的类型是否强制执行不变性或限制 API?
  │    ├─ NO (否)  → impl Deref ✅  (smart-pointer / transparent wrapper) / (智能指针 / 透明包装器)
  │    └─ YES (是) → Don't impl Deref ❌ (invariant leaks) / 请勿实现 Deref (不变性泄露)
  └─ NO (否)  → Don't impl Deref ❌  (use AsRef / explicit delegation) / (使用 AsRef / 显式委托)

Type-State: Compile-Time Protocol Enforcement / 类型状态 (Type-State):编译时协议强制执行

The type-state pattern uses the type system to enforce that operations happen in the correct order. Invalid states become unrepresentable.

类型状态模式利用类型系统来强制操作按正确的顺序发生。无效的状态变得 无法表示

stateDiagram-v2
    [*] --> Disconnected: new()
    Disconnected --> Connected: connect()
    Connected --> Authenticated: authenticate()
    Authenticated --> Authenticated: request()
    Authenticated --> [*]: drop

    Disconnected --> Disconnected: ❌ request() won't compile
    Connected --> Connected: ❌ request() won't compile

Each transition consumes self and returns a new type — the compiler enforces valid ordering.

每次转换都会 消耗 self 并返回一个新类型 —— 编译器强制执行有效的顺序。

// Problem: A network connection that must be:
// 1. Created
// 2. Connected
// 3. Authenticated
// 4. Then used for requests
// Calling request() before authenticate() should be a COMPILE error.

// 问题:一个网络连接必须按以下顺序操作:
// 1. 创建 (Created)
// 2. 连接 (Connected)
// 3. 认证 (Authenticated)
// 4. 然后用于请求 (Requests)
// 在认证之前调用 request() 应该是一个编译错误。

// --- Type-state markers (zero-sized types) ---
// --- 类型状态标记(零大小类型)---
struct Disconnected;
struct Connected;
struct Authenticated;

// --- Connection parameterized by state ---
// --- 由状态参数化的连接 ---
struct Connection<State> {
    address: String,
    _state: std::marker::PhantomData<State>,
}

// Only Disconnected connections can connect:
// 只有处于“断开连接”状态的连接才能进行连接:
impl Connection<Disconnected> {
    fn new(address: &str) -> Self {
        Connection {
            address: address.to_string(),
            _state: std::marker::PhantomData,
        }
    }

    fn connect(self) -> Connection<Connected> {
        println!("Connecting to {}...", self.address);
        Connection {
            address: self.address,
            _state: std::marker::PhantomData,
        }
    }
}

// Only Connected connections can authenticate:
// 只有处于“已连接”状态的连接才能进行认证:
impl Connection<Connected> {
    fn authenticate(self, _token: &str) -> Connection<Authenticated> {
        println!("Authenticating...");
        Connection {
            address: self.address,
            _state: std::marker::PhantomData,
        }
    }
}

// Only Authenticated connections can make requests:
// 只有处于“已认证”状态的连接才能发送请求:
impl Connection<Authenticated> {
    fn request(&self, path: &str) -> String {
        format!("GET {} from {}", path, self.address)
    }
}

fn main() {
    let conn = Connection::new("api.example.com");
    // conn.request("/data"); // ❌ Compile error: no method `request` on Connection<Disconnected>

    let conn = conn.connect();
    // conn.request("/data"); // ❌ Compile error: no method `request` on Connection<Connected>

    let conn = conn.authenticate("secret-token");
    let response = conn.request("/data"); // ✅ Only works after authentication
    println!("{response}");
}

Key insight: Each state transition consumes self and returns a new type. You can’t use the old state after transitioning — the compiler enforces it. Zero runtime cost — PhantomData is zero-sized, states are erased at compile time.

核心见解:每次状态转换都会 消耗 self 并返回一个新类型。转换后你无法再使用旧状态 —— 编译器会对此进行强制检查。零运行时成本 —— PhantomData 是零大小的,状态在编译时会被擦除。
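
The zero-cost claim can be checked directly with std::mem::size_of. This is a small sketch that repeats a trimmed copy of the Connection type; layout_report is a hypothetical helper name. Rust does not formally guarantee the layout of a default-repr struct, so treat the equality as an observation about current compilers rather than a language rule.

```rust
use std::marker::PhantomData;
use std::mem::size_of;

struct Disconnected;
struct Authenticated;

struct Connection<State> {
    address: String,
    _state: PhantomData<State>,
}

/// Returns (size of the bare payload, size in one state, size in another).
fn layout_report() -> (usize, usize, usize) {
    (
        size_of::<String>(),
        size_of::<Connection<Disconnected>>(),
        size_of::<Connection<Authenticated>>(),
    )
}
```

All three numbers are expected to match: the state marker changes which methods exist, not what is stored.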

Comparison with C++/C#: In C++ or C#, you’d enforce this with runtime checks (if (!authenticated) throw ...). The Rust type-state pattern moves these checks to compile time — invalid states are literally unrepresentable in the type system.

与 C++/C# 的比较:在 C++ 或 C# 中,你会通过运行时检查(如 if (!authenticated) throw ...)来强制执行此操作。Rust 的类型状态模式将这些检查移到了编译时 —— 在类型系统中,无效状态在字面上就是无法表示的。
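
The connect() above always succeeds, which real I/O does not. One way to keep the typestate guarantees with fallible transitions, sketched here as an assumption rather than as this chapter's method, is to return a Result whose error variant hands the old state back so the caller can retry. ConnectError, with_address, connect_with_fallback, and the "empty address fails" rule are all hypothetical stand-ins.

```rust
use std::marker::PhantomData;

struct Disconnected;
struct Connected;

struct Connection<State> {
    address: String,
    attempts: u32,
    _state: PhantomData<State>,
}

#[derive(Debug, PartialEq)]
struct ConnectError(String);

impl Connection<Disconnected> {
    fn new(address: &str) -> Self {
        Connection { address: address.to_string(), attempts: 0, _state: PhantomData }
    }

    fn with_address(mut self, address: &str) -> Self {
        self.address = address.to_string();
        self
    }

    /// On failure the Disconnected value is returned alongside the error,
    /// so ownership is not lost and the caller may try again.
    fn connect(mut self) -> Result<Connection<Connected>, (Self, ConnectError)> {
        self.attempts += 1;
        if self.address.is_empty() {
            // Stand-in for a real socket error.
            let err = ConnectError("empty address".to_string());
            return Err((self, err));
        }
        Ok(Connection { address: self.address, attempts: self.attempts, _state: PhantomData })
    }
}

impl Connection<Connected> {
    fn peer(&self) -> &str {
        &self.address
    }
    fn attempts(&self) -> u32 {
        self.attempts
    }
}

/// Tries `primary`, then reuses the same value to try `fallback`.
fn connect_with_fallback(
    primary: &str,
    fallback: &str,
) -> Result<Connection<Connected>, ConnectError> {
    match Connection::new(primary).connect() {
        Ok(conn) => Ok(conn),
        Err((conn, _first_error)) => conn
            .with_address(fallback)
            .connect()
            .map_err(|(_, err)| err),
    }
}
```

The compile-time ordering is unchanged: a Connection<Connected> still only exists after a successful connect(); only the failure path is decided at runtime.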

Builder Pattern with Type States / 结合类型状态的 Builder 模式

A practical application — a builder that enforces required fields:

一个实际应用 —— 强制要求填写必填字段的 Builder:

use std::marker::PhantomData;

// Marker types for required fields
// 必填字段的标记类型
struct NeedsName;
struct NeedsPort;
struct Ready;

struct ServerConfig<State> {
    name: Option<String>,
    port: Option<u16>,
    max_connections: usize, // Optional, has default / 可选,有默认值
    _state: PhantomData<State>,
}

impl ServerConfig<NeedsName> {
    fn new() -> Self {
        ServerConfig {
            name: None,
            port: None,
            max_connections: 100,
            _state: PhantomData,
        }
    }

    fn name(self, name: &str) -> ServerConfig<NeedsPort> {
        ServerConfig {
            name: Some(name.to_string()),
            port: self.port,
            max_connections: self.max_connections,
            _state: PhantomData,
        }
    }
}

impl ServerConfig<NeedsPort> {
    fn port(self, port: u16) -> ServerConfig<Ready> {
        ServerConfig {
            name: self.name,
            port: Some(port),
            max_connections: self.max_connections,
            _state: PhantomData,
        }
    }
}

impl ServerConfig<Ready> {
    fn max_connections(mut self, n: usize) -> Self {
        self.max_connections = n;
        self
    }

    fn build(self) -> Server {
        Server {
            name: self.name.unwrap(),
            port: self.port.unwrap(),
            max_connections: self.max_connections,
        }
    }
}

struct Server {
    name: String,
    port: u16,
    max_connections: usize,
}

fn main() {
    // Must provide name, then port, then can build:
    // 必须提供名称,然后是端口,最后才能构建:
    let server = ServerConfig::new()
        .name("my-server")
        .port(8080)
        .max_connections(500)
        .build();

    // ServerConfig::new().port(8080); // ❌ Compile error: no method `port` on NeedsName
    // ServerConfig::new().name("x").build(); // ❌ Compile error: no method `build` on NeedsPort
}
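
The builder above stores required fields as Option and calls unwrap() in build(); the unwraps cannot fail, but the compiler does not know that. A variation, sketched here under the assumption that order-independent setters are acceptable, puts the values in the type parameters themselves so build() needs no unwrap. The slot types Missing, Name, and Port are hypothetical.

```rust
// Field slots: a slot is either the empty marker or a wrapper holding the value.
struct Missing;
struct Name(String);
struct Port(u16);

struct ServerConfig<N, P> {
    name: N,
    port: P,
    max_connections: usize,
}

#[derive(Debug, PartialEq)]
struct Server {
    name: String,
    port: u16,
    max_connections: usize,
}

impl ServerConfig<Missing, Missing> {
    fn new() -> Self {
        ServerConfig { name: Missing, port: Missing, max_connections: 100 }
    }
}

// Setters exist in every state, so required fields may arrive in any order.
// Calling a setter twice simply overwrites the earlier value.
impl<N, P> ServerConfig<N, P> {
    fn name(self, name: &str) -> ServerConfig<Name, P> {
        ServerConfig {
            name: Name(name.to_string()),
            port: self.port,
            max_connections: self.max_connections,
        }
    }

    fn port(self, port: u16) -> ServerConfig<N, Port> {
        ServerConfig {
            name: self.name,
            port: Port(port),
            max_connections: self.max_connections,
        }
    }

    fn max_connections(mut self, n: usize) -> Self {
        self.max_connections = n;
        self
    }
}

// build() exists only once both slots are filled, and it reads the
// values directly: there is no Option to unwrap.
impl ServerConfig<Name, Port> {
    fn build(self) -> Server {
        Server {
            name: self.name.0,
            port: self.port.0,
            max_connections: self.max_connections,
        }
    }
}
```

The trade-off is one type parameter per required field, which is fine for two or three fields and gets noisy beyond that.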


Case Study: Type-Safe Connection Pool / 案例研究:类型安全的连接池

Real-world systems need connection pools where connections move through well-defined states. Here’s how the typestate pattern enforces correctness in a production pool:

现实世界的系统需要连接池,其中的连接会在定义良好的状态之间移动。以下是类型状态 (Type-state) 模式如何在生产环境的连接池中强制执行正确性:

stateDiagram-v2
    [*] --> Idle: pool.acquire()
    Idle --> Active: conn.begin_transaction()
    Active --> Active: conn.execute(query)
    Active --> Idle: conn.commit() / conn.rollback()
    Idle --> [*]: pool.release(conn)

    Active --> [*]: ❌ cannot release mid-transaction

use std::marker::PhantomData;

// States / 状态
struct Idle;
struct InTransaction;

struct PooledConnection<State> {
    id: u32,
    _state: PhantomData<State>,
}

struct Pool {
    next_id: u32,
}

impl Pool {
    fn new() -> Self { Pool { next_id: 0 } }

    fn acquire(&mut self) -> PooledConnection<Idle> {
        self.next_id += 1;
        println!("[pool] Acquired connection #{}", self.next_id);
        PooledConnection { id: self.next_id, _state: PhantomData }
    }

    // Only idle connections can be released — prevents mid-transaction leaks
    // 只有空闲连接可以被释放 —— 防止事务中途泄露
    fn release(&self, conn: PooledConnection<Idle>) {
        println!("[pool] Released connection #{}", conn.id);
    }
}

impl PooledConnection<Idle> {
    fn begin_transaction(self) -> PooledConnection<InTransaction> {
        println!("[conn #{}] BEGIN", self.id);
        PooledConnection { id: self.id, _state: PhantomData }
    }
}

impl PooledConnection<InTransaction> {
    fn execute(&self, query: &str) {
        println!("[conn #{}] EXEC: {}", self.id, query);
    }

    fn commit(self) -> PooledConnection<Idle> {
        println!("[conn #{}] COMMIT", self.id);
        PooledConnection { id: self.id, _state: PhantomData }
    }

    fn rollback(self) -> PooledConnection<Idle> {
        println!("[conn #{}] ROLLBACK", self.id);
        PooledConnection { id: self.id, _state: PhantomData }
    }
}

fn main() {
    let mut pool = Pool::new();

    let conn = pool.acquire();
    let conn = conn.begin_transaction();
    conn.execute("INSERT INTO users VALUES ('Alice')");
    conn.execute("INSERT INTO orders VALUES (1, 42)");
    let conn = conn.commit(); // Back to Idle / 回到空闲状态
    pool.release(conn);       // ✅ Only works on Idle connections / ✅ 仅适用于空闲连接

    // pool.release(conn_active); // ❌ Compile error: can't release InTransaction
    // ❌ 编译错误:无法释放处于事务中的连接
}

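Why this matters in production: A connection returned to the pool mid-transaction hands the next caller an open transaction and keeps its database locks held. The typestate pattern rules this out: release() only accepts PooledConnection<Idle>, so a connection cannot go back to the pool until the transaction is committed or rolled back. It does not stop code from simply dropping an InTransaction connection; that case needs separate handling, such as a Drop impl that rolls back.

为什么这在生产环境中很重要:在事务中途被归还到池中的连接,会把一个未结束的事务交给下一个调用者,并继续持有数据库锁。类型状态模式排除了这种情况:release() 只接受 PooledConnection<Idle>,因此在事务提交或回滚之前,连接无法被归还到池中。但它并不能阻止代码直接丢弃 (drop) 一个 InTransaction 连接;这种情况需要单独处理,例如实现一个在 drop 时回滚的 Drop。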
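
Typestate controls which calls are allowed, but not what happens when a value is dropped. A possible complement, sketched here as an assumption and not as part of the pool above, is a Drop impl driven by an associated constant on the state marker. The ConnState trait, the transition helper, the log field (a stand-in for the wire protocol), and run are all hypothetical.

```rust
use std::cell::RefCell;
use std::marker::PhantomData;
use std::mem;

/// Each state declares whether dropping a connection in that state needs a rollback.
trait ConnState {
    const ROLLBACK_ON_DROP: bool;
}
struct Idle;
struct InTransaction;
impl ConnState for Idle {
    const ROLLBACK_ON_DROP: bool = false;
}
impl ConnState for InTransaction {
    const ROLLBACK_ON_DROP: bool = true;
}

struct PooledConnection<'a, S: ConnState> {
    id: u32,
    log: &'a RefCell<Vec<String>>, // stand-in for the database connection
    _state: PhantomData<S>,
}

impl<'a, S: ConnState> PooledConnection<'a, S> {
    /// Swap the state marker without running Drop for the old state.
    fn transition<T: ConnState>(self) -> PooledConnection<'a, T> {
        let (id, log) = (self.id, self.log);
        mem::forget(self); // both fields are Copy, so nothing is leaked
        PooledConnection { id, log, _state: PhantomData }
    }
}

impl<'a, S: ConnState> Drop for PooledConnection<'a, S> {
    fn drop(&mut self) {
        if S::ROLLBACK_ON_DROP {
            self.log.borrow_mut().push(format!("#{} ROLLBACK (dropped)", self.id));
        }
    }
}

impl<'a> PooledConnection<'a, Idle> {
    fn new(id: u32, log: &'a RefCell<Vec<String>>) -> Self {
        PooledConnection { id, log, _state: PhantomData }
    }
    fn begin_transaction(self) -> PooledConnection<'a, InTransaction> {
        self.log.borrow_mut().push(format!("#{} BEGIN", self.id));
        self.transition()
    }
}

impl<'a> PooledConnection<'a, InTransaction> {
    fn commit(self) -> PooledConnection<'a, Idle> {
        self.log.borrow_mut().push(format!("#{} COMMIT", self.id));
        self.transition()
    }
}

/// Runs one transaction and returns what was logged.
fn run(commit: bool) -> Vec<String> {
    let log = RefCell::new(Vec::new());
    {
        let tx = PooledConnection::new(1, &log).begin_transaction();
        if commit {
            let _idle = tx.commit();
        }
        // When `commit` is false, `tx` is dropped here while still InTransaction.
    }
    log.into_inner()
}
```

Because transitions consume self, they must bypass Drop for the old state (here with mem::forget); otherwise commit() would itself trigger the rollback path.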


Config Trait Pattern — Taming Generic Parameter Explosion / Config Trait 模式 —— 治理泛型参数爆炸

The Problem / 问题

As a struct takes on more responsibilities, each backed by a trait-constrained generic, the type signature grows unwieldy:

随着结构体承担更多职责,而每个职责都由受 trait 约束的泛型支持,其类型签名会变得难以管理:

#![allow(unused)]
fn main() {
trait SpiBus   { fn spi_transfer(&self, tx: &[u8], rx: &mut [u8]) -> Result<(), BusError>; }
trait ComPort  { fn com_send(&self, data: &[u8]) -> Result<usize, BusError>; }
trait I3cBus   { fn i3c_read(&self, addr: u8, buf: &mut [u8]) -> Result<(), BusError>; }
trait SmBus    { fn smbus_read_byte(&self, addr: u8, cmd: u8) -> Result<u8, BusError>; }
trait GpioBus  { fn gpio_set(&self, pin: u32, high: bool); }

// ❌ Every new bus trait adds another generic parameter
// ❌ 每增加一个新的总线 trait 都会增加另一个泛型参数
struct DiagController<S: SpiBus, C: ComPort, I: I3cBus, M: SmBus, G: GpioBus> {
    spi: S,
    com: C,
    i3c: I,
    smbus: M,
    gpio: G,
}
// impl blocks, function signatures, and callers all repeat the full list.
// Adding a 6th bus means editing every mention of DiagController<S, C, I, M, G>.

// impl 块、函数签名和调用者都必须重复完整的列表。
// 添加第 6 个总线意味着要修改每一处提到的 DiagController<S, C, I, M, G>。
}

This is often called “generic parameter explosion.” It compounds across impl blocks, function parameters, and downstream consumers — each of which must repeat the full parameter list.

这通常被称为 “泛型参数爆炸”。它会在 impl 块、函数参数和下游消费者之间产生复合影响 —— 每一个都必须重复完整的参数列表。

The Solution: A Config Trait / 解决方案:Config Trait

Bundle all associated types into a single trait. The struct then has one generic parameter regardless of how many component types it contains:

将所有关联类型捆绑到一个 trait 中。这样,无论包含多少个组件类型,结构体都只有 一个 泛型参数:

#![allow(unused)]
fn main() {
#[derive(Debug)]
enum BusError {
    Timeout,
    NakReceived,
    HardwareFault(String),
}

// --- Bus traits (unchanged) ---
// --- 总线 trait(保持不变)---
trait SpiBus {
    fn spi_transfer(&self, tx: &[u8], rx: &mut [u8]) -> Result<(), BusError>;
    fn spi_write(&self, data: &[u8]) -> Result<(), BusError>;
}

trait ComPort {
    fn com_send(&self, data: &[u8]) -> Result<usize, BusError>;
    fn com_recv(&self, buf: &mut [u8], timeout_ms: u32) -> Result<usize, BusError>;
}

trait I3cBus {
    fn i3c_read(&self, addr: u8, buf: &mut [u8]) -> Result<(), BusError>;
    fn i3c_write(&self, addr: u8, data: &[u8]) -> Result<(), BusError>;
}

// --- The Config trait: one associated type per component ---
// --- Config trait:每个组件一个关联类型 ---
trait BoardConfig {
    type Spi: SpiBus;
    type Com: ComPort;
    type I3c: I3cBus;
}

// --- DiagController has exactly ONE generic parameter ---
// --- DiagController 只有唯一一个泛型参数 ---
struct DiagController<Cfg: BoardConfig> {
    spi: Cfg::Spi,
    com: Cfg::Com,
    i3c: Cfg::I3c,
}
}

DiagController<Cfg> will never gain another generic parameter. Adding a 4th bus means adding one associated type to BoardConfig and one field to DiagController — no downstream signature changes.

DiagController<Cfg> 永远不会再增加多余的泛型参数。添加第 4 个总线只需在 BoardConfig 中添加一个关联类型,并在 DiagController 中添加一个字段 —— 下游签名无需任何改动。

Implementing the Controller / 实现控制器

#![allow(unused)]
fn main() {
impl<Cfg: BoardConfig> DiagController<Cfg> {
    fn new(spi: Cfg::Spi, com: Cfg::Com, i3c: Cfg::I3c) -> Self {
        DiagController { spi, com, i3c }
    }

    fn read_flash_id(&self) -> Result<u32, BusError> {
        let cmd = [0x9F]; // JEDEC Read ID
        let mut id = [0u8; 4];
        self.spi.spi_transfer(&cmd, &mut id)?;
        Ok(u32::from_be_bytes(id))
    }

    fn send_bmc_command(&self, cmd: &[u8]) -> Result<Vec<u8>, BusError> {
        self.com.com_send(cmd)?;
        let mut resp = vec![0u8; 256];
        let n = self.com.com_recv(&mut resp, 1000)?;
        resp.truncate(n);
        Ok(resp)
    }

    fn read_sensor_temp(&self, sensor_addr: u8) -> Result<i16, BusError> {
        let mut buf = [0u8; 2];
        self.i3c.i3c_read(sensor_addr, &mut buf)?;
        Ok(i16::from_be_bytes(buf))
    }

    fn run_full_diag(&self) -> Result<DiagReport, BusError> {
        let flash_id = self.read_flash_id()?;
        let bmc_resp = self.send_bmc_command(b"VERSION\n")?;
        let cpu_temp = self.read_sensor_temp(0x48)?;
        let gpu_temp = self.read_sensor_temp(0x49)?;

        Ok(DiagReport {
            flash_id,
            bmc_version: String::from_utf8_lossy(&bmc_resp).to_string(),
            cpu_temp_c: cpu_temp,
            gpu_temp_c: gpu_temp,
        })
    }
}

#[derive(Debug)]
struct DiagReport {
    flash_id: u32,
    bmc_version: String,
    cpu_temp_c: i16,
    gpu_temp_c: i16,
}
}

Production Wiring / 生产环境接线

One impl BoardConfig selects the concrete hardware drivers:

通过一个 impl BoardConfig 即可选择具体的硬件驱动程序:

struct PlatformSpi  { dev: String, speed_hz: u32 }
struct UartCom      { dev: String, baud: u32 }
struct LinuxI3c     { dev: String }

impl SpiBus for PlatformSpi {
    fn spi_transfer(&self, tx: &[u8], rx: &mut [u8]) -> Result<(), BusError> {
        // ioctl(SPI_IOC_MESSAGE) in production / 生产环境中使用 ioctl
        rx[0..4].copy_from_slice(&[0xEF, 0x40, 0x18, 0x00]);
        Ok(())
    }
    fn spi_write(&self, _data: &[u8]) -> Result<(), BusError> { Ok(()) }
}

impl ComPort for UartCom {
    fn com_send(&self, _data: &[u8]) -> Result<usize, BusError> { Ok(0) }
    fn com_recv(&self, buf: &mut [u8], _timeout: u32) -> Result<usize, BusError> {
        let resp = b"BMC v2.4.1\n";
        buf[..resp.len()].copy_from_slice(resp);
        Ok(resp.len())
    }
}

impl I3cBus for LinuxI3c {
    fn i3c_read(&self, _addr: u8, buf: &mut [u8]) -> Result<(), BusError> {
        buf[0] = 0x00; buf[1] = 0x2D; // 45°C
        Ok(())
    }
    fn i3c_write(&self, _addr: u8, _data: &[u8]) -> Result<(), BusError> { Ok(()) }
}

// ✅ One struct, one impl — all concrete types resolved here
// ✅ 一个结构体,一个实现 —— 此处解析了所有具体类型
struct ProductionBoard;
impl BoardConfig for ProductionBoard {
    type Spi = PlatformSpi;
    type Com = UartCom;
    type I3c = LinuxI3c;
}

fn main() {
    let ctrl = DiagController::<ProductionBoard>::new(
        PlatformSpi { dev: "/dev/spidev0.0".into(), speed_hz: 10_000_000 },
        UartCom     { dev: "/dev/ttyS0".into(),     baud: 115200 },
        LinuxI3c    { dev: "/dev/i3c-0".into() },
    );
    let report = ctrl.run_full_diag().unwrap();
    println!("{report:#?}");
}

Test Wiring with Mocks / 使用 Mock 进行测试接线

Swap the entire hardware layer by defining a different BoardConfig:

通过定义不同的 BoardConfig 即可切换整个硬件层:

#![allow(unused)]
fn main() {
struct MockSpi  { flash_id: [u8; 4] }
struct MockCom  { response: Vec<u8> }
struct MockI3c  { temps: std::collections::HashMap<u8, i16> }

impl SpiBus for MockSpi {
    fn spi_transfer(&self, _tx: &[u8], rx: &mut [u8]) -> Result<(), BusError> {
        rx[..4].copy_from_slice(&self.flash_id);
        Ok(())
    }
    fn spi_write(&self, _data: &[u8]) -> Result<(), BusError> { Ok(()) }
}

impl ComPort for MockCom {
    fn com_send(&self, _data: &[u8]) -> Result<usize, BusError> { Ok(0) }
    fn com_recv(&self, buf: &mut [u8], _timeout: u32) -> Result<usize, BusError> {
        let n = self.response.len().min(buf.len());
        buf[..n].copy_from_slice(&self.response[..n]);
        Ok(n)
    }
}

impl I3cBus for MockI3c {
    fn i3c_read(&self, addr: u8, buf: &mut [u8]) -> Result<(), BusError> {
        let temp = self.temps.get(&addr).copied().unwrap_or(0);
        buf[..2].copy_from_slice(&temp.to_be_bytes());
        Ok(())
    }
    fn i3c_write(&self, _addr: u8, _data: &[u8]) -> Result<(), BusError> { Ok(()) }
}

struct TestBoard;
impl BoardConfig for TestBoard {
    type Spi = MockSpi;
    type Com = MockCom;
    type I3c = MockI3c;
}

#[cfg(test)]
mod tests {
    // ... test code ...
}
}
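
The test body above is elided. As an illustration only, here is what a mock-backed check could look like on a two-bus reduction of the controller. It is self-contained, so the traits are repeated in trimmed form; test_controller is a hypothetical helper, and unlike MockI3c above, this mock reports an unknown sensor address as BusError::Timeout instead of returning 0.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum BusError {
    Timeout,
}

trait SpiBus {
    fn spi_transfer(&self, tx: &[u8], rx: &mut [u8]) -> Result<(), BusError>;
}
trait I3cBus {
    fn i3c_read(&self, addr: u8, buf: &mut [u8]) -> Result<(), BusError>;
}

trait BoardConfig {
    type Spi: SpiBus;
    type I3c: I3cBus;
}

struct DiagController<Cfg: BoardConfig> {
    spi: Cfg::Spi,
    i3c: Cfg::I3c,
}

impl<Cfg: BoardConfig> DiagController<Cfg> {
    fn read_flash_id(&self) -> Result<u32, BusError> {
        let mut id = [0u8; 4];
        self.spi.spi_transfer(&[0x9F], &mut id)?;
        Ok(u32::from_be_bytes(id))
    }

    fn read_sensor_temp(&self, addr: u8) -> Result<i16, BusError> {
        let mut buf = [0u8; 2];
        self.i3c.i3c_read(addr, &mut buf)?;
        Ok(i16::from_be_bytes(buf))
    }
}

struct MockSpi {
    flash_id: [u8; 4],
}
struct MockI3c {
    temps: HashMap<u8, i16>,
}

impl SpiBus for MockSpi {
    fn spi_transfer(&self, _tx: &[u8], rx: &mut [u8]) -> Result<(), BusError> {
        rx[..4].copy_from_slice(&self.flash_id);
        Ok(())
    }
}

impl I3cBus for MockI3c {
    fn i3c_read(&self, addr: u8, buf: &mut [u8]) -> Result<(), BusError> {
        // Assumption for this sketch: an unknown address behaves like a timeout.
        let temp = self.temps.get(&addr).copied().ok_or(BusError::Timeout)?;
        buf[..2].copy_from_slice(&temp.to_be_bytes());
        Ok(())
    }
}

struct TestBoard;
impl BoardConfig for TestBoard {
    type Spi = MockSpi;
    type I3c = MockI3c;
}

/// Builds a controller wired entirely to mocks.
fn test_controller() -> DiagController<TestBoard> {
    DiagController {
        spi: MockSpi { flash_id: [0xEF, 0x40, 0x18, 0x00] },
        i3c: MockI3c { temps: HashMap::from([(0x48, 45), (0x49, -5)]) },
    }
}
```

In a real crate the assertions would sit in a #[cfg(test)] module; the controller code under test is the same generic code that runs against production drivers.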

Adding a New Bus Later / 稍后添加新总线

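When you need a 4th bus, the changes stay on the config side: BoardConfig gains an associated type, DiagController gains a field (and new() a matching argument), and each impl BoardConfig names the concrete type. No downstream signature changes. The generic parameter count stays at one:

当你需要第 4 个总线时,改动都集中在配置一侧:BoardConfig 增加一个关联类型,DiagController 增加一个字段(new() 相应增加一个参数),每个 impl BoardConfig 指定具体类型。下游签名无需改动。泛型参数的数量保持为 1: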

#![allow(unused)]
fn main() {
trait SmBus {
    fn smbus_read_byte(&self, addr: u8, cmd: u8) -> Result<u8, BusError>;
}

// 1. Add one associated type:
// 1. 添加一个关联类型:
trait BoardConfig {
    type Spi: SpiBus;
    type Com: ComPort;
    type I3c: I3cBus;
    type Smb: SmBus;     // ← new / 新增
}

// 2. Add one field:
// 2. 添加一个字段:
struct DiagController<Cfg: BoardConfig> {
    spi: Cfg::Spi,
    com: Cfg::Com,
    i3c: Cfg::I3c,
    smb: Cfg::Smb,       // ← new / 新增
}

// 3. Provide the concrete type in each config impl:
// 3. 在每个 config 实现中提供具体类型:
impl BoardConfig for ProductionBoard {
    type Spi = PlatformSpi;
    type Com = UartCom;
    type I3c = LinuxI3c;
    type Smb = LinuxSmbus; // ← new / 新增
}
}

When to Use This Pattern / 何时使用此模式

| Situation / 场景 | Use Config Trait? / 是否使用 Config Trait? | Alternative / 替代方案 |
|---|---|---|
| 3+ trait-constrained generics on a struct / 结构体上有 3 个以上受 trait 约束的泛型 | ✅ Yes (是) | |
| Need to swap entire hardware/platform layer / 需要切换整个硬件/平台层 | ✅ Yes (是) | |
| Only 1-2 generics / 只有 1-2 个泛型 | ❌ Overkill (过度设计) | Direct generics / 直接使用泛型 |
| Need runtime polymorphism / 需要运行时多态 | ❌ No (否) | dyn Trait objects / dyn Trait 对象 |
| Open-ended plugin system / 开放式插件系统 | ❌ No (否) | Type-map / Any |
| Component traits form a natural group (board, platform) / 组件 trait 构成了自然的分组(主板、平台) | ✅ Yes (是) | |

Key Properties / 关键特性

  • One generic parameter forever: DiagController<Cfg> never gains more <A, B, C, ...> / 永远只有一个泛型参数:DiagController<Cfg> 绝不会增加成 <A, B, C, ...>
  • Fully static dispatch — no vtables, no dyn, no heap allocation for trait objects / 完全静态分发 —— 没有 vtable,没有 dyn,没有为 trait 对象进行的堆分配。
  • Clean test swapping — define TestBoard with mock impls, zero conditional compilation / 简洁的测试切换 —— 使用 mock 实现定义 TestBoard,零条件编译。
  • Compile-time safety — forget an associated type → compile error, not runtime crash / 编译时安全 —— 忘记关联类型 → 编译错误,而不是运行时崩溃。
  • Battle-tested — this is the pattern used by Substrate/Polkadot’s frame system to manage 20+ associated types through a single Config trait / 经过实战检验 —— 这是 Substrate/Polkadot 的 frame 系统使用的模式,通过单个 Config trait 管理 20 多个关联类型。

Key Takeaways — Newtype & Type-State / 核心要点 —— Newtype 与类型状态

  • Newtypes give compile-time type safety at zero runtime cost / Newtype 以零运行时成本提供编译时类型安全性
  • Type-state makes illegal state transitions a compile error, not a runtime bug / 类型状态使非法状态转换成为编译错误,而非运行时漏洞
  • Config traits tame generic parameter explosion in large systems / Config trait 在大型系统中治理了泛型参数爆炸

See also / 另请参阅: Ch 4 — PhantomData for the zero-sized markers that power type-state. Ch 2 — Traits In Depth for associated types used in the config trait pattern.

查看 Ch 4 —— PhantomData 了解驱动类型状态的零大小标记。查看 Ch 2 —— Trait 深入解析 了解 config trait 模式中使用的关联类型。


Case Study: Dual-Axis Typestate — Vendor × Protocol State / 案例研究:双轴类型状态 —— 厂商 × 协议状态

The patterns above handle one axis at a time: typestate enforces protocol order, and trait abstraction handles multiple vendors. Real systems often need both simultaneously: a wrapper Handle<Vendor, State> where available methods depend on which vendor is plugged in and which state the handle is in.

上述模式一次只处理一个轴:类型状态强制执行 协议顺序,而 trait 抽象处理 多厂商 情况。现实中的系统通常需要 同时满足这两者:一个包装器 Handle<Vendor, State>,其中可用方法取决于插入了 哪个厂商 的组件,以及句柄处于 哪个状态

This section shows the dual-axis conditional impl pattern — where impl blocks are gated on both a vendor trait bound and a state marker trait.

本节展示了 双轴条件 impl 模式 —— 其中 impl 块同时受厂商 trait 约束和状态标记 trait 的限制。

The Two-Dimensional Problem / 二维问题

Consider a debug probe interface (JTAG/SWD). Multiple vendors make probes, and every probe must be unlocked before registers become accessible. Some vendors additionally support direct memory reads — but only after an extended unlock that configures the memory access port:

考虑一个调试探针接口 (JTAG/SWD)。多个厂商生产探针,每个探针在访问寄存器之前都必须先解锁。某些厂商还额外支持直接内存读取 —— 但这仅在通过 扩展解锁 配置了内存访问端口之后才可用:

graph LR
    subgraph "All vendors"
        L["🔒 Locked"] -- "unlock()" --> U["🔓 Unlocked"]
    end
    subgraph "Memory-capable vendors only"
        U -- "extended_unlock()" --> E["🔓🧠 ExtendedUnlocked"]
    end

    U -. "read_reg() / write_reg()" .-> U
    E -. "read_reg() / write_reg()" .-> E
    E -. "read_memory() / write_memory()" .-> E

    style L fill:#fee,stroke:#c33
    style U fill:#efe,stroke:#3a3
    style E fill:#eef,stroke:#33c

The capability matrix — which methods exist for which (vendor, state) combination — is two-dimensional:

能力矩阵 —— 即哪些方法适用于哪种 (厂商, 状态) 组合 —— 是二维的:

block-beta
    columns 4
    space header1["Locked"] header2["Unlocked"] header3["ExtendedUnlocked"]
    basic["Basic Vendor"]:1 b1["unlock()"] b2["read_reg()\nwrite_reg()"] b3["— unreachable —"]
    memory["Memory Vendor"]:1 m1["unlock()"] m2["read_reg()\nwrite_reg()\nextended_unlock()"] m3["read_reg()\nwrite_reg()\nread_memory()\nwrite_memory()"]

    style b1 fill:#ffd,stroke:#aa0
    style b2 fill:#efe,stroke:#3a3
    style b3 fill:#eee,stroke:#999,stroke-dasharray: 5 5
    style m1 fill:#ffd,stroke:#aa0
    style m2 fill:#efe,stroke:#3a3
    style m3 fill:#eef,stroke:#33c

The challenge: express this matrix entirely at compile time, with static dispatch, so that calling extended_unlock() on a basic probe or read_memory() on an unlocked-but-not-extended handle is a compile error.

面临的挑战:完全在编译时 表达这个矩阵,并使用静态分发,使得在基础探针上调用 extended_unlock() 或在已解锁但未扩展的句柄上调用 read_memory() 都会导致编译错误。

The Solution: Jtag<V, S> with Marker Traits / 解决方案:带标记 Trait 的 Jtag<V, S>

Step 1 — State tokens and capability markers: / 第 1 步 —— 状态令牌和能力标记:

use std::marker::PhantomData;

// Zero-sized state tokens — no runtime cost
// 零大小状态令牌 —— 无运行时成本
struct Locked;
struct Unlocked;
struct ExtendedUnlocked;

// Marker traits express which capabilities each state has
// 标记 trait 表达了每个状态具有哪些能力
trait HasRegAccess {}
impl HasRegAccess for Unlocked {}
impl HasRegAccess for ExtendedUnlocked {}

trait HasMemAccess {}
impl HasMemAccess for ExtendedUnlocked {}

Why marker traits, not just concrete states? / 为什么使用标记 trait 而不仅仅是具体状态?

Writing impl<V, S: HasRegAccess> Jtag<V, S> means read_reg() works in any state with register access — today that’s Unlocked and ExtendedUnlocked, but if you add DebugHalted tomorrow, you just add one line: impl HasRegAccess for DebugHalted {}. Every register function works with it automatically — zero code changes.

编写 impl<V, S: HasRegAccess> Jtag<V, S> 意味着 read_reg() 可以在任何具有寄存器访问能力的状态下工作:目前是 Unlocked 和 ExtendedUnlocked,但如果你明天添加了 DebugHalted,只需增加一行:impl HasRegAccess for DebugHalted {}。所有寄存器函数都会自动适配它,无需改动现有代码。
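
To make the "one line" claim concrete, here is a trimmed sketch with a hypothetical DebugHalted state. The register impl block and read_idcode are written once against HasRegAccess and are not edited when the state is added. The halt() transition, FakeProbe, and the fixed register value are assumptions for illustration; a new state still needs some transition that produces it.

```rust
use std::marker::PhantomData;

struct Locked;
struct Unlocked;
struct DebugHalted; // hypothetical state added later

trait HasRegAccess {}
impl HasRegAccess for Unlocked {}
impl HasRegAccess for DebugHalted {} // the only line needed to grant register access

trait JtagVendor {
    fn raw_read_reg(&self, addr: u32) -> u32;
}

struct Jtag<V, S = Locked> {
    vendor: V,
    _state: PhantomData<S>,
}

impl<V: JtagVendor> Jtag<V, Locked> {
    fn new(vendor: V) -> Self {
        Jtag { vendor, _state: PhantomData }
    }
    fn unlock(self) -> Jtag<V, Unlocked> {
        Jtag { vendor: self.vendor, _state: PhantomData }
    }
}

impl<V: JtagVendor> Jtag<V, Unlocked> {
    /// Hypothetical transition into the new state.
    fn halt(self) -> Jtag<V, DebugHalted> {
        Jtag { vendor: self.vendor, _state: PhantomData }
    }
}

// Written before DebugHalted existed; unchanged afterwards.
impl<V: JtagVendor, S: HasRegAccess> Jtag<V, S> {
    fn read_reg(&self, addr: u32) -> u32 {
        self.vendor.raw_read_reg(addr)
    }
}

fn read_idcode<V: JtagVendor, S: HasRegAccess>(jtag: &Jtag<V, S>) -> u32 {
    jtag.read_reg(0x00)
}

/// Stand-in vendor: returns an arbitrary fixed value for register 0.
struct FakeProbe;
impl JtagVendor for FakeProbe {
    fn raw_read_reg(&self, addr: u32) -> u32 {
        if addr == 0 { 0x4BA0_0477 } else { 0 }
    }
}
```

The same read_idcode call compiles for both Jtag<FakeProbe, Unlocked> and Jtag<FakeProbe, DebugHalted>, and still fails to compile for Jtag<FakeProbe, Locked>.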

Step 2 — Vendor traits (raw operations): / 第 2 步 —— 厂商 trait(原始操作):

// Every probe vendor implements these
// 每个探针厂商都要实现这些
trait JtagVendor {
    fn raw_unlock(&mut self);
    fn raw_read_reg(&self, addr: u32) -> u32;
    fn raw_write_reg(&mut self, addr: u32, val: u32);
}

// Vendors with memory access also implement this super-trait
// 具有内存访问能力的厂商还要实现这个 super-trait
trait JtagMemoryVendor: JtagVendor {
    fn raw_extended_unlock(&mut self);
    fn raw_read_memory(&self, addr: u64, buf: &mut [u8]);
    fn raw_write_memory(&mut self, addr: u64, data: &[u8]);
}

Step 3 — The wrapper with conditional impl blocks: / 第 3 步 —— 带有条件 impl 块的包装器:

struct Jtag<V, S = Locked> {
    vendor: V,
    _state: PhantomData<S>,
}

// Construction — always starts Locked / 构建 —— 初始始终为 Locked 状态
impl<V: JtagVendor> Jtag<V, Locked> {
    fn new(vendor: V) -> Self {
        Jtag { vendor, _state: PhantomData }
    }

    fn unlock(mut self) -> Jtag<V, Unlocked> {
        self.vendor.raw_unlock();
        Jtag { vendor: self.vendor, _state: PhantomData }
    }
}

// Register I/O — any vendor, any state with HasRegAccess
// 寄存器 I/O —— 任何厂商,任何具有 HasRegAccess 能力的状态
impl<V: JtagVendor, S: HasRegAccess> Jtag<V, S> {
    fn read_reg(&self, addr: u32) -> u32 {
        self.vendor.raw_read_reg(addr)
    }
    fn write_reg(&mut self, addr: u32, val: u32) {
        self.vendor.raw_write_reg(addr, val);
    }
}

// Extended unlock — only memory-capable vendors, only from Unlocked
// 扩展解锁 —— 仅限具备内存访问能力的厂商,且仅能从 Unlocked 状态执行
impl<V: JtagMemoryVendor> Jtag<V, Unlocked> {
    fn extended_unlock(mut self) -> Jtag<V, ExtendedUnlocked> {
        self.vendor.raw_extended_unlock();
        Jtag { vendor: self.vendor, _state: PhantomData }
    }
}

// Memory I/O — only memory-capable vendors, only ExtendedUnlocked
// 内存 I/O —— 仅限具备内存访问能力的厂商,且仅限 ExtendedUnlocked 状态
impl<V: JtagMemoryVendor, S: HasMemAccess> Jtag<V, S> {
    fn read_memory(&self, addr: u64, buf: &mut [u8]) {
        self.vendor.raw_read_memory(addr, buf);
    }
    fn write_memory(&mut self, addr: u64, data: &[u8]) {
        self.vendor.raw_write_memory(addr, data);
    }
}

Each impl block encodes one cell (or row) of the capability matrix. The compiler enforces the matrix — no runtime checks anywhere.

每个 impl 块都编码了能力矩阵中的一个单元格(或一行)。编译器强制执行该矩阵 —— 到处都没有运行时检查。

Vendor Implementations / 厂商实现

Adding a vendor means implementing raw methods on one struct — no per-state struct duplication, no delegation boilerplate:

添加厂商意味着在 一个结构体 上实现原始方法 —— 无需为每个状态重复结构体,也没有委托模板代码:

// Vendor A: basic probe — register access only
// 厂商 A:基础探针 —— 仅限寄存器访问
struct BasicProbe { port: u16 }

impl JtagVendor for BasicProbe {
    fn raw_unlock(&mut self)                    { /* TAP reset sequence */ }
    fn raw_read_reg(&self, addr: u32) -> u32    { /* DR scan */  0 }
    fn raw_write_reg(&mut self, addr: u32, val: u32) { /* DR scan */ }
}
// BasicProbe does NOT impl JtagMemoryVendor.
// extended_unlock() will not compile on Jtag<BasicProbe, _>.
// BasicProbe 没有实现 JtagMemoryVendor。
// 在 Jtag<BasicProbe, _> 上调用 extended_unlock() 将无法通过编译。

// Vendor B: full-featured probe — registers + memory
// 厂商 B:全功能探针 —— 寄存器 + 内存
struct DapProbe { serial: String }

impl JtagVendor for DapProbe {
    fn raw_unlock(&mut self)                    { /* SWD switch, read DPIDR */ }
    fn raw_read_reg(&self, addr: u32) -> u32    { /* AP register read */ 0 }
    fn raw_write_reg(&mut self, addr: u32, val: u32) { /* AP register write */ }
}

impl JtagMemoryVendor for DapProbe {
    fn raw_extended_unlock(&mut self)           { /* select MEM-AP, power up */ }
    fn raw_read_memory(&self, addr: u64, buf: &mut [u8])  { /* MEM-AP read */ }
    fn raw_write_memory(&mut self, addr: u64, data: &[u8]) { /* MEM-AP write */ }
}

What the Compiler Prevents / 编译器能阻止什么

| Attempt / 尝试 | Error / 错误 | Why / 原因 |
|---|---|---|
| Jtag<_, Locked>::read_reg() | no method read_reg | Locked doesn’t impl HasRegAccess / Locked 没有实现 HasRegAccess |
| Jtag<BasicProbe, _>::extended_unlock() | no method extended_unlock | BasicProbe doesn’t impl JtagMemoryVendor / BasicProbe 没有实现 JtagMemoryVendor |
| Jtag<_, Unlocked>::read_memory() | no method read_memory | Unlocked doesn’t impl HasMemAccess / Unlocked 没有实现 HasMemAccess |
| Calling unlock() twice / 调用两次 unlock() | use of moved value / 使用了已被移动的值 | unlock() consumes self / unlock() 会消耗 self |

All four errors are caught at compile time. No panics, no Option, no runtime state enum.

所有这四类错误都会在 编译时 被捕获。没有恐慌 (Panic),没有 Option,也没有运行时状态枚举。

Writing Generic Functions / 编写泛型函数

Functions bind only the axes they care about:

函数仅绑定它们所关心的轴:

/// Works with ANY vendor, ANY state that grants register access.
/// 适用于任何厂商、任何授权寄存器访问的状态。
fn read_idcode<V: JtagVendor, S: HasRegAccess>(jtag: &Jtag<V, S>) -> u32 {
    jtag.read_reg(0x00)
}

/// Only compiles for memory-capable vendors in ExtendedUnlocked state.
/// 仅适用于具备内存能力的厂商且处于 ExtendedUnlocked 状态。
fn dump_firmware<V: JtagMemoryVendor, S: HasMemAccess>(jtag: &Jtag<V, S>) {
    let mut buf = [0u8; 256];
    jtag.read_memory(0x0800_0000, &mut buf);
}

read_idcode doesn’t care whether you’re in Unlocked or ExtendedUnlocked — it only requires HasRegAccess. This is where marker traits pay off over hardcoding specific states in signatures.

read_idcode 不关心你处于 Unlocked 还是 ExtendedUnlocked 状态 —— 它只需要 HasRegAccess。这就是标记 trait 相比于在签名中硬编码特定状态的优势所在。

Same Pattern, Different Domain: Storage Backends / 相同模式,不同领域:存储后端

The dual-axis technique isn’t hardware-specific. Here’s the same structure for a storage layer where some backends support transactions:

双轴技术并非硬件专有。下面是针对存储层的相同结构,其中某些后端支持事务:

// --- States / 状态 ---
struct Closed;
struct Open;
struct InTransaction;

trait HasReadWrite {}
impl HasReadWrite for Open {}
impl HasReadWrite for InTransaction {}

// --- Vendor traits / 厂商 trait ---
trait StorageBackend {
    fn raw_open(&mut self);
    fn raw_read(&self, key: &[u8]) -> Option<Vec<u8>>;
    fn raw_write(&mut self, key: &[u8], value: &[u8]);
}

trait TransactionalBackend: StorageBackend {
    fn raw_begin(&mut self);
    fn raw_commit(&mut self);
    fn raw_rollback(&mut self);
}

// --- Wrapper / 包装器 ---
struct Store<B, S = Closed> { backend: B, _s: PhantomData<S> }

impl<B: StorageBackend> Store<B, Closed> {
    fn open(mut self) -> Store<B, Open> {
        self.backend.raw_open();
        Store { backend: self.backend, _s: PhantomData }
    }
}

impl<B: StorageBackend, S: HasReadWrite> Store<B, S> {
    fn read(&self, key: &[u8]) -> Option<Vec<u8>>  { self.backend.raw_read(key) }
    fn write(&mut self, key: &[u8], val: &[u8])    { self.backend.raw_write(key, val) }
}

impl<B: TransactionalBackend> Store<B, Open> {
    fn begin(mut self) -> Store<B, InTransaction>   {
        self.backend.raw_begin();
        Store { backend: self.backend, _s: PhantomData }
    }
}

impl<B: TransactionalBackend> Store<B, InTransaction> {
    fn commit(mut self) -> Store<B, Open>           {
        self.backend.raw_commit();
        Store { backend: self.backend, _s: PhantomData }
    }
    fn rollback(mut self) -> Store<B, Open>         {
        self.backend.raw_rollback();
        Store { backend: self.backend, _s: PhantomData }
    }
}

A flat-file backend implements StorageBackend only — begin() won’t compile. A database backend adds TransactionalBackend — the full Open → InTransaction → Open cycle becomes available.

普通文件后端仅实现 StorageBackend —— begin() 将无法编译。数据库后端则增加了 TransactionalBackend —— 于是完整的 Open → InTransaction → Open 循环操作就通过类型系统变为可用。
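
The storage example stops at the wrapper. As a usage sketch, here is a hypothetical in-memory backend, MemBackend, that implements both traits by snapshotting its map at raw_begin. The Store::new constructor and demo function are additions for this illustration; the traits and wrapper are repeated so the block stands alone.

```rust
use std::collections::HashMap;
use std::marker::PhantomData;

struct Closed;
struct Open;
struct InTransaction;

trait HasReadWrite {}
impl HasReadWrite for Open {}
impl HasReadWrite for InTransaction {}

trait StorageBackend {
    fn raw_open(&mut self);
    fn raw_read(&self, key: &[u8]) -> Option<Vec<u8>>;
    fn raw_write(&mut self, key: &[u8], value: &[u8]);
}

trait TransactionalBackend: StorageBackend {
    fn raw_begin(&mut self);
    fn raw_commit(&mut self);
    fn raw_rollback(&mut self);
}

struct Store<B, S = Closed> { backend: B, _s: PhantomData<S> }

impl<B: StorageBackend> Store<B, Closed> {
    fn new(backend: B) -> Self { Store { backend, _s: PhantomData } }
    fn open(mut self) -> Store<B, Open> {
        self.backend.raw_open();
        Store { backend: self.backend, _s: PhantomData }
    }
}

impl<B: StorageBackend, S: HasReadWrite> Store<B, S> {
    fn read(&self, key: &[u8]) -> Option<Vec<u8>> { self.backend.raw_read(key) }
    fn write(&mut self, key: &[u8], val: &[u8]) { self.backend.raw_write(key, val) }
}

impl<B: TransactionalBackend> Store<B, Open> {
    fn begin(mut self) -> Store<B, InTransaction> {
        self.backend.raw_begin();
        Store { backend: self.backend, _s: PhantomData }
    }
}

impl<B: TransactionalBackend> Store<B, InTransaction> {
    fn rollback(mut self) -> Store<B, Open> {
        self.backend.raw_rollback();
        Store { backend: self.backend, _s: PhantomData }
    }
}

/// Hypothetical backend: a HashMap plus a snapshot taken at raw_begin.
#[derive(Default)]
struct MemBackend {
    data: HashMap<Vec<u8>, Vec<u8>>,
    snapshot: Option<HashMap<Vec<u8>, Vec<u8>>>,
}

impl StorageBackend for MemBackend {
    fn raw_open(&mut self) {}
    fn raw_read(&self, key: &[u8]) -> Option<Vec<u8>> { self.data.get(key).cloned() }
    fn raw_write(&mut self, key: &[u8], value: &[u8]) {
        self.data.insert(key.to_vec(), value.to_vec());
    }
}

impl TransactionalBackend for MemBackend {
    fn raw_begin(&mut self) { self.snapshot = Some(self.data.clone()); }
    fn raw_commit(&mut self) { self.snapshot = None; }
    fn raw_rollback(&mut self) {
        if let Some(old) = self.snapshot.take() { self.data = old; }
    }
}

/// Writes inside a transaction, rolls back, and returns (value seen during, value seen after).
fn demo() -> (Option<Vec<u8>>, Option<Vec<u8>>) {
    let mut store = Store::new(MemBackend::default()).open();
    store.write(b"k", b"committed");
    let mut tx = store.begin();
    tx.write(b"k", b"tentative");
    let during = tx.read(b"k");
    let store = tx.rollback();
    (during, store.read(b"k"))
}
```

A backend that implemented only StorageBackend would get open(), read(), and write(), and the store.begin() line would not compile.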

When to Reach for This Pattern / 何时应该使用此模式

| Signal / 信号 | Why dual-axis fits / 为什么双轴模式适用 |
|---|---|
| Two independent axes: “who provides it” and “what state is it in” / 存在两个独立的轴:“谁提供它”以及“它处于什么状态” | The impl block matrix directly encodes both / impl 块矩阵直接对这两者进行了编码 |
| Some providers have strictly more capabilities than others / 某些提供者比其他提供者具备严格更多的能力 | Super-trait (MemoryVendor: Vendor) + conditional impl / 超级 Trait (MemoryVendor: Vendor) + 条件 impl |
| Misusing state or capability is a safety/correctness bug / 误用状态或能力会导致安全/正确性漏洞 | Compile-time prevention > runtime checks / 编译时预防 > 运行时检查 |
| You want static dispatch (no vtables) / 你想要静态分发(无 vtable) | PhantomData + generics = zero-cost / PhantomData + 泛型 = 零成本 |

| Signal / 信号 | Consider something simpler / 考虑更简单的方案 |
|---|---|
| Only one axis varies (state OR vendor, not both) / 只有一个轴在变化(状态或厂商,而非两者) | Single-axis typestate or plain trait objects / 单轴类型状态或普通的 trait 对象 |
| Three or more independent axes / 三个或更多独立轴 | Config Trait Pattern (above) bundles axes into associated types / Config Trait 模式(见上文)将多个轴捆绑到关联类型中 |
| Runtime polymorphism is acceptable / 运行时多态是可以接受的 | enum state + dyn dispatch is simpler / enum 状态 + dyn 分发更简单 |

When two axes become three or more: / 当两个轴增加到三个或更多时:

If you find yourself writing Handle<V, S, D, T> — vendor, state, debug level, transport — the generic parameter list is telling you something. Consider collapsing the vendor axis into an associated-type config trait (the Config Trait Pattern from earlier in this chapter), keeping only the state axis as a generic parameter: Handle<Cfg, S>. The config trait bundles type Vendor, type Transport, etc. into one parameter, and the state axis retains its compile-time transition guarantees. This is a natural evolution, not a rewrite — you lift vendor-related types into Cfg and leave the typestate machinery untouched.

如果你发现自己正在编写 Handle<V, S, D, T>(厂商 V、状态 S、调试级别 D、传输协议 T),泛型参数列表实际上在向你传递某种信号。可以考虑将厂商相关的轴收缩进一个使用关联类型的 Config Trait 中(即本章前面提到的 Config Trait 模式),仅保留状态轴作为泛型参数:Handle<Cfg, S>。Config Trait 将 type Vendor、type Transport 等捆绑为单个参数,而状态轴则保留其编译时状态转换保证。这是一种自然的演进,而非重写:你只需将与厂商相关的类型提升到 Cfg 中,而无需触动类型状态机制。
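
A minimal sketch of the Handle<Cfg, S> shape described above, assuming two provider-side axes. The Vendor and Transport traits, HandleConfig, AcmeProbe, LoopbackTransport, and BenchSetup are all hypothetical names chosen for the illustration.

```rust
use std::marker::PhantomData;

// Hypothetical component traits for the provider-side axes.
trait Vendor {
    fn name(&self) -> &'static str;
}
trait Transport {
    fn send(&mut self, bytes: &[u8]) -> usize;
}

/// One config trait bundles every provider-side type.
trait HandleConfig {
    type Vendor: Vendor;
    type Transport: Transport;
}

struct Locked;
struct Unlocked;

/// Provider axes collapse into Cfg; the state axis stays its own parameter.
struct Handle<Cfg: HandleConfig, S = Locked> {
    vendor: Cfg::Vendor,
    transport: Cfg::Transport,
    _state: PhantomData<S>,
}

impl<Cfg: HandleConfig> Handle<Cfg, Locked> {
    fn new(vendor: Cfg::Vendor, transport: Cfg::Transport) -> Self {
        Handle { vendor, transport, _state: PhantomData }
    }

    fn unlock(self) -> Handle<Cfg, Unlocked> {
        Handle { vendor: self.vendor, transport: self.transport, _state: PhantomData }
    }
}

// Typestate machinery is unchanged: commands exist only when Unlocked.
impl<Cfg: HandleConfig> Handle<Cfg, Unlocked> {
    fn send_command(&mut self, cmd: &[u8]) -> String {
        let sent = self.transport.send(cmd);
        format!("{} sent {} bytes", self.vendor.name(), sent)
    }
}

struct AcmeProbe;
impl Vendor for AcmeProbe {
    fn name(&self) -> &'static str {
        "acme"
    }
}

struct LoopbackTransport {
    total: usize,
}
impl Transport for LoopbackTransport {
    fn send(&mut self, bytes: &[u8]) -> usize {
        self.total += bytes.len();
        bytes.len()
    }
}

struct BenchSetup;
impl HandleConfig for BenchSetup {
    type Vendor = AcmeProbe;
    type Transport = LoopbackTransport;
}

/// Builds a handle for one concrete setup, unlocks it, and sends a command.
fn demo() -> String {
    let mut handle =
        Handle::<BenchSetup>::new(AcmeProbe, LoopbackTransport { total: 0 }).unlock();
    handle.send_command(b"PING")
}
```

Adding a debug-level axis later would mean one more associated type on HandleConfig; signatures that mention Handle<Cfg, S> would not change.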

Key Takeaway: / 核心要点:

The dual-axis pattern is the intersection of typestate and trait-based abstraction. Each impl block maps to one cell of the (vendor × state) matrix. The compiler enforces the entire matrix — no runtime state checks, no impossible-state panics, no cost.

双轴模式是类型状态 (Typestate) 与基于 Trait 的抽象之交集。每个 impl 块都映射到(厂商 × 状态)矩阵中的一个单元格。编译器强制执行整个矩阵 —— 没有运行时状态检查,没有不可能状态导致的恐慌 (Panic),且不增加任何成本。


Exercise: Type-Safe State Machine ★★ (~30 min) / 练习:类型状态机 ★★(约 30 分钟)

Build a traffic light state machine using the type-state pattern. The light must transition Red → Green → Yellow → Red and no other order should be possible.

使用类型状态 (Type-state) 模式构建一个交通信号灯状态机。信号灯必须按 红 (Red) → 绿 (Green) → 黄 (Yellow) → 红 (Red) 的顺序循环,且不能出现其他任何顺序。

🔑 Solution / 参考答案
use std::marker::PhantomData;

// --- States / 状态 ---
struct Red;
struct Green;
struct Yellow;

struct TrafficLight<State> {
    _state: PhantomData<State>,
}

impl TrafficLight<Red> {
    fn new() -> Self {
        println!("🔴 Red — STOP"); // 红灯 —— 停止
        TrafficLight { _state: PhantomData }
    }

    fn go(self) -> TrafficLight<Green> {
        println!("🟢 Green — GO"); // 绿灯 —— 通行
        TrafficLight { _state: PhantomData }
    }
}

impl TrafficLight<Green> {
    fn caution(self) -> TrafficLight<Yellow> {
        println!("🟡 Yellow — CAUTION"); // 黄灯 —— 警告
        TrafficLight { _state: PhantomData }
    }
}

impl TrafficLight<Yellow> {
    fn stop(self) -> TrafficLight<Red> {
        println!("🔴 Red — STOP"); // 红灯 —— 停止
        TrafficLight { _state: PhantomData }
    }
}

fn main() {
    let light = TrafficLight::new(); // Red / 红灯
    let light = light.go();          // Green / 绿灯
    let light = light.caution();     // Yellow / 黄灯
    let _light = light.stop();       // Red / 红灯

    // _light.caution(); // ❌ Compile error: no method `caution` on TrafficLight<Red>
    // ❌ 编译错误:TrafficLight<Red> 类型上没有 `caution` 方法

    // TrafficLight::new().stop(); // ❌ Compile error: no method `stop` on Red
    // ❌ 编译错误:Red 类型上没有 `stop` 方法
}

Key takeaway: Invalid transitions are compile errors, not runtime panics.

核心要点:无效的状态转换在编译阶段就会报错,而不是在运行时产生恐慌。


4. PhantomData — Types That Carry No Data / 4. PhantomData —— 不携带数据的类型 🔶

What you’ll learn / 你将学到:

  • Why PhantomData<T> exists and the three problems it solves / 为什么 PhantomData<T> 存在以及它解决的三个问题
  • Lifetime branding for compile-time scope enforcement / 用于编译时作用域强制执行的生命周期烙印 (Lifetime Branding)
  • The unit-of-measure pattern for dimension-safe arithmetic / 用于维度安全算术的单位测量模式 (Unit-of-measure Pattern)
  • Variance (covariant, contravariant, invariant) and how PhantomData controls it / 型变(协变、逆变、不变性)以及 PhantomData 如何控制它

What PhantomData Solves / PhantomData 解决了什么

PhantomData<T> is a zero-sized type that tells the compiler “this struct is logically associated with T, even though it doesn’t contain a T.” It affects variance, drop checking, and auto-trait inference — without using any memory.

PhantomData<T> 是一种零大小的类型,它告诉编译器:“尽管该结构体不包含 T,但在逻辑上它与 T 相关联。”它会影响型变 (Variance)、析构检查 (Drop Checking) 和自动 trait 推导 (Auto-trait Inference) —— 且不占用任何内存。

#![allow(unused)]
fn main() {
use std::marker::PhantomData;

// Without PhantomData (rejected by the compiler):
// 不使用 PhantomData(编译器会拒绝):
//
// struct Slice<'a, T> {
//     ptr: *const T,
//     len: usize,
// }
//
// Problem: 'a is never used, so this fails with error E0392, and nothing
// tells the compiler that the struct borrows data for 'a.
// 问题:'a 从未被使用,因此会触发 E0392 错误;同时也没有任何信息
// 告诉编译器该结构体在 'a 期间借用了数据。

// With PhantomData:
// 使用 PhantomData:
struct Slice<'a, T> {
    ptr: *const T,
    len: usize,
    _marker: PhantomData<&'a T>,
    // Now the compiler knows:
    // 1. This struct borrows data with lifetime 'a
    // 2. It's covariant over 'a (lifetimes can shrink)
    // 3. It behaves like a shared reference to T (it does not own a T)

    // 现在编译器知道了:
    // 1. 该结构体借用了生命周期为 'a 的数据
    // 2. 它对 'a 是协变的(生命周期可以缩小)
    // 3. 它的行为类似于对 T 的共享引用(并不拥有 T)
}
}

The three jobs of PhantomData / PhantomData 的三项职责:

| Job / 职责 | Example / 示例 | What It Does / 它的作用 |
|---|---|---|
| Lifetime binding / 生命周期绑定 | PhantomData<&'a T> | Struct is treated as borrowing 'a / 结构体被视为借用了 'a |
| Ownership simulation / 所有权模拟 | PhantomData<T> | Drop check assumes struct owns a T / 析构检查假设结构体拥有一个 T |
| Variance control / 型变控制 | PhantomData<fn(T)> | Makes struct contravariant over T / 使结构体对 T 是逆变的 |

Lifetime Branding / 生命周期烙印

Use PhantomData to tie a handle to the “session” or “context” that produced it, so the handle can never outlive it:

使用 PhantomData 将句柄与产生它的“会话 (Session)”或“上下文 (Context)”绑定,使句柄的存活时间不会超过后者:

use std::cell::RefCell;
use std::marker::PhantomData;

/// A handle that's valid only within a specific arena's lifetime
/// 仅在特定 Arena(内存池)生命周期内有效的句柄
struct ArenaHandle<'arena> {
    index: usize,
    _brand: PhantomData<&'arena ()>,
}

struct Arena {
    // RefCell lets `alloc` take &self, so a handle only holds a shared borrow
    // RefCell 使 `alloc` 可以接收 &self,因此句柄只持有共享借用
    data: RefCell<Vec<String>>,
}

impl Arena {
    fn new() -> Self {
        Arena { data: RefCell::new(Vec::new()) }
    }

    /// Allocate a string and return a branded handle
    /// 分配一个字符串并返回一个带烙印的句柄
    fn alloc<'a>(&'a self, value: String) -> ArenaHandle<'a> {
        let mut data = self.data.borrow_mut();
        let index = data.len();
        data.push(value);
        ArenaHandle { index, _brand: PhantomData }
    }

    /// Look up by handle; returns a copy of the stored string
    /// 按句柄查找;返回所存字符串的副本
    fn get(&self, handle: &ArenaHandle<'_>) -> String {
        self.data.borrow()[handle.index].clone()
    }
}

fn main() {
    let arena1 = Arena::new();
    let handle1 = arena1.alloc("hello".to_string());
    println!("{}", arena1.get(&handle1)); // ✅

    // The brand keeps arena1 borrowed for as long as handle1 is alive:
    // 只要 handle1 还存活,烙印就会让 arena1 保持被借用状态:
    // drop(arena1);
    // println!("{}", handle1.index); // ❌ arena1 moved while still borrowed / arena1 在仍被借用时被移动

    // Note: this brand is covariant, so by itself it does not stop handle1
    // from being passed to another arena that is alive at the same time.
    // 注意:该烙印是协变的,因此它本身并不能阻止 handle1
    // 被传给另一个同时存活的 arena。
}
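
A covariant brand such as `PhantomData<&'arena ()>` lets the compiler shrink lifetimes until two handles agree, so on its own it cannot tell two live arenas apart. One common way to get a stricter brand is to make the lifetime invariant and mint a fresh one per arena through a higher-ranked closure. The sketch below assumes hypothetical names (`Brand`, `Handle`, `with_arena`, `demo`) that do not appear in the original text.

```rust
use std::marker::PhantomData;

// 'id appears in both argument and return position, so the brand is invariant over 'id.
type Brand<'id> = PhantomData<fn(&'id ()) -> &'id ()>;

struct Arena<'id> {
    data: Vec<String>,
    _brand: Brand<'id>,
}

#[derive(Clone, Copy)]
struct Handle<'id> {
    index: usize,
    _brand: Brand<'id>,
}

impl<'id> Arena<'id> {
    fn alloc(&mut self, value: String) -> Handle<'id> {
        self.data.push(value);
        Handle { index: self.data.len() - 1, _brand: PhantomData }
    }

    // Only handles carrying exactly this arena's 'id are accepted.
    fn get(&self, handle: Handle<'id>) -> &str {
        &self.data[handle.index]
    }
}

// `for<'id>` forces the closure to work for an unknown, fresh lifetime,
// so two calls to `with_arena` can never be proven to share a brand.
fn with_arena<R>(f: impl for<'id> FnOnce(Arena<'id>) -> R) -> R {
    f(Arena { data: Vec::new(), _brand: PhantomData })
}

fn demo() -> usize {
    with_arena(|mut a| {
        let ha = a.alloc("hello".to_string());
        with_arena(|mut b| {
            let hb = b.alloc("hi".to_string());
            // a.get(hb); // rejected: hb carries b's brand, not a's
            a.get(ha).len() + b.get(hb).len()
        })
    })
}

fn main() {
    println!("combined length: {}", demo());
}
```

The cost of this design is that every use of the arena has to live inside the closure, which is why the simpler covariant brand is often chosen when outliving the arena is the only concern.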

Unit-of-Measure Pattern / 单位测量模式

Prevent mixing incompatible units at compile time, with zero runtime cost:

通过零运行时成本,在编译时防止混用不兼容的单位:

use std::marker::PhantomData;
use std::ops::Add;

// Unit marker types (zero-sized)
// 单位标记类型(零大小)
struct Meters;
struct Seconds;
struct MetersPerSecond;

#[derive(Debug, Clone, Copy)]
struct Quantity<Unit> {
    value: f64,
    _unit: PhantomData<Unit>,
}

impl<U> Quantity<U> {
    fn new(value: f64) -> Self {
        Quantity { value, _unit: PhantomData }
    }
}

// Can only add same units:
// 只能加和相同单位:
impl<U> Add for Quantity<U> {
    type Output = Quantity<U>;
    fn add(self, rhs: Self) -> Self::Output {
        Quantity::new(self.value + rhs.value)
    }
}

// Meters / Seconds = MetersPerSecond (via the standard Div trait)
// 米 / 秒 = 米每秒 (通过标准库的 Div trait)
impl std::ops::Div<Quantity<Seconds>> for Quantity<Meters> {
    type Output = Quantity<MetersPerSecond>;
    fn div(self, rhs: Quantity<Seconds>) -> Quantity<MetersPerSecond> {
        Quantity::new(self.value / rhs.value)
    }
}

fn main() {
    let dist = Quantity::<Meters>::new(100.0);
    let time = Quantity::<Seconds>::new(9.58);
    let speed = dist / time; // Quantity<MetersPerSecond>
    println!("Speed: {:.2} m/s", speed.value); // 10.44 m/s

    // let nonsense = dist + time; // ❌ Compile error: can't add Meters + Seconds
    // ❌ 编译错误:不能将“米”和“秒”相加
}

This is pure type-system magic: PhantomData<Meters> is zero-sized, so Quantity<Meters> has the same size and alignment as f64. There is no wrapper overhead at runtime, but full unit safety at compile time.

这是纯粹的类型系统魔术:PhantomData<Meters> 是零大小的,因此 Quantity<Meters> 的大小和对齐方式与 f64 相同。在运行时没有包装开销,但在编译时具备完整的单位安全性。
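
The zero-size claim can be checked directly with `std::mem::size_of`. The sketch below repeats a trimmed-down `Quantity`; the `#[repr(transparent)]` attribute is an addition that is not in the example above, included here on the assumption that you want the layout match guaranteed rather than merely observed.

```rust
use std::marker::PhantomData;
use std::mem::{align_of, size_of};

struct Meters;

// repr(transparent) guarantees the same layout as the single non-zero-sized field.
#[repr(transparent)]
struct Quantity<Unit> {
    value: f64,
    _unit: PhantomData<Unit>,
}

fn main() {
    let q = Quantity::<Meters> { value: 1.5, _unit: PhantomData };

    // The marker itself occupies no space.
    assert_eq!(size_of::<PhantomData<Meters>>(), 0);
    // The wrapper is exactly as large and as aligned as the f64 it holds.
    assert_eq!(size_of::<Quantity<Meters>>(), size_of::<f64>());
    assert_eq!(align_of::<Quantity<Meters>>(), align_of::<f64>());

    println!("value {} stored in {} bytes", q.value, size_of::<Quantity<Meters>>());
}
```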

PhantomData and Drop Check / PhantomData 与析构检查

When the compiler checks whether a struct’s destructor might access expired data, it uses PhantomData to decide:

当编译器检查结构体的析构函数 (Destructor) 是否可能访问过期数据时,它会使用 PhantomData 来做决定:

#![allow(unused)]
fn main() {
use std::marker::PhantomData;

// PhantomData<T> — compiler assumes we MIGHT drop a T
// This means T must outlive our struct
// PhantomData<T> —— 编译器假设我们 *可能* 会析构一个 T
// 这意味着 T 的生命周期必须长于我们的结构体
struct OwningSemantic<T> {
    ptr: *const T,
    _marker: PhantomData<T>,  // "I logically own a T" / “我在逻辑上拥有一个 T”
}

// PhantomData<*const T> — compiler assumes we DON'T own T
// More permissive — T doesn't need to outlive us
// PhantomData<*const T> —— 编译器假设我们 *不* 拥有 T
// 更宽松 —— T 不需要比我们活得更久
struct NonOwningSemantic<T> {
    ptr: *const T,
    _marker: PhantomData<*const T>,  // "I just point to T" / “我只是指向 T”
}
}

Practical rule / 实践规则: When wrapping raw pointers, choose PhantomData carefully:

在包装原始指针时,请谨慎选择 PhantomData:

  • Writing a container that owns its data? → PhantomData<T> / 在编写拥有其所有权的数据容器时? → 使用 PhantomData<T>
  • Writing a view/reference type? → PhantomData<&'a T> or PhantomData<*const T> / 在编写视图/引用类型时? → 使用 PhantomData<&'a T> 或 PhantomData<*const T>

Variance — Why PhantomData’s Type Parameter Matters / 型变 —— 为什么 PhantomData 的类型参数很重要

Variance determines whether a generic type can be substituted with a sub- or super-type (in Rust, “subtype” means “has a longer lifetime”). Getting variance wrong causes either rejected-good-code or unsound-accepted-code.

型变 (Variance) 决定了泛型类型是否可以被其子类型或超类型替换(在 Rust 中,“子类型”意味着“具有更长的生命周期”)。搞错型变要么会导致正确的代码被拒绝,要么会导致不安全的代码被接受。

graph LR
    subgraph Covariant ["Covariant / 协变"]
        direction TB
        A1["&'long T"] -->|"can become / 可变为"| A2["&'short T"]
    end

    subgraph Contravariant ["Contravariant / 逆变"]
        direction TB
        B1["fn(&'short T)"] -->|"can become / 可变为"| B2["fn(&'long T)"]
    end

    subgraph Invariant ["Invariant / 不变性"]
        direction TB
        C1["&mut &'long T"] ---|"NO substitution / 不允许替换"| C2["&mut &'short T"]
    end

    style A1 fill:#d4efdf,stroke:#27ae60,color:#000
    style A2 fill:#d4efdf,stroke:#27ae60,color:#000
    style B1 fill:#e8daef,stroke:#8e44ad,color:#000
    style B2 fill:#e8daef,stroke:#8e44ad,color:#000
    style C1 fill:#fadbd8,stroke:#e74c3c,color:#000
    style C2 fill:#fadbd8,stroke:#e74c3c,color:#000

The Three Variances / 三种型变

| Variance / 型变 | Meaning / 含义 | “Can I substitute…” / “是否可以替换……” | Rust example / Rust 示例 |
|---|---|---|---|
| Covariant / 协变 | Subtype flows through / 子类型关系得以保留 | 'long where 'short expected ✅ / 在期望 'short 的地方使用 'long | &'a T, Vec<T>, Box<T> |
| Contravariant / 逆变 | Subtype flows against / 子类型关系反转 | 'short where 'long expected ✅ / 在期望 'long 的地方使用 'short | fn(T) (in argument position / 作为参数位置) |
| Invariant / 不变性 | No substitution allowed / 不允许任何替换 | Neither direction ❌ / 任何方向都不行 | &mut T, Cell<T>, UnsafeCell<T> |

Why &'a T is Covariant Over 'a / 为什么 &'a T 对 'a 是协变的

fn print_str(s: &str) {
    println!("{s}");
}

fn main() {
    let owned = String::from("hello");
    // owned lives for the entire function ('long)
    // print_str expects &'_ str ('short — just for the call)
    // owned 在整个函数中存活 ('long)
    // print_str 期望 &'_ str ('short —— 仅在调用期间存活)

    print_str(&owned); // ✅ Covariance: 'long → 'short is safe / 协变:'long → 'short 是安全的
    // A longer-lived reference can always be used where a shorter one is needed.
    // 在需要短生命周期引用的地方,总是可以使用长生命周期的引用。
}

Why &mut T is Invariant Over T / 为什么 &mut T 对 T 是不变的

#![allow(unused)]
fn main() {
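// Writes `new` into the slot; both must share one lifetime 'a.
// 将 `new` 写入槽位;二者必须共享同一个生命周期 'a。
fn overwrite<'a>(slot: &mut &'a str, new: &'a str) {
    *slot = new;
}

let mut long_lived: &'static str = "static";
{
    let local = String::from("temporary");
    // If &mut T were covariant over T, &mut &'static str could shrink to
    // &mut &'short str, this call would compile, and long_lived would dangle:
    // 如果 &mut T 对 T 是协变的,&mut &'static str 就可以缩短为
    // &mut &'short str,下面的调用将通过编译,long_lived 随之悬空:
    // overwrite(&mut long_lived, &local); // ❌ `local` does not live long enough
}
println!("{long_lived}");

// Invariance prevents this: behind &mut, &'static str cannot stand in for &'short str.
// The compiler rejects the substitution entirely.

// “不变性”可以防止这种情况:在 &mut 之后,&'static str 不能替代 &'short str。
// 编译器会完全拒绝这种替换。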
}

How PhantomData Controls Variance / PhantomData 如何控制型变

PhantomData<X> gives your struct the same variance as X:

PhantomData<X> 使你的结构体具有与 X 相同的型变能力

#![allow(unused)]
fn main() {
use std::marker::PhantomData;

// Covariant over 'a — a Ref<'long> can be used as Ref<'short>
// 对 'a 是协变的 —— Ref<'long> 可以被当作 Ref<'short> 使用
struct Ref<'a, T> {
    ptr: *const T,
    _marker: PhantomData<&'a T>,  // Covariant over 'a, covariant over T / 对 'a 协变,对 T 协变
}

// Invariant over T — prevents unsound lifetime shortening of T
// 对 T 是不变的 —— 防止对 T 的生命周期进行不安全的缩短
struct MutRef<'a, T> {
    ptr: *mut T,
    _marker: PhantomData<&'a mut T>,  // Covariant over 'a, INVARIANT over T / 对 'a 协变,对 T 不变
}

// Contravariant over T — useful for callback containers
// 对 T 是逆变的 —— 对回调容器很有用
struct CallbackSlot<T> {
    _marker: PhantomData<fn(T)>,  // Contravariant over T / 对 T 逆变
}
}

PhantomData variance cheat sheet / PhantomData 型变速查表:

| PhantomData type / PhantomData 类型 | Variance over T / 针对 T 的型变 | Variance over 'a / 针对 'a 的型变 | Use when / 适用场景 |
|---|---|---|---|
| PhantomData<T> | Covariant / 协变 | n/a | You logically own a T / 逻辑上拥有一个 T |
| PhantomData<&'a T> | Covariant / 协变 | Covariant / 协变 | You borrow a T with lifetime 'a / 以 'a 生命周期借用 T |
| PhantomData<&'a mut T> | Invariant / 不变 | Covariant / 协变 | You mutably borrow T / 可变借用 T |
| PhantomData<*const T> | Covariant / 协变 | n/a | Non-owning pointer to T / 对 T 的非拥有型指针 |
| PhantomData<*mut T> | Invariant / 不变 | n/a | Non-owning mutable pointer / 非拥有型可变指针 |
| PhantomData<fn(T)> | Contravariant / 逆变 | n/a | T appears in argument position / T 出现在参数位置 |
| PhantomData<fn() -> T> | Covariant / 协变 | n/a | T appears in return position / T 出现在返回值位置 |
| PhantomData<fn(T) -> T> | Invariant / 不变 | n/a | T in both positions cancels out / T 在两个位置上相互抵消 |

Worked Example: Why This Matters in Practice / 实战案例:为什么这在实践中很重要

use std::marker::PhantomData;

// A token that brands values with a session lifetime.
// MUST be covariant over 'a — otherwise callers can't shorten
// the lifetime when passing to functions that need a shorter borrow.

// 一个带有会话生命周期烙印的令牌。
// 必须对 'a 是协变的 —— 否则在传递给需要较短借用的函数时,调用者无法缩短其生命周期。

struct SessionToken<'a> {
    id: u64,
    _brand: PhantomData<&'a ()>,  // ✅ Covariant — callers can shorten 'a / ✅ 协变 —— 调用者可以缩短 'a
    // _brand: PhantomData<fn(&'a ())>,  // ❌ Contravariant — breaks ergonomics / ❌ 逆变 —— 破坏易用性
}

fn use_token(token: &SessionToken<'_>) {
    println!("Using token {}", token.id);
}

fn main() {
    let token = SessionToken { id: 42, _brand: PhantomData };
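    use_token(&token); // ✅ Accepts a token with any 'a. Covariance matters when a
                       // SessionToken<'long> must be passed where SessionToken<'short> is required.
                       // ✅ 可接受任意 'a 的令牌。当需要把 SessionToken<'long> 传给期望
                       // SessionToken<'short> 的位置时,协变才真正发挥作用。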
}
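
Covariance starts to matter once two tokens with different lifetimes have to meet under a single lifetime parameter. A minimal sketch, assuming hypothetical helpers `token_for`, `sum_ids`, and `demo` (none of them appear in the original), and a trimmed-down `Token` in place of `SessionToken`:

```rust
use std::marker::PhantomData;

struct Token<'a> {
    id: u64,
    _brand: PhantomData<&'a ()>, // covariant over 'a
}

// Ties the token's lifetime to whatever scope value it was created from.
fn token_for<'a>(_scope: &'a str, id: u64) -> Token<'a> {
    Token { id, _brand: PhantomData }
}

// Both tokens must be viewed under one common lifetime 'a.
fn sum_ids<'a>(a: &Token<'a>, b: &Token<'a>) -> u64 {
    a.id + b.id
}

fn demo() -> u64 {
    let long: Token<'static> = token_for("static scope", 40);
    let local = String::from("local scope");
    let short = token_for(&local, 2);
    // Covariance lets Token<'static> shrink to the lifetime of `local`.
    sum_ids(&long, &short)
}

fn main() {
    println!("sum of ids: {}", demo());
}
```

If `_brand` were switched to an invariant marker such as `PhantomData<fn(&'a ()) -> &'a ()>`, the call to `sum_ids` would be rejected, because `Token<'static>` could no longer be treated as a token with the shorter lifetime.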

Decision rule / 决策规则: Start with PhantomData<&'a T> (covariant). Switch to PhantomData<&'a mut T> (invariant) only if your abstraction hands out mutable access to T. Use PhantomData<fn(T)> (contravariant) almost never — it’s only correct for callback-storage scenarios.

决策规则:首先尝试使用 PhantomData<&'a T>(协变)。只有当你的抽象会分发对 T 的可变访问权限时,才切换到 PhantomData<&'a mut T>(不变性)。几乎永远不要使用 PhantomData<fn(T)>(逆变)—— 它仅在回调存储场景下才是正确的。

Key Takeaways — PhantomData / 核心要点 —— PhantomData

  • PhantomData<T> carries type/lifetime information without runtime cost / PhantomData<T> 在不产生运行时成本的情况下携带类型/生命周期信息
  • Use it for lifetime branding, variance control, and unit-of-measure patterns / 将其用于生命周期烙印、型变控制和单位测量模式
  • Drop check: PhantomData<T> tells the compiler your type logically owns a T / 析构检查:PhantomData<T> 告诉编译器你的类型在逻辑上拥有一个 T

See also / 另请参阅: Ch 3 — Newtype & Type-State for type-state patterns that use PhantomData. Ch 11 — Unsafe Rust for how PhantomData interacts with raw pointers.

参见 Ch 3 —— Newtype 与类型状态 了解使用 PhantomData 的类型状态模式。参见 Ch 11 —— 不安全 Rust 了解 PhantomData 如何与原始指针交互。


Exercise: Unit-of-Measure with PhantomData ★★ (~30 min) / 练习:使用 PhantomData 的单位测量模式 ★★(约 30 分钟)

Extend the unit-of-measure pattern to support:

  • Meters, Seconds, Kilograms
  • Addition of same units
  • Multiplication: Meters * Meters = SquareMeters
  • Division: Meters / Seconds = MetersPerSecond

扩展单位测量模式以支持:

  • Meters (米), Seconds (秒), Kilograms (千克)
  • 相同单位的加法
  • 乘法:Meters * Meters = SquareMeters (平方米)
  • 除法:Meters / Seconds = MetersPerSecond (米每秒)
🔑 Solution / 参考答案
use std::marker::PhantomData;
use std::ops::{Add, Mul, Div};

// --- Unit markers / 单位标记 ---
#[derive(Clone, Copy)]
struct Meters;
#[derive(Clone, Copy)]
struct Seconds;
#[derive(Clone, Copy)]
struct Kilograms;
#[derive(Clone, Copy)]
struct SquareMeters;
#[derive(Clone, Copy)]
struct MetersPerSecond;

#[derive(Debug, Clone, Copy)]
struct Qty<U> {
    value: f64,
    _unit: PhantomData<U>,
}

impl<U> Qty<U> {
    fn new(v: f64) -> Self { Qty { value: v, _unit: PhantomData } }
}

// Same units can be added
// 相同单位可以相加
impl<U> Add for Qty<U> {
    type Output = Qty<U>;
    fn add(self, rhs: Self) -> Self::Output { Qty::new(self.value + rhs.value) }
}

// Meters * Meters = SquareMeters
impl Mul<Qty<Meters>> for Qty<Meters> {
    type Output = Qty<SquareMeters>;
    fn mul(self, rhs: Qty<Meters>) -> Qty<SquareMeters> {
        Qty::new(self.value * rhs.value)
    }
}

// Meters / Seconds = MetersPerSecond
impl Div<Qty<Seconds>> for Qty<Meters> {
    type Output = Qty<MetersPerSecond>;
    fn div(self, rhs: Qty<Seconds>) -> Qty<MetersPerSecond> {
        Qty::new(self.value / rhs.value)
    }
}

fn main() {
    let width = Qty::<Meters>::new(5.0);
    let height = Qty::<Meters>::new(3.0);
    let area = width * height; // Qty<SquareMeters>
    println!("Area: {:.1} m²", area.value);

    let dist = Qty::<Meters>::new(100.0);
    let time = Qty::<Seconds>::new(9.58);
    let speed = dist / time;
    println!("Speed: {:.2} m/s", speed.value);

    let sum = width + height; // Same unit ✅ / 相同单位 ✅
    println!("Sum: {:.1} m", sum.value);

    // let bad = width + time; // ❌ Compile error: can't add Meters + Seconds
    // ❌ 编译错误:无法将“米”和“秒”相加
}

5. Channels and Message Passing / 5. 通道与消息传递 🟢

What you’ll learn / 你将学到:

  • std::sync::mpsc basics and when to upgrade to crossbeam-channel / std::sync::mpsc 的基础知识以及何时升级到 crossbeam-channel
  • Channel selection with select! for multi-source message handling / 使用 select! 进行多源消息处理的通道选择
  • Bounded vs unbounded channels and backpressure strategies / 有界与无界通道以及背压 (Backpressure) 策略
  • The actor pattern for encapsulating concurrent state / 用于封装并发状态的 Actor 模式

std::sync::mpsc — The Standard Channel / std::sync::mpsc —— 标准通道

Rust’s standard library provides a multi-producer, single-consumer channel:

Rust 的标准库提供了一个多生产者、单消费者 (MPSC) 的通道:

use std::sync::mpsc;
use std::thread;
use std::time::Duration;

fn main() {
    // Create a channel: tx (transmitter) and rx (receiver)
    // 创建一个通道:tx(发送端)和 rx(接收端)
    let (tx, rx) = mpsc::channel();

    // Spawn a producer thread
    // 启动一个生产者线程
    let tx1 = tx.clone(); // Clone for multiple producers / 为多个生产者克隆发送端
    thread::spawn(move || {
        for i in 0..5 {
            tx1.send(format!("producer-1: msg {i}")).unwrap();
            thread::sleep(Duration::from_millis(100));
        }
    });

    // Second producer
    // 第二个生产者
    thread::spawn(move || {
        for i in 0..5 {
            tx.send(format!("producer-2: msg {i}")).unwrap();
            thread::sleep(Duration::from_millis(150));
        }
    });

    // Consumer: receive all messages
    // 消费者:接收所有消息
    for msg in rx {
        // rx iterator ends when ALL senders are dropped
        // rx 迭代器在所有发送端都被丢弃 (Drop) 时结束
        println!("Received: {msg}");
    }
    println!("All producers done.");
}

Note: .unwrap() on .send() is used for brevity. It panics if the receiver has been dropped. Production code should handle SendError gracefully.

注意:在 .send() 上使用 .unwrap() 是为了简洁。如果接收端已被丢弃 (Drop),它会产生恐慌。生产代码应当优雅地处理 SendError
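
One way to handle `SendError` without panicking is to treat it as a shutdown signal. The sketch below assumes a hypothetical `produce` helper; `SendError` itself is the standard library type, and it hands the unsent value back to the caller.

```rust
use std::sync::mpsc::{self, SendError};
use std::thread;

// Sends 0..limit until the receiver goes away; returns how many sends succeeded.
fn produce(tx: mpsc::Sender<u32>, limit: u32) -> u32 {
    let mut sent = 0;
    for i in 0..limit {
        match tx.send(i) {
            Ok(()) => sent += 1,
            // The receiver was dropped: stop producing instead of panicking.
            Err(SendError(value)) => {
                eprintln!("receiver dropped; giving up at value {value}");
                break;
            }
        }
    }
    sent
}

fn main() {
    let (tx, rx) = mpsc::channel();
    let producer = thread::spawn(move || produce(tx, 1_000));

    // Consume only a few messages, then hang up.
    let first: Vec<u32> = rx.iter().take(3).collect();
    drop(rx);

    let sent = producer.join().unwrap();
    println!("received {first:?}; producer completed {sent} sends");
}
```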

Key properties / 关键特性:

  • Unbounded / 无界 by default (can fill memory if consumer is slow) / 默认是无界的(如果消费者过慢,可能会填满内存)
  • mpsc::sync_channel(N) creates a bounded / 有界 channel with backpressure / mpsc::sync_channel(N) 创建一个带有背压的有界通道
  • rx.recv() blocks the current thread until a message arrives / rx.recv() 阻塞当前线程直到消息到达
  • rx.try_recv() returns immediately with Err(TryRecvError::Empty) if nothing is ready / rx.try_recv() 在没有就绪消息时立即返回 Err(TryRecvError::Empty)
  • The channel closes when all Senders are dropped / 当所有发送端 (Sender) 都被丢弃时,通道关闭
#![allow(unused)]
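fn main() {
use std::sync::mpsc;
use std::thread;

// Bounded channel with backpressure:
// 带有背压的有界通道:
let (tx, rx) = mpsc::sync_channel(10); // Buffer of 10 messages / 缓冲区容量为 10 条消息

let producer = thread::spawn(move || {
    for i in 0..1000 {
        tx.send(i).unwrap(); // BLOCKS if buffer is full — natural backpressure
                             // 如果缓冲区已满,则阻塞 —— 产生自然的背压
    }
});

// Drain the channel so the blocked producer can make progress
// 持续消费通道,使被阻塞的生产者得以继续
let received = rx.iter().count();
producer.join().unwrap();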
}

Note: .unwrap() is used for brevity. In production, handle SendError (receiver dropped) instead of panicking.

注意:此处使用 .unwrap() 是为了简洁。在生产环境中,请处理 SendError(接收端已丢弃)而不是直接产生恐慌。

crossbeam-channel — The Production Workhorse / crossbeam-channel —— 生产环境的主力军

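crossbeam-channel is a widely used choice for production channel code. Since Rust 1.67, std::sync::mpsc is itself built on a crossbeam-derived implementation, so the performance gap has largely narrowed, but crossbeam-channel additionally supports multiple consumers (MPMC) and select!:

crossbeam-channel 是生产环境中广泛使用的通道库。自 Rust 1.67 起,std::sync::mpsc 本身也基于源自 crossbeam 的实现,因此性能差距已大幅缩小;不过 crossbeam-channel 还额外支持多消费者模式 (MPMC) 和 select!: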

// Cargo.toml:
//   [dependencies]
//   crossbeam-channel = "0.5"
use crossbeam_channel::bounded;
use std::thread;

fn main() {
    // Bounded MPMC channel
    // 有界 MPMC 通道
    let (tx, rx) = bounded::<String>(100);

    // Multiple producers
    // 多个生产者
    for id in 0..4 {
        let tx = tx.clone();
        thread::spawn(move || {
            for i in 0..10 {
                tx.send(format!("worker-{id}: item-{i}")).unwrap();
            }
        });
    }
    drop(tx); // Drop the original sender so the channel can close
              // 丢弃原始发送端,以便通道能够关闭

    // Multiple consumers (not possible with std::sync::mpsc!)
    // 多个消费者(这在 std::sync::mpsc 中是不可能的!)
    let rx2 = rx.clone();
    let consumer1 = thread::spawn(move || {
        while let Ok(msg) = rx.recv() {
            println!("[consumer-1] {msg}");
        }
    });
    let consumer2 = thread::spawn(move || {
        while let Ok(msg) = rx2.recv() {
            println!("[consumer-2] {msg}");
        }
    });

    consumer1.join().unwrap();
    consumer2.join().unwrap();
}

Channel Selection (select!) / 通道选择 (select!)

Listen on multiple channels simultaneously — like select in Go:

同时监听多个通道 —— 类似于 Go 语言中的 select

use crossbeam_channel::{bounded, tick, after, select};
use std::time::Duration;

fn main() {
    let (work_tx, work_rx) = bounded::<String>(10);
    let ticker = tick(Duration::from_secs(1));        // Periodic tick / 周期性 Tick
    let deadline = after(Duration::from_secs(10));     // One-shot timeout / 一次性超时

    // Producer
    // 生产者
    let tx = work_tx.clone();
    std::thread::spawn(move || {
        for i in 0..100 {
            tx.send(format!("job-{i}")).unwrap();
            std::thread::sleep(Duration::from_millis(500));
        }
    });
    drop(work_tx);

    loop {
        select! {
            recv(work_rx) -> msg => {
                match msg {
                    Ok(job) => println!("Processing: {job}"), // 正在处理
                    Err(_) => {
                        println!("Work channel closed"); // 工作通道已关闭
                        break;
                    }
                }
            },
            recv(ticker) -> _ => {
                println!("Tick — heartbeat"); // Tick —— 心跳
            },
            recv(deadline) -> _ => {
                println!("Deadline reached — shutting down"); // 截止时间已到 —— 正在关闭
                break;
            },
        }
    }
}

Go comparison / 与 Go 的对比: This is exactly like Go’s select statement over channels. crossbeam’s select! macro randomizes order to prevent starvation, just like Go.

这与 Go 语言在通道上的 select 语句完全相同。crossbeam 的 select! 宏会自动随机化执行顺序以此来防止饥饿 (Starvation) 现象,这同样与 Go 的行为一致。

Bounded vs Unbounded and Backpressure / 有界与无界及背压

| Type / 类型 | Behavior When Full / 满载时的行为 | Memory / 内存 | Use Case / 使用场景 |
|---|---|---|---|
| Unbounded / 无界 | Never blocks (grows heap) / 永不阻塞(堆增长) | Unbounded ⚠️ / 无限制 ⚠️ | Rare: only when producer is slower than consumer / 罕见:仅当生产者慢于消费者时 |
| Bounded / 有界 | send() blocks until space / 阻塞直到有空间 | Fixed / 固定 | Production default, prevents OOM / 生产环境默认,防止内存溢出 (OOM) |
| Rendezvous / 交汇 (bounded(0)) | send() blocks until receiver is ready / 阻塞直到接收者准备就绪 | None / 无 | Synchronization / handoff / 同步或移交 |
#![allow(unused)]
fn main() {
// Rendezvous channel — zero capacity, direct handoff
// 交汇通道 —— 零容量,直接移交
let (tx, rx) = crossbeam_channel::bounded::<i32>(0);
// tx.send(x) blocks until rx.recv() is called, and vice versa.
// This synchronizes the two threads precisely.

// tx.send(x) 会阻塞直到 rx.recv() 被调用,反之亦然。
// 这可以在两个线程之间实现精确同步。
}

Rule / 规则: Always use bounded channels in production unless you can prove the producer will never outpace the consumer.

规则:除非你能证明生产者永远不会超过消费者的处理速度,否则请在生产环境中始终使用有界通道。
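
Blocking is only one backpressure strategy. When the producer must never stall, another option is to shed load with `try_send`. The sketch below uses the standard library's `sync_channel` and a hypothetical `offer_all` helper that counts how many items were queued and how many were dropped.

```rust
use std::sync::mpsc::{self, TrySendError};

// Offers items without blocking; returns (queued, dropped).
fn offer_all(
    tx: &mpsc::SyncSender<u32>,
    items: impl IntoIterator<Item = u32>,
) -> (usize, usize) {
    let (mut queued, mut dropped) = (0, 0);
    for item in items {
        match tx.try_send(item) {
            Ok(()) => queued += 1,
            // Buffer full: drop the item instead of blocking the producer.
            Err(TrySendError::Full(_)) => dropped += 1,
            // Receiver gone: nothing more can be delivered.
            Err(TrySendError::Disconnected(_)) => break,
        }
    }
    (queued, dropped)
}

fn main() {
    // Capacity 2 and no consumer running yet, so only two items fit.
    let (tx, rx) = mpsc::sync_channel(2);
    let (queued, dropped) = offer_all(&tx, 0..5);
    println!("queued {queued}, dropped {dropped}");

    drop(tx);
    let drained: Vec<u32> = rx.iter().collect();
    println!("consumer saw {drained:?}");
}
```

Dropping is appropriate for data that goes stale quickly, such as telemetry samples; for work that must not be lost, blocking `send()` remains the safer default.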

Actor Pattern with Channels / 使用通道的 Actor 模式

The actor pattern uses channels to serialize access to mutable state — no mutexes needed:

Actor 模式利用通道来序列化对可变状态的访问 —— 无需使用互斥锁 (Mutex):

use std::sync::mpsc;
use std::thread;

// Messages the actor can receive
// Actor 可以接收的消息
enum CounterMsg {
    Increment,
    Decrement,
    Get(mpsc::Sender<i64>), // Reply channel / 用于回复的通道
}

struct CounterActor {
    count: i64,
    rx: mpsc::Receiver<CounterMsg>,
}

impl CounterActor {
    fn new(rx: mpsc::Receiver<CounterMsg>) -> Self {
        CounterActor { count: 0, rx }
    }

    fn run(mut self) {
        while let Ok(msg) = self.rx.recv() {
            match msg {
                CounterMsg::Increment => self.count += 1,
                CounterMsg::Decrement => self.count -= 1,
                CounterMsg::Get(reply) => {
                    let _ = reply.send(self.count);
                }
            }
        }
    }
}

// Actor handle — cheap to clone, Send + Sync
// Actor 句柄 —— 克隆成本低,支持 Send + Sync
#[derive(Clone)]
struct Counter {
    tx: mpsc::Sender<CounterMsg>,
}

impl Counter {
    fn spawn() -> Self {
        let (tx, rx) = mpsc::channel();
        thread::spawn(move || CounterActor::new(rx).run());
        Counter { tx }
    }

    fn increment(&self) { let _ = self.tx.send(CounterMsg::Increment); }
    fn decrement(&self) { let _ = self.tx.send(CounterMsg::Decrement); }

    fn get(&self) -> i64 {
        let (reply_tx, reply_rx) = mpsc::channel();
        self.tx.send(CounterMsg::Get(reply_tx)).unwrap();
        reply_rx.recv().unwrap()
    }
}

fn main() {
    let counter = Counter::spawn();

    // Multiple threads can safely use the counter — no mutex!
    // 多个线程可以安全地使用计数器 —— 无需互斥锁!
    let handles: Vec<_> = (0..10).map(|_| {
        let counter = counter.clone();
        thread::spawn(move || {
            for _ in 0..1000 {
                counter.increment();
            }
        })
    }).collect();

    for h in handles { h.join().unwrap(); }
    println!("Final count: {}", counter.get()); // 10000
}

When to use actors vs mutexes / 何时使用 Actor 而非互斥锁: Actors are great when the state has complex invariants, operations take a long time, or you want to serialize access without thinking about lock ordering. Mutexes are simpler for short critical sections.

何时使用 Actor 而非互斥锁:当状态具有复杂的不变式 (Invariants)、操作耗时较长、或者你希望无需考虑锁顺序 (Lock Ordering) 就实现序列化访问时,Actor 是绝佳选择。而对于简短的临界区 (Critical Sections),互斥锁则更为简单。
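
For comparison, the same counter with a short critical section needs far less machinery when written with a mutex. This is a sketch using a hypothetical `count_with_mutex` helper, not a replacement for the actor above.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Increments a shared counter from `threads` threads, `per_thread` times each.
fn count_with_mutex(threads: usize, per_thread: usize) -> i64 {
    let counter = Arc::new(Mutex::new(0i64));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    // The lock is held only for this one statement.
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }

    let total = *counter.lock().unwrap();
    total
}

fn main() {
    println!("Final count: {}", count_with_mutex(10, 1000));
}
```

There is no message enum, no reply channel, and no dedicated thread, which is the trade-off the paragraph above describes: the mutex wins while the critical section stays this small.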

Key Takeaways — Channels / 核心要点 —— 通道

  • crossbeam-channel is the production workhorse: it adds MPMC and select! on top of what std::sync::mpsc offers / crossbeam-channel 是生产环境中的主力军:在 std::sync::mpsc 的基础上增加了 MPMC 和 select!
  • select! replaces complex multi-source polling with declarative channel selection / select! 通过声明式的通道选择,取代了复杂的多个源的轮询
  • Bounded channels provide natural backpressure; unbounded channels risk OOM / 有界通道提供自然的背压;无界通道则存在内存溢出 (OOM) 的风险

See also / 另请参阅: Ch 6 — Concurrency for threads, Mutex, and shared state. Ch 16 — Async/Await Essentials for async channels (tokio::sync::mpsc).

参见 Ch 6 —— 并发 了解线程、互斥锁和共享状态。参见 Ch 16 —— Async/Await 核心要点 了解异步通道 (tokio::sync::mpsc)。


Exercise: Channel-Based Worker Pool ★★★ (~45 min) / 练习:基于通道的工作池 ★★★(约 45 分钟)

Build a worker pool using channels where:

  • A dispatcher sends Job structs through a channel
  • N workers consume jobs and send results back
  • Use std::sync::mpsc with Arc<Mutex<Receiver>> so all workers pull from one shared job queue

构建一个使用通道的工作池,其中:

  • 调度器 (Dispatcher) 通过通道发送 Job 结构体
  • N 个工作者 (Workers) 消费这些任务并将结果发回
  • 使用 std::sync::mpsc 配合 Arc<Mutex<Receiver>>,让所有工作者从同一个共享任务队列中取任务
🔑 Solution / 参考答案
use std::sync::mpsc;
use std::thread;

struct Job {
    id: u64,
    data: String,
}

struct JobResult {
    job_id: u64,
    output: String,
    worker_id: usize,
}

fn worker_pool(jobs: Vec<Job>, num_workers: usize) -> Vec<JobResult> {
    let (job_tx, job_rx) = mpsc::channel::<Job>();
    let (result_tx, result_rx) = mpsc::channel::<JobResult>();

    // Arc<Mutex<_>> allows sharing the single Receiver among all workers
    // Arc<Mutex<_>> 允许在所有工作者之间共享单个接收端
    let job_rx = std::sync::Arc::new(std::sync::Mutex::new(job_rx));

    let mut handles = Vec::new();
    for worker_id in 0..num_workers {
        let job_rx = job_rx.clone();
        let result_tx = result_tx.clone();
        handles.push(thread::spawn(move || {
            loop {
                // Workers compete for the lock to receive a job
                // 工作者通过竞争锁来接收任务
                let job = {
                    let rx = job_rx.lock().unwrap();
                    rx.recv()
                };
                match job {
                    Ok(job) => {
                        let output = format!("processed '{}' by worker {worker_id}", job.data);
                        result_tx.send(JobResult {
                            job_id: job.id, output, worker_id,
                        }).unwrap();
                    }
                    Err(_) => break, // Channel closed / 通道已关闭
                }
            }
        }));
    }
    // Very important: drop the result_tx in the dispatcher thread
    // Otherwise result_rx.into_iter() will never end!
    
    // 非常重要:在调度器线程中丢弃 result_tx
    // 否则 result_rx.into_iter() 永远不会结束!
    drop(result_tx);

    let num_jobs = jobs.len();
    for job in jobs {
        job_tx.send(job).unwrap();
    }
    drop(job_tx); // Signals workers to exit when done / 通知工作者完成后退出

    let results: Vec<_> = result_rx.into_iter().collect();
    assert_eq!(results.len(), num_jobs);

    for h in handles { h.join().unwrap(); }
    results
}

fn main() {
    let jobs: Vec<Job> = (0..20).map(|i| Job {
        id: i, data: format!("task-{i}"),
    }).collect();

    let results = worker_pool(jobs, 4);
    for r in &results {
        println!("[worker {}] job {}: {}", r.worker_id, r.job_id, r.output);
    }
}

6. Concurrency vs Parallelism vs Threads / 6. 并发、并行与线程 🟡

What you’ll learn / 你将学到:

  • The precise distinction between concurrency and parallelism / 并发与并行之间的精确区别
  • OS threads, scoped threads, and rayon for data parallelism / OS 线程、作用域线程以及用于数据并行的 rayon
  • Shared state primitives: Arc, Mutex, RwLock, Atomics, Condvar / 共享状态原语:Arc、Mutex、RwLock、原子操作、Condvar
  • Lazy initialization with OnceLock/LazyLock and lock-free patterns / 使用 OnceLock/LazyLock 进行延迟初始化以及无锁模式

Terminology: Concurrency ≠ Parallelism / 术语:并发 ≠ 并行

These terms are often confused. Here is the precise distinction:

这两个术语经常被混淆。以下是它们的精确区别:

| | Concurrency / 并发 | Parallelism / 并行 |
|---|---|---|
| Definition / 定义 | Managing multiple tasks that can make progress / 管理多个可以取得进展的任务 | Executing multiple tasks simultaneously / 同时执行多个任务 |
| Hardware requirement / 硬件要求 | One core is enough / 单核即可 | Requires multiple cores / 需要多核 |
| Analogy / 类比 | One cook, multiple dishes (switching between them) / 一名厨师,多道菜(在它们之间切换) | Multiple cooks, each working on a dish / 多名厨师,每人负责一道菜 |
| Rust tools / Rust 工具 | async/await, channels, select! | rayon, thread::spawn, par_iter() |
Concurrency (single core):           Parallelism (multi-core):
并发 (单核):                          并行 (多核):
                                      
Task A: ██░░██░░██                   Task A: ██████████
Task B: ░░██░░██░░                   Task B: ██████████
─────────────────→ time              ─────────────────→ time
(interleaved on one core)           (simultaneous on two cores)
(单核上交错执行)                      (双核上同时执行)

std::thread — OS Threads / std::thread —— OS 线程

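Rust threads map 1:1 to OS threads. Each gets its own stack; threads spawned through std default to 2 MiB on current Tier-1 platforms, while the main thread's stack size is set by the OS (commonly 8 MB on Linux):

Rust 线程与操作系统线程是一一对应的。每个线程都有自己的栈;通过 std 派生的线程在当前 Tier-1 平台上默认为 2 MiB,而主线程的栈大小由操作系统决定(Linux 上通常为 8 MB):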

use std::thread;
use std::time::Duration;

fn main() {
    // Spawn a thread — takes a closure
    // 派生一个线程 —— 接收一个闭包
    let handle = thread::spawn(|| {
        for i in 0..5 {
            println!("spawned thread: {i}");
            thread::sleep(Duration::from_millis(100));
        }
        42 // Return value / 返回值
    });

    // Do work on the main thread simultaneously
    // 同时在主线程上执行工作
    for i in 0..3 {
        println!("main thread: {i}");
        thread::sleep(Duration::from_millis(150));
    }

    // Wait for the thread to finish and get its return value
    // 等待线程结束并获取其返回值
    let result = handle.join().unwrap(); // unwrap panics if thread panicked
                                         // 如果线程发生恐慌,unwrap 也会产生恐慌
    println!("Thread returned: {result}");
}
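
When the default stack is too small, or a thread should carry a name for debugging, `std::thread::Builder` exposes both settings. A minimal sketch; the 8 MiB figure, the thread name, and the `run_named_worker` helper are illustrative assumptions rather than recommendations.

```rust
use std::thread;

// Spawns one named worker with an explicit stack size and returns what it observed.
fn run_named_worker() -> (Option<String>, u64) {
    let handle = thread::Builder::new()
        .name("big-stack-worker".to_string())
        .stack_size(8 * 1024 * 1024) // 8 MiB, a hypothetical requirement
        .spawn(|| {
            // The name set on the builder is visible from inside the thread.
            let name = thread::current().name().map(String::from);
            let sum: u64 = (1..=100).sum();
            (name, sum)
        })
        // Unlike thread::spawn, Builder::spawn reports OS-level failures as io::Error.
        .expect("OS refused to create the thread");

    handle.join().expect("worker panicked")
}

fn main() {
    let (name, sum) = run_named_worker();
    println!("{name:?} computed {sum}");
}
```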

thread::spawn type requirements / thread::spawn 的类型要求:

#![allow(unused)]
fn main() {
use std::thread;

// The closure must be:
// 闭包必须满足:
// 1. Send — can be transferred to another thread / Send —— 可以转移到另一个线程
// 2. 'static — can't borrow from the calling scope / 'static —— 不能从调用作用域借用
// 3. FnOnce: called at most once, so it may consume what it captured / FnOnce:最多被调用一次,因此可以消耗其捕获的变量

let data = vec![1, 2, 3];

// ❌ Borrows data — not 'static
// ❌ 借用 data —— 不是 'static 的
// thread::spawn(|| println!("{data:?}"));

// ✅ Move ownership into the thread
// ✅ 将所有权移动(Move)到线程中
thread::spawn(move || println!("{data:?}"));
// data is no longer accessible here
// 此处 data 已不再可用
}
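
When several `'static` threads need the same data, moving it into one closure is not enough. A common approach is to share one allocation through `Arc`. The sketch below assumes a hypothetical function `shared_sum`; the chunking scheme is only one possible way to split the work:

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical example: sums `data` on `workers` spawned threads that all
/// share one allocation through Arc (each closure owns its own Arc handle).
fn shared_sum(data: Vec<u64>, workers: usize) -> u64 {
    let workers = workers.max(1);
    let data = Arc::new(data);
    // Elements per worker, rounded up; at least 1 so the ranges advance.
    let chunk = data.len().div_ceil(workers).max(1);

    let handles: Vec<_> = (0..workers)
        .map(|w| {
            let data = Arc::clone(&data); // bumps a reference count, no deep copy
            thread::spawn(move || {
                let start = (w * chunk).min(data.len());
                let end = (start + chunk).min(data.len());
                data[start..end].iter().sum::<u64>()
            })
        })
        .collect();

    handles
        .into_iter()
        .map(|h| h.join().expect("worker panicked"))
        .sum()
}

fn main() {
    let total = shared_sum((1..=100).collect(), 4);
    println!("total = {total}");
}
```

Each closure satisfies `'static` because it owns an `Arc` handle rather than borrowing from the caller's stack.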

Scoped Threads (std::thread::scope) / 作用域线程 (std::thread::scope)

Since Rust 1.63, scoped threads solve the 'static requirement — threads can borrow from the parent scope:

从 Rust 1.63 开始,作用域线程 (Scoped Threads) 解决了 'static 的限制 —— 线程现在可以从父级作用域中借用变量:

use std::thread;

fn main() {
    let mut data = vec![1, 2, 3, 4, 5];

    thread::scope(|s| {
        // Thread 1: borrow shared reference
        // 线程 1:借用不可变引用
        s.spawn(|| {
            let sum: i32 = data.iter().sum();
            println!("Sum: {sum}");
        });

        // Thread 2: also borrow shared reference (multiple readers OK)
        // 线程 2:也借用不可变引用(多个读取者是没问题的)
        s.spawn(|| {
            let max = data.iter().max().unwrap();
            println!("Max: {max}");
        });

        // ❌ Can't mutably borrow while shared borrows exist:
        // ❌ 当存在不可变借用时,无法进行可变借用:
        // s.spawn(|| data.push(6));
    });
    // ALL scoped threads joined here — guaranteed before scope returns
    // 所有的作用域线程都在此处汇合 (Joined) —— 保证在作用域返回前完成

    // Now safe to mutate — all threads have finished
    // 现在修改操作是安全的 —— 所有线程都已运行结束
    data.push(6);
    println!("Updated: {data:?}");
}

This is huge: before scoped threads, sharing data with spawned threads usually meant wrapping it in Arc and calling Arc::clone() for each thread. Now you can borrow directly, because thread::scope joins every thread before it returns, so the borrow checker knows the borrows cannot outlive the data.

这是重大的改进:在有作用域线程之前,要与派生线程共享数据,通常必须把数据包装进 Arc,并为每个线程调用 Arc::clone()。现在你可以直接借用,因为 thread::scope 会在返回之前汇合所有线程,借用检查器因此可以确定这些借用不会比数据活得更久。
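
Scoped threads can also borrow mutably, as long as each thread gets a disjoint part of the data. A minimal sketch, assuming a hypothetical function `square_in_place` that hands out non-overlapping sub-slices with `chunks_mut`:

```rust
use std::thread;

/// Hypothetical example: squares every element in place, giving each scoped
/// thread exclusive access to its own chunk of the slice.
fn square_in_place(data: &mut [u64], num_threads: usize) {
    if data.is_empty() {
        return; // chunks_mut(0) would panic
    }
    let chunk_size = data.len().div_ceil(num_threads.max(1));

    thread::scope(|s| {
        // chunks_mut yields non-overlapping &mut sub-slices, so several
        // threads can mutate the same Vec without a lock.
        for chunk in data.chunks_mut(chunk_size) {
            s.spawn(move || {
                for x in chunk.iter_mut() {
                    *x *= *x;
                }
            });
        }
    }); // all threads are joined here

}

fn main() {
    let mut values: Vec<u64> = (1..=10).collect();
    square_in_place(&mut values, 3);
    println!("{values:?}");
}
```

No `Arc` or `Mutex` is involved: the borrow checker accepts this because the chunks cannot alias.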

rayon — Data Parallelism / rayon —— 数据并行

rayon provides parallel iterators that distribute work across a thread pool automatically:

rayon 提供了并行迭代器,能够自动将工作分配到线程池中执行:

// Cargo.toml: rayon = "1"
use rayon::prelude::*;

fn main() {
    let data: Vec<u64> = (0..1_000_000).collect();

    // Sequential:
    // 串行:
    let sum_seq: u64 = data.iter().map(|x| x * x).sum();

    // Parallel — just change .iter() to .par_iter():
    // 并行 —— 只需将 .iter() 改为 .par_iter():
    let sum_par: u64 = data.par_iter().map(|x| x * x).sum();

    assert_eq!(sum_seq, sum_par);

    // Parallel sort:
    // 并行排序:
    let mut numbers = vec![5, 2, 8, 1, 9, 3];
    numbers.par_sort();

    // Parallel processing with map/filter/collect:
    // 使用 map/filter/collect 进行并行处理:
    let results: Vec<_> = data
        .par_iter()
        .filter(|&&x| x % 2 == 0)
        .map(|&x| expensive_computation(x))
        .collect();
}

fn expensive_computation(x: u64) -> u64 {
    // Simulate CPU-heavy work
    // 模拟高负荷 CPU 工作
    (0..1000).fold(x, |acc, _| acc.wrapping_mul(7).wrapping_add(13))
}

When to use rayon vs threads / 何时使用 rayon 与线程:

| Use / 使用方式 | When / 适用场景 |
|---|---|
| `rayon::par_iter()` | Processing collections in parallel (map, filter, reduce) / 并行处理集合(如 map、filter、reduce 等) |
| `thread::spawn` | Long-running background tasks, I/O workers / 长时间运行的后台任务、I/O 工作者 |
| `thread::scope` | Short-lived parallel tasks that borrow local data / 需要借用局部数据的短生命周期并行任务 |
| `async` + `tokio` | I/O-bound concurrency (networking, file I/O) / I/O 密集型并发(网络访问、文件 I/O) |

Shared State: Arc, Mutex, RwLock, Atomics / 共享状态:Arc、Mutex、RwLock、原子操作

When threads need shared mutable state, Rust provides safe abstractions:

当多个线程需要共享可变状态时,Rust 提供了安全的原语:

Note: .unwrap() on .lock(), .read(), and .write() is used for brevity throughout these examples. These calls fail only if another thread panicked while holding the lock (“poisoning”). Production code should decide whether to recover from poisoned locks or propagate the error.

注意:在这些示例中,为了简洁,对 .lock()、.read() 和 .write() 使用了 .unwrap()。只有当另一个线程在持有锁时发生恐慌(即“锁中毒”,poisoning),这些调用才会失败。生产环境的代码应该决定是从中毒的锁中恢复还是继续传播错误。

#![allow(unused)]
fn main() {
use std::sync::{Arc, Mutex, RwLock};
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

// --- Arc<Mutex<T>>: Shared + Exclusive access ---
// --- Arc<Mutex<T>>: 共享 + 排他性访问 ---
fn mutex_example() {
    let counter = Arc::new(Mutex::new(0u64));
    let mut handles = vec![];

    for _ in 0..10 {
        let counter = Arc::clone(&counter);
        handles.push(thread::spawn(move || {
            for _ in 0..1000 {
                let mut guard = counter.lock().unwrap();
                *guard += 1;
            } // Guard dropped → lock released / Guard 被丢弃 → 锁被释放
        }));
    }

    for h in handles { h.join().unwrap(); }
    println!("Counter: {}", counter.lock().unwrap()); // 10000
}

// --- Arc<RwLock<T>>: Multiple readers OR one writer ---
// --- Arc<RwLock<T>>: 多个读取者 或 一个写入者 ---
fn rwlock_example() {
    let config = Arc::new(RwLock::new(String::from("initial")));

    // Many readers — don't block each other
    // 许多读取者 —— 彼此不阻塞
    let readers: Vec<_> = (0..5).map(|id| {
        let config = Arc::clone(&config);
        thread::spawn(move || {
            let guard = config.read().unwrap();
            println!("Reader {id}: {guard}");
        })
    }).collect();

    // Writer — blocks and waits for all readers to finish
    // 写入者 —— 阻塞并等待所有读取者完成
    {
        let mut guard = config.write().unwrap();
        *guard = "updated".to_string();
    }

    for r in readers { r.join().unwrap(); }
}

// --- Atomics: Lock-free for simple values ---
// --- 原子操作:针对简单值的无锁操作 ---
fn atomic_example() {
    let counter = Arc::new(AtomicU64::new(0));
    let mut handles = vec![];

    for _ in 0..10 {
        let counter = Arc::clone(&counter);
        handles.push(thread::spawn(move || {
            for _ in 0..1000 {
                counter.fetch_add(1, Ordering::Relaxed);
                // No lock, no mutex — hardware atomic instruction
                // 无锁,无互斥锁 —— 使用硬件原子指令
            }
        }));
    }

    for h in handles { h.join().unwrap(); }
    println!("Atomic counter: {}", counter.load(Ordering::Relaxed)); // 10000
}
}
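
The poisoning behaviour mentioned in the note can be handled explicitly instead of unwrapping. The sketch below assumes two hypothetical helpers, `poison` and `increment_recovering`, and shows one possible recovery policy: keep using the data even though a previous holder panicked. Whether that is acceptable depends on the invariants the lock protects.

```rust
use std::sync::{Arc, Mutex, PoisonError};
use std::thread;

/// Hypothetical helper: increments the counter even if the mutex is poisoned.
fn increment_recovering(counter: &Mutex<u64>) -> u64 {
    // PoisonError::into_inner hands back the guard despite the poison flag.
    let mut guard = counter.lock().unwrap_or_else(PoisonError::into_inner);
    *guard += 1;
    *guard
}

/// Hypothetical helper: poisons the mutex by panicking while holding the lock.
fn poison(counter: &Arc<Mutex<u64>>) {
    let c = Arc::clone(counter);
    let _ = thread::spawn(move || {
        let _guard = c.lock().unwrap();
        panic!("simulated failure while holding the lock");
    })
    .join(); // join returns Err because the thread panicked; ignored on purpose
}

fn main() {
    let counter = Arc::new(Mutex::new(41));
    poison(&counter);

    println!("poisoned: {}", counter.is_poisoned());
    println!("lock() is Err: {}", counter.lock().is_err());
    println!("recovered value: {}", increment_recovering(&counter));
}
```

The alternative policy is to propagate the `PoisonError` to the caller, which is the safer default when a half-finished update could leave the data inconsistent.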

Quick Comparison / 快速对比

| Primitive / 原语 | Use Case / 使用场景 | Cost / 成本 | Contention / 争用情况 |
|---|---|---|---|
| `Mutex<T>` | Short critical sections / 简短的临界区 | Lock + unlock / 加锁 + 解锁 | Threads wait in line / 线程排队等待 |
| `RwLock<T>` | Read-heavy, rare writes / 读多写少 | Reader-writer lock / 读写锁 | Readers concurrent, writer exclusive / 读取者并发,写入者排他 |
| `AtomicU64` etc. | Counters, flags / 计数器、标志位 | Atomic CPU instruction (e.g., fetch-add, CAS) / 原子 CPU 指令(如 fetch-add、CAS) | Lock-free, but a contended cache line still slows updates / 无锁,但缓存行争用仍会拖慢更新 |
| Channels | Message passing / 消息传递 | Queue ops / 队列操作 | Producer/consumer decouple / 生产者/消费者解耦 |
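
For updates that are not a plain increment, atomics are typically used with a compare-exchange retry loop. A minimal sketch, assuming hypothetical functions `record_max` and `max_of_samples`; note that the standard library already offers `AtomicU64::fetch_max` for this particular case, so the loop is written out only to show the pattern:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

/// Hypothetical example: stores `sample` if it exceeds the current maximum,
/// retrying when another thread changes the value in between.
fn record_max(max: &AtomicU64, sample: u64) {
    let mut current = max.load(Ordering::Relaxed);
    while sample > current {
        match max.compare_exchange_weak(current, sample, Ordering::Relaxed, Ordering::Relaxed) {
            Ok(_) => break,
            // Lost the race (or failed spuriously): re-check against the observed value.
            Err(observed) => current = observed,
        }
    }
}

/// Hypothetical example: one scoped thread per batch, all updating one atomic.
fn max_of_samples(batches: &[Vec<u64>]) -> u64 {
    let max = AtomicU64::new(0);
    thread::scope(|s| {
        for batch in batches {
            let max = &max;
            s.spawn(move || {
                for &value in batch {
                    record_max(max, value);
                }
            });
        }
    });
    max.load(Ordering::Relaxed)
}

fn main() {
    let batches = vec![vec![3, 9, 1], vec![7, 42, 5], vec![]];
    println!("max = {}", max_of_samples(&batches));
}
```

`Ordering::Relaxed` is enough here because only the single value matters; no other memory is published through it.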

Condition Variables (Condvar) / 条件变量 (Condvar)

A Condvar lets a thread wait until another thread signals that a condition is true, without busy-looping. It is always paired with a Mutex:

Condvar 允许线程在不进行忙轮询 (Busy-looping) 的情况下 等待,直到另一个线程发出信号表明某个条件为真。它总是与一个 Mutex 配对使用:

#![allow(unused)]
fn main() {
use std::sync::{Arc, Mutex, Condvar};
use std::thread;

let pair = Arc::new((Mutex::new(false), Condvar::new()));
let pair2 = Arc::clone(&pair);

// Spawned thread: wait until ready == true
// 派生出的线程:等待直到 ready == true
let handle = thread::spawn(move || {
    let (lock, cvar) = &*pair2;
    let mut ready = lock.lock().unwrap();
    while !*ready {
        // atomically unlocks + sleeps
        // 原子地解锁并进入休眠
        ready = cvar.wait(ready).unwrap();
    }
    println!("Worker: condition met, proceeding"); // 工作者:条件满足,正在继续
});

// Main thread: set ready = true, then signal
// 主线程:设置 ready = true,然后发出信号
{
    let (lock, cvar) = &*pair;
    let mut ready = lock.lock().unwrap();
    *ready = true;
    cvar.notify_one(); // wake one waiting thread (use notify_all for many)
                       // 唤醒一个等待中的线程(若有多个,使用 notify_all)
}
handle.join().unwrap();
}

Pattern / 模式: Always re-check the condition in a while loop after wait() returns — spurious wakeups are allowed by the OS.

模式:在 wait() 返回后,务必在一个 while 循环中重新检查条件 —— 因为操作系统允许发生“虚假唤醒 (Spurious Wakeups)”。
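
The re-check loop can also be delegated to `Condvar::wait_while`, which keeps waiting for as long as the supplied predicate returns true. A minimal sketch, assuming a hypothetical function `wait_for_jobs` in which worker threads bump a shared counter:

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Hypothetical example: blocks until `expected` workers have reported in,
/// then returns the final count.
fn wait_for_jobs(expected: usize) -> usize {
    let state = Arc::new((Mutex::new(0usize), Condvar::new()));

    let workers: Vec<_> = (0..expected)
        .map(|_| {
            let state = Arc::clone(&state);
            thread::spawn(move || {
                let (done, cvar) = &*state;
                *done.lock().unwrap() += 1; // temporary guard is released at the end of this statement
                cvar.notify_all();
            })
        })
        .collect();

    let (done, cvar) = &*state;
    // wait_while evaluates the predicate under the lock, sleeps while it is
    // true, and re-evaluates it after every wakeup (spurious or not).
    let finished = cvar
        .wait_while(done.lock().unwrap(), |count| *count < expected)
        .unwrap();
    let total = *finished;
    drop(finished); // release the lock before joining

    for w in workers {
        w.join().unwrap();
    }
    total
}

fn main() {
    println!("jobs finished: {}", wait_for_jobs(4));
}
```

Because the predicate is checked before the first sleep, a notification sent before the waiter arrives is not lost.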

Lazy Initialization: OnceLock and LazyLock / 延迟初始化:OnceLock 与 LazyLock

Initializing a global static that requires runtime computation (e.g., parsing a config, compiling a regex) used to need the lazy_static! macro or the once_cell crate. The standard library now covers these use cases natively with two types, OnceLock (stable since Rust 1.70) and LazyLock (stable since Rust 1.80):

初始化一个需要运行时计算的全局静态变量(例如:解析配置文件、编译正则表达式)过去通常需要 lazy_static! 宏或者 once_cell crate。现在,标准库原生提供了两种类型来涵盖这些用例:OnceLock(自 Rust 1.70 起稳定)和 LazyLock(自 Rust 1.80 起稳定):

#![allow(unused)]
fn main() {
use std::sync::{OnceLock, LazyLock};
use std::collections::HashMap;

// OnceLock — initialize on first use via `get_or_init`.
// Useful when the init value depends on runtime arguments.

// OnceLock —— 通过 `get_or_init` 在首次使用时进行初始化
// 当初始化值依赖于运行时参数时非常有用

static CONFIG: OnceLock<HashMap<String, String>> = OnceLock::new();

fn get_config() -> &'static HashMap<String, String> {
    CONFIG.get_or_init(|| {
        // Expensive: read & parse config file — happens exactly once.
        // 耗时操作:读取并解析配置文件 —— 此操作只会发生一次
        let mut m = HashMap::new();
        m.insert("log_level".into(), "info".into());
        m
    })
}

// LazyLock — initialize on first access, closure provided at definition site.
// Equivalent to lazy_static! but without a macro.

// LazyLock —— 在首次访问时进行初始化,闭包在定义时提供
// 相当于 `lazy_static!` 但无需使用宏

// Cargo.toml: regex = "1" / 需要在 Cargo.toml 中添加 regex = "1"
static REGEX: LazyLock<regex::Regex> = LazyLock::new(|| {
    regex::Regex::new(r"^[a-zA-Z0-9_]+$").unwrap()
});

fn is_valid_identifier(s: &str) -> bool {
    REGEX.is_match(s) // First call compiles the regex; subsequent calls reuse it.
                      // 首次调用会编译正则;后续调用则复用结果
}
}

| Type / 类型 | Stabilized / 稳定版本 | Init Timing / 初始化时机 | Use When / 适用场景 |
|---|---|---|---|
| `OnceLock<T>` | Rust 1.70 | Call-site (get_or_init) / 调用处 | Init depends on runtime args / 初始化依赖于运行时参数 |
| `LazyLock<T>` | Rust 1.80 | Definition-site (closure) / 定义处 | Init is self-contained / 初始化是自包含的 |
| `lazy_static!` | n/a | Definition-site (macro) / 定义处 | Pre-1.80 codebases (migrate away) / 早期代码仓库(建议迁移) |
| `const fn` + `static` | Always / 始终支持 | Compile-time / 编译时 | Value is computable at compile time / 值在编译时即可算出 |

Migration tip / 迁移建议: Replace lazy_static! { static ref X: T = expr; } with static X: LazyLock<T> = LazyLock::new(|| expr); — same semantics, no macro, no external dependency.

迁移建议:将 lazy_static! { static ref X: T = expr; } 替换为 static X: LazyLock<T> = LazyLock::new(|| expr); —— 两者语义相同,但无需使用宏及其外部依赖。
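
The two initialization styles can be compared side by side without any external crate. The sketch below assumes hypothetical statics `RESERVED` and `LOG_LEVEL` and requires Rust 1.80 or later for `LazyLock`. It also uses `OnceLock::set`, which suits values that arrive from runtime input such as command-line arguments:

```rust
use std::collections::HashSet;
use std::sync::{LazyLock, OnceLock};

// Definition-site init: the closure runs once, on first access.
static RESERVED: LazyLock<HashSet<&'static str>> =
    LazyLock::new(|| HashSet::from(["fn", "let", "match", "struct"]));

// Call-site init: the value is supplied later, from runtime input.
static LOG_LEVEL: OnceLock<String> = OnceLock::new();

/// Returns true if this call performed the initialization.
/// A second call leaves the first value in place and returns false.
fn init_log_level(level: &str) -> bool {
    LOG_LEVEL.set(level.to_string()).is_ok()
}

/// Falls back to "info" while nothing has been set.
fn log_level() -> &'static str {
    LOG_LEVEL.get().map(String::as_str).unwrap_or("info")
}

fn is_reserved(word: &str) -> bool {
    RESERVED.contains(word)
}

fn main() {
    init_log_level("debug");
    let second = init_log_level("trace"); // ignored: already initialized
    println!("level = {}, second init accepted = {second}", log_level());
    println!("`match` reserved: {}", is_reserved("match"));
}
```

`get_or_init` (shown earlier) and `set` differ mainly in who supplies the value: the first reader, or an explicit initialization step.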

Lock-Free Patterns / 无锁模式

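When profiling shows that lock contention is the bottleneck, atomics can replace locks on the hot path. The patterns below are mainly educational:

当性能分析表明锁争用确实是瓶颈时,可以在热点路径上用原子操作取代锁。下面的模式主要用于教学: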

#![allow(unused)]
fn main() {
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;

// Pattern 1: Spin lock (educational — prefer std::sync::Mutex)
// 模式 1:自旋锁(教学用途 —— 生产中请优先使用 std::sync::Mutex)

// ⚠️ WARNING: This is a teaching example only. Real spinlocks need:
//   - A RAII guard (so a panic while holding doesn't deadlock forever)
//   - Fairness guarantees (this starves under contention)
//   - Backoff strategies (exponential backoff, yield to OS)
// Use std::sync::Mutex or parking_lot::Mutex in production.

// ⚠️ 警告:这仅仅是一个教学示例。真实的自旋锁需要:
//   - RAII 卫哨 (Guard)(这样在持有锁时发生恐慌就不会导致永久死锁)
//   - 公平性保证(本示例在存在争用时会导致饥饿)
//   - 退避策略 (Backoff)(指数退避、让出 OS 时间片等)
// 在生产环境中,请使用 std::sync::Mutex 或 parking_lot::Mutex。

struct SpinLock {
    locked: AtomicBool,
}

impl SpinLock {
    fn new() -> Self { SpinLock { locked: AtomicBool::new(false) } }

    fn lock(&self) {
        while self.locked
            .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            std::hint::spin_loop(); // CPU hint: we're spinning / CPU 提示:我们正在自旋
        }
    }

    fn unlock(&self) {
        self.locked.store(false, Ordering::Release);
    }
}

// Pattern 2: Lock-free SPSC (single producer, single consumer)
// Use crossbeam::queue::ArrayQueue or similar in production
// roll-your-own only for learning.

// 模式 2:无锁 SPSC(单生产者、单消费者)
// 生产环境中请使用 crossbeam::queue::ArrayQueue 或类似的库
// 自行实现仅用于学习。

// Pattern 3: Sequence counter for wait-free reads
// ⚠️ Best for single-machine-word types (u64, f64); wider T may tear on read.

// 模式 3:用于无等待读取的序列计数器(Sequence Counter)
// ⚠️ 最适用于单机器字类型(如 u64, f64);更宽的类型 T 在读取时可能会发生撕裂 (Tearing)。

struct SeqLock<T: Copy> {
    seq: AtomicUsize,
    data: std::cell::UnsafeCell<T>,
}

unsafe impl<T: Copy + Send> Sync for SeqLock<T> {}

impl<T: Copy> SeqLock<T> {
    fn new(val: T) -> Self {
        SeqLock {
            seq: AtomicUsize::new(0),
            data: std::cell::UnsafeCell::new(val),
        }
    }

    fn read(&self) -> T {
        loop {
            let s1 = self.seq.load(Ordering::Acquire);
            if s1 & 1 != 0 { continue; } // Writer in progress, retry / 写入正在进行中,重试

            // SAFETY: We use ptr::read_volatile to prevent the compiler from
            // reordering or caching the read. The SeqLock protocol (checking
            // s1 == s2 after reading) ensures we retry if a writer was active.
            // This mirrors the C SeqLock pattern where the data read must use
            // volatile/relaxed semantics to avoid tearing under concurrency.
            // 安全性:我们使用 ptr::read_volatile 来防止编译器重排序或缓存读取。
            // SeqLock 协议(读取后检查 s1 == s2)确保了如果有写入者处于活跃状态,我们会进行重试。
            // 这与 C 语言的 SeqLock 模式类似,其中数据读取必须使用 volatile/relaxed 语义,以避免在并发下发生数据撕裂。
            let value = unsafe { core::ptr::read_volatile(self.data.get() as *const T) };

            // Acquire fence: ensures the data read above is ordered before
            // we re-check the sequence counter.
            // Acquire 屏障:确保上述数据读取在重新检查序列计数器之前完成。
            std::sync::atomic::fence(Ordering::Acquire);
            let s2 = self.seq.load(Ordering::Relaxed);

            if s1 == s2 { return value; } // No writer intervened / 没有写入者介入
            // else retry / 否则重试
        }
    }

    /// # Safety contract
    /// Only ONE thread may call `write()` at a time. If multiple writers
    /// are needed, wrap the `write()` call in an external `Mutex`.
    /// # 安全合约
    /// 每次只能有一个线程调用 `write()`。如果需要多个写入者,请将 `write()` 调用封装在外部 `Mutex` 中。
    fn write(&self, val: T) {
        // Increment to odd (signals write in progress).
        // AcqRel: the Acquire side prevents the subsequent data write
        // from being reordered before this increment (readers must see
        // odd before they could observe a partial write). The Release
        // side is technically unnecessary for a single writer but
        // harmless and consistent.
        // 增加到奇数(信号表示写入正在进行中)。
        // AcqRel:Acquire 端阻止后续数据写入在此增量之前被重排序(读取者必须在观察到部分写入之前看到奇数)。
        // Release 端对于单个写入者来说技术上不是必需的,但无害且保持一致性。
        self.seq.fetch_add(1, Ordering::AcqRel);
        // SAFETY: Single-writer invariant upheld by caller (see doc above).
        // UnsafeCell allows interior mutation; seq counter protects readers.
        // 安全性:调用者需维护单写入者不变性(参见上方文档)。
        // UnsafeCell 允许内部可变性;序列计数器保护读取者。
        unsafe { *self.data.get() = val; }
        // Increment to even (signals write complete).
        // Release: ensure the data write is visible before readers see the even seq.
        // 增加到偶数(信号表示写入完成)。
        // Release:确保数据写入在读取者看到偶数序列之前可见。
        self.seq.fetch_add(1, Ordering::Release);
    }
}
}

⚠️ Rust memory model caveat: The non-atomic write through UnsafeCell in write() concurrent with the non-atomic ptr::read_volatile in read() is a data race, and therefore undefined behavior, under the Rust abstract machine, even though the SeqLock protocol makes readers discard any value read while a write was in progress. The code mirrors the C kernel SeqLock pattern and tends to work in practice on current hardware for types T that fit in a single machine word (e.g., u64), but the compiler gives no guarantee. To stay within the memory model, store the payload in atomics (e.g., AtomicU64 accessed with Ordering::Relaxed, split across several atomics for wider types) or protect it with a Mutex. See the Rust unsafe code guidelines for the evolving story on UnsafeCell concurrency.

⚠️ Rust 内存模型警告:在 Rust 抽象机下,write() 中通过 UnsafeCell 进行的非原子写入与 read() 中非原子的 ptr::read_volatile 并发执行时构成“数据竞争 (Data Race)”,因此属于未定义行为,尽管 SeqLock 协议会让读取者丢弃在写入期间读到的值。这段代码参考了 C 内核的 SeqLock 模式,对于能装入单个机器字(例如 u64)的类型 T,在当前硬件上实践中通常可以工作,但编译器不提供任何保证。若要严格遵守内存模型,请把数据存放在原子类型中(例如以 Ordering::Relaxed 访问的 AtomicU64,更宽的类型可拆分为多个原子变量),或用 Mutex 保护。请参阅 Rust 非安全代码指南,了解 UnsafeCell 并发方面仍在演进的规则。
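
One way to keep the sequence-counter idea while avoiding the data race is to store the payload in atomics and access it with relaxed loads and stores. The following is a sketch under that assumption, with a hypothetical type `SeqPair` holding two `u64` values; it still requires a single writer, and it is not the SeqLock shown above:

```rust
use std::sync::atomic::{fence, AtomicU64, AtomicUsize, Ordering};
use std::thread;

/// Hypothetical sequence lock over a pair of u64 values stored in atomics.
/// Single writer only: concurrent writers would break the sequence protocol
/// (readers could see mixed pairs), although there is no undefined behavior.
struct SeqPair {
    seq: AtomicUsize,
    a: AtomicU64,
    b: AtomicU64,
}

impl SeqPair {
    fn new(a: u64, b: u64) -> Self {
        SeqPair { seq: AtomicUsize::new(0), a: AtomicU64::new(a), b: AtomicU64::new(b) }
    }

    fn read(&self) -> (u64, u64) {
        loop {
            let s1 = self.seq.load(Ordering::Acquire);
            if s1 & 1 != 0 {
                std::hint::spin_loop(); // odd: a write is in progress
                continue;
            }
            let a = self.a.load(Ordering::Relaxed);
            let b = self.b.load(Ordering::Relaxed);
            // Orders the two data loads before the second counter load.
            fence(Ordering::Acquire);
            if self.seq.load(Ordering::Relaxed) == s1 {
                return (a, b); // counter unchanged: the pair is consistent
            }
        }
    }

    fn write(&self, a: u64, b: u64) {
        let s = self.seq.load(Ordering::Relaxed);
        self.seq.store(s.wrapping_add(1), Ordering::Relaxed); // odd: write begins
        // Orders the odd counter store before the data stores.
        fence(Ordering::Release);
        self.a.store(a, Ordering::Relaxed);
        self.b.store(b, Ordering::Relaxed);
        self.seq.store(s.wrapping_add(2), Ordering::Release); // even: write done
    }
}

fn main() {
    let pair = SeqPair::new(0, 0);
    thread::scope(|s| {
        s.spawn(|| {
            for i in 1..=10_000u64 {
                pair.write(i, i * 2);
            }
        });
        s.spawn(|| {
            for _ in 0..10_000 {
                let (a, b) = pair.read();
                assert_eq!(b, a * 2, "reader observed a torn pair");
            }
        });
    });
    println!("final pair: {:?}", pair.read());
}
```

The per-field atomics cost little on mainstream 64-bit targets, where relaxed loads and stores usually compile to ordinary memory accesses.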

Practical advice / 实践建议: Lock-free code is hard to get right. Use Mutex or RwLock unless profiling shows lock contention is your bottleneck. When you do need lock-free, reach for proven crates (crossbeam, arc-swap, dashmap) rather than rolling your own.

实践建议:无锁代码很难写对。除非性能分析表明锁争用是你的瓶颈,否则请使用 Mutex 或 RwLock。当你确实需要无锁方案时,请选用已经过验证的 crate(如 crossbeam、arc-swap、dashmap),而不是自行实现。

Key Takeaways — Concurrency / 核心要点 —— 并发

  • Scoped threads (thread::scope) let you borrow stack data without Arc / 作用域线程 (thread::scope) 允许你在不使用 Arc 的情况下借用栈数据
  • rayon::par_iter() parallelizes iterators with one method call / rayon::par_iter() 通过一个方法调用即可实现迭代器并行化
  • Use OnceLock/LazyLock instead of lazy_static!; use Mutex before reaching for atomics / 使用 OnceLock/LazyLock 代替 lazy_static!;在考虑原子操作之前先尝试使用 Mutex
  • Lock-free code is hard — prefer proven crates over hand-rolled implementations / 无锁代码非常困难 —— 优先使用经过验证的 crate,而不是自行实现

See also / 另请参阅: Ch 5 — Channels for message-passing concurrency. Ch 9 — Smart Pointers for Arc/Rc details.

参见 Ch 5 —— 通道 了解消息传递并发。参见 Ch 9 —— 智能指针 了解 Arc/Rc 的细节。

flowchart TD
    A["Need shared<br>mutable state?<br>需要共享可变状态吗?"] -->|Yes / 是| B{"How much<br>contention?<br>争用程度如何?"}
    A -->|No / 否| C["Use channels<br>(Ch 5)<br>使用通道 (Ch 5)"]

    B -->|"Read-heavy / 读多"| D["RwLock"]
    B -->|"Short critical<br>section / 简短临界区"| E["Mutex"]
    B -->|"Simple counter<br>or flag / 简单计数或标志"| F["Atomics / 原子操作"]
    B -->|"Complex state / 复杂状态"| G["Actor + channels<br>Actor + 通道"]

    H["Need parallelism?<br>需要并行吗?"] -->|"Collection<br>processing / 集合处理"| I["rayon::par_iter"]
    H -->|"Background task / 后台任务"| J["thread::spawn"]
    H -->|"Borrow local data / 借用局部数据"| K["thread::scope"]

    style A fill:#e8f4f8,stroke:#2980b9,color:#000
    style B fill:#fef9e7,stroke:#f1c40f,color:#000
    style C fill:#d4efdf,stroke:#27ae60,color:#000
    style D fill:#fdebd0,stroke:#e67e22,color:#000
    style E fill:#fdebd0,stroke:#e67e22,color:#000
    style F fill:#fdebd0,stroke:#e67e22,color:#000
    style G fill:#fdebd0,stroke:#e67e22,color:#000
    style H fill:#e8f4f8,stroke:#2980b9,color:#000
    style I fill:#d4efdf,stroke:#27ae60,color:#000
    style J fill:#d4efdf,stroke:#27ae60,color:#000
    style K fill:#d4efdf,stroke:#27ae60,color:#000

Exercise: Parallel Map with Scoped Threads ★★ (~25 min) / 练习:使用作用域线程实现并行映射 ★★(约 25 分钟)

Write a function parallel_map<T, R>(data: &[T], f: fn(&T) -> R, num_threads: usize) -> Vec<R> that splits data into num_threads chunks and processes each in a scoped thread. Do not use rayon — use std::thread::scope.

编写一个函数 parallel_map<T, R>(data: &[T], f: fn(&T) -> R, num_threads: usize) -> Vec<R>,将 data 分成 num_threads 个块,并在作用域线程中处理每个块。请勿使用 rayon —— 使用 std::thread::scope

🔑 Solution / 参考答案
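fn parallel_map<T: Sync, R: Send>(data: &[T], f: fn(&T) -> R, num_threads: usize) -> Vec<R> {
    // `chunks(0)` panics, so handle empty input and a zero thread count up front.
    // `chunks(0)` 会恐慌,因此先处理空输入和线程数为 0 的情况。
    if data.is_empty() {
        return Vec::new();
    }
    let chunk_size = data.len().div_ceil(num_threads.max(1));
    let mut results = Vec::with_capacity(data.len());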

    std::thread::scope(|s| {
        let mut handles = Vec::new();
        for chunk in data.chunks(chunk_size) {
            handles.push(s.spawn(move || {
                chunk.iter().map(f).collect::<Vec<_>>()
            }));
        }
        for h in handles {
            results.extend(h.join().unwrap());
        }
    });

    results
}

fn main() {
    let data: Vec<u64> = (1..=20).collect();
    let squares = parallel_map(&data, |x| x * x, 4);
    assert_eq!(squares, (1..=20).map(|x: u64| x * x).collect::<Vec<_>>());
    println!("Parallel squares: {squares:?}");
}

7. Closures and Higher-Order Functions / 7. 闭包与高阶函数 🟢

What you’ll learn / 你将学到:

  • The three closure traits (Fn, FnMut, FnOnce) and how capture works / 三种闭包 Trait (Fn, FnMut, FnOnce) 以及捕获的工作原理
  • Passing closures as parameters and returning them from functions / 将闭包作为参数传递,并从函数中返回闭包
  • Combinator chains and iterator adapters for functional-style programming / 函数式编程风格下的组合器链与迭代器适配器
  • Designing your own higher-order APIs with the right trait bounds / 使用正确的 Trait Bound 设计自己的高阶 API

Fn, FnMut, FnOnce — The Closure Traits / Fn, FnMut, FnOnce —— 闭包 Trait

Every closure in Rust implements one or more of three traits, based on how it captures variables:

Rust 中的每个闭包都会根据它捕获变量的方式,实现三个 Trait 中的一个或多个:

#![allow(unused)]
fn main() {
// FnOnce — consumes captured values (can only be called once)
// FnOnce —— 消耗所捕获的值(只能被调用一次)
let name = String::from("Alice");
let greet = move || {
    println!("Hello, {name}!"); // Only borrows `name` for printing / 打印时仅借用 `name`
    drop(name); // Moves `name` out of the closure, which makes it FnOnce / 将 `name` 移出闭包,因此该闭包只能是 FnOnce
};
greet(); // ✅ First call / 首次调用
// greet(); // ❌ Can't call again — `name` was consumed / ❌ 无法再次调用 —— `name` 已被消耗

// FnMut — mutably borrows captured values (can be called many times)
// FnMut —— 以可变方式借用捕获的值(可以被调用多次)
let mut count = 0;
let mut increment = || {
    count += 1; // Mutably borrows `count` / 可变借用 `count`
};
increment(); // count == 1
increment(); // count == 2

// Fn — immutably borrows captured values (can be called many times, concurrently)
// Fn —— 以不可变方式借用捕获的值(可以并行被多次调用)
let prefix = "Result";
let display = |x: i32| {
    println!("{prefix}: {x}"); // Immutably borrows `prefix` / 不可变借用 `prefix`
};
display(1);
display(2);
}

The hierarchy / 等级体系: Fn : FnMut : FnOnce — each is a subtrait of the next:

等级体系Fn : FnMut : FnOnce —— 每一个都是后者的子 Trait:

FnOnce  ← everything can be called at least once / 任何闭包都至少能被调用一次
 ↑
FnMut   ← can be called repeatedly (may mutate state) / 可以重复调用(可能会修改状态)
 ↑
Fn      ← can be called repeatedly and concurrently (no mutation) / 可以重复并并行调用(不修改状态)

If a closure implements Fn, it also implements FnMut and FnOnce.

如果一个闭包实现了 Fn,它也同时实现了 FnMut 和 FnOnce。
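
The hierarchy can be checked directly by passing closures to functions with different bounds. A minimal sketch, assuming three hypothetical helpers (`call_once`, `call_twice_mut`, `call_twice`) that exist only for this illustration:

```rust
/// Accepts any closure: every closure can be called at least once.
fn call_once<T>(f: impl FnOnce() -> T) -> T {
    f()
}

/// Accepts Fn and FnMut closures; returns the result of the second call.
fn call_twice_mut(mut f: impl FnMut() -> usize) -> usize {
    f();
    f()
}

/// Accepts only Fn closures; returns the sum of two calls.
fn call_twice(f: impl Fn() -> usize) -> usize {
    f() + f()
}

fn main() {
    let label = String::from("gpio");

    // Fn: only reads `label`, so it satisfies all three bounds.
    let len = || label.len();
    assert_eq!(call_twice(&len), 8);
    assert_eq!(call_twice_mut(&len), 4);
    assert_eq!(call_once(&len), 4);

    // FnMut: mutates `calls`, so it satisfies FnMut and FnOnce but not Fn.
    let mut calls = 0usize;
    assert_eq!(call_twice_mut(|| { calls += 1; calls }), 2);
    assert_eq!(calls, 2);
    // call_twice(|| { calls += 1; calls }); // ❌ does not compile: not Fn

    // FnOnce only: moves `owned` out when called.
    let owned = String::from("consumed");
    assert_eq!(call_once(move || owned), "consumed");

    println!("all bounds behaved as expected");
}
```

Passing `&len` works because a shared reference to an `Fn` closure is itself callable as `Fn`, which lets the same closure be reused across calls.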

Closures as Parameters and Return Values / 作为参数和返回值的闭包

// --- Parameters / 参数 ---

// Static dispatch (monomorphized — fastest)
// 静态分发(单态化 —— 速度最快)
fn apply_twice<F: Fn(i32) -> i32>(f: F, x: i32) -> i32 {
    f(f(x))
}

// Also written with impl Trait:
// 也可以使用 impl Trait 形式书写:
fn apply_twice_v2(f: impl Fn(i32) -> i32, x: i32) -> i32 {
    f(f(x))
}

// Dynamic dispatch (trait object — flexible, slight overhead)
// 动态分发(Trait 对象 —— 灵活,但有轻微开销)
fn apply_dyn(f: &dyn Fn(i32) -> i32, x: i32) -> i32 {
    f(x)
}

// --- Return Values / 返回值 ---

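// Closures have anonymous types, so a return type cannot name them directly.
// Boxing as a trait object always works, including when different branches
// return different closures:
// 闭包具有匿名类型,返回类型无法直接写出它们的名字。
// 装箱为 Trait 对象总是可行的,包括不同分支返回不同闭包的情况:
fn make_adder(n: i32) -> Box<dyn Fn(i32) -> i32> {
    Box::new(move |x| x + n)
}

// With impl Trait (no allocation, static dispatch, but exactly one concrete closure type):
// 使用 impl Trait(无堆分配,静态分发,但只能返回一种具体的闭包类型):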
fn make_adder_v2(n: i32) -> impl Fn(i32) -> i32 {
    move |x| x + n
}

fn main() {
    let double = |x: i32| x * 2;
    println!("{}", apply_twice(double, 3)); // 12

    let add5 = make_adder(5);
    println!("{}", add5(10)); // 15
}
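
`Box<dyn Fn>` becomes necessary when the concrete closure type is only known at runtime, for example when different branches return different closures or when closures are stored together. A minimal sketch, assuming hypothetical functions `make_op` and `run_pipeline`:

```rust
/// Hypothetical example: each match arm produces a different closure type,
/// so the common return type has to be a trait object.
fn make_op(kind: &str, n: i32) -> Box<dyn Fn(i32) -> i32> {
    match kind {
        "add" => Box::new(move |x| x + n),
        "mul" => Box::new(move |x| x * n),
        _ => Box::new(|x| x), // unknown kinds leave the value unchanged
    }
}

/// Hypothetical example: applies the boxed steps in order.
fn run_pipeline(steps: &[Box<dyn Fn(i32) -> i32>], input: i32) -> i32 {
    steps.iter().fold(input, |acc, step| step(acc))
}

fn main() {
    let steps = vec![make_op("add", 3), make_op("mul", 10), make_op("noop", 0)];
    println!("{}", run_pipeline(&steps, 4));
}
```

The same function written with `-> impl Fn(i32) -> i32` would be rejected, because the arms do not share one concrete type.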

Combinator Chains and Iterator Adapters / 组合器链与迭代器适配器

Higher-order functions shine with iterators — this is idiomatic Rust:

高阶函数在迭代器中大放异彩 —— 这是地道的 Rust 风格:

#![allow(unused)]
fn main() {
// C-style loop (imperative):
// C 语言风格循环(命令式):
let data = vec![1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
let mut result = Vec::new();
for x in &data {
    if x % 2 == 0 {
        result.push(x * x);
    }
}

// Idiomatic Rust (functional combinator chain):
// 地道的 Rust 风格(函数式组合器链):
let result: Vec<i32> = data.iter()
    .filter(|&&x| x % 2 == 0)
    .map(|&x| x * x)
    .collect();

// Typically the same performance: iterators are lazy, and LLVM usually compiles the chain to an equivalent loop
// 性能通常相同:迭代器是延迟计算(Lazy)的,LLVM 通常会把调用链编译成等价的循环
assert_eq!(result, vec![4, 16, 36, 64, 100]);
}

Common combinators cheat sheet / 常用组合器速查表:

| Combinator / 组合器 | What It Does / 功能 | Example / 示例 |
|---|---|---|
| `.map(f)` | Transform each element / 转换每个元素 | `.map(…)` |
| `.filter(p)` | Keep elements where predicate is true / 保留谓词为真的元素 | `.filter(…)` |
| `.filter_map(f)` | Map + filter in one step (returns Option) / 一步完成映射与过滤(返回 Option) | `.filter_map(…)` |
| `.flat_map(f)` | Map then flatten nested iterators / 先映射再扁平化嵌套迭代器 | `.flat_map(…)` |
| `.fold(init, f)` | Reduce to single value / 归约为单个值 | `.fold(0, …)` |
| `.any(p)` / `.all(p)` | Short-circuit boolean check / 短路布尔检查 | `.any(…)` |
| `.enumerate()` | Add index / 添加索引 | `.enumerate().map(…)` |
| `.zip(other)` | Pair with another iterator / 与另一个迭代器配对 | `.zip(labels.iter())` |
| `.take(n)` / `.skip(n)` | First/skip N elements / 获取/跳过前 N 个元素 | `.take(10)` |
| `.chain(other)` | Concatenate two iterators / 连接两个迭代器 | `.chain(extra.iter())` |
| `.peekable()` | Look ahead without consuming / 在不消耗元素的情况下查看后续内容 | `.peek()` |
| `.collect()` | Gather into a collection / 收集到集合中 | `.collect::<Vec<_>>()` |
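
Several of the combinators listed above can be seen working together in a short runnable sketch. The function names (`parse_numbers`, `initials`, `report`, `stats`, `dedup_runs`) and the sample data are assumptions made for illustration:

```rust
/// filter_map: keep what parses as a number, drop the rest.
fn parse_numbers(words: &[&str]) -> Vec<u32> {
    words.iter().filter_map(|w| w.parse().ok()).collect()
}

/// flat_map + take: first character of every word, flattened into one String.
fn initials(words: &[&str]) -> String {
    words.iter().flat_map(|w| w.chars().take(1)).collect()
}

/// zip + enumerate + map: pair each word with a score and its index.
fn report(words: &[&str], scores: &[u32]) -> Vec<String> {
    words
        .iter()
        .zip(scores.iter())
        .enumerate()
        .map(|(i, (w, s))| format!("{i}: {w}={s}"))
        .collect()
}

/// fold, any, all: reduce to a total and two boolean checks.
fn stats(scores: &[u32]) -> (u32, bool, bool) {
    let total = scores.iter().fold(0u32, |acc, &s| acc + s);
    let has_zero = scores.iter().any(|&s| s == 0);
    let all_below_100 = scores.iter().all(|&s| s < 100);
    (total, has_zero, all_below_100)
}

/// peekable: collapse runs of equal neighbours by looking one item ahead.
fn dedup_runs(xs: &[i32]) -> Vec<i32> {
    let mut it = xs.iter().copied().peekable();
    let mut out = Vec::new();
    while let Some(x) = it.next() {
        while it.peek() == Some(&x) {
            it.next(); // skip duplicates of the value just taken
        }
        out.push(x);
    }
    out
}

fn main() {
    let words = ["alpha", "42", "beta", "7"];
    println!("{:?}", parse_numbers(&words));
    println!("{}", initials(&words));
    println!("{:?}", report(&words[..2], &[10, 20]));
    println!("{:?}", stats(&[10, 0, 99]));
    println!("{:?}", dedup_runs(&[1, 1, 2, 2, 2, 3, 1]));
}
```

Each function is a single expression chain except `dedup_runs`, where `peek()` needs an explicit loop because the decision depends on the next element.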

Implementing Your Own Higher-Order APIs / 实现你自己的高阶 API

Design APIs that accept closures for customization:

设计接受闭包作为参数的 API,以便进行自定义:

#![allow(unused)]
fn main() {
/// Retry an operation with a configurable strategy
/// 使用可配置策略重试操作
fn retry<T, E, F, S>(
    mut operation: F,
    mut should_retry: S,
    max_attempts: usize,
) -> Result<T, E>
where
    F: FnMut() -> Result<T, E>,
    S: FnMut(&E, usize) -> bool, // (error, attempt) → try again? / (错误, 尝试次数) → 是否重试?
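{
    // At least one attempt is required; with zero attempts there is no value or error to return.
    // 至少需要尝试一次;尝试次数为 0 时既没有值也没有错误可以返回。
    assert!(max_attempts >= 1, "max_attempts must be at least 1");
    for attempt in 1..=max_attempts {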
        match operation() {
            Ok(val) => return Ok(val),
            Err(e) if attempt < max_attempts && should_retry(&e, attempt) => {
                continue;
            }
            Err(e) => return Err(e),
        }
    }
    unreachable!()
}

// Usage: the caller controls the retry logic (the items below are stubs for illustration)
// 使用示例:由调用者控制重试逻辑(以下各项为演示用的桩代码)
fn connect_to_database() -> Result<(), String> { Ok(()) }
fn http_get(_url: &str) -> Result<String, String> { Ok(String::new()) }
trait TransientError { fn is_transient(&self) -> bool; }
impl TransientError for String { fn is_transient(&self) -> bool { true } }
let url = "http://example.com";
let result = retry(
    || connect_to_database(),
    |err, attempt| {
        eprintln!("Attempt {attempt} failed: {err}");
        true // Always retry / 总是重试
    },
    3,
);

// Usage — retry only specific errors:
// 使用示例 —— 仅重试特定错误:
let result = retry(
    || http_get(url),
    |err, _| err.is_transient(), // Only retry transient errors / 仅重试瞬态错误
    5,
);
}
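
Because `operation` is bound by `FnMut`, it may carry state between attempts. The sketch below repeats a variant of the retry helper so that it is self-contained (this variant always makes at least one attempt) and drives it with a scripted sequence of outcomes. `FetchError` and `run_script` are hypothetical names introduced for illustration:

```rust
#[derive(Debug, PartialEq)]
enum FetchError {
    Timeout,
    NotFound,
}

/// Variant of the retry helper above; always performs at least one attempt.
fn retry<T, E, F, S>(mut operation: F, mut should_retry: S, max_attempts: usize) -> Result<T, E>
where
    F: FnMut() -> Result<T, E>,
    S: FnMut(&E, usize) -> bool,
{
    let mut attempt = 1;
    loop {
        match operation() {
            Ok(val) => return Ok(val),
            Err(e) if attempt < max_attempts && should_retry(&e, attempt) => attempt += 1,
            Err(e) => return Err(e),
        }
    }
}

/// Hypothetical driver: replays `script` one outcome per call and reports the
/// final result together with the number of calls made.
fn run_script(
    script: Vec<Result<&'static str, FetchError>>,
    max_attempts: usize,
) -> (Result<&'static str, FetchError>, usize) {
    let mut steps = script.into_iter();
    let mut calls = 0;
    let result = retry(
        || {
            calls += 1; // state mutated across attempts: this is why FnMut is needed
            steps.next().unwrap_or(Err(FetchError::NotFound))
        },
        |err, _attempt| *err == FetchError::Timeout, // only timeouts are retried
        max_attempts,
    );
    (result, calls)
}

fn main() {
    use FetchError::*;
    println!("{:?}", run_script(vec![Err(Timeout), Err(Timeout), Ok("payload")], 5));
    println!("{:?}", run_script(vec![Err(NotFound), Ok("never reached")], 5));
}
```

The policy closure stays separate from the operation, so the same helper can serve callers with different ideas of which errors are transient.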

The with Pattern — Bracketed Resource Access / with 模式 —— 括号式资源访问

Sometimes you need to guarantee that a resource is in a specific state for the duration of an operation, and restored afterward — regardless of how the caller’s code exits (early return, ?, panic). Instead of exposing the resource directly and hoping callers remember to set up and tear down, lend it through a closure:

有时你需要保证某个资源在操作期间处于特定状态,并在操作结束后恢复 —— 无论调用者的代码如何退出(提前返回、使用 ?、恐慌等)。与其直接暴露资源并寄希望于调用者记得进行设置和清理,不如 通过闭包将其借出

set up → call closure with resource → tear down
设置 (Set up) → 使用资源调用闭包 → 清理 (Tear down)

The caller never touches setup or teardown. They can’t forget, can’t get it wrong, and can’t hold the resource beyond the closure’s scope.

调用者无需接触设置或清理逻辑。他们不会忘记,也不会出错,更无法在闭包作用域之外持有该资源。

Example: GPIO Pin Direction / 示例:GPIO 引脚方向

A GPIO controller manages pins that support bidirectional I/O. Some callers need the pin configured as input, others as output. Rather than exposing raw pin access and trusting callers to set direction correctly, the controller provides with_pin_input and with_pin_output:

GPIO 控制器负责管理支持双向 I/O 的引脚。有些调用者需要引脚配置为输入,有些则需要配置为输出。控制器不再暴露原始引脚访问权限并信任调用者能够正确设置方向,而是提供 with_pin_input 和 with_pin_output 方法:

/// GPIO pin direction — not public, callers never set this directly.
/// GPIO 引脚方向 —— 不公开,调用者永远无法直接设置。
#[derive(Debug, Clone, Copy, PartialEq)]
enum Direction { In, Out }

/// A GPIO pin handle lent to the closure. Cannot be stored or cloned —
/// it exists only for the duration of the callback.
/// 借给闭包的 GPIO 引脚句柄。它不能被存储或克隆 ——
/// 它仅在回调执行期间存在。
pub struct GpioPin<'a> {
    pin_number: u8,
    _controller: &'a GpioController,
}

impl GpioPin<'_> {
    pub fn read(&self) -> bool {
        // Read pin level from hardware register / 从硬件寄存器读取引脚电平
        println!("  reading pin {}", self.pin_number);
        true // stub
    }

    pub fn write(&self, high: bool) {
        // Drive pin level via hardware register / 通过硬件寄存器驱动引脚电平
        println!("  writing pin {} = {high}", self.pin_number);
    }
}

pub struct GpioController {
    current_direction: std::cell::Cell<Option<Direction>>,
}

impl GpioController {
    pub fn new() -> Self {
        GpioController {
            current_direction: std::cell::Cell::new(None),
        }
    }

    /// Configure pin as input, run the closure, restore state.
    /// The caller receives a `GpioPin` that lives only for the callback.
    /// 将引脚配置为输入,运行闭包,随后恢复状态。
    /// 调用者接收到一个仅在回调函数中存活的 `GpioPin`。
    pub fn with_pin_input<R>(
        &self,
        pin: u8,
        f: impl FnOnce(&GpioPin<'_>) -> R, // called exactly once / 恰好被调用一次
    ) -> R {
        let prev = self.current_direction.get();
        self.set_direction(pin, Direction::In);
        let handle = GpioPin { pin_number: pin, _controller: self };
        let result = f(&handle);
        // Restore previous direction (or leave as-is — policy choice)
        // 恢复之前的方向(或保持不变 —— 这取决于策略选择)
        if let Some(dir) = prev {
            self.set_direction(pin, dir);
        }
        result
    }

    /// Configure pin as output, run the closure, restore state.
    /// 将引脚配置为输出,运行闭包,随后恢复状态。
    pub fn with_pin_output<R>(
        &self,
        pin: u8,
        f: impl FnOnce(&GpioPin<'_>) -> R, // called exactly once / 恰好被调用一次
    ) -> R {
        let prev = self.current_direction.get();
        self.set_direction(pin, Direction::Out);
        let handle = GpioPin { pin_number: pin, _controller: self };
        let result = f(&handle);
        if let Some(dir) = prev {
            self.set_direction(pin, dir);
        }
        result
    }

    fn set_direction(&self, pin: u8, dir: Direction) {
        println!("  [hw] pin {pin} → {dir:?}");
        self.current_direction.set(Some(dir));
    }
}

fn main() {
    let gpio = GpioController::new();

    // Caller 1: needs input — doesn't know or care how direction is managed
    // 调用者 1:需要输入 —— 既不知道也不关心方向是如何管理的
    let level = gpio.with_pin_input(4, |pin| {
        pin.read()
    });
    println!("Pin 4 level: {level}");

    // Caller 2: needs output — same API shape, different guarantee
    // 调用者 2:需要输出 —— 同样的 API 形式,不同的保证
    gpio.with_pin_output(4, |pin| {
        pin.write(true);
        // do more work...
        pin.write(false);
    });

    // Can't use the pin handle outside the closure:
    // 无法在闭包之外使用引脚句柄:
    // let escaped_pin = gpio.with_pin_input(4, |pin| pin);
    // ❌ Compile error: the `&GpioPin` borrow cannot outlive the closure call
    // ❌ 编译错误:对 `GpioPin` 的借用不能活过闭包调用
}

What the with pattern guarantees: / with 模式提供的保证:

  • Direction is always set before the caller’s code runs / 在调用者代码运行之前,方向 始终已设置
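  • Direction is restored after the closure returns, including early returns inside the closure. As written, a panic inside the closure skips the restore; performing the restore in a Drop guard would cover unwinding too / 闭包返回后(包括闭包内部的提前返回),方向都会被恢复。按目前的写法,闭包内发生恐慌时会跳过恢复;把恢复逻辑放进 Drop 守卫即可覆盖栈展开的情况
  • The GpioPin handle cannot escape the closure: the closure only receives a short-lived borrow of it, and the handle itself carries a lifetime tied to the controller reference, so the borrow checker rejects attempts to return or store it / GpioPin 句柄无法逃逸出闭包:闭包只获得对它的短暂借用,且句柄本身带有绑定到控制器引用的生命周期,因此借用检查器会拒绝返回或存储它的尝试
  • Callers never import Direction and never call set_direction, so the API is hard to misuse / 调用者永远不需要导入 Direction,也不需要调用 set_direction,因此该 API 很难被误用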
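
To make the restore step hold even when the closure panics, the teardown can live in a `Drop` guard inside the `with_*` method. The following is a simplified sketch, not the controller shown above: `Controller`, `Restore`, and `with_direction` are hypothetical names, the closure receives no pin handle, and the panic case assumes the default unwinding behaviour (not `panic = "abort"`):

```rust
use std::cell::Cell;
use std::panic::{self, AssertUnwindSafe};

#[derive(Debug, Clone, Copy, PartialEq)]
enum Direction { In, Out }

struct Controller {
    direction: Cell<Direction>,
}

/// Restores the saved direction when dropped, including during unwinding.
struct Restore<'a> {
    controller: &'a Controller,
    previous: Direction,
}

impl Drop for Restore<'_> {
    fn drop(&mut self) {
        self.controller.direction.set(self.previous);
    }
}

impl Controller {
    fn with_direction<R>(&self, dir: Direction, f: impl FnOnce() -> R) -> R {
        // replace() sets the new direction and returns the old one for the guard.
        let _restore = Restore { controller: self, previous: self.direction.replace(dir) };
        f() // `_restore` is dropped when this returns or unwinds
    }
}

fn main() {
    let gpio = Controller { direction: Cell::new(Direction::In) };

    // Normal exit: the closure sees Out, the caller sees In again afterwards.
    let seen = gpio.with_direction(Direction::Out, || gpio.direction.get());
    assert_eq!((seen, gpio.direction.get()), (Direction::Out, Direction::In));

    // Panic inside the closure: the guard still restores the direction.
    let outcome = panic::catch_unwind(AssertUnwindSafe(|| {
        gpio.with_direction(Direction::Out, || -> () { panic!("closure failed") })
    }));
    assert!(outcome.is_err());
    assert_eq!(gpio.direction.get(), Direction::In);
    println!("direction restored after panic");
}
```

The caller-facing API is unchanged; the guard is an implementation detail of the `with_*` method.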

Where This Pattern Appears / 此模式在何处出现

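The same bracketed shape shows up throughout Rust’s standard library and ecosystem, either closure-based (std::thread::scope) or guard-based, where Drop performs the teardown (Mutex::lock, tempfile::tempdir, BufWriter):

同样的“括号式”结构贯穿于 Rust 的标准库和生态系统中:既有基于闭包的形式(std::thread::scope),也有基于守卫的形式,由 Drop 负责清理(Mutex::lock、tempfile::tempdir、BufWriter):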

| API | Setup / 设置 | Callback / 回调 | Teardown / 清理 |
|---|---|---|---|
| `std::thread::scope` | Create scope / 创建作用域 | `\|s\| { s.spawn(...) }` | Join all threads / 汇合所有线程 |
| `Mutex::lock` | Acquire lock / 获取锁 | Use MutexGuard / 使用 MutexGuard | Release on drop / 丢弃时释放 |
| `tempfile::tempdir` | Create temp directory / 创建临时目录 | Use path / 使用路径 | Delete on drop / 丢弃时删除 |
| `std::io::BufWriter::new` | Buffer writes / 缓冲写入 | Write operations / 写入操作 | Flush on drop / 丢弃时刷新 |
| GPIO `with_pin_*` (above) | Set direction / 设置方向 | Use pin handle / 使用引脚句柄 | Restore direction / 恢复方向 |

The closure-based variant is strongest when:

基于闭包的变体在以下情况下最为强大:

  • Setup and teardown are paired and forgetting either is a bug / 设置与清理必须成对出现,遗漏任何一个都是 Bug
  • The resource shouldn’t outlive the operation — the borrow checker enforces this naturally / 资源不应在操作结束后继续存活 —— 借用检查器可以自然地强制执行这一点
  • Multiple configurations exist (with_pin_input vs with_pin_output): each with_* method encapsulates a different setup without exposing the configuration to the caller / 存在多种配置选择(如 with_pin_input 与 with_pin_output):每个 with_* 方法都封装了不同的设置,且无需向调用者暴露具体的配置细节

with vs RAII (Drop): Both guarantee cleanup. Use RAII / Drop when the caller needs to hold the resource across multiple statements and function calls. Use with when the operation is bracketed — one setup, one block of work, one teardown — and you don’t want the caller to be able to break the bracket.

with 对比 RAII (Drop):两者都能保证清理工作。当调用者需要跨多个语句或函数调用持有资源时,请使用 RAII / Drop。当操作是 括号式 (Bracketed) 的 —— 即:一次设置、一段工作、一次清理 —— 并且你不希望调用者能够打破这个“括号”约束时,请使用 with
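The contrast between the two styles can be sketched side by side. This is a minimal illustration, not the GPIO API from above: `Bus`, `Session`, `with_session`, and `session` are hypothetical names, and the log only exists to make the setup/teardown order visible.

```rust
use std::cell::RefCell;

// Hypothetical resource that records setup/teardown so the order is visible.
struct Bus {
    log: RefCell<Vec<&'static str>>,
}

// RAII guard: teardown runs when the guard is dropped.
struct Session<'a> {
    bus: &'a Bus,
}

impl Drop for Session<'_> {
    fn drop(&mut self) {
        self.bus.log.borrow_mut().push("close");
    }
}

impl Bus {
    fn new() -> Self {
        Bus { log: RefCell::new(Vec::new()) }
    }

    // `with` style: one setup, one block of work, one teardown.
    fn with_session<R>(&self, work: impl FnOnce(&Bus) -> R) -> R {
        self.log.borrow_mut().push("open");
        let out = work(self);
        self.log.borrow_mut().push("close");
        out
    }

    // RAII style: the caller holds the guard for as long as it likes.
    fn session(&self) -> Session<'_> {
        self.log.borrow_mut().push("open");
        Session { bus: self }
    }
}

fn main() {
    let bus = Bus::new();

    // Bracketed: the caller cannot forget or reorder the teardown.
    bus.with_session(|b| b.log.borrow_mut().push("work"));

    {
        // RAII: the guard can live across several statements.
        let _guard = bus.session();
        bus.log.borrow_mut().push("work");
    } // `_guard` is dropped here, which pushes "close".

    assert_eq!(
        *bus.log.borrow(),
        ["open", "work", "close", "open", "work", "close"]
    );
}
```

One design note on this sketch: the closure version as written skips its teardown if `work` panics, whereas the `Drop` guard still runs during unwinding. A `with_*` method that must be panic-safe can create a guard internally and still expose only the closure API.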

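FnMut vs Fn in API design / API 设计中的 FnMut 与 Fn: Accept the weakest bound your implementation can live with. If you call the closure exactly once, require FnOnce: it accepts every closure, including ones that move out of their captures. If you call it more than once, default to FnMut, which accepts both Fn and FnMut closures. Require Fn only when the closure must be callable through a shared reference, for example when it is shared across threads and called concurrently.

API 设计中的 FnMut 与 Fn:请接受实现所能容忍的最弱 Trait Bound。如果闭包只会被调用一次,请要求 FnOnce:它接受所有闭包,包括会移出捕获变量的闭包。如果需要多次调用,请默认使用 FnMut,它同时接受 Fn 和 FnMut 闭包。只有当闭包必须通过共享引用调用时(例如跨线程共享并被并发调用),才要求 Fn。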
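A minimal sketch of how the three bounds look from the caller's side. The helper names `run_once`, `run_twice`, and `run_shared` are hypothetical and exist only for this illustration.

```rust
/// Calls `f` exactly once, so `FnOnce` is enough (accepts any closure).
fn run_once<F: FnOnce() -> String>(f: F) -> String {
    f()
}

/// Calls `f` twice, so it needs at least `FnMut`.
fn run_twice<F: FnMut() -> u32>(mut f: F) -> (u32, u32) {
    (f(), f())
}

/// Calls `f` through a shared reference, so it needs `Fn`.
fn run_shared<F: Fn(u32) -> u32>(f: &F, x: u32) -> u32 {
    f(x) + f(x)
}

fn main() {
    let name = String::from("gpio");
    // Moves `name` out of the closure: this closure is FnOnce only.
    assert_eq!(run_once(move || name), "gpio");

    let mut counter = 0;
    // Mutates a capture: FnMut (and therefore also FnOnce), but not Fn.
    let pair = run_twice(|| {
        counter += 1;
        counter
    });
    assert_eq!(pair, (1, 2));

    // Captures nothing: implements Fn, so every bound above accepts it.
    let double = |x: u32| x * 2;
    assert_eq!(run_shared(&double, 4), 16);
}
```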

Key Takeaways — Closures / 核心要点 —— 闭包

  • Fn borrows, FnMut borrows mutably, FnOnce consumes — accept the weakest bound your API needs / Fn 借用,FnMut 可变借用,FnOnce 消耗 —— 你的 API 应接受能满足需求的、强度最弱的 Bound
  • impl Fn in parameters, Box<dyn Fn> for storage, impl Fn in return (or Box<dyn Fn> if dynamic) / 参数中使用 impl Fn,存储时使用 Box<dyn Fn>,返回时使用 impl Fn(若需动态则使用 Box<dyn Fn>
  • Combinator chains (map, filter, and_then) compose cleanly and inline to tight loops / 组合器链(mapfilterand_then)可以通过清晰的组合内联为紧凑的循环
  • The with pattern (bracketed access via closure) guarantees setup/teardown and prevents resource escape — use it when the caller shouldn’t manage configuration lifecycle / with 模式(通过闭包进行的括号式访问)保证了设置与清理,并防止资源逃逸 —— 当不应由调用者管理配置生命周期时,请使用该模式

See also / 另请参阅: Ch 2 — Traits In Depth for how Fn/FnMut/FnOnce relate to trait objects. Ch 8 — Functional vs. Imperative for when to choose combinators over loops. Ch 15 — API Design for ergonomic parameter patterns.

参见 Ch 2 —— Trait 深入解析 了解 Fn/FnMut/FnOnce 与 Trait 对象的联系。参见 Ch 8 —— 函数式对比命令式 了解何时选择组合器而非循环。参见 Ch 15 —— API 设计 了解符合人体工程学的参数模式。

graph TD
    FnOnce["FnOnce<br>(can call once / 仅能调用一次)"]
    FnMut["FnMut<br>(can call many times,<br>may mutate captures / 可调用多次,<br>可以修改捕获变量)"]
    Fn["Fn<br>(can call many times,<br>immutable captures / 可调用多次,<br>不可变捕获)"]

    Fn -->|"implements / 实现了"| FnMut
    FnMut -->|"implements / 实现了"| FnOnce

    style Fn fill:#d4efdf,stroke:#27ae60,color:#000
    style FnMut fill:#fef9e7,stroke:#f1c40f,color:#000
    style FnOnce fill:#fadbd8,stroke:#e74c3c,color:#000

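Every Fn is also FnMut, and every FnMut is also FnOnce. As a bound, FnOnce therefore accepts the widest range of closures, FnMut is the usual default when the closure is called more than once, and Fn is the most restrictive for callers.

每个 Fn 都实现了 FnMut,而每个 FnMut 都实现了 FnOnce。因此作为 Bound 时,FnOnce 能接受的闭包范围最广;需要多次调用闭包时,FnMut 是常用的默认选择;而 Fn 对调用者的限制最严格。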


Exercise: Higher-Order Combinator Pipeline ★★ (~25 min) / 练习:高阶组合器流水线 ★★(约 25 分钟)

Create a Pipeline struct that chains transformations. It should support .pipe(f) to add a transformation and .execute(input) to run the full chain.

创建一个 Pipeline 结构体来链接各种转换操作。它应支持通过 .pipe(f) 添加转换,并通过 .execute(input) 运行整个链条。

🔑 Solution / 参考答案
struct Pipeline<T> {
    transforms: Vec<Box<dyn Fn(T) -> T>>,
}

impl<T: 'static> Pipeline<T> {
    fn new() -> Self {
        Pipeline { transforms: Vec::new() }
    }

    fn pipe(mut self, f: impl Fn(T) -> T + 'static) -> Self {
        self.transforms.push(Box::new(f));
        self
    }

    fn execute(self, input: T) -> T {
        self.transforms.into_iter().fold(input, |val, f| f(val))
    }
}

fn main() {
    let result = Pipeline::new()
        .pipe(|s: String| s.trim().to_string())
        .pipe(|s| s.to_uppercase())
        .pipe(|s| format!(">>> {s} <<<"))
        .execute("  hello world  ".to_string());

    println!("{result}"); // >>> HELLO WORLD <<<

    let result = Pipeline::new()
        .pipe(|x: i32| x * 2)
        .pipe(|x| x + 10)
        .pipe(|x| x * x)
        .execute(5);

    println!("{result}"); // (5*2 + 10)^2 = 400
}

Chapter 8 — Functional vs. Imperative: When Elegance Wins (and When It Doesn’t) / 第 8 章 —— 函数式与命令式:优雅何时胜出(以及何时不会)

Difficulty / 难度: 🟡 Intermediate / 中级 | Time / 预计用时: 2–3 hours / 2–3 小时 | Prerequisites / 先决条件: Ch 7 — Closures

Rust gives you genuine parity between functional and imperative styles. Unlike Haskell (functional by fiat) or C (imperative by default), Rust lets you choose — and the right choice depends on what you’re expressing. This chapter builds the judgment to pick well.

Rust 让函数式与命令式风格拥有了真正的对等地位。与 Haskell(强制函数式)或 C(默认命令式)不同,Rust 让你自由选择 —— 而正确的选择取决于你想要表达的内容。本章旨在帮助你建立良好的判断力,从而做出明智的选择。

The core principle: Functional style shines when you’re transforming data through a pipeline. Imperative style shines when you’re managing state transitions with side effects. Most real code has both, and the skill is knowing where the boundary falls.

核心原则:函数式风格在 通过流水线转换数据 时大放异彩;而命令式风格则在 管理带有副作用的状态转换 时更胜一筹。大多数真实代码都兼具两者,而技能的关键在于掌握它们之间的界限在哪里。


8.1 The Combinator You Didn’t Know You Wanted / 8.1 你一直想要却未曾察觉的组合器

Many Rust developers write this:

许多 Rust 开发者会这样写:

#![allow(unused)]
fn main() {
let value = if let Some(x) = maybe_config() {
    x
} else {
    default_config()
};
process(value);
}

When they could write this:

但他们其实可以写成这样:

#![allow(unused)]
fn main() {
process(maybe_config().unwrap_or_else(default_config));
}

Or this common pattern:

或者是这种常见的模式:

#![allow(unused)]
fn main() {
let display_name = if let Some(name) = user.nickname() {
    name.to_uppercase()
} else {
    "ANONYMOUS".to_string()
};
}

Which is:

可以简化为:

#![allow(unused)]
fn main() {
let display_name = user.nickname()
    .map(|n| n.to_uppercase())
    .unwrap_or_else(|| "ANONYMOUS".to_string());
}

The functional version isn’t just shorter — it tells you what is happening (transform, then default) without making you trace control flow. The if let version makes you read the branches to figure out that both paths end up in the same place.

函数式版本不仅仅是为了缩短代码 —— 它直接告诉了你 正在发生什么(先转换,再应用默认值),而不需要你去追踪控制流。if let 版本则强制让你去阅读各个分支,才能弄清楚这两条路径最终会汇聚到同一个地方。

The Option combinator family / Option 组合器家族

Here’s the mental model: Option<T> is a one-element-or-empty collection. Every combinator on Option has an analogy to a collection operation.

下面是它的心智模型:Option<T> 是一个包含“一个元素”或“为空”的集合。Option 上的每个组合器在集合操作中都有类比。

| You write… / 你写下… | Instead of… / 代替… | What it communicates / 它所表达的意图 |
|---|---|---|
| opt.unwrap_or(default) | if let Some(x) = opt { x } else { default } | “Use this value or fall back” / “使用该值或回退” |
| opt.unwrap_or_else(\|\| expensive()) | if let Some(x) = opt { x } else { expensive() } | Same, but the default is computed lazily / 同上,但默认值惰性计算 |
| opt.map(f) | match opt { Some(x) => Some(f(x)), None => None } | “Transform the inside, propagate absence” / “转换内部值,传播空值” |
| opt.and_then(f) | match opt { Some(x) => f(x), None => None } | “Chain fallible operations” (flatmap) / “链式调用可能失败的操作” |
| opt.filter(\|x\| pred(x)) | match opt { Some(x) if pred(&x) => Some(x), _ => None } | “Keep the value only if it passes” / “仅在满足条件时保留该值” |
| opt.zip(other) | if let (Some(a), Some(b)) = (opt, other) { Some((a,b)) } else { None } | “Both or neither” / “两者皆有或两者皆无” |
| opt.or(fallback) | if opt.is_some() { opt } else { fallback } | “First available” / “第一个可用项” |
| opt.or_else(\|\| try_another()) | if opt.is_some() { opt } else { try_another() } | Same, but the fallback is computed lazily / 同上,但备选值惰性计算 |
| opt.map_or(default, f) | if let Some(x) = opt { f(x) } else { default } | “Transform or default” in one line / “转换或应用默认值”,单行实现 |
| opt.map_or_else(default_fn, f) | if let Some(x) = opt { f(x) } else { default_fn() } | Same, both sides are closures / 同上,两边都是闭包 |
| opt? | match opt { Some(x) => x, None => return None } | “Propagate absence upward” / “向上层传播空值” |
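A few of these rows in action, as a small sketch. The `percent` helper is hypothetical; it combines zip, filter, and map so that each stage matches one row of the table.

```rust
/// Percentage of completed work, or None if either input is missing
/// or the total is zero. (`percent` is a hypothetical helper.)
fn percent(done: Option<u32>, total: Option<u32>) -> Option<u32> {
    done.zip(total)                 // both or neither
        .filter(|&(_, t)| t != 0)   // keep the pair only if it passes
        .map(|(d, t)| d * 100 / t)  // transform the inside
}

fn main() {
    assert_eq!(percent(Some(1), Some(4)), Some(25));
    assert_eq!(percent(Some(1), Some(0)), None); // rejected by filter
    assert_eq!(percent(None, Some(4)), None);    // zip needs both sides

    // map_or: transform or default in one call.
    assert_eq!(percent(Some(3), Some(4)).map_or(0, |p| p + 1), 76);
    // or: first available value.
    assert_eq!(percent(None, None).or(Some(0)), Some(0));
}
```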

The Result combinator family / Result 组合器家族

The same pattern applies to Result<T, E>:

同样的模式也适用于 Result<T, E>

| You write… / 你写下… | Instead of… / 代替… | What it communicates / 它所表达的意图 |
|---|---|---|
| res.map(f) | match res { Ok(x) => Ok(f(x)), Err(e) => Err(e) } | Transform the success path / 转换成功路径 |
| res.map_err(f) | match res { Ok(x) => Ok(x), Err(e) => Err(f(e)) } | Transform the error / 转换错误值 |
| res.and_then(f) | match res { Ok(x) => f(x), Err(e) => Err(e) } | Chain fallible operations / 链式调用可能失败的操作 |
| res.unwrap_or_else(\|e\| default(e)) | match res { Ok(x) => x, Err(e) => default(e) } | Recover from the error with a computed default / 根据错误计算默认值以恢复 |
| res.ok() | match res { Ok(x) => Some(x), Err(_) => None } | “I don’t care about the error” / “我不关心具体的错误” |
| res? | match res { Ok(x) => x, Err(e) => return Err(e.into()) } | Propagate errors upward / 向上层传播错误 |
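The Result rows compose the same way. The sketch below assumes a hypothetical `ConfigError` type and `parse_port` helper, and treats ports below 1024 as rejected purely for illustration.

```rust
use std::num::ParseIntError;

// Hypothetical error type for this example.
#[derive(Debug, PartialEq)]
enum ConfigError {
    BadPort(String),
    Reserved(u16),
}

fn parse_port(raw: &str) -> Result<u16, ConfigError> {
    raw.trim()
        .parse::<u16>()
        // map_err: transform the error, leave the success path alone.
        .map_err(|e: ParseIntError| ConfigError::BadPort(e.to_string()))
        // and_then: chain a second fallible step.
        .and_then(|p| if p < 1024 { Err(ConfigError::Reserved(p)) } else { Ok(p) })
}

fn main() {
    assert_eq!(parse_port(" 8080 "), Ok(8080));
    assert_eq!(parse_port("80"), Err(ConfigError::Reserved(80)));
    assert!(matches!(parse_port("abc"), Err(ConfigError::BadPort(_))));

    // ok(): discard the error. unwrap_or_else: recover with a default.
    assert_eq!(parse_port("abc").ok(), None);
    assert_eq!(parse_port("abc").unwrap_or_else(|_| 8080), 8080);
}
```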

When if let IS better / 何时 if let 更合适

The combinators lose when:

  • You need multiple statements in the Some branch. A map closure with 5 lines is worse than an if let with 5 lines.
  • The control flow is the point. if let Some(connection) = pool.try_get() { /* use it */ } else { /* log, retry, alert */ } — the two branches are genuinely different code paths, not a transform-or-default.
  • Side effects dominate. If both branches do I/O with different error handling, the combinator version obscures the important differences.

Rule of thumb: If the else branch produces the same type as the Some branch and the bodies are short expressions, use a combinator. If the branches do fundamentally different things, use if let or match.


8.2 Bool Combinators: .then() and .then_some() / 8.2 布尔组合器:.then().then_some()

Another pattern that’s more common than it should be:

另一个比想象中更常见的模式:

#![allow(unused)]
fn main() {
let label = if is_admin {
    Some("ADMIN")
} else {
    None
};
}

Rust 1.62+ gives you:

Rust 1.62+ 之后你可以这样写:

#![allow(unused)]
fn main() {
let label = is_admin.then_some("ADMIN");
}

Or with a computed value:

或者是使用计算出的值:

#![allow(unused)]
fn main() {
let permissions = is_admin.then(|| compute_admin_permissions());
}

This is especially powerful in chains:

这在链式调用中尤其强大:

#![allow(unused)]
fn main() {
// Imperative / 命令式
let mut tags = Vec::new();
if user.is_admin { tags.push("admin"); }
if user.is_verified { tags.push("verified"); }
if user.score > 100 { tags.push("power-user"); }

// Functional / 函数式
let tags: Vec<&str> = [
    user.is_admin.then_some("admin"),
    user.is_verified.then_some("verified"),
    (user.score > 100).then_some("power-user"),
]
.into_iter()
.flatten()
.collect();
}

The functional version makes the pattern explicit: “build a list from conditional elements.” The imperative version makes you read each if to confirm they all do the same thing (push a tag).

函数式版本使这种模式显式化:“从条件元素中构建一个列表”。而命令式版本则迫使你阅读每一个 if 语句,以确认它们都在做同样的事情(如推送一个标签)。
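One difference between the two methods is worth seeing in running code: then_some takes a value, so its argument is evaluated even when the condition is false, while then takes a closure that runs only when the condition is true. The sketch below counts calls to a stand-in "expensive" computation; `eager_vs_lazy` is a hypothetical helper.

```rust
use std::cell::Cell;

/// Returns how many times the computation ran under `then_some` (eager)
/// and under `then` (lazy) for the given condition.
fn eager_vs_lazy(condition: bool) -> (u32, u32) {
    let calls = Cell::new(0u32);
    let expensive = || {
        calls.set(calls.get() + 1);
        42
    };

    // The argument is evaluated before `then_some` is even called.
    let _ = condition.then_some(expensive());
    let eager_calls = calls.get();

    calls.set(0);
    // The closure runs only if `condition` is true.
    let _ = condition.then(|| expensive());
    let lazy_calls = calls.get();

    (eager_calls, lazy_calls)
}

fn main() {
    assert_eq!(eager_vs_lazy(false), (1, 0));
    assert_eq!(eager_vs_lazy(true), (1, 1));
}
```

For constants such as "admin" the eager form costs nothing; for anything that allocates, does I/O, or has side effects, `then` is the safer default.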



8.3 Iterator Chains vs. Loops: The Decision Framework / 8.3 迭代器链与循环:决策框架

Ch 7 showed the mechanics. This section builds the judgment.

第 7 章展示了其工作机制。本节将培养你做出判断的能力。

When iterators win / 何时迭代器胜出

Data pipelines — transforming a collection through a series of steps:

数据流水线 —— 通过一系列步骤转换集合中的数据:

#![allow(unused)]
fn main() {
// Imperative: 10 lines, 1 mutable variable
// 命令式:10 行代码,1 个可变变量
let mut results = Vec::new();
for item in inventory {
    if item.category == Category::Server {
        if let Some(temp) = item.last_temperature() {
            if temp > 80.0 {
                results.push((item.id, temp));
            }
        }
    }
}

// Functional: 5 lines, 0 mutable variables, one pipeline
// 函数式:5 行代码,0 个可变变量,一条流水线
let results: Vec<_> = inventory.iter()
    .filter(|item| item.category == Category::Server)
    .filter_map(|item| item.last_temperature().map(|t| (item.id, t)))
    .filter(|(_, temp)| *temp > 80.0)
    .collect();
}

The functional version wins because:

函数式版本胜出是因为:

  • Each filter is independently readable / 每个过滤器都是独立可读的
  • No mut — the data flows in one direction / 没有 mut —— 数据流向单一
  • You can add/remove/reorder pipeline stages without restructuring / 你可以增加、删除或重新排序流水线阶段,而无需重新组织代码结构
  • In release builds, LLVM typically inlines the iterator adapters and produces machine code comparable to the loop / 在 release 构建下,LLVM 通常会内联迭代器适配器,生成与循环相当的机器码

Aggregation — computing a single value from a collection:

聚合 —— 从集合中计算出单个值:

#![allow(unused)]
fn main() {
// Imperative / 命令式
let mut total_power = 0.0;
let mut count = 0;
for server in fleet {
    total_power += server.power_draw();
    count += 1;
}
let avg = total_power / count as f64;

// Functional / 函数式
let (total_power, count) = fleet.iter()
    .map(|s| s.power_draw())
    .fold((0.0, 0usize), |(sum, n), p| (sum + p, n + 1));
let avg = total_power / count as f64;
}

Or even simpler if you just need the sum:

如果你只需要求和,还可以更简单:

#![allow(unused)]
fn main() {
let total: f64 = fleet.iter().map(|s| s.power_draw()).sum();
}

When loops win / 何时循环胜出

Early exit with complex state / 带有复杂状态的提前退出:

#![allow(unused)]
fn main() {
// This is clear and direct / 这种写法清晰直接
let mut best_candidate = None;
for server in fleet {
    let score = evaluate(server);
    if score > threshold {
        if server.is_available() {
            best_candidate = Some(server);
            break; // Found one — stop immediately / 找到了一个 —— 立即停止
        }
    }
}

// The functional version is strained / 函数式版本则显得有些勉强
let best_candidate = fleet.iter()
    .filter(|s| evaluate(s) > threshold)
    .find(|s| s.is_available());
}

Wait — that functional version is actually pretty clean. Let’s try a case where it genuinely loses:

等等 —— 上面那个函数式版本其实挺整洁的。让我们看一个它真正处于下风的情况:

Building multiple outputs simultaneously / 同时构建多个输出:

#![allow(unused)]
fn main() {
// Imperative: clear, each branch does something different
// 命令式:清晰,每个分支执行不同的操作
let mut warnings = Vec::new();
let mut errors = Vec::new();
let mut stats = Stats::default();

for event in log_stream {
    match event.severity {
        Severity::Warn => {
            warnings.push(event.clone());
            stats.warn_count += 1;
        }
        Severity::Error => {
            errors.push(event.clone());
            stats.error_count += 1;
            if event.is_critical() {
                alert_oncall(&event);
            }
        }
        _ => stats.other_count += 1,
    }
}

// Functional version: forced, awkward, nobody wants to read this
// 函数式版本:牵强、笨拙,没人想读这样的代码
let (warnings, errors, stats) = log_stream.iter().fold(
    (Vec::new(), Vec::new(), Stats::default()),
    |(mut w, mut e, mut s), event| {
        match event.severity {
            Severity::Warn => { w.push(event.clone()); s.warn_count += 1; }
            Severity::Error => {
                e.push(event.clone()); s.error_count += 1;
                if event.is_critical() { alert_oncall(event); }
            }
            _ => s.other_count += 1,
        }
        (w, e, s)
    },
);
}

The fold version is longer, harder to read, and has mutation anyway (the accumulators are destructured into mut bindings). The loop wins because:

fold 版本不仅 更长更难读,而且无论如何都存在可变性(被解构的可变累加器)。循环胜出是因为:

  • Multiple outputs being built in parallel / 并行构建多个输出
  • Side effects (alerting) mixed into the logic / 逻辑中混入了副作用(如报警通知)
  • Branch bodies are statements, not expressions / 分支主体是语句而非表达式

State machines with I/O / 带有 I/O 的状态机:

#![allow(unused)]
fn main() {
// A parser that reads tokens — the loop IS the algorithm
// 一个读取 Token 的解析器 —— 这里的循环本身即是算法
let mut state = ParseState::Start;
loop {
    let token = lexer.next_token()?;
    state = match state {
        ParseState::Start => match token {
            Token::Keyword(k) => ParseState::GotKeyword(k),
            Token::Eof => break,
            _ => return Err(ParseError::UnexpectedToken(token)),
        },
        ParseState::GotKeyword(k) => match token {
            Token::Ident(name) => ParseState::GotName(k, name),
            _ => return Err(ParseError::ExpectedIdentifier),
        },
        // ...more states / ...更多状态
    };
}
}

No functional equivalent is cleaner. The loop with match state is the natural expression of a state machine.

没有比这更简洁的函数式对等写法了。带有 match state 的循环是实现状态机最自然的表达方式。

The decision flowchart / 决策流程图

flowchart TB
    START{What are you doing? / 你在做什么?}

    START -->|"Transforming a collection\ninto another collection / \n将一个集合转换为另一个集合"| PIPE[Use iterator chain / 使用迭代器链]
    START -->|"Computing a single value\nfrom a collection / \n从集合计算出单个值"| AGG{How complex? / 复杂度如何?}
    START -->|"Multiple outputs from\none pass / \n单次遍历产生多个输出"| LOOP[Use a for loop / 使用 for 循环]
    START -->|"State machine with\nI/O or side effects / \n带有 I/O 或副作用的状态机"| LOOP
    START -->|"One Option/Result\ntransform + default / \n单个 Option/Result \n转换 + 默认值"| COMB[Use combinators / 使用组合器]

    AGG -->|"Sum, count, min, max / \n求和、计数、最小值、最大值"| BUILTIN["Use .sum(), .count(),\n.min(), .max() / \n使用内置方法"]
    AGG -->|"Custom accumulation / \n自定义累加操作"| FOLD{Accumulator has mutation\nor side effects? / \n累加器是否包含可变性或副作用?}
    FOLD -->|"No / 否"| FOLDF["Use .fold()"]
    FOLD -->|"Yes / 是"| LOOP

    style PIPE fill:#d4efdf,stroke:#27ae60,color:#000
    style COMB fill:#d4efdf,stroke:#27ae60,color:#000
    style BUILTIN fill:#d4efdf,stroke:#27ae60,color:#000
    style FOLDF fill:#d4efdf,stroke:#27ae60,color:#000
    style LOOP fill:#fef9e7,stroke:#f1c40f,color:#000

Rust blocks are expressions. This lets you confine mutation to a construction phase and bind the result immutably:

Rust 的代码块是表达式。这允许你将可变性限制在构建阶段,并以不可变的方式绑定结果:

#![allow(unused)]
fn main() {
use rand::random;

let samples = {
    let mut buf = Vec::with_capacity(10);
    while buf.len() < 10 {
        let reading: f64 = random();
        buf.push(reading);
        if random::<u8>() % 3 == 0 { break; } // randomly stop early / 随机提前停止
    }
    buf // Yield the vector / 产生(返回)这个 vector
};
// samples is immutable — contains between 1 and 10 elements
// samples 是不可变的 —— 包含 1 到 10 个元素
}

The inner buf is mutable only inside the block. Once the block yields, the outer binding samples is immutable and the compiler will reject any later samples.push(...).

内部的 buf 仅在代码块内是可变的。一旦代码块产生结果,外部绑定 samples 就是不可变的,编译器将拒绝后续任何 samples.push(...) 调用。
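The same shape works for the sort-then-freeze case, where the mutating calls return () and cannot be chained. A minimal sketch, with `sorted_unique` as a hypothetical helper name:

```rust
/// Returns the distinct values of `input` in ascending order.
fn sorted_unique(input: &[u32]) -> Vec<u32> {
    let ids = {
        let mut buf = input.to_vec(); // mutable only inside this block
        buf.sort_unstable();          // returns (), so it cannot be chained
        buf.dedup();                  // also returns ()
        buf                           // yield the finished vector
    };
    // `ids` is an immutable binding: `ids.push(1)` would not compile here.
    ids
}

fn main() {
    assert_eq!(sorted_unique(&[3, 1, 3, 2, 1]), vec![1, 2, 3]);
    assert_eq!(sorted_unique(&[]), Vec::<u32>::new());
}
```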

Why not an iterator chain? You might try:

为什么不用迭代器链? 你可能会尝试:

#![allow(unused)]
fn main() {
let samples: Vec<f64> = std::iter::from_fn(|| Some(random()))
    .take(10)
    .take_while(|_| random::<u8>() % 3 != 0)
    .collect();
}

But take_while excludes the element that fails the predicate, so this produces anywhere from zero to ten elements instead of the at-least-one that the imperative version guarantees. You can work around it with scan or chain, but the imperative version is clearer.

但是 take_while 会 排除 谓词检查失败的那个元素,从而产生 0 到 10 个元素,而不是像命令式版本那样保证至少有一个元素。你也可以通过 scan 或 chain 来规避这个问题,但命令式版本更清晰。
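The boundary behaviour is easier to see with a deterministic stop condition in place of the random one. In this sketch, `collect_inclusive` and `collect_exclusive` are hypothetical helpers and "stop once a reading exceeds the limit" is an assumed stand-in for the random break.

```rust
/// Imperative version: push first, then decide whether to stop.
fn collect_inclusive(readings: &[f64], limit: f64) -> Vec<f64> {
    let mut buf = Vec::new();
    for &r in readings {
        buf.push(r);
        if r > limit {
            break; // the boundary element has already been kept
        }
    }
    buf
}

/// Iterator version: `take_while` drops the boundary element.
fn collect_exclusive(readings: &[f64], limit: f64) -> Vec<f64> {
    readings.iter().copied().take_while(|&r| r <= limit).collect()
}

fn main() {
    let readings = [1.0, 2.0, 9.0, 3.0];
    assert_eq!(collect_inclusive(&readings, 5.0), vec![1.0, 2.0, 9.0]);
    assert_eq!(collect_exclusive(&readings, 5.0), vec![1.0, 2.0]);

    // If the very first element trips the condition, the loop keeps it
    // and the iterator version yields nothing.
    assert_eq!(collect_inclusive(&[9.0, 1.0], 5.0), vec![9.0]);
    assert!(collect_exclusive(&[9.0, 1.0], 5.0).is_empty());
}
```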

When scoped mutability genuinely wins / 作用域内可变性真正胜出的场景:

| Scenario / 场景 | Why iterators struggle / 为什么迭代器难以胜任 |
|---|---|
| Sort-then-freeze / 排序后冻结 (sort_unstable() + dedup()) | Both return (), so there is no chainable output (with itertools you can chain .sorted().dedup()) / 两者都返回 (),没有可链式调用的输出(若有 itertools 则可使用 .sorted().dedup()) |
| Stateful termination / 状态化终止 (stop on a condition unrelated to the data) | take_while drops the boundary element / take_while 会丢弃边界上的那个元素 |
| Multi-step struct population / 多步骤结构体填充 (field-by-field from different sources) | No natural single pipeline / 没有自然的单一流水线 |

Honest calibration: For most collection-building tasks, iterator chains or itertools are preferred. Reach for scoped mutability when the construction logic has branching, early exit, or in-place mutation that doesn’t map to a single pipeline. The pattern’s real value is teaching that mutation scope can be smaller than variable lifetime — a Rust fundamental that surprises developers coming from C++, C#, and Python.

坦诚的评估:对于大多数构建集合的任务,迭代器链或 itertools 是首选。当构建逻辑包含分支、提前退出或无法映射到单一流水线的原地修改时,请使用作用域内可变性。这种模式的真正价值在于:它揭示了 可变作用域可以比变量生命周期更小 —— 这是 Rust 的一项基本原理,常令来自 C++、C# 和 Python 的开发者感到惊讶。


8.4 The ? Operator: Where Functional Meets Imperative / 8.4 ? 运算符:函数式与命令式的交汇点

The ? operator is Rust’s most elegant synthesis of both styles. It’s essentially .and_then() combined with early return:

? 运算符是 Rust 对这两种风格最优雅的综合。它本质上是 .and_then() 与提前返回(early return)的结合:

#![allow(unused)]
fn main() {
// This chain of and_then...
// 这种 and_then 链式调用...
fn load_config() -> Result<Config, Error> {
    read_file("config.toml")
        .and_then(|contents| parse_toml(&contents))
        .and_then(|table| validate_config(table))
        .and_then(|valid| Config::from_validated(valid))
}

// ...behaves the same as this (? additionally converts the error via From)
// ...与下面这段代码行为一致(? 还会通过 From 转换错误类型)
fn load_config() -> Result<Config, Error> {
    let contents = read_file("config.toml")?;
    let table = parse_toml(&contents)?;
    let valid = validate_config(table)?;
    Config::from_validated(valid)
}
}

Both are functional in spirit (they propagate errors automatically) but the ? version gives you named intermediate variables, which matter when:

两者在精神上都是函数式的(它们自动传播错误),但 ? 版本为你提供了命名的中间变量,这在以下情况下非常重要:

  • You need to use contents again later / 你稍后还需要再次使用 contents
  • You want to add .context("while parsing config")? per step / 你想为每一步添加 .context("while parsing config")?
  • You’re debugging and want to inspect intermediate values / 你正在调试,并希望检查中间值

The anti-pattern: long .and_then() chains when ? is available. If every closure in the chain is |x| next_step(x), you’ve reinvented ? without the readability.

反模式:在 ? 可用的情况下使用长长的 .and_then() 链。如果链中的每个闭包都只是 |x| next_step(x),那么你只是在用一种更难读的方式重新实现 ?
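One place where the two forms differ is error conversion: ? calls From::from on the error for you, while an and_then chain needs an explicit map_err. The sketch below uses a hypothetical `LoadError` type and range check to show both forms producing the same results.

```rust
use std::num::ParseIntError;

// Hypothetical error type for this example.
#[derive(Debug, PartialEq)]
enum LoadError {
    Parse(String),
    OutOfRange(i64),
}

impl From<ParseIntError> for LoadError {
    fn from(e: ParseIntError) -> Self {
        LoadError::Parse(e.to_string())
    }
}

fn check_range(n: i64) -> Result<i64, LoadError> {
    if (1..=100).contains(&n) { Ok(n) } else { Err(LoadError::OutOfRange(n)) }
}

// Combinator form: the error conversion has to be spelled out.
fn load_chained(raw: &str) -> Result<i64, LoadError> {
    raw.parse::<i64>()
        .map_err(LoadError::from)
        .and_then(check_range)
}

// `?` form: ParseIntError becomes LoadError through the From impl.
fn load_with_question_mark(raw: &str) -> Result<i64, LoadError> {
    let n = raw.parse::<i64>()?;
    check_range(n)
}

fn main() {
    for raw in ["42", "500", "abc"] {
        assert_eq!(load_chained(raw), load_with_question_mark(raw));
    }
    assert_eq!(load_with_question_mark("42"), Ok(42));
    assert_eq!(load_with_question_mark("500"), Err(LoadError::OutOfRange(500)));
}
```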

When .and_then() IS better than ? / 何时 .and_then()? 更好:

#![allow(unused)]
fn main() {
// Transforming inside an Option, without early return
// 在 Option 内部进行转换,无需提前返回
let port: Option<u16> = config.get("port")
    .and_then(|v| v.parse::<u16>().ok())
    .filter(|&p| p != 0); // reject port 0; every other u16 (including 65535) is a valid port
}

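You wouldn’t use ? here: ? would return None from the enclosing function as soon as a step fails, whereas this code wants the Option as a value to keep working with. You’re building an Option, not propagating it.

在这里不应使用 ?:一旦某一步失败,? 会让外层函数直接返回 None,而这段代码需要的是把 Option 作为一个值继续使用。你是在构建一个 Option,而不是传播它。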
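Both forms can be compared when the lookup lives in its own helper that returns Option. This is a sketch under that assumption; `port_combinator` and `port_question_mark` are hypothetical names, and the config is modelled as a plain HashMap.

```rust
use std::collections::HashMap;

// Combinator form: produces an Option as a value, with no early return.
fn port_combinator(config: &HashMap<String, String>) -> Option<u16> {
    config
        .get("port")
        .and_then(|v| v.parse::<u16>().ok())
        .filter(|&p| p != 0)
}

// `?` form: only reads well because this helper itself returns Option,
// so "return None early" is exactly the behaviour we want.
fn port_question_mark(config: &HashMap<String, String>) -> Option<u16> {
    let raw = config.get("port")?;
    let port = raw.parse::<u16>().ok()?;
    (port != 0).then_some(port)
}

fn main() {
    let mut config = HashMap::new();
    assert_eq!(port_combinator(&config), None); // key missing

    config.insert("port".to_string(), "8080".to_string());
    assert_eq!(port_combinator(&config), Some(8080));
    assert_eq!(port_question_mark(&config), Some(8080));

    config.insert("port".to_string(), "not-a-number".to_string());
    assert_eq!(port_combinator(&config), None);
    assert_eq!(port_question_mark(&config), None);
}
```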



8.5 Collection Building: collect() vs. Push Loops / 8.5 集合构建:collect() 与 Push 循环

collect() is more powerful than most developers realize:

collect() 的强大程度超乎大多数开发者的想象:

Collecting into a Result / 收集到 Result

#![allow(unused)]
fn main() {
// Imperative: parse a list, fail on first error
// 命令式:解析列表,遇到第一个错误即失败
let mut numbers = Vec::new();
for s in input_strings {
    let n: i64 = s.parse().map_err(|_| Error::BadInput(s.clone()))?;
    numbers.push(n);
}

// Functional: collect into Result<Vec<_>, _>
// 函数式:收集到 Result<Vec<_>, _> 中
let numbers: Vec<i64> = input_strings.iter()
    .map(|s| s.parse::<i64>().map_err(|_| Error::BadInput(s.clone())))
    .collect::<Result<_, _>>()?;
}

The collect::<Result<Vec<_>, _>>() trick works because Result implements FromIterator. It short-circuits on the first Err, just like the loop with ?.

collect::<Result<Vec<_>, _>>() 这个技巧之所以奏效,是因为 Result 实现了 FromIterator。它会在遇到第一个 Err 时短路(short-circuit),就像带有 ? 的循环一样。
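The short-circuit can be observed directly by counting how many inputs are actually parsed. A minimal sketch; `parse_all` is a hypothetical helper, and the attempt counter exists only to make the early stop visible.

```rust
use std::num::ParseIntError;

/// Parses every input, returning the collected result together with
/// the number of inputs that were actually parsed.
fn parse_all(inputs: &[&str]) -> (Result<Vec<i64>, ParseIntError>, usize) {
    let mut attempts = 0;
    let parsed: Result<Vec<i64>, ParseIntError> = inputs
        .iter()
        .map(|s| {
            attempts += 1; // runs once per item pulled through the chain
            s.parse::<i64>()
        })
        .collect();
    (parsed, attempts)
}

fn main() {
    let (ok, n) = parse_all(&["1", "2", "3"]);
    assert_eq!(ok, Ok(vec![1, 2, 3]));
    assert_eq!(n, 3);

    // The second item fails, so the third is never parsed.
    let (err, n) = parse_all(&["1", "x", "3"]);
    assert!(err.is_err());
    assert_eq!(n, 2);
}
```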

Collecting into a HashMap / 收集到 HashMap

#![allow(unused)]
fn main() {
// Imperative / 命令式
let mut index = HashMap::new();
for server in fleet {
    index.insert(server.id.clone(), server);
}

// Functional / 函数式
let index: HashMap<_, _> = fleet.into_iter()
    .map(|s| (s.id.clone(), s))
    .collect();
}

Collecting into a String / 收集到 String

#![allow(unused)]
fn main() {
// Imperative / 命令式
let mut csv = String::new();
for (i, field) in fields.iter().enumerate() {
    if i > 0 { csv.push(','); }
    csv.push_str(field);
}

// Functional / 函数式
let csv = fields.join(",");

// Or for more complex formatting:
// 或者针对更复杂的格式化:
let csv: String = fields.iter()
    .map(|f| format!("\"{f}\""))
    .collect::<Vec<_>>()
    .join(",");
}

When the loop version wins / 何时循环版本胜出

collect() allocates a new collection. If you’re modifying in place, the loop is both clearer and more efficient:

collect() 会分配一个新的集合。如果你是在 原地修改(modifying in place),那么循环不仅更清晰,而且更高效:

#![allow(unused)]
fn main() {
// In-place update — no functional equivalent that's better
// 原地更新 —— 没有比这更好的函数式等价写法
for server in &mut fleet {
    if server.needs_refresh() {
        server.refresh_telemetry()?;
    }
}
}

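The functional version would require .iter_mut().for_each(|s| { ... }), which is just a loop with extra syntax. It also cannot use ? to return from the enclosing function, because ? inside the closure only exits the closure; you would need try_for_each instead.

函数式版本需要使用 .iter_mut().for_each(|s| { ... }),这其实只是多了额外语法的循环而已。而且闭包内的 ? 只能退出闭包,无法从外层函数返回;若要传播错误,需要改用 try_for_each。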
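For comparison, here is what the fallible in-place update looks like in both styles. The `Server` and `RefreshError` types are hypothetical stand-ins for the fleet types used in this chapter.

```rust
#[derive(Debug, PartialEq)]
struct RefreshError(u32);

struct Server {
    id: u32,
    needs_refresh: bool,
    reachable: bool,
    refreshed: bool,
}

impl Server {
    fn refresh(&mut self) -> Result<(), RefreshError> {
        if !self.reachable {
            return Err(RefreshError(self.id));
        }
        self.refreshed = true;
        Ok(())
    }
}

// Loop form: `?` returns from the function on the first failure.
fn refresh_loop(fleet: &mut [Server]) -> Result<(), RefreshError> {
    for server in fleet.iter_mut() {
        if server.needs_refresh {
            server.refresh()?;
        }
    }
    Ok(())
}

// Iterator form: `try_for_each` stops at the first Err and returns it.
fn refresh_try_for_each(fleet: &mut [Server]) -> Result<(), RefreshError> {
    fleet
        .iter_mut()
        .filter(|s| s.needs_refresh)
        .try_for_each(|s| s.refresh())
}

fn sample_fleet() -> Vec<Server> {
    vec![
        Server { id: 1, needs_refresh: true, reachable: true, refreshed: false },
        Server { id: 2, needs_refresh: true, reachable: false, refreshed: false },
        Server { id: 3, needs_refresh: true, reachable: true, refreshed: false },
    ]
}

fn main() {
    let mut a = sample_fleet();
    let mut b = sample_fleet();
    assert_eq!(refresh_loop(&mut a), Err(RefreshError(2)));
    assert_eq!(refresh_try_for_each(&mut b), Err(RefreshError(2)));
    // Both stop at server 2: server 1 was refreshed, server 3 was not.
    assert!(a[0].refreshed && b[0].refreshed);
    assert!(!a[2].refreshed && !b[2].refreshed);
}
```

The two are equivalent here, which is the point: the iterator form buys nothing, and the loop reads more directly.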



8.6 Pattern Matching as Function Dispatch / 8.6 模式匹配作为函数分发

Rust’s match is a functional construct that most developers use imperatively. Here’s the functional lens:

Rust 的 match 是一个函数式结构,但大多数开发者会以命令式的方式使用它。下面是其函数式的视角:

Match as a lookup table / Match 作为查找表

#![allow(unused)]
fn main() {
// Imperative thinking: "check each case"
// 命令式思维:“检查每一种情况”
fn status_message(code: StatusCode) -> &'static str {
    if code == StatusCode::OK { "Success" }
    else if code == StatusCode::NOT_FOUND { "Not found" }
    else if code == StatusCode::INTERNAL { "Server error" }
    else { "Unknown" }
}

// Functional thinking: "map from domain to range"
// 函数式思维:“从定义域映射到值域”
fn status_message(code: StatusCode) -> &'static str {
    match code {
        StatusCode::OK => "Success",
        StatusCode::NOT_FOUND => "Not found",
        StatusCode::INTERNAL => "Server error",
        _ => "Unknown",
    }
}
}

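The match version isn’t just style: the compiler verifies exhaustiveness. When a match lists every variant of an enum and has no catch-all _ arm, adding a new variant turns every match that doesn’t handle it into a compile error. That guarantee disappears as soon as you add a _ arm. The example above ends in _ => "Unknown", so a new status code would silently land in the default there too, exactly like the final else of the if/else chain.

match 版本不仅仅是风格问题:编译器会验证其 完备性(exhaustiveness)。当 match 列出了枚举的每一个变体且没有兜底的 _ 分支时,新增一个变体会让所有未处理它的 match 产生编译错误。但一旦加上 _ 分支,这个保证就不存在了。上面的示例以 _ => "Unknown" 结尾,因此新的状态码同样会静默地落入默认分支,与 if/else 链最后的 else 完全一样。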
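A sketch of the variant of this function that does get the exhaustiveness check, assuming a small hypothetical `Status` enum in place of `StatusCode`:

```rust
// Hypothetical closed set of statuses for this example.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Status {
    Ok,
    NotFound,
    Internal,
}

fn status_message(code: Status) -> &'static str {
    match code {
        Status::Ok => "Success",
        Status::NotFound => "Not found",
        Status::Internal => "Server error",
        // No `_` arm on purpose: adding a variant such as
        // `Status::Timeout` makes this match fail to compile
        // until the new case is handled here.
    }
}

fn main() {
    assert_eq!(status_message(Status::Ok), "Success");
    assert_eq!(status_message(Status::NotFound), "Not found");
    assert_eq!(status_message(Status::Internal), "Server error");
}
```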

Match + destructuring as a pipeline / Match + 解构作为流水线

#![allow(unused)]
fn main() {
// Parsing a command — each arm extracts and transforms
// 解析命令 —— 每个分支提取并转换
fn execute(cmd: Command) -> Result<Response, Error> {
    match cmd {
        Command::Get { key } => db.get(&key).map(Response::Value),
        Command::Set { key, value } => db.set(key, value).map(|_| Response::Ok),
        Command::Delete { key } => db.delete(&key).map(|_| Response::Ok),
        Command::Batch(cmds) => cmds.into_iter()
            .map(execute)
            .collect::<Result<Vec<_>, _>>()
            .map(Response::Batch),
    }
}
}

Each arm is an expression that returns the same type. This is pattern matching as function dispatch — the match arms are essentially a function table indexed by the enum variant.

每个分支都是一个返回相同类型的表达式。这就是作为函数分发的模式匹配 —— match 分支本质上是一个由枚举变体索引的函数表。
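A runnable version of the same dispatch, under stated assumptions: the database is modelled as a HashMap passed in explicitly, and `DbError`, `Response`, and the exact `Command` fields are hypothetical definitions filled in for the example.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum Response {
    Value(String),
    Ok,
    Batch(Vec<Response>),
}

#[derive(Debug, PartialEq)]
enum DbError {
    NotFound(String),
}

enum Command {
    Get { key: String },
    Set { key: String, value: String },
    Delete { key: String },
    Batch(Vec<Command>),
}

type Db = HashMap<String, String>;

fn execute(db: &mut Db, cmd: Command) -> Result<Response, DbError> {
    match cmd {
        Command::Get { key } => db
            .get(&key)
            .cloned()
            .map(Response::Value)
            .ok_or(DbError::NotFound(key)),
        Command::Set { key, value } => {
            db.insert(key, value);
            Ok(Response::Ok)
        }
        Command::Delete { key } => db
            .remove(&key)
            .map(|_| Response::Ok)
            .ok_or(DbError::NotFound(key)),
        // Recursion plus collect-into-Result: the batch stops at the first error.
        Command::Batch(cmds) => cmds
            .into_iter()
            .map(|c| execute(db, c))
            .collect::<Result<Vec<_>, _>>()
            .map(Response::Batch),
    }
}

fn main() {
    let mut db = Db::new();
    let batch = Command::Batch(vec![
        Command::Set { key: "a".into(), value: "1".into() },
        Command::Get { key: "a".into() },
        Command::Delete { key: "a".into() },
    ]);
    assert_eq!(
        execute(&mut db, batch),
        Ok(Response::Batch(vec![
            Response::Ok,
            Response::Value("1".into()),
            Response::Ok,
        ]))
    );
    assert_eq!(
        execute(&mut db, Command::Get { key: "a".into() }),
        Err(DbError::NotFound("a".into()))
    );
}
```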



8.7 Chaining Methods on Custom Types / 8.7 在自定义类型上链式调用方法

The functional style extends beyond standard library types. Builder patterns and fluent APIs are functional programming in disguise:

函数式风格不仅限于标准库类型。构建器模式(Builder patterns)和流式 API(fluent APIs)其实就是披着羊皮的函数式编程:

#![allow(unused)]
fn main() {
// This is a combinator chain over your own type
// 这是一个针对你自己类型的组合器链
let query = QueryBuilder::new("servers")
    .filter("status", Eq, "active")
    .filter("rack", In, &["A1", "A2", "B1"])
    .order_by("temperature", Desc)
    .limit(50)
    .build();
}

The key insight: if your type has methods that take self and return Self (or a transformed type), you’ve built a combinator. The same functional/imperative judgment applies:

核心见解:如果你的类型拥有接受 self 并返回 Self(或转换后的类型)的方法,那么你实际上已经构建了一个组合器。同样的函数式/命令式判断标准同样适用:

#![allow(unused)]
fn main() {
// Good: chainable because each step is a simple transform
// ✅ 好的设计:可链式调用,因为每一步都是简单的转换
let config = Config::default()
    .with_timeout(Duration::from_secs(30))
    .with_retries(3)
    .with_tls(true);

// Bad: chainable but the chain is doing too many unrelated things
// ❌ 不好的设计:虽然可链式调用,但链条做了太多不相关的事情
let result = processor
    .load_data(path)?       // I/O
    .validate()             // Pure / 纯函数
    .transform(rule_set)    // Pure / 纯函数
    .save_to_disk(output)?  // I/O
    .notify_downstream()?;  // Side effect / 副作用

// Better: separate the pure pipeline from the I/O bookends
// 💡 更好的做法:将纯数据流水线与 I/O 操作区分开
let data = load_data(path)?;
let processed = data.validate().transform(rule_set);
save_to_disk(output, &processed)?;
notify_downstream()?;
}

The chain fails when it mixes pure transforms with I/O. The reader can’t tell which calls might fail, which have side effects, and where the actual data transformations happen.

当链条混合了纯转换与 I/O 操作时,它就失效了。读者无法分辨哪些调用可能会失败、哪些带有副作用,以及实际的数据转换发生在何处。



8.8 Performance: They’re the Same / 8.8 性能:它们是一样的

A common misconception: “functional style is slower because of all the closures and allocations.”

一个常见的误解是:“函数式风格更慢,因为它包含大量的闭包和内存分配。”

In Rust, iterator chains typically compile to the same machine code as hand-written loops. In optimized builds, LLVM inlines the closure calls, eliminates the iterator adapter structs, and often produces identical assembly. This is called zero-cost abstraction. It is not a guarantee for every possible chain, so measure hot paths, but for straightforward pipelines it holds up in practice.

在 Rust 中,迭代器链编译出的机器码通常与手写的循环相同。 在优化构建下,LLVM 会内联闭包调用,消除迭代器适配器结构体,并且往往会生成完全相同的汇编代码。这被称为 零成本抽象(zero-cost abstraction)。它并不对每一条链都作出保证,因此热点路径仍应实测;但对于简单直接的流水线,这一结论在实践中是成立的。

#![allow(unused)]
fn main() {
// These typically compile to equivalent machine code in release builds:
// 在 release 构建下,这些通常会编译为等价的机器码:

// Functional / 函数式
let sum: i64 = (0..1000).filter(|n| n % 2 == 0).map(|n| n * n).sum();

// Imperative / 命令式
let mut sum: i64 = 0;
for n in 0..1000 {
    if n % 2 == 0 {
        sum += n * n;
    }
}
}

The one exception: .collect() allocates. If you’re chaining .map().collect().iter().map().collect() with intermediate collections, you’re paying for allocations the loop version avoids. The fix: eliminate intermediate collects by chaining adapters directly, or use a loop if you need the intermediate collections for other reasons.

唯一的例外.collect() 会分配内存。如果你正在链式调用 .map().collect().iter().map().collect() 并在其中使用了中间集合,那么你就要为循环版本原本不需要的内存分配额外付费。解决方法是:通过直接链接适配器来消除中间环节的 collect(),或者如果由于其他原因确实需要中间集合,则改用循环。
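The fix can be sketched as follows. Both functions return the same values; the first allocates a throwaway Vec between stages and the second does not. The function names are hypothetical.

```rust
/// Two passes with an intermediate Vec: one extra allocation.
fn squares_of_evens_two_pass(values: &[i64]) -> Vec<i64> {
    let evens: Vec<i64> = values.iter().copied().filter(|n| n % 2 == 0).collect();
    evens.iter().map(|n| n * n).collect()
}

/// One fused pipeline: adapters are chained and collected once.
fn squares_of_evens_fused(values: &[i64]) -> Vec<i64> {
    values
        .iter()
        .copied()
        .filter(|n| n % 2 == 0)
        .map(|n| n * n)
        .collect()
}

fn main() {
    let values = [1, 2, 3, 4];
    assert_eq!(squares_of_evens_two_pass(&values), vec![4, 16]);
    assert_eq!(squares_of_evens_fused(&values), vec![4, 16]);
}
```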



8.9 The Taste Test: A Catalog of Transformations / 8.9 风格品鉴:转换目录

Here’s a reference table for the most common “I wrote 6 lines but there’s a one-liner” patterns:

下面是一张参考表,列出了最常见的“我写了 6 行,但其实一行就能搞定”的模式:

| Imperative pattern / 命令式模式 | Functional equivalent / 函数式等价写法 | When to prefer functional / 何时首选函数式 |
|---|---|---|
| if let Some(x) = opt { f(x) } else { default } | opt.map_or(default, f) | Short expressions on both sides / 两边都是简短表达式时 |
| if let Some(x) = opt { Some(g(x)) } else { None } | opt.map(g) | Always: this is what map is for / 总是首选,这正是 map 的用武之地 |
| if condition { Some(x) } else { None } | condition.then_some(x) | When x is cheap to compute (it is evaluated eagerly) / 当 x 的计算开销很小时(它会被立即求值) |
| if condition { Some(compute()) } else { None } | condition.then(compute) | Always / 总是首选 |
| match opt { Some(x) if pred(x) => Some(x), _ => None } | opt.filter(pred) | Always / 总是首选 |
| for x in iter { if pred(x) { result.push(f(x)); } } | iter.filter(pred).map(f).collect() | When the pipeline is readable / 当流水线可读性强时 |
| if a.is_some() && b.is_some() { Some((a?, b?)) } else { None } | a.zip(b) | Always: .zip() is exactly this / 总是首选,.zip() 正是为此设计的 |
| match (a, b) { (Some(x), Some(y)) => x + y, _ => 0 } | a.zip(b).map(\|(x,y)\| x + y).unwrap_or(0) | Judgment call, depends on complexity / 需要斟酌,取决于逻辑复杂度 |
| iter.map(f).collect::<Vec<_>>()[0] | iter.map(f).next().unwrap() | Don’t allocate a Vec for one element / 不要为了一个元素分配 Vec |
| let mut v = vec; v.sort(); v | { let mut v = vec; v.sort(); v } | Rust doesn’t have a .sorted() in std (itertools provides one) / Rust 标准库没有 .sorted()(itertools 提供了该方法) |


8.10 The Anti-Patterns / 8.10 反模式

Over-functionalizing: the 5-deep chain nobody can read / 过度函数式:无人能读懂的深层链条

#![allow(unused)]
fn main() {
// This is not elegant. This is a puzzle.
// 这不优雅。这简直是个谜题。
let result = data.iter()
    .filter_map(|x| x.metadata.as_ref())
    .flat_map(|m| m.tags.iter())
    .filter(|t| t.starts_with("env:"))
    .map(|t| t.strip_prefix("env:").unwrap())
    .filter(|env| allowed_envs.contains(env))
    .map(|env| env.to_uppercase())
    .collect::<HashSet<_>>()
    .into_iter()
    .sorted()
    .collect::<Vec<_>>();
}

When a chain exceeds ~4 adapters, break it up with named intermediate variables or extract a helper:

当链条超过约 4 个适配器时,请使用命名的中间变量将其拆分,或者提取出一个辅助函数:

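#![allow(unused)]
fn main() {
use itertools::Itertools; // .sorted() and .dedup() come from itertools, not std

let env_tags = data.iter()
    .filter_map(|x| x.metadata.as_ref())
    .flat_map(|m| m.tags.iter());

let allowed: Vec<_> = env_tags
    .filter_map(|t| t.strip_prefix("env:"))
    .filter(|env| allowed_envs.contains(env))
    .map(|env| env.to_uppercase())
    .sorted()
    .dedup() // keeps the de-duplication the HashSet provided in the long chain
    .collect();
}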

Under-functionalizing: the C-style loop that Rust has a word for / 缺乏函数式思维:Rust 已有现成术语的 C 风格循环

#![allow(unused)]
fn main() {
// This is just .any() / 这其实就是 .any()
let mut found = false;
for item in &list {
    if item.is_expired() {
        found = true;
        break;
    }
}

// Write this instead / 请改写为:
let found = list.iter().any(|item| item.is_expired());
}
#![allow(unused)]
fn main() {
// This is just .find() / 这其实就是 .find()
let mut target = None;
for server in &fleet {
    if server.id == target_id {
        target = Some(server);
        break;
    }
}

// Write this instead / 请改写为:
let target = fleet.iter().find(|s| s.id == target_id);
}
#![allow(unused)]
fn main() {
// This is just .all() / 这其实就是 .all()
let mut all_healthy = true;
for server in &fleet {
    if !server.is_healthy() {
        all_healthy = false;
        break;
    }
}

// Write this instead / 请改写为:
let all_healthy = fleet.iter().all(|s| s.is_healthy());
}

The standard library has these for a reason. Learn the vocabulary and the patterns become obvious.

标准库提供这些方法是有原因的。一旦掌握了这些“词汇”,相关的模式就会变得一目了然。
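A few more entries from that vocabulary, sketched against a hypothetical `Server` struct. Each function body is a single call that would otherwise be a loop with a mutable accumulator.

```rust
// Hypothetical server record for this example.
struct Server {
    id: u32,
    temp_c: u32,
    healthy: bool,
}

/// Index of the first unhealthy server, if any.
fn first_unhealthy(fleet: &[Server]) -> Option<usize> {
    fleet.iter().position(|s| !s.healthy)
}

/// Id of the coolest server, if the fleet is non-empty.
fn coolest_id(fleet: &[Server]) -> Option<u32> {
    fleet.iter().min_by_key(|s| s.temp_c).map(|s| s.id)
}

/// Number of servers at or above the given temperature.
fn count_hot(fleet: &[Server], threshold: u32) -> usize {
    fleet.iter().filter(|s| s.temp_c >= threshold).count()
}

fn main() {
    let fleet = [
        Server { id: 10, temp_c: 71, healthy: true },
        Server { id: 11, temp_c: 64, healthy: false },
        Server { id: 12, temp_c: 88, healthy: true },
    ];
    assert_eq!(first_unhealthy(&fleet), Some(1));
    assert_eq!(coolest_id(&fleet), Some(11));
    assert_eq!(count_hot(&fleet, 70), 2);
    assert_eq!(coolest_id(&[]), None);
}
```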


Key Takeaways / 核心要点

  • Option and Result are one-element collections. Their combinators (.map(), .and_then(), .unwrap_or_else(), .filter(), .zip()) replace most if let / match boilerplate. / Option 和 Result 是单元素集合。 它们的组合器(.map().and_then().unwrap_or_else().filter().zip())可以替代大多数 if let / match 样板代码。
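  • Use bool::then_some() to replace if cond { Some(x) } else { None } when x is cheap to compute. Switch to bool::then(|| ...) when the value should only be computed if the condition holds. / 当 x 的计算开销很小时,使用 bool::then_some() 替代 if cond { Some(x) } else { None };如果该值只应在条件成立时才计算,请改用 bool::then(|| ...)。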
  • Iterator chains win for data pipelines: filter/map/collect with zero mutable state. With optimizations enabled they typically compile to machine code equivalent to a hand-written loop. / 迭代器链在数据流水线中更胜一筹：使用 filter/map/collect 可以实现零可变状态。在开启优化的情况下，它们编译出的机器码通常与手写循环相当。
  • Loops win for multi-output state machines — when you’re building multiple collections, doing I/O in branches, or managing a state transition. / 循环在多输出状态机中更胜一筹 —— 当你需要构建多个集合、在分支中执行 I/O 或管理状态转换时。
  • The ? operator is the best of both worlds — functional error propagation with imperative readability. / ? 运算符兼具两者的优点 —— 既有函数式的错误传播,又有命令式的可读性。
  • Break chains at ~4 adapters — use named intermediates for readability. Over-functionalizing is as bad as under-functionalizing. / 在约 4 个适配器处断开链条 —— 使用命名的中间变量以提高可读性。过度函数式与缺乏函数式思维同样糟糕。
  • Learn the standard-library vocabulary: .any(), .all(), .find(), .position(), .sum(), .min_by_key(). Each one replaces a multi-line loop with a single intent-revealing call. / 学习标准库“词汇”：.any()、.all()、.find()、.position()、.sum()、.min_by_key()。每一个都能用一个揭示意图的调用来替代多行循环。
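
A few of these takeaways can be sketched in code. The function names below are hypothetical. The second function shows a case where eager evaluation would be a problem: integer division by zero panics, so the guard has to run first.

```rust
/// `then_some` evaluates its argument eagerly, so it suits cheap values.
fn port_if_enabled(enabled: bool, port: u16) -> Option<u16> {
    enabled.then_some(port)
}

/// `then` takes a closure, so the division only runs when the guard holds.
fn checked_ratio(num: u32, den: u32) -> Option<u32> {
    (den != 0).then(|| num / den)
}

/// Combinators in place of nested `if let`: parse, validate, default.
fn parse_port(raw: Option<&str>) -> u16 {
    raw.and_then(|s| s.trim().parse::<u16>().ok())
        .filter(|p| *p >= 1024)
        .unwrap_or(8080)
}
```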

See also / 另请参阅: Ch 7 for closure mechanics and the Fn trait hierarchy. Ch 10 for error combinator patterns. Ch 15 for fluent API design.

参见 第 7 章 了解闭包机制和 Fn Trait 等级体系。参见 第 10 章 了解错误组合器模式。参见 第 15 章 了解流式 API 设计。


Exercise: Refactoring Imperative to Functional ★★ (~30 min) / 练习:将命令式重构为函数式 ★★(约 30 分钟)

Refactor the following function from imperative to functional style. Then identify one place where the functional version is worse and explain why.

将下列函数从命令式风格重构为函数式风格。然后找出一个函数式版本表现 更差 的地方并说明原因。

#![allow(unused)]
fn main() {
fn summarize_fleet(fleet: &[Server]) -> FleetSummary {
    let mut healthy = Vec::new();
    let mut degraded = Vec::new();
    let mut failed = Vec::new();
    let mut total_power = 0.0;
    let mut max_temp = f64::NEG_INFINITY;

    for server in fleet {
        match server.health_status() {
            Health::Healthy => healthy.push(server.id.clone()),
            Health::Degraded(reason) => degraded.push((server.id.clone(), reason)),
            Health::Failed(err) => failed.push((server.id.clone(), err)),
        }
        total_power += server.power_draw();
        if server.max_temperature() > max_temp {
            max_temp = server.max_temperature();
        }
    }

    FleetSummary {
        healthy,
        degraded,
        failed,
        avg_power: total_power / fleet.len() as f64,
        max_temp,
    }
}
}
🔑 Solution / 参考答案

The total_power and max_temp are clean functional rewrites:

total_power 和 max_temp 可以非常整洁地改写为函数式风格：

#![allow(unused)]
fn main() {
fn summarize_fleet(fleet: &[Server]) -> FleetSummary {
    let avg_power: f64 = fleet.iter().map(|s| s.power_draw()).sum::<f64>()
        / fleet.len() as f64;

    let max_temp = fleet.iter()
        .map(|s| s.max_temperature())
        .fold(f64::NEG_INFINITY, f64::max);

    // But the three-way partition is BETTER as a loop.
    // Functional version would require three separate passes
    // or an awkward fold with three mutable accumulators.
    // 但这个三路划分(three-way partition)在循环中表现更好。
    // 函数式版本要么需要三次独立的遍历,要么需要一个带有三个可变累加器的笨拙 fold。
    let mut healthy = Vec::new();
    let mut degraded = Vec::new();
    let mut failed = Vec::new();

    for server in fleet {
        match server.health_status() {
            Health::Healthy => healthy.push(server.id.clone()),
            Health::Degraded(reason) => degraded.push((server.id.clone(), reason)),
            Health::Failed(err) => failed.push((server.id.clone(), err)),
        }
    }

    FleetSummary { healthy, degraded, failed, avg_power, max_temp }
}
}

Why the loop is better for the three-way partition: A functional version would either require three .filter().collect() passes (3x iteration), or a .fold() with three mut Vec accumulators inside a tuple — which is just the loop rewritten with worse syntax. The imperative single-pass loop is clearer, more efficient, and easier to extend.

为什么针对三路划分循环更好:函数式版本要么需要三次 .filter().collect() 遍历(3 倍迭代次数),要么需要一个在元组中包含三个 mut Vec 累加器的 .fold() —— 这其实只是用一种更糟糕的语法重写了循环。命令式的单次遍历(single-pass)循环更清晰、更高效,并且更容易扩展。
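
For comparison, here is roughly what the single-pass `.fold()` alternative would look like. `Server` and `Health` are simplified, hypothetical stand-ins for the exercise's types, so treat this as a sketch rather than a drop-in replacement.

```rust
// Hypothetical, simplified stand-ins for the exercise's types.
enum Health {
    Healthy,
    Degraded(String),
    Failed(String),
}

struct Server {
    id: String,
    health: Health,
}

type Partition = (Vec<String>, Vec<(String, String)>, Vec<(String, String)>);

/// Single-pass `fold` version: correct, but the closure body is the
/// loop body with extra tuple plumbing around it.
fn partition_with_fold(fleet: &[Server]) -> Partition {
    fleet.iter().fold(
        (Vec::new(), Vec::new(), Vec::new()),
        |(mut healthy, mut degraded, mut failed), server| {
            match &server.health {
                Health::Healthy => healthy.push(server.id.clone()),
                Health::Degraded(why) => degraded.push((server.id.clone(), why.clone())),
                Health::Failed(err) => failed.push((server.id.clone(), err.clone())),
            }
            (healthy, degraded, failed)
        },
    )
}
```

The `match` is identical to the one in the loop. The fold only adds the accumulator tuple that must be destructured and rebuilt on every step, which is why the loop reads better here.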


9. Smart Pointers and Interior Mutability / 9. 智能指针与内部可变性 🟡

What you’ll learn / 你将学到:

  • Box, Rc, Arc for heap allocation and shared ownership / 用于堆分配和共享所有权的 Box、Rc、Arc
  • Weak references for breaking Rc/Arc reference cycles / 用于打破 Rc/Arc 引用循环的 Weak 引用
  • Cell and RefCell for interior mutability, and Cow for clone-on-write borrowing / 用于内部可变性的 Cell 与 RefCell，以及用于写时克隆的 Cow
  • Pin for self-referential types and ManuallyDrop for lifecycle control / 用于自引用类型的 Pin 和用于生命周期控制的 ManuallyDrop

Box, Rc, Arc — Heap Allocation and Sharing / Box, Rc, Arc —— 堆分配与共享

#![allow(unused)]
fn main() {
// --- Box<T>: Single owner, heap allocation ---
// --- Box<T>:单一所有者,堆分配 ---
// Use when: recursive types, large values, trait objects
// 适用场景:递归类型、大型数值、Trait 对象
let boxed: Box<i32> = Box::new(42);
println!("{}", *boxed); // Deref to i32 / 解引用为 i32

// Recursive type requires Box (otherwise infinite size):
// 递归类型需要 Box(否则其大小将无限大):
enum List<T> {
    Cons(T, Box<List<T>>),
    Nil,
}

// Trait object (dynamic dispatch):
// Trait 对象(动态分发):
let writer: Box<dyn std::io::Write> = Box::new(std::io::stdout());

// --- Rc<T>: Multiple owners, single-threaded ---
// --- Rc<T>:多个所有者,单线程 ---
// Use when: shared ownership within one thread (no Send/Sync)
// 适用场景:单线程内的共享所有权(非 Send/Sync)
use std::rc::Rc;

let a = Rc::new(vec![1, 2, 3]);
let b = Rc::clone(&a); // Increments reference count (NOT deep clone) / 增加引用计数(而非深拷贝)
let c = Rc::clone(&a);
println!("Ref count: {}", Rc::strong_count(&a)); // 3

// All three point to the same Vec. When the last Rc is dropped,
// the Vec is deallocated.
// 三者都指向同一个 Vec。当最后一个 Rc 被丢弃时,该 Vec 内存也会被释放。

// --- Arc<T>: Multiple owners, thread-safe ---
// --- Arc<T>:多个所有者,线程安全 ---
// Use when: shared ownership across threads
// 适用场景:跨线程的共享所有权
use std::sync::Arc;

let shared = Arc::new(String::from("shared data"));
let handles: Vec<_> = (0..5).map(|_| {
    let shared = Arc::clone(&shared);
    std::thread::spawn(move || println!("{shared}"))
}).collect();
for h in handles { h.join().unwrap(); }
}

Weak References — Breaking Reference Cycles / Weak 引用 —— 打破引用循环

Rc and Arc use reference counting, which cannot free cycles (A → B → A). Weak<T> is a non-owning handle that does not increment the strong count:

Rc 和 Arc 使用引用计数，这无法释放循环引用（如 A → B → A）。Weak<T> 是一种不具有所有权的句柄，它不会增加强引用计数（strong count）：

#![allow(unused)]
fn main() {
use std::rc::{Rc, Weak};
use std::cell::RefCell;

struct Node {
    value: i32,
    parent: RefCell<Weak<Node>>,   // does NOT keep parent alive / 不会让父节点保持存活
    children: RefCell<Vec<Rc<Node>>>,
}

let parent = Rc::new(Node {
    value: 0, parent: RefCell::new(Weak::new()), children: RefCell::new(vec![]),
});
let child = Rc::new(Node {
    value: 1, parent: RefCell::new(Rc::downgrade(&parent)), children: RefCell::new(vec![]),
});
parent.children.borrow_mut().push(Rc::clone(&child));

// Access parent from child — returns Option<Rc<Node>>:
// 从子节点访问父节点 —— 返回一个 Option<Rc<Node>>:
if let Some(p) = child.parent.borrow().upgrade() {
    println!("Child's parent value: {}", p.value); // 0
}
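// When `parent` is dropped, its strong count reaches 0 and the Node is destroyed
// (the allocation itself is released once the last Weak is gone too).
// `child.parent.borrow().upgrade()` would then return `None`.
// 当 `parent` 被丢弃时，强引用计数变为 0，Node 被销毁
// （其内存分配要等到最后一个 Weak 也消失后才会释放）。
// 此时 `child.parent.borrow().upgrade()` 将会返回 `None`。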
}

Rule of thumb: Use Rc/Arc for ownership edges, Weak for back-references and caches. For thread-safe code, use Arc<T> with sync::Weak<T>.

经验法则：对于具有所有权的连接使用 Rc/Arc，对于背向引用（back-references）和缓存使用 Weak。对于线程安全的代码，请将 Arc<T> 与 sync::Weak<T> 配合使用。
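
The cache half of this rule of thumb, in its thread-safe form, might be sketched as follows. `CacheSlot` is a hypothetical name used only for illustration.

```rust
use std::sync::{Arc, Weak};

/// A cache slot that observes a value without keeping it alive.
struct CacheSlot {
    entry: Weak<String>,
}

impl CacheSlot {
    fn new(value: &Arc<String>) -> Self {
        // `downgrade` bumps the weak count, not the strong count.
        CacheSlot { entry: Arc::downgrade(value) }
    }

    /// Returns the value if some owner still holds it, else `None`.
    fn get(&self) -> Option<Arc<String>> {
        self.entry.upgrade()
    }
}
```

Once the last `Arc` owner is dropped, `get()` starts returning `None`, so the cache never extends the lifetime of the data it observes.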

Cell and RefCell — Interior Mutability / Cell 与 RefCell —— 内部可变性

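Sometimes you need to mutate data behind a shared (&) reference. Rust provides interior mutability for this: Cell moves or copies values in and out, and RefCell checks the borrow rules at runtime:

有时你需要修改位于共享（&）引用之后的数据。Rust 为此提供了内部可变性（interior mutability）：Cell 通过移入、移出或复制值来工作，RefCell 则在运行时检查借用规则：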

#![allow(unused)]
fn main() {
use std::cell::{Cell, RefCell};

// --- Cell<T>: Copy-based interior mutability ---
// --- Cell<T>:基于复制的内部可变性 ---
// Only for Copy types (or types you swap in/out)
// 仅适用于 Copy 类型(或者你进行 swap/replace 操作的类型)
struct Counter {
    count: Cell<u32>,
}

impl Counter {
    fn new() -> Self { Counter { count: Cell::new(0) } }

    fn increment(&self) { // &self, not &mut self!
        self.count.set(self.count.get() + 1);
    }

    fn value(&self) -> u32 { self.count.get() }
}

// --- RefCell<T>: Runtime borrow checking ---
// --- RefCell<T>:运行时借用检查 ---
// Panics if you violate borrow rules at runtime
// 如果在运行时违反借用规则,将会触发 Panic
struct Cache {
    data: RefCell<Vec<String>>,
}

impl Cache {
    fn new() -> Self { Cache { data: RefCell::new(Vec::new()) } }

    fn add(&self, item: String) { // &self — looks immutable from outside
        // &self —— 从外部看是不可变的
        self.data.borrow_mut().push(item); // Runtime-checked &mut / 运行时检查的 &mut
    }

    fn get_all(&self) -> Vec<String> {
        self.data.borrow().clone() // Runtime-checked & / 运行时检查的 &
    }

    fn bad_example(&self) {
        let _guard1 = self.data.borrow();
        // let _guard2 = self.data.borrow_mut();
        // ❌ PANICS at runtime — can't have &mut while & exists
        // ❌ 运行时 Panic —— & 引用存在时不能拥有 &mut
    }
}
}

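Cell vs RefCell: Cell never panics because it only moves or copies whole values: .get() requires Copy, while .set() and .replace() work for any type and .take() works for Default types. RefCell works with any type and hands out references, but panics on any conflicting borrow (borrow_mut() while another borrow is live, or borrow() while a mutable borrow is live). Neither is Sync; for multithreaded use, see Mutex/RwLock.

Cell vs RefCell：Cell 永远不会触发 Panic，因为它只会整体移动或复制值：.get() 要求类型实现 Copy，而 .set() 和 .replace() 适用于任何类型，.take() 适用于实现了 Default 的类型。RefCell 适用于任何类型并能给出引用，但只要出现冲突的借用（已有借用存活时调用 borrow_mut()，或可变借用存活时调用 borrow()）就会触发 Panic。两者都不是 Sync；对于多线程使用，请参考 Mutex/RwLock。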
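
Two details from this comparison are easy to miss: `Cell` can hold non-`Copy` values if you swap them in and out, and `RefCell` offers `try_borrow_mut()` for cases where a panic is unacceptable. A minimal sketch with hypothetical names:

```rust
use std::cell::{Cell, RefCell};

/// `Cell` works with non-`Copy` types through `replace`/`take`/`set`.
struct LastMessage {
    text: Cell<String>,
}

impl LastMessage {
    /// Swaps in a new message and returns the previous one.
    fn update(&self, new: &str) -> String {
        self.text.replace(new.to_string())
    }
}

/// `try_borrow_mut` reports a conflict as `Err` instead of panicking.
fn push_if_free(log: &RefCell<Vec<u32>>, value: u32) -> bool {
    match log.try_borrow_mut() {
        Ok(mut guard) => {
            guard.push(value);
            true
        }
        Err(_) => false,
    }
}
```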

Cow — Clone on Write / Cow —— 写时克隆

Cow (Clone on Write) holds either a borrowed or owned value. It clones only when mutation is needed:

Cow (Clone on Write,写时克隆) 可以持有一个借用的值或一个拥有的值。它 仅在 需要修改时才进行克隆:

use std::borrow::Cow;

// Avoids allocating when no modification is needed:
// 当不需要修改时,避免内存分配:
fn normalize(input: &str) -> Cow<'_, str> {
    if input.contains('\t') {
        // Only allocate if tabs need replacing
        // 只有当制表符需要替换时才进行分配
        Cow::Owned(input.replace('\t', "    "))
    } else {
        // No allocation — just return a reference
        // 无需分配 —— 仅返回一个引用
        Cow::Borrowed(input)
    }
}

fn main() {
    let clean = "no tabs here";
    let dirty = "tabs\there";

    let r1 = normalize(clean); // Cow::Borrowed — zero allocation / 零分配
    let r2 = normalize(dirty); // Cow::Owned — allocated new String / 分配了新的 String

    println!("{r1}");
    println!("{r2}");
}

// Also useful for function parameters that MIGHT need ownership:
// 对于可能需要所有权的函数参数同样有用:
fn process(data: Cow<'_, [u8]>) {
    // Can read data without copying
    // 可以读取数据而无需拷贝
    println!("Length: {}", data.len());
    // If we need to mutate, Cow auto-clones:
    // 如果我们需要修改,Cow 会自动克隆:
    let mut owned = data.into_owned(); // Clone only if Borrowed / 仅在借用时克隆
    owned.push(0xFF);
}
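
When the function needs to mutate in place rather than take ownership, `Cow::to_mut()` is the usual tool: it clones a borrowed value on first use and reuses an owned one. The function below is a hypothetical example, not part of the original text.

```rust
use std::borrow::Cow;

/// Clears the high bit of every byte, cloning only if some byte needs it.
fn force_ascii(mut data: Cow<'_, [u8]>) -> Cow<'_, [u8]> {
    if data.iter().any(|b| *b >= 0x80) {
        // `to_mut` clones a Borrowed slice into a Vec on first call;
        // an Owned value is handed back without copying.
        for b in data.to_mut().iter_mut() {
            *b &= 0x7F;
        }
    }
    data
}
```

Input that is already ASCII comes back as `Cow::Borrowed`, so the common path allocates nothing.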

Cow<'_, [u8]> for Binary Data / 用于二进制数据的 Cow<'_, [u8]>

Cow is especially useful for byte-oriented APIs where data may or may not need transformation (checksum insertion, padding, escaping). This avoids allocating a Vec<u8> on the common fast path:

Cow 对于面向字节的 API 特别有用,因为在这些场景下,数据可能需要也可能不需要进行转换(如校验和插入、填充、转义)。这在常见的“快速路径”中避免了分配 Vec<u8> 的开销:

#![allow(unused)]
fn main() {
use std::borrow::Cow;

/// Pads a frame to a minimum length, borrowing when no padding is needed.
/// 将帧填充到最小长度,如果无需填充,则使用借用。
fn pad_frame(frame: &[u8], min_len: usize) -> Cow<'_, [u8]> {
    if frame.len() >= min_len {
        Cow::Borrowed(frame)  // Already long enough — zero allocation / 已经足够长 —— 零分配
    } else {
        let mut padded = frame.to_vec();
        padded.resize(min_len, 0x00);
        Cow::Owned(padded)    // Allocate only when padding is required / 仅在需要填充时才分配
    }
}

let short = pad_frame(&[0xDE, 0xAD], 8);    // Owned — padded to 8 bytes / Owned —— 已填充至 8 字节
let long  = pad_frame(&[0; 64], 8);          // Borrowed — already ≥ 8 / Borrowed —— 已经 ≥ 8
}

Tip: Combine Cow<[u8]> with bytes::Bytes (Ch10) when you need reference-counted sharing of potentially-transformed buffers.

提示：当你需要对可能经过转换的缓冲区进行引用计数共享时，可以将 Cow<[u8]> 与 bytes::Bytes（见第 10 章）结合使用。

When to Use Which Pointer / 该使用哪种指针

| Pointer / 指针 | Owner Count / 所有者数量 | Thread-Safe / 线程安全 | Mutability / 可变性 | Use When / 适用场景 |
|---|---|---|---|---|
| Box<T> | 1 | ✅ (if T: Send) | Via &mut | Heap allocation, trait objects, recursive types / 堆分配、Trait 对象、递归类型 |
| Rc<T> | N | ❌ | None (wrap in Cell/RefCell) / 无（需包裹在 Cell/RefCell 中） | Shared ownership, single thread, graphs/trees / 共享所有权、单线程、图/树结构 |
| Arc<T> | N | ✅ (if T: Send + Sync) | None (wrap in Mutex/RwLock) / 无（需包裹在 Mutex/RwLock 中） | Shared ownership across threads / 跨线程的共享所有权 |
| Cell<T> | 1 | ❌ (not Sync) | .get() / .set() | Interior mutability for Copy types / 适用于 Copy 类型的内部可变性 |
| RefCell<T> | 1 | ❌ (not Sync) | .borrow() / .borrow_mut() | Interior mutability for any type, single thread / 适用于任何类型的内部可变性（单线程） |
| Cow<'_, T> | 0 or 1 | ✅ (if T: Send) | Clone on write / 写时克隆 | Avoid allocation when data is often unchanged / 当数据通常保持不变时避免分配 |
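
The "wrap in" entries of the Mutability column describe the two most common pairings. A minimal sketch of both, with hypothetical function names:

```rust
use std::cell::RefCell;
use std::rc::Rc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Single thread: `Rc` shares, `RefCell` supplies the mutability.
fn shared_log_single_thread() -> Vec<String> {
    let log = Rc::new(RefCell::new(Vec::new()));
    let writer = Rc::clone(&log);
    writer.borrow_mut().push("from writer".to_string());
    log.borrow_mut().push("from owner".to_string());
    let snapshot = log.borrow().clone();
    snapshot
}

/// Multiple threads: `Arc` shares, `Mutex` supplies the mutability.
fn shared_counter_multi_thread(workers: u32) -> u32 {
    let counter = Arc::new(Mutex::new(0u32));
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                *counter.lock().unwrap() += 1;
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let total = *counter.lock().unwrap();
    total
}
```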

Pin and Self-Referential Types / Pin 与自引用类型

Pin<P> prevents a value from being moved in memory. This is essential for self-referential types — structs that contain a pointer to their own data — and for Futures, which may hold references across .await points.

Pin<P> 防止一个值在内存中被移动。这对于 自引用类型(self-referential types)(即包含指向自身数据指针的结构体)以及 Future(可能在 .await 点跨越引用)至关重要。

use std::pin::Pin;
use std::marker::PhantomPinned;

// A self-referential struct (simplified):
// 一个自引用结构体(简化版):
struct SelfRef {
    data: String,
    ptr: *const String, // Points to `data` above / 指向其上方的 `data`
    _pin: PhantomPinned, // Opts out of Unpin: can't be moved once pinned / 排除 Unpin：一旦固定便不能被移动
}

impl SelfRef {
    fn new(s: &str) -> Pin<Box<Self>> {
        let val = SelfRef {
            data: s.to_string(),
            ptr: std::ptr::null(),
            _pin: PhantomPinned,
        };
        let mut boxed = Box::pin(val);

        // SAFETY: we don't move the data after setting the pointer
        // 安全性:在设置指针后,我们不再移动数据
        let self_ptr: *const String = &boxed.data;
        unsafe {
            let mut_ref = Pin::as_mut(&mut boxed);
            Pin::get_unchecked_mut(mut_ref).ptr = self_ptr;
        }
        boxed
    }

    fn data(&self) -> &str {
        &self.data
    }

    fn ptr_data(&self) -> &str {
        // SAFETY: ptr was set to point to self.data while pinned
        // 安全性:ptr 在 pinned 时已设置为指向 self.data
        unsafe { &*self.ptr }
    }
}

fn main() {
    let pinned = SelfRef::new("hello");
    assert_eq!(pinned.data(), pinned.ptr_data()); // Both "hello"
    // std::mem::swap would invalidate ptr — but Pin prevents it
    // std::mem::swap 会使指针失效 —— 但 Pin 阻止了这种情况
}

Key concepts / 核心概念:

| Concept / 概念 | Meaning / 含义 |
|---|---|
| Unpin (auto-trait) | “Moving this type is safe.” Most types are Unpin by default. / “移动此类型是安全的。” 大多数类型默认都是 Unpin 的。 |
| !Unpin / PhantomPinned | “I have internal pointers, so don’t move me.” / “我有内部指针，别移动我。” |
| Pin<&mut T> | A mutable reference that guarantees T won’t move / 保证 T 不会被移动的可变引用 |
| Pin<Box<T>> | An owned, heap-pinned value / 拥有的、堆上固定的值 |
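
The practical difference between `Unpin` and `!Unpin` types shows up in which `Pin` constructors and accessors are available. The sketch below uses two hypothetical types, `Movable` and `Anchored`, to illustrate it.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

/// Compiles only for types that are `Unpin`.
fn is_unpin<T: Unpin>(_: &T) -> bool {
    true
}

struct Movable(u32); // auto-implements Unpin
struct Anchored(u32, PhantomPinned); // opts out of Unpin

/// For `Unpin` types, `Pin::new` is safe and `get_mut` hands back `&mut T`.
fn bump(mut value: Pin<&mut Movable>) -> u32 {
    value.as_mut().get_mut().0 += 1;
    value.0
}

/// For `!Unpin` types, pinning goes through `Box::pin` (or `std::pin::pin!`).
fn anchored_value() -> u32 {
    let pinned: Pin<Box<Anchored>> = Box::pin(Anchored(7, PhantomPinned));
    pinned.0 // shared access through Deref is always available
}
```

Calling `is_unpin(&Anchored(..))` or `get_mut()` on a `Pin<&mut Anchored>` would be rejected at compile time, which is the guarantee the table describes.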

Why this matters for async / 为什么这对于异步很重要:

Every async fn desugars to a Future that may hold references across .await points — making it self-referential. The async runtime uses Pin<&mut Future> to guarantee the future isn’t moved once polled.

每一个 async fn 都会反糖化(desugar)为一个 Future,它可能在 .await 点跨越引用 —— 使其成为自引用的。异步运行时使用 Pin<&mut Future> 来保证 Future 在第一次轮询(poll)后不会被移动。

#![allow(unused)]
fn main() {
// When you write: / 当你写下:
async fn fetch(url: &str) -> String {
    let response = http_get(url).await; // reference held across await / 引用跨越了 await
    response.text().await
}

// The compiler generates a state machine struct that is !Unpin,
// and the runtime pins it before calling Future::poll().
// 编译器生成一个 !Unpin 的状态机结构体,运行时在调用 Future::poll() 之前会将其固定。
}

When to care about Pin: (1) Implementing Future manually, (2) writing async runtimes or combinators, (3) any struct with self-referential pointers. For normal application code, async/await handles pinning transparently. See the companion Async Rust Training for deeper coverage.

何时需要关心 Pin:(1) 手动实现 Future;(2) 编写异步运行时或组合器;(3) 任何带有自引用指针的结构体。对于普通的应用程序代码,async/await 会透明地处理固定。参见配套的《Async Rust 培训》以获得更深入的讲解。
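
Case (1), implementing `Future` by hand, can be sketched with the standard library alone. `Countdown`, `NoopWaker`, and `block_on` are hypothetical names; the busy-polling loop is a toy, not how a real runtime schedules work.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// A hand-written future that is ready after `remaining` extra polls.
struct Countdown {
    remaining: u32,
}

impl Future for Countdown {
    type Output = &'static str;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        if self.remaining == 0 {
            Poll::Ready("liftoff")
        } else {
            // Countdown has no self-references, so it is Unpin and
            // plain field access through Pin<&mut Self> is allowed.
            self.remaining -= 1;
            cx.waker().wake_by_ref(); // ask to be polled again
            Poll::Pending
        }
    }
}

/// A waker that does nothing; enough for a busy-polling demo.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Toy executor: pins the future on the heap and polls until ready.
fn block_on<F: Future>(future: F) -> (F::Output, u32) {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut pinned = Box::pin(future);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(out) = pinned.as_mut().poll(&mut cx) {
            return (out, polls);
        }
    }
}
```

Note that the executor pins the future before the first `poll` and only ever hands out `Pin<&mut F>` afterwards, which is the contract described above.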

Crate alternatives: For self-referential structs without manual Pin, consider ouroboros or self_cell — they generate safe wrappers with correct pinning and drop semantics.

Crate 替代方案：对于无需手动 Pin 的自引用结构体，可以考虑使用 ouroboros 或 self_cell，它们会生成带有正确固定和丢弃语义的安全包装器。

Pin Projections — Structural Pinning / Pin 投影 —— 结构化固定

When you have a Pin<&mut MyStruct>, you often need to access individual fields. Pin projection is the pattern for safely going from Pin<&mut Struct> to Pin<&mut Field> (for pinned fields) or &mut Field (for unpinned fields).

当你拥有一个 Pin<&mut MyStruct> 时,你经常需要访问其中的单个字段。Pin 投影(Pin projection) 是一种安全地从 Pin<&mut Struct> 转换到 Pin<&mut Field>(用于固定字段)或 &mut Field(用于未固定字段)的模式。

The Problem: Field Access on Pinned Types / 问题所在:访问固定类型的字段

#![allow(unused)]
fn main() {
use std::pin::Pin;
use std::marker::PhantomPinned;

struct MyFuture {
    data: String,              // Regular field — safe to move / 普通字段 —— 可以安全移动
    state: InternalState,      // Self-referential — must stay pinned / 自引用 —— 必须保持固定
    _pin: PhantomPinned,
}

enum InternalState {
    Waiting { ptr: *const String }, // Points to `data` — self-referential / 指向 `data` —— 自引用
    Done,
}

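// Given `Pin<&mut MyFuture>`, how do you access `data` and `state`?
// Reading `pinned.data` works through Deref, but `&mut pinned.data` does not:
// Pin<&mut T> implements DerefMut only when T: Unpin, so getting a
// &mut to a field of a pinned !Unpin value requires unsafe.
// 给定 `Pin<&mut MyFuture>`，你该如何访问 `data` 和 `state`？
// 通过 Deref 读取 `pinned.data` 是可以的，但 `&mut pinned.data` 不行：
// 只有当 T: Unpin 时，Pin<&mut T> 才实现 DerefMut，因此要获取
// 固定的 !Unpin 值的字段的可变引用（&mut），必须使用 unsafe。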
}

Manual Pin Projection (unsafe) / 手动 Pin 投影(unsafe)

#![allow(unused)]
fn main() {
impl MyFuture {
    // Project to `data` — this field is structurally unpinned (safe to move)
    // 投影到 `data` —— 该字段在结构上未被固定(可以安全移动)
    fn data(self: Pin<&mut Self>) -> &mut String {
        // SAFETY: `data` is not structurally pinned. Moving `data` alone
        // doesn't move the whole struct, so Pin's guarantee is preserved.
        // 安全性:`data` 在结构上未被固定。仅移动 `data` 不会移动整个结构体,
        // 因此保留了 Pin 的保证。
        unsafe { &mut self.get_unchecked_mut().data }
    }

    // Project to `state` — this field IS structurally pinned
    // 投影到 `state` —— 该字段在结构上已被固定
    fn state(self: Pin<&mut Self>) -> Pin<&mut InternalState> {
        // SAFETY: `state` is structurally pinned — we maintain the
        // pin invariant by returning Pin<&mut InternalState>.
        // 安全性:`state` 在结构上已被固定 —— 我们通过返回 
        // Pin<&mut InternalState> 来维护 Pin 的不变性。
        unsafe { Pin::new_unchecked(&mut self.get_unchecked_mut().state) }
    }
}
}

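Structural pinning rules / 结构化固定规则: treat a field as “structurally pinned” when rule 1 applies; the projection is then sound only if rules 2 and 3 also hold / 当规则 1 成立时，应将该字段视为“结构化固定的”；此时只有规则 2 和 3 同时成立，投影才是健全的：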

  1. Moving/swapping that field alone could invalidate a self-reference / 仅移动/交换该字段本身就可能使自引用失效
  2. The struct’s Drop impl must not move the field / 结构体的 Drop 实现绝不能移动该字段
  3. The struct must be !Unpin (enforced by PhantomPinned or a !Unpin field) / 结构体必须是 !Unpin 的(通过 PhantomPinned!Unpin 字段强制执行)

pin-project — Safe Pin Projections (Zero Unsafe) / pin-project —— 安全的 Pin 投影(零 Unsafe)

The pin-project crate generates provably correct projections at compile time, eliminating the need for manual unsafe:

pin-project crate 在编译时生成可证明正确的投影,从而消除了手动使用 unsafe 的必要:

#![allow(unused)]
fn main() {
use pin_project::pin_project;
use std::pin::Pin;
use std::future::Future;
use std::task::{Context, Poll};

#[pin_project]                   // <-- Generates projection methods / 生成投影方法
struct TimedFuture<F: Future> {
    #[pin]                       // <-- Structurally pinned (it's a Future) / 结构化固定(因为它是 Future)
    inner: F,
    started_at: std::time::Instant, // NOT pinned — plain data / 未固定 —— 普通数据
}

impl<F: Future> Future for TimedFuture<F> {
    type Output = (F::Output, std::time::Duration);

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        let this = self.project();  // Safe! Generated by pin_project / 安全!由 pin_project 生成
        //   this.inner   : Pin<&mut F>              — pinned field / 固定字段
        //   this.started_at : &mut std::time::Instant — unpinned field / 未固定字段

        match this.inner.poll(cx) {
            Poll::Ready(output) => {
                let elapsed = this.started_at.elapsed();
                Poll::Ready((output, elapsed))
            }
            Poll::Pending => Poll::Pending,
        }
    }
}
}

pin-project vs Manual Projection / pin-project 对比手动投影

| Aspect / 维度 | Manual (unsafe) / 手动 | pin-project |
|---|---|---|
| Safety / 安全性 | You prove invariants / 由你证明不变性 | Compiler-verified / 编译器验证 |
| Boilerplate / 样板代码 | Low (but error-prone) / 较少（但易错） | Zero (attribute macro) / 零样板（属性宏） |
| Drop interaction / 与 Drop 的交互 | Must not move pinned fields / 禁止移动固定字段 | Enforced: #[pinned_drop] / 强制执行：#[pinned_drop] |
| Compile-time cost / 编译开销 | None / 无 | Proc-macro expansion / 过程宏展开 |
| Use case / 适用场景 | Primitives, no_std / 底层原语、no_std | Application / library code / 应用或库代码 |

#[pinned_drop] — Drop for Pinned Types / 用于固定类型的 Drop

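pin-project does not allow a regular Drop impl on a #[pin_project] type. When such a type needs custom drop logic, implement PinnedDrop via #[pinned_drop] instead; its drop method receives Pin<&mut Self>, which prevents accidentally moving pinned fields:

pin-project 不允许在 #[pin_project] 类型上编写普通的 Drop 实现。当这样的类型需要自定义的丢弃逻辑时，请改为通过 #[pinned_drop] 实现 PinnedDrop；它的 drop 方法接收 Pin<&mut Self>，从而防止意外移动固定字段：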

#![allow(unused)]
fn main() {
use pin_project::{pin_project, pinned_drop};
use std::pin::Pin;

#[pin_project(PinnedDrop)]
struct Connection<F> {
    #[pin]
    future: F,
    buffer: Vec<u8>,  // Not pinned — can be moved in drop / 未固定 —— 可以在 drop 中被移动
}

#[pinned_drop]
impl<F> PinnedDrop for Connection<F> {
    fn drop(self: Pin<&mut Self>) {
        let this = self.project();
        // `this.future` is Pin<&mut F> — can't be moved, only dropped in place
        // `this.future` 是 Pin<&mut F> —— 不能被移动,只能原地销毁
        // `this.buffer` is &mut Vec<u8> — can be drained, cleared, etc.
        // `this.buffer` 是 &mut Vec<u8> —— 可以进行 drain、clear 等操作
        this.buffer.clear();
        println!("Connection dropped, buffer cleared");
    }
}
}

When Pin Projections Matter in Practice / 何时需要在实践中考虑 Pin 投影

graph TD
    A["Do you implement Future manually? / \n你是否在手动实现 Future?"] -->|Yes / 是| B["Does the future hold references\nacross .await points? / \n该 Future 是否在 .await 点\n跨越了引用?"]
    A -->|No / 否| C["async/await handles Pin for you\n✅ No projections needed / \nasync/await 会为你处理 Pin\n✅ 无需投影"]
    B -->|Yes / 是| D["Use #[pin_project] on your\nfuture struct / \n在你的 Future 结构体上\n使用 #[pin_project]"]
    B -->|No / 否| E["Your future is Unpin\n✅ No projections needed / \n你的 Future 是 Unpin 的\n✅ 无需投影"]
    D --> F["Mark futures/streams as #[pin]\nLeave data fields unpinned / \n将 Future/Stream 标记为 #[pin]\n让数据字段保持未固定状态"]
    
    style C fill:#91e5a3,color:#000
    style E fill:#91e5a3,color:#000
    style D fill:#ffa07a,color:#000
    style F fill:#ffa07a,color:#000

Rule of thumb: If you’re wrapping another Future or Stream, use pin-project. If you’re writing application code with async/await, you’ll never need pin projections directly. See the companion Async Rust Training for async combinator patterns that use pin projections.

经验法则:如果你正在包装另一个 FutureStream,请使用 pin-project。如果你正在使用 async/await 编写应用程序代码,你永远不需要直接处理 Pin 投影。参见配套的《Async Rust 培训》了解使用 Pin 投影的异步组合器模式。

Drop Ordering and ManuallyDrop / 丢弃顺序与 ManuallyDrop

Rust’s drop order is deterministic but has rules worth knowing:

Rust 的丢弃(drop)顺序是确定的,但有一些规则值得了解:

Drop Order Rules / 丢弃顺序规则

struct Label(&'static str);

impl Drop for Label {
    fn drop(&mut self) { println!("Dropping {}", self.0); }
}

fn main() {
    let a = Label("first");   // Declared first / 首先声明
    let b = Label("second");  // Declared second / 其次声明
    let c = Label("third");   // Declared last / 最后声明
}
// Output: / 输出:
//   Dropping third    ← locals drop in REVERSE declaration order / 局部变量按声明的逆序丢弃
//   Dropping second
//   Dropping first

The three rules / 三大规则:

| What / 对象 | Drop Order / 丢弃顺序 | Rationale / 原理 |
|---|---|---|
| Local variables / 局部变量 | Reverse declaration order / 声明的逆序 | Later variables might reference earlier ones / 后声明的变量可能引用先声明的变量 |
| Struct fields / 结构体字段 | Declaration order (top to bottom) / 声明顺序（从上到下） | Matches construction order (stable since Rust 1.0, guaranteed by RFC 1857) / 符合构建顺序（自 Rust 1.0 起稳定，由 RFC 1857 保证） |
| Tuple elements / 元组元素 | Declaration order (left to right) / 声明顺序（从左到右） | (a, b, c) → drop a, then b, then c / (a, b, c) → 先丢弃 a，然后 b，最后 c |

#![allow(unused)]
fn main() {
struct Server {
    listener: Label,  // Dropped 1st / 第一个丢弃
    handler: Label,   // Dropped 2nd / 第二个丢弃
    logger: Label,    // Dropped 3rd / 第三个丢弃
}
// Fields drop top-to-bottom (declaration order).
// This matters when fields reference each other or hold resources.
// 字段按从上到下的顺序(声明顺序)丢弃。
// 当字段之间存在引用关系或持有资源时,这一点至关重要。
}
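
One way to check these rules empirically is to record drops in a shared log. The sketch below uses hypothetical `Tracked` and `Pair` types; it is an illustration of the rules, not a replacement for the language reference.

```rust
use std::cell::RefCell;
use std::rc::Rc;

type Log = Rc<RefCell<Vec<&'static str>>>;

/// Records its name in a shared log when dropped.
struct Tracked {
    name: &'static str,
    log: Log,
}

impl Drop for Tracked {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

struct Pair {
    first: Tracked,
    second: Tracked,
}

fn tracked(name: &'static str, log: &Log) -> Tracked {
    Tracked { name, log: Rc::clone(log) }
}

/// Returns the observed drop order for struct fields and for locals.
fn observe_drop_order() -> (Vec<&'static str>, Vec<&'static str>) {
    let field_log: Log = Rc::default();
    drop(Pair {
        first: tracked("first", &field_log),
        second: tracked("second", &field_log),
    });

    let local_log: Log = Rc::default();
    {
        let _a = tracked("a", &local_log);
        let _b = tracked("b", &local_log);
    } // `_b` drops before `_a`

    let fields = field_log.borrow().clone();
    let locals = local_log.borrow().clone();
    (fields, locals)
}
```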

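Practical impact / 实际影响: If your struct holds a JoinHandle and a Sender, field order determines which is dropped first. If the thread reads from the channel, drop the Sender first (closing the channel) so the thread can exit, and only then join the handle, which means declaring Sender above JoinHandle. Note that dropping a JoinHandle detaches the thread rather than joining it, so an actual join has to be called explicitly (for example in a Drop impl). / 如果你的结构体包含一个 JoinHandle 和一个 Sender，字段顺序将决定谁先被丢弃。如果线程正在从通道读取数据，请先丢弃 Sender（关闭通道）以使线程退出，然后再加入（join）句柄。在结构体中，应将 Sender 放在 JoinHandle 之上。注意：丢弃 JoinHandle 只会分离（detach）线程而不会 join，因此需要显式调用 join（例如在 Drop 实现中）。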
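
A minimal sketch of the Sender/JoinHandle arrangement, assuming a hypothetical `Worker` that sums the numbers it receives. Because dropping a `JoinHandle` only detaches the thread, the sketch joins explicitly in `Drop`.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical worker that sums the numbers sent to it.
struct Worker {
    // Declared first so that, absent a Drop impl, the channel would
    // still close before the handle is released.
    tx: Option<Sender<u32>>,
    handle: Option<JoinHandle<u32>>,
}

impl Worker {
    fn spawn() -> Self {
        let (tx, rx) = mpsc::channel::<u32>();
        // The iterator ends once every Sender has been dropped.
        let handle = thread::spawn(move || rx.iter().sum::<u32>());
        Worker { tx: Some(tx), handle: Some(handle) }
    }

    fn send(&self, n: u32) {
        if let Some(tx) = &self.tx {
            let _ = tx.send(n);
        }
    }

    /// Closes the channel, then waits for the thread's result.
    fn finish(&mut self) -> Option<u32> {
        self.tx.take(); // drop the Sender: the receiver loop terminates
        self.handle.take().and_then(|h| h.join().ok())
    }
}

impl Drop for Worker {
    fn drop(&mut self) {
        // Dropping a JoinHandle detaches the thread; join explicitly.
        let _ = self.finish();
    }
}
```

Joining before the Sender is dropped would deadlock here, since the thread only exits when the channel closes. That ordering is what `finish()` encodes.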

ManuallyDrop<T> — Suppressing Automatic Drop / ManuallyDrop<T> —— 抑制自动丢弃

ManuallyDrop<T> wraps a value and prevents its destructor from running automatically. You take responsibility for dropping it (or intentionally leaking it):

ManuallyDrop<T> 包装一个值并防止其析构函数自动运行。你将承担起手动丢弃(或有意让其泄漏)它的责任:

#![allow(unused)]
fn main() {
use std::mem::ManuallyDrop;

// Use case 1: Prevent double-free in unsafe code
// 使用场景 1:防止 unsafe 代码中的二次释放(double-free)
struct TwoPhaseBuffer {
    // We need to drop the Vec ourselves to control timing
    // 我们需要自行丢弃 Vec 以控制时机
    data: ManuallyDrop<Vec<u8>>,
    committed: bool,
}

impl TwoPhaseBuffer {
    fn new(capacity: usize) -> Self {
        TwoPhaseBuffer {
            data: ManuallyDrop::new(Vec::with_capacity(capacity)),
            committed: false,
        }
    }

    fn write(&mut self, bytes: &[u8]) {
        self.data.extend_from_slice(bytes);
    }

    fn commit(&mut self) {
        self.committed = true;
        println!("Committed {} bytes", self.data.len());
    }
}

impl Drop for TwoPhaseBuffer {
    fn drop(&mut self) {
        if !self.committed {
            println!("Rolling back — dropping uncommitted data");
        }
        // SAFETY: data is always valid here; we only drop it once.
        // 安全性:data 在此处始终有效;且我们只丢弃它一次。
        unsafe { ManuallyDrop::drop(&mut self.data); }
    }
}
}
#![allow(unused)]
fn main() {
// Use case 2: Intentional leak (e.g., global singletons)
// 使用场景 2:故意泄漏(例如全局单例)
fn leaked_string() -> &'static str {
    // Box::leak() is the idiomatic way to create a &'static reference:
    // Box::leak() 是创建 &'static 引用的惯用方式:
    let s = String::from("lives forever");
    Box::leak(s.into_boxed_str())
    // ⚠️ This is a controlled memory leak. The String's heap allocation
    // is never freed. Only use for long-lived singletons.
    // ⚠️ 这是一个受控的内存泄漏。String 的堆分配永远不会被释放。
    // 仅用于长生命周期的单例。
}

// ManuallyDrop alternative (requires unsafe):
// ManuallyDrop 替代方案(需要 unsafe):
// ⚠️ Prefer Box::leak() above. This is shown only to illustrate
// ManuallyDrop semantics (suppressing Drop while the heap data survives).
// ⚠️ 优先使用上方的 Box::leak()。此处仅为演示 ManuallyDrop 的语义
// （在堆数据继续存活的同时抑制 Drop）。
fn leaked_string_manual() -> &'static str {
    use std::mem::ManuallyDrop;
    let md = ManuallyDrop::new(String::from("lives forever"));
    // SAFETY: ManuallyDrop prevents deallocation; the heap data lives
    // forever, so a 'static reference is valid.
    // 安全性:ManuallyDrop 阻止了释放;堆数据将永远存在,因此 'static 引用是有效的。
    unsafe { &*(md.as_str() as *const str) }
}
}
#![allow(unused)]
fn main() {
// Use case 3: Union fields (only one variant is valid at a time)
// 使用场景 3:Union 字段(一次只有一个变体有效)
use std::mem::ManuallyDrop;

union IntOrString {
    i: u64,
    s: ManuallyDrop<String>,
    // String has a Drop impl, so it MUST be wrapped in ManuallyDrop
    // inside a union — the compiler can't know which field is active.
    // String 具有 Drop 实现,因此在 union 中它必须被包装在 ManuallyDrop 中 
    // —— 因为编译器无法知道哪个字段是激活状态。
}

// No automatic Drop — the code that constructs IntOrString must also
// handle cleanup. If the String variant is active, call:
// 没有自动丢弃 —— 构造 IntOrString 的代码也必须处理清理工作。
// 如果 String 变体是激活的,请调用:
//   unsafe { ManuallyDrop::drop(&mut value.s); }
// without a Drop impl, the union is simply leaked (no UB, just a leak).
// 如果没有 Drop 实现,union 就会被简单地泄漏(不是未定义行为,只是泄漏)。
}
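
In practice a union like this is usually paired with a tag so that cleanup can be automated. The sketch below wraps the union in a hypothetical `Tagged` struct; the tag field and method names are assumptions made for illustration.

```rust
use std::mem::ManuallyDrop;

union IntOrString {
    i: u64,
    s: ManuallyDrop<String>,
}

/// Hypothetical wrapper: a tag records which union field is active.
struct Tagged {
    is_string: bool,
    value: IntOrString,
}

impl Tagged {
    fn from_int(i: u64) -> Self {
        Tagged { is_string: false, value: IntOrString { i } }
    }

    fn from_string(s: String) -> Self {
        Tagged { is_string: true, value: IntOrString { s: ManuallyDrop::new(s) } }
    }

    fn describe(&self) -> String {
        // SAFETY: the tag tells us which field was last written.
        unsafe {
            if self.is_string {
                let s: &String = &*self.value.s;
                format!("string:{s}")
            } else {
                let i = self.value.i;
                format!("int:{i}")
            }
        }
    }
}

impl Drop for Tagged {
    fn drop(&mut self) {
        if self.is_string {
            // SAFETY: `s` is the active field and is dropped exactly once.
            unsafe { ManuallyDrop::drop(&mut self.value.s) }
        }
    }
}
```

This is essentially a hand-rolled enum. In ordinary code an `enum` gives the same layout guarantees with no unsafe, so reserve the union form for FFI or layout-sensitive cases.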

ManuallyDrop vs mem::forget / ManuallyDrop 对比 mem::forget

| | ManuallyDrop<T> | mem::forget(value) |
|---|---|---|
| When / 何时使用 | Wrap at construction / 在构造时包装 | Consume later / 在稍后消耗 |
| Access inner / 访问内部 | &*md / &mut *md | Value is gone / 值已消失 |
| Drop later / 稍后丢弃 | ManuallyDrop::drop(&mut md) | Not possible / 不可能 |
| Use case / 适用场景 | Fine-grained lifecycle control / 细粒度生命周期控制 | Fire-and-forget leak / “发完即忘”式的泄漏 |

Rule: Use ManuallyDrop in unsafe abstractions where you need to control exactly when a destructor runs. In safe application code, you almost never need it — Rust’s automatic drop ordering handles things correctly.

法则:在需要 精确 控制析构函数何时运行的 unsafe 抽象中使用 ManuallyDrop。在安全的应用程序代码中,你几乎永远不需要它 —— Rust 的自动丢弃顺序会正确处理一切。
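
A typical unsafe abstraction where this rule applies is handing a buffer's ownership across a raw-pointer boundary. The sketch below (function names are hypothetical) shows `ManuallyDrop` preventing a double free, and `into_inner` returning a value to normal drop handling.

```rust
use std::mem::ManuallyDrop;

/// Splits a Vec into raw parts and rebuilds it, freeing the buffer once.
fn roundtrip_raw_parts(v: Vec<u32>) -> Vec<u32> {
    // Without ManuallyDrop, `v` would free the buffer at the end of
    // this function while the rebuilt Vec still points at it.
    let mut v = ManuallyDrop::new(v);
    let (ptr, len, cap) = (v.as_mut_ptr(), v.len(), v.capacity());
    // SAFETY: ptr/len/cap come from a live Vec<u32> whose destructor
    // is suppressed, so ownership of the buffer moves to the new Vec.
    unsafe { Vec::from_raw_parts(ptr, len, cap) }
}

/// `into_inner` hands the value back to normal drop handling.
fn unwrap_again(s: String) -> String {
    let wrapped = ManuallyDrop::new(s);
    ManuallyDrop::into_inner(wrapped)
}
```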

Key Takeaways — Smart Pointers / 核心要点 —— 智能指针

  • Box for single ownership on heap; Rc/Arc for shared ownership (single-/multi-threaded) / Box 用于堆上的单一所有权;Rc/Arc 用于(单线程/多线程)共享所有权
  • Cell/RefCell provide interior mutability; RefCell panics on violations at runtime / Cell/RefCell 提供内部可变性;RefCell 在运行时违反规则时会触发 Panic
  • Cow avoids allocation on the common path; Pin prevents moves for self-referential types / Cow 在常见路径下避免内存分配;Pin 防止自引用类型的移动
  • Drop order: fields drop in declaration order (RFC 1857); locals drop in reverse declaration order / 丢弃顺序:字段按声明顺序丢弃(RFC 1857);局部变量按声明的逆序丢弃

See also / 另请参阅: Ch 6 — Concurrency for Arc + Mutex patterns. Ch 4 — PhantomData for PhantomData used with smart pointers.

参见 第 6 章 —— 并发 了解 Arc + Mutex 模式。参见 第 4 章 —— PhantomData 了解在智能指针中使用 PhantomData。

graph TD
    Box["Box&lt;T&gt;<br>Single owner, heap / \n单一所有者,堆分配"] --> Heap["Heap allocation / 堆分配"]
    Rc["Rc&lt;T&gt;<br>Shared, single-thread / \n共享,单线程"] --> Heap
    Arc["Arc&lt;T&gt;<br>Shared, multi-thread / \n共享,多线程"] --> Heap

    Rc --> Weak1["Weak&lt;T&gt;<br>Non-owning / \n无所有权句柄"]
    Arc --> Weak2["Weak&lt;T&gt;<br>Non-owning / \n无所有权句柄"]

    Cell["Cell&lt;T&gt;<br>Copy interior mut / \n基于复制的内部可变性"] --> Stack["Stack / interior / \n栈 / 内部"]
    RefCell["RefCell&lt;T&gt;<br>Runtime borrow check / \n运行时借用检查"] --> Stack
    Cow["Cow&lt;T&gt;<br>Clone on write / \n写时克隆"] --> Stack

    style Box fill:#d4efdf,stroke:#27ae60,color:#000
    style Rc fill:#e8f4f8,stroke:#2980b9,color:#000
    style Arc fill:#e8f4f8,stroke:#2980b9,color:#000
    style Weak1 fill:#fef9e7,stroke:#f1c40f,color:#000
    style Weak2 fill:#fef9e7,stroke:#f1c40f,color:#000
    style Cell fill:#fdebd0,stroke:#e67e22,color:#000
    style RefCell fill:#fdebd0,stroke:#e67e22,color:#000
    style Cow fill:#fdebd0,stroke:#e67e22,color:#000
    style Heap fill:#f5f5f5,stroke:#999,color:#000
    style Stack fill:#f5f5f5,stroke:#999,color:#000

Exercise: Reference-Counted Graph ★★ (~30 min) / 练习:引用计数图 ★★(约 30 分钟)

Build a directed graph using Rc<RefCell<Node>> where each node has a name and a list of children. Create a cycle (A → B → C → A) using Weak to break the back-edge. Verify no memory leak with Rc::strong_count.

使用 Rc<RefCell<Node>> 构建一个有向图,其中每个节点都有一个名称和子节点列表。创建一个循环(A → B → C → A),并使用 Weak 来打破背向边(back-edge)。通过 Rc::strong_count 验证没有内存泄漏。

🔑 Solution / 参考答案
use std::cell::RefCell;
use std::rc::{Rc, Weak};

struct Node {
    name: String,
    children: Vec<Rc<RefCell<Node>>>,
    back_ref: Option<Weak<RefCell<Node>>>,
}

impl Node {
    fn new(name: &str) -> Rc<RefCell<Self>> {
        Rc::new(RefCell::new(Node {
            name: name.to_string(),
            children: Vec::new(),
            back_ref: None,
        }))
    }
}

impl Drop for Node {
    fn drop(&mut self) {
        println!("Dropping {}", self.name);
    }
}

fn main() {
    let a = Node::new("A");
    let b = Node::new("B");
    let c = Node::new("C");

    // A → B → C, with C back-referencing A via Weak
    // A → B → C,其中 C 通过 Weak 背向引用 A
    a.borrow_mut().children.push(Rc::clone(&b));
    b.borrow_mut().children.push(Rc::clone(&c));
    c.borrow_mut().back_ref = Some(Rc::downgrade(&a)); // Weak ref! / Weak 引用!

    println!("A strong count: {}", Rc::strong_count(&a)); // 1 (only `a` binding) / 1(只有 `a` 绑定)
    println!("B strong count: {}", Rc::strong_count(&b)); // 2 (b + A's child) / 2(变量 b + A 的子节点)
    println!("C strong count: {}", Rc::strong_count(&c)); // 2 (c + B's child) / 2(变量 c + B 的子节点)

    // Upgrade the weak ref to prove it works:
    // 升级 Weak 引用以证明其有效:
    let c_ref = c.borrow();
    if let Some(back) = &c_ref.back_ref {
        if let Some(a_ref) = back.upgrade() {
            println!("C points back to: {}", a_ref.borrow().name);
        }
    }
    // When a, b, c go out of scope, all Nodes drop (no cycle leak!)
    // 当 a, b, c 离开作用域时,所有节点都会被丢弃(没有循环泄露!)
}
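The solution shows that a `Weak` back-edge does not raise the strong count. A complementary check, sketched below with hypothetical `Owner` and `Observer` types (not the `Node` from the solution), is that `upgrade()` starts returning `None` once the last strong reference is gone, which is what prevents both leaks and dangling access.

```rust
use std::rc::{Rc, Weak};

// Hypothetical types for illustration only.
struct Owner {
    name: String,
}

struct Observer {
    // Weak back-edge: does not keep `Owner` alive.
    target: Weak<Owner>,
}

/// Returns the owner's name while it is alive, `None` after it was dropped.
fn observed_name(obs: &Observer) -> Option<String> {
    obs.target.upgrade().map(|owner| owner.name.clone())
}

/// Returns (name while alive, strong count, weak count, name after drop).
fn weak_demo() -> (Option<String>, usize, usize, Option<String>) {
    let owner = Rc::new(Owner { name: "A".to_string() });
    let obs = Observer { target: Rc::downgrade(&owner) };

    let alive = observed_name(&obs);
    let strong = Rc::strong_count(&owner); // only the `owner` binding
    let weak = Rc::weak_count(&owner);     // the observer's back-edge

    drop(owner); // last strong reference: the Owner value is dropped here
    let after = observed_name(&obs);
    (alive, strong, weak, after)
}
```

Calling `weak_demo()` is expected to show one strong and one weak reference while the owner lives, and a failed upgrade afterwards.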

10. Error Handling Patterns / 10. 错误处理模式 🟢

What you’ll learn / 你将学到:

  • When to use thiserror (libraries) vs anyhow (applications) / 何时使用 thiserror(库)与 anyhow(应用程序)
  • Error conversion chains with #[from] and .context() wrappers / 使用 #[from] 和 .context() 包装器的错误转换链
  • How the ? operator desugars and works in main() / ? 运算符是如何反糖化(desugar)以及如何在 main() 中工作的
  • When to panic vs return errors, and catch_unwind for FFI boundaries / 何时触发 Panic 与何时返回错误,以及用于 FFI 边界的 catch_unwind

thiserror vs anyhow — Library vs Application / thiserror 与 anyhow —— 库与应用程序

Rust error handling centers on the Result<T, E> type. Two crates dominate:

Rust 的错误处理围绕 Result<T, E> 类型展开。该领域有两款主流的 crate:

// --- thiserror: For LIBRARIES ---
// --- thiserror:适用于“库” ---
// Generates Display, Error, and From impls via derive macros
// 通过派生宏生成 Display、Error 和 From 实现
use thiserror::Error;

#[derive(Error, Debug)]
pub enum DatabaseError {
    #[error("connection failed: {0}")]
    ConnectionFailed(String),

    #[error("query error: {source}")]
    QueryError {
        #[source]
        source: sqlx::Error,
    },

    #[error("record not found: table={table} id={id}")]
    NotFound { table: String, id: u64 },

    #[error(transparent)] // Delegate Display to the inner error / 将 Display 委托给内部错误
    Io(#[from] std::io::Error), // Auto-generates From<io::Error> / 自动生成 From<io::Error>
}

// --- anyhow: For APPLICATIONS ---
// --- anyhow:适用于“应用程序” ---
// Dynamic error type — great for top-level code where you just want errors to propagate
// 动态错误类型 —— 非常适合这种仅仅需要传播错误的顶层代码
use anyhow::{Context, Result, bail, ensure};

fn read_config(path: &str) -> Result<Config> {
    let content = std::fs::read_to_string(path)
        .with_context(|| format!("failed to read config from {path}"))?;

    let config: Config = serde_json::from_str(&content)
        .context("failed to parse config JSON")?;

    ensure!(config.port > 0, "port must be positive, got {}", config.port);

    Ok(config)
}

fn main() -> Result<()> {
    let config = read_config("server.json")?; // read_config parses JSON / read_config 解析的是 JSON

    if config.name.is_empty() {
        bail!("server name cannot be empty"); // Return Err immediately / 立即返回 Err
    }

    Ok(())
}

When to use which / 该如何选择:

| | thiserror | anyhow |
|---|---|---|
| Use in / 用于 | Libraries, shared crates / 库、共享 crate | Applications, binaries / 应用程序、二进制文件 |
| Error types / 错误类型 | Concrete enums that callers can match / 具体的枚举,调用者可以进行 match | anyhow::Error, opaque / anyhow::Error,不透明的(Opaque) |
| Effort / 开发工作量 | Define your error enum / 定义你的错误枚举 | Just use Result<T> / 直接使用 Result<T> 即可 |
| Downcasting / 向下转型 | Not needed, pattern match / 不需要,利用模式匹配 | error.downcast_ref::<MyError>() |
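The "Downcasting" row can be illustrated without any external crate, because `Box<dyn Error>` in the standard library offers the same `downcast_ref` operation on an opaque error. The sketch below uses a hypothetical `NotFound` type and `lookup` function; it is a std-only stand-in, not anyhow itself.

```rust
use std::error::Error;
use std::fmt;

// Hypothetical concrete error type.
#[derive(Debug)]
struct NotFound {
    id: u64,
}

impl fmt::Display for NotFound {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "record {} not found", self.id)
    }
}

impl Error for NotFound {}

/// Returns an opaque error, as application-level code often does.
fn lookup(id: u64) -> Result<&'static str, Box<dyn Error>> {
    match id {
        1 => Ok("alice"),
        // A ParseIntError stands in for "some unrelated error type".
        2 => Err("not-a-number".parse::<u32>().unwrap_err().into()),
        _ => Err(Box::new(NotFound { id })),
    }
}

/// Recovers the concrete type from the opaque error, if it matches.
fn missing_id(err: &(dyn Error + 'static)) -> Option<u64> {
    err.downcast_ref::<NotFound>().map(|e| e.id)
}
```

With a concrete enum the caller would simply `match`; downcasting is the price of an opaque error type.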

Error Conversion Chains (#[from]) / 错误转换链 (#[from])

use thiserror::Error;

#[derive(Error, Debug)]
enum AppError {
    #[error("I/O error: {0}")]
    Io(#[from] std::io::Error),

    #[error("JSON error: {0}")]
    Json(#[from] serde_json::Error),

    #[error("HTTP error: {0}")]
    Http(#[from] reqwest::Error),
}

// Now ? automatically converts:
// 此时 ? 会自动进行转换:
fn fetch_and_parse(url: &str) -> Result<Config, AppError> {
    let body = reqwest::blocking::get(url)?.text()?;  // reqwest::Error → AppError::Http
    let config: Config = serde_json::from_str(&body)?; // serde_json::Error → AppError::Json
    Ok(config)
}
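To see what `#[from]` saves, here is a hand-written sketch of roughly equivalent code using only the standard library. The `Utf8Error` and `ParseIntError` sources and the `parse_port` function are assumptions chosen so the example compiles without `reqwest` or `serde_json`; the real macro output differs in detail.

```rust
use std::error::Error;
use std::fmt;
use std::num::ParseIntError;
use std::str::Utf8Error;

#[derive(Debug)]
enum AppError {
    Utf8(Utf8Error),
    Parse(ParseIntError),
}

// What #[error("...")] would generate: a Display impl per variant.
impl fmt::Display for AppError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AppError::Utf8(e) => write!(f, "UTF-8 error: {e}"),
            AppError::Parse(e) => write!(f, "parse error: {e}"),
        }
    }
}

// #[from] also marks the field as the error's source.
impl Error for AppError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            AppError::Utf8(e) => Some(e),
            AppError::Parse(e) => Some(e),
        }
    }
}

// The From impls are what lets `?` convert automatically.
impl From<Utf8Error> for AppError {
    fn from(e: Utf8Error) -> Self {
        AppError::Utf8(e)
    }
}

impl From<ParseIntError> for AppError {
    fn from(e: ParseIntError) -> Self {
        AppError::Parse(e)
    }
}

fn parse_port(raw: &[u8]) -> Result<u16, AppError> {
    let text = std::str::from_utf8(raw)?;   // Utf8Error -> AppError::Utf8
    let port = text.trim().parse::<u16>()?; // ParseIntError -> AppError::Parse
    Ok(port)
}
```

Each `?` picks the `From` impl that matches the error type of the expression it follows, so two different failures land in two different variants.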

Context and Error Wrapping / 上下文与错误包装

Add human-readable context to errors without losing the original:

在不丢失原始错误的情况下,为错误添加人类可读的上下文:

use anyhow::{Context, Result};

fn process_file(path: &str) -> Result<Data> {
    let content = std::fs::read_to_string(path)
        .with_context(|| format!("failed to read {path}"))?;

    let data = parse_content(&content)
        .with_context(|| format!("failed to parse {path}"))?;

    validate(&data)
        .context("validation failed")?;

    Ok(data)
}

// Error output when parsing fails (path = "config.json"): / 解析失败时的错误输出:
// Error: failed to parse config.json
//
// Caused by:
//     expected ',' at line 5 column 12
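The "Caused by" listing comes from walking `Error::source()` from the outermost error inwards. The std-only sketch below models that idea with a hypothetical `WithContext` wrapper and `render_chain` helper; it is not how anyhow is implemented, only an illustration of the chain.

```rust
use std::error::Error;
use std::fmt;

// Hypothetical wrapper: a message plus the error that caused it.
#[derive(Debug)]
struct WithContext {
    context: String,
    source: Box<dyn Error + 'static>,
}

impl fmt::Display for WithContext {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.context)
    }
}

impl Error for WithContext {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&*self.source)
    }
}

/// Parses a number and attaches context on failure.
fn parse_count(text: &str, origin: &str) -> Result<u32, WithContext> {
    text.trim().parse::<u32>().map_err(|e| WithContext {
        context: format!("failed to parse {origin}"),
        source: Box::new(e),
    })
}

/// Collects the outermost message followed by every cause.
fn render_chain(err: &dyn Error) -> Vec<String> {
    let mut lines = vec![err.to_string()];
    let mut current = err.source();
    while let Some(cause) = current {
        lines.push(cause.to_string());
        current = cause.source();
    }
    lines
}
```

The original error stays reachable through `source()`, which is why adding context does not lose information.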

The ? Operator in Depth / 深入了解 ? 运算符

? is syntactic sugar for a match + From conversion + early return:

? 是 match + From 转换 + 提前返回(early return)的语法糖:

// This: / 这段代码:
let value = operation()?;

// Desugars to: / 反糖化后等同于:
let value = match operation() {
    Ok(v) => v,
    Err(e) => return Err(From::from(e)),
    //                  ^^^^^^^^^^^^^^
    //                  Automatic conversion via From trait
    //                  通过 From trait 进行自动转换
};

? also works with Option (in functions returning Option):

? 也能用于 Option(在返回 Option 的函数中):

fn find_user_email(users: &[User], name: &str) -> Option<String> {
    let user = users.iter().find(|u| u.name == name)?; // Returns None if not found / 若未找到则返回 None
    let email = user.email.as_ref()?; // Returns None if email is None / 若 email 为 None 则返回 None
    Some(email.to_uppercase())
}
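When the caller needs to know why a lookup failed, the same function can return `Result` instead, bridging each `Option` with `ok_or_else` before applying `?`. The `LookupError` enum and the `User` definition below are assumptions made for this sketch.

```rust
// Hypothetical error type naming the two ways the lookup can fail.
#[derive(Debug, PartialEq)]
enum LookupError {
    NoSuchUser(String),
    NoEmail(String),
}

struct User {
    name: String,
    email: Option<String>,
}

fn find_user_email(users: &[User], name: &str) -> Result<String, LookupError> {
    // Option -> Result, then `?` returns early with the specific error.
    let user = users
        .iter()
        .find(|u| u.name == name)
        .ok_or_else(|| LookupError::NoSuchUser(name.to_string()))?;
    let email = user
        .email
        .as_ref()
        .ok_or_else(|| LookupError::NoEmail(name.to_string()))?;
    Ok(email.to_uppercase())
}
```

`ok_or_else` builds the error lazily, so the `String` allocation only happens on the failure path.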

Panics, catch_unwind, and When to Abort / Panic、catch_unwind 以及何时中止

#![allow(unused)]
fn main() {
// Panics: for BUGS, not expected errors
// Panic:用于处理 BUG,而非预料中的错误
fn get_element(data: &[i32], index: usize) -> &i32 {
    // If this panics, it's a programming error (bug).
    // Don't "handle" it — fix the caller.
    // 如果这里发生 panic,说明是编程错误(bug)。
    // 不要试图“处理”它 —— 请修复调用者。
    &data[index]
}

// catch_unwind: for boundaries (FFI, thread pools)
// catch_unwind:用于边界场景(FFI、线程池)
use std::panic;

let result = panic::catch_unwind(|| {
    // Run potentially panicking code safely
    // 安全地运行可能发生 panic 的代码
    risky_operation()
});

match result {
    Ok(value) => println!("Success: {value:?}"),
    Err(_) => eprintln!("Operation panicked — continuing safely"),
    // Err(_) => eprintln!("操作发生 panic —— 正在安全地继续运行"),
}

// When to use which / 该如何选择:
// - Result<T, E> → expected failures (file not found, network timeout)
// - Result<T, E> → 预料中的失败(如文件未找到、网络超时)
// - panic!()     → programming bugs (index out of bounds, invariant violated)
// - panic!()     → 编程 bug(如索引越界、违反不变性)
// - process::abort() → unrecoverable state (security violation, corrupt data)
// - process::abort() → 不可恢复的状态(如安全违规、数据损坏)
}
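At an FFI boundary the usual pattern is to catch the panic inside the exported function and translate it into an error code, since unwinding into C code is not something the caller can handle. The sketch below assumes a hypothetical `ffi_div` entry point and its own return-code convention; a real export would also need a `no_mangle` attribute, and `catch_unwind` catches nothing when the crate is built with `panic = "abort"`.

```rust
use std::panic;

// Ordinary Rust code that may panic on a caller bug.
fn checked_div(a: i32, b: i32) -> i32 {
    if b == 0 {
        panic!("division by zero");
    }
    a / b
}

/// Hypothetical C-callable wrapper.
/// Returns 0 on success, -1 if the Rust code panicked, -2 for a null `out`.
pub extern "C" fn ffi_div(a: i32, b: i32, out: *mut i32) -> i32 {
    if out.is_null() {
        return -2;
    }
    match panic::catch_unwind(|| checked_div(a, b)) {
        Ok(value) => {
            // SAFETY: `out` is non-null; the caller must guarantee that it
            // points to memory valid for writing an i32.
            unsafe { *out = value };
            0
        }
        Err(_) => -1, // the panic stops here instead of crossing the boundary
    }
}
```

The output parameter is left untouched on failure, so the C caller only reads it after checking the return code.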

C++ comparison: Result<T, E> replaces exceptions for expected errors. panic!() is like assert() or std::terminate() — it’s for bugs, not control flow. Rust’s ? operator makes error propagation as ergonomic as exceptions without the unpredictable control flow.

C++ 对比:Result<T, E> 取代了 C++ 中处理预期错误的异常机制。panic!() 类似于 assert() 或 std::terminate():它用于处理 bug,而非控制流。Rust 的 ? 运算符使错误传播像异常一样便捷,同时又避免了不可预测的控制流。

Key Takeaways — Error Handling / 核心要点 —— 错误处理

  • Libraries: thiserror for structured error enums; applications: anyhow for ergonomic propagation / 库:使用 thiserror 定义结构化错误枚举;应用程序:使用 anyhow 进行符合人体工程学的错误传播
  • #[from] auto-generates From impls; .context() adds human-readable wrappers / #[from] 自动生成 From 实现;.context() 添加人类可读的包装层
  • ? desugars to From::from() + early return; works in main() returning Result / ? 会反糖化为 From::from() + 提前返回;可在返回 Result 的 main() 函数中使用

See also / 另请参阅: Ch 15 — Crate Architecture and API Design for “parse, don’t validate” patterns. Ch 11 — Serialization for serde error handling.

参见 第 15 章 —— Crate 架构与 API 设计 了解“解析而非验证”模式。参见 第 11 章 —— 序列化 了解 serde 错误处理。

flowchart LR
    A["std::io::Error"] -->|"#[from]"| B["AppError::Io"]
    C["serde_json::Error"] -->|"#[from]"| D["AppError::Json"]
    E["Custom validation / \n自定义验证"] -->|"manual / \n手动"| F["AppError::Validation"]

    B --> G["? operator / \n? 运算符"]
    D --> G
    F --> G
    G --> H["Result&lt;T, AppError&gt;"]

    style A fill:#e8f4f8,stroke:#2980b9,color:#000
    style C fill:#e8f4f8,stroke:#2980b9,color:#000
    style E fill:#e8f4f8,stroke:#2980b9,color:#000
    style B fill:#fdebd0,stroke:#e67e22,color:#000
    style D fill:#fdebd0,stroke:#e67e22,color:#000
    style F fill:#fdebd0,stroke:#e67e22,color:#000
    style G fill:#fef9e7,stroke:#f1c40f,color:#000
    style H fill:#d4efdf,stroke:#27ae60,color:#000

Exercise: Error Hierarchy with thiserror ★★ (~30 min) / 练习:使用 thiserror 构建错误层级 ★★(约 30 分钟)

Design an error type hierarchy for a file-processing application that can fail during I/O, parsing (JSON and CSV), and validation. Use thiserror and demonstrate ? propagation.

为文件处理应用程序设计错误类型层级,该程序可能会在 I/O、解析(JSON 和 CSV)以及验证期间发生失败。使用 thiserror 并演示 ? 运算符的传播。

🔑 Solution / 参考答案
use thiserror::Error;

#[derive(Error, Debug)]
pub enum AppError {
    #[error("I/O error: {0}")]
    Io(#[from] std::io::Error),

    #[error("JSON parse error: {0}")]
    Json(#[from] serde_json::Error),

    #[error("CSV error at line {line}: {message}")]
    Csv { line: usize, message: String },

    #[error("validation error: {field} — {reason}")]
    Validation { field: String, reason: String },
}

fn read_file(path: &str) -> Result<String, AppError> {
    Ok(std::fs::read_to_string(path)?) // io::Error → AppError::Io via #[from]
}

fn parse_json(content: &str) -> Result<serde_json::Value, AppError> {
    Ok(serde_json::from_str(content)?) // serde_json::Error → AppError::Json
}

fn validate_name(value: &serde_json::Value) -> Result<String, AppError> {
    let name = value.get("name")
        .and_then(|v| v.as_str())
        .ok_or_else(|| AppError::Validation {
            field: "name".into(),
            reason: "must be a non-null string".into(),
        })?;

    if name.is_empty() {
        return Err(AppError::Validation {
            field: "name".into(),
            reason: "must not be empty".into(),
        });
    }

    Ok(name.to_string())
}

fn process_file(path: &str) -> Result<String, AppError> {
    let content = read_file(path)?;
    let json = parse_json(&content)?;
    let name = validate_name(&json)?;
    Ok(name)
}

fn main() {
    match process_file("config.json") {
        Ok(name) => println!("Name: {name}"),
        Err(e) => eprintln!("Error: {e}"),
    }
}

11. Serialization, Zero-Copy, and Binary Data / 11. 序列化、零拷贝与二进制数据 🟡

What you’ll learn / 你将学到:

  • serde fundamentals: derive macros, attributes, and enum representations / serde 基础:派生宏、属性和枚举表示形式
  • Zero-copy deserialization for high-performance read-heavy workloads / 适用于高性能、重负载读取场景的零拷贝(Zero-copy)反序列化
  • The serde format ecosystem (JSON, TOML, bincode, MessagePack) / serde 格式生态系统(JSON、TOML、bincode、MessagePack)
  • Binary data handling with repr(C), zerocopy, and bytes::Bytes / 使用 repr(C)、zerocopy 和 bytes::Bytes 处理二进制数据

serde Fundamentals / serde 基础

serde (SERialize/DEserialize) is the universal serialization framework for Rust. It separates data model (your structs) from format (JSON, TOML, binary):

serde (SERialize/DEserialize) 是 Rust 中通用的序列化框架。它将 数据模型(data model)(即你的结构体)与 数据格式(format)(如 JSON、TOML、二进制)分离开来:

use serde::{Serialize, Deserialize};

#[derive(Debug, Serialize, Deserialize)]
struct ServerConfig {
    name: String,
    port: u16,
    #[serde(default)]                    // Use Default::default() if missing / 若缺失则使用 Default::default()
    max_connections: usize,
    #[serde(skip_serializing_if = "Option::is_none")]
    tls_cert_path: Option<String>,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Deserialize from JSON:
    // 从 JSON 反序列化:
    let json_input = r#"{
        "name": "hw-diag",
        "port": 8080
    }"#;
    let config: ServerConfig = serde_json::from_str(json_input)?;
    println!("{config:?}");
    // ServerConfig { name: "hw-diag", port: 8080, max_connections: 0, tls_cert_path: None }

    // Serialize to JSON:
    // 序列化为 JSON:
    let output = serde_json::to_string_pretty(&config)?;
    println!("{output}");

    // Same struct, different format — no code changes:
    // 相同的结构体,不同的格式 —— 无需修改代码:
    let toml_input = r#"
        name = "hw-diag"
        port = 8080
    "#;
    let config: ServerConfig = toml::from_str(toml_input)?;
    println!("{config:?}");

    Ok(())
}

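Key insight: Your struct derives Serialize and Deserialize once. It then works with every serde-compatible format, including JSON, TOML, YAML, bincode, MessagePack, CBOR, postcard, and dozens of others.

核心见解:你的结构体只需派生一次 Serialize 和 Deserialize。随后,它即可与所有兼容 serde 的格式配合使用,包括 JSON、TOML、YAML、bincode、MessagePack、CBOR、postcard 及其它数十种格式。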

Common serde Attributes / 常用的 serde 属性

serde provides fine-grained control over serialization through field and container attributes:

serde 通过字段属性(field attributes)和容器属性(container attributes)提供了对序列化的细粒度控制:

use serde::{Serialize, Deserialize};

// --- Container attributes (on the struct/enum) ---
// --- 容器属性(作用于结构体/枚举) ---
#[derive(Serialize, Deserialize)]
#[serde(rename_all = "camelCase")]       // JSON convention: field_name → fieldName
#[serde(deny_unknown_fields)]            // Reject extra keys — strict parsing / 拒绝额外键值 —— 严格解析
struct DiagResult {
    test_name: String,                   // Serialized as "testName"
    pass_count: u32,                     // Serialized as "passCount"
    fail_count: u32,                     // Serialized as "failCount"
}

// --- Field attributes ---
// --- 字段属性 ---
#[derive(Serialize, Deserialize)]
struct Sensor {
    #[serde(rename = "sensor_id")]       // Override field name for serialization / 覆盖序列化时的字段名
    id: u64,

    #[serde(default)]                    // Use Default if missing from input / 若输入中缺失则使用 Default
    enabled: bool,

    #[serde(default = "default_threshold")]
    threshold: f64,

    #[serde(skip)]                       // Never serialize or deserialize / 永不进行序列化或反序列化
    cached_value: Option<f64>,

    #[serde(skip_serializing_if = "Vec::is_empty")]
    tags: Vec<String>,

    #[serde(flatten)]                    // Inline nested struct fields / 内联嵌套结构体的字段
    metadata: Metadata,

    #[serde(with = "hex_bytes")]         // Custom ser/de module / 自定义序列化/反序列化模块
    raw_data: Vec<u8>,
}

fn default_threshold() -> f64 { 1.0 }

#[derive(Serialize, Deserialize)]
struct Metadata {
    vendor: String,
    model: String,
}
// With #[serde(flatten)], the JSON looks like:
// 使用 #[serde(flatten)] 后,JSON 结构如下:
// { "sensor_id": 1, "vendor": "Intel", "model": "X200", ... }
// NOT: { "sensor_id": 1, "metadata": { "vendor": "Intel", ... } }
// 而非:{ "sensor_id": 1, "metadata": { "vendor": "Intel", ... } }

Most-used attributes cheat sheet / 常用属性速查表:

| Attribute / 属性 | Level / 层级 | Effect / 作用 |
|---|---|---|
| rename_all = "camelCase" | Container / 容器 | Rename all fields to camelCase/snake_case/SCREAMING_SNAKE_CASE / 将所有字段重命名为小驼峰/蛇形/大写蛇形命名 |
| deny_unknown_fields | Container / 容器 | Error on unexpected keys (strict mode) / 遇到未预料的键时报错(严格模式) |
| default | Field / 字段 | Use Default::default() when field missing / 当字段缺失时使用 Default::default() |
| rename = "..." | Field / 字段 | Custom serialized name / 自定义序列化名称 |
| skip | Field / 字段 | Exclude from ser/de entirely / 完全排除在序列化/反序列化之外 |
| skip_serializing_if = "fn" | Field / 字段 | Conditionally exclude (e.g., Option::is_none) / 条件性排除(如 Option::is_none) |
| flatten | Field / 字段 | Inline a nested struct’s fields / 内联嵌套结构体的字段 |
| with = "module" | Field / 字段 | Use custom serialize/deserialize functions / 使用自定义的序列化/反序列化函数 |
| alias = "..." | Field / 字段 | Accept alternative names during deserialization / 反序列化时接受备选名称 |
| deserialize_with = "fn" | Field / 字段 | Custom deserialize function only / 仅使用自定义的反序列化函数 |
| untagged | Enum / 枚举 | Try each variant in order (no discriminant in output) / 按顺序尝试每个变体(输出中不包含判别式) |

Enum Representations / 枚举表示形式

serde provides four representations for enums in formats like JSON:

对于 JSON 等格式,serde 为枚举提供了四种表示形式:

use serde::{Serialize, Deserialize};

// 1. Externally tagged (DEFAULT):
// 1. 外部标记(默认):
#[derive(Serialize, Deserialize)]
enum Command {
    Reboot,
    RunDiag { test_name: String, timeout_secs: u64 },
    SetFanSpeed(u8),
}
// "Reboot"                                          → Command::Reboot
// {"RunDiag": {"test_name": "gpu", "timeout_secs": 60}}  → Command::RunDiag { ... }

// 2. Internally tagged — #[serde(tag = "type")]:
// 2. 内部标记 —— #[serde(tag = "type")]:
#[derive(Serialize, Deserialize)]
#[serde(tag = "type")]
enum Event {
    Start { timestamp: u64 },
    Error { code: i32, message: String },
    End   { timestamp: u64, success: bool },
}
// {"type": "Start", "timestamp": 1706000000}
// {"type": "Error", "code": 42, "message": "timeout"}

// 3. Adjacently tagged — #[serde(tag = "t", content = "c")]:
// 3. 相邻标记 —— #[serde(tag = "t", content = "c")]:
#[derive(Serialize, Deserialize)]
#[serde(tag = "t", content = "c")]
enum Payload {
    Text(String),
    Binary(Vec<u8>),
}
// {"t": "Text", "c": "hello"}
// {"t": "Binary", "c": [0, 1, 2]}

// 4. Untagged — #[serde(untagged)]:
// 4. 无标记 —— #[serde(untagged)]:
#[derive(Serialize, Deserialize)]
#[serde(untagged)]
enum StringOrNumber {
    Str(String),
    Num(f64),
}
// "hello" → StringOrNumber::Str("hello")
// 42.0    → StringOrNumber::Num(42.0)
// ⚠️ Tried IN ORDER — first matching variant wins
// ⚠️ 按顺序尝试 —— 第一个匹配的变体胜出

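Which representation to choose: For most JSON APIs, use the internally tagged form (tag = "type"). It is the most readable and matches the conventions commonly used in Go, Python, and TypeScript. Use untagged only for "union" types whose shape alone is enough to tell the variants apart.

该选择哪种表示形式:对于大多数 JSON API,请使用内部标记(tag = "type")。它是最可读的,并且符合 Go、Python 和 TypeScript 中的惯例。只有在形状(shape)本身足以区分的“联合”类型(union types)中,才使用无标记。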

Zero-Copy Deserialization / 零拷贝反序列化

serde can deserialize without allocating new strings — borrowing directly from the input buffer. This is the key to high-performance parsing:

serde 可以实现在不分配新字符串的情况下进行反序列化 —— 直接从输入缓冲区借用数据。这是高性能解析的关键:

use serde::Deserialize;

// --- Owned (allocating) ---
// --- 所有权模式(会进行分配) ---
// Each String field copies bytes from the input into new heap allocations.
// 每个 String 字段都会将字节从输入中复制到新的堆分配中。
#[derive(Deserialize)]
struct OwnedRecord {
    name: String,           // Allocates a new String / 分配一个新 String
    value: String,          // Allocates another String / 分配另一个 String
}

// --- Zero-copy (borrowing) ---
// --- 零拷贝模式(进行借用) ---
// &'de str fields borrow directly from the input — ZERO allocation.
// &'de str 字段直接从输入中借用 —— 零分配。
#[derive(Deserialize)]
struct BorrowedRecord<'a> {
    name: &'a str,          // Points into the input buffer / 指向输入缓冲区
    value: &'a str,         // Points into the input buffer / 指向输入缓冲区
}

fn main() {
    let input = r#"{"name": "cpu_temp", "value": "72.5"}"#;

    // Owned: allocates two String objects
    // 所有权模式:分配了两个 String 对象
    let owned: OwnedRecord = serde_json::from_str(input).unwrap();

    // Zero-copy: `name` and `value` point into `input` — no allocation
    // 零拷贝模式:`name` 和 `value` 指向 `input` —— 无分配
    let borrowed: BorrowedRecord = serde_json::from_str(input).unwrap();

    // The output is lifetime-bound: borrowed can't outlive input
    // 输出受生命周期限制:borrowed 的存活时间不能超过 input
    println!("{}: {}", borrowed.name, borrowed.value);
}

Understanding the lifetime / 理解生命周期:

// Deserialize<'de> — the struct can borrow from data with lifetime 'de:
// Deserialize<'de> —— 结构体可以从生命周期为 'de 的数据中借用:
//   struct BorrowedRecord<'a> with 'de: 'a (the input must outlive the struct)
//   Only works when the input buffer lives long enough
//   仅在输入缓冲区存活时间足够长时有效

// DeserializeOwned — the struct owns all its data, no borrowing:
// DeserializeOwned —— 结构体拥有其所有数据,无借用:
//   trait DeserializeOwned: for<'de> Deserialize<'de> {}
//   Works with any input lifetime (the struct is independent)

use serde::de::DeserializeOwned;

// This function requires owned types — input can be temporary
// 此函数要求所有权类型 —— 输入可以是临时的
fn parse_owned<T: DeserializeOwned>(input: &str) -> T {
    serde_json::from_str(input).unwrap()
}

// This function allows borrowing — more efficient but restricts lifetimes
// 此函数允许借用 —— 更高效,但限制了生命周期
fn parse_borrowed<'a, T: Deserialize<'a>>(input: &'a str) -> T {
    serde_json::from_str(input).unwrap()
}
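The borrowed-versus-owned trade-off does not depend on serde. The std-only sketch below parses a hypothetical `name=value` line into a borrowed record, checks that the fields really point into the input buffer, and converts to an owned record when the data has to outlive the buffer. All names here are assumptions for illustration.

```rust
/// Borrowed view: both fields point into the input buffer.
#[derive(Debug, PartialEq)]
struct BorrowedRecord<'a> {
    name: &'a str,
    value: &'a str,
}

/// Owned copy: safe to keep after the input buffer is gone.
#[derive(Debug, PartialEq)]
struct OwnedRecord {
    name: String,
    value: String,
}

impl<'a> BorrowedRecord<'a> {
    /// Parses `name=value` without allocating.
    fn parse(line: &'a str) -> Option<Self> {
        let (name, value) = line.split_once('=')?;
        Some(BorrowedRecord { name: name.trim(), value: value.trim() })
    }

    /// Copies the fields so the result no longer borrows the input.
    fn to_owned_record(&self) -> OwnedRecord {
        OwnedRecord { name: self.name.to_string(), value: self.value.to_string() }
    }
}

/// True when `part` lies inside the memory occupied by `whole`.
fn points_into(whole: &str, part: &str) -> bool {
    let start = whole.as_ptr() as usize;
    let end = start + whole.len();
    let p = part.as_ptr() as usize;
    p >= start && p + part.len() <= end
}

/// The buffer is local, so only an owned record can be returned.
fn read_line_owned() -> OwnedRecord {
    let buffer = String::from("cpu_temp = 72.5");
    let record = BorrowedRecord::parse(&buffer).expect("well-formed line");
    record.to_owned_record()
}
```

Returning the `BorrowedRecord` from `read_line_owned` would be rejected by the borrow checker, which is the same restriction `Deserialize<'de>` places on borrowed deserialization.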

When to use zero-copy / 何时使用零拷贝:

  • Parsing large files where you only need a few fields / 解析大文件且只需要其中几个字段时
  • High-throughput pipelines (network packets, log lines) / 高吞吐量流水线(网络数据包、日志行)时
  • When the input buffer already lives long enough (e.g., memory-mapped file) / 输入缓冲区已经存活足够长时间(如内存映射文件)时

When NOT to use zero-copy / 何时不可使用零拷贝:

  • Input is ephemeral (network read buffer that’s reused) / 输入是瞬时的(如被复用的网络读取缓冲区)时
  • You need to store the result beyond the input’s lifetime / 你需要将结果存储得比输入的生命周期更久时
  • Fields need transformation (escapes, normalization) / 字段需要转换(如转义、归一化)时

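Practical tip: Cow<'a, str> gives you the best of both worlds: it borrows when it can and allocates only when it must (for example, when a JSON escape sequence has to be unescaped). serde supports Cow natively, but a derived field only borrows when it is annotated with #[serde(borrow)]; without the attribute the value is always deserialized as owned.

实用提示:Cow<'a, str> 为你提供了两全其美的选择:尽可能借用,必要时分配(例如,当 JSON 转义序列需要取消转义时)。serde 原生支持 Cow,但派生的字段只有在标注 #[serde(borrow)] 时才会借用;否则总是反序列化为拥有所有权的值。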
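The borrow-or-allocate behaviour of `Cow` can be shown with the standard library alone. The sketch below decodes a hypothetical mini-format that knows only the `\n` and `\\` escapes; it returns a borrowed slice when nothing needs decoding and allocates only otherwise. It is not serde's or any JSON parser's implementation.

```rust
use std::borrow::Cow;

/// Decodes `\n` and `\\` escapes; unknown escapes are kept verbatim.
fn unescape(raw: &str) -> Cow<'_, str> {
    if !raw.contains('\\') {
        return Cow::Borrowed(raw); // fast path: no allocation
    }
    let mut out = String::with_capacity(raw.len());
    let mut chars = raw.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.next() {
            Some('n') => out.push('\n'),
            Some('\\') => out.push('\\'),
            Some(other) => {
                out.push('\\');
                out.push(other);
            }
            None => out.push('\\'), // trailing backslash
        }
    }
    Cow::Owned(out)
}
```

Callers treat both cases the same way through `Deref<Target = str>`, and can call `into_owned()` if they need a `String`.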

The Format Ecosystem / 格式生态系统

| Format / 格式 | Crate | Human-Readable / 人类可读 | Size / 体积 | Speed / 速度 | Use Case / 使用场景 |
|---|---|---|---|---|---|
| JSON | serde_json | Yes / 是 | Large / 大 | Good / 良 | Config files, REST APIs, logging / 配置文件、REST API、日志 |
| TOML | toml | Yes / 是 | Medium / 中 | Good / 良 | Config files (Cargo.toml style) / 配置文件(Cargo.toml 风格) |
| YAML | serde_yaml | Yes / 是 | Medium / 中 | Good / 良 | Config files (complex nesting) / 配置文件(复杂嵌套) |
| bincode | bincode | No / 否 | Small / 小 | Fast / 快 | IPC, caches, Rust-to-Rust / IPC、缓存、Rust 到 Rust 的通信 |
| postcard | postcard | No / 否 | Tiny / 极小 | Very fast / 极快 | Embedded systems, no_std / 嵌入式系统、no_std 环境 |
| MessagePack | rmp-serde | No / 否 | Small / 小 | Fast / 快 | Cross-language binary protocol / 跨语言二进制协议 |
| CBOR | ciborium | No / 否 | Small / 小 | Fast / 快 | IoT, constrained environments / 物联网、受限环境 |

#![allow(unused)]
fn main() {
// Same struct, many formats — serde's power:
// 同一个结构体,多种格式 —— 这就是 serde 的强大之处:

#[derive(serde::Serialize, serde::Deserialize, Debug)]
struct DiagConfig {
    name: String,
    tests: Vec<String>,
    timeout_secs: u64,
}

let config = DiagConfig {
    name: "accel_diag".into(),
    tests: vec!["memory".into(), "compute".into()],
    timeout_secs: 300,
};

// JSON:   {"name":"accel_diag","tests":["memory","compute"],"timeout_secs":300}
let json = serde_json::to_string(&config).unwrap();       // 69 bytes

// bincode 1.x: binary, no field names. With the default fixed-width encoding
// every length prefix and the u64 take 8 bytes, so this value is 63 bytes.
// bincode 1.x:二进制格式,不包含字段名称。默认的定长编码下,每个长度前缀和 u64 各占 8 字节,因此这里是 63 字节。
let bin = bincode::serialize(&config).unwrap();            // Slightly smaller here / 此处略小

// postcard: even smaller, varint encoding — great for embedded
// postcard:体积更小,采用变长整数(varint)编码 —— 非常适合嵌入式
// let post = postcard::to_allocvec(&config).unwrap();
}

Choose your format / 该选择哪种格式:

  • Config files humans edit → TOML or JSON / 人类编辑的配置文件 → TOML 或 JSON
  • Rust-to-Rust IPC/caching → bincode (fast, compact, not cross-language) / Rust 到 Rust 的 IPC/缓存 → bincode(快、紧凑、非跨语言)
  • Cross-language binary → MessagePack or CBOR / 跨语言二进制 → MessagePack 或 CBOR
  • Embedded / no_std → postcard / 嵌入式 / no_std → postcard
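The size advantage of varint-based formats comes from storing small integers in fewer bytes. The sketch below implements a generic LEB128-style unsigned varint (7 payload bits per byte, high bit as continuation flag) to show the idea; the function names are hypothetical, and each format's exact wire encoding is defined by its own specification.

```rust
/// Encodes `value` as a LEB128-style varint, low-order group first.
fn encode_varint(mut value: u64) -> Vec<u8> {
    let mut out = Vec::new();
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte); // last byte: continuation bit clear
            return out;
        }
        out.push(byte | 0x80); // more bytes follow
    }
}

/// Decodes a varint, returning the value and the number of bytes consumed.
/// Returns `None` if the input ends early or exceeds 10 bytes.
fn decode_varint(bytes: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for (i, &byte) in bytes.iter().enumerate().take(10) {
        value |= u64::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}
```

A `timeout_secs` of 300 fits in two bytes this way, compared with eight bytes for a fixed-width u64.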

Binary Data and repr(C) / 二进制数据与 repr(C)

When you need to parse fixed-layout binary data (hardware protocols, firmware), use repr(C) so that fields keep their declared order and the struct layout matches the layout of the data on the wire or in the register:

当你需要解析固定布局的二进制数据(如硬件协议、固件)时,请使用 repr(C),使字段保持声明顺序,从而让结构体布局与线路或寄存器中的数据布局相匹配:

#![allow(unused)]
fn main() {
// Without repr(C), Rust might reorder fields for better packing
// 不使用 repr(C) 时,Rust 可能会为了更好的打包效果而重新排列字段
#[repr(C)]
#[derive(Debug, Clone, Copy)]
struct IpmiHeader {
    rs_addr: u8,
    net_fn_lun: u8,
    checksum: u8,
    rq_addr: u8,
    rq_seq_lun: u8,
    cmd: u8,
}

// --- Safe binary parsing with manual deserialization ---
// --- 通过手动反序列化进行安全的二进制解析 ---
impl IpmiHeader {
    fn from_bytes(data: &[u8]) -> Option<Self> {
        if data.len() < std::mem::size_of::<Self>() {
            return None;
        }
        Some(IpmiHeader {
            rs_addr:     data[0],
            net_fn_lun:  data[1],
            checksum:    data[2],
            rq_addr:     data[3],
            rq_seq_lun:  data[4],
            cmd:         data[5],
        })
    }

    fn net_fn(&self) -> u8 { self.net_fn_lun >> 2 }
    fn lun(&self)    -> u8 { self.net_fn_lun & 0x03 }
}

// --- Endianness-aware parsing ---
// --- 字节序敏感的解析 ---
fn read_u16_le(data: &[u8], offset: usize) -> u16 {
    u16::from_le_bytes([data[offset], data[offset + 1]])
}

fn read_u32_be(data: &[u8], offset: usize) -> u32 {
    u32::from_be_bytes([
        data[offset], data[offset + 1],
        data[offset + 2], data[offset + 3],
    ])
}

// --- #[repr(C, packed)]: Remove padding (alignment = 1) ---
// --- #[repr(C, packed)]:移除填充(对齐方式 = 1) ---
#[repr(C, packed)]
#[derive(Debug, Clone, Copy)]
struct PcieCapabilityHeader {
    cap_id: u8,        // Capability ID / 能力ID
    next_cap: u8,      // Pointer to next capability / 指向下一个能力的指针
    cap_reg: u16,      // Capability-specific register / 能力特定寄存器
}
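// ⚠️ Packed structs: a reference to a field may be unaligned, which is UB.
// Current compilers reject such references outright (error E0793).
// ⚠️ 紧凑型结构体:字段引用(&field)可能未对齐,属于未定义行为(UB);当前编译器会直接报错(E0793)。
// Always copy fields out: let id = header.cap_id;  // OK (Copy)
// 务必将字段复制出来:let id = header.cap_id;  // 正确(复制)
// Never do: let r = &header.cap_reg;               // compile error E0793
// 切勿执行:let r = &header.cap_reg;               // 编译错误 E0793
// If a pointer is needed, use std::ptr::addr_of!(header.cap_reg) with read_unaligned().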
}
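The `read_u16_le` and `read_u32_be` helpers above index directly and therefore panic on short input. For untrusted buffers, a bounds-checked variant that returns `Option` is usually preferable. The sketch below is one way to write it; the plain `PcieCapabilityHeader` struct and `parse_cap` are assumptions for illustration, relying on PCI configuration space being little-endian.

```rust
/// Reads a little-endian u16 at `offset`, or `None` if out of bounds.
fn read_u16_le(data: &[u8], offset: usize) -> Option<u16> {
    let end = offset.checked_add(2)?; // guards against offset overflow
    let s = data.get(offset..end)?;
    Some(u16::from_le_bytes([s[0], s[1]]))
}

/// Reads a big-endian u32 at `offset`, or `None` if out of bounds.
fn read_u32_be(data: &[u8], offset: usize) -> Option<u32> {
    let end = offset.checked_add(4)?;
    let s = data.get(offset..end)?;
    Some(u32::from_be_bytes([s[0], s[1], s[2], s[3]]))
}

// Plain (non-packed) struct filled field by field, so no unaligned
// references can arise.
#[derive(Debug, PartialEq, Clone, Copy)]
struct PcieCapabilityHeader {
    cap_id: u8,
    next_cap: u8,
    cap_reg: u16,
}

fn parse_cap(data: &[u8]) -> Option<PcieCapabilityHeader> {
    Some(PcieCapabilityHeader {
        cap_id: *data.first()?,
        next_cap: *data.get(1)?,
        cap_reg: read_u16_le(data, 2)?,
    })
}
```

Because every read goes through `get`, a truncated buffer produces `None` instead of a panic.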

zerocopy and bytemuck — Safe Transmutation / zerocopy 与 bytemuck —— 安全转换

Instead of unsafe transmute, use crates that verify the type's layout requirements at compile time and check buffer size and alignment at run time:

与其使用 unsafe 的 transmute,不如使用在编译时验证类型布局要求、并在运行时检查缓冲区大小与对齐的库:

#![allow(unused)]
fn main() {
// --- zerocopy: Compile-time checked zero-copy conversions ---
// --- zerocopy:编译时检查的零拷贝转换 ---
// Cargo.toml: zerocopy = { version = "0.8", features = ["derive"] }

use zerocopy::{FromBytes, IntoBytes, KnownLayout, Immutable};

#[derive(FromBytes, IntoBytes, KnownLayout, Immutable, Debug)]
#[repr(C)]
struct SensorReading {
    sensor_id: u16,
    flags: u8,
    _reserved: u8,
    value: u32,     // Fixed-point: actual = value / 1000.0 / 定点数:实际值 = value / 1000.0
}

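fn parse_sensor(raw: &[u8]) -> Option<&SensorReading> {
    // The derives check the struct layout at compile time; ref_from_bytes
    // checks the size and alignment of `raw` at run time.
    // 派生宏在编译时检查结构体布局;ref_from_bytes 在运行时检查 raw 的大小和对齐。
    // zerocopy 0.8 returns a Result here, so convert it with .ok().
    SensorReading::ref_from_bytes(raw).ok()
    // Returns &SensorReading pointing INTO raw: no copy, no allocation
    // 返回指向 raw 内部的 &SensorReading:无拷贝,无分配
}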

// --- bytemuck: Simple, battle-tested ---
// --- bytemuck:简单且经过实战检验 ---
// Cargo.toml: bytemuck = { version = "1", features = ["derive"] }

use bytemuck::{Pod, Zeroable};

#[derive(Pod, Zeroable, Clone, Copy, Debug)]
#[repr(C)]
struct GpuRegister {
    address: u32,
    value: u32,
}

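fn cast_registers(data: &[u8]) -> &[GpuRegister] {
    // Pod guarantees all bit patterns are valid, so no unsafe is needed.
    // Pod 保证所有位模式都是有效的,因此无需 unsafe。
    // cast_slice panics if `data` is misaligned or its length is not a
    // multiple of size_of::<GpuRegister>(); try_cast_slice returns a Result.
    bytemuck::cast_slice(data)
}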
}

When to use which / 如何选择:

| Approach / 方法 | Safety / 安全性 | Overhead / 开销 | Use When / 适用场景 |
|---|---|---|---|
| Manual field-by-field parsing / 手动逐字段解析 | ✅ Safe / 安全 | Copy fields / 拷贝字段 | Small structs, complex layouts / 小型结构体、复杂布局 |
| zerocopy | ✅ Safe / 安全 | Zero-copy / 零拷贝 | Large buffers, many reads, compile-time checks / 大缓冲区、多次读取、编译时检查 |
| bytemuck | ✅ Safe / 安全 | Zero-copy / 零拷贝 | Simple Pod types, casting slices / 简单的 Pod 类型、切片转换 |
| unsafe { transmute() } | ❌ Unsafe / 不安全 | Zero-copy / 零拷贝 | Last resort, avoid in application code / 最后的手段,在应用代码中应避免使用 |
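Part of what the zero-copy crates do on each cast is a run-time check of the byte slice itself. The std-only sketch below, with the hypothetical helper `can_view_as`, mirrors just that part: length and address must both fit the target type. It does not cover the compile-time guarantees (no padding, every bit pattern valid) that the derives add.

```rust
use std::mem::{align_of, size_of};

/// True when `bytes` could be viewed as a slice of `T` as far as
/// length and alignment are concerned.
fn can_view_as<T>(bytes: &[u8]) -> bool {
    let size = size_of::<T>();
    size != 0
        && bytes.len() % size == 0
        && (bytes.as_ptr() as usize) % align_of::<T>() == 0
}
```

Of two overlapping windows that start one byte apart, only one can sit on a 2-byte boundary, so a byte-oriented network buffer may pass or fail the check depending on where the record starts. Copying field by field avoids the alignment requirement entirely.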

bytes::Bytes — Reference-Counted Buffers / bytes::Bytes —— 引用计数缓冲区

The bytes crate (used by tokio, hyper, tonic) provides zero-copy byte buffers with reference counting. A Bytes handle behaves much like an Arc<[u8]> that can also be sliced and split without copying:

bytes 库(被 tokio、hyper、tonic 等广泛使用)提供了带引用计数的零拷贝字节缓冲区。Bytes 句柄的行为很像一个 Arc<[u8]>,并且还可以在不拷贝的情况下进行切片和拆分:

use bytes::{Bytes, BytesMut, Buf, BufMut};

fn main() {
    // --- BytesMut: mutable buffer for building data ---
    // --- BytesMut:用于构建数据的可变缓冲区 ---
    let mut buf = BytesMut::with_capacity(1024);
    buf.put_u8(0x01);                    // Write a byte / 写入一个字节
    buf.put_u16(0x1234);                 // Write u16 (big-endian) / 写入 u16(大端序)
    buf.put_slice(b"hello");             // Write raw bytes / 写入原始字节
    buf.put(&b"world"[..]);              // Write from slice / 从切片写入

    // Freeze into immutable Bytes (zero cost):
    // 冻结为不可变的 Bytes(零成本):
    let data: Bytes = buf.freeze();

    // --- Bytes: immutable, reference-counted, cloneable ---
    // --- Bytes:不可变、引用计数、可克隆 ---
    let data2 = data.clone();            // Cheap: increments refcount, NOT deep copy / 廉价:增加引用计数,而非深拷贝
    let slice = data.slice(3..8);        // Zero-copy sub-slice (shares buffer) / 零拷贝子切片(共享缓冲区)

    // Read from Bytes using the Buf trait:
    // 使用 Buf trait 从 Bytes 中读取:
    let mut reader = &data[..];
    let byte = reader.get_u8();          // 0x01
    let short = reader.get_u16();        // 0x1234

    // Split without copying:
    // 无需拷贝即可进行拆分:
    let mut original = Bytes::from_static(b"HEADER\x00PAYLOAD");
    let header = original.split_to(6);   // header = "HEADER", original = "\x00PAYLOAD"

    println!("header: {:?}", &header[..]);
    println!("payload: {:?}", &original[1..]);
}

bytes vs Vec<u8> / bytes 与 Vec<u8>

| Feature / 特性 | Vec<u8> | Bytes |
|---|---|---|
| Clone cost / 克隆开销 | O(n) deep copy / O(n) 深拷贝 | O(1) refcount increment / O(1) 引用计数增加 |
| Sub-slicing / 子切片 | Borrows with lifetime / 带生命周期的借用 | Owned, refcount-tracked / 所有权模式、引用计数追踪 |
| Thread safety / 线程安全 | Send + Sync, but shared ownership needs Arc<Vec<u8>> / Send + Sync,但共享所有权需要 Arc<Vec<u8>> | Send + Sync with shared ownership built in / Send + Sync,内置共享所有权 |
| Mutability / 可变性 | Direct &mut / 直接 &mut | Immutable; build data in BytesMut, then freeze() / 不可变;先在 BytesMut 中构建,再 freeze() |
| Ecosystem / 生态环境 | Standard library / 标准库 | tokio, hyper, tonic, axum |

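When to use bytes: Network protocols, packet parsing, or any scenario where you receive a buffer and need to split it into parts that are handled by different components or threads. Zero-copy splitting is its killer feature.

何时使用 bytes:网络协议、数据包解析,或者任何你需要接收缓冲区并将其拆分为由不同组件或线程处理的部分的场景。零拷贝拆分是它的杀手锏功能。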
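The idea behind cheap clones and zero-copy slices can be modelled with the standard library: one shared allocation plus a range per handle. The `SharedBytes` type below is a hypothetical conceptual model, not how the bytes crate is implemented internally.

```rust
use std::ops::Range;
use std::sync::Arc;

/// Conceptual model: a shared buffer plus the window this handle sees.
#[derive(Clone, Debug)]
struct SharedBytes {
    buf: Arc<[u8]>,
    range: Range<usize>,
}

impl SharedBytes {
    /// Copies `data` once into a shared allocation.
    fn new(data: &[u8]) -> Self {
        SharedBytes { buf: Arc::from(data), range: 0..data.len() }
    }

    fn as_slice(&self) -> &[u8] {
        &self.buf[self.range.clone()]
    }

    /// Returns a sub-window that shares the same allocation.
    /// `r` is relative to this handle; `None` if it is out of bounds.
    fn slice(&self, r: Range<usize>) -> Option<Self> {
        let start = self.range.start.checked_add(r.start)?;
        let end = self.range.start.checked_add(r.end)?;
        if start > end || end > self.range.end {
            return None;
        }
        Some(SharedBytes { buf: Arc::clone(&self.buf), range: start..end })
    }

    /// Number of handles currently sharing the buffer.
    fn handles(&self) -> usize {
        Arc::strong_count(&self.buf)
    }
}
```

Each slice costs one reference-count increment and two integers, regardless of how large the underlying buffer is, and every handle can be moved to another thread on its own.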

Key Takeaways — Serialization & Binary Data / 核心要点 —— 序列化与二进制数据

  • serde’s derive macros handle most cases; use attributes (rename, skip, default) for the rest / serde 的派生宏可处理大多数情况;其余情况请使用属性(rename、skip、default)。
  • Zero-copy deserialization (&'a str in structs) avoids allocation for read-heavy workloads / 零拷贝反序列化(结构体中的 &'a str)可避免读密集型工作负载中的分配开销。
  • repr(C) + zerocopy/bytemuck for hardware register layouts; bytes::Bytes for reference-counted buffers / 对于硬件寄存器布局,使用 repr(C) + zerocopy/bytemuck;对于引用计数缓冲区,使用 bytes::Bytes

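See also / 另请参阅: Ch 10 — Error Handling for combining serde errors with thiserror. Ch 12 — Unsafe for repr(C) and FFI data layout.

参见 第 10 章 —— 错误处理 了解如何将 serde 错误与 thiserror 结合。参见 第 12 章 —— Unsafe 了解 repr(C) 和 FFI 数据布局。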

flowchart LR
    subgraph Input["Input / 输入"]
        JSON["JSON"]
        TOML["TOML"]
        Bin["bincode"]
        MsgP["MessagePack"]
    end

    subgraph Serde["Serde (Data Model / 数据模型)"]
        direction TB
        Traits["Deserialize / Serialize"]
        Attributes["Attributes / 属性"]
    end

    subgraph Output["Output / 输出"]
        Struct["Rust Structs / 结构体"]
        Enum["Enums / 枚举"]
    end

    Input --> Serde
    Serde --> Output

Exercise: Custom serde Deserialization ★★★ (~45 min) / 练习:自定义 serde 反序列化 ★★★(约 45 分钟)

Design a HumanDuration wrapper that deserializes from human-readable strings like "30s", "5m", "2h" using a custom serde deserializer. It should also serialize back to the same format.

设计一个 HumanDuration 包装器,使用自定义 serde 反序列化器将 "30s"、"5m"、"2h" 等人类可读的字符串反序列化。它还应该能序列化回相同的格式。

🔑 Solution / 解决方案
use serde::{Deserialize, Deserializer, Serialize, Serializer};
use std::fmt;

#[derive(Debug, Clone, PartialEq)]
struct HumanDuration(std::time::Duration);

impl HumanDuration {
    fn from_str(s: &str) -> Result<Self, String> {
        let s = s.trim();
        if s.is_empty() { return Err("empty duration string".into()); }

        let (num_str, suffix) = s.split_at(
            s.find(|c: char| !c.is_ascii_digit()).unwrap_or(s.len())
        );
        let value: u64 = num_str.parse()
            .map_err(|_| format!("invalid number: {num_str}"))?;

        let duration = match suffix {
            "s" | "sec"  => std::time::Duration::from_secs(value),
            "m" | "min"  => std::time::Duration::from_secs(value * 60),
            "h" | "hr"   => std::time::Duration::from_secs(value * 3600),
            "ms"         => std::time::Duration::from_millis(value),
            other        => return Err(format!("unknown suffix: {other}")),
        };
        Ok(HumanDuration(duration))
    }
}

impl fmt::Display for HumanDuration {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
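        let secs = self.0.as_secs();
        // Durations with a sub-second part (e.g. "1500ms") must stay in
        // milliseconds, otherwise serializing would silently drop them.
        if secs == 0 || self.0.subsec_millis() != 0 {
            write!(f, "{}ms", self.0.as_millis())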
        } else if secs % 3600 == 0 {
            write!(f, "{}h", secs / 3600)
        } else if secs % 60 == 0 {
            write!(f, "{}m", secs / 60)
        } else {
            write!(f, "{}s", secs)
        }
    }
}

impl Serialize for HumanDuration {
    fn serialize<S: Serializer>(&self, serializer: S) -> Result<S::Ok, S::Error> {
        // Serialize using Display / 使用 Display 进行序列化
        serializer.serialize_str(&self.to_string())
    }
}

impl<'de> Deserialize<'de> for HumanDuration {
    fn deserialize<D: Deserializer<'de>>(deserializer: D) -> Result<Self, D::Error> {
        // Deserialize from string and parse / 从字符串反序列化并解析
        let s = String::deserialize(deserializer)?;
        HumanDuration::from_str(&s).map_err(serde::de::Error::custom)
    }
}

#[derive(Debug, Deserialize, Serialize)]
struct Config {
    timeout: HumanDuration,
    retry_interval: HumanDuration,
}

fn main() {
    let json = r#"{ "timeout": "30s", "retry_interval": "5m" }"#;
    let config: Config = serde_json::from_str(json).unwrap();

    assert_eq!(config.timeout.0, std::time::Duration::from_secs(30));
    assert_eq!(config.retry_interval.0, std::time::Duration::from_secs(300));

    let serialized = serde_json::to_string(&config).unwrap();
    assert!(serialized.contains("30s"));
    println!("Config: {serialized}");
}

12. Unsafe Rust — Controlled Danger / 12. Unsafe Rust:受控的危险 🔶

What you’ll learn / 你将学到:

  • The five unsafe superpowers and when each is needed / 五种 Unsafe “超能力”及其适用场景
  • Writing sound abstractions: safe API, unsafe internals / 编写可靠的抽象:安全 API 与 Unsafe 内部实现
  • FFI patterns for calling C from Rust (and back) / FFI 模式:在 Rust 中调用 C(以及反向调用)
  • Common UB pitfalls and arena/slab allocator patterns / 常见的未定义行为 (UB) 陷阱与 Arena/Slab 分配器模式

The Five Unsafe Superpowers / 五种 Unsafe 超能力

unsafe unlocks five operations that the compiler can’t verify:

unsafe 开启了五种编译器无法验证的操作:

#![allow(unused)]
fn main() {
// SAFETY: each operation is explained inline below.
// 安全提示:每项操作均在下方行内进行说明。
unsafe {
    // 1. Dereference a raw pointer
    // 1. 解引用裸指针
    let ptr: *const i32 = &42;
    let value = *ptr; // Could be a dangling/null pointer
                      // 可能是一个悬空或空指针

    // 2. Call an unsafe function
    // 2. 调用 unsafe 函数
    let layout = std::alloc::Layout::new::<u64>();
    let mem = std::alloc::alloc(layout);

    // 3. Access a mutable static variable
    // 3. 访问可变静态变量
    static mut COUNTER: u32 = 0;
    COUNTER += 1; // Data race if multiple threads access
                  // 如果多个线程访问,会发生数据竞态

    // 4. Implement an unsafe trait
    // 4. 实现 unsafe trait
    // unsafe impl Send for MyType {}

    // 5. Access fields of a union
    // 5. 访问 union 的字段
    // union IntOrFloat { i: i32, f: f32 }
    // let u = IntOrFloat { i: 42 };
    // let f = u.f; // Reinterpret bits — could be garbage
                  // 重新解释位(bits)—— 可能是垃圾数据
}
}

Key principle / 核心原则: unsafe doesn’t turn off the borrow checker or type system. It only unlocks these five specific capabilities. All other Rust rules still apply.

unsafe 并没有关闭借用检查器或类型系统。它仅仅开启了这五种特定的能力。所有其他 Rust 规则依然适用。
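To make that principle concrete, here is a minimal sketch (the helper name `bump` is hypothetical, chosen only for this illustration). Creating a raw pointer is ordinary safe code; only the dereference needs an `unsafe` block, and the block's scope can stay that small.

```rust
/// Increments the pointee through a raw pointer derived from a `&mut`.
/// `bump` is a hypothetical name used only for this sketch.
fn bump(x: &mut i32) -> i32 {
    // Creating a raw pointer is safe; no `unsafe` is needed on this line.
    let p: *mut i32 = x;

    // SAFETY: `p` was derived from a live `&mut i32`, so it is non-null,
    // aligned, points to an initialized value, and nothing else accesses
    // that value while this function runs.
    unsafe {
        *p += 1;
        *p
    }
}
```

Ordinary references used inside an `unsafe` block are still checked as usual: taking two overlapping `&mut` borrows of the same variable there is rejected by the compiler exactly as it would be in safe code.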

Writing Sound Abstractions / 编写可靠的抽象

The purpose of unsafe is to build safe abstractions around unsafe operations:

unsafe 的目的是围绕不安全的操作构建 安全抽象

#![allow(unused)]
fn main() {
/// A fixed-capacity stack-allocated buffer.
/// All public methods are safe — the unsafe is encapsulated.
/// 一个固定容量的、分配在栈上的缓冲区。
/// 所有公共方法都是安全的 —— unsafe 已被封装在内部。
pub struct StackBuf<T, const N: usize> {
    data: [std::mem::MaybeUninit<T>; N],
    len: usize,
}

impl<T, const N: usize> StackBuf<T, N> {
    pub fn new() -> Self {
        StackBuf {
            // Each element is individually MaybeUninit — no unsafe needed.
            // `const { ... }` blocks (Rust 1.79+) let us repeat a non-Copy
            // const expression N times.
            // 每个元素都是独立的 MaybeUninit —— 这里无需使用 unsafe。
            // `const { ... }` 块(Rust 1.79+)允许我们重复一个非 Copy 的常量表达式 N 次。
            data: [const { std::mem::MaybeUninit::uninit() }; N],
            len: 0,
        }
    }

    pub fn push(&mut self, value: T) -> Result<(), T> {
        if self.len >= N {
            return Err(value); // Buffer full — return value to caller
                               // 缓冲区已满 —— 将值返回给调用者
        }
        // No `unsafe` needed here: indexing is bounds-checked (and len < N),
        // and assigning to a `MaybeUninit<T>` slot never runs a destructor on
        // the old (uninitialized) contents.
        self.data[self.len] = std::mem::MaybeUninit::new(value);
        self.len += 1;
        Ok(())
    }

    pub fn get(&self, index: usize) -> Option<&T> {
        if index < self.len {
            // SAFETY: index < len, and data[0..len] are all initialized.
            // 安全性:index < len,且 data[0..len] 均已完成初始化。
            Some(unsafe { self.data[index].assume_init_ref() })
        } else {
            None
        }
    }
}

impl<T, const N: usize> Drop for StackBuf<T, N> {
    fn drop(&mut self) {
        // SAFETY: data[0..len] are initialized — drop them properly.
        // 安全性:data[0..len] 已初始化 —— 需妥善释放它们。
        for i in 0..self.len {
            unsafe { self.data[i].assume_init_drop(); }
        }
    }
}
}

The three rules of sound unsafe code / 可靠 Unsafe 代码的三原则

  1. Document invariants / 文档化不变量 — every // SAFETY: comment explains why the operation is valid / 每一个 // SAFETY: 注释都必须解释为什么该操作是有效的
  2. Encapsulate / 封装 — the unsafe is inside a safe API; users can’t trigger UB / 将 unsafe 保护在安全 API 内部;用户无法触发未定义行为 (UB)
  3. Minimize / 最小化 — only the smallest possible block is unsafe / 仅在尽可能小的范围内使用 unsafe

FFI Patterns: Calling C from Rust / FFI 模式:在 Rust 中调用 C

#![allow(unused)]
fn main() {
// Declare the C function signature:
// 声明 C 函数签名:
extern "C" {
    fn strlen(s: *const std::ffi::c_char) -> usize;
    fn printf(format: *const std::ffi::c_char, ...) -> std::ffi::c_int;
}

// Safe wrapper:
// 安全封装:
fn safe_strlen(s: &str) -> usize {
    let c_string = std::ffi::CString::new(s).expect("string contains null byte");
    // SAFETY: c_string is a valid null-terminated string, alive for the call.
    // 安全性:c_string 是一个有效的以 null 结尾的字符串,且在调用期间有效。
    unsafe { strlen(c_string.as_ptr()) }
}

// Calling Rust from C (export a function):
// 从 C 调用 Rust(导出函数):
#[no_mangle]
pub extern "C" fn rust_add(a: i32, b: i32) -> i32 {
    a + b
}
}

Common FFI types / 常用 FFI 类型

| Rust | C | Notes / 备注 |
|------|---|--------------|
| `i32` / `u32` | `int32_t` / `uint32_t` | Fixed-width, safe / 固定宽度,安全 |
| `*const T` / `*mut T` | `const T*` / `T*` | Raw pointers / 裸指针 |
| `std::ffi::CStr` | `const char*` (borrowed) | Null-terminated, borrowed / 以 null 结尾,借用 |
| `std::ffi::CString` | `char*` (owned) | Null-terminated, owned / 以 null 结尾,拥有所有权 |
| `std::ffi::c_void` | `void` | Opaque pointer target / 不透明指针目标 |
| `Option<fn(...)>` | Nullable function pointer | `None` = `NULL` / 可为空的函数指针 |
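A short sketch of how several rows of this table might look in code. All names here (`Sample`, `Callback`, `double_it`, `apply`, `c_len`) are hypothetical and exist only for illustration; the block deliberately avoids linking against any real C library so that it stays self-contained.

```rust
use std::ffi::{c_char, CStr};

/// Layout-compatible with `struct Sample { int32_t id; float value; };` in C.
/// `#[repr(C)]` fixes field order and padding to the C rules.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Sample {
    pub id: i32,
    pub value: f32,
}

/// A callback as C would declare it: `int32_t (*cb)(int32_t)`.
/// Wrapping it in `Option` makes NULL representable as `None`.
pub type Callback = Option<extern "C" fn(i32) -> i32>;

/// Example callback with the C calling convention.
pub extern "C" fn double_it(x: i32) -> i32 {
    x.wrapping_mul(2)
}

/// Applies the callback if it is non-null, otherwise returns `fallback`.
pub extern "C" fn apply(cb: Callback, x: i32, fallback: i32) -> i32 {
    match cb {
        Some(f) => f(x),
        None => fallback,
    }
}

/// Borrows a C string for the duration of the call and returns its length
/// in bytes (excluding the terminating NUL).
///
/// # Safety
/// `ptr` must be non-null and point to a NUL-terminated string that stays
/// alive and unmodified for the duration of the call.
pub unsafe fn c_len(ptr: *const c_char) -> usize {
    // SAFETY: upheld by the caller, per the contract above.
    let s = unsafe { CStr::from_ptr(ptr) };
    s.to_bytes().len()
}
```

The design point is that the unsafe contract sits at the boundary (`c_len`), while the nullable callback is handled entirely in safe code through `Option`.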

Common UB Pitfalls / 常见的未定义行为 (UB) 陷阱

| Pitfall / 陷阱 | Example / 示例 | Why It’s UB / 为什么是 UB |
|----------------|----------------|---------------------------|
| Null dereference / 空指针解引用 | `*std::ptr::null::<i32>()` | Dereferencing null is always UB / 解引用空指针总是 UB |
| Dangling pointer / 悬空指针 | Dereference after `drop()` | Memory may be reused / 内存可能已被重用 |
| Data race / 数据竞态 | Two threads write to `static mut` | Unsynchronized concurrent writes / 未经同步的并发写入 |
| Wrong assume_init / 错误的 assume_init | `MaybeUninit::uninit().assume_init()` | Reading uninitialized memory / 读取了未初始化的内存 |
| Aliasing violation / 别名违规 | Creating two `&mut` to same data | Violates Rust’s aliasing model / 违反了 Rust 的别名模型 |
| Invalid enum value / 无效枚举值 | `transmute::<u8, bool>(2)` | `bool` can only be 0 or 1 / bool 只能是 0 或 1 |
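Several of these pitfalls have a checked counterpart that is worth reaching for first. The sketch below shows three of them; the function names (`bool_from_u8`, `read_or_default`, `init_then_read`) are hypothetical and chosen only for this illustration.

```rust
use std::mem::MaybeUninit;

/// Checked alternative to `transmute::<u8, bool>`: rejects anything but 0 or 1.
fn bool_from_u8(b: u8) -> Option<bool> {
    match b {
        0 => Some(false),
        1 => Some(true),
        _ => None,
    }
}

/// Checked alternative to blindly dereferencing a possibly-null pointer.
///
/// # Safety
/// If `p` is non-null, it must be aligned and point to a live, initialized `i32`.
unsafe fn read_or_default(p: *const i32, default: i32) -> i32 {
    // SAFETY: `as_ref` handles the null case; the remaining requirements
    // are the caller's responsibility, as documented above.
    match unsafe { p.as_ref() } {
        Some(v) => *v,
        None => default,
    }
}

/// Initialize first, then `assume_init`; never the other way round.
fn init_then_read() -> u64 {
    let mut slot = MaybeUninit::<u64>::uninit();
    slot.write(7);
    // SAFETY: `slot` was initialized by the `write` call just above.
    unsafe { slot.assume_init() }
}
```

Note that `read_or_default` only removes the null case; a dangling but non-null pointer would still be undefined behavior, which is why the function remains `unsafe`.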

When to use unsafe in production / 生产中何时使用 unsafe

  • FFI boundaries (calling C/C++ code) / FFI 边界(调用 C/C++ 代码)
  • Performance-critical inner loops (avoid bounds checks) / 性能极其关键的内部循环(避免边界检查)
  • Building primitives (Vec, HashMap — these use unsafe internally) / 构建基础原语(VecHashMap —— 它们内部就使用了 unsafe)
  • Never in application logic if you can avoid it / 只要能避免,绝不要将其用于应用逻辑
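For the inner-loop case, it is usually worth trying a safe formulation before reaching for `get_unchecked`. The sketch below (hypothetical function names `dot_safe` and `dot_unchecked`) shows both versions of a dot product; whether the unchecked one is measurably faster depends on the compiler and workload, so treat it as something to benchmark rather than assume.

```rust
/// Safe version: the iterators carry the length, so there is no
/// per-element index check to remove in the first place.
fn dot_safe(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Unchecked version: one length computation up front, then unchecked reads.
fn dot_unchecked(a: &[f32], b: &[f32]) -> f32 {
    let n = a.len().min(b.len());
    let mut acc = 0.0;
    for i in 0..n {
        // SAFETY: i < n, and n <= a.len() and n <= b.len() by construction.
        acc += unsafe { *a.get_unchecked(i) * *b.get_unchecked(i) };
    }
    acc
}
```

Both functions stop at the shorter slice, so they return the same result for the same inputs; the unsafe block is confined to a single expression with its invariant stated right above it.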

Custom Allocators — Arena and Slab Patterns / 自定义分配器 —— Arena 与 Slab 模式

In C, you’d write custom malloc() replacements for specific allocation patterns — arena allocators that free everything at once, slab allocators for fixed-size objects, or pool allocators for high-throughput systems. Rust provides the same power through the GlobalAlloc trait and allocator crates, with the added benefit of lifetime-scoped arenas that prevent use-after-free at compile time.

在 C 语言中,你会针对特定的分配模式编写自定义的 malloc() 替代方案 —— 例如一次性释放所有内容的 arena 分配器、针对固定大小对象的 slab 分配器,或者用于高吞吐量系统的池分配器。Rust 通过 GlobalAlloc trait 和各种分配器 crate 提供了同样的能力,并增加了“基于生命周期作用域的 arena”这一额外优势,从而在 编译时防止“释放后使用(use-after-free)”

Arena Allocators — Bulk Allocation, Bulk Free / Arena 分配器 —— 批量分配与释放

An arena allocates by bumping a pointer forward. Individual items can’t be freed — the entire arena is freed at once. This is perfect for request-scoped or frame-scoped allocations:

Arena 通过向前推进指针来进行分配。单个条目无法被单独释放 —— 整个 arena 的内容会一次性全部释放。这非常适合于请求级(request-scoped)或帧级(frame-scoped)的分配场景:

#![allow(unused)]
fn main() {
use bumpalo::Bump;

fn process_sensor_frame(raw_data: &[u8]) {
    // Create an arena for this frame's allocations
    let arena = Bump::new();

    // Allocate objects in the arena — ~2ns each (just a pointer bump)
    let header = arena.alloc(parse_header(raw_data));
    let readings: &mut [f32] = arena.alloc_slice_fill_default(header.sensor_count);

    // `chunks_exact` skips a trailing partial chunk, so `try_into` below cannot fail.
    for (i, chunk) in raw_data[header.payload_offset..].chunks_exact(4).enumerate() {
        if i < readings.len() {
            readings[i] = f32::from_le_bytes(chunk.try_into().unwrap());
        }
    }

    // Use readings...
    let avg = readings.iter().sum::<f32>() / readings.len() as f32;
    println!("Frame avg: {avg:.2}");

    // `arena` drops here — ALL allocations freed at once in O(1)
    // No per-object destructor overhead, no fragmentation
}
fn parse_header(_: &[u8]) -> Header { Header { sensor_count: 4, payload_offset: 8 } }
struct Header { sensor_count: usize, payload_offset: usize }
}

Arena vs standard allocator:

| Aspect | `Vec::new()` / `Box::new()` | Bump arena |
|--------|-----------------------------|------------|
| Alloc speed | ~25ns (malloc) | ~2ns (pointer bump) |
| Free speed | Per-object destructor | O(1) bulk free |
| Fragmentation | Yes (long-lived processes) | None within arena |
| Lifetime safety | Heap: freed on `Drop` | Arena reference: compile-time scoped |
| Use case | General purpose | Request/frame/batch processing |

typed-arena — Type-Safe Arena / 类型安全的 Arena

When all arena objects are the same type, typed-arena provides a simpler API that returns references with the arena’s lifetime:

当 arena 中的所有对象都是同一类型时,typed-arena 提供了一个更简单的 API,其返回的引用生命周期与 arena 本身相绑定:

#![allow(unused)]
fn main() {
use typed_arena::Arena;

struct AstNode<'a> {
    value: i32,
    children: Vec<&'a AstNode<'a>>,
}

fn build_tree() {
    let arena: Arena<AstNode<'_>> = Arena::new();

    // Allocate the leaves first. `alloc` returns `&mut AstNode`; binding them
    // as shared references lets the parent store them.
    let left: &AstNode = arena.alloc(AstNode { value: 2, children: vec![] });
    let right: &AstNode = arena.alloc(AstNode { value: 3, children: vec![] });

    // Build the tree: the root holds references to nodes in the same arena,
    // and all of them stay valid for as long as `arena` lives.
    let root = arena.alloc(AstNode { value: 1, children: vec![left, right] });
    println!(
        "Root: {} with {} children; Left: {}, Right: {}",
        root.value, root.children.len(), left.value, right.value
    );

    // `arena` drops here — all nodes freed at once
    // `arena` 在此处被释放 —— 所有节点一次性销毁
}
}

Slab Allocators — Fixed-Size Object Pools / Slab 分配器 —— 固定大小的对象池

A slab allocator pre-allocates a pool of fixed-size slots. Objects are allocated and returned individually, but all slots are the same size — eliminating fragmentation and enabling O(1) alloc/free:

Slab 分配器预先分配一个固定大小插槽(slots)的池。对象可以被单独分配和归还,但所有插槽大小相同 —— 这消除了内存碎片,并实现了 O(1) 的分配和释放:

#![allow(unused)]
fn main() {
use slab::Slab;

struct Connection {
    id: u64,
    buffer: [u8; 1024],
    active: bool,
}

fn connection_pool_example() {
    // Pre-allocate a slab for connections
    // 为连接预分配一个 slab
    let mut connections: Slab<Connection> = Slab::with_capacity(256);

    // Insert returns a key (usize index) — O(1)
    // 插入返回一个 key (usize 索引) —— O(1)
    let key1 = connections.insert(Connection {
        id: 1001,
        buffer: [0; 1024],
        active: true,
    });

    let key2 = connections.insert(Connection {
        id: 1002,
        buffer: [0; 1024],
        active: true,
    });

    // Access by key — O(1)
    // 通过 key 访问 —— O(1)
    if let Some(conn) = connections.get_mut(key1) {
        conn.buffer[0..5].copy_from_slice(b"hello");
    }

    // Remove returns the value — O(1), slot is reused for next insert
    // 移除操作返回该值 —— O(1),该插槽会被下次插入重用
    let removed = connections.remove(key2);
    assert_eq!(removed.id, 1002);

    // Next insert reuses the freed slot — no fragmentation
    // 下一次插入重用已释放的插槽 —— 无内存碎片
    let key3 = connections.insert(Connection {
        id: 1003,
        buffer: [0; 1024],
        active: true,
    });
    assert_eq!(key3, key2); // Same slot reused!
                            // 同一个插槽被重用了!
}
}
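To show why a freed key comes back on the next insert, here is a minimal std-only sketch of the idea: a `Vec` of slots plus a free list threaded through the vacant ones. `MiniSlab` and `Entry` are hypothetical names for this illustration; this is not the `slab` crate's actual implementation, only the general technique.

```rust
/// A slot is either occupied or a link in the free list.
enum Entry<T> {
    Occupied(T),
    Vacant { next_free: Option<usize> },
}

/// Minimal slab sketch: a `Vec` of slots plus the head of a free list.
pub struct MiniSlab<T> {
    entries: Vec<Entry<T>>,
    free_head: Option<usize>,
}

impl<T> MiniSlab<T> {
    pub fn new() -> Self {
        MiniSlab { entries: Vec::new(), free_head: None }
    }

    /// O(1): reuse the most recently freed slot, or append a new one.
    pub fn insert(&mut self, value: T) -> usize {
        match self.free_head {
            Some(key) => {
                // Pop the slot off the free list before overwriting it.
                if let Entry::Vacant { next_free } = &self.entries[key] {
                    self.free_head = *next_free;
                }
                self.entries[key] = Entry::Occupied(value);
                key
            }
            None => {
                self.entries.push(Entry::Occupied(value));
                self.entries.len() - 1
            }
        }
    }

    /// O(1) lookup; vacant or out-of-range keys yield `None`.
    pub fn get(&self, key: usize) -> Option<&T> {
        match self.entries.get(key)? {
            Entry::Occupied(v) => Some(v),
            Entry::Vacant { .. } => None,
        }
    }

    /// O(1): take the value out and push the slot onto the free list.
    pub fn remove(&mut self, key: usize) -> Option<T> {
        let slot = self.entries.get_mut(key)?;
        if matches!(*slot, Entry::Vacant { .. }) {
            return None;
        }
        let old = std::mem::replace(slot, Entry::Vacant { next_free: self.free_head });
        self.free_head = Some(key);
        match old {
            Entry::Occupied(v) => Some(v),
            Entry::Vacant { .. } => None,
        }
    }
}
```

Because the free list is a stack, the slot freed last is handed out first, which is the behavior the `key3 == key2` assertion in the example above relies on. Unlike the `slab` crate, this sketch returns `None` rather than panicking when removing a vacant key.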

Implementing a Minimal Arena (for no_std) / 实现最小 Arena(适用于 no_std

For bare-metal environments where you can’t pull in bumpalo, here’s a minimal arena built on unsafe:

对于无法引入 bumpalo 的裸机环境,这里有一个基于 unsafe 构建的极简 arena:

#![allow(unused)]
#![cfg_attr(not(test), no_std)]

fn main() {
use core::alloc::Layout;
use core::cell::{Cell, UnsafeCell};

/// A simple bump allocator backed by a fixed-size byte array.
/// Not thread-safe — use per-core or with a lock for multi-threaded contexts.
/// 一个由固定大小字节数组支持的简单 bump 分配器。
/// 非线程安全 —— 在多线程环境下请在每个核心使用或配合锁使用。
///
/// **Important**: Like `bumpalo`, this arena does NOT call destructors on
/// allocated items when the arena is dropped. Types with `Drop` impls will
/// leak their resources (file handles, sockets, etc.). Only allocate types
/// without meaningful `Drop` impls, or manually drop them before the arena.
/// **重要提示**:与 `bumpalo` 类似,当 arena 被释放时,它并不会调用已分配条目的析构函数。
/// 实现了 `Drop` 的类型将会泄露其资源(如文件句柄、套接字等)。
/// 请仅分配那些没有特殊 `Drop` 实现的类型,或者在 arena 释放前手动释放它们。
pub struct FixedArena<const N: usize> {
    // UnsafeCell is REQUIRED here: we mutate `buf` through `&self`.
    // Without UnsafeCell, casting &self.buf to *mut u8 would be UB
    // (violates Rust's aliasing model — shared ref implies immutable).
    // 这里必须使用 UnsafeCell:我们需要通过 `&self` 修改 `buf`。
    // 如果没有 UnsafeCell,将 &self.buf 转换为 *mut u8 将导致 UB
    // (这违反了 Rust 的别名模型 —— 共享引用意味着不可变)。
    buf: UnsafeCell<[u8; N]>,
    offset: Cell<usize>, // Interior mutability for &self allocation
                         // 内部可变性,用于 &self 分配
}

impl<const N: usize> FixedArena<N> {
    pub const fn new() -> Self {
        FixedArena {
            buf: UnsafeCell::new([0; N]),
            offset: Cell::new(0),
        }
    }

    /// Allocate a `T` in the arena. Returns `None` if out of space.
    /// 在 arena 中分配一个 `T`。如果空间不足则返回 `None`。
    pub fn alloc<T>(&self, value: T) -> Option<&mut T> {
        let layout = Layout::new::<T>();
        let current = self.offset.get();

        // Align up. `buf` is a byte array with alignment 1, so the *address*
        // of the slot must be aligned, not just the offset into the buffer.
        let base = self.buf.get() as *mut u8;
        let align = layout.align();
        let start_addr = (base as usize).checked_add(current)?;
        let aligned_addr = start_addr.checked_add(align - 1)? & !(align - 1);
        let aligned = aligned_addr - base as usize;
        let new_offset = aligned.checked_add(layout.size())?;

        if new_offset > N {
            return None; // Arena full
        }

        self.offset.set(new_offset);

        // SAFETY:
        // - `aligned + size_of::<T>() <= N`, so the slot lies within `buf` (checked above)
        // - `base + aligned` is a multiple of T's alignment (computed from the address)
        // - No aliasing: `offset` only grows, so each alloc returns a unique,
        //   non-overlapping region
        // - UnsafeCell grants permission to mutate through &self
        // - The returned reference borrows `self`, so the borrow checker keeps
        //   the arena alive for as long as the reference is in use
        let ptr = unsafe {
            let typed = base.add(aligned) as *mut T;
            typed.write(value);
            &mut *typed
        };

        Some(ptr)
    }

    /// Reset the arena — invalidates all previous allocations.
    /// 重置 arena —— 使之前所有的分配失效。
    ///
    /// # Safety
    /// Caller must ensure no references to arena-allocated data exist.
    /// ## 安全性
    /// 调用者必须确保不存在任何指向 arena 分配数据的引用。
    pub unsafe fn reset(&self) {
        self.offset.set(0);
    }

    pub fn used(&self) -> usize {
        self.offset.get()
    }

    pub fn remaining(&self) -> usize {
        N - self.offset.get()
    }
}
}
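The align-up step is the part of a bump allocator that is easiest to get subtly wrong, so it can help to look at it in isolation. The sketch below uses hypothetical helpers `align_up` and `bump` that work purely on offsets; it assumes the buffer's base address is itself sufficiently aligned for the requested type, which a plain byte array does not guarantee on its own.

```rust
use std::alloc::Layout;

/// Rounds `offset` up to the next multiple of `align`.
/// `align` must be a power of two (which `Layout::align()` guarantees).
/// Returns `None` if the addition would overflow.
fn align_up(offset: usize, align: usize) -> Option<usize> {
    debug_assert!(align.is_power_of_two());
    // Adding `align - 1` and clearing the low bits rounds up to a multiple.
    Some(offset.checked_add(align - 1)? & !(align - 1))
}

/// One bump step: returns `(start, new_offset)` for an allocation of `layout`
/// in a buffer of `capacity` bytes, or `None` if it does not fit.
fn bump(offset: usize, layout: Layout, capacity: usize) -> Option<(usize, usize)> {
    let start = align_up(offset, layout.align())?;
    let end = start.checked_add(layout.size())?;
    (end <= capacity).then_some((start, end))
}
```

For instance, `align_up(13, 4)` rounds up to 16, while an already aligned offset such as `align_up(8, 8)` is returned unchanged. The bytes skipped by rounding are padding and are simply lost until the arena is reset.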

Choosing an Allocator Strategy / 选择分配器策略

Note / 注意:The diagram below uses Mermaid syntax. It renders on GitHub and in tools that support Mermaid (mdBook with mermaid plugin). In plain Markdown viewers, you’ll see the raw source.

下图使用了 Mermaid 语法。它在 GitHub 及支持 Mermaid 的工具(如带有 mermaid 插件的 mdBook)中可以正常渲染。在普通的 Markdown 查看器中,你将看到原始源代码。

graph TD
    A["What's your allocation pattern?<br/>你的分配模式是什么?"] --> B{All same type?<br/>全为同类型?}
    A --> I{"Environment?<br/>运行环境?"}
    B -->|Yes / 是| C{Need individual free?<br/>需单独释放?}
    B -->|No / 否| D{Need individual free?<br/>需单独释放?}
    C -->|Yes / 是| E["<b>Slab</b><br/>slab crate<br/>O(1) alloc + free<br/>Index-based access<br/>基于索引的访问"]
    C -->|No / 否| F["<b>typed-arena</b><br/>Bulk alloc, bulk free<br/>Lifetime-scoped refs<br/>生命周期作用域引用"]
    D -->|Yes / 是| G["<b>Standard allocator</b><br/>Box, Vec, etc.<br/>General-purpose malloc<br/>通用 malloc"]
    D -->|No / 否| H["<b>Bump arena</b><br/>bumpalo crate<br/>~2ns alloc, O(1) bulk free<br/>批量释放"]
    
    I -->|no_std| J["FixedArena (custom)<br/>or embedded-alloc"]
    I -->|std| K["bumpalo / typed-arena / slab"]
    
    style E fill:#91e5a3,color:#000
    style F fill:#91e5a3,color:#000
    style G fill:#89CFF0,color:#000
    style H fill:#91e5a3,color:#000
    style J fill:#ffa07a,color:#000
    style K fill:#91e5a3,color:#000

| C Pattern / C 模式 | Rust Equivalent / Rust 等效 | Key Advantage / 关键优势 |
|--------------------|-----------------------------|--------------------------|
| Custom malloc() pool / 自定义 malloc() | `#[global_allocator]` impl | Type-safe, debuggable / 类型安全,可调试 |
| obstack (GNU) | `bumpalo::Bump` | Lifetime-scoped, no use-after-free / 生命周期作用域,无释放后使用 |
| Kernel slab (kmem_cache) | `slab::Slab<T>` | Type-safe, index-based / 类型安全,基于索引 |
| Stack-allocated temp buffer / 栈分配临时缓冲 | `FixedArena<N>` (above) | No heap, const constructible / 无堆分配,可 const 构造 |
| alloca() | `[T; N]` or `SmallVec` | Compile-time sized, no UB / 编译时确定大小,无 UB |

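Cross-reference / 交叉引用: For bare-metal allocator setup (embedded-alloc together with #[global_allocator]), see Chapter 15.1, “Global Allocator Setup”, of the companion Rust training for C programmers (《面向 C 程序员的 Rust 培训》), which covers embedded-specific bootstrapping.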

Key Takeaways — Unsafe Rust / 关键要点:Unsafe Rust

  • Document invariants (SAFETY: comments), encapsulate behind safe APIs, minimize unsafe scope / 记录不变量(SAFETY: 注释),将其封装在安全 API 之后,并最小化 unsafe 的作用范围
  • [const { MaybeUninit::uninit() }; N] (Rust 1.79+) replaces the old assume_init anti-pattern / [const { MaybeUninit::uninit() }; N](Rust 1.79+)取代了旧的 assume_init 反模式
  • FFI requires extern "C", #[repr(C)], and careful null/lifetime handling / FFI 需要 extern "C"#[repr(C)] 以及对空指针/生命周期的谨慎处理
  • Arena and slab allocators trade general-purpose flexibility for allocation speed / Arena 和 Slab 分配器通过牺牲通用灵活性来换取分配速度

See also / 延伸阅读: Ch 4 — PhantomData for the variance and drop-check interactions that matter in unsafe code. Ch 9 — Smart Pointers for Pin and self-referential types.


Exercise: Safe Wrapper around Unsafe ★★★ (~45 min) / 练习:围绕 Unsafe 编写安全封装

Write a FixedVec<T, const N: usize> — a fixed-capacity, stack-allocated vector. Requirements:

  • push(&mut self, value: T) -> Result<(), T> returns Err(value) when full
  • pop(&mut self) -> Option<T> returns and removes the last element
  • as_slice(&self) -> &[T] borrows initialized elements
  • All public methods must be safe; all unsafe must be encapsulated with SAFETY: comments
  • Drop must clean up initialized elements

编写一个 FixedVec<T, const N: usize> —— 一个固定容量的、分配在栈上的 vector。 要求:

  • push(&mut self, value: T) -> Result<(), T>:如果已满,返回 Err(value)
  • pop(&mut self) -> Option<T>:返回并移除最后一个元素。
  • as_slice(&self) -> &[T]:借用已初始化的元素。
  • 所有公共方法必须是安全的(safe);所有 unsafe 操作必须被封装在内,并附带 SAFETY: 注释。
  • Drop 必须清理已初始化的元素。
🔑 Solution / 参考答案
use std::mem::MaybeUninit;

pub struct FixedVec<T, const N: usize> {
    data: [MaybeUninit<T>; N],
    len: usize,
}

impl<T, const N: usize> FixedVec<T, N> {
    pub fn new() -> Self {
        FixedVec {
            data: [const { MaybeUninit::uninit() }; N],
            len: 0,
        }
    }

    pub fn push(&mut self, value: T) -> Result<(), T> {
        if self.len >= N { return Err(value); }
        // SAFETY: len < N, so data[len] is within bounds.
        // 安全性:len < N,因此 data[len] 处于边界内。
        self.data[self.len] = MaybeUninit::new(value);
        self.len += 1;
        Ok(())
    }

    pub fn pop(&mut self) -> Option<T> {
        if self.len == 0 { return None; }
        self.len -= 1;
        // SAFETY: data[len] was initialized (len was > 0 before decrement).
        // 安全性:data[len] 已初始化(len 在递减前大于 0)。
        Some(unsafe { self.data[self.len].assume_init_read() })
    }

    pub fn as_slice(&self) -> &[T] {
        // SAFETY: data[0..len] are all initialized, and MaybeUninit<T>
        // has the same layout as T.
        // 安全性:data[0..len] 均已初始化,且 MaybeUninit<T> 与 T 的布局相同。
        unsafe { std::slice::from_raw_parts(self.data.as_ptr() as *const T, self.len) }
    }

    pub fn len(&self) -> usize { self.len }
    pub fn is_empty(&self) -> bool { self.len == 0 }
}

impl<T, const N: usize> Drop for FixedVec<T, N> {
    fn drop(&mut self) {
        // SAFETY: data[0..len] are initialized — drop each one.
        // 安全性:data[0..len] 已初始化 —— 逐个释放它们。
        for i in 0..self.len {
            unsafe { self.data[i].assume_init_drop(); }
        }
    }
}

fn main() {
    let mut v = FixedVec::<String, 4>::new();
    v.push("hello".into()).unwrap();
    v.push("world".into()).unwrap();
    assert_eq!(v.as_slice(), &["hello", "world"]);
    assert_eq!(v.pop(), Some("world".into()));
    assert_eq!(v.len(), 1);
}

13. Macros — Code That Writes Code / 13. 宏:生成代码的代码 🟡

What you’ll learn / 你将学到:

  • Declarative macros (macro_rules!) with pattern matching and repetition / 声明式宏(macro_rules!)中的模式匹配与重复操作
  • When macros are the right tool vs generics/traits / 宏与泛型/trait 的权衡及适用场景
  • Procedural macros: derive, attribute, and function-like / 过程宏:派生宏(derive)、属性宏(attribute)和函数式宏
  • Writing a custom derive macro with syn and quote / 使用 synquote 编写自定义派生宏

Declarative Macros (macro_rules!) / 声明式宏 (macro_rules!)

Macros match patterns on syntax and expand to code at compile time:

宏在编译时根据语法匹配模式并展开为代码:

#![allow(unused)]
fn main() {
// A simple macro that creates a HashMap
// 一个创建 HashMap 的简单宏
macro_rules! hashmap {
    // Match: key => value pairs separated by commas
    // 匹配:以逗号分隔的 key => value 键值对
    ( $( $key:expr => $value:expr ),* $(,)? ) => {
        {
            let mut map = std::collections::HashMap::new();
            $( map.insert($key, $value); )*
            map
        }
    };
}

let scores = hashmap! {
    "Alice" => 95,
    "Bob" => 87,
    "Carol" => 92,
};
// Expands to:
// 展开为:
// let mut map = HashMap::new();
// map.insert("Alice", 95);
// map.insert("Bob", 87);
// map.insert("Carol", 92);
// map
}

Macro fragment types / 宏片段类型

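| Fragment / 片段 | Matches / 匹配内容 | Example / 示例 |
|-----------------|--------------------|----------------|
| `$x:expr` | Any expression / 任何表达式 | `42`, `a + b`, `foo()` |
| `$x:ty` | A type / 某种类型 | `i32`, `Vec<String>` |
| `$x:ident` | An identifier / 标识符 | `my_var`, `Config` |
| `$x:pat` | A pattern / 模式 | `Some(x)`, `_` |
| `$x:stmt` | A statement / 语句 | `let x = 5;` |
| `$x:tt` | A single token tree / 单个标记树 | Anything / 任何东西 (最灵活) |
| `$x:literal` | A literal value / 字面量 | `42`, `"hello"`, `true` |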

Repetition / 重复操作: $( ... ),* means “zero or more, comma-separated” / 表示“零个或多个且以逗号分隔”

#![allow(unused)]
fn main() {
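// Function under test (a hypothetical stand-in so the generated tests compile)
fn process(input: &str) -> String {
    input.trim().to_uppercase()
}

// Generate test functions automatically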
macro_rules! test_cases {
    ( $( $name:ident: $input:expr => $expected:expr ),* $(,)? ) => {
        $(
            #[test]
            fn $name() {
                assert_eq!(process($input), $expected);
            }
        )*
    };
}

test_cases! {
    test_empty: "" => "",
    test_hello: "hello" => "HELLO",
    test_trim: "  spaces  " => "SPACES",
}
// Generates three separate #[test] functions
// 生成三个独立的 #[test] 函数
}

When (Not) to Use Macros / 何时(不)使用宏

Use macros when / 在以下场景使用宏

  • Reducing boilerplate that traits/generics can’t handle (variadic arguments, DRY test generation) / 减少 trait/泛型无法处理的样板代码(如变长参数、减少重复的测试生成)
  • Creating DSLs (html!, sql!, vec!) / 创建领域特定语言 (DSL)
  • Conditional code generation (cfg!, compile_error!) / 条件代码生成

Don’t use macros when / 在以下场景不要使用宏

  • A function or generic would work (macros are harder to debug, autocomplete doesn’t help) / 函数或泛型可以完成的任务(宏更难调试,且编辑器自动补全无效)
  • You need type checking inside the macro (macros operate on tokens, not types) / 需要在宏内部进行类型检查(宏操作的是标记,而非类型)
  • The pattern is used once or twice (not worth the abstraction cost) / 某种模式只用到一两次(不值得承担抽象成本)
#![allow(unused)]
fn main() {
// ❌ Unnecessary macro — a function works fine:
// ❌ 没必要的宏 —— 函数就很好用:
macro_rules! double {
    ($x:expr) => { $x * 2 };
}

// ✅ Just use a function:
// ✅ 直接用函数:
fn double(x: i32) -> i32 { x * 2 }

// ✅ Good macro use — variadic, can't be a function:
// ✅ 宏的良好应用场景 —— 变长参数,无法用函数实现:
macro_rules! println {
    ($($arg:tt)*) => { /* format string + args */ };
}
}

Procedural Macros Overview / 过程宏概览

Procedural macros are Rust functions that transform token streams. They require a separate crate with proc-macro = true:

过程宏是转换标记流(token streams)的 Rust 函数。它们需要一个带有 proc-macro = true 设置的独立 crate:

#![allow(unused)]
fn main() {
// Three types of proc macros:
// 三种类型的过程宏:

// 1. Derive macros — #[derive(MyTrait)]
// 1. 派生宏 —— #[derive(MyTrait)]
// Generate trait implementations from struct definitions
// 根据结构体定义生成 trait 实现
#[derive(Debug, Clone, Serialize, Deserialize)]
struct Config {
    name: String,
    port: u16,
}

// 2. Attribute macros — #[my_attribute]
// 2. 属性宏 —— #[my_attribute]
// Transform the annotated item
// 转换被标注的项
#[route(GET, "/api/users")]
async fn list_users() -> Json<Vec<User>> { /* ... */ }

// 3. Function-like macros — my_macro!(...)
// 3. 函数式宏 —— my_macro!(...)
// Custom syntax
// 自定义语法
let query = sql!(SELECT * FROM users WHERE id = ?);
}

Derive Macros in Practice / 派生宏实践

The most common proc macro type. Here’s how #[derive(Debug)] works conceptually:

这是最常用的过程宏类型。以下是 #[derive(Debug)] 在概念上的工作原理:

#![allow(unused)]
fn main() {
// Input (your struct):
// 输入(你的结构体):
#[derive(Debug)]
struct Point {
    x: f64,
    y: f64,
}

// The derive macro generates:
// 派生宏会生成:
impl std::fmt::Debug for Point {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        f.debug_struct("Point")
            .field("x", &self.x)
            .field("y", &self.y)
            .finish()
    }
}
}

Commonly used derive macros / 常用派生宏

| Derive / 派生 | Crate / 库 | What It Generates / 生成内容 |
|---------------|------------|------------------------------|
| `Debug` | std | `fmt::Debug` impl (debug printing / 调试打印) |
| `Clone`, `Copy` | std | Value duplication / 值复制 |
| `PartialEq`, `Eq` | std | Equality comparison / 等值比较 |
| `Hash` | std | Hashing for HashMap keys / HashMap 键的哈希计算 |
| `Serialize`, `Deserialize` | serde | JSON/YAML/etc. encoding / 编码 |
| `Error` | thiserror | `std::error::Error` + `Display` |
| `Parser` | clap | CLI argument parsing / 命令行参数解析 |
| `Builder` | derive_builder | Builder pattern / 建造者模式 |

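Practical advice / 实践建议: Use derive macros liberally; they eliminate error-prone boilerplate. Writing your own proc macros is an advanced topic, so get comfortable using existing ones (serde, thiserror, clap) before building a custom one.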

Macro Hygiene and $crate / 宏卫生性与 $crate

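Hygiene / 卫生性 means that identifiers created inside a macro don’t collide with identifiers in the caller’s scope. Rust’s macro_rules! is partially hygienic: local variables and labels are hygienic, while items the macro defines (functions, types, modules) are visible to the caller:

卫生性 意味着在宏内部创建的标识符不会与调用者作用域内的标识符发生冲突。Rust 的 macro_rules! 是部分卫生的:局部变量和标签是卫生的,而宏定义的项(函数、类型、模块)对调用者可见: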

macro_rules! make_var {
    () => {
        let x = 42; // This 'x' is in the MACRO's scope
                    // 这个 'x' 位于宏的作用域内
    };
}

fn main() {
    let x = 10;
    make_var!();   // Creates a different 'x' (hygienic)
                   // 创建一个不同的 'x'(卫生性体现)
    println!("{x}"); // Prints 10, not 42 — macro's x doesn't leak
                     // 打印 10 而非 42 —— 宏内部的 x 不会泄露出来
}
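The other half of "partially hygienic" is easy to miss, so here is a small sketch of it. The macro and function names (`define_helper`, `bind`, `helper`, `uses_bind`) are hypothetical. Items defined by a macro are visible to the caller, and a local binding becomes visible when the caller supplies the identifier.

```rust
// Items are NOT hygienic: a function defined by a macro can be called
// by name from the surrounding code.
macro_rules! define_helper {
    () => {
        fn helper() -> i32 {
            7
        }
    };
}
define_helper!();

// A local variable IS hygienic, unless the caller passes the identifier in.
// Here `$name` comes from the call site, so the binding is usable there.
macro_rules! bind {
    ($name:ident = $value:expr) => {
        let $name = $value;
    };
}

fn uses_bind() -> i32 {
    bind!(y = 5);
    // `y` is visible because the caller named it; `helper` is visible
    // because items ignore hygiene.
    y + helper()
}
```

Passing the identifier in is the usual way to let a macro introduce a variable the caller can use, and it keeps the name choice visible at the call site.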

$crate:When writing macros in a library, use $crate to refer to your own crate — it resolves correctly regardless of how users import your crate:

$crate:在编写库宏时,使用 $crate 来引用你自己的 crate —— 无论用户如何导入你的 crate,它都能正确解析:

#![allow(unused)]
fn main() {
// In my_diagnostics crate:
// 在 my_diagnostics crate 中:

pub fn log_result(msg: &str) {
    println!("[diag] {msg}");
}

#[macro_export]
macro_rules! diag_log {
    ($($arg:tt)*) => {
        // ✅ $crate always resolves to my_diagnostics, even if the user
        // renamed the crate in their Cargo.toml
        // ✅ $crate 总能解析到 my_diagnostics,即便用户在 Cargo.toml 中重命名了该 crate
        $crate::log_result(&format!($($arg)*))
    };
}

// ❌ Without $crate:
// ❌ 若不使用 $crate:
// my_diagnostics::log_result(...)  ← breaks if user writes:
//                                  ← 若用户按以下方式导入则会报错:
//   [dependencies]
//   diag = { package = "my_diagnostics", version = "1" }
}

Rule / 规则:Always use $crate:: in #[macro_export] macros. Never use your crate’s name directly.

始终在 #[macro_export] 宏中使用 $crate::。绝不要直接使用你的 crate 名称。

Recursive Macros and tt Munching / 递归宏与 tt 处理

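Recursive macros peel one piece off the front of their input and recurse on the rest. When the pieces are raw token trees (tt), the technique is called tt munching (token-tree munching); the examples below apply the same recursion to expr fragments:

递归宏每次从输入前端取出一部分,再对剩余部分递归。当每次取出的是原始标记树(tt)时,这种技术被称为 tt munching(标记树处理);下面的示例将同样的递归方式应用于 expr 片段: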

// Count the number of expressions passed to the macro
// 统计传递给宏的表达式数量
macro_rules! count {
    // Base case: no tokens left
    // 基础情况:没有剩余标记
    () => { 0usize };
    // Recursive case: consume one expression, count the rest
    // 递归情况:消耗一个表达式,并计算剩余部分
    ($head:expr $(, $tail:expr)* $(,)?) => {
        1usize + count!($($tail),*)
    };
}

fn main() {
    let n = count!("a", "b", "c", "d");
    assert_eq!(n, 4);

    // Works at compile time too:
    // 在编译时同样有效:
    const N: usize = count!(1, 2, 3);
    assert_eq!(N, 3);
}
#![allow(unused)]
fn main() {
// Build a heterogeneous tuple from a list of expressions:
// 从表达式列表构建一个异构元组:
macro_rules! tuple_from {
    // Base: single element
    // 基础:单个元素
    ($single:expr $(,)?) => { ($single,) };
    // Recursive: first element + rest
    // 递归:第一个元素 + 剩余部分
    ($head:expr, $($tail:expr),+ $(,)?) => {
        ($head, tuple_from!($($tail),+))
    };
}

let t = tuple_from!(1, "hello", 3.14, true);
// Expands to: (1, ("hello", (3.14, (true,))))
// 展开为:(1, ("hello", (3.14, (true,))))
}
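For comparison, a muncher that works on raw token trees rather than parsed expressions could look like the sketch below (`count_tts`, `THREE`, and `GROUPS` are hypothetical names). Each step consumes exactly one `tt` and recurses on whatever is left.

```rust
// Counts token trees: no commas required, nothing is parsed as an expression.
macro_rules! count_tts {
    // Base case: nothing left to consume.
    () => { 0usize };
    // Recursive case: take one token tree, count the rest.
    ($head:tt $($tail:tt)*) => {
        1usize + count_tts!($($tail)*)
    };
}

// Three plain identifiers are three token trees.
const THREE: usize = count_tts!(a b c);

// A delimited group counts as ONE token tree, whatever it contains.
const GROUPS: usize = count_tts!((a b) [c d] e);
```

The difference from the `expr` version shows up with input like `1 + 2`: as an `expr` it is a single fragment, but as token trees it is three (`1`, `+`, `2`). Deep recursion of this kind can hit the compiler's macro recursion limit on long inputs.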

Fragment specifier subtleties / 片段说明符的微妙之处

| Fragment / 片段 | Gotcha / 陷阱 |
|-----------------|---------------|
| `$x:expr` | Greedily parses: `1 + 2` is ONE expression / 贪婪解析 —— 1 + 2 是一个表达式 |
| `$x:ty` | Greedily parses: `Vec<String>` is one type / 贪婪解析 —— `Vec<String>` 是一个类型 |
| `$x:tt` | Matches exactly ONE token tree / 匹配确切的一个标记树 (最灵活,检查最少) |
| `$x:ident` | Only plain identifiers, not paths / 仅限普通标识符 —— 不能是路径如 `std::io` |
| `$x:pat` | In Rust 2021, matches `A \| B` patterns / 在 Rust 2021 中匹配 `A \| B` 模式 |

When to use tt / 何时使用 tt: When you need to forward tokens to another macro without the parser constraining them. $($args:tt)* is the “accept everything” pattern (used by println! and format!).

当你需要将标记转发给另一个宏且不希望解析器限制它们时。$($args:tt)* 是“接受一切”的模式(println! 和 format! 使用该模式)。
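A minimal sketch of that forwarding pattern, using a hypothetical `warn_msg!` macro that prefixes a message and hands everything else to the standard formatting machinery untouched:

```rust
// Accept any tokens and forward them verbatim to `format_args!`.
// The wrapper does not need to know how format strings or their
// arguments are structured.
macro_rules! warn_msg {
    ($($arg:tt)*) => {
        format!("[warn] {}", format_args!($($arg)*))
    };
}
```

Because the tokens are passed through unparsed, everything the inner macro supports keeps working at the call site, including positional arguments such as `warn_msg!("x = {}", 5)` and inline captures such as `warn_msg!("n = {n}")`.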

Writing a Derive Macro with syn and quote / 使用 synquote 编写派生宏

Derive macros live in a separate crate (proc-macro = true) and transform a token stream using syn (parse Rust) and quote (generate Rust):

派生宏存储在独立的 crate(设置 proc-macro = true)中,并使用 syn(解析 Rust)和 quote(生成 Rust)来处理标记流:

# my_derive/Cargo.toml
[lib]
proc-macro = true

[dependencies]
syn = { version = "2", features = ["full"] }
quote = "1"
proc-macro2 = "1"

// my_derive/src/lib.rs
// Note: proc-macro entry points must sit at the crate root, not inside a function.
use proc_macro::TokenStream;
use quote::quote;
use syn::{parse_macro_input, DeriveInput};

/// Derive macro that generates a `describe()` method
/// returning the struct name and field names.
#[proc_macro_derive(Describe)]
pub fn derive_describe(input: TokenStream) -> TokenStream {
    let input = parse_macro_input!(input as DeriveInput);
    let name = &input.ident;
    let name_str = name.to_string();

    // Extract field names (only for structs with named fields;
    // tuple structs, enums, and unions yield an empty list)
    let fields = match &input.data {
        syn::Data::Struct(data) => {
            data.fields.iter()
                .filter_map(|f| f.ident.as_ref())
                .map(|id| id.to_string())
                .collect::<Vec<_>>()
        }
        _ => vec![],
    };

    let field_list = fields.join(", ");

    // Sketch only: generic structs would also need
    // `input.generics.split_for_impl()` to build a valid `impl` header.
    let expanded = quote! {
        impl #name {
            pub fn describe() -> String {
                format!("{} {{ {} }}", #name_str, #field_list)
            }
        }
    };

    TokenStream::from(expanded)
}

// In the application crate:
use my_derive::Describe;

#[derive(Describe)]
struct SensorReading {
    sensor_id: u16,
    value: f64,
    timestamp: u64,
}

fn main() {
    println!("{}", SensorReading::describe());
    // "SensorReading { sensor_id, value, timestamp }"
}
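To see what the derive contributes without setting up a proc-macro crate, the expansion can be written out by hand. The block below is a hand-written approximation of what `cargo expand` would be expected to show for `SensorReading`; it is a sketch of the generated code, not captured tool output.

```rust
#[allow(dead_code)]
struct SensorReading {
    sensor_id: u16,
    value: f64,
    timestamp: u64,
}

// Roughly what `#[derive(Describe)]` adds for this struct: the struct name
// and the comma-joined field names are baked in as string literals.
impl SensorReading {
    pub fn describe() -> String {
        format!("{} {{ {} }}", "SensorReading", "sensor_id, value, timestamp")
    }
}
```

Writing the target code by hand first, and only then turning it into a `quote!` template, tends to make derive macros easier to get right.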

The workflow / 工作流程: TokenStream (raw tokens / 原始标记) → syn::parse (AST) → inspect/transform (检查/转换) → quote! (generate tokens / 生成标记) → TokenStream (back to compiler / 回传给编译器).

| Crate / 库 | Role / 角色 | Key types / 关键类型 |
| --- | --- | --- |
| proc-macro | Compiler interface / 编译器接口 | TokenStream |
| syn | Parse Rust source into AST / 将 Rust 源码解析为 AST | DeriveInput, ItemFn, Type |
| quote | Generate Rust tokens from templates / 从模板生成 Rust 标记 | quote!{}, #variable interpolation |
| proc-macro2 | Bridge between syn/quote and proc-macro / syn/quote 与 proc-macro 之间的桥梁 | TokenStream, Span |

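Practical tip / 实践技巧: Before writing your own derive macro, study the source of simple existing ones (such as thiserror or derive_more). The cargo expand command (provided by the cargo-expand tool) prints the fully expanded output of any macro, which is very helpful for debugging. / 在编写自己的派生宏之前,先研究一下简单的派生宏源码(如 thiserror、derive_more)。cargo expand 命令(通过 cargo-expand 工具)可以显示任何宏展开后的样子,这对调试非常有帮助。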
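To make the expansion concrete, here is roughly what the Describe derive shown earlier would generate for SensorReading, written out by hand. This is a sketch of the expected result, not captured cargo expand output; real output will differ in formatting and path qualification.

```rust
#[allow(dead_code)]
struct SensorReading {
    sensor_id: u16,
    value: f64,
    timestamp: u64,
}

// Hand-written stand-in for the impl that `#[derive(Describe)]` is expected to emit.
// In the macro, `#name_str` and `#field_list` are interpolated as string literals.
impl SensorReading {
    pub fn describe() -> String {
        format!("{} {{ {} }}", "SensorReading", "sensor_id, value, timestamp")
    }
}

fn main() {
    println!("{}", SensorReading::describe());
}
```

Comparing a hand-written version like this against the real cargo expand output is a quick way to spot a wrong interpolation or a missing brace escape in the quote! template.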

Key Takeaways — Macros / 关键要点:宏

  • macro_rules! for simple code generation; proc macros (syn + quote) for complex derives / macro_rules! 用于简单的代码生成;过程宏(syn + quote)用于复杂的派生
  • Prefer generics/traits over macros when possible — macros are harder to debug and maintain / 尽可能优先考虑泛型/trait 而非宏 —— 宏更难调试和维护
  • $crate ensures hygiene; tt munching enables recursive pattern matching / $crate 确保库的卫生性;tt munching 实现了递归模式匹配

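See also / 延伸阅读: Ch 2 - Traits for when traits/generics are a better fit than macros; Ch 14 - Testing for how to test macro-generated code. / Ch 2 - Traits 了解何时 trait/泛型优于宏;Ch 14 - Testing 了解如何测试由宏生成的代码。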
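The "tt munching" technique named in the takeaways can be sketched in a few lines. The count_tts! macro below is a hypothetical illustration: each recursion peels one token tree off the front of the input until nothing is left.

```rust
// Counts token trees by consuming one per recursive call.
macro_rules! count_tts {
    // Base case: nothing left to consume.
    () => { 0usize };
    // Recursive case: consume `$head`, then recurse on the remaining tokens.
    ($head:tt $($tail:tt)*) => { 1usize + count_tts!($($tail)*) };
}

fn main() {
    assert_eq!(count_tts!(), 0);
    assert_eq!(count_tts!(a b c), 3);
    // A parenthesized group counts as a single token tree.
    assert_eq!(count_tts!(x (y z) w), 3);
}
```

Because every token costs one level of macro recursion, long inputs can hit the compiler's recursion limit; that is one reason to prefer generics or a proc macro for larger jobs.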

flowchart LR
    A["Source code<br/>源代码"] --> B["macro_rules!<br>pattern matching<br>模式匹配"]
    A --> C["#[derive(MyMacro)]<br>proc macro<br>过程宏"]

    B --> D["Token expansion<br>标记展开"]
    C --> E["syn: parse AST<br>解析 AST"]
    E --> F["Transform<br>转换"]
    F --> G["quote!: generate tokens<br>生成标记"]
    G --> D

    D --> H["Compiled code<br>编译后的代码"]

    style A fill:#e8f4f8,stroke:#2980b9,color:#000
    style B fill:#d4efdf,stroke:#27ae60,color:#000
    style C fill:#fdebd0,stroke:#e67e22,color:#000
    style D fill:#fef9e7,stroke:#f1c40f,color:#000
    style E fill:#fdebd0,stroke:#e67e22,color:#000
    style F fill:#fdebd0,stroke:#e67e22,color:#000
    style G fill:#fdebd0,stroke:#e67e22,color:#000
    style H fill:#d4efdf,stroke:#27ae60,color:#000

Exercise: Declarative Macro — map! ★ (~15 min) / 练习:声明式宏 —— map!

Write a map! macro that creates a HashMap from key-value pairs:

编写一个 map! 宏,用于从键值对创建 HashMap

let m = map! {
    "host" => "localhost",
    "port" => "8080",
};
assert_eq!(m.get("host"), Some(&"localhost"));

Requirements: support trailing comma and empty invocation map!{}.

要求:支持尾部逗号和空调用 map!{}

🔑 Solution / 参考答案
macro_rules! map {
    () => { std::collections::HashMap::new() };
    ( $( $key:expr => $val:expr ),+ $(,)? ) => {{
        let mut m = std::collections::HashMap::new();
        $( m.insert($key, $val); )+
        m
    }};
}

fn main() {
    let config = map! {
        "host" => "localhost",
        "port" => "8080",
        "timeout" => "30",
    };
    assert_eq!(config.len(), 3);
    assert_eq!(config["host"], "localhost");

    let empty: std::collections::HashMap<String, String> = map!();
    assert!(empty.is_empty());

    let scores = map! { 1 => 100, 2 => 200 };
    assert_eq!(scores[&1], 100);
}

14. Testing and Benchmarking Patterns / 14. 测试与基准模式 🟢

What you’ll learn / 你将学到:

  • Rust’s three test tiers: unit, integration, and doc tests / Rust 的三级测试体系:单元测试、集成测试和文档测试
  • Property-based testing with proptest for discovering edge cases / 使用 proptest 进行基于属性的测试以发现边界情况
  • Benchmarking with criterion for reliable performance measurement / 使用 criterion 进行基准测试以实现可靠的性能衡量
  • Mocking strategies without heavyweight frameworks / 不依赖重型框架的 Mock 策略

Unit Tests, Integration Tests, Doc Tests / 单元测试、集成测试、文档测试

Rust has three testing tiers built into the language:

Rust 语言内置了三个层级的测试体系:

#![allow(unused)]
fn main() {
// --- Unit tests: in the same file as the code ---
// --- 单元测试:与代码位于同一文件中 ---
pub fn factorial(n: u64) -> u64 {
    (1..=n).product()
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_factorial_zero() {
        // (1..=0).product() returns 1, the multiplicative identity, because the range is empty
        // (1..=0).product() 返回 1(乘法单位元),因为该范围为空
        assert_eq!(factorial(0), 1);
    }

    #[test]
    fn test_factorial_five() {
        assert_eq!(factorial(5), 120);
    }

    #[test]
    #[cfg(debug_assertions)] // overflow checks are only enabled in debug mode
                             // 溢出检查仅在调试模式下启用
    #[should_panic(expected = "overflow")]
    fn test_factorial_overflow() {
        // ⚠️ This test only passes in debug mode (overflow checks enabled).
        // In release mode (`cargo test --release`), u64 arithmetic wraps
        // silently and no panic occurs. Use `checked_mul` or the
        // `overflow-checks = true` profile setting for release-mode safety.
        // ⚠️ 此测试仅在调试模式下通过(启用了溢出检查)。
        // 在发布模式下(`cargo test --release`),u64 算术会静默回绕而不发生 panic。
        // 为了发布模式下的安全性,请使用 `checked_mul` 或 `overflow-checks = true` 配置。
        factorial(100); // Should panic on overflow
                        // 溢出时应当 panic
    }

    #[test]
    fn test_with_result() -> Result<(), Box<dyn std::error::Error>> {
        // Tests can return Result — ? works inside!
        // 测试可以返回 Result —— 内部可以使用 ? 操作符!
        let value: u64 = "42".parse()?;
        assert_eq!(value, 42);
        Ok(())
    }
}
}
#![allow(unused)]
fn main() {
// --- Integration tests: in tests/ directory ---
// --- 集成测试:位于 tests/ 目录中 ---
// tests/integration_test.rs
// These test your crate's PUBLIC API only
// 这些测试仅针对你的 crate 的 公共 API

use my_crate::factorial;

#[test]
fn test_factorial_from_outside() {
    assert_eq!(factorial(10), 3_628_800);
}
}
#![allow(unused)]
fn main() {
// --- Doc tests: in documentation comments ---
// --- 文档测试:位于文档注释中 ---
/// Computes the factorial of `n`.
/// 计算 `n` 的阶乘。
///
/// # Examples
///
/// ```
/// use my_crate::factorial;
/// assert_eq!(factorial(5), 120);
/// ```
///
/// # Panics
///
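/// Panics if the result overflows `u64` in builds with overflow checks
/// enabled (the default for debug builds and plain `cargo test`); release
/// builds wrap silently unless `overflow-checks = true` is set.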
///
/// ```should_panic
/// my_crate::factorial(100);
/// ```
pub fn factorial(n: u64) -> u64 {
    (1..=n).product()
}
// Doc tests are compiled and run by `cargo test` — they keep examples honest.
// 文档测试由 `cargo test` 编译并运行 —— 它们确保示例代码的准确性。
}
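The overflow comments above recommend checked_mul for release-mode safety. A minimal sketch of that variant follows; the name checked_factorial is hypothetical and not part of the original crate.

```rust
/// Overflow-safe factorial: returns `None` instead of panicking or wrapping.
pub fn checked_factorial(n: u64) -> Option<u64> {
    // try_fold stops at the first `None` produced by checked_mul.
    (1..=n).try_fold(1u64, |acc, x| acc.checked_mul(x))
}

fn main() {
    assert_eq!(checked_factorial(0), Some(1));
    assert_eq!(checked_factorial(5), Some(120));
    assert_eq!(checked_factorial(20), Some(2_432_902_008_176_640_000));
    assert_eq!(checked_factorial(21), None); // 21! exceeds u64::MAX
}
```

Tests written against this version behave identically in debug and release builds, so no #[cfg(debug_assertions)] gate is needed.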

Test Fixtures and Setup / 测试固件与初始化

#[cfg(test)]
mod tests {
    use super::*;

    // Shared setup — create a helper function
    // 共享初始化 —— 创建一个辅助函数
    fn setup_database() -> TestDb {
        let db = TestDb::new_in_memory();
        db.run_migrations();
        db.seed_test_data();
        db
    }

    #[test]
    fn test_user_creation() {
        let db = setup_database();
        let user = db.create_user("Alice", "alice@test.com").unwrap();
        assert_eq!(user.name, "Alice");
    }

    #[test]
    fn test_user_deletion() {
        let db = setup_database();
        db.create_user("Bob", "bob@test.com").unwrap();
        assert!(db.delete_user("Bob").is_ok());
        assert!(db.get_user("Bob").is_none());
    }

    // Cleanup with Drop (RAII):
    // 使用 Drop 进行清理 (RAII):
    struct TempDir {
        path: std::path::PathBuf,
    }

    impl TempDir {
        fn new() -> Self {
            // Cargo.toml: rand = "0.8"
            let path = std::env::temp_dir().join(format!("test_{}", rand::random::<u32>()));
            std::fs::create_dir_all(&path).unwrap();
            TempDir { path }
        }
    }

    impl Drop for TempDir {
        fn drop(&mut self) {
            let _ = std::fs::remove_dir_all(&self.path);
        }
    }

    #[test]
    fn test_file_operations() {
        let dir = TempDir::new(); // Created / 已创建
        std::fs::write(dir.path.join("test.txt"), "hello").unwrap();
        assert!(dir.path.join("test.txt").exists());
    } // dir dropped here → temp directory cleaned up
      // dir 在此处被释放 → 临时目录被清理
}

Property-Based Testing (proptest) / 基于属性的测试 (proptest)

Instead of testing specific values, test properties that should always hold:

不要只测试特定值,而应测试那些始终应当成立的 属性(properties)

#![allow(unused)]
fn main() {
// Cargo.toml: proptest = "1"
use proptest::prelude::*;

fn reverse(v: &[i32]) -> Vec<i32> {
    v.iter().rev().cloned().collect()
}

proptest! {
    #[test]
    fn test_reverse_twice_is_identity(v in prop::collection::vec(any::<i32>(), 0..100)) {
        // Property: reversing twice gives back the original
        // 属性:反转两次会得到原始输入
        assert_eq!(reverse(&reverse(&v)), v);
    }

    #[test]
    fn test_reverse_preserves_length(v in prop::collection::vec(any::<i32>(), 0..100)) {
        assert_eq!(reverse(&v).len(), v.len());
    }

    #[test]
    fn test_sort_is_idempotent(mut v in prop::collection::vec(any::<i32>(), 0..100)) {
        v.sort();
        let sorted_once = v.clone();
        v.sort();
        assert_eq!(v, sorted_once); // Sorting twice = sorting once
                                    // 排序两次 = 排序一次
    }

    #[test]
    fn test_parse_roundtrip(x in any::<f64>().prop_filter("finite", |x| x.is_finite())) {
        // Property: formatting then parsing gives back the same value
        // 属性:进行格式化后再解析会得到相同的值
        let s = format!("{x}");
        let parsed: f64 = s.parse().unwrap();
        prop_assert!((x - parsed).abs() < f64::EPSILON);
    }
}
}

When to use proptest / 何时使用 proptest:When you’re testing a function with a large input space and want confidence it works for edge cases you didn’t think of. proptest generates hundreds of random inputs and shrinks failures to the minimal reproducing case.

当你在测试一个输入空间巨大的函数,并希望确保它在你未曾想到的边界情况下也能正常工作时。proptest 会生成数百个随机输入,并会将失败案例缩减(shrink)为最小的可复现案例。

Benchmarking with criterion / 使用 criterion 进行基准测试

#![allow(unused)]
fn main() {
// Cargo.toml:
// [dev-dependencies]
// criterion = { version = "0.5", features = ["html_reports"] }
//
// [[bench]]
// name = "my_benchmarks"
// harness = false

// benches/my_benchmarks.rs
use criterion::{criterion_group, criterion_main, Criterion, black_box};

fn fibonacci(n: u64) -> u64 {
    match n {
        0 | 1 => n,
        _ => fibonacci(n - 1) + fibonacci(n - 2),
    }
}

fn bench_fibonacci(c: &mut Criterion) {
    c.bench_function("fibonacci 20", |b| {
        // Use black_box to prevent the compiler from optimizing away the call
        // 使用 black_box 防止编译器将调用优化掉
        b.iter(|| fibonacci(black_box(20)))
    });

    // Compare different implementations:
    // 比较不同实现:
    let mut group = c.benchmark_group("fibonacci_compare");
    for size in [10, 15, 20, 25] {
        group.bench_with_input(
            criterion::BenchmarkId::from_parameter(size),
            &size,
            |b, &size| b.iter(|| fibonacci(black_box(size))),
        );
    }
    group.finish();
}

criterion_group!(benches, bench_fibonacci);
criterion_main!(benches);

// Run / 运行: cargo bench
// Produces HTML reports in target/criterion/
// 在 target/criterion/ 中生成 HTML 报告
}

Mocking Strategies without Frameworks / 不依赖框架的 Mock 策略

Rust’s trait system provides natural dependency injection — no mocking framework required:

Rust 的 trait 系统提供了天然的依赖注入机制 —— 无需任何 Mock 框架:

#![allow(unused)]
fn main() {
// Define behavior as a trait
// 将行为定义为 trait
trait Clock {
    fn now(&self) -> std::time::Instant;
}

trait HttpClient {
    fn get(&self, url: &str) -> Result<String, String>;
}

// Production implementations
// 生产环境下的实现
struct RealClock;
impl Clock for RealClock {
    fn now(&self) -> std::time::Instant { std::time::Instant::now() }
}

// Service depends on abstractions
// 服务依赖于抽象
struct CacheService<C: Clock, H: HttpClient> {
    clock: C,
    client: H,
    ttl: std::time::Duration,
}

impl<C: Clock, H: HttpClient> CacheService<C, H> {
    fn fetch(&self, url: &str) -> Result<String, String> {
        // Uses self.clock and self.client — injectable
        // 使用 self.clock 和 self.client —— 可被注入
        self.client.get(url)
    }
}

// Test with mock implementations — no framework needed!
// 使用 Mock 实现进行测试 —— 无需框架!
#[cfg(test)]
mod tests {
    use super::*;

    struct MockClock {
        fixed_time: std::time::Instant,
    }
    impl Clock for MockClock {
        fn now(&self) -> std::time::Instant { self.fixed_time }
    }

    struct MockHttpClient {
        response: String,
    }
    impl HttpClient for MockHttpClient {
        fn get(&self, _url: &str) -> Result<String, String> {
            Ok(self.response.clone())
        }
    }

    #[test]
    fn test_cache_service() {
        let service = CacheService {
            clock: MockClock { fixed_time: std::time::Instant::now() },
            client: MockHttpClient { response: "cached data".into() },
            ttl: std::time::Duration::from_secs(300),
        };

        assert_eq!(service.fetch("http://example.com").unwrap(), "cached data");
    }
}
}

Test philosophy / 测试理念:Prefer real dependencies in integration tests, trait-based mocks in unit tests. Avoid mocking frameworks unless your dependency graph is complex — Rust’s trait generics handle most cases naturally.

在集成测试中优先使用真实的依赖,在单元测试中优先使用基于 trait 的 Mock。除非你的依赖图非常复杂,否则请尽量避免使用 Mock 框架 —— Rust 的 trait 泛型能够自然地处理大多数情况。
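The CacheService example above injects a Clock but never reads it. As a hedged sketch of where a clock test double pays off, the hypothetical TtlSlot below expires its value after a TTL, and ManualClock lets a test move time forward without sleeping. Both types are assumptions made for illustration.

```rust
use std::cell::Cell;
use std::time::{Duration, Instant};

trait Clock {
    fn now(&self) -> Instant;
}

/// Test double whose time only moves when the test says so.
struct ManualClock {
    current: Cell<Instant>,
}

impl ManualClock {
    fn new() -> Self {
        ManualClock { current: Cell::new(Instant::now()) }
    }
    fn advance(&self, by: Duration) {
        self.current.set(self.current.get() + by);
    }
}

impl Clock for ManualClock {
    fn now(&self) -> Instant {
        self.current.get()
    }
}

/// Single-slot cache (hypothetical) that expires its value after `ttl`.
struct TtlSlot<'c, C: Clock> {
    clock: &'c C,
    ttl: Duration,
    entry: Option<(String, Instant)>,
}

impl<'c, C: Clock> TtlSlot<'c, C> {
    fn new(clock: &'c C, ttl: Duration) -> Self {
        TtlSlot { clock, ttl, entry: None }
    }
    fn put(&mut self, value: impl Into<String>) {
        self.entry = Some((value.into(), self.clock.now()));
    }
    fn get(&self) -> Option<&str> {
        let (value, stored_at) = self.entry.as_ref()?;
        let age = self.clock.now().duration_since(*stored_at);
        if age < self.ttl { Some(value.as_str()) } else { None }
    }
}

fn main() {
    let clock = ManualClock::new();
    let mut slot = TtlSlot::new(&clock, Duration::from_secs(300));
    slot.put("cached data");
    assert_eq!(slot.get(), Some("cached data"));

    clock.advance(Duration::from_secs(301)); // no sleeping required
    assert_eq!(slot.get(), None);
}
```

The Cell gives the double interior mutability, so the cache can hold a shared reference to the clock while the test advances it.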

Key Takeaways — Testing / 关键要点:测试

  • Doc tests (///) double as documentation and regression tests — they’re compiled and run / 文档测试(///)既是文档也是回归测试 —— 它们会被编译并运行
  • proptest generates random inputs to find edge cases you’d never write manually / proptest 会生成随机输入以发现你永远不会手动编写的边界情况
  • criterion provides statistically rigorous benchmarks with HTML reports / criterion 提供了具有统计学严谨性的基准测试及 HTML 报告
  • Mock via trait generics + test doubles, not mock frameworks / 通过 trait 泛型 + 测试桩(test doubles)进行 Mock,而非通过 Mock 框架

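See also / 延伸阅读: Ch 13 - Macros for testing macro-generated code; Ch 15 - Crate Architecture and API Design for how module layout affects test organization. / Ch 13 - Macros 了解如何测试由宏生成的代码;Ch 15 - Crate Architecture and API Design 了解模块布局如何影响测试的组织形式。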


Exercise: Property-Based Testing with proptest ★★ (~25 min) / 练习:使用 proptest 进行基于属性的测试

Write a SortedVec<T: Ord> wrapper that maintains a sorted invariant. Use proptest to verify that:

  1. After any sequence of insertions, the internal vec is always sorted
  2. contains() agrees with the stdlib Vec::contains()
  3. The length equals the number of insertions

编写一个 SortedVec<T: Ord> 封装,以维持排序不变量。使用 proptest 来验证以下内容:

  1. 在任何插入序列之后,内部的 vector 始终是已排序的。
  2. contains() 与标准库的 Vec::contains() 结果一致。
  3. 长度等于插入的次数。
🔑 Solution / 参考答案
#[derive(Debug)]
struct SortedVec<T: Ord> {
    inner: Vec<T>,
}

impl<T: Ord> SortedVec<T> {
    fn new() -> Self { SortedVec { inner: Vec::new() } }

    fn insert(&mut self, value: T) {
        let pos = self.inner.binary_search(&value).unwrap_or_else(|p| p);
        self.inner.insert(pos, value);
    }

    fn contains(&self, value: &T) -> bool {
        self.inner.binary_search(value).is_ok()
    }

    fn len(&self) -> usize { self.inner.len() }
    fn as_slice(&self) -> &[T] { &self.inner }
}

#[cfg(test)]
mod tests {
    use super::*;
    use proptest::prelude::*;

    proptest! {
        #[test]
        fn always_sorted(values in proptest::collection::vec(-1000i32..1000, 0..100)) {
            let mut sv = SortedVec::new();
            for v in &values {
                sv.insert(*v);
            }
            for w in sv.as_slice().windows(2) {
                prop_assert!(w[0] <= w[1]);
            }
            prop_assert_eq!(sv.len(), values.len());
        }

        #[test]
        fn contains_matches_stdlib(values in proptest::collection::vec(0i32..50, 1..30)) {
            let mut sv = SortedVec::new();
            for v in &values {
                sv.insert(*v);
            }
            for v in &values {
                prop_assert!(sv.contains(v));
            }
            prop_assert!(!sv.contains(&9999));
        }
    }
}

15. Crate Architecture and API Design / 15. Crate 架构与 API 设计 🟡

What you’ll learn / 你将学到:

  • Module layout conventions and re-export strategies / 模块布局惯例与重导出策略
  • The public API design checklist for polished crates / 打磨精品 crate 的公共 API 设计清单
  • Ergonomic parameter patterns: impl Into, AsRef, Cow / 易用的参数模式:impl Into、AsRef、Cow
  • “Parse, don’t validate” with TryFrom and validated types / 使用 TryFrom 和校验后的类型践行“以解析代替校验”
  • Feature flags, conditional compilation, and workspace organization / 特性标志(Feature flags)、条件编译及工作空间组织

Module Layout Conventions / 模块布局惯例

my_crate/
├── Cargo.toml
├── src/
│   ├── lib.rs          # Crate root — re-exports and public API / Crate 根 —— 重导出与公共 API
│   ├── config.rs       # Feature module / 功能模块
│   ├── parser/         # Complex module with sub-modules / 带有子模块的复杂模块
│   │   ├── mod.rs      # or parser.rs at parent level (Rust 2018+) / 或父级的 parser.rs
│   │   ├── lexer.rs
│   │   └── ast.rs
│   ├── error.rs        # Error types / 错误类型
│   └── utils.rs        # Internal helpers (pub(crate)) / 内部辅助程序
├── tests/
│   └── integration.rs  # Integration tests / 集成测试
├── benches/
│   └── perf.rs         # Benchmarks / 基准测试
└── examples/
    └── basic.rs        # cargo run --example basic / 示例代码
#![allow(unused)]
fn main() {
// lib.rs — curate your public API with re-exports:
// lib.rs — 通过重导出打磨你的公共 API:
mod config;
mod error;
mod parser;
mod utils;

// Re-export what users need:
// 重导出用户需要的项:
pub use config::Config;
pub use error::Error;
pub use parser::Parser;

// Public types are at the crate root — users write:
// 公共类型位于 crate 根部 —— 调用者可以这样写:
// use my_crate::Config;
// NOT: use my_crate::config::Config;
// 而非:use my_crate::config::Config;
}

Visibility modifiers / 可见性修饰符

| Modifier / 修饰符 | Visible To / 可见范围 |
| --- | --- |
| pub | Everyone / 所有人 |
| pub(crate) | This crate only / 仅限当前 crate |
| pub(super) | Parent module / 父模块 |
| pub(in path) | Specific ancestor module / 特定的祖先模块 |
| (none / 无) | Current module and its children / 当前模块及其子模块 |
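The modifiers in the table can be seen side by side in a small nested-module sketch. The module and function names (network, socket, and so on) are hypothetical; pub(in path) is omitted to keep the example independent of where it is placed in a crate.

```rust
mod network {
    pub mod socket {
        // Private: visible only inside `socket` (and any modules nested in it).
        fn raw_open() -> &'static str { "raw" }

        // pub(super): visible to the parent module `network`.
        pub(super) fn open_for_parent() -> &'static str { raw_open() }

        // pub(crate): visible anywhere in this crate, hidden from other crates.
        pub(crate) fn open_for_crate() -> &'static str { raw_open() }

        // pub: public, as far as the enclosing modules are themselves public.
        pub fn open() -> &'static str { raw_open() }
    }

    pub fn connect() -> &'static str {
        // Allowed: `network` is the parent of `socket`.
        socket::open_for_parent()
    }
}

fn main() {
    assert_eq!(network::connect(), "raw");
    assert_eq!(network::socket::open_for_crate(), "raw");
    assert_eq!(network::socket::open(), "raw");
    // network::socket::open_for_parent(); // does not compile: only visible in `network`
    // network::socket::raw_open();        // does not compile: private to `socket`
}
```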

Public API Design Checklist / 公共 API 设计清单

  1. Accept references, return owned / 接收引用,返回所有权: fn process(input: &str) -> String
  2. Use impl Trait for parameters / 为参数使用 impl Trait: prefer fn read(r: impl Read) over fn read<R: Read>(r: R) for cleaner signatures / 使用 fn read(r: impl Read) 而非 fn read<R: Read>(r: R) 以获得更整洁的签名
  3. Return Result, not panic! / 返回 Result 而非 panic!: let the caller decide how to handle errors / 让调用者决定如何处理错误
  4. Implement standard traits / 实现标准 trait: Debug, Display, Clone, Default, From/Into
  5. Make invalid states unrepresentable / 使无效状态无法表示: use type states and newtypes / 使用类型状态(type states)和新类型(newtypes)
  6. Follow the builder pattern for complex configuration / 对复杂配置采用建造者模式: combine it with type states when some fields are required / 如果字段是必填的,请结合使用类型状态
  7. Seal traits you don’t want users to implement / 密封不希望用户实现的 trait: pub trait Sealed: private::Sealed {}
  8. Mark types and functions #[must_use] / 将类型和函数标注为 #[must_use]: prevents silently discarding important Results, guards, or values. Apply it to any type or function whose ignored return value almost certainly indicates a bug / 防止静默丢弃重要的 Result、guard 或数值。适用于任何忽略其返回值几乎肯定会导致 bug 的类型:

#[must_use = "dropping the guard immediately releases the lock"]
// 丢弃 guard 会立即释放锁
pub struct LockGuard<'a, T> { /* … */ }

#[must_use]
pub fn validate(input: &str) -> Result<ValidInput, ValidationError> { /* … */ }


// Sealed trait pattern — users can use but not implement:
// 密封 trait 模式 —— 用户可以使用但无法实现该 trait:
mod private {
    pub trait Sealed {}
}

pub trait DatabaseDriver: private::Sealed {
    fn connect(&self, url: &str) -> Connection;
}

// Only types in THIS crate can implement Sealed → only we can implement DatabaseDriver
// 只有当前 crate 中的类型才能实现 Sealed → 只有我们能实现 DatabaseDriver
pub struct PostgresDriver;
impl private::Sealed for PostgresDriver {}
impl DatabaseDriver for PostgresDriver {
    fn connect(&self, url: &str) -> Connection { /* ... */ }
}
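Checklist items 5 and 6 mention combining a builder with type states when a field is required. One possible shape is sketched below; ClientConfig, its builder, and the NoUrl/HasUrl marker types are all hypothetical names chosen for illustration.

```rust
/// Final, fully-specified configuration.
#[derive(Debug, PartialEq)]
struct ClientConfig {
    url: String,
    timeout_secs: u64,
}

// Type-state markers: the builder's type records whether `url` was provided.
struct NoUrl;
struct HasUrl(String);

struct ClientConfigBuilder<U> {
    url: U,
    timeout_secs: u64,
}

impl ClientConfigBuilder<NoUrl> {
    fn new() -> Self {
        ClientConfigBuilder { url: NoUrl, timeout_secs: 30 }
    }
}

impl<U> ClientConfigBuilder<U> {
    // Setting the required field moves the builder into the `HasUrl` state.
    fn url(self, url: impl Into<String>) -> ClientConfigBuilder<HasUrl> {
        ClientConfigBuilder { url: HasUrl(url.into()), timeout_secs: self.timeout_secs }
    }

    // Optional fields are available in every state.
    fn timeout_secs(mut self, secs: u64) -> Self {
        self.timeout_secs = secs;
        self
    }
}

impl ClientConfigBuilder<HasUrl> {
    // `build` exists only once the required field is present.
    fn build(self) -> ClientConfig {
        ClientConfig { url: self.url.0, timeout_secs: self.timeout_secs }
    }
}

fn main() {
    let cfg = ClientConfigBuilder::new()
        .timeout_secs(5)
        .url("http://example.com")
        .build();
    assert_eq!(cfg, ClientConfig { url: "http://example.com".into(), timeout_secs: 5 });

    // ClientConfigBuilder::new().build(); // does not compile: no `build` in the NoUrl state
}
```

Forgetting the required field becomes a compile error rather than a runtime Result, at the cost of one extra type parameter on the builder.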

#[non_exhaustive] — mark public enums and structs so that adding variants or fields is not a breaking change. Downstream crates must use a wildcard arm (_ =>) in match statements, and cannot construct the type with struct literal syntax:

#[non_exhaustive] —— 标注公共枚举和结构体,使得添加变体或字段不再是破坏性变更。下游 crate 在 match 语句中必须包含通配符分支(_ =>),并且不能使用结构体字面量语法来构造该类型:

#![allow(unused)]
fn main() {
#[non_exhaustive]
pub enum DiagError {
    Timeout,
    HardwareFault,
    // Adding a new variant in a future release is NOT a semver break.
    // 在未来版本中添加新变体 不属于破坏性的 semver 变更。
}
}

Ergonomic Parameter Patterns: impl Into, AsRef, Cow / 易用的参数模式:impl Into、AsRef、Cow

One of Rust’s most impactful API patterns is accepting the most general type in function parameters, so callers don’t need repetitive .to_string(), &*s, or .as_ref() at every call site. This is the Rust-specific version of “be liberal in what you accept.”

Rust 中最有影响力的 API 模式之一就是在函数参数中接收 最宽泛的类型,这样调用者就不需要在每个调用处重复编写 .to_string()、&*s 或 .as_ref()。这是“宽以待人,严于律己”原则在 Rust 中的具体应用。

impl Into<T> — Accept Anything Convertible / 接受任何可转换的类型

#![allow(unused)]
fn main() {
// ❌ Friction: callers must convert manually
// ❌ 阻碍:调用者必须手动转换
fn connect(host: String, port: u16) -> Connection {
    // ...
}
connect("localhost".to_string(), 5432);  // Annoying .to_string() / 烦人的 .to_string()
connect(hostname.clone(), 5432);          // Unnecessary clone if we already have String / 如果已有 String,则 clone 是多余的

// ✅ Ergonomic: accept anything that converts to String
// ✅ 易用:接收任何可以转换为 String 的类型
fn connect(host: impl Into<String>, port: u16) -> Connection {
    let host = host.into();  // Convert once, inside the function
                             // 在函数内部进行一次转换
    // ...
}
connect("localhost", 5432);     // &str — zero friction / 零阻碍
connect(hostname, 5432);        // String — moved, no clone / 移动所有权,无需 clone
connect(boxed_str, 5432);       // Box<str>: std implements From<Box<str>> for String (there is no such impl for Arc<str>) / 标准库为 Box<str> 实现了到 String 的 From(Arc<str> 则没有)
}

This works because Rust’s From/Into trait pair provides blanket conversions. When you accept impl Into<T>, you’re saying: “give me anything that knows how to become a T.”

这是得益于 Rust 的 From/Into trait 对所提供的覆盖式转换(blanket conversions)。当你接收 impl Into<T> 时,你的意思是:“给我任何知道如何变成 T 的东西。”
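The blanket conversion can be seen with a custom target type. In the sketch below, Hostname is a hypothetical newtype: implementing From for each source type is enough, and Into<Hostname> follows automatically from the standard library's blanket impl.

```rust
/// Hypothetical newtype used only for this illustration.
#[derive(Debug, PartialEq)]
struct Hostname(String);

// Implement From once per source type...
impl From<&str> for Hostname {
    fn from(s: &str) -> Self { Hostname(s.to_owned()) }
}
impl From<String> for Hostname {
    fn from(s: String) -> Self { Hostname(s) }
}

// ...and Into<Hostname> comes for free via std's blanket impl:
// impl<T, U> Into<U> for T where U: From<T>
fn connect(host: impl Into<Hostname>, port: u16) -> String {
    let host: Hostname = host.into();
    format!("{}:{}", host.0, port)
}

fn main() {
    assert_eq!(connect("localhost", 5432), "localhost:5432");
    assert_eq!(connect(String::from("db.internal"), 5432), "db.internal:5432");
}
```

This is also why library code implements From rather than Into: the reverse direction is derived for you.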

AsRef<T> — Borrow as a Reference / 作为引用借用

AsRef<T> is the borrowing counterpart to Into<T>. Use it when you only need to read the data, not take ownership:

AsRef<T> 是 Into<T> 在借用方面的对应物。当你只需要 读取 数据而不需要获取所有权时,请使用它:

#![allow(unused)]
fn main() {
use std::path::{Path, PathBuf};

// ❌ Forces callers to convert to &Path
// ❌ 强制调用者转换为 &Path
fn file_exists(path: &Path) -> bool {
    path.exists()
}
file_exists(Path::new("/tmp/test.txt"));  // Awkward / 略显笨拙

// ✅ Accept anything that can behave as a &Path
// ✅ 接收任何可以表现为 &Path 的类型
fn file_exists(path: impl AsRef<Path>) -> bool {
    path.as_ref().exists()
}
file_exists("/tmp/test.txt");                    // &str ✅
file_exists(String::from("/tmp/test.txt"));      // String ✅
file_exists(Path::new("/tmp/test.txt"));         // &Path ✅
file_exists(PathBuf::from("/tmp/test.txt"));     // PathBuf ✅

// Same pattern for string-like parameters:
// 对于类字符串参数同样适用:
fn log_message(msg: impl AsRef<str>) {
    println!("[LOG] {}", msg.as_ref());
}
log_message("hello");                    // &str ✅
log_message(String::from("hello"));      // String ✅
}

Cow<T> — Clone on Write / 写时克隆

Cow<'a, T> (Clone on Write) delays allocation until mutation is needed. It holds either a borrowed &T or an owned T::Owned. This is perfect when most calls don’t need to modify the data:

Cow<'a, T>(写时克隆)将分配推迟到需要修改时。它要么持有一个借用的 &T,要么持有一个具有所有权的 T::Owned。当大多数调用不需要修改数据时,它是完美的选择:

#![allow(unused)]
fn main() {
use std::borrow::Cow;

/// Normalizes a diagnostic message — only allocates if changes are needed.
/// 规范化诊断消息 —— 仅在需要更改时才分配内存。
fn normalize_message(msg: &str) -> Cow<'_, str> {
    if msg.contains('\t') || msg.contains('\r') {
        // Must allocate — we need to modify the content
        // 必须分配 —— 我们需要修改内容
        Cow::Owned(msg.replace('\t', "    ").replace('\r', ""))
    } else {
        // No allocation — just borrow the original
        // 无需分配 —— 直接借用原本的内容
        Cow::Borrowed(msg)
    }
}

// Most messages pass through without allocation:
// 大多消息不需要分配内存即可通过:
let clean = normalize_message("All tests passed");          // Borrowed — free / 借用 —— 零开销
let fixed = normalize_message("Error:\tfailed\r\n");        // Owned — allocated / 具有所有权 —— 已分配内存

// Cow<str> implements Deref<Target=str>, so it works like &str:
// Cow<str> 实现了 Deref<Target=str>,所以它用起来像 &str:
println!("{}", clean);
println!("{}", fixed.to_uppercase());
}
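Whether an allocation happened can be checked directly by inspecting the Cow variant. The sketch below repeats normalize_message so it is self-contained, then shows to_mut() and into_owned(), the two standard ways to obtain owned data on demand.

```rust
use std::borrow::Cow;

fn normalize_message(msg: &str) -> Cow<'_, str> {
    if msg.contains('\t') || msg.contains('\r') {
        Cow::Owned(msg.replace('\t', "    ").replace('\r', ""))
    } else {
        Cow::Borrowed(msg)
    }
}

fn main() {
    let clean = normalize_message("All tests passed");
    let fixed = normalize_message("Error:\tfailed\r\n");

    // The variant tells us whether an allocation happened.
    assert!(matches!(clean, Cow::Borrowed(_)));
    assert!(matches!(fixed, Cow::Owned(_)));
    assert_eq!(fixed, "Error:    failed\n");

    // to_mut() clones borrowed data on first mutation: the "write" in clone-on-write.
    let mut shouted = clean;
    shouted.to_mut().make_ascii_uppercase();
    assert!(matches!(shouted, Cow::Owned(_)));
    assert_eq!(shouted, "ALL TESTS PASSED");

    // into_owned() yields a String, cloning only if the data was still borrowed.
    let owned: String = fixed.into_owned();
    assert_eq!(owned, "Error:    failed\n");
}
```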

Quick Reference: Which to Use / 快速参考:该使用哪一个

Do you need ownership of the data inside the function? / 函数内部需要获取数据所有权吗?
├── YES / 是 → impl Into<T>
│             "Give me anything that can become a T"
│             “给我任何能变成 T 的东西”
└── NO / 否 → Do you only need to read it? / 你是否只需要读取它?
     ├── YES / 是 → impl AsRef<T> or &T
     │             "Give me anything I can borrow as a &T"
     │             “给我任何能借用为 &T 的东西”
     └── MAYBE / 可能 (might need to modify sometimes? / 有时可能需要修改?)
          └── Cow<'_, T>
              "Borrow if possible, clone only when you must"
              “尽可能借用,仅在必须时才克隆”
| Pattern / 模式 | Ownership / 所有权 | Allocation / 内存分配 | When to use / 适用场景 |
| --- | --- | --- | --- |
| &str | Borrowed / 借用 | Never / 从不 | Simple string params / 简单的字符串参数 |
| impl AsRef<str> | Borrowed / 借用 | Never / 从不 | Accept String, &str, etc. (read only) / 接收 String、&str 等(仅限读取) |
| impl Into<String> | Owned / 所有权 | On conversion / 转换时 | Accept &str, String; will store/own / 接收 &str、String,用于存储/持有所有权 |
| Cow<'_, str> | Either / 任意 | Only if modified / 仅在修改时 | Processing that usually doesn’t modify / 通常不需要修改的处理过程 |
| &[u8] / impl AsRef<[u8]> | Borrowed / 借用 | Never / 从不 | Byte-oriented APIs / 面向字节的 API |

Borrow<T> vs AsRef<T>:Both provide &T, but Borrow<T> additionally guarantees that Eq, Ord, and Hash are consistent between the original and borrowed form. This is why HashMap<String, V>::get() accepts &Q where String: Borrow<Q> — not AsRef. Use Borrow when the borrowed form is used as a lookup key; use AsRef for general “give me a reference” parameters.

Borrow<T> vs AsRef<T>:两者都提供 &T,但 Borrow<T> 额外保证了 Eq、Ord 和 Hash 在原始形式和借用形式之间是 一致的。这就是为什么 HashMap<String, V>::get() 接收的是 &Q(其中 String: Borrow<Q>)而非 AsRef。当借用形式被用作查找键时,请使用 Borrow;对于普通的“给我一个引用”参数,请使用 AsRef。
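The Borrow bound can be exercised with a small generic helper that has the same bound shape as HashMap::get. The lookup function is a hypothetical wrapper written for this sketch.

```rust
use std::borrow::Borrow;
use std::collections::HashMap;
use std::hash::Hash;

/// Generic lookup with the same bound shape as `HashMap::get`.
fn lookup<'m, K, Q>(map: &'m HashMap<K, u32>, key: &Q) -> Option<&'m u32>
where
    K: Borrow<Q> + Hash + Eq,
    Q: Hash + Eq + ?Sized,
{
    map.get(key)
}

fn main() {
    let mut ports: HashMap<String, u32> = HashMap::new();
    ports.insert("http".to_string(), 80);

    // String: Borrow<str>, and str hashes and compares exactly like String,
    // so a &str can look up a String key without allocating.
    assert_eq!(lookup(&ports, "http"), Some(&80));
    assert_eq!(lookup(&ports, "ftp"), None);

    // An owned String key works too, via the reflexive `impl<T> Borrow<T> for T`.
    assert_eq!(lookup(&ports, &"http".to_string()), Some(&80));
}
```

If the bound were AsRef<Q> instead, nothing would guarantee that the hash of the borrowed form matches the hash stored for the owned key, and lookups could silently miss.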

Composing Conversions in APIs / 在 API 中组合使用转换

#![allow(unused)]
fn main() {
use std::collections::HashMap;
use std::path::PathBuf;

/// A well-designed diagnostic API using ergonomic parameters:
/// 一个使用易用参数精心设计的诊断 API:
pub struct DiagRunner {
    name: String,
    config_path: PathBuf,
    results: HashMap<String, TestResult>,
}

impl DiagRunner {
    /// Accept any string-like type for name, any path-like type for config.
    /// name 接收任何类字符串类型,config 接收任何类路径类型。
    pub fn new(
        name: impl Into<String>,
        config_path: impl Into<PathBuf>,
    ) -> Self {
        DiagRunner {
            name: name.into(),
            config_path: config_path.into(),
            results: HashMap::new(), // no results recorded yet / 初始时没有任何结果
        }
    }

    /// Accept any AsRef<str> for read-only lookup.
    /// 接收任何 AsRef<str> 用于只读查找。
    pub fn get_result(&self, test_name: impl AsRef<str>) -> Option<&TestResult> {
        self.results.get(test_name.as_ref())
    }
}

// All of these work with zero caller friction:
// 以下所有调用都能正常工作,且对调用者零阻碍:
let runner = DiagRunner::new("GPU Diag", "/etc/diag_tool/config.json");
let runner = DiagRunner::new(format!("Diag-{}", node_id), config_path);
let runner = DiagRunner::new(name_string, path_buf);
}

Case Study: Designing a Public Crate API — Before & After / 案例研究:设计公共 Crate API —— 之前与之后

A real-world example of evolving a stringly-typed internal API into an ergonomic, type-safe public API. Consider a configuration parser crate:

一个将“字符串类型化”(stringly-typed)的内部 API 演变为易用、类型安全的公共 API 的真实案例。考虑一个配置解析器 crate:

Before / 之前 (stringly-typed, easy to misuse / 字符串类型化,容易误用):

#![allow(unused)]
fn main() {
// ❌ All parameters are strings — no compile-time validation
// ❌ 所有参数都是字符串 —— 没有编译时验证
pub fn parse_config(path: &str, format: &str, strict: bool) -> Result<Config, String> {
    // What formats are valid? "json"? "JSON"? "Json"?
    // 哪些格式是有效的?"json"?"JSON"?"Json"?
    // Is path a file path or URL?
    // path 是文件路径还是 URL?
    // What does "strict" even mean?
    // "strict" 到底是什么意思?
    todo!()
}
}

After / 之后 (type-safe, self-documenting / 类型安全,自描述):

#![allow(unused)]
fn main() {
use std::path::Path;

/// Supported configuration formats.
/// 支持的配置格式。
#[derive(Debug, Clone, Copy)]
#[non_exhaustive]  // Adding formats won't break downstream / 添加格式不会破坏下游代码
pub enum Format {
    Json,
    Toml,
    Yaml,
}

/// Controls parsing strictness.
/// 控制解析的严格程度。
#[derive(Debug, Clone, Copy, Default)]
pub enum Strictness {
    /// Reject unknown fields (default for libraries)
    /// 拒绝未知字段(库的默认设置)
    #[default]
    Strict,
    /// Ignore unknown fields (useful for forward-compatible configs)
    /// 忽略未知字段(对向前兼容的配置很有用)
    Lenient,
}

pub fn parse_config(
    path: &Path,          // Type-enforced: must be a filesystem path / 类型强制:必须是文件系统路径
    format: Format,       // Enum: impossible to pass invalid format / 枚举:不可能传入无效格式
    strictness: Strictness,  // Named alternatives, not a bare bool / 命名选项,而非裸布尔值
) -> Result<Config, ConfigError> {
    todo!()
}
}

What improved / 改进之处

| Aspect / 方面 | Before / 之前 | After / 之后 |
| --- | --- | --- |
| Format validation / 格式校验 | Runtime string comparison / 运行时字符串比较 | Compile-time enum / 编译时枚举 |
| Path type / 路径类型 | Raw &str (could be anything) / 原始 &str(可能是任何内容) | &Path (filesystem-specific) / &Path(特定于文件系统) |
| Strictness / 严格程度 | Mystery bool / 神秘的 bool | Self-documenting enum / 自描述枚举 |
| Error type / 错误类型 | String (opaque) / String(不透明) | ConfigError (structured) / ConfigError(结构化) |
| Extensibility / 可扩展性 | Breaking changes / 破坏性变更 | #[non_exhaustive] |

Rule of thumb / 经验准则:If you find yourself writing a match on string values, consider replacing the parameter with an enum. If a parameter is a boolean that isn’t obvious from context, use a two-variant enum instead.

如果你发现自己在对字符串值进行 match 操作,请考虑将参数替换为枚举。如果一个布尔参数在上下文中含义不明显,请改用具有两个变体的枚举。
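One way to apply this rule of thumb is to confine the string match to a single FromStr impl at the input boundary and use the enum everywhere else. The sketch below assumes case-insensitive names and a "yml" alias; both choices, and the UnknownFormat error type, are illustrative assumptions rather than part of the original API.

```rust
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Format {
    Json,
    Toml,
    Yaml,
}

/// Hypothetical error type carrying the rejected input.
#[derive(Debug, PartialEq, Eq)]
struct UnknownFormat(String);

impl FromStr for Format {
    type Err = UnknownFormat;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // The only string match in the program lives here, at the boundary.
        match s.to_ascii_lowercase().as_str() {
            "json" => Ok(Format::Json),
            "toml" => Ok(Format::Toml),
            "yaml" | "yml" => Ok(Format::Yaml),
            _ => Err(UnknownFormat(s.to_owned())),
        }
    }
}

fn extension(format: Format) -> &'static str {
    // Exhaustive match: adding a variant forces this function to be updated.
    match format {
        Format::Json => "json",
        Format::Toml => "toml",
        Format::Yaml => "yaml",
    }
}

fn main() {
    assert_eq!("JSON".parse::<Format>(), Ok(Format::Json));
    assert_eq!("yml".parse::<Format>(), Ok(Format::Yaml));
    assert_eq!("xml".parse::<Format>(), Err(UnknownFormat("xml".to_owned())));
    assert_eq!(extension(Format::Toml), "toml");
}
```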


Parse Don’t Validate — TryFrom and Validated Types / 以解析代替校验 —— TryFrom 与校验后的类型

“Parse, don’t validate” is a principle that says: don’t check data and then pass around the raw unchecked form — instead, parse it into a type that can only exist if the data is valid. Rust’s TryFrom trait is the standard tool for this.

“以解析代替校验”(Parse, don’t validate)原则指出:不要在校验数据后继续传递原始的、未经验证的形式 —— 相反,应该将其解析为一个只有当数据有效时才能存在的类型。 Rust 的 TryFrom trait 是实现这一目标的标准工具。

The Problem: Validation Without Enforcement / 问题所在:缺乏强制力的校验

#![allow(unused)]
fn main() {
// ❌ Validate-then-use: nothing prevents using an invalid value after the check
// ❌ 先校验后使用:没有任何机制能阻止在检查后使用无效值
fn process_port(port: u16) {
    if port == 0 {                        // a u16 can never exceed 65535, so only 0 needs checking / u16 不可能超过 65535,只需检查 0
        panic!("Invalid port");           // We checked, but... / 我们确实检查了,但是……
    }
    start_server(port);                    // What if someone calls start_server(0) directly?
                                           // 如果有人直接调用 start_server(0) 怎么办?
}

// ❌ Stringly-typed: an email is just a String — any garbage gets through
// ❌ 字符串类型化:电子邮件只是一个 String —— 任何垃圾数据都能混进来
fn send_email(to: String, body: String) {
    // Is `to` actually a valid email? We don't know.
    // `to` 真的是有效的电子邮件吗?我们不知道。
    // Someone could pass "not-an-email" and we only find out at the SMTP server.
    // 有人可能会传入 "not-an-email",而我们只有在连接 SMTP 服务器时才会发现。
}
}

The Solution: Parse Into Validated Newtypes with TryFrom / 解决方案:使用 TryFrom 解析为校验后的新类型

use std::convert::TryFrom;
use std::fmt;

/// A validated TCP port number (1–65535).
/// If you have a `Port`, it is guaranteed valid.
/// 一个经过校验的 TCP 端口号 (1–65535)。
/// 如果你拥有一个 `Port` 实例,它就保证是有效的。
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Port(u16);

impl TryFrom<u16> for Port {
    type Error = PortError;

    fn try_from(value: u16) -> Result<Self, Self::Error> {
        if value == 0 {
            Err(PortError::Zero)
        } else {
            Ok(Port(value))
        }
    }
}

impl Port {
    pub fn get(&self) -> u16 { self.0 }
}

#[derive(Debug)]
pub enum PortError {
    Zero,
    InvalidFormat,
}

impl fmt::Display for PortError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            PortError::Zero => write!(f, "port must be non-zero"),
            PortError::InvalidFormat => write!(f, "invalid port format"),
        }
    }
}

impl std::error::Error for PortError {}

// Now the type system enforces validity:
// 现在类型系统强制保证了有效性:
fn start_server(port: Port) {
    // No validation needed — Port can only be constructed via TryFrom,
    // which already verified it's valid.
    // 无需再次校验 —— Port 只能通过 TryFrom 构造,而 TryFrom 已经验证过它的有效性。
    println!("Listening on port {}", port.get());
}

// Usage / 使用:
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let port = Port::try_from(8080)?;   // ✅ Validated once at the boundary / 在边界处进行一次校验
    start_server(port);                  // No re-validation anywhere downstream / 下游任何地方都不需要重新校验

    let bad = Port::try_from(0);         // ❌ Err(PortError::Zero)
    Ok(())
}
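For the specific "must be non-zero" invariant, the standard library already ships a validated type. As an alternative sketch (not the approach used above), Port can wrap std::num::NonZeroU16 and delegate the check to it; this variant returns Option instead of a custom error.

```rust
use std::num::NonZeroU16;

/// Alternative sketch: let NonZeroU16 carry the "never zero" invariant.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(transparent)]
pub struct Port(NonZeroU16);

impl Port {
    pub fn new(value: u16) -> Option<Self> {
        // NonZeroU16::new returns None for 0, Some otherwise.
        NonZeroU16::new(value).map(Port)
    }

    pub fn get(self) -> u16 {
        self.0.get()
    }
}

fn main() {
    assert_eq!(Port::new(8080).map(Port::get), Some(8080));
    assert_eq!(Port::new(0), None);
    // Niche optimization: Option<Port> needs no extra space for the None case.
    assert_eq!(std::mem::size_of::<Option<Port>>(), std::mem::size_of::<u16>());
}
```

The TryFrom version remains the better fit when callers need a descriptive error or when the invariant is more than "non-zero".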

Real-World Example: Validated IPMI Address / 真实案例:经过校验的 IPMI 地址

#![allow(unused)]
fn main() {
use std::fmt;

/// A validated IPMI slave address (0x20–0xFE, even only).
/// 经过校验的 IPMI 从站地址(0x20–0xFE,且仅限偶数)。
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct IpmiAddr(u8);

#[derive(Debug)]
pub enum IpmiAddrError {
    Odd(u8),
    OutOfRange(u8),
}

impl fmt::Display for IpmiAddrError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            IpmiAddrError::Odd(v) => write!(f, "IPMI address 0x{v:02X} must be even"),
            IpmiAddrError::OutOfRange(v) => {
                write!(f, "IPMI address 0x{v:02X} out of range (0x20..=0xFE)")
            }
        }
    }
}

impl TryFrom<u8> for IpmiAddr {
    type Error = IpmiAddrError;

    fn try_from(value: u8) -> Result<Self, Self::Error> {
        if value % 2 != 0 {
            Err(IpmiAddrError::Odd(value))
        } else if value < 0x20 || value > 0xFE {
            Err(IpmiAddrError::OutOfRange(value))
        } else {
            Ok(IpmiAddr(value))
        }
    }
}

impl IpmiAddr {
    pub fn get(&self) -> u8 { self.0 }
}

// Downstream code never needs to re-check:
// 下游代码永远不需要重新检查:
fn send_ipmi_command(addr: IpmiAddr, cmd: u8, data: &[u8]) -> Result<Vec<u8>, IpmiError> {
    // addr.get() is guaranteed to be a valid, even IPMI address
    // addr.get() 保证是一个有效的、且为偶数的 IPMI 地址
    raw_ipmi_send(addr.get(), cmd, data)
}
}
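
The block above leaves IpmiError and raw_ipmi_send undefined, so it does not compile on its own. The following is a minimal self-contained sketch of the same newtype, assuming only the rules stated in the doc comment (even, 0x20..=0xFE). The helper count_valid_addresses is hypothetical and exists only to show that the invariant can be checked exhaustively over all 256 raw bytes.

```rust
use std::convert::TryFrom;

/// Validated IPMI slave address: even, in 0x20..=0xFE.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct IpmiAddr(u8);

#[derive(Debug, PartialEq, Eq)]
pub enum IpmiAddrError {
    Odd(u8),
    OutOfRange(u8),
}

impl TryFrom<u8> for IpmiAddr {
    type Error = IpmiAddrError;

    fn try_from(value: u8) -> Result<Self, Self::Error> {
        if value % 2 != 0 {
            Err(IpmiAddrError::Odd(value))
        } else if !(0x20..=0xFE).contains(&value) {
            Err(IpmiAddrError::OutOfRange(value))
        } else {
            Ok(IpmiAddr(value))
        }
    }
}

impl IpmiAddr {
    pub fn get(self) -> u8 {
        self.0
    }
}

/// Hypothetical helper: counts how many raw bytes survive validation.
fn count_valid_addresses() -> usize {
    (0..=u8::MAX)
        .filter(|&raw| IpmiAddr::try_from(raw).is_ok())
        .count()
}

fn main() {
    // Typed literals keep the TryFrom<u8> impl selection unambiguous.
    assert_eq!(IpmiAddr::try_from(0x20_u8).map(IpmiAddr::get), Ok(0x20));
    assert_eq!(IpmiAddr::try_from(0x21_u8), Err(IpmiAddrError::Odd(0x21)));
    assert_eq!(IpmiAddr::try_from(0x10_u8), Err(IpmiAddrError::OutOfRange(0x10)));
    println!("{} of 256 raw bytes are valid", count_valid_addresses());
}
```

Because the parity check runs first, an odd out-of-range byte such as 0xFF is reported as Odd rather than OutOfRange. Whether that ordering is the right one for your diagnostics is a design choice.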

Parsing Strings with FromStr / 使用 FromStr 解析字符串

For types that are commonly parsed from text (CLI args, config files), implement FromStr:

对于通常需要从文本(CLI 参数、配置文件)中解析的类型,请实现 FromStr

#![allow(unused)]
fn main() {
use std::str::FromStr;

impl FromStr for Port {
    type Err = PortError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let n: u16 = s.parse().map_err(|_| PortError::InvalidFormat)?;
        Port::try_from(n)
    }
}

// Now works with .parse():
// 现在可以配合 .parse() 使用:
let port: Port = "8080".parse()?;   // Validates in one step / 一步完成校验

// And with clap CLI parsing:
// 以及 clap 的 CLI 解析:
// #[derive(Parser)]
// struct Args {
//     #[arg(short, long)]
//     port: Port,   // clap calls FromStr automatically / clap 会自动调用 FromStr
// }
}
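
The snippet above uses ? inside a function that returns (), so it only works when pasted into a fallible function. As a runnable sketch under that assumption, the version below repeats the Port definitions from earlier in this chapter and lets main return a Result, so each failure mode of parse() can be observed directly.

```rust
use std::convert::TryFrom;
use std::fmt;
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Port(u16);

#[derive(Debug, PartialEq, Eq)]
pub enum PortError {
    Zero,
    InvalidFormat,
}

impl fmt::Display for PortError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            PortError::Zero => write!(f, "port must be non-zero"),
            PortError::InvalidFormat => write!(f, "invalid port format"),
        }
    }
}

impl std::error::Error for PortError {}

impl TryFrom<u16> for Port {
    type Error = PortError;

    fn try_from(value: u16) -> Result<Self, Self::Error> {
        if value == 0 { Err(PortError::Zero) } else { Ok(Port(value)) }
    }
}

impl FromStr for Port {
    type Err = PortError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Anything that is not a u16 (letters, negatives, overflow) is a format error.
        let n: u16 = s.parse().map_err(|_| PortError::InvalidFormat)?;
        Port::try_from(n)
    }
}

impl Port {
    pub fn get(self) -> u16 {
        self.0
    }
}

fn main() -> Result<(), PortError> {
    let port: Port = "8080".parse()?; // text -> u16 -> Port in one step
    println!("Listening on port {}", port.get());

    assert_eq!("0".parse::<Port>(), Err(PortError::Zero));
    assert_eq!("http".parse::<Port>(), Err(PortError::InvalidFormat));
    assert_eq!("70000".parse::<Port>(), Err(PortError::InvalidFormat)); // overflows u16
    Ok(())
}
```

Note that from_str delegates to try_from, so the range rule lives in exactly one place.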

TryFrom Chain for Complex Validation / 用于复杂校验的 TryFrom

#![allow(unused)]
fn main() {
// Stub types for this example. In production these would be in
// separate modules with their own TryFrom implementations.
struct Hostname(String);
impl TryFrom<String> for Hostname {
    type Error = String;
    fn try_from(s: String) -> Result<Self, String> { Ok(Hostname(s)) }
}
struct Timeout(u64);
impl TryFrom<u64> for Timeout {
    type Error = String;
    fn try_from(ms: u64) -> Result<Self, String> {
        if ms == 0 { Err("timeout must be > 0".into()) } else { Ok(Timeout(ms)) }
    }
}
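// Deserialize is required by serde_json::from_str in load_config below
// (this assumes serde is built with its "derive" feature).
#[derive(serde::Deserialize)]
struct RawConfig { host: String, port: u16, timeout_ms: u64 }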
#[derive(Debug)]
enum ConfigError {
    InvalidHost(String),
    InvalidPort(PortError),
    InvalidTimeout(String),
}
impl From<std::io::Error> for ConfigError {
    fn from(e: std::io::Error) -> Self { ConfigError::InvalidHost(e.to_string()) }
}
impl From<serde_json::Error> for ConfigError {
    fn from(e: serde_json::Error) -> Self { ConfigError::InvalidHost(e.to_string()) }
}
/// A validated configuration that can only exist if all fields are valid.
/// 一个经过校验的配置,只有当所有字段都有效时才能存在。
pub struct ValidConfig {
    pub host: Hostname,
    pub port: Port,
    pub timeout_ms: Timeout,
}

impl TryFrom<RawConfig> for ValidConfig {
    type Error = ConfigError;

    fn try_from(raw: RawConfig) -> Result<Self, Self::Error> {
        Ok(ValidConfig {
            host: Hostname::try_from(raw.host)
                .map_err(ConfigError::InvalidHost)?,
            port: Port::try_from(raw.port)
                .map_err(ConfigError::InvalidPort)?,
            timeout_ms: Timeout::try_from(raw.timeout_ms)
                .map_err(ConfigError::InvalidTimeout)?,
        })
    }
}

// Parse once at the boundary, use the validated type everywhere:
// 在边界处解析一次,在后续各处直接使用校验后的类型:
fn load_config(path: &str) -> Result<ValidConfig, ConfigError> {
    let raw: RawConfig = serde_json::from_str(&std::fs::read_to_string(path)?)?;
    ValidConfig::try_from(raw)  // All validation happens here / 所有校验均在此处发生
}
}

Summary: Validate vs Parse / 总结:校验 vs 解析

| Approach / 方法 | Data checked? / 数据是否已检查? | Compiler enforces validity? / 编译器是否强制保证有效性? | Re-validation needed? / 是否需要重复校验? |
| --- | --- | --- | --- |
| Runtime checks (if/assert) / 运行时检查 (if/assert) | Yes / 是 | No / 否 | Every function boundary / 每个函数边界处 |
| Validated newtype + TryFrom / 校验后的新类型 + TryFrom | Yes / 是 | Yes / 是 | Never: the type is proof / 永不需要:类型即是证明 |

The rule: parse at the boundary, use validated types everywhere inside. Raw strings, integers, and byte slices enter your system, get parsed into validated types via TryFrom/FromStr, and from that point forward the type system guarantees they’re valid.

规则:在边界处解析,在内部各处使用校验后的类型。 原始字符串、整数和字节切片进入你的系统,通过 TryFrom/FromStr 解析为校验后的类型,从那一刻起,类型系统将保证它们的有效性。

Feature Flags and Conditional Compilation / 特性标志与条件编译

# Cargo.toml
[features]
default = ["json"]           # Enabled by default / 默认启用
json = ["dep:serde_json"]    # Enables JSON support / 启用 JSON 支持
xml = ["dep:quick-xml"]      # Enables XML support / 启用 XML 支持
full = ["json", "xml"]       # Meta-feature: enables all / 元特性:启用所有

[dependencies]
serde = "1"
serde_json = { version = "1", optional = true }
# quick_xml::se (used below) is gated behind quick-xml's "serialize" feature
quick-xml = { version = "0.31", optional = true, features = ["serialize"] }

#![allow(unused)]
fn main() {
// Conditional compilation based on features:
// 基于特性的条件编译:
#[cfg(feature = "json")]
pub fn to_json<T: serde::Serialize>(value: &T) -> String {
    serde_json::to_string(value).unwrap()
}

#[cfg(feature = "xml")]
pub fn to_xml<T: serde::Serialize>(value: &T) -> String {
    quick_xml::se::to_string(value).unwrap()
}

// Compile error if a required feature isn't enabled:
// 如果未启用所需的特性,则抛出编译错误:
#[cfg(not(any(feature = "json", feature = "xml")))]
compile_error!("At least one format feature (json, xml) must be enabled");
}
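
#[cfg] removes whole items from the build. When you only need a runtime-visible answer to "which features were compiled in?", the standard cfg! macro evaluates the same predicate to a plain bool. The sketch below is illustrative: enabled_formats is a hypothetical helper, and when the file is compiled without Cargo features (for example with bare rustc) both checks are simply false.

```rust
/// Hypothetical helper: lists the format features compiled into this build.
/// cfg!(...) expands to `true` or `false` at compile time, so both branches
/// are always type-checked, unlike items hidden behind #[cfg(...)].
fn enabled_formats() -> Vec<&'static str> {
    let mut formats = Vec::new();
    if cfg!(feature = "json") {
        formats.push("json");
    }
    if cfg!(feature = "xml") {
        formats.push("xml");
    }
    formats
}

fn main() {
    let formats = enabled_formats();
    if formats.is_empty() {
        println!("no format features enabled");
    } else {
        println!("enabled formats: {}", formats.join(", "));
    }
}
```

A helper like this is handy in --version output or diagnostics, where it tells you which optional code paths a given binary actually contains.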

Best practices / 最佳实践

  • Keep default features minimal — users can opt in / 保持默认特性最简 —— 让用户自行选择开启
  • Use dep: syntax (Rust 1.60+) for optional dependencies to avoid creating implicit features / 使用 dep: 语法(Rust 1.60+)处理可选依赖,避免创建隐式特性
  • Document features in your README and crate-level docs / 在 README 和 crate 级文档中记录特性

Workspace Organization / 工作空间组织

For large projects, use a Cargo workspace to share dependencies and build artifacts:

对于大型项目,请使用 Cargo 工作空间(workspace)来共享依赖项和构建产物:

# Root Cargo.toml / 根目录下的 Cargo.toml
[workspace]
members = [
    "core",      # Shared types and traits / 共享的类型和 trait
    "parser",    # Parsing library / 解析库
    "server",    # Binary: the main application / 二进制程序:主应用
    "client",    # Client library / 客户端库
    "cli",       # CLI binary / 命令行二进制程序
]

# Shared dependency versions / 共享依赖版本
[workspace.dependencies]
serde = { version = "1", features = ["derive"] }
tokio = { version = "1", features = ["full"] }
tracing = "0.1"

# In each member's Cargo.toml / 在每个成员的 Cargo.toml 中:
[dependencies]
serde = { workspace = true }  # Inherits the entry from [workspace.dependencies]


Benefits / 优势:

  • Single Cargo.lock: all crates use the same dependency versions / 统一的 Cargo.lock:所有 crate 使用相同的依赖版本
  • cargo test --workspace runs all tests / cargo test --workspace 运行所有测试
  • Shared build cache: compiling one crate benefits all / 共享构建缓存:编译一个 crate 会让所有相关 crate 受益
  • Clean dependency boundaries between components / 组件之间清晰的依赖边界

.cargo/config.toml: Project-Level Configuration / 项目级配置

The .cargo/config.toml file (at the workspace root or in $HOME/.cargo/) customizes Cargo behavior without modifying Cargo.toml:

.cargo/config.toml 文件(位于工作空间根目录或 $HOME/.cargo/ 中)可以在不修改 Cargo.toml 的情况下定制 Cargo 的行为:

# .cargo/config.toml

# Default target for this workspace
# 此工作空间的默认目标
[build]
target = "x86_64-unknown-linux-gnu"

# Custom runner, e.g. run via QEMU for cross-compiled binaries
# 自定义运行程序:例如,对于交叉编译的二进制文件使用 QEMU 运行
[target.aarch64-unknown-linux-gnu]
runner = "qemu-aarch64-static"
linker = "aarch64-linux-gnu-gcc"

# Cargo aliases: custom shortcut commands
# Cargo 别名:自定义快捷命令
[alias]
xt = "test --workspace --release"         # cargo xt = run all tests in release / 以 release 模式运行所有测试
ci = "clippy --workspace -- -D warnings"  # cargo ci = lint with errors on warnings / 若有警告则报错
cov = "llvm-cov --workspace"              # cargo cov = coverage (requires cargo-llvm-cov) / 覆盖率测试

# Environment variables for build scripts
# 用于构建脚本的环境变量
[env]
IPMI_LIB_PATH = "/usr/lib/bmc"

# Use a custom registry (for internal packages)
# 使用自定义注册表(用于内部包)
[registries.internal]
index = "https://gitlab.internal/crates/index"

Common configuration patterns / 常用的配置模式:

| Setting / 设置 | Purpose / 用途 | Example / 示例 |
| --- | --- | --- |
| [build] target | Default compilation target / 默认编译目标 | x86_64-unknown-linux-musl for static builds / 用于静态构建 |
| [target.X] runner | How to run the binary / 如何运行二进制程序 | "qemu-aarch64-static" for cross-compiled binaries / 用于交叉编译 |
| [target.X] linker | Which linker to use / 使用哪个链接器 | "aarch64-linux-gnu-gcc" |
| [alias] | Custom cargo subcommands / 自定义 cargo 子命令 | xt = "test --workspace" |
| [env] | Build-time environment variables / 构建时的环境变量 | Library paths, feature toggles / 库路径、特性开关 |
| [net] offline | Prevent network access / 禁止网络访问 | true for air-gapped builds / 用于离线构建 |

Compile-Time Environment Variables: env!() and option_env!() / 编译时环境变量:env!() 与 option_env!()

Rust can embed environment variables into the binary at compile time — useful for version strings, build metadata, and configuration:

Rust 能够在编译时将环境变量嵌入二进制文件中 —— 这对于版本字符串、构建元数据和配置非常有用:

#![allow(unused)]
fn main() {
// env!() fails the build with a compile error if the variable is missing
// env!():如果变量缺失,编译会直接失败并报错
const VERSION: &str = env!("CARGO_PKG_VERSION"); // "0.1.0" from Cargo.toml
const PKG_NAME: &str = env!("CARGO_PKG_NAME");   // Crate name from Cargo.toml

// option_env!() returns Option<&'static str>: None if the variable is missing
// option_env!():返回 Option<&'static str>,变量缺失时为 None
const BUILD_SHA: Option<&str> = option_env!("GIT_SHA");
const BUILD_TIME: Option<&str> = option_env!("BUILD_TIMESTAMP");

fn print_version() {
    println!("{PKG_NAME} v{VERSION}");
    if let Some(sha) = BUILD_SHA {
        println!("  commit: {sha}");
    }
    if let Some(time) = BUILD_TIME {
        println!("  built:  {time}");
    }
}
}

Cargo automatically sets many useful environment variables:

Cargo 会自动设置许多有用的环境变量:

| Variable / 变量 | Value / 数值 | Use case / 用途 |
| --- | --- | --- |
| CARGO_PKG_VERSION | "1.2.3" | Version reporting / 版本汇报 |
| CARGO_PKG_NAME | "diag_tool" | Binary identification / 程序识别 |
| CARGO_PKG_AUTHORS | From Cargo.toml / 来自 Cargo.toml | About/help text / “关于”/帮助文本 |
| CARGO_MANIFEST_DIR | Absolute path of the directory containing Cargo.toml / 包含 Cargo.toml 的目录的绝对路径 | Locating test data files / 定位测试数据文件 |
| OUT_DIR | Build output directory / 构建输出目录 | build.rs code generation target / build.rs 代码生成的目录 |
| TARGET | Target triple / 目标三元组 | Platform-specific logic in build.rs / build.rs 中的平台特定逻辑 |

You can set custom env vars from build.rs:

你可以从 build.rs 设置自定义环境变量:

// build.rs
fn main() {
    println!("cargo::rustc-env=GIT_SHA={}", git_sha());
    println!("cargo::rustc-env=BUILD_TIMESTAMP={}", timestamp());
}
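
The build.rs above calls git_sha() and timestamp() without defining them. One possible shape for those helpers, using only the standard library, is sketched below. The fallback value "unknown", the use of epoch seconds, and the rustc_env_line helper are assumptions made for illustration. The double-colon cargo:: directive form is, to my knowledge, accepted by Cargo 1.77 and later, while older toolchains expect the single-colon cargo: form.

```rust
// build.rs (sketch)
use std::process::Command;
use std::time::{SystemTime, UNIX_EPOCH};

/// Short commit hash from `git rev-parse`, or "unknown" when git is not
/// available (for example when building from a source tarball).
fn git_sha() -> String {
    Command::new("git")
        .args(["rev-parse", "--short", "HEAD"])
        .output()
        .ok()
        .filter(|out| out.status.success())
        .and_then(|out| String::from_utf8(out.stdout).ok())
        .map(|s| s.trim().to_string())
        .filter(|s| !s.is_empty())
        .unwrap_or_else(|| "unknown".to_string())
}

/// Build time as seconds since the Unix epoch.
fn timestamp() -> String {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs().to_string())
        .unwrap_or_else(|_| "0".to_string())
}

/// Hypothetical helper: formats one rustc-env directive for Cargo.
fn rustc_env_line(key: &str, value: &str) -> String {
    format!("cargo::rustc-env={key}={value}")
}

fn main() {
    println!("{}", rustc_env_line("GIT_SHA", &git_sha()));
    println!("{}", rustc_env_line("BUILD_TIMESTAMP", &timestamp()));
    // Re-run this script only when HEAD moves, not on every source edit.
    println!("cargo::rerun-if-changed=.git/HEAD");
}
```

Embedding a timestamp makes builds non-reproducible, so some teams keep only the commit hash. With the rerun-if-changed line, the timestamp reflects the last time HEAD changed rather than the last compile.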

cfg_attr: Conditional Attributes / cfg_attr:条件属性

cfg_attr applies an attribute only when a condition is true. This is more targeted than #[cfg()], which includes/excludes entire items:

cfg_attr 仅当 条件为真时才应用属性。这比 #[cfg()] 更具针对性,因为后者会包含或排除整个项:

#![allow(unused)]
fn main() {
// Derive Serialize only when the "serde" feature is enabled:
// 仅当开启 "serde" 特性时才派生 Serialize:
#[cfg_attr(feature = "serde", derive(serde::Serialize, serde::Deserialize))]
#[derive(Debug, Clone)]
pub struct DiagResult {
    pub fc: u32,
    pub passed: bool,
    pub message: String,
}
// Without "serde" feature: no serde dependency needed at all
// 如果没有 "serde" 特性:则完全不需要 serde 依赖
// With "serde" feature: DiagResult is serializable
// 如果有 "serde" 特性:DiagResult 即可序列化

// Conditional attribute for testing:
// 用于测试的条件属性:
#[cfg_attr(test, derive(PartialEq))]  // Only derive PartialEq in test builds
                                      // 仅在测试构建中派生 PartialEq
pub struct LargeStruct { /* ... */ }

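// Platform-specific attributes on a foreign function. link_name is only
// valid on items declared inside an extern block:
// 平台特定的外部函数属性(link_name 只能用于 extern 块内声明的项):
extern "C" {
    #[cfg_attr(target_os = "linux", link_name = "ioctl")]
    #[cfg_attr(target_os = "freebsd", link_name = "__ioctl")]
    fn platform_ioctl(fd: i32, request: u64) -> i32;
}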
}

| Pattern / 模式 | What it does / 作用 |
| --- | --- |
| #[cfg(feature = "x")] | Include/exclude the entire item / 包含/排除整个项 |
| #[cfg_attr(feature = "x", derive(Foo))] | Add derive(Foo) only when feature "x" is on / 仅当特性 "x" 开启时添加 derive(Foo) |
| #[cfg_attr(test, allow(unused))] | Suppress warnings only in test builds / 仅在测试构建中消除警告 |
| #[cfg_attr(doc, doc = "...")] | Documentation visible only in cargo doc / 仅在使用 cargo doc 时可见的文档内容 |

cargo deny and cargo audit: Supply-Chain Security / cargo deny 与 cargo audit:供应链安全

# Install security audit tools / 安装安全审计工具
cargo install cargo-deny
cargo install cargo-audit

# Check for known vulnerabilities in dependencies / 检查依赖项中已知的安全漏洞
cargo audit

# Comprehensive checks: licenses, bans, advisories, sources / 全面检查:许可证、禁用名单、公告、来源
cargo deny check

Configure cargo deny with a deny.toml at the workspace root:

在工作空间根目录下使用 deny.toml 配置 cargo deny

# deny.toml
[advisories]
vulnerability = "deny"      # Fail on known vulnerabilities / 发现已知漏洞时报错
unmaintained = "warn"        # Warn on unmaintained crates / 对没人维护的 crate 发出警告

[licenses]
allow = ["MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause"]
deny = ["GPL-3.0"]          # Reject copyleft licenses / 拒绝 Copyleft 许可证

[bans]
multiple-versions = "warn"  # Warn if multiple versions of same crate / 如果同一个 crate 有多个版本则发出警告
deny = [
    { name = "openssl" },   # Force use of rustls instead / 强制使用 rustls 代替
]

[sources]
allow-git = []              # No git dependencies in production

| Tool / 工具 | Purpose / 用途 | When to run / 何时运行 |
| --- | --- | --- |
| cargo audit | Check for known CVEs in dependencies / 检查依赖项中已知的 CVE | CI pipeline, pre-release / CI 流水线、发布前 |
| cargo deny check | Licenses, bans, advisories, sources / 许可证、禁用、公告、来源 | CI pipeline / CI 流水线 |
| cargo deny check licenses | License compliance only / 仅检查许可证合规性 | Before open-sourcing / 开源前 |
| cargo deny check bans | Prevent specific crates / 防止引入特定 crate | Enforce architecture decisions / 强制执行架构决策 |

Doc Tests: Tests Inside Documentation / 文档测试:文档中的测试

Rust doc comments (///) can contain code blocks that are compiled and run as tests:

Rust 的文档注释(///)可以包含代码块,这些代码块会被 作为测试编译并运行

#![allow(unused)]
fn main() {
/// Parses a diagnostic fault code from a string.
/// 从字符串中解析诊断故障码。
///
/// # Examples / 示例
///
/// ```
/// use my_crate::parse_fc;
///
/// let fc = parse_fc("FC:12345").unwrap();
/// assert_eq!(fc, 12345);
/// ```
///
/// Invalid input returns an error / 无效输入会返回错误:
///
/// ```
/// use my_crate::parse_fc;
///
/// assert!(parse_fc("not-a-fc").is_err());
/// ```
pub fn parse_fc(input: &str) -> Result<u32, ParseError> {
    input.strip_prefix("FC:")
        .ok_or(ParseError::MissingPrefix)?
        .parse()
        .map_err(ParseError::InvalidNumber)
}
}
cargo test --doc  # Run only doc tests / 仅运行文档测试
cargo test        # Runs unit + integration + doc tests / 运行单元 + 集成 + 文档测试
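
The parse_fc example refers to a ParseError type that is not shown. A minimal definition that makes the function compile is sketched below. The two variants are inferred from how the function uses them, and the Display wording is an assumption.

```rust
use std::fmt;
use std::num::ParseIntError;

/// Assumed error type for parse_fc; the variants mirror the two failure
/// paths in the function body.
#[derive(Debug, PartialEq, Eq)]
pub enum ParseError {
    MissingPrefix,
    InvalidNumber(ParseIntError),
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ParseError::MissingPrefix => write!(f, "fault code must start with \"FC:\""),
            ParseError::InvalidNumber(e) => write!(f, "invalid fault code number: {e}"),
        }
    }
}

impl std::error::Error for ParseError {}

/// Parses a diagnostic fault code such as "FC:12345".
pub fn parse_fc(input: &str) -> Result<u32, ParseError> {
    input
        .strip_prefix("FC:")
        .ok_or(ParseError::MissingPrefix)?
        .parse::<u32>()
        .map_err(ParseError::InvalidNumber)
}

fn main() {
    assert_eq!(parse_fc("FC:12345"), Ok(12345));
    assert_eq!(parse_fc("not-a-fc"), Err(ParseError::MissingPrefix));
    assert!(matches!(parse_fc("FC:abc"), Err(ParseError::InvalidNumber(_))));
    println!("all fault-code checks passed");
}
```

The assertions in main are the same checks the doc tests perform. Keeping them in documentation means cargo test fails as soon as the examples drift from the implementation.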

Module-level documentation uses //! at the top of a file:

模块级文档 在文件顶部使用 //!

#![allow(unused)]
fn main() {
//! # Diagnostic Framework / 诊断框架
//!
//! This crate provides the core diagnostic execution engine.
//! 它提供了核心诊断执行引擎。
//! It supports running diagnostic tests, collecting results,
//! and reporting to the BMC via IPMI.
//! 它支持运行诊断测试、收集结果,并通过 IPMI 向 BMC 汇报。
//!
//! ## Quick Start / 快速上手
//!
//! ```no_run
//! use diag_framework::Framework;
//!
//! let mut fw = Framework::new("config.json")?;
//! fw.run_all_tests()?;
//! # Ok::<(), Box<dyn std::error::Error>>(())
//! ```
}

Benchmarking with Criterion / 使用 Criterion 进行基准测试

Full coverage / 完整内容:See the Benchmarking with criterion section in Chapter 14 (Testing and Benchmarking Patterns) for complete criterion setup, API examples, and a comparison table vs cargo bench. Below is a quick-reference for architecture-specific usage.

有关完整的 criterion 设置、API 示例以及与 cargo bench 的对比表,请参阅第 14 章(测试与基准模式)中的 使用 criterion 进行基准测试 章节。以下是针对架构特定用途的快速参考。

When benchmarking your crate’s public API, place benchmarks in benches/ and keep them focused on the hot path — typically parsers, serializers, or validation boundaries:

在对 crate 的公共 API 进行基准测试时,请将 benchmark 放在 benches/ 目录中,并专注于热点路径 —— 通常包括解析器、序列化器或校验边界:

cargo bench                  # Run all benchmarks / 运行所有基准测试
cargo bench -- parse_config  # Run specific benchmark / 运行特定的基准测试
# Results in target/criterion/ with HTML reports / 结果保存在 target/criterion/,包含 HTML 报告

Key Takeaways — Architecture & API Design / 关键要点:架构与 API 设计

  • Accept the most general type (impl Into, impl AsRef, Cow); return the most specific / 接收最通用的类型(impl Into、impl AsRef、Cow);返回最具体的类型
  • Parse Don’t Validate: use TryFrom to create types that are valid by construction / 以解析代替校验:使用 TryFrom 创建构造即有效的类型
  • #[non_exhaustive] on public enums prevents breaking changes when adding variants / 公共枚举上的 #[non_exhaustive] 标签能防止添加变体时导致破坏性变更
  • #[must_use] catches silent discards of important values / #[must_use] 能捕获对重要数值的静默丢弃

See also / 延伸阅读: Ch 10 (Error Handling Patterns) for error type design in public APIs / 了解公共 API 中的错误类型设计。Ch 14 (Testing and Benchmarking Patterns) for testing a crate's public API / 了解如何测试 crate 的公共 API。


Exercise: Crate API Refactoring ★★ (~30 min) / 练习:Crate API 重构

Refactor the following “stringly-typed” API into one that uses TryFrom, newtypes, and the builder pattern:

将以下“字符串类型化”的 API 重构为使用 TryFrom、新类型和建造者模式的 API:

// BEFORE: Easy to misuse
// 之前:容易误用
fn create_server(host: &str, port: &str, max_conn: &str) -> Server { ... }

Design a ServerConfig with validated types Host, Port (1–65535), and MaxConnections (1–10000) that reject invalid values at parse time.

设计一个 ServerConfig,包含经过校验的类型 Host、Port (1–65535) 和 MaxConnections (1–10000),并在解析阶段拒绝无效值。

🔑 Solution / 参考答案
#[derive(Debug, Clone)]
struct Host(String);

impl TryFrom<&str> for Host {
    type Error = String;
    fn try_from(s: &str) -> Result<Self, String> {
        if s.is_empty() { return Err("host cannot be empty / host 不能为空".into()); }
        if s.contains(' ') { return Err("host cannot contain spaces / host 不能包含空格".into()); }
        Ok(Host(s.to_string()))
    }
}

#[derive(Debug, Clone, Copy)]
struct Port(u16);

impl TryFrom<u16> for Port {
    type Error = String;
    fn try_from(p: u16) -> Result<Self, String> {
        if p == 0 { return Err("port must be >= 1 / 端口必须 >= 1".into()); }
        Ok(Port(p))
    }
}

#[derive(Debug, Clone, Copy)]
struct MaxConnections(u32);

impl TryFrom<u32> for MaxConnections {
    type Error = String;
    fn try_from(n: u32) -> Result<Self, String> {
        if n == 0 || n > 10_000 {
            return Err(format!("max_connections must be 1–10000, got {n} / max_connections 必须在 1-10000 之间,当前为 {n}"));
        }
        Ok(MaxConnections(n))
    }
}

#[derive(Debug)]
struct ServerConfig {
    host: Host,
    port: Port,
    max_connections: MaxConnections,
}

impl ServerConfig {
    fn new(host: Host, port: Port, max_connections: MaxConnections) -> Self {
        ServerConfig { host, port, max_connections }
    }
}

fn main() {
    let config = ServerConfig::new(
        Host::try_from("localhost").unwrap(),
        Port::try_from(8080).unwrap(),
        MaxConnections::try_from(100).unwrap(),
    );
    println!("{config:?}");

    // Invalid values caught at parse time:
    // 无效值在解析时被捕获:
    assert!(Host::try_from("").is_err());
    assert!(Port::try_from(0).is_err());
    assert!(MaxConnections::try_from(99999).is_err());
}
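
The exercise also asks for the builder pattern, which the solution above replaces with a plain constructor. One way to layer a builder on top of the same validated types is sketched here. ServerConfigBuilder and its method names are hypothetical, every setter validates immediately, and build() fails if a field was never set.

```rust
use std::convert::TryFrom;

#[derive(Debug, Clone)]
struct Host(String);
#[derive(Debug, Clone, Copy)]
struct Port(u16);
#[derive(Debug, Clone, Copy)]
struct MaxConnections(u32);

impl TryFrom<&str> for Host {
    type Error = String;
    fn try_from(s: &str) -> Result<Self, String> {
        if s.is_empty() || s.contains(' ') {
            return Err(format!("invalid host: {s:?}"));
        }
        Ok(Host(s.to_string()))
    }
}

impl TryFrom<u16> for Port {
    type Error = String;
    fn try_from(p: u16) -> Result<Self, String> {
        if p == 0 { Err("port must be >= 1".into()) } else { Ok(Port(p)) }
    }
}

impl TryFrom<u32> for MaxConnections {
    type Error = String;
    fn try_from(n: u32) -> Result<Self, String> {
        if n == 0 || n > 10_000 {
            return Err(format!("max_connections must be 1-10000, got {n}"));
        }
        Ok(MaxConnections(n))
    }
}

#[derive(Debug)]
struct ServerConfig {
    host: Host,
    port: Port,
    max_connections: MaxConnections,
}

/// Hypothetical builder: raw values go in, validated newtypes are stored.
#[derive(Default)]
struct ServerConfigBuilder {
    host: Option<Host>,
    port: Option<Port>,
    max_connections: Option<MaxConnections>,
}

impl ServerConfigBuilder {
    fn host(mut self, raw: &str) -> Result<Self, String> {
        self.host = Some(Host::try_from(raw)?);
        Ok(self)
    }

    fn port(mut self, raw: u16) -> Result<Self, String> {
        self.port = Some(Port::try_from(raw)?);
        Ok(self)
    }

    fn max_connections(mut self, raw: u32) -> Result<Self, String> {
        self.max_connections = Some(MaxConnections::try_from(raw)?);
        Ok(self)
    }

    /// Fails if any required field was never set.
    fn build(self) -> Result<ServerConfig, String> {
        Ok(ServerConfig {
            host: self.host.ok_or("host is required")?,
            port: self.port.ok_or("port is required")?,
            max_connections: self.max_connections.ok_or("max_connections is required")?,
        })
    }
}

fn main() -> Result<(), String> {
    let config = ServerConfigBuilder::default()
        .host("localhost")?
        .port(8080)?
        .max_connections(100)?
        .build()?;
    println!("{config:?}");

    assert!(ServerConfigBuilder::default().port(0).is_err());
    assert!(ServerConfigBuilder::default().build().is_err());
    Ok(())
}
```

Returning Result from each setter reports the first invalid field at the call that introduced it. An alternative design collects raw values and validates everything in build(), which allows reporting several errors at once.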

16. Async/Await Essentials / 16. Async/Await 核心要点 🔶

What you’ll learn / 你将学到:

  • How Rust’s Future trait differs from Go’s goroutines and Python’s asyncio / Rust 的 Future trait 与 Go 的 goroutine 以及 Python 的 asyncio 有何不同
  • Tokio quick-start: spawning tasks, join!, and runtime configuration / Tokio 快速上手:生成任务、join! 以及运行时的配置
  • Common async pitfalls and how to fix them / 常见的异步陷阱及其解决方法
  • When to offload blocking work with spawn_blocking / 何时使用 spawn_blocking 来卸载阻塞性工作

Futures, Runtimes, and async fn / Future、运行时与 async fn

Rust’s async model is fundamentally different from Go’s goroutines or Python’s asyncio. Understanding three concepts is enough to get started:

Rust 的异步模型与 Go 的 goroutine 或 Python 的 asyncio 有着 本质上的不同。了解以下三个概念就足以入门:

  1. A Future is a lazy state machine / Future 是一个惰性状态机: calling an async fn doesn't execute anything; it returns a Future that must be polled. 调用 async fn 不会执行任何操作;它会返回一个必须被轮询(poll)的 Future。
  2. You need a runtime to poll futures / 你需要一个运行时来轮询 future: tokio, async-std, or smol. The standard library defines Future but provides no runtime. 例如 tokio、async-std 或 smol。标准库定义了 Future 但不提供运行时。
  3. async fn is sugar / async fn 是语法糖: the compiler transforms it into a state machine that implements Future. 编译器会将其转换为实现 Future 的状态机。
#![allow(unused)]
fn main() {
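// A Future is just a trait (defined in std::future; reproduced here for reference):
// Future 只是一个 trait(定义于 std::future,此处仅作参考):
use std::pin::Pin;
use std::task::{Context, Poll};

pub trait Future {
    type Output;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output>;
}

// async fn desugars (roughly) to:
// async fn 大致会脱糖(desugar)为:
// fn fetch_data(url: &str) -> impl Future<Output = Result<Vec<u8>, reqwest::Error>> + '_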
async fn fetch_data(url: &str) -> Result<Vec<u8>, reqwest::Error> {
    let response = reqwest::get(url).await?;  // .await yields until ready
                                              // .await 会在未就绪时让出控制权
    let bytes = response.bytes().await?;
    Ok(bytes.to_vec())
}
}
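
The three points above can be observed without any async runtime. The sketch below uses only the standard library: Countdown is a hand-written Future, launch is an async fn that awaits it, and block_on is a deliberately naive toy executor (a busy loop with a no-op waker). All three names are made up for this illustration, and a real runtime such as tokio parks the thread instead of spinning.

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// A hand-written future that returns Pending `remaining` times, then Ready.
struct Countdown {
    remaining: u32,
}

impl Future for Countdown {
    type Output = &'static str;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        if self.remaining == 0 {
            Poll::Ready("liftoff")
        } else {
            self.remaining -= 1;
            cx.waker().wake_by_ref(); // ask the executor to poll again
            Poll::Pending
        }
    }
}

/// Waker that does nothing; the toy executor below re-polls unconditionally.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Toy executor: polls until Ready and reports how many polls it took.
fn block_on<F: Future>(fut: F) -> (F::Output, u32) {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return (out, polls);
        }
    }
}

/// The body runs only when the returned future is polled.
async fn launch(started: &Cell<bool>) -> &'static str {
    started.set(true);
    Countdown { remaining: 3 }.await
}

fn main() {
    let started = Cell::new(false);
    let fut = launch(&started); // creates the state machine, runs nothing
    assert!(!started.get());

    let (msg, polls) = block_on(fut);
    assert!(started.get());
    println!("{msg} after {polls} polls");
}
```

Each Pending from Countdown propagates up through launch to the executor, which is why the outer future is polled once per inner poll.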

Tokio Quick Start / Tokio 快速上手

# Cargo.toml
[dependencies]
tokio = { version = "1", features = ["full"] }
use tokio::time::{sleep, Duration};
use tokio::task;

#[tokio::main]
async fn main() {
    // Spawn concurrent tasks (like lightweight threads):
    // 生成并发任务(就像轻量级线程):
    let handle_a = task::spawn(async {
        sleep(Duration::from_millis(100)).await;
        "task A done"
    });

    let handle_b = task::spawn(async {
        sleep(Duration::from_millis(50)).await;
        "task B done"
    });

    // .await both — they run concurrently, not sequentially:
    // 等待(.await)两者 —— 它们是并发运行的,而不是顺序运行:
    let (a, b) = tokio::join!(handle_a, handle_b);
    println!("{}, {}", a.unwrap(), b.unwrap());
}

Async Common Pitfalls / 异步常见陷阱

| Pitfall / 陷阱 | Why It Happens / 原因 | Fix / 解决方法 |
| --- | --- | --- |
| Blocking in async / 在异步中阻塞 | std::thread::sleep or CPU work blocks the executor / std::thread::sleep 或 CPU 运算阻塞了执行器 | Use tokio::task::spawn_blocking or rayon / 使用 tokio::task::spawn_blocking 或 rayon |
| Send bound errors / Send 约束错误 | A value held across .await is !Send (e.g., Rc, MutexGuard), which makes the whole future !Send / 跨 .await 持有的值是 !Send 类型(如 Rc、MutexGuard),导致整个 future 不满足 Send | Restructure to drop non-Send values before .await / 重新调整结构,在 .await 前丢弃非 Send 值 |
| Future not polled / Future 未被轮询 | Calling async fn without .await or spawning: nothing happens / 调用 async fn 后未进行 .await 或 spawn,导致没有任何反应 | Always .await or tokio::spawn the returned future / 始终对返回的 future 进行 .await 或使用 tokio::spawn |
| Holding MutexGuard across .await / 跨 .await 持有 MutexGuard | std::sync::MutexGuard is !Send; async tasks may resume on a different thread / std::sync::MutexGuard 是 !Send;异步任务可能会在不同线程恢复执行 | Use tokio::sync::Mutex or drop the guard before .await / 使用 tokio::sync::Mutex 或在 .await 前丢弃 guard |
| Accidental sequential execution / 意外的顺序执行 | let a = foo().await; let b = bar().await; runs sequentially / let a = foo().await; let b = bar().await; 会依次运行 | Use tokio::join! or tokio::spawn for concurrency / 使用 tokio::join! 或 tokio::spawn 实现并发 |

#![allow(unused)]
fn main() {
// ❌ Blocking the async executor:
// ❌ 阻塞异步执行器:
async fn bad() {
    std::thread::sleep(std::time::Duration::from_secs(5)); // Blocks entire thread!
                                                        // 会阻塞整个线程!
}

// ✅ Offload blocking work:
// ✅ 卸载阻塞性工作:
async fn good() {
    tokio::task::spawn_blocking(|| {
        std::thread::sleep(std::time::Duration::from_secs(5)); // Runs on blocking pool
                                                               // 在阻塞池中运行
    }).await.unwrap();
}
}

Spawning and Structured Concurrency / 任务生成与结构化并发

Tokio’s spawn creates a new asynchronous task — similar to thread::spawn but much lighter:

Tokio 的 spawn 会创建一个新的异步任务 —— 类似于 thread::spawn 但要轻量得多:

use tokio::task;
use tokio::time::{sleep, Duration};

#[tokio::main]
async fn main() {
    // Spawn three concurrent tasks
    // 生成三个并发任务
    let h1 = task::spawn(async {
        sleep(Duration::from_millis(200)).await;
        "fetched user profile"
    });

    let h2 = task::spawn(async {
        sleep(Duration::from_millis(100)).await;
        "fetched order history"
    });

    let h3 = task::spawn(async {
        sleep(Duration::from_millis(150)).await;
        "fetched recommendations"
    });

    // Wait for all three concurrently (not sequentially!)
    // 同时等待这三个任务(并发而非顺序等待!)
    let (r1, r2, r3) = tokio::join!(h1, h2, h3);
    println!("{}", r1.unwrap());
    println!("{}", r2.unwrap());
    println!("{}", r3.unwrap());
}

join! vs try_join! vs select!

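| Macro / 宏 | Behavior / 行为 | Use when / 适用场景 |
| --- | --- | --- |
| join! | Waits for ALL futures / 等待所有 future | All tasks must complete / 所有任务都必须完成 |
| try_join! | Waits for all, short-circuits on first Err / 等待所有任务,遇错则立即短路 | Tasks return Result / 任务返回 Result |
| select! | Returns when FIRST future completes / 第一个 future 完成时即返回 | Timeouts, cancellation / 超时、取消操作 |
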
use tokio::time::{timeout, Duration};

async fn fetch_with_timeout() -> Result<String, Box<dyn std::error::Error>> {
    let result = timeout(Duration::from_secs(5), async {
        // Simulate slow network call
        // 模拟慢速网络调用
        tokio::time::sleep(Duration::from_millis(100)).await;
        Ok::<_, Box<dyn std::error::Error>>("data".to_string())
    }).await??; // First ? unwraps Elapsed, second ? unwraps inner Result
                // 第一个 ? 解包 Elapsed 超时,第二个 ? 解包内部 Result
    Ok(result)
}

Send Bounds and Why Futures Must Be Send / Send 约束以及为何 Future 必须满足 Send

When you tokio::spawn a future, it may resume on a different OS thread. This means the future must be Send. Common pitfalls:

当你通过 tokio::spawn 生成一个 future 时,它可能会在不同的系统线程上恢复执行。这意味着此 future 必须满足 Send。常见的陷阱有:

use std::rc::Rc;

async fn not_send() {
    let rc = Rc::new(42); // Rc is !Send / Rc 是非 Send 的
    tokio::time::sleep(std::time::Duration::from_millis(10)).await;
    println!("{}", rc); // rc is held across .await — future is !Send
                         // rc 在 .await 期间被持有 —— future 为非 Send
}

// Fix 1: Drop before .await / 解决方法 1:在 .await 前丢弃
async fn fixed_drop() {
    let data = {
        let rc = Rc::new(42);
        *rc // Copy the value out / 将值拷贝出来
    }; // rc dropped here / rc 在此处被丢弃
    tokio::time::sleep(std::time::Duration::from_millis(10)).await;
    println!("{}", data); // Just an i32, which is Send / 只是一个满足 Send 的 i32
}

// Fix 2: Use Arc instead of Rc / 解决方法 2:使用 Arc 代替 Rc
async fn fixed_arc() {
    let arc = std::sync::Arc::new(42); // Arc is Send / Arc 是 Send 的
    tokio::time::sleep(std::time::Duration::from_millis(10)).await;
    println!("{}", arc); // ✅ Future is Send / ✅ Future 满足 Send
}
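
The Send requirement is checked at compile time, so it can be demonstrated without tokio. In the sketch below, assert_send is a hypothetical helper carrying the same Send bound that tokio::spawn places on its argument, std::future::ready stands in for a real await point, and poll_once drives each future a single time with a no-op waker. Uncommenting the marked line should produce a compile error that names Rc.

```rust
use std::future::{ready, Future};
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical helper: accepts only futures that are Send.
fn assert_send<F: Future + Send>(fut: F) -> F {
    fut
}

async fn not_send() -> i32 {
    let rc = Rc::new(42);
    ready(()).await; // rc is still alive here, so the future is !Send
    *rc
}

async fn fixed_drop() -> i32 {
    let data = {
        let rc = Rc::new(42);
        *rc // copy the i32 out; rc is dropped at the end of this block
    };
    ready(()).await;
    data
}

async fn fixed_arc() -> i32 {
    let arc = Arc::new(42);
    ready(()).await; // Arc<i32> is Send, so holding it is fine
    *arc
}

/// Waker that does nothing; enough for futures that finish on the first poll.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Polls a future exactly once on the current thread.
fn poll_once<F: Future>(fut: F) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    let result = fut.as_mut().poll(&mut cx);
    result
}

fn main() {
    assert_eq!(poll_once(assert_send(fixed_drop())), Poll::Ready(42));
    assert_eq!(poll_once(assert_send(fixed_arc())), Poll::Ready(42));

    // assert_send(not_send()); // does not compile: Rc<i32> is held across .await

    // A !Send future is still fine when it never leaves the current thread:
    assert_eq!(poll_once(not_send()), Poll::Ready(42));
}
```

A helper like assert_send is also useful in unit tests: it turns "this future must stay spawnable" into a compile-time check instead of a surprise at the first tokio::spawn call site.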

Comprehensive async coverage / 异步内容的全面覆盖:For Stream, select!, cancellation safety, structured concurrency, and tower middleware, see our dedicated Async Rust Training guide. This section covers just enough to read and write basic async code.

有关 Stream、select!、取消安全性(cancellation safety)、结构化并发以及 tower 中间件的详细内容,请参阅我们专门的 Async Rust 进阶指南。本节仅涵盖阅读和编写基础异步代码所需的知识。

See also / 延伸阅读: Ch 05 (Channels) for synchronous channels / 了解同步通道。Ch 06 (Concurrency) for OS threads vs async tasks / 了解操作系统线程与异步任务的对比。

Key Takeaways — Async / 关键要点:异步

  • async fn returns a lazy Future — nothing runs until you .await or spawn it / async fn 返回一个惰性的 Future —— 除非对其进行 .await 或 spawn 否则什么都不会发生
  • Use tokio::task::spawn_blocking for CPU-heavy or blocking work inside async contexts / 在异步上下文中使用 tokio::task::spawn_blocking 来处理 CPU 密集型或阻塞型工作
  • Don’t hold std::sync::MutexGuard across .await — use tokio::sync::Mutex instead / 不要跨 .await 持有 std::sync::MutexGuard —— 请改用 tokio::sync::Mutex
  • Futures must be Send when spawned — drop !Send types before .await points / 被 spawn 的 Future 必须满足 Send —— 在 .await 点之前丢弃所有非 Send 类型

Exercise: Concurrent Fetcher with Timeout ★★ (~25 min) / 练习:带有超时的并发获取器

Write an async function fetch_all that spawns three tokio::spawn tasks, each simulating a network call with tokio::time::sleep. Join all three with tokio::try_join! wrapped in tokio::time::timeout(Duration::from_secs(5), ...). Return Result<Vec<String>, ...> or an error if any task fails or the deadline expires.

编写一个异步函数 fetch_all,使用 tokio::spawn 生成三个任务,每个任务都使用 tokio::time::sleep 模拟网络调用。使用 tokio::try_join! 同时等待这三个任务,并将其包装在 tokio::time::timeout(Duration::from_secs(5), ...) 中。如果任何一个任务失败或超过截止时间,则返回错误,否则返回 Result<Vec<String>, ...>

🔑 Solution / 参考答案
use tokio::time::{sleep, timeout, Duration};

async fn fake_fetch(name: &'static str, delay_ms: u64) -> Result<String, String> {
    sleep(Duration::from_millis(delay_ms)).await;
    Ok(format!("{name}: OK"))
}

async fn fetch_all() -> Result<Vec<String>, Box<dyn std::error::Error>> {
    let deadline = Duration::from_secs(5);

    let (a, b, c) = timeout(deadline, async {
        let h1 = tokio::spawn(fake_fetch("svc-a", 100));
        let h2 = tokio::spawn(fake_fetch("svc-b", 200));
        let h3 = tokio::spawn(fake_fetch("svc-c", 150));
        tokio::try_join!(h1, h2, h3)
    })
    .await??; // First ? for Timeout, second ? for JoinError / 第一个 ? 处理超时,第二个 ? 处理 JoinError

    Ok(vec![a?, b?, c?]) // Propagate any inner Result errors / 传播任何内部 Result 错误
}

#[tokio::main]
async fn main() {
    let results = fetch_all().await.unwrap();
    // Print results / 打印结果
    for r in &results {
        println!("{r}");
    }
}

17. Exercises / 17. 练习

Exercises / 练习

Exercise 1: Type-Safe State Machine ★★ (~30 min) / 练习 1:类型安全的状态机

Build a traffic light state machine using the type-state pattern. The light must transition Red → Green → Yellow → Red and no other order should be possible.

使用类型状态(type-state)模式构建一个红绿灯状态机。该灯必须遵循 红 → 绿 → 黄 → 红 的切换顺序,且不应允许任何其他顺序。

🔑 Solution / 参考答案
use std::marker::PhantomData;

struct Red;
struct Green;
struct Yellow;

struct TrafficLight<State> {
    _state: PhantomData<State>,
}

impl TrafficLight<Red> {
    fn new() -> Self {
        println!("🔴 Red — STOP / 红灯 —— 停止");
        TrafficLight { _state: PhantomData }
    }

    fn go(self) -> TrafficLight<Green> {
        println!("🟢 Green — GO / 绿灯 —— 行进");
        TrafficLight { _state: PhantomData }
    }
}

impl TrafficLight<Green> {
    fn caution(self) -> TrafficLight<Yellow> {
        println!("🟡 Yellow — CAUTION / 黄灯 —— 注意");
        TrafficLight { _state: PhantomData }
    }
}

impl TrafficLight<Yellow> {
    fn stop(self) -> TrafficLight<Red> {
        println!("🔴 Red — STOP / 红灯 —— 停止");
        TrafficLight { _state: PhantomData }
    }
}

fn main() {
    let light = TrafficLight::new(); // Red / 红灯
    let light = light.go();          // Green / 绿灯
    let light = light.caution();     // Yellow / 黄灯
    let light = light.stop();        // Red / 红灯

    // light.caution(); // ❌ Compile error: no method `caution` on Red
                        // ❌ 编译错误:Red 类型没有 `caution` 方法
    // TrafficLight::new().stop(); // ❌ Compile error: no method `stop` on Red
                                   // ❌ 编译错误:Red 类型没有 `stop` 方法
}

Key takeaway / 关键要点:Invalid transitions are compile errors, not runtime panics. / 无效的转换会导致编译错误,而不是运行时 panic。


Exercise 2: Unit-of-Measure with PhantomData ★★ (~30 min) / 练习 2:结合 PhantomData 的计量单位

Extend the unit-of-measure pattern from Ch 4 to support:

扩展第 4 章中的计量单位模式,以支持:

  • Meters, Seconds, Kilograms / 米、秒、千克
  • Addition of same units / 相同单位的加法
  • Multiplication: Meters * Meters = SquareMeters / 乘法:米 * 米 = 平方米
  • Division: Meters / Seconds = MetersPerSecond / 除法:米 / 秒 = 米/秒
🔑 Solution / 参考答案
use std::marker::PhantomData;
use std::ops::{Add, Mul, Div};

#[derive(Clone, Copy)]
struct Meters;
#[derive(Clone, Copy)]
struct Seconds;
#[derive(Clone, Copy)]
struct Kilograms;
#[derive(Clone, Copy)]
struct SquareMeters;
#[derive(Clone, Copy)]
struct MetersPerSecond;

#[derive(Debug, Clone, Copy)]
struct Qty<U> {
    value: f64,
    _unit: PhantomData<U>,
}

impl<U> Qty<U> {
    fn new(v: f64) -> Self { Qty { value: v, _unit: PhantomData } }
}

impl<U> Add for Qty<U> {
    type Output = Qty<U>;
    fn add(self, rhs: Self) -> Self::Output { Qty::new(self.value + rhs.value) }
}

impl Mul<Qty<Meters>> for Qty<Meters> {
    type Output = Qty<SquareMeters>;
    fn mul(self, rhs: Qty<Meters>) -> Qty<SquareMeters> {
        Qty::new(self.value * rhs.value)
    }
}

impl Div<Qty<Seconds>> for Qty<Meters> {
    type Output = Qty<MetersPerSecond>;
    fn div(self, rhs: Qty<Seconds>) -> Qty<MetersPerSecond> {
        Qty::new(self.value / rhs.value)
    }
}

fn main() {
    let width = Qty::<Meters>::new(5.0);
    let height = Qty::<Meters>::new(3.0);
    let area = width * height; // Qty<SquareMeters> / 平方米
    println!("Area: {:.1} m²", area.value);

    let dist = Qty::<Meters>::new(100.0);
    let time = Qty::<Seconds>::new(9.58);
    let speed = dist / time; // MetersPerSecond / 米每秒
    println!("Speed: {:.2} m/s", speed.value);

    let sum = width + height; // Same unit ✅ / 相同单位 ✅
    println!("Sum: {:.1} m", sum.value);

    // let bad = width + time; // ❌ Compile error: can't add Meters + Seconds
                               // ❌ 编译错误:无法将“米”与“秒”相加
}

Exercise 3: Channel-Based Worker Pool ★★★ (~45 min) / 练习 3:基于通道的线程池(Worker Pool)

Build a worker pool using channels where:

使用通道构建一个工作线程池,要求:

  • A dispatcher sends Job structs through a channel / 调度器(Dispatcher)通过通道发送 Job 结构体
  • N workers consume jobs and send results back / N 个工作线程(Worker)消费任务并将结果发回
  • Use crossbeam-channel (or std::sync::mpsc if crossbeam is unavailable) / 使用 crossbeam-channel(如果不可用,请使用 std::sync::mpsc)
🔑 Solution / 参考答案 (Ex 3)
use std::sync::mpsc;
use std::thread;

struct Job {
    id: u64,
    data: String,
}

struct JobResult {
    job_id: u64,
    output: String,
    worker_id: usize,
}

fn worker_pool(jobs: Vec<Job>, num_workers: usize) -> Vec<JobResult> {
    let (job_tx, job_rx) = mpsc::channel::<Job>();
    let (result_tx, result_rx) = mpsc::channel::<JobResult>();

    // Wrap receiver in Arc<Mutex> for sharing among workers
    let job_rx = std::sync::Arc::new(std::sync::Mutex::new(job_rx));

    // Spawn workers
    let mut handles = Vec::new();
    for worker_id in 0..num_workers {
        let job_rx = job_rx.clone();
        let result_tx = result_tx.clone();
        handles.push(thread::spawn(move || {
            loop {
                // Hold the lock only while receiving; the guard drops at the end
                // of this block, before the job is processed.
                // 仅在接收时持有锁;守卫在该代码块结束时释放,早于任务处理
                let job = {
                    let rx = job_rx.lock().unwrap();
                    rx.recv() // Blocks until a job or channel closes / 阻塞直至获取任务或通道关闭
                };
                match job {
                    Ok(job) => {
                        let output = format!("processed '{}' by worker {worker_id}", job.data);
                        result_tx.send(JobResult {
                            job_id: job.id,
                            output,
                            worker_id,
                        }).unwrap();
                    }
                    Err(_) => break, // Channel closed — exit / 通道已关闭 —— 退出
                }
            }
        }));
    }
    drop(result_tx); // Drop our copy so result channel closes when workers finish
                     // 丢弃我们手中的副本,以便在所有工作线程完成时关闭结果通道

    // Dispatch jobs / 调度任务
    let num_jobs = jobs.len();
    for job in jobs {
        job_tx.send(job).unwrap();
    }
    drop(job_tx); // Close the job channel — workers will exit after draining
                  // 关闭任务通道 —— 工作线程在处理完剩余任务后将退出

    // Collect results / 收集结果
    let mut results = Vec::new();
    for result in result_rx {
        results.push(result);
    }
    assert_eq!(results.len(), num_jobs);

    for h in handles { h.join().unwrap(); }
    results
}

fn main() {
    let jobs: Vec<Job> = (0..20).map(|i| Job {
        id: i,
        data: format!("task-{i}"),
    }).collect();

    let results = worker_pool(jobs, 4);
    for r in &results {
        println!("[worker {}] job {}: {}", r.worker_id, r.job_id, r.output);
        println!("[工作线程 {}] 任务 {}: {}", r.worker_id, r.job_id, r.output);
    }
}

Exercise 4: Higher-Order Combinator Pipeline ★★ (~25 min) / 练习 4:高阶组合器流水线

Create a Pipeline struct that chains transformations. It should support .pipe(f) to add a transformation and .execute(input) to run the full chain.

创建一个 Pipeline 结构体用于串联各种转换操作(transformations)。它应该支持 .pipe(f) 方法来添加转换步骤,以及 .execute(input) 方法来运行完整的流水线链。

🔑 Solution / 参考答案 (Ex 4)
struct Pipeline<T> {
    transforms: Vec<Box<dyn Fn(T) -> T>>,
}

impl<T: 'static> Pipeline<T> {
    fn new() -> Self {
        Pipeline { transforms: Vec::new() }
    }

    fn pipe(mut self, f: impl Fn(T) -> T + 'static) -> Self {
        self.transforms.push(Box::new(f));
        self
    }

    fn execute(self, input: T) -> T {
        self.transforms.into_iter().fold(input, |val, f| f(val))
    }
}

fn main() {
    let result = Pipeline::new()
        .pipe(|s: String| s.trim().to_string())
        .pipe(|s| s.to_uppercase())
        .pipe(|s| format!(">>> {s} <<<"))
        .execute("  hello world  ".to_string());

    println!("{result}"); // >>> HELLO WORLD <<<

    // Numeric pipeline:
    let result = Pipeline::new()
        .pipe(|x: i32| x * 2)
        .pipe(|x| x + 10)
        .pipe(|x| x * x)
        .execute(5);

    println!("{result}"); // (5*2 + 10)^2 = 400
}

Bonus / 加分项:Generic pipeline that changes type between stages would use a different design — each .pipe() returns a Pipeline with a different output type (this requires more advanced generic plumbing).

能够在各个阶段改变类型的泛型流水线需要不同的设计 —— 每一个 .pipe() 都返回一个具有不同输出类型的 Pipeline(这需要更高级的泛型技巧)。
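
One possible shape for the bonus design is sketched below. It assumes a hypothetical `TypedPipeline<I, O>` (not a type from this book) that stores the whole chain as a single boxed closure, so each `.pipe()` call can return a pipeline with a new output type. Treat it as an illustration of the idea rather than a reference solution.

```rust
/// A pipeline from `I` to `O`, stored as one boxed closure.
/// `TypedPipeline` is a hypothetical name used only for this sketch.
struct TypedPipeline<I, O> {
    run: Box<dyn Fn(I) -> O>,
}

impl<I: 'static> TypedPipeline<I, I> {
    /// Start with the identity transformation.
    fn new() -> Self {
        TypedPipeline { run: Box::new(|x: I| x) }
    }
}

impl<I: 'static, O: 'static> TypedPipeline<I, O> {
    /// Append a stage; the output type changes from `O` to `N`.
    fn pipe<N: 'static>(self, f: impl Fn(O) -> N + 'static) -> TypedPipeline<I, N> {
        let prev = self.run;
        TypedPipeline { run: Box::new(move |x: I| f(prev(x))) }
    }

    /// Run the composed chain on one input.
    fn execute(&self, input: I) -> O {
        (self.run)(input)
    }
}

fn demo_typed_pipeline() -> usize {
    TypedPipeline::<String, String>::new()
        .pipe(|s| s.trim().to_string()) // String -> String
        .pipe(|s| s.len())              // String -> usize
        .pipe(|n| n * 2)                // usize  -> usize
        .execute("  hello  ".to_string())
}
```

The trade-off in this sketch is one extra heap allocation per stage and nested dynamic calls; a fully generic version without boxing is possible but needs a type parameter per stage.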


Exercise 5: Error Hierarchy with thiserror ★★ (~30 min) / 练习 5:使用 thiserror 构建错误层级

Design an error type hierarchy for a file-processing application that can fail during I/O, parsing (JSON and CSV), and validation. Use thiserror and demonstrate ? propagation.

为一个文件处理应用程序设计一个错误类型层级。该程序可能会在 I/O、解析(JSON 和 CSV)以及校验阶段失败。请使用 thiserror 库并展示 ? 操作符的传播。

🔑 Solution / 参考答案 (Ex 5)
use thiserror::Error;

#[derive(Error, Debug)]
pub enum AppError {
    #[error("I/O error: {0}")]
    Io(#[from] std::io::Error),

    #[error("JSON parse error: {0}")]
    Json(#[from] serde_json::Error),

    #[error("CSV error at line {line}: {message}")]
    Csv { line: usize, message: String },

    #[error("validation error: {field} — {reason}")]
    Validation { field: String, reason: String },
}

fn read_file(path: &str) -> Result<String, AppError> {
    Ok(std::fs::read_to_string(path)?) // io::Error → AppError::Io via #[from]
}

fn parse_json(content: &str) -> Result<serde_json::Value, AppError> {
    Ok(serde_json::from_str(content)?) // serde_json::Error → AppError::Json
}

fn validate_name(value: &serde_json::Value) -> Result<String, AppError> {
    let name = value.get("name")
        .and_then(|v| v.as_str())
        .ok_or_else(|| AppError::Validation {
            field: "name".into(),
            reason: "must be a non-null string".into(),
        })?;

    if name.is_empty() {
        return Err(AppError::Validation {
            field: "name".into(),
            reason: "must not be empty".into(),
        });
    }

    Ok(name.to_string())
}

fn process_file(path: &str) -> Result<String, AppError> {
    let content = read_file(path)?;
    let json = parse_json(&content)?;
    let name = validate_name(&json)?;
    Ok(name)
}

fn main() {
    match process_file("config.json") {
        Ok(name) => println!("Name: {name}"),
        Err(e) => eprintln!("Error: {e}"),
    }
}

Exercise 6: Generic Trait with Associated Types ★★★ (~40 min) / 练习 6:带有关联类型的泛型 Trait

Design a Repository trait with associated Item, Id, and Error types. Implement it for an in-memory store and demonstrate compile-time type safety.

设计一个 Repository trait,包含关联类型 Item、Id 和 Error。为一个内存存储(in-memory store)实现该 trait,并展示编译时类型安全性。

🔑 Solution / 参考答案 (Ex 6)
use std::collections::HashMap;

trait Repository {
    type Item;
    type Id;
    type Error;

    fn get(&self, id: &Self::Id) -> Result<Option<&Self::Item>, Self::Error>;
    fn insert(&mut self, item: Self::Item) -> Result<Self::Id, Self::Error>;
    fn delete(&mut self, id: &Self::Id) -> Result<bool, Self::Error>;
}

#[derive(Debug, Clone)]
struct User {
    name: String,
    email: String,
}

struct InMemoryUserRepo {
    data: HashMap<u64, User>,
    next_id: u64,
}

impl InMemoryUserRepo {
    fn new() -> Self {
        InMemoryUserRepo { data: HashMap::new(), next_id: 1 }
    }
}

// Error type is Infallible — in-memory ops never fail
// 错误类型为 Infallible —— 内存操作永远不会失败
impl Repository for InMemoryUserRepo {
    type Item = User;
    type Id = u64;
    type Error = std::convert::Infallible;

    fn get(&self, id: &u64) -> Result<Option<&User>, Self::Error> {
        Ok(self.data.get(id))
    }

    fn insert(&mut self, item: User) -> Result<u64, Self::Error> {
        let id = self.next_id;
        self.next_id += 1;
        self.data.insert(id, item);
        Ok(id)
    }

    fn delete(&mut self, id: &u64) -> Result<bool, Self::Error> {
        Ok(self.data.remove(id).is_some())
    }
}

// Generic function works with ANY repository:
// 泛型函数适用于任何 Repository:
fn create_and_fetch<R: Repository>(repo: &mut R, item: R::Item) -> Result<(), R::Error>
where
    R::Item: std::fmt::Debug,
    R::Id: std::fmt::Debug,
{
    let id = repo.insert(item)?;
    println!("Inserted with id: {id:?}");
    let retrieved = repo.get(&id)?;
    println!("Retrieved: {retrieved:?}");
    Ok(())
}

fn main() {
    let mut repo = InMemoryUserRepo::new();
    create_and_fetch(&mut repo, User {
        name: "Alice".into(),
        email: "alice@example.com".into(),
    }).unwrap();
}

Exercise 7: Safe Wrapper around Unsafe (Ch 12) ★★★ (~45 min) / 练习 7:Unsafe 的安全封装(第 12 章)

Write a FixedVec<T, const N: usize> — a fixed-capacity, stack-allocated vector. Requirements:

编写一个 FixedVec<T, const N: usize> —— 一个固定容量且分配在栈上的向量。要求如下:

  • push(&mut self, value: T) -> Result<(), T> returns Err(value) when full / 满时返回 Err(value)
  • pop(&mut self) -> Option<T> returns and removes the last element / 返回并移除最后一个元素
  • as_slice(&self) -> &[T] borrows initialized elements / 借用已初始化的元素
  • All public methods must be safe; all unsafe must be encapsulated with SAFETY: comments / 所有公共方法必须安全;所有 unsafe 必须带有 SAFETY: 注释封装
  • Drop must clean up initialized elements / Drop 必须清理已初始化的元素

Hint / 提示:Use MaybeUninit<T> and [const { MaybeUninit::uninit() }; N].

🔑 Solution / 参考答案 (Ex 7)
use std::mem::MaybeUninit;

pub struct FixedVec<T, const N: usize> {
    data: [MaybeUninit<T>; N],
    len: usize,
}

impl<T, const N: usize> FixedVec<T, N> {
    pub fn new() -> Self {
        FixedVec {
            data: [const { MaybeUninit::uninit() }; N],
            len: 0,
        }
    }

    pub fn push(&mut self, value: T) -> Result<(), T> {
        if self.len >= N { return Err(value); }
        // No unsafe needed: len < N keeps the index in bounds, and assigning to a
        // MaybeUninit<T> slot never runs T's destructor on the old contents.
        self.data[self.len] = MaybeUninit::new(value);
        self.len += 1;
        Ok(())
    }

    pub fn pop(&mut self) -> Option<T> {
        if self.len == 0 { return None; }
        self.len -= 1;
        // SAFETY: data[len] was initialized (len was > 0 before decrement).
        Some(unsafe { self.data[self.len].assume_init_read() })
    }

    pub fn as_slice(&self) -> &[T] {
        // SAFETY: data[0..len] are all initialized, and MaybeUninit<T>
        // has the same layout as T.
        unsafe { std::slice::from_raw_parts(self.data.as_ptr() as *const T, self.len) }
    }

    pub fn len(&self) -> usize { self.len }
    pub fn is_empty(&self) -> bool { self.len == 0 }
}

impl<T, const N: usize> Drop for FixedVec<T, N> {
    fn drop(&mut self) {
        // SAFETY: data[0..len] are initialized — drop each one.
        for i in 0..self.len {
            unsafe { self.data[i].assume_init_drop(); }
        }
    }
}

fn main() {
    let mut v = FixedVec::<String, 4>::new();
    v.push("hello".into()).unwrap();
    v.push("world".into()).unwrap();
    assert_eq!(v.as_slice(), &["hello", "world"]);
    assert_eq!(v.pop(), Some("world".into()));
    assert_eq!(v.len(), 1);
    // Drop cleans up remaining "hello"
}

Exercise 8: Declarative Macro — map! (Ch 13) ★ (~15 min) / 练习 8:声明式宏 —— map!(第 13 章)

Write a map! macro that creates a HashMap from key-value pairs, similar to vec![]:

编写一个 map! 宏,用于从键值对创建 HashMap,类似于 vec![]

#![allow(unused)]
fn main() {
let m = map! {
    "host" => "localhost",
    "port" => "8080",
};
assert_eq!(m.get("host"), Some(&"localhost"));
assert_eq!(m.len(), 2);
}

Requirements / 要求:

  • Support trailing comma / 支持尾随逗号
  • Support empty invocation map!{} / 支持空调用 map!{}
  • Work with any types that implement Into<K> and Into<V> for maximum flexibility / 为了最大灵活性,使其支持实现了 Into<K> 和 Into<V> 的任何类型
🔑 Solution / 参考答案 (Ex 8)
macro_rules! map {
    // Empty case
    () => {
        std::collections::HashMap::new()
    };
    // One or more key => value pairs (trailing comma optional)
    ( $( $key:expr => $val:expr ),+ $(,)? ) => {{
        let mut m = std::collections::HashMap::new();
        $( m.insert($key, $val); )+
        m
    }};
}

fn main() {
    // Basic usage:
    let config = map! {
        "host" => "localhost",
        "port" => "8080",
        "timeout" => "30",
    };
    assert_eq!(config.len(), 3);
    assert_eq!(config["host"], "localhost");

    // Empty map:
    let empty: std::collections::HashMap<String, String> = map!();
    assert!(empty.is_empty());

    // Different types:
    let scores = map! {
        1 => 100,
        2 => 200,
    };
    assert_eq!(scores[&1], 100);
}
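
The solution above inserts keys and values as written. The Into<K> / Into<V> requirement could be met with a variant along the following lines. The name `map_into!` is hypothetical, and the sketch assumes the caller supplies the target `HashMap<K, V>` type (here through a function return type), because `.into()` alone gives the compiler nothing to infer K and V from.

```rust
use std::collections::HashMap;

// Hypothetical variant of `map!` that converts every key and value with `.into()`.
macro_rules! map_into {
    // Empty case
    () => {
        ::std::collections::HashMap::new()
    };
    // One or more key => value pairs (trailing comma optional)
    ( $( $key:expr => $val:expr ),+ $(,)? ) => {{
        let mut m = ::std::collections::HashMap::new();
        $( m.insert($key.into(), $val.into()); )+
        m
    }};
}

fn demo_map_into() -> HashMap<String, u64> {
    // The return type supplies K = String and V = u64, so &str keys and
    // mixed-width unsigned values are all converted on insertion.
    map_into! {
        "host" => 80u32,
        "port" => 8080u16,
    }
}
```

The cost of the extra flexibility is that an un-annotated `let m = map_into! { ... };` no longer compiles, which is presumably why the plain `map!` skips the conversion.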

Exercise 9: Custom serde Deserialization (Ch 11) ★★★ (~45 min) / 练习 9:自定义 serde 反序列化(第 11 章)

Design a Duration wrapper that deserializes from human-readable strings like "30s", "5m", "2h" using a custom serde deserializer. The struct should also serialize back to the same format.

设计一个 Duration 包装器,使用自定义 serde 反序列化器从人类可读的字符串(如 "30s"、"5m"、"2h")中进行反序列化。该结构体还应当能序列化回相同的格式。

🔑 Solution / 参考答案 (Ex 9)
use serde::{Deserialize, Deserializer, Serialize, Serializer};
use std::fmt;

#[derive(Debug, Clone, PartialEq)]
struct HumanDuration(std::time::Duration);

impl HumanDuration {
    fn from_str(s: &str) -> Result<Self, String> {
        let s = s.trim();
        if s.is_empty() { return Err("empty duration string".into()); }

        let (num_str, suffix) = s.split_at(
            s.find(|c: char| !c.is_ascii_digit()).unwrap_or(s.len())
        );
        let value: u64 = num_str.parse()
            .map_err(|_| format!("invalid number: {num_str}"))?;

        let duration = match suffix {
            "s" | "sec"  => std::time::Duration::from_secs(value),
            "m" | "min"  => std::time::Duration::from_secs(value * 60),
            "h" | "hr"   => std::time::Duration::from_secs(value * 3600),
            "ms"         => std::time::Duration::from_millis(value),
            other        => return Err(format!("unknown suffix: {other}")),
        };
        Ok(HumanDuration(duration))
    }
}

impl fmt::Display for HumanDuration {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
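        let secs = self.0.as_secs();
        // Use milliseconds whenever a sub-second part exists, so that
        // values such as "1500ms" survive a round trip.
        if secs == 0 || self.0.subsec_millis() != 0 {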
            write!(f, "{}ms", self.0.as_millis())
        } else if secs % 3600 == 0 {
            write!(f, "{}h", secs / 3600)
        } else if secs % 60 == 0 {
            write!(f, "{}m", secs / 60)
        } else {
            write!(f, "{}s", secs)
        }
    }
}

impl Serialize for HumanDuration {
    fn serialize<S: Serializer>(&self, serializer: S) -> Result<S::Ok, S::Error> {
        serializer.serialize_str(&self.to_string())
    }
}

impl<'de> Deserialize<'de> for HumanDuration {
    fn deserialize<D: Deserializer<'de>>(deserializer: D) -> Result<Self, D::Error> {
        let s = String::deserialize(deserializer)?;
        HumanDuration::from_str(&s).map_err(serde::de::Error::custom)
    }
}

#[derive(Debug, Deserialize, Serialize)]
struct Config {
    timeout: HumanDuration,
    retry_interval: HumanDuration,
}

fn main() {
    let json = r#"{ "timeout": "30s", "retry_interval": "5m" }"#;
    let config: Config = serde_json::from_str(json).unwrap();

    assert_eq!(config.timeout.0, std::time::Duration::from_secs(30));
    assert_eq!(config.retry_interval.0, std::time::Duration::from_secs(300));

    // Round-trips correctly:
    let serialized = serde_json::to_string(&config).unwrap();
    assert!(serialized.contains("30s"));
    assert!(serialized.contains("5m"));
    println!("Config: {serialized}");
}

Exercise 10 — Concurrent Fetcher with Timeout ★★ (~25 min) / 练习 10 —— 带有超时的并发获取器

Write an async function fetch_all that starts three tasks with tokio::spawn, each simulating a network call with tokio::time::sleep. Join all three with tokio::try_join!, wrapped in tokio::time::timeout(Duration::from_secs(5), ...). Return Ok(Vec<String>) when every task succeeds, or an error if any task fails or the deadline expires.

编写一个异步函数 fetch_all,使用 tokio::spawn 启动三个任务,每个任务使用 tokio::time::sleep 模拟网络调用。使用 tokio::try_join! 等待三者,并包装在 tokio::time::timeout(Duration::from_secs(5), ...) 中。全部任务成功时返回 Ok(Vec<String>);如果任何任务失败或截止时间到期,则返回错误。

Learning goals / 学习目标:tokio::spawn, try_join!, timeout, error propagation across task boundaries. / tokio::spawn、try_join!、timeout 以及跨任务边界的错误传播。

Hint / 提示

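Each spawned task returns Result<String, _>, so awaiting its JoinHandle yields Result<Result<String, _>, JoinError>. try_join! strips the outer JoinError layer from all three handles; the inner Results still need ?. Wrap the whole try_join! in timeout(): an Elapsed error means the deadline was hit.

每个生成的任务都返回 Result<String, _>,因此等待其 JoinHandle 会得到 Result<Result<String, _>, JoinError>。try_join! 会去掉三个句柄外层的 JoinError;内层的 Result 仍需使用 ? 处理。将整个 try_join! 包装在 timeout() 中:Elapsed 错误意味着触发了截止时间。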

Solution / 参考答案 (Ex 10)
use tokio::time::{sleep, timeout, Duration};

async fn fake_fetch(name: &'static str, delay_ms: u64) -> Result<String, String> {
    sleep(Duration::from_millis(delay_ms)).await;
    Ok(format!("{name}: OK"))
}

async fn fetch_all() -> Result<Vec<String>, Box<dyn std::error::Error>> {
    let deadline = Duration::from_secs(5);

    let (a, b, c) = timeout(deadline, async {
        let h1 = tokio::spawn(fake_fetch("svc-a", 100));
        let h2 = tokio::spawn(fake_fetch("svc-b", 200));
        let h3 = tokio::spawn(fake_fetch("svc-c", 150));
        tokio::try_join!(h1, h2, h3)
    })
    .await??; // first ? = timeout, second ? = join

    Ok(vec![a?, b?, c?]) // unwrap inner Results
}

#[tokio::main]
async fn main() {
    let results = fetch_all().await.unwrap();
    for r in &results {
        println!("{r}");
    }
}

Exercise 11 — Async Channel Pipeline ★★★ (~40 min) / 练习 11 —— 异步通道流水线

Build a producer → transformer → consumer pipeline using tokio::sync::mpsc:

使用 tokio::sync::mpsc 构建一个“生产者 → 转换器 → 消费者”流水线:

  1. Producer / 生产者:sends integers 1..=20 into channel A (capacity 4). / 将整数 1..=20 发送到通道 A(容量为 4)。
  2. Transformer / 转换器:reads from channel A, squares each value, sends into channel B. / 从通道 A 读取,计算各值的平方,然后发送到通道 B。
  3. Consumer / 消费者:reads from channel B, collects into a Vec<u64>, returns it. / 从通道 B 读取,收集到 Vec<u64> 中并返回。

All three stages run as concurrent tokio::spawn tasks. Use bounded channels to demonstrate back-pressure. Assert the final vec equals [1, 4, 9, ..., 400].

这三个阶段都作为并发的 tokio::spawn 任务运行。使用有界通道来演示背压(back-pressure)。断言最后的向量等于 [1, 4, 9, ..., 400]

Learning goals / 学习目标:mpsc::channel, bounded back-pressure, tokio::spawn with move closures, graceful shutdown via channel close. / mpsc::channel、有界背压、带 move 闭包的 tokio::spawn、通过通道关闭实现优雅停机。

Solution / 参考答案 (Ex 11)
use tokio::sync::mpsc;

#[tokio::main]
async fn main() {
    let (tx_a, mut rx_a) = mpsc::channel::<u64>(4); // bounded — back-pressure
    let (tx_b, mut rx_b) = mpsc::channel::<u64>(4);

    // Producer
    let producer = tokio::spawn(async move {
        for i in 1..=20u64 {
            tx_a.send(i).await.unwrap();
        }
        // tx_a dropped here → channel A closes
    });

    // Transformer
    let transformer = tokio::spawn(async move {
        while let Some(val) = rx_a.recv().await {
            tx_b.send(val * val).await.unwrap();
        }
        // tx_b dropped here → channel B closes
    });

    // Consumer
    let consumer = tokio::spawn(async move {
        let mut results = Vec::new();
        while let Some(val) = rx_b.recv().await {
            results.push(val);
        }
        results
    });

    producer.await.unwrap();
    transformer.await.unwrap();
    let results = consumer.await.unwrap();

    let expected: Vec<u64> = (1..=20).map(|x: u64| x * x).collect();
    assert_eq!(results, expected);
    println!("Pipeline complete: {results:?}");
}
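
For comparison, the same producer → transformer → consumer shape can be sketched with only the standard library, swapping tokio tasks for OS threads and `tokio::sync::mpsc` for `std::sync::mpsc::sync_channel`. This is a different technique from the async solution above, shown only to make the back-pressure idea concrete; the function name `run_pipeline` and its parameters are made up for this sketch.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Producer -> transformer -> consumer over bounded std channels.
fn run_pipeline(n: u64, capacity: usize) -> Vec<u64> {
    let (tx_a, rx_a) = sync_channel::<u64>(capacity);
    let (tx_b, rx_b) = sync_channel::<u64>(capacity);

    let producer = thread::spawn(move || {
        for i in 1..=n {
            // send() blocks while channel A is full: this is the back-pressure.
            tx_a.send(i).unwrap();
        }
        // tx_a is dropped here, which closes channel A.
    });

    let transformer = thread::spawn(move || {
        // The loop ends once channel A is closed and drained.
        for val in rx_a {
            tx_b.send(val * val).unwrap();
        }
        // tx_b is dropped here, which closes channel B.
    });

    let consumer = thread::spawn(move || rx_b.iter().collect::<Vec<u64>>());

    producer.join().unwrap();
    transformer.join().unwrap();
    consumer.join().unwrap()
}
```

As in the async version, shutdown needs no explicit signal: each stage stops when the sender feeding it is dropped.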

Summary and Reference Card / 总结与速查卡

Quick Reference Card / 快速参考卡

Pattern Decision Guide / 模式决策指南

Need type safety for primitives? / 需要原始类型的类型安全性?
└── Newtype pattern (Ch3) / 新类型模式(第 3 章)

Need compile-time state enforcement? / 需要在编译时强制执行状态检查?
└── Type-state pattern (Ch3) / 类型状态模式(第 3 章)

Need a "tag" with no runtime data? / 需要一个不带运行时数据的“标签”?
└── PhantomData (Ch4) / PhantomData(第 4 章)

Need to break Rc/Arc reference cycles? / 需要打破 Rc/Arc 的循环引用?
└── Weak<T> / sync::Weak<T> (Ch9) / 弱引用(第 9 章)

Need to wait for a condition without busy-looping? / 需要在不忙轮询的情况下等待某个条件?
└── Condvar + Mutex (Ch6) / 条件变量 + 互斥锁(第 6 章)

Need to handle "one of N types"? / 需要处理“N 选 1”的类型?
├── Known closed set → Enum / 已知的封闭集合 → 枚举
├── Open set, hot path → Generics / 开放集合、热点路径 → 泛型
├── Open set, cold path → dyn Trait / 开放集合、非热点路径 → 动态分发(dyn Trait)
└── Completely unknown types → Any + TypeId (Ch2) / 完全未知的类型 → Any + TypeId(第 2 章)

Need shared state across threads? / 需要跨线程共享状态?
├── Simple counter/flag → Atomics / 简单的计数器/标志 → 原子操作
├── Short critical section → Mutex / 较短的临界区 → 互斥锁
├── Read-heavy → RwLock / 读多写少 → 读写锁
├── Lazy one-time init → OnceLock / LazyLock (Ch6) / 惰性的一次性初始化 → OnceLock / LazyLock(第 6 章)
└── Complex state → Actor + Channels / 复杂状态 → Actor 模式 + 通道

Need to parallelize computation? / 需要将计算并行化?
├── Collection processing → rayon::par_iter / 集合处理 → rayon::par_iter
├── Background task → thread::spawn / 后台任务 → thread::spawn
└── Borrow local data → thread::scope / 借用本地数据 → thread::scope

Need async I/O or concurrent networking? / 需要异步 I/O 或并发网络?
├── Basic → tokio + async/await (Ch16) / 基础 → tokio + async/await(第 16 章)
└── Advanced (streams, middleware) → see Async Rust Training / 进阶(流、中间件)→ 参见 Async Rust 进阶指南

Need error handling? / 需要错误处理?
├── Library → thiserror (#[derive(Error)]) / 库 → thiserror
└── Application → anyhow (Result<T>) / 应用程序 → anyhow

Need to prevent a value from being moved? / 需要防止某个值被移动?
└── Pin<T> (Ch9) / Pin(第 9 章)— required for Futures, self-referential types / Future 及自引用类型所需
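
As a small illustration of the "one of N types" branch of the guide, the sketch below puts the three main options side by side. All type and function names (`Shape`, `Area`, `Rect`, `total_static`, `total_dyn`) are invented for this example.

```rust
// Known closed set: an enum, matched exhaustively.
enum Shape {
    Circle { radius: f64 },
    Square { side: f64 },
}

impl Shape {
    fn area(&self) -> f64 {
        match self {
            Shape::Circle { radius } => std::f64::consts::PI * radius * radius,
            Shape::Square { side } => side * side,
        }
    }
}

// Open set: a trait that downstream code can implement for new types.
trait Area {
    fn area(&self) -> f64;
}

struct Rect {
    w: f64,
    h: f64,
}

impl Area for Rect {
    fn area(&self) -> f64 {
        self.w * self.h
    }
}

// Hot path: generics are monomorphized, so `area` is dispatched statically.
fn total_static<T: Area>(items: &[T]) -> f64 {
    items.iter().map(|i| i.area()).sum()
}

// Cold path: trait objects allow mixed concrete types behind a vtable call.
fn total_dyn(items: &[Box<dyn Area>]) -> f64 {
    items.iter().map(|i| i.area()).sum()
}
```

Note that `total_static` accepts a slice of one concrete type only, while `total_dyn` can hold any mix of `Area` implementors.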

Trait Bounds Cheat Sheet / Trait 约束速查表

Bound / 约束            Meaning / 含义
T: Clone              Can be duplicated / 可被复制
T: Send               Can be moved to another thread / 可被移动到另一个线程
T: Sync               &T can be shared between threads / 其不可变引用 &T 可跨线程共享
T: 'static            Contains no non-static references / 不包含非静态引用的生命周期
T: Sized              Size known at compile time (default) / 编译时大小已知(默认)
T: ?Sized             Size may not be known ([T], dyn Trait) / 大小可能未知
T: Unpin              Safe to move after pinning / 被固定(pin)后仍可安全地移动
T: Default            Has a default value / 具有默认值
T: Into<U>            Can be converted to U / 可以转换为类型 U
T: AsRef<U>           Can be borrowed as &U / 可以作为 &U 被借用
T: Deref<Target = U>  Auto-derefs to &U / 自动解引用为 &U
F: Fn(A) -> B         Callable, borrows state immutably / 可调用,以不可变方式借用状态
F: FnMut(A) -> B      Callable, may mutate state / 可调用,可能会修改状态
F: FnOnce(A) -> B     Callable exactly once, may consume state / 仅可被调用一次,可能会消耗状态
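
The three closure bounds at the end of the table are the ones most often confused. The following sketch, with made-up helper names, shows one caller for each bound and a closure that fits it.

```rust
// Fn: can be called repeatedly through a shared reference.
fn apply_twice<F: Fn(i32) -> i32>(f: F, x: i32) -> i32 {
    f(f(x))
}

// FnMut: may mutate captured state, so the callee binds it as `mut`.
fn call_n_times<F: FnMut()>(mut f: F, n: usize) {
    for _ in 0..n {
        f();
    }
}

// FnOnce: may move captured values out, so it can be called only once.
fn consume<F: FnOnce() -> String>(f: F) -> String {
    f()
}

fn demo_closures() -> (i32, u32, String) {
    let offset = 10;
    let shifted = apply_twice(|x| x + offset, 1); // borrows `offset` immutably

    let mut count = 0u32;
    call_n_times(|| count += 1, 3); // borrows `count` mutably

    let owned = String::from("moved");
    let out = consume(move || owned); // moves `owned` out when called

    (shifted, count, out)
}
```

Every `Fn` closure also satisfies `FnMut` and `FnOnce`, so a parameter bound should generally be the weakest one the callee actually needs.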

Lifetime Elision Rules / 生命周期省略规则

The compiler inserts lifetimes automatically in three cases (so you don’t have to):

编译器会在以下三种情况下自动插入生命周期(无需手动标注):

// Rule 1: Each reference parameter gets its own lifetime
// 规则 1:每一个引用类型的参数都会获得其各自的生命周期
// fn foo(x: &str, y: &str)  →  fn foo<'a, 'b>(x: &'a str, y: &'b str)

// Rule 2: If there's exactly ONE input lifetime, it's used for all outputs
// 规则 2:如果恰好只有一个输入参数的生命周期,它将被用于所有的输出。
// fn foo(x: &str) -> &str   →  fn foo<'a>(x: &'a str) -> &'a str

// Rule 3: If one parameter is &self or &mut self, its lifetime is used for all outputs
// 规则 3:如果包含 &self 或 &mut self 参数,则该生命周期将用于所有的输出。
// fn foo(&self, x: &str) -> &str  →  fn foo<'a, 'b>(&'a self, x: &'b str) -> &'a str

When you MUST write explicit lifetimes / 必须手动编写显式生命周期的情况

  • Multiple input references and a reference output (compiler can’t guess which input) / 存在多个输入引用参数且有一个返回引用结果(编译器无法推断应遵循哪个输入)
  • Struct fields that hold references: struct Ref<'a> { data: &'a str } / 结构体持有引用类型的字段:struct Ref<'a> { data: &'a str }
  • 'static bounds when you need data without borrowed references / 当需要不带任何被借用引用的数据时,使用 'static 约束
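
The three cases in the list above can be sketched as follows. `longest`, `first_word`, and `spawn_len` are illustrative names; only `Ref<'a>` comes from the list itself.

```rust
// Two reference inputs and a reference output: elision cannot pick one,
// so the signature ties the result to both inputs with `'a`.
fn longest<'a>(a: &'a str, b: &'a str) -> &'a str {
    if a.len() >= b.len() { a } else { b }
}

// A struct holding a reference must name the lifetime of the borrow.
struct Ref<'a> {
    data: &'a str,
}

impl<'a> Ref<'a> {
    // Returning `&'a str` rather than a borrow of `self` lets the result
    // outlive this `Ref` value.
    fn first_word(&self) -> &'a str {
        self.data.split_whitespace().next().unwrap_or("")
    }
}

// `T: 'static` accepts owned data (or `&'static` borrows), which is what
// `std::thread::spawn` requires of values moved into the new thread.
fn spawn_len<T: AsRef<str> + Send + 'static>(value: T) -> usize {
    std::thread::spawn(move || value.as_ref().len())
        .join()
        .unwrap()
}
```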

Common Derive Traits / 常用的 Derive Trait

#[derive(
    Debug,          // {:?} formatting / 格式化输出
    Clone,          // .clone()
    Copy,           // Implicit copy (only for simple types) / 隐式拷贝(仅适用于简单类型)
    PartialEq, Eq,  // == comparison / 等值比较
    PartialOrd, Ord, // < > comparison + sorting / 大小比较 + 排序
    Hash,           // HashMap/HashSet key / HashMap/HashSet 键名
    Default,        // Type::default() / 默认值
)]
struct MyType { /* ... */ }

Module Visibility Quick Reference / 模块可见性快速参考

pub           → visible everywhere / 到处可见
pub(crate)    → visible within the crate / 在当前 crate 内可见
pub(super)    → visible to parent module / 对父模块可见
pub(in path)  → visible within a specific path / 在特定路径内可见
(nothing)     → private to current module + children / 对当前模块及其子模块私有
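
A compact way to see these rules in action is a pair of nested modules. The module and function names below are invented for the sketch.

```rust
mod outer {
    pub fn api() -> u32 {
        // `outer` can call its own private items and the restricted items of `inner`.
        helper() + inner::for_parent() + inner::for_crate()
    }

    // No modifier: private to `outer` and its child modules.
    fn helper() -> u32 {
        1
    }

    pub mod inner {
        // Visible in `outer` (the parent), but not from the crate root.
        pub(super) fn for_parent() -> u32 {
            10
        }

        // Visible anywhere in this crate, hidden from other crates.
        pub(crate) fn for_crate() -> u32 {
            100
        }

        // Child modules can reach private items of their ancestors.
        pub fn uses_parent_private() -> u32 {
            super::helper() + 1000
        }
    }
}
```

From the crate root, `outer::api()`, `outer::inner::for_crate()`, and `outer::inner::uses_parent_private()` are callable, while `outer::helper()` and `outer::inner::for_parent()` are rejected by the compiler.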

Further Reading / 延伸阅读

Resource / 资源                         Why / 推荐理由
Rust Design Patterns                  Catalog of idiomatic patterns and anti-patterns / 惯用模式与反模式的百科目录
Rust API Guidelines                   Official checklist for polished public APIs / 官方发布的公共 API 完善检查清单
Rust Atomics and Locks                Mara Bos’s deep dive into concurrency primitives / Mara Bos 对并发原语的深入探讨
The Rustonomicon                      Official guide to unsafe Rust and dark corners / 关于 Unsafe Rust 与黑暗角落的官方指南
Error Handling in Rust                Andrew Gallant’s comprehensive guide / Andrew Gallant 撰写的错误处理全面指南
Jon Gjengset — Crust of Rust series   Deep dives into iterators, lifetimes, channels, etc. / 深入探讨迭代器、生命周期、通道等专题
Effective Rust                        35 specific ways to improve your Rust code / 35 个改进 Rust 代码的具体方法

End of Rust Patterns & Engineering How-Tos

《Rust 模式与工程实务指南》—— 完

Capstone Project: Type-Safe Task Scheduler / 综合项目:类型安全的任务调度器

This project integrates patterns from across the book into a single, production-style system. You’ll build a type-safe, concurrent task scheduler that uses generics, traits, typestate, channels, error handling, and testing.

该项目将本书各章节中的模式整合到一个生产级别的系统中。你将构建一个 类型安全的并发任务调度器,其中会用到泛型、trait、类型状态(typestate)、通道、错误处理和测试。

Estimated time / 估计用时:4–6 hours | Difficulty / 难度:★★★

What you’ll practice / 你将练习:

  • Generics and trait bounds (Ch 1–2) / 泛型与 trait 约束(第 1–2 章)
  • Typestate pattern for task lifecycle (Ch 3) / 用于任务生命周期的类型状态模式(第 3 章)
  • PhantomData for zero-cost state markers (Ch 4) / 用于零成本状态标记的 PhantomData(第 4 章)
  • Channels for worker communication (Ch 5) / 用于工作线程通信的通道(第 5 章)
  • Concurrency with scoped threads (Ch 6) / 使用作用域线程的并发(第 6 章)
  • Error handling with thiserror (Ch 10) / 使用 thiserror 进行错误处理(第 10 章)
  • Testing with property-based tests (Ch 14) / 使用基于属性的测试进行测试(第 14 章)
  • API design with TryFrom and validated types (Ch 15) / 使用 TryFrom 和经验证类型的 API 设计(第 15 章)

The Problem / 问题背景描述

Build a task scheduler where / 构建一个满足以下要求的任务调度器:

  1. Tasks have a typed lifecycle: Pending → Running → Completed (or Failed) / 任务 具有类型化的生命周期:等待(Pending) → 运行中(Running) → 已完成(Completed)(或 失败(Failed))
  2. Workers pull tasks from a channel, execute them, and report results / 工作线程 从通道拉取任务,执行并上报结果
  3. The scheduler manages task submission, worker coordination, and result collection / 调度器 管理任务提交、工作线程协调和结果收集
  4. Invalid state transitions are compile-time errors / 无效的状态转换应导致 编译时错误
stateDiagram-v2
    [*] --> Pending: scheduler.submit(task)
    Pending --> Running: worker picks up task / 工作线程提取任务
    Running --> Completed: task succeeds / 任务成功
    Running --> Failed: task returns Err / 任务返回 Err
    Completed --> [*]: scheduler.results()
    Failed --> [*]: scheduler.results()

    Pending --> Pending: ❌ can't execute directly / 无法直接执行
    Completed --> Running: ❌ can't re-run / 无法重新运行

Step 1: Define the Task Types / 第一步:定义任务类型

Start with the typestate markers and a generic Task:

从类型状态标记(markers)和一个泛型的 Task 结构体开始:

use std::marker::PhantomData;

// --- State markers (zero-sized) / 状态标记(零大小) ---
struct Pending;
struct Running;
struct Completed;
struct Failed;

// --- Task ID (newtype for type safety) / 任务 ID(用于类型安全的新类型) ---
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct TaskId(u64);

// --- The Task struct, parameterized by lifecycle state / Task 结构体,由生命周期状态参数化 ---
struct Task<State, R> {
    id: TaskId,
    name: String,
    _state: PhantomData<State>,
    _result: PhantomData<R>,
}

Your job / 你的任务:Implement state transitions so that:

实现以下状态转换:

  • Task<Pending, R> can transition to Task<Running, R> (via start()) / Task<Pending, R> 可转换为 Task<Running, R>(通过 start())
  • Task<Running, R> can transition to Task<Completed, R> or Task<Failed, R> / Task<Running, R> 可转换为 Task<Completed, R> 或 Task<Failed, R>
  • No other transitions compile / 其他转换均无法通过编译
💡 Hint / 提示

Each transition method should consume self and return the new state:

每一个转换方法都应该消耗(consume) self 并返回新状态:

impl<R> Task<Pending, R> {
    fn start(self) -> Task<Running, R> {
        Task {
            id: self.id,
            name: self.name,
            _state: PhantomData,
            _result: PhantomData,
        }
    }
}
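
Extending the hint, the remaining transitions might look like the sketch below. The method names `complete` and `fail`, and the `new` constructor, are assumptions made for illustration; the exercise leaves their exact shape open.

```rust
use std::marker::PhantomData;

struct Pending;
struct Running;
struct Completed;
struct Failed;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct TaskId(u64);

struct Task<State, R> {
    id: TaskId,
    name: String,
    _state: PhantomData<State>,
    _result: PhantomData<R>,
}

impl<R> Task<Pending, R> {
    // Hypothetical constructor: tasks can only be created in the Pending state.
    fn new(id: u64, name: &str) -> Self {
        Task { id: TaskId(id), name: name.to_string(), _state: PhantomData, _result: PhantomData }
    }

    fn start(self) -> Task<Running, R> {
        Task { id: self.id, name: self.name, _state: PhantomData, _result: PhantomData }
    }
}

impl<R> Task<Running, R> {
    // Only a Running task exposes `complete` and `fail`; each consumes `self`,
    // so the old state cannot be reused afterwards.
    fn complete(self) -> Task<Completed, R> {
        Task { id: self.id, name: self.name, _state: PhantomData, _result: PhantomData }
    }

    fn fail(self) -> Task<Failed, R> {
        Task { id: self.id, name: self.name, _state: PhantomData, _result: PhantomData }
    }
}
```

With this layout, a call such as `Task::<Pending, String>::new(1, "x").complete()` is rejected at compile time because `complete` exists only on `Task<Running, R>`.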

Step 2: Define the Work Function / 第二步:定义工作函数

Tasks need a function to execute. Use a boxed closure: / 任务需要一个可执行的函数。请使用装箱的闭包(boxed closure):

struct WorkItem<R: Send + 'static> {
    id: TaskId,
    name: String,
    work: Box<dyn FnOnce() -> Result<R, String> + Send>,
}

Your job / 你的任务:Implement WorkItem::new() that accepts a task name and closure. Add a TaskId generator (simple atomic counter or mutex-protected counter).

实现 WorkItem::new(),使其接收任务名称和闭包。添加一个 TaskId 生成器(简单的原子计数器或受互斥锁保护的计数器)。
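
One way to approach this step is sketched below, using a process-wide atomic counter. The names `NEXT_TASK_ID` and `TaskId::next` are hypothetical, and the `impl Into<String>` parameter is an assumption chosen so that both `String` and `&str` names are accepted.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct TaskId(u64);

// Process-wide counter; `NEXT_TASK_ID` is a hypothetical name.
static NEXT_TASK_ID: AtomicU64 = AtomicU64::new(1);

impl TaskId {
    fn next() -> Self {
        // Relaxed is enough here: only uniqueness matters, not ordering
        // relative to other memory operations.
        TaskId(NEXT_TASK_ID.fetch_add(1, Ordering::Relaxed))
    }
}

struct WorkItem<R: Send + 'static> {
    id: TaskId,
    name: String,
    work: Box<dyn FnOnce() -> Result<R, String> + Send>,
}

impl<R: Send + 'static> WorkItem<R> {
    fn new(
        name: impl Into<String>,
        work: impl FnOnce() -> Result<R, String> + Send + 'static,
    ) -> Self {
        WorkItem {
            id: TaskId::next(),
            name: name.into(),
            work: Box::new(work),
        }
    }
}
```

A mutex-protected counter would work equally well; the atomic version simply avoids taking a lock for a single increment.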

Step 3: Error Handling / 第三步:错误处理

Define the scheduler’s error types using thiserror: / 使用 thiserror 定义调度器的错误类型:

use thiserror::Error;

#[derive(Error, Debug)]
pub enum SchedulerError {
    #[error("scheduler is shut down")]
    ShutDown,

    #[error("task {0:?} failed: {1}")]
    TaskFailed(TaskId, String),

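    // NOTE: `Sender<WorkItem<R>>::send` fails with `SendError<WorkItem<R>>`, not
    // `SendError<()>`, so `?` will not convert it automatically; map it explicitly.
    #[error("channel send error")]
    ChannelError(#[from] std::sync::mpsc::SendError<()>),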

    #[error("worker panicked")]
    WorkerPanic,
}

Step 4: The Scheduler / 第四步:调度器

Build the scheduler using channels (Ch 5) and scoped threads (Ch 6):

使用通道(第 5 章)和作用域线程(第 6 章)构建调度器:

use std::sync::mpsc;

struct Scheduler<R: Send + 'static> {
    sender: Option<mpsc::Sender<WorkItem<R>>>,
    results: mpsc::Receiver<TaskResult<R>>,
    num_workers: usize,
}

struct TaskResult<R> {
    id: TaskId,
    name: String,
    outcome: Result<R, String>,
}

Your job / 你的任务:Implement:

  • Scheduler::new(num_workers: usize) -> Self — creates channels and spawns workers / 创建通道并生成工作线程
  • Scheduler::submit(&self, item: WorkItem<R>) -> Result<TaskId, SchedulerError> / 提交任务
  • Scheduler::shutdown(self) -> Vec<TaskResult<R>> — drops the sender, joins workers, collects results / 丢弃发送端、合并(join)工作线程、收集结果
💡 Hint — Worker loop / 提示 —— 工作线程循环
fn worker_loop<R: Send + 'static>(
    rx: std::sync::Arc<std::sync::Mutex<mpsc::Receiver<WorkItem<R>>>>,
    result_tx: mpsc::Sender<TaskResult<R>>,
    worker_id: usize,
) {
    loop {
        let item = {
            let rx = rx.lock().unwrap();
            rx.recv()
        };
        match item {
            Ok(work_item) => {
                let outcome = (work_item.work)();
                let _ = result_tx.send(TaskResult {
                    id: work_item.id,
                    name: work_item.name,
                    outcome,
                });
            }
            Err(_) => break, // Channel closed / 通道已关闭
        }
    }
}
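
Putting the pieces together, a std-only sketch of the scheduler could look like this. It makes several assumptions that differ from the skeleton above: it stores a `handles` field (replacing `num_workers`) so that `shutdown` can join the workers, it uses `thread::spawn` rather than scoped threads because the workers outlive `new`, and it substitutes a plain one-variant `SchedulerError` for the thiserror enum to stay dependency-free.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{mpsc, Arc, Mutex};
use std::thread::{self, JoinHandle};

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct TaskId(u64);
static NEXT_ID: AtomicU64 = AtomicU64::new(1);

struct WorkItem<R: Send + 'static> {
    id: TaskId,
    name: String,
    work: Box<dyn FnOnce() -> Result<R, String> + Send>,
}

impl<R: Send + 'static> WorkItem<R> {
    fn new(name: impl Into<String>, work: impl FnOnce() -> Result<R, String> + Send + 'static) -> Self {
        let id = TaskId(NEXT_ID.fetch_add(1, Ordering::Relaxed));
        WorkItem { id, name: name.into(), work: Box::new(work) }
    }
}

struct TaskResult<R> {
    id: TaskId,
    name: String,
    outcome: Result<R, String>,
}

#[derive(Debug)]
enum SchedulerError {
    ShutDown, // std-only stand-in for the thiserror enum
}

struct Scheduler<R: Send + 'static> {
    sender: Option<mpsc::Sender<WorkItem<R>>>,
    results: mpsc::Receiver<TaskResult<R>>,
    handles: Vec<JoinHandle<()>>, // assumption: extra field so shutdown can join
}

impl<R: Send + 'static> Scheduler<R> {
    fn new(num_workers: usize) -> Self {
        let (tx, rx) = mpsc::channel::<WorkItem<R>>();
        let (result_tx, results) = mpsc::channel();
        let rx = Arc::new(Mutex::new(rx));
        let handles = (0..num_workers)
            .map(|_| {
                let (rx, result_tx) = (Arc::clone(&rx), result_tx.clone());
                thread::spawn(move || loop {
                    // The lock guard is a temporary, released at the end of this statement.
                    let item = rx.lock().unwrap().recv();
                    match item {
                        Ok(w) => {
                            let outcome = (w.work)();
                            let _ = result_tx.send(TaskResult { id: w.id, name: w.name, outcome });
                        }
                        Err(_) => break, // job channel closed
                    }
                })
            })
            .collect();
        // The original `result_tx` drops when `new` returns; only workers hold senders.
        Scheduler { sender: Some(tx), results, handles }
    }

    fn submit(&self, item: WorkItem<R>) -> Result<TaskId, SchedulerError> {
        let id = item.id;
        let tx = self.sender.as_ref().ok_or(SchedulerError::ShutDown)?;
        // send() fails with SendError<WorkItem<R>>, so it is mapped by hand.
        tx.send(item).map_err(|_| SchedulerError::ShutDown)?;
        Ok(id)
    }

    fn shutdown(mut self) -> Vec<TaskResult<R>> {
        self.sender.take(); // close the job channel so workers drain it and exit
        for h in self.handles.drain(..) {
            h.join().expect("worker panicked");
        }
        // Every worker has finished, so all results are already queued.
        let collected: Vec<_> = self.results.try_iter().collect();
        collected
    }
}
```

Because results are collected only after all workers are joined, their order depends on thread scheduling; callers that need submission order can sort by `id`.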

Step 5: Integration Test / 第五步:集成测试

Write tests that verify:

编写测试来验证以下内容:

  1. Happy path / 正常路径:Submit 10 tasks, shut down, verify all 10 results are Ok / 提交 10 个任务,关闭调度器,验证全部 10 个结果均为 Ok
  2. Error handling / 错误处理:Submit tasks that fail, verify TaskResult.outcome is Err / 提交会失败的任务,验证 TaskResult.outcome 为 Err
  3. Empty scheduler / 空调度器:Create and immediately shut down — no panics / 创建并立即关闭 —— 不应产生 panic
  4. Property test (bonus) / 属性测试(加分项):Use proptest to verify that for any N tasks (1..100), the scheduler always returns exactly N results / 使用 proptest 验证对于任意 N 个任务(1..100),调度器始终准确返回 N 个结果
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn happy_path() {
        let scheduler = Scheduler::<String>::new(4);

        for i in 0..10 {
            let item = WorkItem::new(
                format!("task-{i}"),
                move || Ok(format!("result-{i}")),
            );
            scheduler.submit(item).unwrap();
        }

        let results = scheduler.shutdown();
        assert_eq!(results.len(), 10);
        for r in &results {
            assert!(r.outcome.is_ok());
        }
    }

    #[test]
    fn handles_failures() {
        let scheduler = Scheduler::<String>::new(2);

        scheduler.submit(WorkItem::new("good", || Ok("ok".into()))).unwrap();
        scheduler.submit(WorkItem::new("bad", || Err("boom".into()))).unwrap();

        let results = scheduler.shutdown();
        assert_eq!(results.len(), 2);

        let failures: Vec<_> = results.iter()
            .filter(|r| r.outcome.is_err())
            .collect();
        assert_eq!(failures.len(), 1);
    }
}

Step 6: Put It All Together / 第六步:融会贯通

Here’s the main() that demonstrates the full system:

以下是展示完整系统的 main() 函数:

fn main() {
    let scheduler = Scheduler::<String>::new(4);

    // Submit tasks with varying workloads / 提交具有不同工作负载的任务
    for i in 0..20 {
        let item = WorkItem::new(
            format!("compute-{i}"),
            move || {
                // Simulate work / 模拟工作
                std::thread::sleep(std::time::Duration::from_millis(10));
                if i % 7 == 0 {
                    Err(format!("task {i} hit a simulated error / 任务 {i} 触发模拟错误"))
                } else {
                    Ok(format!("task {i} completed with value {} / 任务 {i} 完成,值为 {}", i * i, i * i))
                }
            },
        );
        // NOTE: .unwrap() is used for brevity; handle the SchedulerError in production.
        // 注意:此处使用 .unwrap() 是为了简洁;在生产环境中请处理 SchedulerError。
        scheduler.submit(item).unwrap();
    }

    println!("All tasks submitted. Shutting down...");
    let results = scheduler.shutdown();

    let (ok, err): (Vec<_>, Vec<_>) = results.iter()
        .partition(|r| r.outcome.is_ok());

    println!("\n✅ Succeeded: {}", ok.len());
    for r in &ok {
        println!("  {} → {}", r.name, r.outcome.as_ref().unwrap());
    }

    println!("\n❌ Failed: {}", err.len());
    for r in &err {
        println!("  {} → {}", r.name, r.outcome.as_ref().unwrap_err());
    }
}

Evaluation Criteria / 评价标准

| Criterion / 准则 | Target / 目标 |
|---|---|
| Type safety / 类型安全 | Invalid state transitions don’t compile / 无效的状态转换无法通过编译 |
| Concurrency / 并发 | Workers run in parallel, no data races / 工作线程并行运行,无数据竞争 |
| Error handling / 错误处理 | All failures captured in TaskResult, no panics / 所有失败均被 TaskResult 捕获,不产生 panic |
| Testing / 测试 | At least 3 tests; bonus for proptest / 至少 3 个测试;proptest 为加分项 |
| Code organization / 代码组织 | Clean module structure, public API uses validated types / 模块结构清晰,公共 API 使用经验证的类型 |
| Documentation / 文档 | Key types have doc comments explaining invariants / 关键类型备有解释其不变性(invariants)的文档注释 |
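The "no panics" target is harder to meet when a submitted closure itself panics, because the worker thread would die and the task's result would be lost. One possible approach, sketched below under the assumption that the binary is built with the default unwinding panic strategy, is to wrap the call in `std::panic::catch_unwind` and fold the panic into the `Err(String)` side of the outcome. The helper name `run_caught` is hypothetical and not part of the exercise API.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Runs `work` and converts a panic into `Err(String)` so the caller can
/// still record an outcome. Hypothetical helper for illustration only.
fn run_caught<R>(work: impl FnOnce() -> Result<R, String>) -> Result<R, String> {
    // AssertUnwindSafe is a promise that nothing observed after the panic
    // is left in a broken state; the closure is consumed here, so this
    // sketch treats that as acceptable.
    match panic::catch_unwind(AssertUnwindSafe(work)) {
        Ok(outcome) => outcome,
        Err(payload) => {
            // Panic payloads are usually a &str or a String.
            let msg = if let Some(s) = payload.downcast_ref::<&str>() {
                (*s).to_string()
            } else if let Some(s) = payload.downcast_ref::<String>() {
                s.clone()
            } else {
                "unknown panic payload".to_string()
            };
            Err(format!("task panicked: {msg}"))
        }
    }
}
```

In a worker loop this would replace the direct call to the work closure. The default panic hook still prints the message to stderr, and with `panic = "abort"` the process terminates before `catch_unwind` can return.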

Extension Ideas / 拓展思路

Once the basic scheduler works, try these enhancements:

基本调度器运行成功后,可以尝试以下增强功能:

  1. Priority queue / 优先级队列:Add a Priority newtype (1–10) and process higher-priority tasks first / 添加 Priority 新类型(1–10)并优先处理高优先级任务
  2. Retry policy / 重试策略:Failed tasks retry up to N times before being marked permanently failed / 失败的任务在被标记为永久失败前最多重试 N 次
  3. Cancellation / 取消机制:Add a cancel(TaskId) method that removes pending tasks / 添加 cancel(TaskId) 方法来移除等待中的任务
  4. Async version / 异步版本:Port to tokio::spawn with tokio::sync::mpsc channels (Ch 16) / 迁移到带 tokio::sync::mpsc 通道的 tokio::spawn(第 16 章)
  5. Metrics / 指标统计:Track per-worker task counts, average execution time, and failure rates / 统计每个工作线程的任务数、平均执行时间和失败率
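As a starting point for the priority-queue extension, the sketch below pairs a validated `Priority` newtype with `std::collections::BinaryHeap`, which is a max-heap and therefore pops the highest priority first. The error type (`String`) and the helper `drain_by_priority` are assumptions made for illustration; a real scheduler would store its `WorkItem` values instead of bare names.

```rust
use std::collections::BinaryHeap;
use std::convert::TryFrom;

/// Hypothetical validated priority in 1..=10; a larger value is more urgent.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Priority(u8);

impl TryFrom<u8> for Priority {
    type Error = String;

    fn try_from(value: u8) -> Result<Self, Self::Error> {
        if (1..=10).contains(&value) {
            Ok(Priority(value))
        } else {
            Err(format!("priority must be in 1..=10, got {value}"))
        }
    }
}

/// Returns task names ordered from highest to lowest priority.
fn drain_by_priority(tasks: Vec<(Priority, String)>) -> Vec<String> {
    // Tuples compare field by field, so the heap orders by Priority first.
    let mut heap: BinaryHeap<(Priority, String)> = tasks.into_iter().collect();
    let mut order = Vec::with_capacity(heap.len());
    while let Some((_, name)) = heap.pop() {
        order.push(name);
    }
    order
}
```

Because tuple ordering falls through to the second field, tasks with equal priority come out in descending name order rather than submission order. Adding a sequence number to the tuple is one way to get first-in, first-out behavior within a priority level.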