Append-Only Log (Write-Ahead Log)
Learning Objectives
- • Understand why Write-Ahead Logging is essential for durability
- • Master file I/O patterns and binary encoding in Go
- • Implement crash recovery mechanisms
- • Learn different durability levels and their trade-offs
- • Build a complete WAL with checksums and corruption detection
Lesson 4.1: WAL Fundamentals
Why Append-Only Logs Matter
The Write-Ahead Log (WAL) is the foundation of database durability. It ensures that no committed data is lost, even if the system crashes.
The Durability Problem
Without persistence, all data lives in memory:
┌─────────────────────────────────────┐
│ In-Memory Store (RAM) │
│ ┌──────────────────────────────┐ │
│ │ "user:1" → "Alice" │ │
│ │ "user:2" → "Bob" │ │
│ │ "counter" → 42 │ │
│ └──────────────────────────────┘ │
│ │
│ PROBLEM: │
│ Power failure → All data lost! │
│ System crash → All data lost! │
│ OOM killer → All data lost! │
└─────────────────────────────────────┘
Real-world scenario:
Application running:
1. Receive Put("user:100", "Alice")
2. Store in memory
3. Return success to client
Client: "Data saved!"
Immediately after:
- Power failure
- Server reboots
- Data is gone!
Client: "Where's my data?!" The Write-Ahead Log Solution
┌─────────────────────────┐
│ Application             │
│ Put("user:1", "Alice")  │
└────────────┬────────────┘
             │ write()
             ▼
    ┌─────────────────┐
    │ WAL file        │
    │ (OS page cache) │
    └────────┬────────┘
             │ fsync()
             ▼
    ┌─────────────────┐
    │ Disk storage    │
    │ (persistent)    │
    └────────┬────────┘
             │
             ▼
    Update in-memory store
             │
             ▼
    Return success to client
    (data is safe!)
Key principle: Write to disk BEFORE acknowledging to the client.
Timeline:
t0: Client sends Put("key", "value")
t1: Server writes entry to WAL
t2: Server calls fsync() - blocks until on disk
t3: Server updates in-memory store
t4: Server returns success to client
(A crash before t2 can lose the entry, but the client was never acknowledged; from t2 onward the entry can always be recovered from the WAL)
If crash at t3 (fsync done, client not yet acknowledged):
└─ On restart: replay WAL, recover the Put
└─ Data is safe
If crash after t4 (client already acknowledged):
└─ On restart: replay WAL
└─ The acknowledged Put is restored
└─ Data is safe
Log Structure and Format Design
Basic Log Entry Format
Entry:
┌────────────┬─────────┬────────┬─────────┬──────────┐
│ Operation │ KeyLen │ Key │ValueLen │ Value │
│ 1 byte │ 4 bytes │ VarLen │ 4 bytes │ VarLen │
└────────────┴─────────┴────────┴─────────┴──────────┘
Example:
Operation: 0x00 (Put)
KeyLen: 0x00000004 (4 bytes)
Key: "user"
ValueLen: 0x00000005 (5 bytes)
Value: "alice" Log File Format (Multiple Entries)
WAL File (append-only):
┌─────────────────────────────────────────┐
│ Header │
│ - Version: 1 │
│ - CreatedAt: timestamp │
│ - Checksum: CRC32 │
├─────────────────────────────────────────┤
│ Entry 1: Put("user:1", "alice") │
├─────────────────────────────────────────┤
│ Entry 2: Put("user:2", "bob") │
├─────────────────────────────────────────┤
│ Entry 3: Delete("user:3") │
├─────────────────────────────────────────┤
│ Entry 4: Put("counter", "42") │
├─────────────────────────────────────────┤
│ ... (more entries) │
│ │
│ File only grows (append-only) │
│ Never modifies existing entries │
└─────────────────────────────────────────┘
Checksums for Corruption Detection
// Entry with checksum
type LogEntry struct {
Op byte // 0: Put, 1: Delete
KeyLen uint32
Key []byte
ValueLen uint32
Value []byte
Checksum uint32 // CRC32 of (Op + KeyLen + Key + ValueLen + Value)
}
// Verify on read
func (e *LogEntry) Verify() error {
computed := crc32.ChecksumIEEE(e.Serialize())
if computed != e.Checksum {
return ErrChecksumMismatch
}
return nil
}
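Verify above calls a Serialize method that the lesson doesn't show. A minimal sketch, assuming the checksum covers exactly the payload fields in the entry layout (Op, KeyLen, Key, ValueLen, Value) and that bytes and encoding/binary are imported:
// Serialize encodes the entry payload (everything except the checksum)
// in the on-disk layout: Op | KeyLen | Key | ValueLen | Value
func (e *LogEntry) Serialize() []byte {
buf := bytes.NewBuffer(make([]byte, 0, 1+4+len(e.Key)+4+len(e.Value)))
buf.WriteByte(e.Op)
binary.Write(buf, binary.BigEndian, uint32(len(e.Key)))
buf.Write(e.Key)
binary.Write(buf, binary.BigEndian, uint32(len(e.Value)))
buf.Write(e.Value)
return buf.Bytes()
}
Lesson 4.2: WAL Implementation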
File I/O in Go
Basic File Operations
import "os"
// Create/open file
f, err := os.OpenFile(
"wal.log",
os.O_CREATE|os.O_WRONLY|os.O_APPEND, // Create, write-only, append
0644, // Permissions: rw-r--r--
)
if err != nil {
return err
}
defer f.Close()
// Write data
data := []byte("hello")
n, err := f.Write(data)
if err != nil {
return err
}
fmt.Printf("Wrote %d bytes\n", n)
// Force to disk
if err := f.Sync(); err != nil {
return err
}
// Get file size
info, err := f.Stat()
if err != nil {
return err
}
fmt.Printf("File size: %d\n", info.Size())
// Read from file
f2, _ := os.Open("wal.log")
buf := make([]byte, 1024)
n, err = f2.Read(buf)
Binary Encoding
Go provides binary encoding in the `encoding/binary` package:
import "encoding/binary"
// Write fixed-size integers
var buf bytes.Buffer
binary.Write(&buf, binary.BigEndian, uint32(42)) // Big-endian encoding
binary.Write(&buf, binary.LittleEndian, uint32(42)) // Little-endian
// Read fixed-size integers
var val uint32
binary.Read(bytes.NewReader(buf.Bytes()), binary.BigEndian, &val)
// Variable-length encoding (varint)
varintBuf := make([]byte, binary.MaxVarintLen64)
n := binary.PutUvarint(varintBuf, 300) // varintBuf[:n] holds 300 in just 2 bytes
Varint efficiency:
- • Small values (0-127): 1 byte
- • Medium values (128-16383): 2 bytes
- • Large values: 3-10 bytes
Example:
- • Encode 42: 0x2A (1 byte)
- • Encode 300: 0xAC 0x02 (2 bytes)
- • Encode 1000: 0xE8 0x07 (2 bytes)
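A short, standalone round-trip sketch of the encoding just described (the program and variable names are illustrative):
package main
import (
"encoding/binary"
"fmt"
)
func main() {
// Encode 300 as an unsigned varint
buf := make([]byte, binary.MaxVarintLen64)
n := binary.PutUvarint(buf, 300)
fmt.Printf("encoded bytes: % x (length %d)\n", buf[:n], n) // ac 02 (length 2)
// Decode it back
value, read := binary.Uvarint(buf[:n])
fmt.Println(value, read) // 300 2
}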
Writing Log Entries
package persistence
import (
"bytes"
"encoding/binary"
"hash/crc32"
"io"
"os"
"sync"
)
// LogEntry represents a single write operation
type LogEntry struct {
Op uint8 // 0: Put, 1: Delete
Key []byte
Value []byte // Empty for Delete
Checksum uint32
}
// WriteAheadLog manages persistent writes
type WriteAheadLog struct {
file *os.File
mu sync.Mutex
offset int64 // Current file offset
unfsynced int // Entries since last fsync
syncEvery int // Sync every N writes
buf *bytes.Buffer
}
// NewWriteAheadLog creates a new WAL
func NewWriteAheadLog(filepath string, syncEvery int) (*WriteAheadLog, error) {
f, err := os.OpenFile(
filepath,
os.O_CREATE|os.O_WRONLY|os.O_APPEND,
0644,
)
if err != nil {
return nil, err
}
// Get current file size
info, err := f.Stat()
if err != nil {
f.Close()
return nil, err
}
return &WriteAheadLog{
file: f,
offset: info.Size(),
syncEvery: syncEvery,
buf: bytes.NewBuffer(make([]byte, 0, 4096)),
}, nil
}
// Write appends an entry to the WAL
func (w *WriteAheadLog) Write(entry *LogEntry) error {
w.mu.Lock()
defer w.mu.Unlock()
// Serialize entry
data, err := w.serializeEntry(entry)
if err != nil {
return err
}
// Write to file
if _, err := w.file.Write(data); err != nil {
return err
}
w.offset += int64(len(data))
w.unfsynced++
// Check if should fsync
if w.unfsynced >= w.syncEvery {
if err := w.file.Sync(); err != nil {
return err
}
w.unfsynced = 0
}
return nil
}
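Write above delegates to w.serializeEntry, which the lesson doesn't show. A minimal sketch, assuming the on-disk layout the LogReader in Lesson 4.3 expects (payload followed by a big-endian CRC32) and reusing the entry Serialize method sketched in Lesson 4.1:
// serializeEntry produces the on-disk bytes for one entry:
// Op | KeyLen | Key | ValueLen | Value | Checksum (CRC32 over the payload).
// The caller already holds w.mu, so reusing w.buf here is safe.
func (w *WriteAheadLog) serializeEntry(entry *LogEntry) ([]byte, error) {
payload := entry.Serialize()
entry.Checksum = crc32.ChecksumIEEE(payload)
w.buf.Reset()
w.buf.Write(payload)
if err := binary.Write(w.buf, binary.BigEndian, entry.Checksum); err != nil {
return nil, err
}
return w.buf.Bytes(), nil
}
Durability Guarantees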
fsync() and OS Buffers
The operating system buffers writes in memory (page cache):
Application writes data:
write() → OS Page Cache → Returns immediately
Data in page cache is NOT durable yet!
└─ Power failure → Data lost
fsync() forces to disk:
fsync() → Blocks until on persistent storage → Returns
Now data is durable!
Performance trade-off:
Without fsync():
- • Write latency: 1-10 µs (memory operation)
- • Throughput: ~100K-1M ops/sec per writer (limited by the write() syscall)
- • Durability: NONE (data lost on power failure or OS crash)
With fsync():
- • Write latency: 1-10 ms (disk I/O)
- • Throughput: 100-1000 ops/sec
- • Durability: FULL (data survives crashes)
Durability Levels
Level 0: No Durability (Caching)
func (wal *WAL) Write(entry *LogEntry) error {
buf := entry.Serialize()
_, err := wal.file.Write(buf)
// No fsync() - data in OS cache only
return err
}
Level 1: Write-Ahead Log with Periodic fsync
func (wal *WAL) Write(entry *LogEntry) error {
buf := entry.Serialize()
_, err := wal.file.Write(buf)
if err != nil {
return err
}
wal.unfsynced++
// fsync every N writes
if wal.unfsynced >= 1000 {
if err := wal.file.Sync(); err != nil {
return err
}
wal.unfsynced = 0
}
return nil
}
Level 2: Strict Durability (fsync every write)
func (wal *WAL) Write(entry *LogEntry) error {
buf := entry.Serialize()
_, err := wal.file.Write(buf)
if err != nil {
return err
}
// Immediate fsync - guarantees durability
return wal.file.Sync()
}
Choosing Durability Level
| Level | Latency | Throughput | Durability | Use Case |
|---|---|---|---|---|
| 0 | 1-10 µs | ~100K-1M ops/sec | None | Caching only |
| 1 | ~100 µs | ~10K ops/sec | Bounded (up to N recent writes lost) | Web sessions |
| 2 | 1-10 ms | 100-1000 ops/sec | Strict | Financial |
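Levels 1 and 2 map directly onto the syncEvery knob of the WriteAheadLog from Lesson 4.2; a rough sketch (file names here are illustrative):
// Level 2: fsync after every write - strict durability
strict, _ := NewWriteAheadLog("strict.log", 1)
// Level 1: fsync every 1000 writes - bounded loss, much higher throughput
periodic, _ := NewWriteAheadLog("periodic.log", 1000)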
Lesson 4.3: Crash Recovery
Reading and Replay
// LogReader reads entries from a WAL file
type LogReader struct {
file *os.File
reader io.Reader
}
// NewLogReader creates a reader for the WAL
func NewLogReader(filepath string) (*LogReader, error) {
f, err := os.Open(filepath)
if err != nil {
return nil, err
}
return &LogReader{
file: f,
reader: f,
}, nil
}
// Read reads next entry
func (lr *LogReader) Read() (*LogEntry, error) {
entry := &LogEntry{}
// Read operation type
opBuf := make([]byte, 1)
if _, err := io.ReadFull(lr.reader, opBuf); err != nil {
if err == io.EOF {
return nil, io.EOF
}
return nil, err
}
entry.Op = opBuf[0]
// Read key
var keyLen uint32
if err := binary.Read(lr.reader, binary.BigEndian, &keyLen); err != nil {
return nil, err
}
entry.Key = make([]byte, keyLen)
if _, err := io.ReadFull(lr.reader, entry.Key); err != nil {
return nil, err
}
// Read value
var valLen uint32
if err := binary.Read(lr.reader, binary.BigEndian, &valLen); err != nil {
return nil, err
}
entry.Value = make([]byte, valLen)
if _, err := io.ReadFull(lr.reader, entry.Value); err != nil {
return nil, err
}
// Read and verify checksum
var checksum uint32
if err := binary.Read(lr.reader, binary.BigEndian, &checksum); err != nil {
return nil, err
}
entry.Checksum = checksum
// Verify data integrity
if err := entry.Verify(); err != nil {
return nil, err
}
return entry, nil
}
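Recover below calls reader.Close(), which the lesson doesn't define. A one-method sketch that simply releases the underlying file:
// Close releases the WAL file handle held by the reader
func (lr *LogReader) Close() error {
return lr.file.Close()
}
Crash Recovery Implementation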
// RecoveryManager replays WAL on startup
type RecoveryManager struct {
walPath string
store Store
}
// NewRecoveryManager creates a recovery manager
func NewRecoveryManager(walPath string, store Store) *RecoveryManager {
return &RecoveryManager{
walPath: walPath,
store: store,
}
}
// Recover replays the WAL and recovers to last consistent state
func (rm *RecoveryManager) Recover(ctx context.Context) error {
// Check if WAL exists
if _, err := os.Stat(rm.walPath); err != nil {
if os.IsNotExist(err) {
return nil // No WAL, clean start
}
return err
}
reader, err := NewLogReader(rm.walPath)
if err != nil {
return err
}
defer reader.Close()
var recovered, skipped int
for {
entry, err := reader.Read()
if err == io.EOF {
break
}
if err != nil {
// Corrupted or truncated entry (e.g. a torn write during a crash):
// stop replay here, since later bytes can no longer be framed reliably
skipped++
break
}
// Replay operation
switch entry.Op {
case 0: // Put
if err := rm.store.Put(ctx, entry.Key, entry.Value); err != nil {
return err
}
recovered++
case 1: // Delete
if err := rm.store.Delete(ctx, entry.Key); err != nil {
return err
}
recovered++
}
}
log.Printf("Recovered %d entries (%d corrupted trailing entries dropped)", recovered, skipped)
return nil
}
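The LogRotator in the next section calls Sync and Close on the WAL, which were not defined in Lesson 4.2. A minimal sketch, assuming the WriteAheadLog struct from that lesson:
// Sync forces buffered entries to persistent storage
func (w *WriteAheadLog) Sync() error {
w.mu.Lock()
defer w.mu.Unlock()
w.unfsynced = 0
return w.file.Sync()
}
// Close syncs and then releases the underlying file
func (w *WriteAheadLog) Close() error {
w.mu.Lock()
defer w.mu.Unlock()
if err := w.file.Sync(); err != nil {
return err
}
return w.file.Close()
}
Log Rotation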
// LogRotator manages WAL file rotation
type LogRotator struct {
baseDir string
maxFileSize int64
currentWAL *WriteAheadLog
rotationMu sync.Mutex
}
// Rotate switches to a new WAL file
func (lr *LogRotator) Rotate() error {
lr.rotationMu.Lock()
defer lr.rotationMu.Unlock()
// Close current file
if lr.currentWAL != nil {
if err := lr.currentWAL.Sync(); err != nil {
return err
}
if err := lr.currentWAL.Close(); err != nil {
return err
}
}
// Create new WAL with timestamp
filename := fmt.Sprintf("wal-%d.log", time.Now().UnixNano())
path := filepath.Join(lr.baseDir, filename)
wal, err := NewWriteAheadLog(path, 100)
if err != nil {
return err
}
lr.currentWAL = wal
return nil
}
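maxFileSize is declared on LogRotator but never used above. A hypothetical helper (the name maybeRotate is illustrative) that triggers rotation once the active file grows past that size:
// maybeRotate rotates the WAL when the active file exceeds maxFileSize.
// The size check reads the WAL's offset without its mutex, so it is approximate;
// that is acceptable because rotation timing does not need to be exact.
func (lr *LogRotator) maybeRotate() error {
lr.rotationMu.Lock()
full := lr.currentWAL != nil && lr.currentWAL.offset >= lr.maxFileSize
lr.rotationMu.Unlock()
if !full {
return nil
}
return lr.Rotate()
}
Hands-on Lab: Implement Write-Ahead Log with crash recovery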
Build a persistent WAL with crash recovery.
Lab Tasks:
- 1. Implement WriteAheadLog with checksums
- 2. Support configurable fsync frequency
- 3. Implement LogReader with checksum verification
- 4. Build RecoveryManager for crash recovery
- 5. Write comprehensive tests
- 6. Benchmark different fsync rates (see the sketch after this list)
- 7. Test with corrupted entries
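For task 6, a Go benchmark along these lines compares fsync rates (a sketch that assumes the NewWriteAheadLog and LogEntry definitions from this lesson):
package persistence
import (
"fmt"
"path/filepath"
"testing"
)
// BenchmarkWALWrite measures write throughput at different fsync frequencies
func BenchmarkWALWrite(b *testing.B) {
for _, syncEvery := range []int{1, 100, 1000} {
b.Run(fmt.Sprintf("syncEvery=%d", syncEvery), func(b *testing.B) {
wal, err := NewWriteAheadLog(filepath.Join(b.TempDir(), "wal.log"), syncEvery)
if err != nil {
b.Fatal(err)
}
entry := &LogEntry{Op: 0, Key: []byte("user:1"), Value: []byte("alice")}
b.ResetTimer()
for i := 0; i < b.N; i++ {
if err := wal.Write(entry); err != nil {
b.Fatal(err)
}
}
})
}
}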
Summary & Learning Outcomes
What You've Learned:
- • Why Write-Ahead Logging is essential for durability
- • File I/O patterns and binary encoding in Go
- • Different durability levels and their trade-offs
- • Checksums and corruption detection
- • Crash recovery mechanisms
- • Log rotation and management
Key Takeaways
- • Always write to disk before acknowledging to client
- • Choose durability level based on your use case
- • Use checksums to detect corruption
- • Design for crash recovery from day one
- • fsync() is expensive but necessary for durability