Course Navigation
← Back to Course OverviewAll Lessons
Introduction and Database Fundamentals
Learning Objectives
- • Understand what key-value databases are and their characteristics
- • Learn when to use key-value stores vs other database types
- • Analyze functional and non-functional requirements
- • Design a layered architecture for our database
- • Plan the project structure and milestones
Lesson 1.1: Understanding Key-Value Stores
What is a Key-Value Database?
A key-value database is the simplest form of NoSQL database that stores data as a collection of key-value pairs. Think of it like a massive, persistent hash map or dictionary that survives system restarts.
Core Concepts:
- Key: A unique identifier (string, integer, or binary data)
- Value: The data associated with the key (can be any data type)
- Operations: Primarily GET, PUT, DELETE
- Simplicity: No schema, no SQL, no complex queries
Real-world example:
Key: "user:1001" Value: {"name": "John", "age": 30}
Key: "session:abc123" Value: "active"
Key: "counter:pageviews" Value: 15420 Characteristics of Key-Value Stores
Strengths:
- • High Performance: O(1) average lookup time
- • Horizontal Scalability: Easy to partition data across nodes
- • Flexibility: Schema-less design allows storing any data
- • Low Latency: Optimized for fast reads and writes
- • Simplicity: Easy to understand and implement
Limitations:
- • No Complex Queries: Can't search by value or join data
- • No Relationships: No foreign keys or references
- • Limited Indexing: Only the key is indexed
- • Range Queries: Difficult without sorted keys
- • No Transactions: Limited ACID support in basic implementations
Use Cases and Trade-offs
Ideal Use Cases:
Session Storage
Web application sessions with session ID as key
Why: Fast lookup, short-lived data, millions of concurrent sessions
Caching Layer
Cache frequently accessed data
Why: Sub-millisecond access, reduces database load
Real-time Analytics
Counters, metrics, leaderboards
Why: Atomic increment operations, extremely fast
When NOT to use Key-Value Stores:
- • Complex reporting and analytics
- • Data with many relationships
- • Full-text search requirements
- • Complex transactions spanning multiple keys
- • Data with intricate query patterns
Comparison with Other Database Types
| Feature | Key-Value | Relational | Document | Graph |
|---|---|---|---|---|
| Query Complexity | Simple (by key) | Complex (SQL) | Moderate (JSON) | Graph queries |
| Performance | Fastest | Slower | Fast | Medium |
| Schema Flexibility | None | Rigid | Flexible | Flexible |
| Scalability | Excellent | Difficult | Good | Medium |
Real-world Examples
Redis
- • In-memory key-value store with optional persistence
- • Supports advanced data types (lists, sets, sorted sets, streams)
- • Widely used for caching, sessions, real-time analytics
- • Performance: 100,000+ ops/sec per instance
etcd
- • Distributed key-value store based on Raft consensus
- • Strong consistency guarantees
- • Used for configuration management, service discovery
- • Built-in watch mechanism for key changes
Memcached
- • Simple in-memory cache
- • Very high performance for basic key-value operations
- • No persistence, data lost on restart
- • Used for caching in large-scale systems
LevelDB
- • Embedded key-value store (part of a larger application)
- • Sorted keys, efficient range queries
- • Durability through write-ahead logging
- • Written in C++, bindings available for many languages
Lesson 1.2: Core Requirements Analysis
Defining Functional Requirements
Functional requirements describe what the system does. For our key-value database:
Basic CRUD Operations
Create/Put Operation:
- INPUT: key (string), value (any data)
- OUTPUT: success/error
- SIDE EFFECT: stores or updates value at key
Read/Get Operation:
- INPUT: key (string)
- OUTPUT: value (if exists), error (if not found)
- SIDE EFFECT: none (read-only)
Delete Operation:
- INPUT: key (string)
- OUTPUT: success/error
- SIDE EFFECT: removes key-value pair
Extended Operations
Scan/Range Query:
Iterate over keys in sorted order, return all key-value pairs in a range
Batch Operations:
Apply multiple operations atomically - all succeed or all fail
Exists/Contains:
Check if a key exists without retrieving value
Defining Non-Functional Requirements
Non-functional requirements describe how well the system performs. These are critical for production systems.
Performance Requirements
Throughput:
- • Single-threaded: 100,000+ ops/sec for in-memory operations
- • Multi-threaded: 1,000,000+ ops/sec with 16 cores
- • Network operations: 50,000+ ops/sec over TCP
Latency (p99 - 99th percentile):
- • In-memory reads: < 1 millisecond
- • In-memory writes: < 5 milliseconds
- • Disk reads (cached): < 10 milliseconds
- • Disk writes (persisted): < 100 milliseconds
Durability Requirements
Write-Ahead Logging (WAL):
- • Every write must be logged to disk before returning success
- • Survives system crashes
- • No data loss on power failure
Durability Levels:
- • No durability: Fastest, data lost on crash (for caching only)
- • Write-ahead log: Moderate speed, no data loss
- • Synchronous writes: Slowest, highest durability guarantee
Consistency vs Availability Trade-off (CAP Theorem)
The CAP theorem states a distributed system can guarantee at most two of three properties:
Consistency
All nodes see the same data
Availability
System responds to requests
Partition Tolerance
System works despite network splits
Lesson 1.3: Architecture Planning
High-level System Design
Our key-value database will have layered architecture:
┌─────────────────────────────────────────────┐
│ Client Applications │
└────────────────┬────────────────────────────┘
│
┌────────────────▼────────────────────────────┐
│ Network Layer (TCP/Protocol Handling) │
│ - Connection management │
│ - Request/Response serialization │
│ - Pipelining support │
└────────────────┬────────────────────────────┘
│
┌────────────────▼────────────────────────────┐
│ Query Processor & Executor │
│ - Parse commands │
│ - Execute operations │
│ - Enforce consistency │
└────────────────┬────────────────────────────┘
│
┌────────────────▼────────────────────────────┐
│ Transaction & Concurrency Manager │
│ - MVCC implementation │
│ - Locking mechanism │
│ - Transaction coordination │
└────────────────┬────────────────────────────┘
│
┌────────────────▼────────────────────────────┐
│ Storage Engine (Memory + Disk) │
│ ┌──────────────┬──────────────────────────┐│
│ │ In-Memory │ Persistence Layer ││
│ │ - HashMap │ - WAL (Write-Ahead Log) ││
│ │ - LSM Tree │ - SSTables ││
│ │ - Cache │ - Compaction ││
│ └──────────────┴──────────────────────────┘│
└────────────────┬────────────────────────────┘
│
┌────────────────▼────────────────────────────┐
│ Replication & Consensus (optional) │
│ - Leader-follower sync │
│ - Raft consensus │
│ - Conflict resolution │
└─────────────────────────────────────────────┘ Component Identification
1. Storage Engine (Core)
Responsibility: Physically store and retrieve data
- • Memtable: In-memory sorted map for recent writes
- • SSTable: On-disk sorted key-value files
- • LSM Tree: Multi-level structure for write optimization
- • Cache: Hot data in memory for faster access
- • Bloom Filters: Probabilistic lookup structures
2. Persistence Layer
Responsibility: Ensure durability and crash recovery
- • Write-Ahead Log (WAL): Record all writes before applying
- • Checkpoint Mechanism: Periodic snapshots of state
- • Recovery Manager: Replay logs on startup
- • Compaction: Merge files and reclaim space
3. Concurrency Manager
Responsibility: Handle multiple concurrent operations safely
- • Lock Manager: Distribute locks, prevent deadlocks
- • MVCC Engine: Multiple versions for non-blocking reads
- • Transaction Coordinator: Manage transaction lifecycle
- • Conflict Detector: Identify concurrent modifications
Interface Design and API Contracts
Core Database Interface
// Store defines the key-value store interface
type Store interface {
// Single key operations
Get(ctx context.Context, key []byte) ([]byte, error)
Put(ctx context.Context, key, value []byte) error
Delete(ctx context.Context, key []byte) error
// Batch operations
Batch(ctx context.Context, ops []Operation) error
// Iteration
Scan(ctx context.Context, start, end []byte) (Iterator, error)
// Lifecycle
Close() error
// Durability
Flush() error
Snapshot() (Snapshot, error)
}
// Iterator for range queries
type Iterator interface {
Next() bool
Key() []byte
Value() []byte
Err() error
Close() error
} Project Structure and Organization
kvdb/
├── cmd/
│ ├── server/ # Main server executable
│ │ └── main.go
│ └── cli/ # Command-line client
│ └── main.go
├── pkg/
│ ├── storage/ # Storage engine
│ │ ├── store.go
│ │ ├── memtable.go
│ │ ├── sstable.go
│ │ └── cache.go
│ ├── persistence/ # WAL & recovery
│ │ ├── wal.go
│ │ └── recovery.go
│ ├── network/ # Network layer
│ │ ├── server.go
│ │ ├── client.go
│ │ └── protocol.go
│ ├── concurrency/ # Concurrency control
│ │ ├── lock.go
│ │ ├── mvcc.go
│ │ └── transaction.go
│ ├── monitor/ # Observability
│ │ ├── metrics.go
│ │ └── logger.go
│ └── config/ # Configuration
│ └── config.go
├── internal/
│ └── testutil/ # Test utilities
├── tests/
│ ├── integration_test.go
│ └── benchmark_test.go
├── docs/
│ ├── architecture.md
│ ├── api.md
│ └── operational_guide.md
├── go.mod
├── go.sum
├── Makefile
└── README.md Summary & Learning Outcomes
By completing this lesson, you should understand:
- • What key-value databases are and why they're different from relational databases
- • Key characteristics - strengths and limitations
- • Real-world use cases and when NOT to use them
- • Functional requirements - CRUD operations and extended features
- • Non-functional requirements - performance, durability, availability targets
- • Architecture overview - layered design with clear components
- • Interface contracts - how components communicate
- • Project structure - organized codebase layout
Key Takeaways
- • Key-value stores excel at simple, fast access by exact key match
- • Non-functional requirements (performance, durability) are as important as functional requirements
- • Clean architecture with separation of concerns enables testing and evolution
- • Trade-offs are inevitable - document decisions and their rationale
- • Start simple, add features incrementally
Next Steps
Next Week: We'll implement the core in-memory data structure and begin building our storage engine.
Preparation for Lab:
- • Review Go basics: maps, slices, goroutines, interfaces
- • Familiarize yourself with Go's testing framework
- • Understand benchmarking concepts
Hands-on Lab: Set up project repository and design initial API interface
Create the basic project structure and define the core interfaces for our key-value database.
Lab Tasks:
- 1. Create the project directory structure as outlined above
- 2. Initialize Go module with proper dependencies
- 3. Define the core Store interface in pkg/storage/store.go
- 4. Create basic unit tests for the interface
- 5. Set up Makefile with common commands (build, test, benchmark)