Building the Core Data Structure
Learning Objectives
- • Choose the right data structure for your key-value store
- • Implement HashMap with proper hash functions
- • Handle value serialization strategies
- • Build comprehensive testing frameworks
- • Master Go's testing and benchmarking tools
Lesson 2.1: HashMap Implementation
Choosing the Right Data Structure
Before we implement, let's understand the options and trade-offs for our core in-memory key-value store.
Option 1: Go's Native Map (map[string]interface{})
Characteristics:
- • Built-in hash table implementation
- • O(1) average case lookup
- • Not thread-safe (race conditions with concurrent access)
- • Keys are unordered (iteration order randomized)
- • Fast for basic use cases
When to use:
- • Single-threaded applications
- • Prototyping and testing
- • When simplicity matters more than concurrency
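If a native map is enough, the whole store is a thin wrapper. A minimal sketch, assuming no concurrency (the SimpleStore name is illustrative, not part of the course code):
// SimpleStore wraps Go's native map. NOT safe for concurrent use.
type SimpleStore struct {
	data map[string]interface{}
}
func NewSimpleStore() *SimpleStore {
	return &SimpleStore{data: make(map[string]interface{})}
}
func (s *SimpleStore) Get(key string) (interface{}, bool) {
	v, ok := s.data[key]
	return v, ok
}
func (s *SimpleStore) Put(key string, value interface{}) {
	s.data[key] = value
}
func (s *SimpleStore) Delete(key string) {
	delete(s.data, key)
}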
Option 2: Custom HashMap with Sharding
Characteristics:
- • Divide keys into N shards (sub-maps)
- • Each shard has its own lock
- • Multiple goroutines can operate on different shards concurrently
- • More complex but much better concurrency
- • Still unordered
Performance improvement (illustrative numbers):
- • Single lock: ~100K ops/sec (8 cores serialized behind one mutex)
- • 16 shards: ~1.2M ops/sec (8 cores working in parallel)
When to use:
- • Multi-threaded applications
- • High concurrency, many readers/writers
- • Good balance of simplicity and performance
Option 3: Skip List
Characteristics:
- • Probabilistic data structure maintaining sorted order
- • O(log n) lookup, insert, delete
- • Enables range queries efficiently
- • Can be made lock-free with careful design
- • More memory overhead than hash tables (~33%)
When to use:
- • Need sorted iteration (range queries)
- • Building LSM tree memtable
- • Distributed systems (Bigtable, HBase use this)
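To make the trade-off concrete, here is a hedged sketch of a skip-list node and its search path (insertion and random level assignment omitted; names are illustrative):
const maxLevel = 16
type skipNode struct {
	key     string
	value   interface{}
	forward [maxLevel]*skipNode // forward[i] = next node at level i
}
type skipList struct {
	head  *skipNode // sentinel node with empty key
	level int       // highest level currently in use
}
// search starts at the top level, moves right while the next key is smaller,
// then drops one level; expected O(log n) comparisons. Level 0 is a sorted
// linked list, which is what makes range scans cheap.
func (s *skipList) search(key string) (interface{}, bool) {
	x := s.head
	for i := s.level - 1; i >= 0; i-- {
		for x.forward[i] != nil && x.forward[i].key < key {
			x = x.forward[i]
		}
	}
	x = x.forward[0]
	if x != nil && x.key == key {
		return x.value, true
	}
	return nil, false
}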
Option 4: B-Tree or B+ Tree
Characteristics:
- • Balanced tree structure
- • O(log n) operations
- • High degree (many children per node)
- • Excellent disk I/O patterns
- • More complex implementation
When to use:
- • Disk-based storage (SSTables)
- • Very large datasets
- • Need to minimize disk seeks
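As a rough illustration (a sketch of the node shape, not an implementation), the high fan-out of a B+ tree node is what keeps a lookup down to a handful of disk reads:
type bptreeNode struct {
	leaf     bool
	keys     [][]byte      // sorted separator keys (record keys in a leaf)
	children []*bptreeNode // internal nodes: len(children) == len(keys)+1
	values   [][]byte      // leaf nodes only
	next     *bptreeNode   // leaves are chained, enabling range scans
}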
Decision: Sharded HashMap for MVP
For Week 2 (MVP): We'll implement a sharded hash map because:
- ✓ Good balance of simplicity and performance
- ✓ Thread-safe for concurrent operations
- ✓ Sufficient for most use cases
- ✓ Foundation for learning concurrency patterns
Later (Week 5): We'll add sorted keys via LSM tree when we add persistence.
Go's Native Map vs Custom Implementation
Go's Map Under the Hood
type hmap struct {
count int // number of elements
flags uint8 // flags tracking rehashing state
B uint8 // log_2 of buckets array size
noverflow uint16 // overflow buckets count
hash0 uint32 // seed hash
buckets unsafe.Pointer // bucket array
oldbuckets unsafe.Pointer // previous bucket array (during resize)
nevacuate uintptr // progress of old bucket evacuation
extra *mapextra // overflow buckets, if any
}
type bmap struct {
tophash [8]uint8 // hash values for 8 entries
keys [8]keytype // keys
values [8]valuetype // values
pad uintptr
overflow *bmap // overflow bucket for collision chain
}
Key Points:
- • Bucket size: 8 entries per bucket (cache line optimization)
- • Collision handling: overflow buckets linked as a chain
- • Growth: Doubles the bucket count when the average exceeds 6.5 entries per 8-slot bucket (~0.8 occupancy)
- • Rehashing: Incremental during operations, not all at once
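One consequence worth seeing for yourself: iteration order really is randomized. A quick demo (assumes the fmt package); run it a few times and the key order usually changes:
m := map[string]int{"a": 1, "b": 2, "c": 3, "d": 4}
for k := range m {
	fmt.Print(k, " ") // e.g. "c a d b" on one run, "b d a c" on another
}
fmt.Println()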
Why We Can't Just Use map[string]interface{}
Problem: Not thread-safe
// Problem: Not thread-safe
m := make(map[string]interface{})
go func() {
m["key1"] = "value1" // goroutine 1 writes
}()
go func() {
_ = m["key1"] // goroutine 2 reads
}()
// Result: DATA RACE - the runtime's best-effort check usually aborts with
// "fatal error: concurrent map read and map write"
Thread Safety Considerations
What Makes Maps Unsafe?
During a resize operation, the map is internally inconsistent. Two concurrent operations can see different states:
Goroutine A: Reading key="foo" during resize
1. Compute hash(foo) = 0x123abc
2. Look in old bucket[3]
3. Find value = "bar"
4. Return "bar"
Goroutine B: Writing key="baz" = "qux" during resize
1. Trigger incremental evacuation of the old bucket array
2. Move entries from old bucket[3] into the new bucket array
3. Goroutine A now sees a half-moved bucket: stale data, a missing key, or a runtime fatal error
Solution: Synchronization
Option 1: Coarse-grained lock (simple, slower)
type Map struct {
mu sync.RWMutex
data map[string]interface{}
}
func (m *Map) Get(key string) interface{} {
m.mu.RLock()
defer m.mu.RUnlock()
return m.data[key]
}
Option 2: Fine-grained locking (complex, faster)
type ShardedMap struct {
shards []*mapShard
}
type mapShard struct {
mu sync.RWMutex
data map[string]interface{}
}
Option 3: sync.Map (Go 1.9+, specific use case)
var m sync.Map
m.Store("key", "value")
val, _ := m.Load("key")
When to use each:
- • Coarse-grained: Simple, fine for modest concurrency
- • Sharded: High concurrency, many goroutines
- • sync.Map: Write-rarely, read-heavy workloads
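Option 1 above showed only the read path. For completeness, a sketch of the matching write path for that coarse-grained Map (writers take the exclusive lock, blocking every other reader and writer on the map):
func (m *Map) Put(key string, value interface{}) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.data[key] = value
}
func (m *Map) Delete(key string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	delete(m.data, key)
}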
Lesson 2.2: Basic Operations (Get, Put, Delete)
Architecture of Our Sharded HashMap
┌─────────────────────────────────┐
│ ShardedMap (thread-safe) │
├─────────────────────────────────┤
│ shards [16] │
└──┬──────────┬────────┬─────────┘
│ │ │
▼ ▼ ▼
┌────────┐ ┌────────┐ ┌────────┐
│Shard 0 │ │Shard 1 │...│Shard15│
├────────┤ ├────────┤ ├────────┤
│Lock │ │Lock │ │Lock │
│HashMap │ │HashMap │ │HashMap │
└────────┘ └────────┘ └────────┘
Implementing Get Operation
// Full implementation with documentation
// (standard-library imports needed: "sync", "hash/fnv")
type ShardedMap struct {
shards []*mapShard
}
type mapShard struct {
mu sync.RWMutex
data map[string]interface{}
}
func NewShardedMap(shardCount int) *ShardedMap {
shards := make([]*mapShard, shardCount)
for i := 0; i < shardCount; i++ {
shards[i] = &mapShard{
data: make(map[string]interface{}),
}
}
return &ShardedMap{shards: shards}
}
// Get retrieves a value by key
// Returns (value, exists)
// Time complexity: O(1) average case
// Thread-safe: Multiple readers allowed simultaneously
func (m *ShardedMap) Get(key string) (interface{}, bool) {
// Step 1: Determine which shard owns this key
shard := m.getShard(key)
// Step 2: Acquire read lock (allows multiple concurrent readers)
shard.mu.RLock()
defer shard.mu.RUnlock()
// Step 3: Look up value in shard's map
val, exists := shard.data[key]
return val, exists
}
// getShard returns the shard responsible for a given key
// Uses hash function to distribute keys uniformly
func (m *ShardedMap) getShard(key string) *mapShard {
// Hash the key to get a number
h := fnv.New64a()
h.Write([]byte(key))
// Map hash to shard index using modulo
shardIndex := int(h.Sum64() % uint64(len(m.shards)))
return m.shards[shardIndex]
}
Implementing Put Operation
// Put stores a value at the given key
// If key exists, overwrites the value
// Time complexity: O(1) average case
// Thread-safe: Exclusive access to shard during write
func (m *ShardedMap) Put(key string, value interface{}) error {
// Input validation
if key == "" {
return errors.New("key cannot be empty")
}
if value == nil {
return errors.New("value cannot be nil")
}
// Step 1: Get the appropriate shard
shard := m.getShard(key)
// Step 2: Acquire write lock (exclusive access)
shard.mu.Lock()
defer shard.mu.Unlock()
// Step 3: Store in map
shard.data[key] = value
return nil
}
// Thread safety notes:
// - Lock is held for entire operation
// - Other goroutines cannot access this shard during Put
// - Readers on OTHER shards still proceed normally (parallelism)
Implementing Delete Operation
// Delete removes a key and its value
// Returns true if key existed, false otherwise
// Time complexity: O(1) average case
// Thread-safe: Exclusive access to shard during deletion
func (m *ShardedMap) Delete(key string) bool {
shard := m.getShard(key)
shard.mu.Lock()
defer shard.mu.Unlock()
_, existed := shard.data[key]
delete(shard.data, key)
return existed
}
Handling Key Collisions
Hash Collision: Two different keys hash to the same value
Example: Hash function: hash(key) = len(key) % 16
hash("a") = 1
hash("bcd") = 3
hash("e") = 1 <- COLLISION with "a"! How Go's Map Handles Collisions:
- 1. Bucket array has 16 buckets (for example)
- 2. Each bucket stores up to 8 entries inline
- 3. Hash collision → same bucket
- 4. If bucket full, create overflow bucket (linked list)
Our Sharded Map:
- • Shard selection: hash(key) % shard count decides which sub-map owns a key (two keys landing in the same shard is expected, not a collision problem)
- • Collisions inside a shard: delegated to Go's map and its overflow-bucket chaining
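Before moving on to serialization, a short usage sketch of the ShardedMap built above (assumes the fmt and log packages):
m := NewShardedMap(16)
if err := m.Put("user:42", "alice"); err != nil {
	log.Fatal(err)
}
if v, ok := m.Get("user:42"); ok {
	fmt.Println(v) // alice
}
m.Delete("user:42")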
Value Serialization Strategies
Our in-memory store holds interface{} values, which can be any Go type. But what happens when values must cross the network or be persisted to disk?
Option 1: JSON Serialization
type JSONCodec struct{}
func (c JSONCodec) Encode(v interface{}) ([]byte, error) {
return json.Marshal(v)
}
func (c JSONCodec) Decode(data []byte, v interface{}) error {
return json.Unmarshal(data, v)
}
// Usage:
codec := JSONCodec{}
data, _ := codec.Encode("hello") // => []byte(`"hello"`)
var s string
codec.Decode(data, &s) // => s = "hello"
Pros:
- • Human-readable
- • Widely supported
Cons:
- • Verbose
- • Slow
- • Type information lost
Option 2: Protocol Buffers
Schema definition (proto3):
syntax = "proto3";
message Value {
oneof data {
string string_value = 1;
int64 int_value = 2;
bytes bytes_value = 3;
}
}
Pros:
- • Compact
- • Fast
- • Schema versioning
Cons:
- • Requires code generation
- • Steeper learning curve
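For a sense of what this looks like from Go, here is a hypothetical snippet assuming the schema above is compiled with protoc-gen-go into a generated package named kvpb (the import path and package name are assumptions, not part of the course repo):
import (
	"google.golang.org/protobuf/proto"
	kvpb "example.com/kvstore/gen/kvpb" // hypothetical generated package
)
func encodeString(s string) ([]byte, error) {
	v := &kvpb.Value{
		// oneof fields get wrapper types like Value_StringValue from protoc-gen-go
		Data: &kvpb.Value_StringValue{StringValue: s},
	}
	return proto.Marshal(v)
}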
Option 3: Custom Binary Encoding
// Simple binary format for our use case
type BinaryCodec struct{}
// Format: [type_byte][length: int32, big-endian][data...]
// (a matching Decode sketch follows the pros/cons list below)
func (c BinaryCodec) Encode(v interface{}) ([]byte, error) {
buf := bytes.NewBuffer(nil)
switch val := v.(type) {
case string:
buf.WriteByte(1) // type: string
binary.Write(buf, binary.BigEndian, int32(len(val)))
buf.WriteString(val)
case int64:
buf.WriteByte(2) // type: int64
binary.Write(buf, binary.BigEndian, val)
case []byte:
buf.WriteByte(3) // type: bytes
binary.Write(buf, binary.BigEndian, int32(len(val)))
buf.Write(val)
default:
return nil, errors.New("unsupported type")
}
return buf.Bytes(), nil
}
Pros:
- • Minimal overhead
- • Fast
- • Control over format
Cons:
- • Manual implementation
- • Limited type support
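The lesson shows only Encode; one possible Decode counterpart (a sketch that mirrors the [type_byte][length:int32][data] layout, returns an interface{}, and needs bytes, encoding/binary, errors, and io):
func (c BinaryCodec) Decode(data []byte) (interface{}, error) {
	if len(data) < 1 {
		return nil, errors.New("empty payload")
	}
	r := bytes.NewReader(data[1:])
	// readLenPrefixed reads an int32 length followed by that many bytes.
	readLenPrefixed := func() ([]byte, error) {
		var n int32
		if err := binary.Read(r, binary.BigEndian, &n); err != nil {
			return nil, err
		}
		if n < 0 {
			return nil, errors.New("negative length")
		}
		buf := make([]byte, n)
		_, err := io.ReadFull(r, buf)
		return buf, err
	}
	switch data[0] {
	case 1: // string
		b, err := readLenPrefixed()
		if err != nil {
			return nil, err
		}
		return string(b), nil
	case 2: // int64
		var v int64
		if err := binary.Read(r, binary.BigEndian, &v); err != nil {
			return nil, err
		}
		return v, nil
	case 3: // bytes
		b, err := readLenPrefixed()
		if err != nil {
			return nil, err
		}
		return b, nil
	default:
		return nil, errors.New("unknown type byte")
	}
}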
Decision for our course: Start with storing byte slices ([]byte)
This simplifies serialization and forces explicit encoding decisions.
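Note that the tests in Lesson 2.3 below call a byte-slice store (NewShardedStore, context arguments, an ErrKeyNotFound sentinel) rather than the ShardedMap above. The exact API is not spelled out in this lesson; a sketch of the contract those tests appear to assume (signatures inferred from the test code, needs the context and errors packages):
var ErrKeyNotFound = errors.New("key not found")
// Store is the contract exercised by the Lesson 2.3 tests; a ShardedStore
// (the []byte variant of ShardedMap) would implement it.
type Store interface {
	Get(ctx context.Context, key []byte) ([]byte, error) // ErrKeyNotFound if absent
	Put(ctx context.Context, key, value []byte) error
	Delete(ctx context.Context, key []byte) error
}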
Lesson 2.3: Testing Strategy
Unit Testing Fundamentals
Test Structure in Go
// Filename: store_test.go (same directory as store.go)
// Test functions must start with "Test"
func TestStoreGet(t *testing.T) {
// Arrange: Set up test fixtures
store := NewShardedStore(4)
store.Put(context.Background(), []byte("key1"), []byte("value1"))
// Act: Execute the code under test
result, err := store.Get(context.Background(), []byte("key1"))
// Assert: Verify the result
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
if string(result) != "value1" {
t.Errorf("got %q, want %q", string(result), "value1")
}
}
Subtests for Organization
func TestStore(t *testing.T) {
store := NewShardedStore(4)
t.Run("Put and Get", func(t *testing.T) {
store.Put(context.Background(), []byte("k"), []byte("v"))
v, _ := store.Get(context.Background(), []byte("k"))
if string(v) != "v" {
t.Errorf("got %q, want %q", string(v), "v")
}
})
t.Run("Get missing key", func(t *testing.T) {
_, err := store.Get(context.Background(), []byte("missing"))
if err != ErrKeyNotFound {
t.Errorf("got %v, want ErrKeyNotFound", err)
}
})
t.Run("Delete", func(t *testing.T) {
store.Put(context.Background(), []byte("k2"), []byte("v2"))
store.Delete(context.Background(), []byte("k2"))
_, err := store.Get(context.Background(), []byte("k2"))
if err != ErrKeyNotFound {
t.Errorf("expected key to be deleted")
}
})
}
Table-Driven Tests
Table-driven tests reduce duplication and make it easy to add more test cases.
func TestStoreOperations(t *testing.T) {
tests := []struct {
name string
key []byte
value []byte
setupFn func(*ShardedStore)
wantValue []byte
wantErr bool
}{
{
name: "put and get string",
key: []byte("user:1"),
value: []byte("alice"),
wantValue: []byte("alice"),
},
{
name: "put and get binary",
key: []byte("binary:1"),
value: []byte{0x00, 0x01, 0x02},
wantValue: []byte{0x00, 0x01, 0x02},
},
{
name: "get nonexistent key",
key: []byte("missing"),
wantErr: true,
},
{
name: "overwrite value",
key: []byte("counter"),
value: []byte("2"),
setupFn: func(s *ShardedStore) {
s.Put(context.Background(), []byte("counter"), []byte("1"))
},
wantValue: []byte("2"),
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
store := NewShardedStore(4)
if tt.setupFn != nil {
tt.setupFn(store)
}
if tt.value != nil {
	store.Put(context.Background(), tt.key, tt.value)
}
got, err := store.Get(context.Background(), tt.key)
if (err != nil) != tt.wantErr {
t.Errorf("error = %v, wantErr %v", err, tt.wantErr)
}
if !tt.wantErr && !bytes.Equal(got, tt.wantValue) {
t.Errorf("got %v, want %v", got, tt.wantValue)
}
})
}
}
Benchmarking Basics
Benchmarks measure performance characteristics.
func BenchmarkGet(b *testing.B) {
store := NewShardedStore(16)
store.Put(context.Background(), []byte("key"), []byte("value"))
b.ResetTimer() // Don't count setup time
for i := 0; i < b.N; i++ {
store.Get(context.Background(), []byte("key"))
}
}
// Run: go test -bench=. -benchmem
// Example output with -benchmem:
// BenchmarkGet-8   5000000   256 ns/op   64 B/op   1 allocs/op
//                  ^iterations ^timing   ^memory   ^allocations
Interpreting results:
- • 5000000 - iterations executed in the timed loop
- • 256 ns/op - nanoseconds per operation
- • 64 B/op - bytes allocated per operation
- • 1 allocs/op - heap allocations per operation
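A serial benchmark exercises only one goroutine, so it cannot show the benefit of sharding. A sketch using b.RunParallel to measure concurrent reads:
func BenchmarkGetParallel(b *testing.B) {
	store := NewShardedStore(16)
	store.Put(context.Background(), []byte("key"), []byte("value"))
	b.ResetTimer()
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			store.Get(context.Background(), []byte("key"))
		}
	})
}
// Run with: go test -bench=Parallel -benchmem -cpu=1,4,8
// to compare throughput at different levels of parallelism.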
Property-Based Testing Introduction
Property-based testing generates random inputs and verifies invariants.
func TestStoreProperties(t *testing.T) {
// Property: After Put, Get returns same value
for trial := 0; trial < 100; trial++ {
store := NewShardedStore(4)
key := []byte(randomString(10))
value := []byte(randomString(100))
store.Put(context.Background(), key, value)
got, _ := store.Get(context.Background(), key)
if !bytes.Equal(got, value) {
t.Errorf("trial %d: put-get invariant failed", trial)
}
}
}
func TestConcurrentSafety(t *testing.T) {
// Property: Concurrent operations don't cause data corruption
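// Tip: run this with the race detector (go test -race) so unsynchronized
// access is reported even when it doesn't happen to corrupt data.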
store := NewShardedStore(16)
done := make(chan bool)
// Writers
for w := 0; w < 10; w++ {
go func(writerID int) {
for i := 0; i < 1000; i++ {
key := fmt.Sprintf("key:%d:%d", writerID, i)
store.Put(context.Background(), []byte(key), []byte("value"))
}
done <- true
}(w)
}
// Readers
for r := 0; r < 10; r++ {
go func() {
for i := 0; i < 1000; i++ {
for w := 0; w < 10; w++ {
key := fmt.Sprintf("key:%d:%d", w, i)
store.Get(context.Background(), []byte(key))
}
}
done <- true
}()
}
// Wait for all goroutines
for i := 0; i < 20; i++ {
<-done
}
// If we get here without panic or corruption, test passes
t.Log("concurrent safety property verified")
}
Hands-on Lab: Implement thread-unsafe in-memory store
Build a basic key-value store without concurrency concerns; a starter sketch follows the task list below.
Lab Tasks:
- 1. Implement Get method with proper validation
- 2. Implement Put method with value copying
- 3. Implement Delete method
- 4. Write comprehensive tests (basic operations, edge cases)
- 5. Add benchmarks for Get, Put, Delete operations
- 6. Test with binary data and various key types
- 7. Achieve >90% code coverage
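A possible starting skeleton for the lab, assuming byte-slice keys/values and the ErrKeyNotFound sentinel used by the tests (names and signatures are suggestions, not a prescribed solution; needs the errors package):
// MemStore is deliberately NOT thread-safe; concurrency comes later in the course.
type MemStore struct {
	data map[string][]byte
}
func NewMemStore() *MemStore {
	return &MemStore{data: make(map[string][]byte)}
}
func (s *MemStore) Get(key []byte) ([]byte, error) {
	if len(key) == 0 {
		return nil, errors.New("key cannot be empty")
	}
	v, ok := s.data[string(key)]
	if !ok {
		return nil, ErrKeyNotFound
	}
	out := make([]byte, len(v)) // return a copy so callers can't mutate stored data
	copy(out, v)
	return out, nil
}
func (s *MemStore) Put(key, value []byte) error {
	if len(key) == 0 {
		return errors.New("key cannot be empty")
	}
	cp := make([]byte, len(value)) // copy the value (Task 2) so later caller mutations don't leak in
	copy(cp, value)
	s.data[string(key)] = cp
	return nil
}
func (s *MemStore) Delete(key []byte) error {
	if _, ok := s.data[string(key)]; !ok {
		return ErrKeyNotFound
	}
	delete(s.data, string(key))
	return nil
}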
Summary & Learning Outcomes
What You've Learned:
- • Data structure trade-offs and when to use each
- • Go's native map implementation details
- • Thread safety considerations and solutions
- • Value serialization strategies
- • Comprehensive testing techniques
- • Benchmarking and performance measurement
Key Takeaways
- • Choose the right data structure for your use case
- • Always consider thread safety in concurrent environments
- • Test-driven development leads to better code quality
- • Benchmark early and often to catch performance issues
- • Property-based testing helps find edge cases