Data Structure – Map

Both Rust and C++ give you two big “map families”:

  • ordered map (tree-based): keys kept sorted, predictable iteration order
  • unordered map (hash-based): keys not sorted, usually faster average lookups

What an ordered map actually is

BTreeMap (Rust) and std::map (C++) are not arrays.

They are balanced trees (a B-tree in Rust, a red-black tree in C++).
Conceptually they look like this:

                50
           /           \
        20               80
      /    \          /      \
    10     30       60       90

Ordered maps/sets (tree-based)

  • Rust: BTreeMap<K,V>, BTreeSet<K>
  • C++: std::map, std::set (typically red-black tree)

Differences

  • Operations: O(log n) lookup/insert/remove
  • Keeps keys sorted
  • Supports range queries: “all keys between a..b”
  • Stable-ish iteration order (sorted)

Search cost in an ordered map: the memory killer

Searching an ordered map is not a simple binary search on a sorted array.

Binary search is fast:

sorted array + random access → O(log n)

But a tree map is not an array.

In an array:

[10, 20, 30, 40, 50, 60, 70]
           ^
       jump directly to middle

You can instantly jump to index n/2.

A tree cannot.

A tree lives in scattered memory:

node(50) -> pointer -> node(80) -> pointer -> node(60)

The CPU cannot jump to “middle element”.
It must follow pointers step-by-step.

So even though the keys are ordered, the computer still has to walk the tree.

The real performance killer: memory access

Here is the part many programmers only learn much later:

For modern CPUs, memory access cost dominates complexity.

  Operation                      Rough cost
  integer add                    1 cycle
  hash computation               ~5–20 cycles
  RAM access (pointer chasing)   100–300 cycles

Tree maps do this repeatedly:

follow pointer → cache miss → wait
follow pointer → cache miss → wait
follow pointer → cache miss → wait

Hash maps instead:

compute hash
jump to contiguous array
maybe 1–2 comparisons

This is why, in practice, for point lookups:

HashMap is often 5–20× faster than std::map/BTreeMap

Even though O(1) vs O(log n) sounds like a small difference mathematically.

Unordered maps/sets (hash-based)

  • Rust: HashMap<K,V>, HashSet<K>
  • C++: std::unordered_map, std::unordered_set

Differences

  • Average O(1) lookup/insert/remove
  • No ordering
  • Depends on hashing quality + load factor; worst-case can degrade

Choose when

  • You mostly do point lookups by key and don’t care about order
  • You want best average performance

Why a hash map is O(1)

A hash map does something completely different.

Instead of navigating structure, it computes a memory address.

Step 1 — hashing

We run a hash function:

hash("alice") → 1,874,923,123

Step 2 — bucket index

index = hash % table_size

Example:

table_size = 1024
index = 1,874,923,123 % 1024 = 627

Now we jump directly to bucket 627 in an array.

The size of a hash map

Do you need to know the table size beforehand?

Good intuition: it looks like you should.
But no, you do not.

You can do:

let mut map = HashMap::new();

and insert millions of elements.

So what’s happening?


The hidden behavior: resizing (rehashing)

A hash map internally stores an array of buckets:

[ 0 ][ 1 ][ 2 ][ 3 ][ 4 ][ 5 ][ 6 ][ 7 ]

Each key goes to a bucket:

index = hash(key) % bucket_count

At the beginning:

bucket_count = small (like 8)

When it fills up → performance would degrade.

So the map automatically resizes.

What resize does

When load factor becomes too large (~70–80% full):

  1. allocate bigger table (usually 2×)
  2. recompute bucket for EVERY key
  3. move all elements

This is called:

rehashing

This is why hash map insertion is:

amortized O(1)

Because occasionally an insertion costs O(n) (when resize happens).


Then why does with_capacity() exist?

Because resizing is expensive.

If you know approximately:

"I'm going to insert 1 million items"

you should do:

Rust:

HashMap::with_capacity(1_000_000);

C++:

my_map.reserve(1'000'000);  // on an existing std::unordered_map

Why it matters:

Without it:

8 → 16 → 32 → 64 → 128 → ... → 1,048,576

Each step rehashes everything.

You just paid multiple full O(n) copies.

In performance-critical systems (compilers, databases, zk provers), this matters a lot.

A very practical rule

Use:

HashMap
→ default choice

Switch to:

BTreeMap/std::map only if you need:

  • sorted iteration
  • ranges
  • deterministic output
  • next/previous key
