Apache ZooKeeper - Ryan Lynch's Hub

# Overview ZooKeeper is a [[cooridination service]]. They are a thin layer built on top of distributed consensus systems to track relevant configuration metadata. This includes things like IP address for servers and databases, replication schema, partitioning breakdown, etc. At the surface, ZooKeeper can seem like a database (and in some ways it is). It persists data for much of its functionality. But, one of its key features is the ability to provide updates to services when data or configurations are updated. This influences its primary categorization to be a coordination service, rather than a database. ## Architecture Overview of ZooKeeper ### Read vs. Write Operations in Zookeeper #flashcard 1. **Read Operations**: Any server in the ensemble can serve read requests directly from its in-memory copy of the data. This provides high throughput and low latency for reads, which is why ZooKeeper excels at "read-dominant" workloads with ratios of around 10:1 reads to writes. 2. **Write Operations**: All write requests must go through the leader, which coordinates the update across the ensemble using the ZAB protocol. This centralized approach ensures consistent ordering but makes writes more expensive than reads.  ## Strengths - Optimized for read-heavy workloads ## Weaknesses of Zookeeper #flashcard - **Hot Spotting Issues**: When many clients watch the same ZNode (common in leader election or locks), servers can be overwhelmed with notification traffic. At scale, popular nodes become bottlenecks—imagine millions of users coming online simultaneously in our chat example. - **Performance Limitations**: ZooKeeper's consistency model makes writes expensive as they must propagate through the leader to a quorum. Its in-memory storage model limits data capacity—keep ZNodes under 1MB and ensure your dataset fits in memory. - **Operational Complexity**: ZooKeeper requires careful configuration of Java parameters, disk layouts, and ongoing monitoring of timeouts and connections. As one maintainer put it: "ZooKeeper is simple to use but complex to operate."  # Key Considerations ## Core Concepts ### Data Model in ZooKeeper #flashcard The data in ZooKeeper is organized into a hierarchical namespace that looks a lot like a file system, or a tree. Each node is a ZNode. ZNodes can store data (< 1MB) and have associated metadata. They are identified by a path, similar to filesystems (`app1/config`). - There are 3 main flavors to ZNodes: 1. **Persistent ZNodes** - exist until explicitly deleted. In a chat app use case, these nodes would be for configuration data like maximum message size or rate limiting parameters 2. **Ephemeral ZNodes** - automatically deleted then the session that created them end. Perfect for tracking which servers are alive and which users are online/ 3. **Sequential ZNodes** - automatically appended monotonoically increaing couneter to their name. In a chat app use case, could use this for orgering messages or implement distributed locks.  ![[2025-04-21_Apache ZooKeeper-1.png]] ### Server Roles #flashcard ZooKeeper is designed to run on a group of servers called an **ensemble**. Within the ensemble, a server may take on different roles: - **Leader** - responsible for processing all update requests - **Follower** - responsible for serving read requests, and any other instructions from the leader  ### Watches #flashcard A **watch** allows servers to be notified when a ZNode changes, eliminating the need for constant polling or complex server-to-server communication.  ## Core Use Cases ### ZooKeeper for Configuration Management #flashcard The watch functionality allows ZooKeepter to update all interested services when a configuration changes. Imagine you want to enable a new feature across your entire chat platform. With ZooKeeper, you can: 1. Update a single ZNode: `set /chat-app/config/enable_reactions "true"` 2. All chat servers watching this node receive a notification 3. Servers update their behavior without restarting If using a cloud environment, you should stick to using their native solutions, though (e.g., [[AWS Systems Manager Parameter Store]]).  ### ZooKeeper for Service Discovery #flashcard Allows for tracking when services go on or off-line. A new service can find all available services by reading the children in one of the paths.  ### ZooKeeper for Leader Election #flashcard ZooKeeper's sequential ZNodes make leader election straightforward: 1. Each server creates a sequential ephemeral node under a designated path 2. The server with the lowest sequence number becomes the leader 3. All other servers watch the node with the next lowest sequence number 4. If a leader fails, its node disappears, and the next server in sequence steps up  ### ZooKeeper for Distributed Locking #flashcard Once again, sequential ZNodes can be used for distributed locks: 1. Each client trying to acquire the lock creates a sequential ephemeral node under a lock path 2. All clients sort the nodes by sequence number 3. The client with the lowest sequence number holds the lock 4. Each client watches the node with the next lower sequence number 5. When a client releases the lock (or crashes), its ZNode is removed, and the next client is notified #### ZooKeeper vs. Redis for Distributed Locking Choose ZooKeeper locks instead when you need stronger consistency guarantees for critical operations where correctness trumps performance (like financial transactions). ZooKeeper is also preferable for long-lived locks (hours) where its automatic failure detection via ephemeral nodes provides more robust handling of server crashes than Redis locks which would require careful timeout management and heartbeat mechanisms.  # Implementation Details ## ZooKeeper Atomic Broadcast (ZAB) Protocol #flashcard ZAB is what allows all the distributed ZNodes to agree on state within ZooKeeper's distributed architecture. ZAB works in two main phases: 1. **Leader Election** - to elect a new leader, the servers compare their transaction history to see which is most up-to-date. If multiple servers have equivalent transaction histories, then the server with the highest ID will be preferred. 2. **Atomic Broadcast** - once a leader is elected, all write requests go to the leader. A write is only considered successful when a majority (quorum) of servers have persisted the change.  ## Durability #flashcard 1. **Transaction Log**: Every state change (transaction) is first written to a transaction log on persistent storage. This write-ahead logging ensures that no acknowledged update is ever lost, even if a server crashes immediately after confirming the update. 2. **Snapshots**: Periodically, ZooKeeper creates snapshots of its in-memory database to speed up recovery. When a server restarts, it loads the most recent snapshot and then replays transaction logs to recover the complete state.  # Useful Links # Related Topics - [[etcd]] - [[Consul]] ## Reference #### Working Notes #### Sources