Created: 2022-03-30 Wed Last modified: 2022-05-04 Wed
Kafka: Key Points¶
Kafka as a distibuted system
Servers
Brokers
Storage layer
Kafka Connect
Import / Export data as event streams
Clients
Persistence
Linear writes performance
JBOD configuration with six 7200rpm SATA RAID-5 array is about 600MB/sec
Using the filesystem and relying on pagecache
In order to avoid in-process cache duplicate (x2) and JVM object memory overhead (x2)
Data structure
Reads and appends to files
O(1)
Instead of BTree
O(log N) is slow if disk operations are involved
Cheap, low-rotational speed 1+ TB SATA dirves can be used
at 1/3 the price and 3x the capacity
Avoiding other causes of inefficiency
Too many small I/O opertations
Batching, "message set" abstraction, chunks
Orders of magnitude speed up
Excessive byte copying
The common data path for transfer of data from file to socket
The operating system reads data from the disk into pagecache in kernel space
The application reads the data from kernel space into a user-space buffer
The application writes the data back into kernel space into a socket buffer
The operating system copies the data from the socket buffer to the NIC buffer where it is sent over the network
Linux sendfile system call
Avoiding 2 and 3
On a Kafka cluster where the consumers are mostly caught up you will see no read activity on the disks whatsoever as they will be serving data entirely from cache
Not CPU, but disk or network bandwidth bottleneck
For instnace, between data centers over a wide-area network
Compress a batch of messages
More efficient than client side compression of every message
Messages remain compressed in the log
Message Delivery Semantics
Producer
At most once
Send
At least once
Send - Resend
Exactly once
Idempotent delivery option
Resending will not result in duplicate entries
Broker deduplicates by using producer_id and sequence number
Durability level
Consumer
At most once
Commit - Consume
At least once
Consume - Commit
Can be fine if idempotentancy is appliable
Exactly once
Kafka topic is used for consuming and producing
Consume offset writing is part of transaction
Isolation levels
read_uncommitted
read_committed
Writing in extenal system
Options
Two-phase commit
Storing the offset and output in the same place
Handling failures
"In-sync" node
Definition
Session with ZooKeeper (via hearbeat mechanism)
Must replicate the writes happening on the leader and not fall "too far" behind
Stuck and lagging replicas
replica.lag.time.max.ms
Does not handle "Byzantine" failures
A message is considered committed when all in sync replicas for that partition have applied it to their log
Acknowledgment
The guarantee that Kafka offers is that a committed message will not be lost, as long as there is at least one in sync replica alive, at all times
fsync
doesn't used for every messageCan reduce performance by two to three orders of magnitude
Availability
Success in case of node failures after a short fail-over period
Fail in case of network partitioning
All replicas died
Two options (tradeoff between consistency and availability)
Wait for a replica in the ISR
Choose the first replica (not necessarily in the ISR) that comes back
unclean.leader.election.enable
Durability and availability tradeoff
acks=all
relies only on ISRTwo topic-level configuration
Disable unclean leader election
Specify a minimum ISR size