Created:        2022-03-30 Wed
Last modified:  2022-05-04 Wed

Kafka: Key Points

Kafka 3.1 Documentation


  • Kafka as a distibuted system

    • Servers

      • Brokers

        • Storage layer

      • Kafka Connect

        • Import / Export data as event streams

    • Clients


  • Persistence

    • Linear writes performance

      • JBOD configuration with six 7200rpm SATA RAID-5 array is about 600MB/sec

    • Using the filesystem and relying on pagecache

      • In order to avoid in-process cache duplicate (x2) and JVM object memory overhead (x2)

    • Data structure

      • Reads and appends to files

        • O(1)

      • Instead of BTree

        • O(log N) is slow if disk operations are involved

    • Cheap, low-rotational speed 1+ TB SATA dirves can be used

      • at 1/3 the price and 3x the capacity

  • Avoiding other causes of inefficiency

    • Too many small I/O opertations

      • Batching, "message set" abstraction, chunks

        • Orders of magnitude speed up

    • Excessive byte copying

      • The common data path for transfer of data from file to socket

        1. The operating system reads data from the disk into pagecache in kernel space

        2. The application reads the data from kernel space into a user-space buffer

        3. The application writes the data back into kernel space into a socket buffer

        4. The operating system copies the data from the socket buffer to the NIC buffer where it is sent over the network

      • Linux sendfile system call

        • Avoiding 2 and 3

        • On a Kafka cluster where the consumers are mostly caught up you will see no read activity on the disks whatsoever as they will be serving data entirely from cache

    • Not CPU, but disk or network bandwidth bottleneck

      • For instnace, between data centers over a wide-area network

      • Compress a batch of messages

        • More efficient than client side compression of every message

      • Messages remain compressed in the log


  • Message Delivery Semantics

    • Producer

      • At most once

        • Send

      • At least once

        • Send - Resend

      • Exactly once

        • Idempotent delivery option

          • Resending will not result in duplicate entries

          • Broker deduplicates by using producer_id and sequence number

      • Durability level

    • Consumer

      • At most once

        • Commit - Consume

      • At least once

        • Consume - Commit

        • Can be fine if idempotentancy is appliable

      • Exactly once

        • Kafka topic is used for consuming and producing

          • Consume offset writing is part of transaction

          • Isolation levels

            • read_uncommitted

            • read_committed

        • Writing in extenal system

          • Options

            • Two-phase commit

            • Storing the offset and output in the same place


  • Handling failures

    • "In-sync" node

      • Definition

        1. Session with ZooKeeper (via hearbeat mechanism)

        2. Must replicate the writes happening on the leader and not fall "too far" behind

      • Stuck and lagging replicas

        • replica.lag.time.max.ms

      • Does not handle "Byzantine" failures

    • A message is considered committed when all in sync replicas for that partition have applied it to their log

      • Acknowledgment

    • The guarantee that Kafka offers is that a committed message will not be lost, as long as there is at least one in sync replica alive, at all times

      • fsync doesn't used for every message

        • Can reduce performance by two to three orders of magnitude

    • Availability

      • Success in case of node failures after a short fail-over period

      • Fail in case of network partitioning

    • All replicas died

      • Two options (tradeoff between consistency and availability)

        1. Wait for a replica in the ISR

        2. Choose the first replica (not necessarily in the ISR) that comes back

      • unclean.leader.election.enable

  • Durability and availability tradeoff

    • acks=all relies only on ISR

    • Two topic-level configuration

      • Disable unclean leader election

      • Specify a minimum ISR size