Kafka Basic Operations: Cluster Expansion


1. Preface

2. Kafka Cluster Expansion

2.1. Overview

2.2. Automatically migrating data to new machines

2.3. Custom partition assignment and migration

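1. Preface

    In a ZooKeeper-based deployment, Kafka relies on ZooKeeper for broker discovery, so adding a new broker to a cluster is straightforward: give the new broker a unique broker.id and start Kafka on it. The cluster discovers the new broker automatically, and the broker syncs the cluster metadata, including which topics exist and how their partitions are laid out.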

2. Kafka Cluster Expansion

2.1. Overview

From the official documentation: Adding servers to a Kafka cluster is easy, just assign them a unique broker id and start up Kafka on your new servers. However these new servers will not automatically be assigned any data partitions, so unless partitions are moved to them they won't be doing any work until new topics are created. So usually when you add machines to your cluster you will want to migrate some existing data to these machines.

From the official documentation: The process of migrating data is manually initiated but fully automated. Under the covers what happens is that Kafka will add the new server as a follower of the partition it is migrating and allow it to fully replicate the existing data in that partition. When the new server has fully replicated the contents of this partition and joined the in-sync replica one of the existing replicas will delete their partition's data.
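
The two phases described above can be pictured with a toy model. This is only a sketch of the replica list before, during, and after a move; the function name `move_replica` and the phase labels are made up for illustration and do not correspond to Kafka internals.

```python
def move_replica(replicas, old_broker, new_broker):
    """Toy model of moving one replica; returns the replica list after each phase."""
    if old_broker not in replicas:
        raise ValueError(f"broker {old_broker} does not hold a replica")
    if new_broker in replicas:
        raise ValueError(f"broker {new_broker} already holds a replica")

    phases = []
    # Phase 1: the new broker is added as an extra follower and copies the data.
    expanded = replicas + [new_broker]
    phases.append(("replicating", expanded))
    # Phase 2: once the new follower is in sync, the old replica is dropped.
    final = [broker for broker in expanded if broker != old_broker]
    phases.append(("completed", final))
    return phases


for state, replica_list in move_replica([1, 2], old_broker=2, new_broker=5):
    print(state, replica_list)
```

Note that the partition temporarily has one more replica than its replication factor, which is why a move costs extra disk and network on the target broker while it runs.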

From the official documentation: The partition reassignment tool can be used to move partitions across brokers. An ideal partition distribution would ensure even data load and partition sizes across all brokers. The partition reassignment tool does not have the capability to automatically study the data distribution in a Kafka cluster and move partitions around to attain an even load distribution. As such, the admin has to figure out which topics or partitions should be moved around.
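
Since the tool will not analyze the distribution for you, a rough first check is to count how many replicas each broker holds in an assignment. The sketch below assumes the assignment uses the same JSON layout as a reassignment plan (`version` plus a `partitions` list); the sample data and the helper name `replicas_per_broker` are hypothetical. It counts replicas only, not bytes on disk or traffic, so treat it as a starting point.

```python
def replicas_per_broker(assignment, brokers):
    """Count replicas hosted by each broker; brokers with no replicas show up as 0."""
    counts = {broker: 0 for broker in brokers}
    for partition in assignment["partitions"]:
        for broker in partition["replicas"]:
            counts[broker] = counts.get(broker, 0) + 1
    return counts


# Hypothetical assignment: three partitions spread over brokers 1-3,
# with brokers 5 and 6 freshly added and still empty.
current = {
    "version": 1,
    "partitions": [
        {"topic": "foo1", "partition": 0, "replicas": [1, 2]},
        {"topic": "foo1", "partition": 1, "replicas": [2, 3]},
        {"topic": "foo1", "partition": 2, "replicas": [1, 3]},
    ],
}

counts = replicas_per_broker(current, brokers=[1, 2, 3, 5, 6])
for broker, count in sorted(counts.items()):
    print(f"broker {broker}: {count} replicas")
```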

From the official documentation: The partition reassignment tool can run in 3 mutually exclusive modes:

  • --generate: In this mode, given a list of topics and a list of brokers, the tool generates a candidate reassignment to move all partitions of the specified topics to the new brokers. This option merely provides a convenient way to generate a partition reassignment plan given a list of topics and target brokers.
  • --execute: In this mode, the tool kicks off the reassignment of partitions based on the user provided reassignment plan (using the --reassignment-json-file option). This can either be a custom reassignment plan hand crafted by the admin or provided by using the --generate option.
  • --verify: In this mode, the tool verifies the status of the reassignment for all partitions listed during the last --execute. The status can be either of successfully completed, failed or in progress.
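
To make the difference between the three modes concrete, here is a small sketch that assembles the command line for each one. It only builds argument lists using the flags shown in this article; the helper names `generate_cmd` and `plan_cmd` are invented for illustration, and nothing is actually executed.

```python
import shlex

TOOL = "bin/kafka-reassign-partitions.sh"  # path as used in this article


def generate_cmd(bootstrap_server, topics_file, broker_ids):
    """Argument list for --generate: proposes a plan, changes nothing."""
    return [
        TOOL, "--bootstrap-server", bootstrap_server,
        "--topics-to-move-json-file", topics_file,
        "--broker-list", ",".join(str(b) for b in broker_ids),
        "--generate",
    ]


def plan_cmd(bootstrap_server, plan_file, mode):
    """Argument list for --execute or --verify, which both read a plan file."""
    if mode not in ("--execute", "--verify"):
        raise ValueError(f"unsupported mode: {mode}")
    return [
        TOOL, "--bootstrap-server", bootstrap_server,
        "--reassignment-json-file", plan_file,
        mode,
    ]


print(shlex.join(generate_cmd("localhost:9092", "topics-to-move.json", [5, 6])))
print(shlex.join(plan_cmd("localhost:9092", "expand-cluster-reassignment.json", "--execute")))
print(shlex.join(plan_cmd("localhost:9092", "expand-cluster-reassignment.json", "--verify")))
```

Keeping the mode as the last argument mirrors the fact that the modes are mutually exclusive: one invocation does exactly one of the three things.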

2.2. Automatically migrating data to new machines

From the official documentation: The partition reassignment tool can be used to move some topics off of the current set of brokers to the newly added brokers. This is typically useful while expanding an existing cluster since it is easier to move entire topics to the new set of brokers, than moving one partition at a time. When used to do this, the user should provide a list of topics that should be moved to the new set of brokers and a target list of new brokers. The tool then evenly distributes all partitions for the given list of topics across the new set of brokers. During this move, the replication factor of the topic is kept constant. Effectively the replicas for all partitions for the input list of topics are moved from the old set of brokers to the newly added brokers.
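
To get a feel for what "evenly distributes" with a constant replication factor means, here is a simple round-robin sketch. It is not Kafka's actual placement algorithm, just one plausible way to spread replicas over a target broker list; the function name `spread_round_robin` is hypothetical.

```python
def spread_round_robin(partitions, brokers, replication_factor):
    """Build a plan that places each partition's replicas on consecutive brokers."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor exceeds the number of target brokers")
    plan = []
    for index, (topic, partition) in enumerate(partitions):
        # Shift the starting broker per partition so leaders rotate as well.
        replicas = [
            brokers[(index + offset) % len(brokers)]
            for offset in range(replication_factor)
        ]
        plan.append({"topic": topic, "partition": partition, "replicas": replicas})
    return {"version": 1, "partitions": plan}


partitions = [(topic, p) for topic in ("foo1", "foo2") for p in range(3)]
proposal = spread_round_robin(partitions, brokers=[5, 6], replication_factor=2)
for entry in proposal["partitions"]:
    print(entry["topic"], entry["partition"], entry["replicas"])
```

With only two target brokers and a replication factor of 2, every partition ends up on both brokers; the rotation only changes which broker is listed first.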

From the official documentation: For instance, the following example will move all partitions for topics foo1,foo2 to the new set of brokers 5,6. At the end of this move, all partitions for topics foo1 and foo2 will only exist on brokers 5,6.

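Note: for anyone learning Kafka, none of the json files below are created automatically. You have to create each file yourself, copy the generated plan into it, and then run the tool against it.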

From the official documentation: Since the tool accepts the input list of topics as a json file, you first need to identify the topics you want to move and create the json file as follows:

> cat topics-to-move.json
  {"topics": [{"topic": "foo1"},
              {"topic": "foo2"}],
  "version":1
  }
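
If the topic list is long, writing the file by hand gets error-prone. A minimal sketch for rendering it from a list of topic names follows; the helper name `topics_to_move` is made up, and the layout (a `topics` list plus `"version": 1`) follows the file format used in this section.

```python
import json


def topics_to_move(topic_names):
    """Render the topics-to-move document for the given topic names."""
    if not topic_names:
        raise ValueError("at least one topic is required")
    document = {
        "topics": [{"topic": name} for name in topic_names],
        "version": 1,
    }
    return json.dumps(document, indent=2)


# Redirect this output into topics-to-move.json, or write it with open().
rendered = topics_to_move(["foo1", "foo2"])
print(rendered)
```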

From the official documentation: Once the json file is ready, use the partition reassignment tool to generate a candidate assignment:

> bin/kafka-reassign-partitions.sh --bootstrap-server localhost:9092 --topics-to-move-json-file topics-to-move.json --broker-list "5,6" --generate
  Current partition replica assignment


  Proposed partition reassignment configuration


From the official documentation: The tool generates a candidate assignment that will move all partitions from topics foo1,foo2 to brokers 5,6. Note, however, that at this point, the partition movement has not started, it merely tells you the current assignment and the proposed new assignment. The current assignment should be saved in case you want to rollback to it. The new assignment should be saved in a json file (e.g. expand-cluster-reassignment.json) to be input to the tool with the --execute option as follows:

> bin/kafka-reassign-partitions.sh --bootstrap-server localhost:9092 --reassignment-json-file expand-cluster-reassignment.json --execute
  Current partition replica assignment


  Save this to use as the --reassignment-json-file option during rollback
  Successfully started partition reassignments for foo1-0,foo1-1,foo1-2,foo2-0,foo2-1,foo2-2
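
Because the tool prints the current assignment and asks you to save it for rollback, it can be handy to capture that JSON programmatically. The sketch below assumes the JSON appears on a single line after the "Current partition replica assignment" header, which may not hold for every Kafka version; the sample output and the function name `extract_current_assignment` are hypothetical.

```python
import json

HEADER = "Current partition replica assignment"


def extract_current_assignment(tool_output):
    """Return the JSON document printed after the current-assignment header."""
    lines = tool_output.splitlines()
    for index, line in enumerate(lines):
        if line.strip() == HEADER:
            # The first non-empty line after the header is expected to be the JSON.
            for candidate in lines[index + 1:]:
                if candidate.strip():
                    return json.loads(candidate)
    raise ValueError("no current assignment found in tool output")


# Hypothetical captured output, shaped like the transcript above.
sample_output = """Current partition replica assignment

{"version":1,"partitions":[{"topic":"foo1","partition":0,"replicas":[1,2]}]}

Save this to use as the --reassignment-json-file option during rollback
Successfully started partition reassignments for foo1-0
"""

rollback_plan = extract_current_assignment(sample_output)
print(json.dumps(rollback_plan))
```

Saving `rollback_plan` to its own file before the move finishes gives you a ready-made input for --execute if you need to undo the change.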

From the official documentation: Finally, the --verify option can be used with the tool to check the status of the partition reassignment. Note that the same expand-cluster-reassignment.json (used with the --execute option) should be used with the --verify option:

> bin/kafka-reassign-partitions.sh --bootstrap-server localhost:9092 --reassignment-json-file expand-cluster-reassignment.json --verify
  Status of partition reassignment:
  Reassignment of partition [foo1,0] is completed
  Reassignment of partition [foo1,1] is still in progress
  Reassignment of partition [foo1,2] is still in progress
  Reassignment of partition [foo2,0] is completed
  Reassignment of partition [foo2,1] is completed
  Reassignment of partition [foo2,2] is completed
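
When many partitions are moving, scanning this output by eye gets tedious. The following sketch parses status lines in the format shown above and lists the partitions that are not finished yet. The line format is taken from this transcript and may differ between Kafka versions, so the regular expression is an assumption; the function names are made up.

```python
import re

STATUS_LINE = re.compile(
    r"Reassignment of partition \[(?P<topic>[^,\]]+),(?P<partition>\d+)\] is (?P<status>.+)"
)


def parse_verify_output(text):
    """Map (topic, partition) to the status text reported by --verify."""
    statuses = {}
    for line in text.splitlines():
        match = STATUS_LINE.search(line)
        if match:
            key = (match["topic"], int(match["partition"]))
            statuses[key] = match["status"].strip()
    return statuses


def unfinished(statuses):
    """Partitions whose status is anything other than 'completed'."""
    return sorted(key for key, status in statuses.items() if status != "completed")


verify_output = """Status of partition reassignment:
Reassignment of partition [foo1,0] is completed
Reassignment of partition [foo1,1] is still in progress
Reassignment of partition [foo1,2] is still in progress
Reassignment of partition [foo2,0] is completed
"""

statuses = parse_verify_output(verify_output)
print(unfinished(statuses))
```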

2.3. Custom partition assignment and migration

From the official documentation: The partition reassignment tool can also be used to selectively move replicas of a partition to a specific set of brokers. When used in this manner, it is assumed that the user knows the reassignment plan and does not require the tool to generate a candidate reassignment, effectively skipping the --generate step and moving straight to the --execute step.

From the official documentation: For instance, the following example moves partition 0 of topic foo1 to brokers 5,6 and partition 1 of topic foo2 to brokers 2,3:

The first step is to hand craft the custom reassignment plan in a json file:

> cat custom-reassignment.json
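  {"version":1,"partitions":[{"topic":"foo1","partition":0,"replicas":[5,6]},{"topic":"foo2","partition":1,"replicas":[2,3]}]}
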
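A hand-crafted plan is easy to get wrong, for example by listing the same broker twice for one partition. The sketch below builds the plan for this example from a small mapping and applies that one sanity check. The helper name `custom_plan` is hypothetical, and the check is a convenience of this sketch, not something the article's source describes.

```python
import json


def custom_plan(moves):
    """Build a reassignment plan from {(topic, partition): [broker ids]}."""
    entries = []
    for (topic, partition), replicas in moves.items():
        if len(set(replicas)) != len(replicas):
            raise ValueError(f"duplicate broker in replicas for {topic}-{partition}")
        entries.append(
            {"topic": topic, "partition": partition, "replicas": list(replicas)}
        )
    return {"version": 1, "partitions": entries}


# The moves from this section: foo1 partition 0 to brokers 5,6 and
# foo2 partition 1 to brokers 2,3.
plan = custom_plan({("foo1", 0): [5, 6], ("foo2", 1): [2, 3]})
print(json.dumps(plan))
```
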
From the official documentation: Then, use the json file with the --execute option to start the reassignment process:

> bin/kafka-reassign-partitions.sh --bootstrap-server localhost:9092 --reassignment-json-file custom-reassignment.json --execute
  Current partition replica assignment


  Save this to use as the --reassignment-json-file option during rollback
  Successfully started partition reassignments for foo1-0,foo2-1

From the official documentation: The --verify option can be used with the tool to check the status of the partition reassignment. Note that the same custom-reassignment.json (used with the --execute option) should be used with the --verify option:

> bin/kafka-reassign-partitions.sh --bootstrap-server localhost:9092 --reassignment-json-file custom-reassignment.json --verify
  Status of partition reassignment:
  Reassignment of partition [foo1,0] is completed
  Reassignment of partition [foo2,1] is completed




