Kafka Connect

Considerations when building data pipelines:
- Timeliness
- Reliability

Default REST port in distributed mode: 8083

Components

Connector
- Defines how data is copied between an external system and Kafka
- Performs the copy by breaking the job into a set of Tasks
- Two types of connectors:
  - Source connector: reads data from an external system and writes it to Kafka topics
  - Sink connector: reads data from Kafka topics and writes it to an external system
- Is responsible for three things:
  - Deciding how many tasks to run for the connector
  - Deciding how to split the data-copying work between tasks
  - Getting the task configurations from the workers and passing them along
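
As a rough illustration of the connector responsibilities above, the sketch below writes a connector definition and registers it through a worker's REST API. It assumes the FileStreamSource connector that ships with Apache Kafka is available on the worker's plugin path (newer Kafka releases may require adding it explicitly). The connector name `file-source-demo`, the file `/tmp/demo-input.txt`, the topic `demo-topic`, and the helper `create_connector` are made up for the example.

```shell
# Base URL of a Connect worker's REST API (assumed local, default port).
CONNECT_URL="${CONNECT_URL:-http://localhost:8083}"

# Hypothetical connector definition; name, file, and topic are placeholders.
cat > file-source-demo.json <<'EOF'
{
  "name": "file-source-demo",
  "config": {
    "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
    "tasks.max": "1",
    "file": "/tmp/demo-input.txt",
    "topic": "demo-topic"
  }
}
EOF

# Register the connector by POSTing the definition to /connectors.
create_connector() {
  curl -s -X POST -H "Content-Type: application/json" \
       --data @"$1" "$CONNECT_URL/connectors"
}

# Run against a live worker:
# create_connector file-source-demo.json
```

`tasks.max` is only an upper bound: the connector itself decides how many task configurations to produce, up to that limit.
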
Tasks
- Responsible for getting data in and out of Kafka
- They are initialized by receiving a context from the worker (a source context or a sink context)
- Task state is stored in Kafka in the topics named by config.storage.topic and status.storage.topic, and is managed by the associated connector
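
Connector and task state can be inspected through the REST API, which reports what the workers have recorded. A small sketch, assuming a worker on localhost:8083 and a hypothetical connector named `file-source-demo`; the helper function names are invented for the example.

```shell
# Base URL of a Connect worker's REST API (assumed local, default port).
CONNECT_URL="${CONNECT_URL:-http://localhost:8083}"

# Print the URL of a connector's status endpoint.
status_url() {
  printf '%s/connectors/%s/status\n' "$CONNECT_URL" "$1"
}

# Show the state (for example RUNNING or FAILED) of a connector and its tasks.
connector_status() {
  curl -s "$(status_url "$1")"
}

# Restart a single task by its numeric id.
restart_task() {
  curl -s -X POST "$CONNECT_URL/connectors/$1/tasks/$2/restart"
}

# Against a live worker:
# connector_status file-source-demo
# restart_task file-source-demo 0
```
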
Workers
- The container processes that execute connectors and tasks
- Responsible for:
  - Handling HTTP requests that define connectors and their configurations
  - Storing connector and task configurations
  - Starting connectors and their tasks and passing the appropriate configurations along
  - Committing offsets for source and sink connectors
  - Handling retries when tasks fail
- When a worker fails, its tasks are rebalanced across the remaining active workers. When a task fails, it is treated as an exceptional case and no rebalance is triggered, so the failed task has to be restarted through the REST API
- Two types:
  - Standalone workers: a single process is responsible for executing all connectors and tasks
  - Distributed workers: multiple worker processes started with the same group.id form a cluster and share the work
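
A minimal sketch of a distributed worker configuration, assuming a single Kafka broker on localhost:9092 and a Kafka installation in the current directory. The file name `worker-demo.properties` and the group name `connect-demo-cluster` are placeholders.

```shell
# Write a worker configuration; every worker that uses the same group.id
# joins the same Connect cluster.
cat > worker-demo.properties <<'EOF'
bootstrap.servers=localhost:9092
group.id=connect-demo-cluster
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
config.storage.topic=connect-configs
offset.storage.topic=connect-offsets
status.storage.topic=connect-status
# Replication factor 1 is only suitable for a single-broker test setup.
config.storage.replication.factor=1
offset.storage.replication.factor=1
status.storage.replication.factor=1
EOF

# Start a distributed worker (repeat on each machine with the same file):
# bin/connect-distributed.sh worker-demo.properties

# Standalone mode takes a worker file plus one or more connector files:
# bin/connect-standalone.sh config/connect-standalone.properties connector.properties
```
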
Converters: translate data between Kafka Connect's internal data format and the serialized form stored in Kafka
- JSON converter: ships with Apache Kafka
- Avro converter: provided with Confluent Schema Registry
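
Converters are chosen in the worker configuration and can be overridden per connector. The fragments below sketch the two options; the Schema Registry URL `http://localhost:8081` is an assumption about a local setup, the file names are placeholders, and the Avro converter has to be installed separately from Apache Kafka.

```shell
# Option 1: JSON converter without embedded schemas.
cat > converter-json.properties <<'EOF'
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false
EOF

# Option 2: Avro converter backed by Schema Registry (URL is assumed).
cat > converter-avro.properties <<'EOF'
key.converter=io.confluent.connect.avro.AvroConverter
value.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://localhost:8081
value.converter.schema.registry.url=http://localhost:8081
EOF
```

Only one of the two fragments would be merged into a given worker properties file.
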

Internal topics

connect-configs: connector and task configurations
connect-offsets: source connector offsets
connect-status: connector and task status
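
Connect can usually create these topics itself, but they can also be created ahead of time. The sketch below only prints the `kafka-topics.sh` commands so they can be reviewed first. It assumes a broker on localhost:9092, a replication factor of 1 (test setups only), and the topic names from Kafka's sample distributed worker configuration; the helper `topic_cmd` is made up for the example. All three topics should be compacted, and the config topic must have exactly one partition.

```shell
# Broker address (assumed local).
BOOTSTRAP="${BOOTSTRAP:-localhost:9092}"

# Print (not run) the command that creates one compacted topic.
topic_cmd() {
  echo "bin/kafka-topics.sh --create --bootstrap-server $BOOTSTRAP" \
       "--topic $1 --partitions $2 --replication-factor 1" \
       "--config cleanup.policy=compact"
}

topic_cmd connect-configs 1    # config topic: single partition
topic_cmd connect-offsets 25   # 25 and 5 match Connect's default partition counts
topic_cmd connect-status 5
```

Piping the output to `sh` from the Kafka installation directory would run the printed commands.
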

Tools

Connect version: curl http://localhost:8083/
Available connector plugins: curl http://localhost:8083/connector-plugins
All connectors: curl http://localhost:8083/connectors
