Name | Default | Description |
Job |
|
|
job.factory.class |
none |
Required: The job factory to use when running a task. This can be either samza.job.local.LocalJobFactory, or samza.job.yarn.YarnJobFactory. |
job.name |
none |
Required: The name of your job. This is the name that will appear on the Samza dashboard, when your job is running. |
job.id |
1 |
An ID string that is used to distinguish between multiple concurrent executions of the same Samza job. |
Task |
|
|
task.class |
none |
Required: The package name of the StreamTask to execute. For example, samza.task.example.MyStreamerTask. |
task.execute |
bin/run-container.sh |
The command that a StreamJob should invoke to start the TaskRunner. |
task.message.buffer.size |
10000 |
The number of messages that the TaskRunner will buffer in the event loop queue, before it begins blocking StreamConsumers. |
task.inputs |
none |
Required: A CSV list of stream names that the TaskRunner should use to read messages from for your StreamTasks (e.g. page-view-event,service-metrics). |
task.window.ms |
-1 |
How often the TaskRunner should call window() on a WindowableTask. A negative number tells the TaskRunner to never call window(). |
task.commit.ms |
60000 |
How often the TaskRunner should call writeCheckpoint for a partition. |
task.command.class |
samza.task.ShellCommandBuilder |
The class to use to build environment variables for the task.execute command. |
task.lifecycle.listeners |
none |
A CSV list of lifecycle listener names that the TaskRunner notify when lifecycle events occur (e.g. my-lifecycle-manager). |
task.lifecycle.listener.%s.class |
none |
The class name for a lifecycle listener factory (e.g. task.lifecycle.listener.my-lifecycle-manager.class=foo.bar.MyLifecycleManagerFactory) |
task.checkpoint.factory |
none |
The class name for the checkpoint manager to use (e.g. samza.task.state.KafkaCheckpointManagerFactory) |
task.checkpoint.failure.retry.ms |
10000 |
If readLastCheckpoint, or writeCheckpoint fails, the TaskRunner will wait this interval before retrying the checkpoint. |
task.opts |
none |
JVM options that should be attached to each JVM that is running StreamTasks. If you wish to reference the log directory from this parameter, use logs/. If you wish to reference code in the Samza job's TGZ package use __package/. |
System |
|
|
systems.%s.samza.consumer.factory |
none |
The StreamConsumerFactory class to use when creating a new StreamConsumer for this system (e.g. samza.stream.kafka.KafkaConsumerFactory). |
systems.%s.samza.producer.factory |
none |
The StreamProducerFactory class to use when creating a new StreamProducer for this system (e.g. samza.stream.kafka.KafkaProducerFactory). |
systems.%s.samza.partition.manager |
none |
The PartitionManager class to use when fetching partition information about streams for the system (e.g. samza.stream.kafka.KafkaPartitionManager). |
systems.%s.producer.reconnect.interval.ms |
10000 |
If a producer fails, the TaskRunner will wait this interval before retrying. |
systems.%s.* |
none |
For both Kafka and Databus, any configuration you supply under this namespace will be given to the underlying Kafka consumer/producer, and Databus consumer/producer. This is useful for configuring things like autooffset.reset, socket buffer size, fetch size, batch size, etc. |
Stream |
|
|
streams.%s.system |
none |
The name of the system associated with this stream (e.g. kafka-aggregate-tracking). This name must match with a system defined in the configuration file. |
streams.%s.stream |
none |
The name of the stream in the system (e.g. PageViewEvent). |
streams.%s.serde |
none |
The serde to use to serialize and deserialize messages for this stream. If undefined, the TaskRunner will try to fall back to the default serde, if it's defined. |
streams.%s.consumer.reset.offset |
false |
If set to true, the TaskRunner will ignore the last checkpoint offset for this stream, and use null as the offset for the stream instead. In the case of Kafka's consumer, it will fall back to autooffset.reset. In the case of Databus' consumer, it will fall back to SCN 0. |
streams.%s.consumer.failure.retry.ms |
10000 |
If a StreamConsumer fails, the TaskRunner will wait this interval before retrying. |
streams.%s.consumer.max.bytes.per.sec |
none |
The maximum number of bytes that the TaskRunner will allow from all partitions that it's reading for this stream. For example, if you have an input stream with two partitions, and 1 MB/sec max, then the maximum bytes the TaskRunner will read per second from all of the input stream's partitions is 1 MB/sec. |
streams.%s.producer.reconnect.interval.ms |
10000 |
If a StreamProducer fails, the TaskRunner will wait this interval before retrying. |
Serdes |
|
|
serializers.registry.%s.class |
none |
The name of a class that implements both SerializerFactory and DeserializerFactory (e.g. serializers.registry.json.class=samza.serializers.JsonSerdeFactory). |
serializers.default |
none |
The default serde to use, if one is not defined for an input or output stream (e.g. serializers.default=json). |
YARN |
|
|
yarn.package.path |
none |
The tgz location of your Samza job. This tgz file is well structured. See the YARN section for details. |
yarn.container.memory.mb |
1024 |
How much memory to ask for (per-container), when Samza is starting a YARN container. |
yarn.am.container.memory.mb |
1024 |
How much memory to ask for (per-application-master), when Samza is starting a YARN container. |
yarn.container.count |
1 |
How many containers to start when a Samza job is started in YARN. Partitions are divided evenly among the containers. |
yarn.am.opts |
none |
JVM options that should be attached to each JVM that is running the ApplicationMaster. If you wish to reference the log directory from this parameter, use logs/. If you wish to reference code in the Samza job's TGZ package use __package/. |
Metrics |
|
|
metrics.reporter.%s.class |
none |
The package and class for a metrics reporter (e.g. metrics.reporter.foo-bar.class=samza.metrics.reporter.MetricsSnapshotReporter). |
metrics.reporter.%s.window.ms |
10000 |
How often the TaskRunner tells the metrics reporter to send update or send its metrics. |
metrics.reporters |
none |
A CSV list of metric reporter names (e.g. metrics.reporters=foo-bar). |