NameDefaultDescription
Job
job.factory.class none Required: The job factory to use when running a task. This can be either samza.job.local.LocalJobFactory, or samza.job.yarn.YarnJobFactory.
job.name none Required: The name of your job. This is the name that will appear on the Samza dashboard, when your job is running.
job.id 1 An ID string that is used to distinguish between multiple concurrent executions of the same Samza job.
Task
task.class none Required: The package name of the StreamTask to execute. For example, samza.task.example.MyStreamerTask.
task.execute bin/run-container.sh The command that a StreamJob should invoke to start the TaskRunner.
task.message.buffer.size 10000 The number of messages that the TaskRunner will buffer in the event loop queue, before it begins blocking StreamConsumers.
task.inputs none Required: A CSV list of stream names that the TaskRunner should use to read messages from for your StreamTasks (e.g. page-view-event,service-metrics).
task.window.ms -1 How often the TaskRunner should call window() on a WindowableTask. A negative number tells the TaskRunner to never call window().
task.commit.ms 60000 How often the TaskRunner should call writeCheckpoint for a partition.
task.command.class samza.task.ShellCommandBuilder The class to use to build environment variables for the task.execute command.
task.lifecycle.listeners none A CSV list of lifecycle listener names that the TaskRunner notify when lifecycle events occur (e.g. my-lifecycle-manager).
task.lifecycle.listener.%s.class none The class name for a lifecycle listener factory (e.g. task.lifecycle.listener.my-lifecycle-manager.class=foo.bar.MyLifecycleManagerFactory)
task.checkpoint.factory none The class name for the checkpoint manager to use (e.g. samza.task.state.KafkaCheckpointManagerFactory)
task.checkpoint.failure.retry.ms 10000 If readLastCheckpoint, or writeCheckpoint fails, the TaskRunner will wait this interval before retrying the checkpoint.
task.opts none JVM options that should be attached to each JVM that is running StreamTasks. If you wish to reference the log directory from this parameter, use logs/. If you wish to reference code in the Samza job's TGZ package use __package/.
System
systems.%s.samza.consumer.factory none The StreamConsumerFactory class to use when creating a new StreamConsumer for this system (e.g. samza.stream.kafka.KafkaConsumerFactory).
systems.%s.samza.producer.factory none The StreamProducerFactory class to use when creating a new StreamProducer for this system (e.g. samza.stream.kafka.KafkaProducerFactory).
systems.%s.samza.partition.manager none The PartitionManager class to use when fetching partition information about streams for the system (e.g. samza.stream.kafka.KafkaPartitionManager).
systems.%s.producer.reconnect.interval.ms 10000 If a producer fails, the TaskRunner will wait this interval before retrying.
systems.%s.* none For both Kafka and Databus, any configuration you supply under this namespace will be given to the underlying Kafka consumer/producer, and Databus consumer/producer. This is useful for configuring things like autooffset.reset, socket buffer size, fetch size, batch size, etc.
Stream
streams.%s.system none The name of the system associated with this stream (e.g. kafka-aggregate-tracking). This name must match with a system defined in the configuration file.
streams.%s.stream none The name of the stream in the system (e.g. PageViewEvent).
streams.%s.serde none The serde to use to serialize and deserialize messages for this stream. If undefined, the TaskRunner will try to fall back to the default serde, if it's defined.
streams.%s.consumer.reset.offset false If set to true, the TaskRunner will ignore the last checkpoint offset for this stream, and use null as the offset for the stream instead. In the case of Kafka's consumer, it will fall back to autooffset.reset. In the case of Databus' consumer, it will fall back to SCN 0.
streams.%s.consumer.failure.retry.ms 10000 If a StreamConsumer fails, the TaskRunner will wait this interval before retrying.
streams.%s.consumer.max.bytes.per.sec none The maximum number of bytes that the TaskRunner will allow from all partitions that it's reading for this stream. For example, if you have an input stream with two partitions, and 1 MB/sec max, then the maximum bytes the TaskRunner will read per second from all of the input stream's partitions is 1 MB/sec.
streams.%s.producer.reconnect.interval.ms 10000 If a StreamProducer fails, the TaskRunner will wait this interval before retrying.
Serdes
serializers.registry.%s.class none The name of a class that implements both SerializerFactory and DeserializerFactory (e.g. serializers.registry.json.class=samza.serializers.JsonSerdeFactory).
serializers.default none The default serde to use, if one is not defined for an input or output stream (e.g. serializers.default=json).
YARN
yarn.package.path none The tgz location of your Samza job. This tgz file is well structured. See the YARN section for details.
yarn.container.memory.mb 1024 How much memory to ask for (per-container), when Samza is starting a YARN container.
yarn.am.container.memory.mb 1024 How much memory to ask for (per-application-master), when Samza is starting a YARN container.
yarn.container.count 1 How many containers to start when a Samza job is started in YARN. Partitions are divided evenly among the containers.
yarn.am.opts none JVM options that should be attached to each JVM that is running the ApplicationMaster. If you wish to reference the log directory from this parameter, use logs/. If you wish to reference code in the Samza job's TGZ package use __package/.
Metrics
metrics.reporter.%s.class none The package and class for a metrics reporter (e.g. metrics.reporter.foo-bar.class=samza.metrics.reporter.MetricsSnapshotReporter).
metrics.reporter.%s.window.ms 10000 How often the TaskRunner tells the metrics reporter to send update or send its metrics.
metrics.reporters none A CSV list of metric reporter names (e.g. metrics.reporters=foo-bar).