camel-file-kafka-connector source configuration

Connector Description: Read and write files.

When using camel-file-kafka-connector as source make sure to use the following Maven dependency to have support for the connector:

<dependency>
  <groupId>org.apache.camel.kafkaconnector</groupId>
  <artifactId>camel-file-kafka-connector</artifactId>
  <version>x.x.x</version>
  <!-- use the same version as your Camel Kafka connector version -->
</dependency>

To use this source connector in Kafka connect you’ll need to set the following connector.class

connector.class=org.apache.camel.kafkaconnector.file.CamelFileSourceConnector

The camel-file source connector supports 81 options, which are listed below.

Name Description Default Priority

camel.source.path.directoryName

Required The starting directory.

HIGH

camel.source.endpoint.charset

This option is used to specify the encoding of the file. You can use this on the consumer, to specify the encodings of the files, which allow Camel to know the charset it should load the file content in case the file content is being accessed. Likewise when writing a file, you can use this option to specify which charset to write the file as well. Do mind that when writing the file Camel may have to read the message content into memory to be able to convert the data into the configured charset, so do not use this if you have big messages.

MEDIUM

camel.source.endpoint.doneFileName

Producer: If provided, then Camel will write a 2nd done file when the original file has been written. The done file will be empty. This option configures what file name to use. Either you can specify a fixed name. Or you can use dynamic placeholders. The done file will always be written in the same folder as the original file. Consumer: If provided, Camel will only consume files if a done file exists. This option configures what file name to use. Either you can specify a fixed name. Or you can use dynamic placeholders.The done file is always expected in the same folder as the original file. Only $\{file.name} and $\{file.name.next} is supported as dynamic placeholders.

MEDIUM

camel.source.endpoint.fileName

Use Expression such as File Language to dynamically set the filename. For consumers, it’s used as a filename filter. For producers, it’s used to evaluate the filename to write. If an expression is set, it take precedence over the CamelFileName header. (Note: The header itself can also be an Expression). The expression options support both String and Expression types. If the expression is a String type, it is always evaluated using the File Language. If the expression is an Expression type, the specified Expression type is used - this allows you, for instance, to use OGNL expressions. For the consumer, you can use it to filter filenames, so you can for instance consume today’s file using the File Language syntax: mydata-$\{date:now:yyyyMMdd}.txt. The producers support the CamelOverruleFileName header which takes precedence over any existing CamelFileName header; the CamelOverruleFileName is a header that is used only once, and makes it easier as this avoids to temporary store CamelFileName and have to restore it afterwards.

MEDIUM

camel.source.endpoint.bridgeErrorHandler

Allows for bridging the consumer to the Camel routing Error Handler, which mean any exceptions occurred while the consumer is trying to pickup incoming messages, or the likes, will now be processed as a message and handled by the routing Error Handler. By default the consumer will use the org.apache.camel.spi.ExceptionHandler to deal with exceptions, that will be logged at WARN or ERROR level and ignored.

false

MEDIUM

camel.source.endpoint.delete

If true, the file will be deleted after it is processed successfully.

false

MEDIUM

camel.source.endpoint.moveFailed

Sets the move failure expression based on Simple language. For example, to move files into a .error subdirectory use: .error. Note: When moving the files to the fail location Camel will handle the error and will not pick up the file again.

MEDIUM

camel.source.endpoint.noop

If true, the file is not moved or deleted in any way. This option is good for readonly data, or for ETL type requirements. If noop=true, Camel will set idempotent=true as well, to avoid consuming the same files over and over again.

false

MEDIUM

camel.source.endpoint.preMove

Expression (such as File Language) used to dynamically set the filename when moving it before processing. For example to move in-progress files into the order directory set this value to order.

MEDIUM

camel.source.endpoint.preSort

When pre-sort is enabled then the consumer will sort the file and directory names during polling, that was retrieved from the file system. You may want to do this in case you need to operate on the files in a sorted order. The pre-sort is executed before the consumer starts to filter, and accept files to process by Camel. This option is default=false meaning disabled.

false

MEDIUM

camel.source.endpoint.recursive

If a directory, will look for files in all the sub-directories as well.

false

MEDIUM

camel.source.endpoint.sendEmptyMessageWhenIdle

If the polling consumer did not poll any files, you can enable this option to send an empty message (no body) instead.

false

MEDIUM

camel.source.endpoint.directoryMustExist

Similar to the startingDirectoryMustExist option but this applies during polling (after starting the consumer).

false

MEDIUM

camel.source.endpoint.exceptionHandler

To let the consumer use a custom ExceptionHandler. Notice if the option bridgeErrorHandler is enabled then this option is not in use. By default the consumer will deal with exceptions, that will be logged at WARN or ERROR level and ignored.

MEDIUM

camel.source.endpoint.exchangePattern

Sets the exchange pattern when the consumer creates an exchange. One of: [InOnly] [InOut] [InOptionalOut].

Enum values:

  • InOnly

  • InOut

  • InOptionalOut

MEDIUM

camel.source.endpoint.extendedAttributes

To define which file attributes of interest. Like posix:permissions,posix:owner,basic:lastAccessTime, it supports basic wildcard like posix:, basic:lastAccessTime.

MEDIUM

camel.source.endpoint.inProgressRepository

A pluggable in-progress repository org.apache.camel.spi.IdempotentRepository. The in-progress repository is used to account the current in progress files being consumed. By default a memory based repository is used.

MEDIUM

camel.source.endpoint.localWorkDirectory

When consuming, a local work directory can be used to store the remote file content directly in local files, to avoid loading the content into memory. This is beneficial, if you consume a very big remote file and thus can conserve memory.

MEDIUM

camel.source.endpoint.onCompletionExceptionHandler

To use a custom org.apache.camel.spi.ExceptionHandler to handle any thrown exceptions that happens during the file on completion process where the consumer does either a commit or rollback. The default implementation will log any exception at WARN level and ignore.

MEDIUM

camel.source.endpoint.pollStrategy

A pluggable org.apache.camel.PollingConsumerPollingStrategy allowing you to provide your custom implementation to control error handling usually occurred during the poll operation before an Exchange have been created and being routed in Camel.

MEDIUM

camel.source.endpoint.probeContentType

Whether to enable probing of the content type. If enable then the consumer uses Files#probeContentType(java.nio.file.Path) to determine the content-type of the file, and store that as a header with key Exchange#FILE_CONTENT_TYPE on the Message.

false

MEDIUM

camel.source.endpoint.processStrategy

A pluggable org.apache.camel.component.file.GenericFileProcessStrategy allowing you to implement your own readLock option or similar. Can also be used when special conditions must be met before a file can be consumed, such as a special ready file exists. If this option is set then the readLock option does not apply.

MEDIUM

camel.source.endpoint.startingDirectoryMustExist

Whether the starting directory must exist. Mind that the autoCreate option is default enabled, which means the starting directory is normally auto created if it doesn’t exist. You can disable autoCreate and enable this to ensure the starting directory must exist. Will thrown an exception if the directory doesn’t exist.

false

MEDIUM

camel.source.endpoint.startingDirectoryMustHaveAccess

Whether the starting directory has access permissions. Mind that the startingDirectoryMustExist parameter must be set to true in order to verify that the directory exists. Will thrown an exception if the directory doesn’t have read and write permissions.

false

MEDIUM

camel.source.endpoint.autoCreate

Automatically create missing directories in the file’s pathname. For the file consumer, that means creating the starting directory. For the file producer, it means the directory the files should be written to.

true

MEDIUM

camel.source.endpoint.bufferSize

Buffer size in bytes used for writing files (or in case of FTP for downloading and uploading files).

131072

MEDIUM

camel.source.endpoint.copyAndDeleteOnRenameFail

Whether to fallback and do a copy and delete file, in case the file could not be renamed directly. This option is not available for the FTP component.

true

MEDIUM

camel.source.endpoint.renameUsingCopy

Perform rename operations using a copy and delete strategy. This is primarily used in environments where the regular rename operation is unreliable (e.g. across different file systems or networks). This option takes precedence over the copyAndDeleteOnRenameFail parameter that will automatically fall back to the copy and delete strategy, but only after additional delays.

false

MEDIUM

camel.source.endpoint.synchronous

Sets whether synchronous processing should be strictly used.

false

MEDIUM

camel.source.endpoint.antExclude

Ant style filter exclusion. If both antInclude and antExclude are used, antExclude takes precedence over antInclude. Multiple exclusions may be specified in comma-delimited format.

MEDIUM

camel.source.endpoint.antFilterCaseSensitive

Sets case sensitive flag on ant filter.

true

MEDIUM

camel.source.endpoint.antInclude

Ant style filter inclusion. Multiple inclusions may be specified in comma-delimited format.

MEDIUM

camel.source.endpoint.eagerMaxMessagesPerPoll

Allows for controlling whether the limit from maxMessagesPerPoll is eager or not. If eager then the limit is during the scanning of files. Where as false would scan all files, and then perform sorting. Setting this option to false allows for sorting all files first, and then limit the poll. Mind that this requires a higher memory usage as all file details are in memory to perform the sorting.

true

MEDIUM

camel.source.endpoint.exclude

Is used to exclude files, if filename matches the regex pattern (matching is case in-sensitive). Notice if you use symbols such as plus sign and others you would need to configure this using the RAW() syntax if configuring this as an endpoint uri. See more details at configuring endpoint uris.

MEDIUM

camel.source.endpoint.excludeExt

Is used to exclude files matching file extension name (case insensitive). For example to exclude bak files, then use excludeExt=bak. Multiple extensions can be separated by comma, for example to exclude bak and dat files, use excludeExt=bak,dat. Note that the file extension includes all parts, for example having a file named mydata.tar.gz will have extension as tar.gz. For more flexibility then use the include/exclude options.

MEDIUM

camel.source.endpoint.filter

Pluggable filter as a org.apache.camel.component.file.GenericFileFilter class. Will skip files if filter returns false in its accept() method.

MEDIUM

camel.source.endpoint.filterDirectory

Filters the directory based on Simple language. For example to filter on current date, you can use a simple date pattern such as $\{date:now:yyyMMdd}.

MEDIUM

camel.source.endpoint.filterFile

Filters the file based on Simple language. For example to filter on file size, you can use $\{file:size} 5000.

MEDIUM

camel.source.endpoint.idempotent

Option to use the Idempotent Consumer EIP pattern to let Camel skip already processed files. Will by default use a memory based LRUCache that holds 1000 entries. If noop=true then idempotent will be enabled as well to avoid consuming the same files over and over again.

"false"

MEDIUM

camel.source.endpoint.idempotentKey

To use a custom idempotent key. By default the absolute path of the file is used. You can use the File Language, for example to use the file name and file size, you can do: idempotentKey=$\{file:name}-$\{file:size}.

MEDIUM

camel.source.endpoint.idempotentRepository

A pluggable repository org.apache.camel.spi.IdempotentRepository which by default use MemoryMessageIdRepository if none is specified and idempotent is true.

MEDIUM

camel.source.endpoint.include

Is used to include files, if filename matches the regex pattern (matching is case in-sensitive). Notice if you use symbols such as plus sign and others you would need to configure this using the RAW() syntax if configuring this as an endpoint uri. See more details at configuring endpoint uris.

MEDIUM

camel.source.endpoint.includeExt

Is used to include files matching file extension name (case insensitive). For example to include txt files, then use includeExt=txt. Multiple extensions can be separated by comma, for example to include txt and xml files, use includeExt=txt,xml. Note that the file extension includes all parts, for example having a file named mydata.tar.gz will have extension as tar.gz. For more flexibility then use the include/exclude options.

MEDIUM

camel.source.endpoint.maxDepth

The maximum depth to traverse when recursively processing a directory.

2147483647

MEDIUM

camel.source.endpoint.maxMessagesPerPoll

To define a maximum messages to gather per poll. By default no maximum is set. Can be used to set a limit of e.g. 1000 to avoid when starting up the server that there are thousands of files. Set a value of 0 or negative to disabled it. Notice: If this option is in use then the File and FTP components will limit before any sorting. For example if you have 100000 files and use maxMessagesPerPoll=500, then only the first 500 files will be picked up, and then sorted. You can use the eagerMaxMessagesPerPoll option and set this to false to allow to scan all files first and then sort afterwards.

MEDIUM

camel.source.endpoint.minDepth

The minimum depth to start processing when recursively processing a directory. Using minDepth=1 means the base directory. Using minDepth=2 means the first sub directory.

MEDIUM

camel.source.endpoint.move

Expression (such as Simple Language) used to dynamically set the filename when moving it after processing. To move files into a .done subdirectory just enter .done.

MEDIUM

camel.source.endpoint.exclusiveReadLockStrategy

Pluggable read-lock as a org.apache.camel.component.file.GenericFileExclusiveReadLockStrategy implementation.

MEDIUM

camel.source.endpoint.readLock

Used by consumer, to only poll the files if it has exclusive read-lock on the file (i.e. the file is not in-progress or being written). Camel will wait until the file lock is granted. This option provides the build in strategies: - none - No read lock is in use - markerFile - Camel creates a marker file (fileName.camelLock) and then holds a lock on it. This option is not available for the FTP component - changed - Changed is using file length/modification timestamp to detect whether the file is currently being copied or not. Will at least use 1 sec to determine this, so this option cannot consume files as fast as the others, but can be more reliable as the JDK IO API cannot always determine whether a file is currently being used by another process. The option readLockCheckInterval can be used to set the check frequency. - fileLock - is for using java.nio.channels.FileLock. This option is not avail for Windows OS and the FTP component. This approach should be avoided when accessing a remote file system via a mount/share unless that file system supports distributed file locks. - rename - rename is for using a try to rename the file as a test if we can get exclusive read-lock. - idempotent - (only for file component) idempotent is for using a idempotentRepository as the read-lock. This allows to use read locks that supports clustering if the idempotent repository implementation supports that. - idempotent-changed - (only for file component) idempotent-changed is for using a idempotentRepository and changed as the combined read-lock. This allows to use read locks that supports clustering if the idempotent repository implementation supports that. - idempotent-rename - (only for file component) idempotent-rename is for using a idempotentRepository and rename as the combined read-lock. This allows to use read locks that supports clustering if the idempotent repository implementation supports that.Notice: The various read locks is not all suited to work in clustered mode, where concurrent consumers on different nodes is competing for the same files on a shared file system. The markerFile using a close to atomic operation to create the empty marker file, but its not guaranteed to work in a cluster. The fileLock may work better but then the file system need to support distributed file locks, and so on. Using the idempotent read lock can support clustering if the idempotent repository supports clustering, such as Hazelcast Component or Infinispan. One of: [none] [markerFile] [fileLock] [rename] [changed] [idempotent] [idempotent-changed] [idempotent-rename].

Enum values:

  • none

  • markerFile

  • fileLock

  • rename

  • changed

  • idempotent

  • idempotent-changed

  • idempotent-rename

"none"

MEDIUM

camel.source.endpoint.readLockCheckInterval

Interval in millis for the read-lock, if supported by the read lock. This interval is used for sleeping between attempts to acquire the read lock. For example when using the changed read lock, you can set a higher interval period to cater for slow writes. The default of 1 sec. may be too fast if the producer is very slow writing the file. Notice: For FTP the default readLockCheckInterval is 5000. The readLockTimeout value must be higher than readLockCheckInterval, but a rule of thumb is to have a timeout that is at least 2 or more times higher than the readLockCheckInterval. This is needed to ensure that amble time is allowed for the read lock process to try to grab the lock before the timeout was hit.

1000L

MEDIUM

camel.source.endpoint.readLockDeleteOrphanLockFiles

Whether or not read lock with marker files should upon startup delete any orphan read lock files, which may have been left on the file system, if Camel was not properly shutdown (such as a JVM crash). If turning this option to false then any orphaned lock file will cause Camel to not attempt to pickup that file, this could also be due another node is concurrently reading files from the same shared directory.

true

MEDIUM

camel.source.endpoint.readLockIdempotentReleaseAsync

Whether the delayed release task should be synchronous or asynchronous. See more details at the readLockIdempotentReleaseDelay option.

false

MEDIUM

camel.source.endpoint.readLockIdempotentReleaseAsyncPoolSize

The number of threads in the scheduled thread pool when using asynchronous release tasks. Using a default of 1 core threads should be sufficient in almost all use-cases, only set this to a higher value if either updating the idempotent repository is slow, or there are a lot of files to process. This option is not in-use if you use a shared thread pool by configuring the readLockIdempotentReleaseExecutorService option. See more details at the readLockIdempotentReleaseDelay option.

MEDIUM

camel.source.endpoint.readLockIdempotentReleaseDelay

Whether to delay the release task for a period of millis. This can be used to delay the release tasks to expand the window when a file is regarded as read-locked, in an active/active cluster scenario with a shared idempotent repository, to ensure other nodes cannot potentially scan and acquire the same file, due to race-conditions. By expanding the time-window of the release tasks helps prevents these situations. Note delaying is only needed if you have configured readLockRemoveOnCommit to true.

MEDIUM

camel.source.endpoint.readLockIdempotentReleaseExecutorService

To use a custom and shared thread pool for asynchronous release tasks. See more details at the readLockIdempotentReleaseDelay option.

MEDIUM

camel.source.endpoint.readLockLoggingLevel

Logging level used when a read lock could not be acquired. By default a DEBUG is logged. You can change this level, for example to OFF to not have any logging. This option is only applicable for readLock of types: changed, fileLock, idempotent, idempotent-changed, idempotent-rename, rename. One of: [TRACE] [DEBUG] [INFO] [WARN] [ERROR] [OFF].

Enum values:

  • TRACE

  • DEBUG

  • INFO

  • WARN

  • ERROR

  • OFF

"DEBUG"

MEDIUM

camel.source.endpoint.readLockMarkerFile

Whether to use marker file with the changed, rename, or exclusive read lock types. By default a marker file is used as well to guard against other processes picking up the same files. This behavior can be turned off by setting this option to false. For example if you do not want to write marker files to the file systems by the Camel application.

true

MEDIUM

camel.source.endpoint.readLockMinAge

This option is applied only for readLock=changed. It allows to specify a minimum age the file must be before attempting to acquire the read lock. For example use readLockMinAge=300s to require the file is at last 5 minutes old. This can speedup the changed read lock as it will only attempt to acquire files which are at least that given age.

0L

MEDIUM

camel.source.endpoint.readLockMinLength

This option is applied only for readLock=changed. It allows you to configure a minimum file length. By default Camel expects the file to contain data, and thus the default value is 1. You can set this option to zero, to allow consuming zero-length files.

1L

MEDIUM

camel.source.endpoint.readLockRemoveOnCommit

This option is applied only for readLock=idempotent. It allows to specify whether to remove the file name entry from the idempotent repository when processing the file is succeeded and a commit happens. By default the file is not removed which ensures that any race-condition do not occur so another active node may attempt to grab the file. Instead the idempotent repository may support eviction strategies that you can configure to evict the file name entry after X minutes - this ensures no problems with race conditions. See more details at the readLockIdempotentReleaseDelay option.

false

MEDIUM

camel.source.endpoint.readLockRemoveOnRollback

This option is applied only for readLock=idempotent. It allows to specify whether to remove the file name entry from the idempotent repository when processing the file failed and a rollback happens. If this option is false, then the file name entry is confirmed (as if the file did a commit).

true

MEDIUM

camel.source.endpoint.readLockTimeout

Optional timeout in millis for the read-lock, if supported by the read-lock. If the read-lock could not be granted and the timeout triggered, then Camel will skip the file. At next poll Camel, will try the file again, and this time maybe the read-lock could be granted. Use a value of 0 or lower to indicate forever. Currently fileLock, changed and rename support the timeout. Notice: For FTP the default readLockTimeout value is 20000 instead of 10000. The readLockTimeout value must be higher than readLockCheckInterval, but a rule of thumb is to have a timeout that is at least 2 or more times higher than the readLockCheckInterval. This is needed to ensure that amble time is allowed for the read lock process to try to grab the lock before the timeout was hit.

10000L

MEDIUM

camel.source.endpoint.backoffErrorThreshold

The number of subsequent error polls (failed due some error) that should happen before the backoffMultipler should kick-in.

MEDIUM

camel.source.endpoint.backoffIdleThreshold

The number of subsequent idle polls that should happen before the backoffMultipler should kick-in.

MEDIUM

camel.source.endpoint.backoffMultiplier

To let the scheduled polling consumer backoff if there has been a number of subsequent idles/errors in a row. The multiplier is then the number of polls that will be skipped before the next actual attempt is happening again. When this option is in use then backoffIdleThreshold and/or backoffErrorThreshold must also be configured.

MEDIUM

camel.source.endpoint.delay

Milliseconds before the next poll.

500L

MEDIUM

camel.source.endpoint.greedy

If greedy is enabled, then the ScheduledPollConsumer will run immediately again, if the previous run polled 1 or more messages.

false

MEDIUM

camel.source.endpoint.initialDelay

Milliseconds before the first poll starts.

1000L

MEDIUM

camel.source.endpoint.repeatCount

Specifies a maximum limit of number of fires. So if you set it to 1, the scheduler will only fire once. If you set it to 5, it will only fire five times. A value of zero or negative means fire forever.

0L

MEDIUM

camel.source.endpoint.runLoggingLevel

The consumer logs a start/complete log line when it polls. This option allows you to configure the logging level for that. One of: [TRACE] [DEBUG] [INFO] [WARN] [ERROR] [OFF].

Enum values:

  • TRACE

  • DEBUG

  • INFO

  • WARN

  • ERROR

  • OFF

"TRACE"

MEDIUM

camel.source.endpoint.scheduledExecutorService

Allows for configuring a custom/shared thread pool to use for the consumer. By default each consumer has its own single threaded thread pool.

MEDIUM

camel.source.endpoint.scheduler

To use a cron scheduler from either camel-spring or camel-quartz component. Use value spring or quartz for built in scheduler.

"none"

MEDIUM

camel.source.endpoint.schedulerProperties

To configure additional properties when using a custom scheduler or any of the Quartz, Spring based scheduler.

MEDIUM

camel.source.endpoint.startScheduler

Whether the scheduler should be auto started.

true

MEDIUM

camel.source.endpoint.timeUnit

Time unit for initialDelay and delay options. One of: [NANOSECONDS] [MICROSECONDS] [MILLISECONDS] [SECONDS] [MINUTES] [HOURS] [DAYS].

Enum values:

  • NANOSECONDS

  • MICROSECONDS

  • MILLISECONDS

  • SECONDS

  • MINUTES

  • HOURS

  • DAYS

"MILLISECONDS"

MEDIUM

camel.source.endpoint.useFixedDelay

Controls if fixed delay or fixed rate is used. See ScheduledExecutorService in JDK for details.

true

MEDIUM

camel.source.endpoint.shuffle

To shuffle the list of files (sort in random order).

false

MEDIUM

camel.source.endpoint.sortBy

Built-in sort by using the File Language. Supports nested sorts, so you can have a sort by file name and as a 2nd group sort by modified date.

MEDIUM

camel.source.endpoint.sorter

Pluggable sorter as a java.util.Comparator class.

MEDIUM

camel.component.file.bridgeErrorHandler

Allows for bridging the consumer to the Camel routing Error Handler, which mean any exceptions occurred while the consumer is trying to pickup incoming messages, or the likes, will now be processed as a message and handled by the routing Error Handler. By default the consumer will use the org.apache.camel.spi.ExceptionHandler to deal with exceptions, that will be logged at WARN or ERROR level and ignored.

false

MEDIUM

camel.component.file.autowiredEnabled

Whether autowiring is enabled. This is used for automatic autowiring options (the option must be marked as autowired) by looking up in the registry to find if there is a single instance of matching type, which then gets configured on the component. This can be used for automatic configuring JDBC data sources, JMS connection factories, AWS Clients, etc.

true

MEDIUM

The camel-file source connector has no converters out of the box.

The camel-file source connector supports 1 transforms out of the box, which are listed below.

  • org.apache.camel.kafkaconnector.file.transformers.FileTransforms

The camel-file source connector has no aggregation strategies out of the box.