Tika
JVM since 1.0.0, Native since 1.0.0
Parse documents and extract metadata and text using Apache Tika.
What’s inside
- Tika component, URI syntax: tika:operation
Please refer to the above link for usage and configuration details.
Maven coordinates
Add the coordinates to your existing project:
<dependency>
    <groupId>org.apache.camel.quarkus</groupId>
    <artifactId>camel-quarkus-tika</artifactId>
</dependency>
Check the User guide for more information about writing Camel Quarkus applications.
Camel Quarkus limitations
The tikaConfig and tikaConfigUri parameters are not available in the Camel Quarkus Tika extension. Configuration can be changed only via application.properties.
While you can use any of the available Tika parsers in JVM mode, only some of them are supported in native mode; see the Quarkus Tika guide.
Using the Tika parser without any configuration initializes all available parsers. Because some of them do not work in native mode, the whole execution fails.
To make the Tika parser work in native mode, select the parsers to initialize with the following properties:
- quarkus.tika.parsers: Comma-separated list of parsers (abbreviations). There are two predefined parsers: pdf and odf.
- quarkus.tika.parser.*: Adds a new parser abbreviation to be used with the previous property. The value is the fully qualified class name of the parser.
Example of application.properties:
quarkus.tika.parsers = pdf,odf,office
quarkus.tika.parser.office = org.apache.tika.parser.microsoft.OfficeParser
For more information about selecting parsers see the Quarkus Tika guide.
You may need to add the quarkus-awt extension to build the native image. For more information, see the Quarkus Tika guide.
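As a sketch, the quarkus-awt extension could be added with the coordinates below. This assumes the io.quarkus group ID and that the version is managed by the Quarkus BOM already imported in your project. Neither detail is stated on this page, so verify both against your build.

```xml
<!-- Assumption: the version is inherited from the Quarkus BOM -->
<dependency>
    <groupId>io.quarkus</groupId>
    <artifactId>quarkus-awt</artifactId>
</dependency>
```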