What is Tika?
Apache Tika is a toolkit for detecting and extracting text and metadata from many file types, such as PDF. In my case, I run it in a Docker container and have connected it to my Paperless-ngx instance, along with Gotenberg, so that Paperless-ngx can parse and convert Office documents such as “.doc”, “.xlsx”, and “.odt”, as well as emails (“.eml” files).
Note: Refer to the Optional Services section of the Paperless-ngx documentation for more information.
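To get a feel for what Tika does on its own, here is a minimal sketch that talks to a Tika server directly over its REST API. It assumes the container is reachable at `http://localhost:9998` (9998 is Tika's default port; adjust to your own mapping). The helper names `build_request`, `extract_metadata`, and `extract_text` are made up for this illustration and are not part of Tika or Paperless-ngx, which handles this communication for you once it is configured.

```python
import json
import urllib.request

# Assumption: the Tika server is published on localhost at Tika's default port.
TIKA_URL = "http://localhost:9998"


def build_request(data: bytes, endpoint: str, accept: str,
                  base_url: str = TIKA_URL) -> urllib.request.Request:
    """Build a PUT request that uploads raw file bytes to a Tika endpoint."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{endpoint.lstrip('/')}",
        data=data,
        method="PUT",
        headers={"Accept": accept},
    )


def extract_metadata(data: bytes, base_url: str = TIKA_URL) -> dict:
    """Ask Tika's /meta endpoint for the document metadata as JSON."""
    req = build_request(data, "meta", "application/json", base_url)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def extract_text(data: bytes, base_url: str = TIKA_URL) -> str:
    """Ask Tika's /tika endpoint for the extracted plain text."""
    req = build_request(data, "tika", "text/plain", base_url)
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

With the container running, reading a file in binary mode and passing its bytes to `extract_metadata` should return a dictionary of whatever metadata Tika finds, and `extract_text` should return the text content. The exact keys depend on the file type.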