What is Tika?

Apache Tika is a toolkit for detecting file types and extracting text and metadata from many file formats, such as PDF. In my case, I run it in a Docker container and have connected it to my Paperless-ngx instance along with Gotenberg. Together they give Paperless-ngx the ability to parse and convert Office documents such as “.doc”, “.xlsx”, and “.odt” as well as emails (“.eml” files).
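To get a feel for what Tika does on its own, independent of Paperless-ngx, you can talk to the Tika server's REST API directly. The snippet below is a minimal sketch, not how Paperless-ngx calls it internally. It assumes the container is reachable at `http://localhost:9998` (9998 is the Tika server's default port, but your port mapping may differ), and the helper names `build_request`, `extract_metadata`, and `extract_text` are my own.

```python
import json
import urllib.request

# Assumption: the Tika server container is published on localhost:9998.
TIKA_URL = "http://localhost:9998"


def build_request(endpoint: str, data: bytes, accept: str) -> urllib.request.Request:
    """Build a PUT request for a Tika server endpoint such as /meta or /tika."""
    return urllib.request.Request(
        url=f"{TIKA_URL}{endpoint}",
        data=data,
        method="PUT",
        headers={
            "Accept": accept,
            # Send a generic type so Tika detects the real format from the bytes.
            "Content-Type": "application/octet-stream",
        },
    )


def extract_metadata(path: str) -> dict:
    """Send a file to /meta and return its metadata as a dictionary."""
    with open(path, "rb") as f:
        req = build_request("/meta", f.read(), "application/json")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def extract_text(path: str) -> str:
    """Send a file to /tika and return the extracted plain text."""
    with open(path, "rb") as f:
        req = build_request("/tika", f.read(), "text/plain")
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

With the container running, calling `extract_metadata("invoice.pdf")` (a hypothetical file name) should return the metadata fields Tika found, and `extract_text` the document's text content.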