Data programmers may already be experts at importing and manipulating text, image, and video content, but documents are harder to work with. This tool was designed to import any document set for further processing: it recognizes page layout and extracts both text and images!
This tool could be a great place to start creating a document-based dataset for your pet project (or your employer’s pet project!).
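To make the idea of layout recognition a little more concrete, here is a minimal sketch of what you might do with the output once a page has been segmented into typed regions. It does not use the tool's own API: the `Block` class, the `split_and_order` helper, the label names, and the coordinates are all hypothetical, made up purely for illustration. It only assumes that each detected region comes with a type label and a bounding box.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A hypothetical detected page region (not the tool's own class)."""
    kind: str   # assumed label, e.g. "Text", "Title", "Figure"
    x1: float   # left edge, in pixels
    y1: float   # top edge, in pixels
    x2: float   # right edge, in pixels
    y2: float   # bottom edge, in pixels


def split_and_order(blocks):
    """Separate figure regions from text-like regions.

    Each group is sorted top-to-bottom, then left-to-right, which is a
    rough reading order for a single-column page.
    """
    def reading_order(block):
        return (block.y1, block.x1)

    figures = sorted((b for b in blocks if b.kind == "Figure"), key=reading_order)
    text = sorted((b for b in blocks if b.kind != "Figure"), key=reading_order)
    return text, figures


# Made-up detections for one page, deliberately out of order.
page = [
    Block("Figure", 50, 400, 550, 700),
    Block("Text", 50, 120, 550, 380),
    Block("Title", 50, 40, 550, 100),
]

text_blocks, figure_blocks = split_and_order(page)
print([b.kind for b in text_blocks])    # ['Title', 'Text']
print([b.kind for b in figure_blocks])  # ['Figure']
```

In a real pipeline the text regions would typically go on to OCR and the figure regions would be cropped and saved as images. The simple sort used here would need to be replaced with something column-aware for multi-column pages.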
Read more technical details in the paper: https://arxiv.org/abs/2103.15348
Also, check out this PyTorch-based implementation! https://paperswithcode.com/paper/layoutparser-a-unified-toolkit-for-deep#code