Inv3D

A high-resolution 3D invoice dataset for template-guided single-image document unwarping

FZI Research Center for Information Technology, Karlsruhe, Germany
Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany

Abstract

Numerous business workflows involve printed forms, such as invoices or receipts, which are often digitized manually so that their data can be stored and searched. As hardware scanners are costly and inflexible, smartphones are increasingly used for digitization. Here, processing algorithms need to cope with environmental factors such as shadows or crumples. Current state-of-the-art approaches learn supervised image dewarping models from pairs of raw images and rectification meshes. The available results show promising dewarping accuracy, but the remaining errors still lead to sub-optimal information retrieval.

In this paper, we explore the potential of improving dewarping models using additional, structured information in the form of invoice templates. We provide two core contributions: (1) a novel dataset, referred to as Inv3D, comprising synthetic and real-world high-resolution invoice images with structural templates, rectification meshes, and a multiplicity of per-pixel supervision signals, and (2) a novel image dewarping algorithm, which extends the state-of-the-art approach GeoTr to leverage structural templates using attention.

Our extensive evaluation includes an implementation of DewarpNet and shows that exploiting structured templates can improve image dewarping performance. We report superior performance for the proposed algorithm on our new benchmark for all metrics, including a 26.1 % improvement in local distortion. We have made our new dataset and all code publicly available on this website.


GeoTrTemplate

We present a model named GeoTrTemplate, which leverages templates known a priori for document dewarping. Our model extends the GeoTr model by Feng et al. [1].


Figure: The model takes a photo and the corresponding document template as RGB input images and generates image representations of both. These representations are combined using a transformer architecture and subsequently upsampled to create the backward mapping. Finally, the backward map is applied to the source image, resulting in a geometrically normalized image.
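The last step of the pipeline, applying a backward map to the source image, can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the function names, the nested-list image format, and the convention that the backward map stores normalized source coordinates in [0, 1] are all assumptions made for illustration. A practical implementation would typically run on tensors, for example with `torch.nn.functional.grid_sample`.

```python
from typing import List, Tuple

Pixel = Tuple[float, ...]                       # e.g. (r, g, b)
Image = List[List[Pixel]]                       # image[y][x] -> pixel
BackwardMap = List[List[Tuple[float, float]]]   # bm[y][x] -> (u, v) in [0, 1] (assumed convention)


def bilinear_sample(image: Image, x: float, y: float) -> Pixel:
    """Sample `image` at the continuous pixel position (x, y)."""
    height, width = len(image), len(image[0])
    # Clamp so that lookups at or beyond the border stay inside the image.
    x = min(max(x, 0.0), width - 1.0)
    y = min(max(y, 0.0), height - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    wx, wy = x - x0, y - y0
    # Interpolate horizontally on the two neighbouring rows, then vertically.
    top = [(1 - wx) * a + wx * b for a, b in zip(image[y0][x0], image[y0][x1])]
    bottom = [(1 - wx) * a + wx * b for a, b in zip(image[y1][x0], image[y1][x1])]
    return tuple((1 - wy) * t + wy * b for t, b in zip(top, bottom))


def apply_backward_map(source: Image, backward_map: BackwardMap) -> Image:
    """For every output pixel, fetch the source pixel the map points to."""
    src_h, src_w = len(source), len(source[0])
    return [
        [bilinear_sample(source, u * (src_w - 1), v * (src_h - 1)) for (u, v) in row]
        for row in backward_map
    ]
```

Because the map is defined per output pixel, the output resolution is determined by the size of the backward map rather than by the source image. An identity map (each output pixel pointing at its own normalized position) reproduces the source unchanged.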




Inv3D Dataset

We present a novel high-resolution dataset with template information, 3D renderings, a multiplicity of supervision signal maps, and backward transforms to enable dedicated learning of structural features for image dewarping.
  • 25,000 images with full 2D and 3D annotations
  • created using 100 HTML templates for a wide layout variety
  • fully randomized realistic content
Figure: Randomly sampled image from Inv3D.



Inv3DReal Dataset

We introduce a real-world dataset to measure the performance of dewarping models under realistic conditions. Inv3DReal consists of 360 pictures of printed and physically deformed invoices, taken with a smartphone camera under different lighting conditions and backgrounds.
  • six different deformations
    • perspective
    • curled
    • fewfold
    • multifold
    • crumples easy
    • crumples hard
  • three different settings
    • bright
    • colored
    • shadow
Figure: Randomly sampled image from Inv3DReal.
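The deformations and settings listed above span a grid of capture conditions, which the following short sketch enumerates. The per-condition count rests on the assumption that the 360 pictures are spread evenly across all combinations, which this page does not state explicitly. The variable names are illustrative only.

```python
from itertools import product

DEFORMATIONS = [
    "perspective", "curled", "fewfold",
    "multifold", "crumples easy", "crumples hard",
]
SETTINGS = ["bright", "colored", "shadow"]
TOTAL_IMAGES = 360  # size of Inv3DReal as stated above

# Every (deformation, setting) pair is one capture condition.
conditions = list(product(DEFORMATIONS, SETTINGS))

# Assumption: images are distributed evenly over all conditions.
images_per_condition, remainder = divmod(TOTAL_IMAGES, len(conditions))

print(f"{len(conditions)} conditions, {images_per_condition} images each")
```

Such a grid is convenient for reporting metrics per deformation or per setting instead of a single aggregate score.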



Downloads

Samples

  • Inv3D sample 00001 (41.3 MB)
  • Inv3D sample 00002 (46.1 MB)
  • Inv3D sample 00003 (42.9 MB)

Inv3D

  • Metadata (1.7 MB)
  • Test split (131.7 GB)
  • Validation split (128.1 GB)
  • Train split part 1 of 4 (149.8 GB)
  • Train split part 2 of 4 (150.9 GB)
  • Train split part 3 of 4 (149.5 GB)
  • Train split part 4 of 4 (149.9 GB)
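Since the full Inv3D download is large, it can help to check the available disk space first. The sketch below sums the sizes listed above and compares the total with the free space at a target path. It assumes decimal units (1 GB = 10^9 bytes) and does not account for any additional space needed to extract archives. The helper name `has_enough_space` is hypothetical.

```python
import shutil

# Sizes in GB as listed above (metadata converted from 1.7 MB).
INV3D_PARTS_GB = {
    "Metadata": 0.0017,
    "Test split": 131.7,
    "Validation split": 128.1,
    "Train split part 1 of 4": 149.8,
    "Train split part 2 of 4": 150.9,
    "Train split part 3 of 4": 149.5,
    "Train split part 4 of 4": 149.9,
}

total_gb = sum(INV3D_PARTS_GB.values())


def has_enough_space(path: str, required_gb: float = total_gb) -> bool:
    """Return True if the file system holding `path` has enough free space."""
    free_gb = shutil.disk_usage(path).free / 1e9  # assumption: decimal gigabytes
    return free_gb >= required_gb


print(f"Inv3D total: {total_gb:.1f} GB, fits here: {has_enough_space('.')}")
```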

Inv3DReal

  • Inv3DReal part 1 of 2 (65.4 MB)
  • Inv3DReal part 2 of 2 (72.5 MB)



BibTeX


@article{Hertlein2023,
  title        = {Inv3D: a high-resolution 3D invoice dataset for template-guided single-image document unwarping},
  author       = {Hertlein, Felix and Naumann, Alexander and Philipp, Patrick},
  year         = 2023,
  month        = {Apr},
  day          = 29,
  journal      = {International Journal on Document Analysis and Recognition (IJDAR)},
  doi          = {10.1007/s10032-023-00434-x},
  ISSN         = {1433-2825},
  url          = {https://doi.org/10.1007/s10032-023-00434-x}
}



References

[1] Feng, H., Wang, Y., Zhou, W., et al.: DocTr: Document image transformer for geometric unwarping and illumination correction. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 273–281 (2021)