Inv3D: a high-resolution 3D invoice dataset for template-guided single-image document unwarping

Abstract

Numerous business workflows involve printed forms, such as invoices or receipts, which are often digitized manually so that their data can be stored and searched. As hardware scanners are costly and inflexible, smartphones are increasingly used for digitization. Processing algorithms must then cope with prevailing environmental factors, such as shadows or crumples. Current state-of-the-art approaches learn supervised image dewarping models from pairs of raw images and rectification meshes. The available results show promising dewarping accuracy, but the remaining errors still lead to sub-optimal information retrieval.

In this paper, we explore the potential of improving dewarping models using additional, structured information in the form of invoice templates. We provide two core contributions: (1) a novel dataset, referred to as Inv3D, comprising synthetic and real-world high-resolution invoice images with structural templates, rectification meshes, and a multiplicity of per-pixel supervision signals, and (2) a novel image dewarping algorithm, which extends the state-of-the-art approach GeoTr to leverage structural templates using attention.

Our extensive evaluation includes an implementation of DewarpNet and shows that exploiting structured templates can improve image dewarping performance. We report superior performance for the proposed algorithm on our new benchmark across all metrics, including a 26.1 % improvement in local distortion. We have made our new dataset and all code publicly available on this website.

GeoTrTemplate

We present a model named GeoTrTemplate, which leverages a priori known templates in document dewarping. Our model extends the GeoTr model by Feng et al. [1].

Figure: The model takes a photo and the corresponding document template as RGB input images and generates image representations. These representations are combined using a transformer architecture and subsequently upsampled to create the backward mapping. Finally, the backward map is applied to the source image, resulting in a geometrically normalized image.
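The final step, applying a backward map to the source image, can be illustrated with a minimal sketch. This is not the project's implementation: the function name `apply_backward_map`, the normalized `(x, y)` coordinate convention in [0, 1], and the use of nested lists with grayscale values are all assumptions chosen to keep the example dependency-free. A real pipeline would typically use a vectorized sampling routine from an image or deep learning library.

```python
from math import floor


def apply_backward_map(image, bm):
    """Unwarp `image` with a backward map (hypothetical helper).

    image: H x W nested list of grayscale values (the warped photo).
    bm:    h x w nested list of (x, y) pairs, assumed normalized to [0, 1];
           entry [i][j] tells where output pixel (i, j) is sampled from.
    """
    src_h, src_w = len(image), len(image[0])
    out = []
    for row in bm:
        out_row = []
        for x, y in row:
            # Clamp, then scale normalized coordinates to source pixels.
            fx = min(max(x, 0.0), 1.0) * (src_w - 1)
            fy = min(max(y, 0.0), 1.0) * (src_h - 1)
            x0, y0 = int(floor(fx)), int(floor(fy))
            x1, y1 = min(x0 + 1, src_w - 1), min(y0 + 1, src_h - 1)
            ax, ay = fx - x0, fy - y0
            # Bilinear interpolation between the four neighboring pixels.
            top = (1 - ax) * image[y0][x0] + ax * image[y0][x1]
            bottom = (1 - ax) * image[y1][x0] + ax * image[y1][x1]
            out_row.append((1 - ay) * top + ay * bottom)
        out.append(out_row)
    return out


# Toy example: a backward map that mirrors a 2 x 3 image horizontally.
warped = [[0.0, 1.0, 2.0],
          [3.0, 4.0, 5.0]]
flip = [[(1.0 - j / 2, float(i)) for j in range(3)] for i in range(2)]
unwarped = apply_backward_map(warped, flip)
```

The key design point is that the map is indexed by output pixel and points back into the source, so every output pixel receives exactly one value and no holes appear in the result.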

Inv3D Dataset

We present a novel high-resolution dataset with template information, 3D renderings, a multiplicity of supervision signal maps, and backward transforms to enable designated learning of structural features for image dewarping.

- 25,000 images with full 2D and 3D annotations
- created using 100 HTML templates for a wide layout variety
- fully randomized realistic content

Figure: Randomly sampled image from Inv3D, shown in all available modalities: flat document, flat information delta, flat template, flat text mask, ground truth tags, warped document, warped albedo map, warped angle map, warped curvature map, warped depth map, warped normal map, warped reconstruction map, warped text mask, warped UV map, warped world coordinates, and backward mapping.

Inv3DReal Dataset

We introduce a real-world dataset to measure the performance of dewarping models under realistic conditions. Inv3DReal consists of 360 pictures of printed and altered invoices, taken with a smartphone camera under different lighting conditions and against different backgrounds.

Six different deformations:
- perspective
- curled
- fewfold
- multifold
- crumples easy
- crumples hard

Three different settings:
- bright
- colored
- shadow
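The two lists above span a grid of capture conditions. The following sketch enumerates that grid and derives how many pictures fall into each cell. It assumes the 360 pictures are spread evenly over all combinations, which this page does not state explicitly; the variable names are illustrative only.

```python
from itertools import product

deformations = ["perspective", "curled", "fewfold", "multifold",
                "crumples easy", "crumples hard"]
settings = ["bright", "colored", "shadow"]

# Every (deformation, setting) pair is one capture condition.
combinations = list(product(deformations, settings))

total_images = 360
# Assumption: the pictures are distributed evenly over all conditions.
images_per_combination = total_images // len(combinations)

print(len(combinations), images_per_combination)
```

Under the even-split assumption, this yields 18 conditions with 20 pictures each.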

Figure: Randomly sampled image from Inv3DReal, shown for every deformation (crumples hard, crumples easy, multifold, fewfold, curled, perspective) in the bright, color, and shadow settings.

Downloads

Samples

Inv3D sample 00001	41.3 MB	Download
Inv3D sample 00002	46.1 MB	Download
Inv3D sample 00003	42.9 MB	Download

Inv3D

Meta data	1.7 MB	Link
Test split	131.7 GB	Link
Validation split	128.1 GB	Link
Train split part 1 of 4	149.8 GB	Link
Train split part 2 of 4	150.9 GB	Link
Train split part 3 of 4	149.5 GB	Link
Train split part 4 of 4	149.9 GB	Link
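Because the full Inv3D download is large, it can help to check the required disk space beforehand. The sketch below sums the split sizes listed above and compares them against the free space at a target path. The helper `has_enough_space` and the choice of 1 GB = 10^9 bytes are assumptions. The 1.7 MB of meta data is ignored as negligible, and extra space for extracting archives is not accounted for.

```python
import shutil

# Archive sizes in GB, copied from the table above.
splits_gb = {
    "test": 131.7,
    "validation": 128.1,
    "train part 1": 149.8,
    "train part 2": 150.9,
    "train part 3": 149.5,
    "train part 4": 149.9,
}

train_gb = round(sum(size for name, size in splits_gb.items()
                     if name.startswith("train")), 1)
total_gb = round(sum(splits_gb.values()), 1)


def has_enough_space(path, required_gb):
    """Return True if `path` has at least `required_gb` GB free (hypothetical helper)."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 10**9  # assumes 1 GB = 10^9 bytes


print(f"train: {train_gb} GB, all splits: {total_gb} GB")
print("enough space here:", has_enough_space(".", total_gb))
```

The train split alone accounts for roughly 600 GB, so downloading only the test or validation split is a reasonable option when just evaluating a model.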

Inv3DReal

Inv3DReal part 1 of 2	65.4 MB	Download
Inv3DReal part 2 of 2	72.5 MB	Download

BibTeX


@article{Hertlein2023,
  title        = {Inv3D: a high-resolution 3D invoice dataset for template-guided single-image document unwarping},
  author       = {Hertlein, Felix and Naumann, Alexander and Philipp, Patrick},
  year         = 2023,
  month        = {Apr},
  day          = 29,
  journal      = {International Journal on Document Analysis and Recognition (IJDAR)},
  doi          = {10.1007/s10032-023-00434-x},
  ISSN         = {1433-2825},
  url          = {https://doi.org/10.1007/s10032-023-00434-x}
}

References

[1] Feng, H., Wang, Y., Zhou, W., et al.: DocTr: Document image transformer for geometric unwarping and illumination correction. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 273–281 (2021)
