High-Quality Real-Time Rendering
Using Subpixel Sampling Reconstruction

AAAI 2024
Boyu Zhang1, 3, Hongliang Yuan2, Mingyan Zhu3, 4,
Ligang Liu5, Jue Wang3
1University of California, Los Angeles, 2Xiaomi Corporation, 3Tencent AI Lab
4Tsinghua University, 5University of Science and Technology of China


Abstract


Our subpixel reconstruction achieves superior image quality while being 1.5x faster.

Generating high-quality, realistic rendered images for real-time applications generally requires tracing a few samples per pixel (spp) and using deep learning-based approaches to denoise the resulting low-spp images. Existing denoising methods are slow at high resolutions because of the combined cost of physically based sampling and network inference. In this paper, we propose a novel Monte Carlo sampling strategy to accelerate the sampling process and a corresponding denoiser, subpixel sampling reconstruction (SSR), to obtain high-quality images. Extensive experiments demonstrate that our method significantly outperforms previous approaches in denoising quality and reduces overall time costs, enabling real-time rendering at 2K resolution.




SSR Method Overview




SSR consists of a temporal feature accumulator (TFA) and a reconstruction network. TFA consists of two networks, each with two convolution layers that have a spatial support of 3×3 pixels. One network takes all features and the mask of the current frame as input and outputs a reference embedding. The other computes embeddings for the current features f_t and the warped previous features f_{t-1}. Each of these two embeddings is multiplied pixel-wise with the reference embedding, and the results are passed through softmax(·) to obtain the blending factors α and β (α + β = 1) for the current and previous features, respectively. Our reconstruction network extends U-Net with skip connections and predicts two coarse-scale images at the first two decoder stages rather than dense features at these stages. Note that all frames are demodulated by albedo.
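The blending step in TFA can be illustrated with a minimal per-pixel sketch. This is not the authors' implementation. It assumes that "pixel-wise multiplied" means a per-pixel dot product between embedding vectors, and that the accumulated feature is the weighted sum of current and warped previous features. The function names `blend_factors` and `blend_features` are hypothetical, and the embeddings are passed in as plain lists rather than produced by the convolution networks.

```python
import math


def blend_factors(ref, cur, prev):
    """Return (alpha, beta) for one pixel from three embedding vectors.

    Assumption: the score of each embedding is its dot product with the
    reference embedding; a two-way softmax turns the scores into weights.
    """
    s_cur = sum(r * c for r, c in zip(ref, cur))
    s_prev = sum(r * p for r, p in zip(ref, prev))
    m = max(s_cur, s_prev)  # subtract the max for numerical stability
    e_cur = math.exp(s_cur - m)
    e_prev = math.exp(s_prev - m)
    total = e_cur + e_prev
    return e_cur / total, e_prev / total


def blend_features(f_cur, f_prev_warped, alpha, beta):
    """Blend current and warped previous features of one pixel (assumed weighted sum)."""
    return [alpha * a + beta * b for a, b in zip(f_cur, f_prev_warped)]


if __name__ == "__main__":
    alpha, beta = blend_factors([1.0, 0.0], [3.0, 0.0], [1.0, 0.0])
    print(alpha + beta)  # the two weights always sum to 1
    print(blend_features([2.0, 4.0], [0.0, 0.0], alpha, beta))
```

When the current and previous embeddings score equally against the reference, both weights are 0.5; a higher score for the current embedding shifts weight toward the current features.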



Subpixel MC denoising datasets



Scenes shown: BistroInterior (BI), BistroExterior (BE), Sponza, Angel, Diningroom, Warmroom.

We used a Vulkan-based hybrid ray tracer to generate our subpixel sampling dataset. To target games and advanced virtual rendering, we trained separately on each 3D scene rather than jointly across all scenes. This is consistent with the paradigm employed in NVIDIA DLSS. Training was carried out on six scenes: BistroInterior, BistroExterior, Sponza, Diningroom, Angel, and Warmroom, which contain more than one million triangles and feature transparency, diffuse, specular, and soft shadow effects. Each scene includes 100 to 1000 frames at a resolution of 1024×2048. We also rendered a validation set of 10 frames and a test set of 50 frames for each scene. The ground truth images are rendered at 32768 spp for reference.

BibTeX

@article{zhang2023high,
  title={High-Quality Real-Time Rendering Using Subpixel Sampling Reconstruction},
  author={Zhang, Boyu and Yuan, Hongliang and Zhu, Mingyan and Liu, Ligang and Wang, Jue},
  journal={arXiv preprint arXiv:2301.01036v2},
  year={2023}
}

or

@article{zhang2023high,
  title={High-Quality Supersampling via Mask-reinforced Deep Learning for Real-time Rendering},
  author={Zhang, Boyu and Yuan, Hongliang and Zhu, Mingyan and Liu, Ligang and Wang, Jue},
  journal={arXiv preprint arXiv:2301.01036v1},
  year={2023}
}