OSS-Net: Memory Efficient High Resolution Semantic Segmentation of 3D Medical Data
BMVC 2021

Centre for Synthetic Biology,
Department of Electrical Engineering and Information Technology,
Department of Biology,
Technische Universität Darmstadt


Abstract

Convolutional neural networks (CNNs) are the current state-of-the-art meta-algorithm for volumetric segmentation of medical data, for example, to localize COVID-19 infected tissue on computed tomography scans or to detect tumour volumes in magnetic resonance imaging. A key limitation of 3D CNNs on voxelised data is that the memory consumption grows cubically with the training data resolution. Occupancy networks (O-Nets) are an alternative for which the data is represented continuously in a function space and 3D shapes are learned as a continuous decision boundary. While O-Nets are significantly more memory efficient than 3D CNNs, they are limited to simple shapes, are relatively slow at inference, and have not yet been adapted for 3D semantic segmentation of medical data. Here, we propose Occupancy Networks for Semantic Segmentation (OSS-Nets) to accurately and memory-efficiently segment 3D medical data. We build upon the original O-Net with modifications for increased expressiveness, leading to segmentation performance comparable to 3D CNNs, as well as modifications for faster inference. We leverage local observations to represent complex shapes and prior encoder predictions to expedite inference. We showcase OSS-Net's performance on 3D brain tumour and liver segmentation against a function space baseline (O-Net), a performance baseline (3D residual U-Net), and an efficiency baseline (2D residual U-Net). OSS-Net yields segmentation results similar to the performance baseline and superior to the function space and efficiency baselines. In terms of memory efficiency, OSS-Net consumes a comparable amount of memory to the function space baseline, somewhat more memory than the efficiency baseline, and significantly less than the performance baseline. As such, OSS-Net enables memory-efficient and accurate 3D semantic segmentation that can scale to high resolutions.

Video 1. Brain tumour segmentation results of OSS-Net (config. C) on the BraTS 2020 dataset. Brain tumour prediction in yellow and label in green. 2D MRI slice (Tc1 modality) overlaid with the corresponding voxelized prediction or label on the left and the corresponding extracted mesh on the right.


Method

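To overcome the lack of local information in our OSS-Net occupancy encoder, we extend the original learnable mapping to

$$f_\theta : \mathbb{R}^3 \times \mathcal{X} \times \mathcal{Z} \to [0, 1]$$

with a local observation $z \in \mathcal{Z}$ as an additional input, alongside the 3D location in $\mathbb{R}^3$ and the global observation in $\mathcal{X}$. The local observation is a 3D patch sampled from the global observation and centred at the queried 3D location; it is encoded into a local latent representation.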
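The patch sampling described above can be illustrated with a minimal sketch. It assumes integer voxel centres, a cubic patch with odd edge length, and zero padding outside the volume; the function name `extract_local_patch` and these choices are illustrative assumptions, not the authors' implementation.

```python
def extract_local_patch(volume, center, size=3):
    """Return a size^3 patch of `volume` centred at the integer voxel `center`.

    `volume` is a nested list indexed as volume[z][y][x]. Voxels that fall
    outside the volume are zero-padded (an assumption of this sketch).
    """
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    half = size // 2
    cz, cy, cx = center
    patch = []
    for z in range(cz - half, cz + half + 1):
        plane = []
        for y in range(cy - half, cy + half + 1):
            row = []
            for x in range(cx - half, cx + half + 1):
                inside = 0 <= z < depth and 0 <= y < height and 0 <= x < width
                row.append(volume[z][y][x] if inside else 0.0)
            plane.append(row)
        patch.append(plane)
    return patch


# Toy 4x4x4 volume whose voxel value encodes its own index.
volume = [[[float(z * 16 + y * 4 + x) for x in range(4)] for y in range(4)]
          for z in range(4)]
inner_patch = extract_local_patch(volume, (1, 1, 1))   # fully inside the volume
corner_patch = extract_local_patch(volume, (0, 0, 0))  # partly zero-padded
```

In the sketch, each queried location gets its own patch, so n locations yield n patches that a patch encoder could map to n local latent vectors.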


Figure 1. Architecture of the OSS-Net, with downscaled global volume, local 3D patches, and 3D locations (in orange) as inputs. The 3D CNN encoder (in yellow) extracts a global latent vector (in green). The patch encoder produces n local latent vectors (in green). The global and local latent vectors as well as the 3D locations are concatenated and fed into five residual fully connected blocks with conditional batch normalization (γ and β are predicted parameters) to produce n occupancy probability predictions (in purple). Best viewed in color.
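The conditional batch normalization named in the Figure 1 caption can be sketched as follows. This is a minimal illustration under stated assumptions: features are normalised with batch statistics only (no running averages), and the scale and shift are each predicted from a conditioning latent vector by a single affine layer. The helper `linear`, the function `conditional_batch_norm`, and the toy weights are hypothetical names and values, not taken from the paper.

```python
import math


def linear(vec, weight, bias):
    """Affine map; `weight` holds one row per output feature."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weight, bias)]


def conditional_batch_norm(features, latents, gamma_layer, beta_layer, eps=1e-5):
    """Normalise over the batch, then scale and shift per sample.

    features: n feature vectors, one per queried location.
    latents:  n conditioning vectors, one per queried location.
    gamma_layer, beta_layer: (weight, bias) pairs predicting scale and shift.
    """
    n, dim = len(features), len(features[0])
    mean = [sum(f[j] for f in features) / n for j in range(dim)]
    var = [sum((f[j] - mean[j]) ** 2 for f in features) / n for j in range(dim)]
    out = []
    for feat, latent in zip(features, latents):
        gamma = linear(latent, *gamma_layer)  # predicted scale
        beta = linear(latent, *beta_layer)    # predicted shift
        out.append([
            gamma[j] * (feat[j] - mean[j]) / math.sqrt(var[j] + eps) + beta[j]
            for j in range(dim)
        ])
    return out


features = [[1.0, 2.0], [3.0, 6.0]]
latents = [[1.0], [0.0]]
gamma_layer = ([[1.0], [1.0]], [1.0, 1.0])  # gamma = latent + 1
beta_layer = ([[0.0], [0.0]], [0.5, 0.5])   # beta = 0.5 regardless of latent
normalised = conditional_batch_norm(features, latents, gamma_layer, beta_layer)
```

Because the scale and shift depend on the latent vector, two locations with identical features but different latents produce different outputs, which is how the conditioning information enters each block in this sketch.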

Our modified inference approach utilises the low-resolution dense prediction of the encoder as the initial state of the octree. This results in faster inference since fewer locations have to be evaluated by the OSS-Net decoder.
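The idea of seeding the refinement with a coarse prediction can be sketched in 2D. The sketch below is a simplification under stated assumptions: a binary coarse grid stands in for the thresholded encoder prediction, a toy disc function stands in for the decoder, and a cell is re-evaluated only if its coarse parent disagrees with any of its neighbours. The names `refine_once` and `disc_decoder` are hypothetical, and the actual octree procedure may differ in detail.

```python
def refine_once(coarse, decoder):
    """Double the resolution of a binary 2D grid, querying `decoder` only near boundaries.

    coarse:  rows of 0/1 labels (stand-in for the thresholded encoder prediction).
    decoder: callable (y, x) -> occupancy probability for coordinates in [0, 1).
    Returns the refined grid and the number of decoder evaluations.
    """
    h, w = len(coarse), len(coarse[0])
    fine = [[0] * (2 * w) for _ in range(2 * h)]
    evaluations = 0
    for y in range(2 * h):
        for x in range(2 * w):
            cy, cx = y // 2, x // 2
            neighbours = [
                coarse[ny][nx]
                for ny in range(max(cy - 1, 0), min(cy + 2, h))
                for nx in range(max(cx - 1, 0), min(cx + 2, w))
            ]
            if all(label == coarse[cy][cx] for label in neighbours):
                fine[y][x] = coarse[cy][cx]  # uniform region: reuse the prior
            else:
                prob = decoder((y + 0.5) / (2 * h), (x + 0.5) / (2 * w))
                fine[y][x] = 1 if prob >= 0.5 else 0
                evaluations += 1
    return fine, evaluations


def disc_decoder(y, x):
    """Toy stand-in for the decoder: a disc of radius 0.3 centred at (0.5, 0.5)."""
    return 1.0 if (y - 0.5) ** 2 + (x - 0.5) ** 2 < 0.3 ** 2 else 0.0


coarse = [[1 if disc_decoder((y + 0.5) / 8, (x + 0.5) / 8) >= 0.5 else 0
           for x in range(8)] for y in range(8)]
fine, evaluations = refine_once(coarse, disc_decoder)
dense = [[1 if disc_decoder((y + 0.5) / 16, (x + 0.5) / 16) >= 0.5 else 0
          for x in range(16)] for y in range(16)]
```

In this toy case the refined 16 x 16 grid is obtained with 128 decoder evaluations instead of the 256 a dense evaluation needs. In general, structures thinner than a coarse cell could be missed by such a scheme, so the quality of the initial prediction matters.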


Figure 2. Our improved dense segmentation extraction approach in 2D. Initial upsampled and thresholded segmentation of the 3D CNN encoder in green, occupied coordinates in red, unoccupied coordinates in blue, coordinates to be evaluated in gray, and the current voxelized segmentation in pink.

Experimental Results

Table 1. Semantic segmentation results of our approaches and baselines on validation data.



Table 2. GPU memory consumption of our networks and baselines. Inference GPU memory usage of the network evaluation step for different numbers of sampled locations.


Conclusion

OSS-Net combines the strong segmentation performance of the voxelised CNN performance baseline with the memory efficiency of the original O-Net, enabling accurate, fast, and memory-efficient 3D semantic segmentation that can scale to high resolutions.

Acknowledgements

We thank Marius Memmel and Nicolas Wagner for the insightful discussions, Alexander Christ and Tim Kircher for giving feedback on the first draft, and Markus Baier as well as Bastian Alt for aid with the computational setup.
This work was supported by the Landesoffensive für wissenschaftliche Exzellenz as part of the LOEWE Schwerpunkt CompuGene. H.K. acknowledges support from the European Research Council (ERC) with the consolidator grant CONSYN (nr. 773196). O.C. is supported by the Alexander von Humboldt Foundation Philipp Schwartz Initiative.


Copyright © Christoph Reich 2022