Generative Adversarial framework for CT image denoising using Self-Attentive Residual UNet and Patch GAN Discriminator

Swapna Katta; Prabhishek Singh; Deepak Garg; Manoj Diwakar

Katta S, Singh P, Garg D, Diwakar M. Generative Adversarial framework for CT image denoising using Self-Attentive Residual UNet and Patch GAN Discriminator. Biomed Pharmacol J 2026;19(1).

Manuscript received on: 24-05-2025
Manuscript accepted on: 10-11-2025
Published online on: 11-02-2026

Plagiarism Check: Yes
Reviewed by: Dr. Heamn Noori Abduljabbar
Second Review by: Dr. Narasimha Murthy
Final Approval by: Dr. Patorn Piromchai

Swapna Katta¹, Prabhishek Singh²*, Deepak Garg¹ and Manoj Diwakar³

¹School of Computer Science and Artificial Intelligence, SR University, Warangal, India

²School of Computer Science Engineering and Technology, Bennett University, Greater Noida, India

³Department of CSE, Graphic Era Deemed to be University, Dehradun, Uttarakhand, India

Corresponding author email: prabhisheksingh1988@gmail.com

DOI : https://dx.doi.org/10.13005/bpj/3379

Abstract

Computed Tomography (CT) is an indispensable tool for identifying various health conditions. In low-dose CT (LDCT) imaging, the radiation dose is reduced to minimize its impact on the human body, but this degrades image quality and introduces noise, commonly modeled as Gaussian noise. The noise obscures fine anatomical details and thereby reduces diagnostic precision. Traditional image denoising approaches struggle to balance noise reduction against the preservation of image information, so lowering the radiation dose while maintaining image quality remains a critical challenge in clinical imaging. This study explores a GAN-based model that pairs a self-attentive UNet generator, which extracts both local and global contextual information, with a Patch GAN discriminator, which evaluates the realism of local patches to maintain fine structural details. At a noise variance of σ = 10, the model achieves an average Peak Signal to Noise Ratio (PSNR) of 35.15 dB and an average Structural Similarity Index Measure (SSIM) of 0.92. The GAN-based model outperforms the ResNet50, UNet, and DnCNN models, indicating that it is a promising technique for suppressing noise in LDCT images.

Keywords

Deep learning; GAN; Gaussian noise; Image denoising; LDCT imaging

Katta S, Singh P, Garg D, Diwakar M. Generative Adversarial framework for CT image denoising using Self-Attentive Residual UNet and Patch GAN Discriminator. Biomed Pharmacol J 2026;19(1). Available from: https://bit.ly/4cwvvNZ

Introduction

In recent years, CT has become an essential medical imaging technique for generating cross-sectional images of the internal anatomy to identify abnormalities such as cancer, fractures, and tumors. CT is a noninvasive technique that captures fine anatomical details at high temporal and spatial resolution, and it is widely used to support pathological diagnosis and treatment across multiple disciplines. However, the potential risks of ionizing X-ray radiation have raised public concern.¹ Under the ALARA (As Low As Reasonably Achievable) principle, reducing the X-ray dose is a research hotspot in medical imaging; the dose can be lowered by reducing X-ray intensity, for example by shortening the exposure duration or lowering the tube current. Reducing X-ray emission, however, results in noisy CT images. The noise in low-dose CT imaging can be categorized as Gaussian and Poisson noise.² Gaussian noise arises from random variations in pixel intensity that follow a Gaussian distribution, resulting in smoothed edges and uniform interference. Poisson noise stems from the statistical behavior of X-ray photons and introduces a mild yet noticeable grainy effect in CT images. These random pixel fluctuations degrade the images and hinder accurate medical diagnosis. Therefore, it is necessary to reduce the noise in low-dose images using denoising methods.³,⁴

Image denoising techniques are commonly categorized as spatial domain filtering, transform domain filtering, and deep learning-based methods. Spatial filtering methods operate directly on pixel values within a local neighborhood. Methods such as Gaussian filtering, Wiener filtering, anisotropic diffusion, and total variation are used to reduce noise and other minor artifacts.⁵⁻⁷ However, these methods often over-smooth the image and fail to preserve edges, leading to a loss of anatomical features that affects diagnostic precision. Among these, total variation has been used to address sparse data issues in CT images. Block Matching 3D (BM3D) filtering and K-Singular Value Decomposition (KSVD) are also popular. BM3D is a nonlocal, patch-based denoising method that operates in both the spatial and transform domains, whereas KSVD is an adaptive method based on dictionary learning and sparse representation for noise mitigation in low-dose images.⁸,⁹ Despite their strong performance, these techniques are limited by the variety of noise and distortion in CT images, whose nonstationary properties can weaken denoising performance. Transform domain filtering methods, such as the wavelet, curvelet, and shearlet transforms, decompose images into multiple frequency scales to suppress noise.¹⁰,¹¹ Among these, the shearlet transform has superior anisotropic features and multidirectional capability for handling noise and artifacts. However, improper thresholding can introduce ring and Gibbs artifacts around the edges of CT images.¹²,¹³
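To make the thresholding step in transform domain filtering concrete, the following is a minimal sketch of soft thresholding applied to a list of transform coefficients. It is an illustration only: the function name `soft_threshold`, the sample coefficients, and the threshold value are assumptions, not the rule used by any of the cited methods.

```python
def soft_threshold(coeffs, threshold):
    """Shrink each coefficient toward zero by `threshold` (hypothetical helper)."""
    shrunk = []
    for c in coeffs:
        magnitude = abs(c) - threshold
        if magnitude <= 0:
            # Small coefficients are treated as noise and set to zero.
            shrunk.append(0.0)
        else:
            # Large coefficients keep their sign but lose `threshold` in magnitude.
            shrunk.append(magnitude if c > 0 else -magnitude)
    return shrunk


# Assumed example values: two large (signal-like) and three small (noise-like) coefficients.
coeffs = [5.0, -0.4, 0.3, -3.0, 0.1]
print(soft_threshold(coeffs, 0.5))
```

A threshold that is too high would also remove coefficients that carry edge information, which is one way the artifacts mentioned above can arise.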

Recently, deep learning (DL) techniques have shown superior results in LDCT image denoising.¹⁴,¹⁵ DL-based techniques use neural networks to learn complex noise patterns and correlations between image pixels, enabling them to adapt to varying noise conditions. Training typically involves feeding a large dataset of noisy CT images and their noise-free counterparts to a neural network, with the objective of minimizing the difference between the predicted denoised image and the clean CT image.
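The training objective described above can be written compactly. The form below assumes a mean squared error criterion, which is only one common choice; the symbols (the network f with parameters θ, the noisy inputs x_i, the clean targets y_i, and the number of training pairs N) are notation introduced here for illustration.

```latex
\hat{\theta} \;=\; \arg\min_{\theta} \; \frac{1}{N} \sum_{i=1}^{N} \left\| f_{\theta}(x_i) - y_i \right\|_2^2
```

Other criteria, such as perceptual or adversarial losses, replace or supplement the squared-error term while keeping the same paired-data structure.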

Several deep learning models, in particular CNNs with different architectures such as the Residual Encoder Decoder CNN (RED-CNN) and cascaded CNNs, have been combined with different loss functions, such as MSE, perceptual loss, and adversarial loss, during the training of denoising models.¹⁶,¹⁷ The architecture defines the complexity of the denoising model, while the loss function governs how effectively the model learns from the data. In general, GAN-based denoising models can produce superior results by learning a transformation between image distributions rather than relying on voxel-level MSE regression, which mitigates the over-smoothing of LDCT images. Hybrid denoising methods are important in CT image denoising because perceived image quality does not depend on pixel-by-pixel differences but is instead sensitive to contextual and structural features. To address this, WGAN-based noise suppression has been integrated with a pretrained VGG perceptual loss function. However, because the VGG network is trained on standard RGB images, applying its perceptual loss to LDCT images often generates cross-hatch artifacts and degrades crucial details.¹⁸

A hybrid GAN method improves LDCT image quality by using a generator network to convert noisy images into clean images, while a discriminator distinguishes between real and generated outputs. This adversarial approach enhances image quality and contributes to higher diagnostic precision.¹⁹ The hybrid approach DU-GAN combines GANs with UNet-based discriminators to enhance image quality; however, this model is computationally intensive.²⁰ The Progressive Wasserstein Generative Adversarial Network (PWGAN) is a hybrid approach that includes a weighted structural-sensitivity hybrid loss function to enhance noise suppression in LDCT images.²¹ The hybrid Dual Encoder Single Decoder (DESD) model combines a generator with a pyramid nonlocal attention module for feature correlation and auxiliary shallow and deep feature processing modules to improve feature extraction; however, this model is highly sensitive to hyperparameters.²² The artifact and detail attention GAN uses a multichannel generator to handle noise, other artifacts, and edge details, together with multiscale discriminators to enhance discriminative power; however, its computational cost is high.²³

Related work

Recently, GAN-based techniques have become a strong framework for LDCT image denoising, preserving edge details and structural information better than conventional deep learning methods. This section reviews deep learning-based approaches and existing GAN-related approaches for noise suppression and image enhancement in LDCT images. Saidulu et al.²⁴ proposed the Asymmetric Convolution-based GAN framework (ACG-Net), a GAN-based LDCT denoising approach that uses 1D asymmetric convolutions to improve directional features, dynamic attention to fuse multiscale data and maintain both local and global pixel correlations, and a Neural Structure Preserving Loss (NSPL) to maintain the integrity of anatomical structures. The method provides enhanced structural fidelity and higher perceptual quality. However, the model is complex, and its computational requirements may limit its use in real-time clinical settings. Kuraning et al.²⁵ proposed a CycleGAN-based LDCT denoising technique that integrates a UNet generator for detail preservation and a Patch GAN discriminator for texture realism. This method balances noise mitigation and detail preservation to enhance overall image quality, thereby improving diagnostic precision. However, it is computationally intensive, which may hinder its use in real-time clinical settings.

Abuya et al.²⁶ introduced a hybrid approach, termed Adversarial Content-Noise Complementary Learning (ACNCL), which combines multiple noise reduction approaches, such as DnCNN, UNet, CA-AGF, and the Discrete Wavelet Transform (DWT), within a GAN framework. The method further uses a Patch GAN discriminator to preserve fine image features and thereby improve overall visual quality. ACNCL separates noise from anatomical features to ensure effective noise reduction while maintaining structural information. The method showed superior denoising performance on CT and MRI datasets, improving image clarity to facilitate early tumor detection. However, it incurs computational overhead owing to the use of multiple denoising approaches. Lokhande et al.²⁷ implemented a deep dense GAN with a novel loss function for unsupervised blind noise mitigation, handling diverse noise types without paired noisy-clean image datasets and suppressing noise at various intensity levels. The method provides effective noise reduction, fine detail preservation, robust handling of complex noise patterns, and artifact removal, thereby facilitating better diagnostic precision.

He et al.²⁸ introduced the Structural Semantic Enhancement Network (SSEN), which improves image sharpness by using a 5×5 Sobel operator with dense connections and asymmetric convolutional layers to enhance structural feature extraction, preserving both fine image details and structural patterns. A compound loss function combining L1 loss, multiscale SSIM loss, and multiscale perceptual loss was used to restore structural and perceptual integrity, yielding better PSNR and SSIM performance. However, the advanced model structure incurs higher computational costs and requires careful parameter tuning. Wang et al.²⁹ reviewed advanced GAN-based techniques for suppressing noise while preserving fine details in LDCT images. The review outlines the main architectural variants, such as conditional GANs (C-GANs), Super-resolution GANs (SRGANs), and Cycle-GANs, and quantitatively evaluates denoising models using the PSNR, SSIM, and Learned Perceptual Image Patch Similarity (LPIPS) metrics. GAN architectures show superior denoising, but limitations in reliability and the generation of synthetic artifacts remain.

Jiao et al.³⁰ implemented a frequency-division GAN with an encoder and dual-decoder architecture, which isolates high-frequency coefficients for effective denoising while preserving tissue edge details and textures. The dual decoder recovers lost details, while the multiscale inception discriminator improves feature-level discrimination. This method achieved superior noise reduction. However, its complex architectural design might result in higher computational costs and longer training times. Wang et al.³¹ proposed a GAN-based Residual Structure and Cooperative Attention framework (RCA-GAN). This method incorporates residual learning and a cooperative attention module, integrated with a multimodal loss function, to enhance the effectiveness of noise reduction. The model demonstrated superior noise suppression and fine detail preservation. However, this method is also computationally intensive.

Recently, Naser et al.³² proposed a GAN-based denoising network that incorporates a Recursive Residual Group (RRG)-based generator with Selective Kernel Feature Fusion (SKFF) and a Patch GAN discriminator. This network enhances feature refinement, texture modelling, and hierarchical context capture in Cardiac Magnetic Resonance (CMR) imaging. The model achieved higher PSNR and SSIM values across varying noise intensities, demonstrating the ability of the RRG method to suppress noise while preserving fine image details. While their model proved effective for CMR images, our study focuses on LDCT image denoising, which presents distinct challenges due to different noise characteristics, anatomical structures, and diagnostic constraints. To address these constraints, we leverage a self-attention-based UNet generator and a Patch GAN discriminator, optimized with adversarial and L1 fidelity losses, to ensure preservation of anatomical details and diagnostic fidelity in LDCT images.
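As a rough illustration of how an adversarial term and an L1 fidelity term can be combined for the generator, the sketch below uses plain Python lists in place of tensors. Several details are assumptions rather than the settings of this study: the binary cross-entropy form of the adversarial term, the weight `l1_weight=100.0`, and the helper names `bce_with_logits` and `generator_loss`. The discriminator output is modeled as a grid of per-patch logits, which is the defining feature of a Patch GAN discriminator.

```python
import math


def bce_with_logits(logit, target):
    """Numerically stable binary cross-entropy for one logit (hypothetical helper)."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def generator_loss(patch_logits, denoised, clean, l1_weight=100.0):
    """Return (total, adversarial, l1) for one image.

    patch_logits: 2D grid of discriminator logits, one per local patch.
    denoised, clean: 2D pixel grids of equal size.
    l1_weight: assumed weighting factor, not taken from the paper.
    """
    # Adversarial term: the generator wants every patch to be judged "real" (target 1).
    flat_logits = [v for row in patch_logits for v in row]
    adversarial = sum(bce_with_logits(v, 1.0) for v in flat_logits) / len(flat_logits)

    # L1 fidelity term: mean absolute pixel difference from the clean image.
    diffs = [abs(d - c) for d_row, c_row in zip(denoised, clean)
             for d, c in zip(d_row, c_row)]
    l1 = sum(diffs) / len(diffs)

    return adversarial + l1_weight * l1, adversarial, l1


# Assumed toy inputs: an undecided discriminator (all logits 0) and a 1x2 image.
total, adversarial, l1 = generator_loss(
    [[0.0, 0.0], [0.0, 0.0]], [[0.5, 0.5]], [[0.0, 1.0]]
)
print(total, adversarial, l1)
```

Averaging over the patch grid means each local region contributes equally to the adversarial signal, while the L1 term keeps the output anchored to the clean image at the pixel level.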

Materials and Methods

GAN-based methodology

The GAN-based denoising technique is trained on paired noisy and clean images and converts noisy CT images into denoised ones. Fig. 1 depicts the general methodology used to improve image quality.³³ The GAN-based technique was applied to CT images corrupted with additive Gaussian noise. Gaussian noise follows a normal distribution with zero mean and a given standard deviation, and its probability density is the familiar bell-shaped curve.

Mathematically, it is represented as:

$$\tilde{I}_j(a,b) = I_j(a,b) + N_j(a,b)$$

where

$\tilde{I}_j(a,b)$ is the noisy CT image

$I_j(a,b)$ is the clean CT image

$N_j(a,b)$ is the additive Gaussian noise

$(a,b)$ represents the pixel position in the CT image
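The additive noise model above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions: the function name `add_gaussian_noise` is hypothetical, the image is a nested list of pixel values, and `sigma` is treated as the standard deviation of the noise, whereas the text refers to σ as the noise variance.

```python
import random


def add_gaussian_noise(clean, sigma, seed=0):
    """Return clean + N, where N is zero-mean Gaussian noise (hypothetical helper).

    clean: 2D list of pixel values.
    sigma: noise standard deviation (an assumption about what sigma denotes).
    seed: fixed seed so the example is reproducible.
    """
    rng = random.Random(seed)
    # Each pixel receives an independent noise sample with mean 0.
    return [[pixel + rng.gauss(0.0, sigma) for pixel in row] for row in clean]


# Assumed toy input: a flat 4x4 image with intensity 100.
clean = [[100.0] * 4 for _ in range(4)]
noisy = add_gaussian_noise(clean, sigma=10.0)
print(noisy[0])
```

In practice the noisy values would usually be clipped to the valid intensity range before being stored as an image; that step is left out here so the output matches the additive model exactly.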

To demonstrate the denoising efficiency of the GAN-based model for LDCT imaging, the “Large covid-19 CT-Scan slice Dataset” was used to measure the quality of the denoised CT images.³⁴ This dataset comprises clean CT images collected from different patients, each with a spatial resolution of 256 × 256 pixels. Of the 7593 CT images in total, 6074 are allocated for training and 1519 for testing (roughly an 80/20 split). The model was fed pairs of clean and Gaussian-noisy images. The GAN-based model was compared with other denoising methods, namely ResNet50³⁵, UNet³⁶, and DnCNN³⁷, to assess their denoising performance and their effect on overall image quality.
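For reference, PSNR between a clean image and a denoised image can be computed as in the sketch below. The function name `psnr` and the default `data_range=255.0` (8-bit intensities) are assumptions; the evaluation code of this study is not shown in the text and may use a different intensity range.

```python
import math


def psnr(clean, denoised, data_range=255.0):
    """Peak Signal to Noise Ratio in dB between two 2D pixel grids.

    data_range: maximum possible pixel value (assumed 255 for 8-bit images).
    """
    squared_errors = [(c - d) ** 2 for c_row, d_row in zip(clean, denoised)
                      for c, d in zip(c_row, d_row)]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return math.inf
    return 10.0 * math.log10(data_range ** 2 / mse)


# Assumed toy input: every pixel is off by 10 intensity levels, so MSE = 100.
clean = [[100.0, 100.0], [100.0, 100.0]]
denoised = [[110.0, 110.0], [110.0, 110.0]]
print(round(psnr(clean, denoised), 2))  # 28.13
```

Higher PSNR means a smaller mean squared error relative to the intensity range. SSIM is more involved because it compares local means, variances, and covariances, so it is not sketched here.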

Figure 1: General GAN-based denoising methodology.

Denoising method	Avg. PSNR (dB), noise variance σ = 10	Avg. PSNR (dB), noise variance σ = 20	Avg. SSIM, noise variance σ = 10	Avg. SSIM, noise variance σ = 20
ResNet50³⁵	31.21	30.20	0.89	0.85
UNet³⁶	32.45	31.50	0.91	0.90
DnCNN³⁷	34.16	32.10	0.91	0.89
GAN-based method	35.15	34.53	0.92	0.91