Speckle noise, mechano-physical noise, and environmental noise are inevitably introduced in digital holographic coherent imaging and seriously degrade the quality of phase maps; removing non-Gaussian statistical noise, of which speckle noise is representative, remains a challenging problem. In the past few years, deep learning methods based on convolutional neural networks (CNNs) have made good progress in removing Gaussian noise. However, deep networks designed for Gaussian noise removal tend to fail when applied to speckle noise. Recently, numerous studies have employed CNNs to restore speckle-degraded images, with encouraging results. Nevertheless, speckle degradation simulated in isolation is limited and insufficient to represent the increasingly complex DHI noise environment. This paper presents what we believe to be a novel approach to simulating complex noise environments by multiplexing simulated Gaussian noise and speckle noise. The multiplexed noise does not follow the statistical laws of either noise component before multiplexing, which poses a more challenging task for neural-network denoising algorithms. Accordingly, exploiting the capacity of the Swin Transformer to model multi-scale features, this paper proposes a DHI speckle denoising approach based on Swin-UNet. Gaussian, speckle, and blended noise datasets with different noise densities are constructed by numerical simulation for training and testing, and generalizability tests are performed on 1,100 randomly selected open-source holographic tomography (HT) noise images from Warsaw University of Technology and 25 speckle images selected from DATABASE. All test results are quantitatively evaluated with three metrics: mean squared error (MSE), peak signal-to-noise ratio (PSNR), and structural similarity index (SSIM). 
All CNN algorithms are further compared in terms of number of parameters, floating-point operations, and denoising time. The comparison demonstrates that the denoising algorithm presented in this paper exhibits greater stability, accuracy, and generalizability.
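
As an illustration only, the noise construction and two of the metrics named above can be sketched in NumPy. The abstract does not specify the noise models, so the sketch assumes additive zero-mean Gaussian noise, a multiplicative speckle model of the form I(1 + n), and a combination that applies speckle first and Gaussian noise second. The function names, the noise levels, and the synthetic test image are hypothetical, not taken from the paper.

```python
import numpy as np


def add_gaussian(img, sigma, rng):
    """Additive zero-mean Gaussian noise with standard deviation sigma."""
    return img + rng.normal(0.0, sigma, img.shape)


def add_speckle(img, sigma, rng):
    """Multiplicative noise img * (1 + n), n ~ N(0, sigma^2) (assumed model)."""
    return img * (1.0 + rng.normal(0.0, sigma, img.shape))


def add_blended(img, sigma_speckle, sigma_gauss, rng):
    """Speckle first, then additive Gaussian noise (assumed order)."""
    speckled = add_speckle(img, sigma_speckle, rng)
    return add_gaussian(speckled, sigma_gauss, rng)


def mse(ref, test):
    """Mean squared error between a reference and a test image."""
    return float(np.mean((ref - test) ** 2))


def psnr(ref, test, data_range=1.0):
    """Peak signal-to-noise ratio in dB for images spanning data_range."""
    err = mse(ref, test)
    if err == 0.0:
        return float("inf")
    return float(10.0 * np.log10(data_range ** 2 / err))


rng = np.random.default_rng(0)

# Hypothetical smooth test image scaled to [0, 1], standing in for a phase map.
y, x = np.mgrid[0:128, 0:128] / 127.0
clean = 0.5 + 0.5 * np.sin(2 * np.pi * x) * np.cos(2 * np.pi * y)

noisy = {
    "gaussian": add_gaussian(clean, 0.1, rng),
    "speckle": add_speckle(clean, 0.2, rng),
    "blended": add_blended(clean, 0.2, 0.1, rng),
}

for name, img in noisy.items():
    print(f"{name:8s}  MSE={mse(clean, img):.4f}  PSNR={psnr(clean, img):.2f} dB")
```

Under these assumptions the blended noise is signal-dependent in one component and signal-independent in the other, so it follows neither a purely additive nor a purely multiplicative law. SSIM is omitted to keep the sketch dependency-free; a library implementation such as `skimage.metrics.structural_similarity` could be used for it.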