articles+ search results
16,932 articles+ results
1 - 20
Next
1. A novel blind tamper detection and localization scheme for multiple faces in digital images [2023]
-
Rasha Thabit
- IET Image Processing, Vol 17, Iss 14, Pp 3938-3958 (2023)
- Subjects
-
multiple faces authentication, multiple faces detection, multiple faces security, tamper detection for multiple faces, tamper localization for multiple faces, Photography, TR1-1050, Computer software, and QA76.75-76.765
- Abstract
-
Face image manipulation detection (FIMD) is a research area of great interest, widely applicable in fields requiring data security and authentication. Existing FIMD schemes aim to identify manipulations in digital face images, but each has its own strengths and limitations. Most schemes can only detect specific manipulations under certain conditions, leading to variable success rates across different images. The literature lacks emphasis on detecting manipulations involving multiple faces. This paper introduces a novel blind tamper detection and localization scheme specifically designed for multiple faces in digital images. The proposed multiple faces manipulation detection (MFMD) scheme consists of two stages: face detection and selection, and image watermarking. Through extensive experiments, the MFMD scheme's performance has been evaluated on various multiple-face images, considering embedding capacity, payload, watermarked image quality, time complexity, and manipulation detection ability. The results demonstrate the MFMD scheme's efficacy in detecting different types of manipulations for multiple faces in images. Furthermore, the watermarked images exhibit high visual quality, even when multiple faces are present. The scheme's efficiency makes it suitable for practical applications, especially for sharing personal images over unsecured networks. This research advances FIMD techniques by addressing the neglected area of multiple-face manipulation detection. With improved accuracy, faster processing times, and resilience against various manipulations, the MFMD scheme offers valuable capabilities for enhancing data security and authentication in real-world scenarios.
- Full text View on content provider's site
2. Feature matching of remote‐sensing images based on bilateral local–global structure consistency [2023]
-
Qing‐Yan Chen and Da‐Zheng Feng
- IET Image Processing, Vol 17, Iss 14, Pp 3909-3926 (2023)
- Subjects
-
feature matching, image registration, local structure, mismatch removal, signature quadratic form distance, Photography, TR1-1050, Computer software, and QA76.75-76.765
- Abstract
-
The goal of feature matching is to establish accurate correspondences between feature points in different images depicting the same scene. To address the polymorphism of local structures, the authors propose a mismatch removal method using bilateral local–global structural consistency. This method incorporates the problem of mismatch removal into the framework of graph matching, constructs a global affinity matrix using local structural similarity and global affine transformation consistency, and optimizes it using a constrained integer quadratic programming method. To comprehensively describe the local structure, the signature quadratic form distance (SQFD) is used to measure the consistency of the neighbourhood structure. Specifically, the weights of edges are constructed based on the SQFD of the local structure, while the matching correctness of nodes and edges between the two graphs is described using local vector similarity. Furthermore, the consistency of the global affine transformation is evaluated by assessing the consistency of the local neighbourhood affine transformation between different corresponding point pairs. In estimating the local affine transformation, a bilateral correction is performed using a total least-squares (TLS) algorithm to measure the similarity of nodes between the two different graphs. Experimental results demonstrate that the proposed algorithm outperforms state-of-the-art methods in terms of accuracy and effectiveness.
- Full text View on content provider's site
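The local affine transformations mentioned in this abstract are estimated from corresponding point pairs. As a generic illustration only (ordinary least squares rather than the paper's bilateral total-least-squares correction), such an estimate might be sketched as:

```python
import numpy as np

def fit_affine(src, dst):
    """Estimate a 2-D affine transform mapping src -> dst by ordinary
    least squares. src, dst: (N, 2) arrays of corresponding points, N >= 3.
    This is a simpler stand-in for the TLS estimate used in the paper."""
    n = src.shape[0]
    A = np.hstack([src, np.ones((n, 1))])        # rows are [x, y, 1]
    params, *_ = np.linalg.lstsq(A, dst, rcond=None)
    return params.T                               # 2x3 affine matrix [A | t]

def apply_affine(M, pts):
    """Apply a 2x3 affine matrix M to an (N, 2) array of points."""
    return pts @ M[:, :2].T + M[:, 2]
```

On noise-free correspondences the least-squares fit recovers the transform exactly; on real matches, consistency of such local estimates across neighbourhoods is what the paper's global check exploits.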
-
Jin Liu, Zan Li, Qiguang Miao, Peihan Qi, and Danyang Wang
- IET Image Processing, Vol 17, Iss 14, Pp 4028-4043 (2023)
- Subjects
-
image processing, image watermarking, signal processing, Photography, TR1-1050, Computer software, and QA76.75-76.765
- Abstract
-
Blind watermark extraction in the discrete cosine transform (DCT) domain has wide application prospects but remains a challenging problem. The imperceptibility of the watermark signal makes watermark extraction, in essence, a weak-signal reception issue. Because the DCT coefficients of a host image generally do not follow a Gaussian distribution, the performance of linear correlated reception is no longer optimal. To address this, a novel blind watermark extraction scheme combining uncorrelated reception with an adaptive bistable stochastic resonance (ABSR) technique is proposed. First, through block-wise DCT of the host image, an additive watermark embedding algorithm is introduced, which converts watermark extraction into the reception of a one-dimensional time-domain weak signal (the binary watermark image) under additive Laplacian noise (the selected DCT coefficients). On this basis, by quantitatively studying the cooperative resonance relationship under Laplacian noise, the ABSR system is implemented through self-adaptive adjustment of the bistable system parameters, so that the system's output signal is enhanced rather than weakened by random noise. Finally, the ABSR-based watermark extraction scheme is investigated, and the visual quality, bit-error-ratio performance, and robustness of the proposed scheme are shown to be superior to those of traditional uncorrelated extraction.
- Full text View on content provider's site
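As background to the additive embedding stage this abstract describes, here is a minimal block-DCT watermarking sketch. It is a generic textbook construction, not the authors' ABSR scheme; the coefficient position `(3, 4)` and strength `alpha` are illustrative assumptions, and the extractor is shown non-blind for clarity (a blind extractor would not use the original image):

```python
import numpy as np
from scipy.fft import dctn, idctn

def embed_watermark(image, bits, alpha=8.0, block=8):
    """Additively embed +/-alpha for each bit into one mid-frequency
    DCT coefficient of successive 8x8 blocks (generic illustration)."""
    h, w = image.shape
    out = image.astype(np.float64).copy()
    k = 0
    for i in range(0, h - block + 1, block):
        for j in range(0, w - block + 1, block):
            if k >= len(bits):
                return out
            blk = dctn(out[i:i+block, j:j+block], norm="ortho")
            blk[3, 4] += alpha * (1 if bits[k] else -1)  # mid-frequency slot
            out[i:i+block, j:j+block] = idctn(blk, norm="ortho")
            k += 1
    return out

def extract_watermark(marked, original, nbits, block=8):
    """Recover bits from the sign of the coefficient difference."""
    bits = []
    h, w = marked.shape
    for i in range(0, h - block + 1, block):
        for j in range(0, w - block + 1, block):
            if len(bits) >= nbits:
                return bits
            d = (dctn(marked[i:i+block, j:j+block], norm="ortho")[3, 4]
                 - dctn(original[i:i+block, j:j+block], norm="ortho")[3, 4])
            bits.append(1 if d > 0 else 0)
    return bits
```

In the blind setting of the paper, the unknown host coefficients play the role of the Laplacian "noise" against which the weak watermark signal must be received.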
4. Synchronization of Boolean networks with chaos‐driving and its application in image cryptosystem [2023]
-
Peng‐Fei Yan, Hao Zhang, Chuan Zhang, Rui‐Yun Chang, and Yu‐Jie Sun
- IET Image Processing, Vol 17, Iss 14, Pp 4176-4189 (2023)
- Subjects
-
biology computing, chaos, compressed sensing, cryptography, image coding, Photography, TR1-1050, Computer software, and QA76.75-76.765
- Abstract
-
This paper proposes a Boolean network model driven by high-dimensional chaos and investigates the synchronization of the chaos-driven Boolean network using the semi-tensor product. To protect privacy and ensure the security of image transmission, the synchronization results are applied in an image cryptosystem that achieves both compression and encryption. First, the driving chaotic system is coupled with multiple local systems and synchronized with the transmitted encrypted signals. Second, the Boolean network is driven by, and synchronized with, the derived chaotic signals. Finally, images are encrypted and compressed with the chaos-driven Boolean network signals at the transmitter, and then decrypted and recovered with the synchronized chaotic and Boolean network signals at the receiver. Because of the complexity of the high-dimensional chaos and the Boolean network, the proposed cryptosystem provides strong security for secure communication and image processing.
- Full text View on content provider's site
-
LiXia Xue, JunHui Shen, RongGui Wang, and Juan Yang
- IET Image Processing, Vol 17, Iss 14, Pp 4190-4201 (2023)
- Subjects
-
image reconstruction, image resolution, image restoration, Photography, TR1-1050, Computer software, and QA76.75-76.765
- Abstract
-
Recently, convolutional neural networks (CNNs) have shown great power in single image super resolution (SISR) reconstruction, achieving significant improvements over traditional methods. Despite the great success of these CNN-based methods, applying them directly on some edge devices is impractical due to the large computational overhead required. To address this problem, a novel, lightweight SISR network focusing on speed and accuracy, called the multi-path feedback fusion network (MFFN), is designed in this paper. Specifically, to extract features more effectively, the authors propose a novel fusion attention feedback block (FAFB) as the main building block of MFFN. The FAFB consists of a backbone branch and several hierarchical branches. The backbone branch is composed of stacked enhanced pixel attention blocks (EPABs), which are responsible for incremental deep feature learning on the feature map. The hierarchical branches are responsible for extracting feature maps with different receptive-field sizes and fusing them with the features extracted from the backbone branch to achieve multi-scale feature learning, a design the authors refer to as the multi-scale fusion block (MSFB). Extra enhancement information (EIE) is added to each EPAB input, which enables the backbone branch to learn more effectively. In addition, the outputs of the cascade branches are further complemented by a feedback fusion enhancement block (FFEB) before being fused with the output of the backbone branch to achieve more comprehensive and accurate feature learning. Numerous experiments show that MFFN achieves higher accuracy than other state-of-the-art methods on benchmark test sets.
- Full text View on content provider's site
-
Xiaosong Li, Yanxia Wu, Yan Fu, Lidan Zhang, and Ruize Hong
- IET Image Processing, Vol 17, Iss 14, Pp 3927-3937 (2023)
- Subjects
-
convolutional neural networks, image recognition, object detection, Photography, TR1-1050, Computer software, and QA76.75-76.765
- Abstract
-
The bus passenger detection algorithm is a key component of a public transportation bus management system. Detection techniques based on convolutional neural networks have been widely used in bus passenger detection. However, they have high memory and computational requirements, which hinder the deployment of bus passenger detectors in the bus system. In this paper, a lightweight bus passenger detection model based on YOLOv5 is introduced. To make the model more lightweight, inner and outer cross-stage bottleneck modules, called ICB and OCB, respectively, are proposed. The proposed modules reduce the number of parameters and floating-point operations and increase the detection speed. In addition, neighbour feature attention pooling is adopted to improve detection accuracy. The performance of the lightweight model on a bus passenger dataset is empirically demonstrated. The experimental results show that the proposed model is lightweight and efficient: compared with the original YOLOv5n, the model weight is reduced by 31% to 2.6M, and the detection speed is increased by 6% to 40 FPS without an accuracy drop.
- Full text View on content provider's site
-
Yaozhen Yu, Hao Zhang, and Fei Yuan
- IET Image Processing, Vol 17, Iss 14, Pp 4142-4158 (2023)
- Subjects
-
computer vision, image processing, length measurement, measurement systems, Photography, TR1-1050, Computer software, and QA76.75-76.765
- Abstract
-
Accurate fish size measurement in breeding areas is crucial for the fishing industry. Unlike acoustic methods, with their high equipment cost and low measurement accuracy, current image-based methods offer a promising alternative. However, these image-based methods still face challenges in selecting measurement points. To address this issue and achieve precise measurements of individual fish, this paper introduces an automatic fish size measurement method based on key point detection. We established a Fish-Keypoints dataset and utilized deep learning techniques for the detection of fish and their key points. Using a binocular camera system, we reconstruct a three-dimensional coordinate system to measure key points at the fish's head and tail, facilitating fish length calculation. The detection model achieves an accuracy of 85.1% in key point detection. The proposed method is tested in both land and underwater environments, demonstrating a relative measurement error of approximately 7% for fish in pools. This confirms the proposed method's ability to accurately detect measurement points, offering superior accuracy compared to other methods.
- Full text View on content provider's site
8. An adaptive enhancement method based on stochastic parallel gradient descent of glioma image [2023]
-
Hongfei Wang, Xinhao Peng, ShiQing Ma, Shuai Wang, Chuan Xu, and Ping Yang
- IET Image Processing, Vol 17, Iss 14, Pp 3976-3985 (2023)
- Subjects
-
image denoising, image enhancement, parallel algorithms, artefacts suppression, contrast improvement, histogram modification, Photography, TR1-1050, Computer software, and QA76.75-76.765
- Abstract
-
Brain tumour diagnosis is significant for both physicians and patients, but the low contrast and artefacts of MRI glioma images always affect diagnostic accuracy. Existing mainstream image enhancement methods are insufficient for improving contrast and suppressing artefacts simultaneously. To enrich the field of glioma image enhancement, this research proposes a glioma image enhancement method based on histogram modification and total variation, using the stochastic parallel gradient descent (SPGD) algorithm. First, the method modifies the cumulative distribution function of the image histogram and performs gamma correction on the image according to the modified histogram to obtain a contrast-enhanced image. Then, the method suppresses the artefacts of glioma images using total variation and wavelet denoising algorithms. To obtain better enhanced images, the optimal parameters of the proposed method are found with the SPGD algorithm. Statistical studies performed on 580 real glioma images demonstrate that the authors' approach outperforms the existing mainstream image enhancement methods. The results show that the proposed method increases the discrete entropy of the image by 8.9% and the contrast by 2.8% compared to the original images. The enhanced images produced by the proposed method have a natural appearance, appealing contrast, little degradation, and reasonable detail preservation.
- Full text View on content provider's site
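The histogram-modification-plus-gamma-correction step this abstract describes is a common pattern in contrast enhancement. A generic sketch of the idea (CDF-weighted gamma correction in the spirit of adaptive gamma methods, not the authors' SPGD-optimized pipeline; the weighting exponent `alpha` is an illustrative assumption):

```python
import numpy as np

def adaptive_gamma_correction(image, alpha=0.5):
    """Generic CDF-weighted gamma correction for a uint8 greyscale image.
    The histogram is first flattened by the exponent `alpha`, then the
    resulting CDF is used as a per-intensity gamma exponent."""
    hist = np.bincount(image.ravel(), minlength=256).astype(np.float64)
    pdf = hist / hist.sum()
    # Modify the histogram so a few dominant bins do not dictate the mapping.
    pdf_mod = pdf.max() * (pdf / pdf.max()) ** alpha
    cdf = np.cumsum(pdf_mod / pdf_mod.sum())
    levels = np.arange(256) / 255.0
    # Per-intensity gamma: intensities with high cumulative weight get a
    # smaller exponent, i.e. a stronger brightening.
    mapped = 255.0 * levels ** (1.0 - cdf)
    return mapped[image].astype(np.uint8)
```

In the paper, parameters of this kind of mapping (together with the denoising weights) are what the SPGD search tunes per image.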
-
Xuebin Xu, Meng Lei, Dehua Liu, Muyu Wang, and Longbin Lu
- IET Image Processing, Vol 17, Iss 14, Pp 4129-4141 (2023)
- Subjects
-
computer vision, convolutional neural nets, image segmentation, Photography, TR1-1050, Computer software, and QA76.75-76.765
- Abstract
-
Lung segmentation is an essential step in a computer-aided diagnosis system for chest radiographs. The lung parenchyma is first segmented in pulmonary computer-aided diagnosis systems to remove the interference of non-lung regions while increasing the effectiveness of the subsequent work. Nevertheless, most medical image segmentation methods nowadays use U-Net and its variants. These variant networks perform poorly when segmenting smaller structures and cannot accurately segment boundary regions. A multi-interaction feature fusion network model based on Kiu-Net is presented in this paper to address this problem. Specifically, U-Net and Ki-Net are first utilized to extract high-level and detailed features of chest images, respectively. Then, cross-residual fusion modules are employed in the network encoding stage to obtain complementary features from these two networks. Second, a global information module is introduced to guarantee the integrity of the segmented region. Finally, in the network decoding stage, a multi-interaction module is presented, which allows multiple kinds of information, such as global contextual information, branching features, and fused features, to interact and yield more practical information. The performance of the proposed model was assessed on both the Montgomery County (MC) and Shenzhen datasets, and the experimental results demonstrate its superiority over existing methods.
- Full text View on content provider's site
-
Xudong Zhang, Liwen Cui, Zhiguo Fan, Rui Sun, and Yang Li
- IET Image Processing, Vol 17, Iss 14, Pp 4116-4128 (2023)
- Subjects
-
computer vision, focusing, image restoration, light polarisation, Photography, TR1-1050, Computer software, and QA76.75-76.765
- Abstract
-
Underwater images suffer from color distortion, low clarity, and halo problems due to light absorption, particle scattering, and non-uniform illumination. To address these degradation issues, a multi-cues underwater image restoration algorithm combined with light field technology is proposed. First, based on the Epipolar Plane Image, the light field cue transmittance containing depth information is calculated. Then, according to the turbidity of the underwater image, the light field cue transmittance and the polarization cue transmittance are fused to obtain the multi-cues transmittance, which can effectively reduce the effect of particle scattering and color bias. Finally, the background light is estimated through an all-focus operation, which can effectively overcome the distortion of a single underwater image and simultaneously reduce the halo phenomenon. Experimental results show that the method achieves the best results as evaluated by UCIQE, UIQM, PSNR, and SSIM, and the colors restored by the method are closer to the actual image than those of other underwater restoration methods.
- Full text View on content provider's site
11. Small object detection based on hierarchical attention mechanism and multi‐scale separable detection [2023]
-
Yafeng Zhang, Junyang Yu, Yuanyuan Wang, Shuang Tang, Han Li, Zhiyi Xin, Chaoyi Wang, and Ziming Zhao
- IET Image Processing, Vol 17, Iss 14, Pp 3986-3999 (2023)
- Subjects
-
image recognition, object detection, Photography, TR1-1050, Computer software, and QA76.75-76.765
- Abstract
-
The ability of modern detectors to detect small targets remains an unresolved topic compared to their capability of detecting medium and large targets in the field of object detection. Accurately detecting and identifying small objects in real-world scenarios suffers from sub-optimal performance due to factors such as small target size, complex backgrounds, variability in illumination, occlusions, and target distortion. Here, a small object detection method for complex traffic scenarios named deformable local and global attention (DLGADet) is proposed, which seamlessly merges hierarchical attention mechanisms (HAMs) with the versatility of deformable multi-scale feature fusion, effectively improving recognition and detection performance. First, DLGADet combines multi-scale separable detection with a multi-scale feature fusion mechanism to obtain richer contextual information for feature fusion while solving the misalignment problem between the classification and localisation tasks. Second, a deformation feature extraction module (DFEM) is designed to address the deformation of objects. Finally, a HAM combining global and local attention mechanisms is designed to obtain discriminative features from complex backgrounds. Extensive experiments on three datasets demonstrate the effectiveness of the proposed methods. Code is available at https://github.com/ACAMPUS/DLGADet
- Full text View on content provider's site
-
Yuqing Chen and Zhitao Guo
- IET Image Processing, Vol 17, Iss 14, Pp 4014-4027 (2023)
- Subjects
-
image denoising, image processing, image restoration, Photography, TR1-1050, Computer software, and QA76.75-76.765
- Abstract
-
The transformer, a type of neural architecture, has demonstrated exceptional performance improvements in vision and natural language tasks. While it overcomes the limited perceptual field and non-adaptive input handling exhibited by CNNs, the computational complexity of the Transformer model increases quadratically with spatial resolution. As a result, this model is not frequently employed in image processing tasks such as image denoising, and there is a shortage of studies investigating the removal of multiplicative speckle from ultrasound images. In light of this, we present TranSpeckle, an effective and efficient despeckling architecture that employs Multi-Dconv Head Transposed Attention and a Dconv Feed-Forward Network as the core components of its Transformer block. Multiple Transformer blocks are then utilized to implement a hierarchical encoder-decoder network. The TranSpeckle architecture considerably reduces the computational complexity on feature maps while effectively capturing long-range pixel interactions and local context information. In this study, an edge protection module is also incorporated to augment the edges of ultrasound images. The module feeds extracted image edge features into the TranSpeckle architecture, which ameliorates the edge information loss caused by the despeckling process. Extensive experimental results clearly show that our proposed network outperforms state-of-the-art methods in terms of quantitative metrics and visual quality.
- Full text View on content provider's site
-
Yuchao Tang, Shirong Deng, Jigen Peng, and Tieyong Zeng
- IET Image Processing, Vol 17, Iss 14, Pp 4044-4060 (2023)
- Subjects
-
image denoising, impulse noise, Photography, TR1-1050, Computer software, and QA76.75-76.765
- Abstract
-
Image restoration with impulse noise is an important task in image processing. Taking into account the statistical distribution of impulse noise, the ℓ1-norm data fidelity and total variation (ℓ1TV) model has been widely used in this area. However, the ℓ1TV model usually performs worse when the noise level is high. To overcome this drawback, several nonconvex models have been proposed. In this paper, an efficient iterative algorithm is proposed to solve nonconvex models arising in impulse noise. Compared to existing algorithms, the proposed algorithm is a completely explicit algorithm in which every subproblem has a closed-form solution. The key idea is to transform the original nonconvex models into an equivalent constrained minimization problem with two separable objective functions, where one is differentiable but nonconvex. As a consequence, the proximal linearized alternating direction method of multipliers is employed to solve it. Extensive numerical experiments are presented to demonstrate the efficiency and effectiveness of the proposed algorithm.
- Full text View on content provider's site
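For context, the convex ℓ1TV baseline that the nonconvex models in this abstract generalize has the standard textbook form (a generic statement of the model, not the authors' specific variant):

```latex
\min_{u}\; \|u - f\|_{1} + \lambda\,\mathrm{TV}(u),
\qquad
\mathrm{TV}(u) = \sum_{i,j} \bigl\| (\nabla u)_{i,j} \bigr\|,
```

where \(f\) is the observed image corrupted by impulse noise, \(u\) is the restored image, and \(\lambda > 0\) balances data fidelity against smoothness; the ℓ1 fidelity term is what makes the model robust to impulse (salt-and-pepper) noise.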
-
Weidong Pan, Anhu Li, Yusheng Wu, Zhaojun Deng, and Xingsheng Liu
- IET Image Processing, Vol 17, Iss 14, Pp 4159-4175 (2023)
- Subjects
-
image fusion, image processing, Photography, TR1-1050, Computer software, and QA76.75-76.765
- Abstract
-
Image stitching is an important way to achieve large-field, high-resolution imaging. Inconsistencies in brightness and structure, together with ghosting, blurring, and misalignment between images, are inevitable and difficult to eliminate owing to the external lighting environment and changes in camera pose and parameters, and they make image stitching challenging. Here, a novel method is proposed to search for the optimal seamline based on the fast marching method, which can stitch large-parallax images with high quality. A feature weight map is first formed based on the similarity in colour, edge, texture, and saliency of the images. It is then used as the cost value of the seamline to search for the optimal seamline with the fast marching method. The results show that the new method is more effective at reducing defects such as ghosting, misalignment, and chromatic aberration and achieves higher-quality image stitching than traditional stitching tools and methods, providing a new perspective on image stitching technology.
- Full text View on content provider's site
-
Yunping Zheng, Bowen Yang, and Mudar Sarem
- IET Image Processing, Vol 17, Iss 14, Pp 4076-4088 (2023)
- Subjects
-
image processing, image representation, Photography, TR1-1050, Computer software, and QA76.75-76.765
- Abstract
-
Region extraction is often used by computer vision tasks as a pre-processing step to extract image features. However, how to efficiently extract effective regions remains a challenging problem. In this paper, inspired by the non-symmetry and anti-packing pattern representation model (NAM) and the FatRegion algorithm, a fast NAM-based region extraction algorithm called FNRegion is proposed. A NAM-based homogeneous block generation algorithm is first presented to represent an image as a combination of multiple homogeneous blocks, each of which is a square region with visually indistinguishable intra-region colour difference. Then, these homogeneous blocks are merged into larger regions according to their colour and shape information. To group these regions into larger ones and progressively build a region tree, a distance function using a variety of regional information is defined to measure the distance between adjacent regions. Also, a multi-feature region merging algorithm with linear complexity in both time and space is presented. The proposed algorithm has been evaluated on multiple public datasets in comparison with state-of-the-art region extraction algorithms. The experimental results show that, with almost the same or even less running time than other fast region extraction algorithms, the proposed algorithm extracts higher-quality regions.
- Full text View on content provider's site
16. A robust and clinically applicable deep learning model for early detection of Alzheimer's [2023]
-
Md Masud Rana, Md Manowarul Islam, Md. Alamin Talukder, Md Ashraf Uddin, Sunil Aryal, Naif Alotaibi, Salem A. Alyami, Khondokar Fida Hasan, and Mohammad Ali Moni
- IET Image Processing, Vol 17, Iss 14, Pp 3959-3975 (2023)
- Subjects
-
brain, cancer, diseases, tumours, deep learning, Photography, TR1-1050, Computer software, and QA76.75-76.765
- Abstract
-
Alzheimer's disease, the most common cause of dementia, is a severe neurodegenerative disorder that causes irreversible memory loss by destroying brain cells. There is no specific treatment for this disease, and it is ultimately fatal. Alzheimer's is most common among seniors 65 years and older. However, the progress of the disease can be slowed if it is diagnosed earlier. Recently, artificial intelligence has instilled hope in the diagnosis of Alzheimer's disease by performing sophisticated analyses on extensive patient datasets, enabling the identification of subtle patterns that may elude human experts. Researchers have investigated various deep learning and machine learning models to diagnose this disease at an early stage using image datasets. In this paper, a new deep learning (DL) methodology is proposed, where MRI images are fed into the model after various pre-processing techniques are applied. The proposed Alzheimer's disease detection approach adopts transfer learning for multi-class classification using brain MRIs. The MRI images are classified into four categories: mild dementia (MD), moderate dementia (MOD), very mild dementia (VMD), and non-dementia (ND). The model is implemented and an extensive performance analysis is performed. The findings show that the model obtains 97.31% accuracy and outperforms state-of-the-art models in terms of accuracy, precision, recall, and F-score.
- Full text View on content provider's site
-
Xiaolin Kong, Tao Gao, Ting Chen, and Jing Zhang
- IET Image Processing, Vol 17, Iss 14, Pp 4102-4115 (2023)
- Subjects
-
image denoising, image enhancement, image processing, image reconstruction, Photography, TR1-1050, Computer software, and QA76.75-76.765
- Abstract
-
Rain removal is very important for many applications in computer vision, and it is a challenging problem due to its ill-posed nature, especially for single-image deraining. In order to remove rain streaks more thoroughly, as well as to retain more details, a progressive dilation dense residual fusion network is proposed. The entire network is designed in a cascade manner with multiple fusion blocks. The fusion block consists of a dilation dense residual block (DDRB) and a dense residual feature fusion block (DRFFB), where DDRB is created for feature extraction and DRFFB is mainly designed for the feature fusion operation. Meanwhile, a detail compensation memory mechanism (DCMM) is leveraged between each pair of cascade modules to retain more background details. Compared with previous state-of-the-art methods, extensive experiments show that the proposed method achieves better results in terms of rain streak removal and background detail preservation. Furthermore, the authors' network also shows its superiority for image noise removal.
- Full text View on content provider's site
-
Ping Yuan, Chunling Fan, and Chuntang Zhang
- IET Image Processing, Vol 17, Iss 14, Pp 4061-4075 (2023)
- Subjects
-
AKAZE, BEBLID, image stitching, panoramic image, seam elimination, Photography, TR1-1050, Computer software, and QA76.75-76.765
- Abstract
-
Deep-sea images are of great significance for exploring seabed resources. However, the information in a single image is limited, and deep-sea images' low contrast and colour distortion further restrict useful feature extraction. To address these issues, this paper presents a multi-channel fusion and accelerated-KAZE (AKAZE) feature detection algorithm for deep-sea image stitching. First, the authors restore the deep-sea image in the LAB and RGB colour spaces, respectively: in LAB space they apply homomorphic filtering to the L channel, and in RGB space they adopt a multi-scale Retinex with chromaticity preservation algorithm to adjust the colour information. Then, the authors blend the two pre-processed images using dark channel prior weighted coefficients. After that, they detect feature points with the AKAZE algorithm and obtain feature descriptors with the Boosted Efficient Binary Local Image Descriptor (BEBLID). Finally, they match the feature points and warp the deep-sea images to obtain the stitched image. Experimental results demonstrate that the authors' method generates high-quality stitched images with minimal seams. Compared with state-of-the-art algorithms, the proposed method achieves better quantitative evaluation, visual stitching results, and robustness.
- Full text View on content provider's site
19. Cuboid‐Net: A multi‐branch convolutional neural network for joint space‐time video super resolution [2023]
-
Congrui Fu, Hui Yuan, Hongji Xu, Hao Zhang, and Liquan Shen
- IET Image Processing, Vol 17, Iss 14, Pp 4089-4101 (2023)
- Subjects
-
image enhancement, image resolution, video signal processing, Photography, TR1-1050, Computer software, and QA76.75-76.765
- Abstract
-
The demand for high-resolution videos has been consistently rising across various domains, propelled by continuous societal advancement. Nonetheless, limitations in imaging hardware and economic factors often result in low-resolution images. Currently available space-time video super-resolution methods often fail to fully exploit the information in the spatio-temporal domain. To address this problem, the input low-resolution video is conceptualized as a cuboid structure, and an innovative methodology called "Cuboid-Net", which incorporates a multi-branch convolutional neural network, is introduced. Cuboid-Net is designed to jointly enhance the spatial and temporal resolutions of videos, enabling the extraction of rich and meaningful information across both spatial and temporal dimensions. Specifically, the input video is treated as a cuboid to generate directional slices that serve as inputs for the different branches of the network. The proposed network contains four modules: a multi-branch-based hybrid feature extraction module, a multi-branch-based reconstruction module, a first-stage quality enhancement module, and a second-stage cross-frame quality enhancement module for interpolated frames only. Experimental results demonstrate that the proposed method is effective not only for spatial and temporal super-resolution of video but also for spatial and angular super-resolution of light fields.
- Full text View on content provider's site
-
Yunqi He, Liqiu Chen, and Honghu Pan
- IET Image Processing, Vol 17, Iss 14, Pp 4000-4013 (2023)
- Subjects
-
identification, image retrieval, video retrievals, Photography, TR1-1050, Computer software, and QA76.75-76.765
- Abstract
-
Image-to-video (I2V) person re-identification (Re-ID) is a cross-modality pedestrian retrieval task, whose crux is to reduce the large modality discrepancy between images and videos. To this end, this paper proposes to predict the following video frames from a single image, so that I2V person Re-ID can be transformed into video-to-video (V2V) Re-ID. Considering that predicting video frames from a single image is an ill-posed problem, this paper proposes two strategies to improve the quality of the predicted videos. First, a pose-guided video prediction pipeline is proposed: the given single image and the pedestrian pose are encoded via an image encoder and a pose encoder, respectively; then, the image feature and pose feature are concatenated as the input of the video decoder. The authors minimize the difference between the predicted video and the true video, and simultaneously minimize the difference between the true pose and the predicted pose. Second, a conditional adversarial training strategy is employed to generate high-quality video frames. Specifically, the discriminator takes the source image as a condition and distinguishes whether the input frames are fake or are true following frames of the source image. Experimental results demonstrate that the pose-guided adversarial video prediction can effectively improve the accuracy of I2V Re-ID.
- Full text View on content provider's site