2022
Permanent URI for this collection: https://hdl.handle.net/20.500.14570/3152
Item: One-shot Facial Expression Reenactment using 3D Morphable Models (2022). Vei, Roman.
Recent advances in generative adversarial networks have shown promising results on the problem of head reenactment, which aims to generate novel images with altered poses and emotions while preserving the identity of a human head from a single photo. Current approaches have limitations that make them inapplicable in real-world settings: most algorithms are computationally expensive, offer no obvious tools for manual image manipulation, require audio, or take multiple input images to generate novel images. Our method addresses the single-shot face reenactment problem with an end-to-end algorithm. It utilizes head 3D morphable model (3DMM) parameters to encode identity, pose, and expression; the pose and emotion of a person in an image are changed by manipulating its 3DMM parameters. Our work consists of a face mesh prediction network and a GAN-based renderer. The predictor is a neural network with a simple encoder architecture that regresses 3D mesh parameters. The renderer is a GAN with warping and rendering submodules that renders images from a single source image and the 3DMM parameters of a target image. This work proposes a novel head reenactment framework that is computationally efficient and uses 3DMM parameters that are easy to alter, making it applicable in real-life applications. To our knowledge, it is the first approach that simultaneously solves both 3DMM parameter prediction and face reenactment and benefits from both.

Item: Generalizing texture transformers for super-resolution and inpainting (2022). Romanus, Teodor.
New multi-camera smartphones and recent advancements in generalized machine learning models make it possible to bring new types of photo-editing neural networks to the market. This thesis covers methods of image enhancement with texture transfer: known high-resolution regions (the reference) are used to restore degraded areas of an image. The task of restoring partially degraded images can be defined as “partial super-resolution,” while the task of restoring missing parts of images is called inpainting. We propose to use the novel Texture Transformer Network for Image Super-Resolution (TTSR) to solve the partial super-resolution and inpainting tasks. Fully convolutional networks are unable to copy image patches, which forces the model to store textures in its trained weights. An attention mechanism allows joint feature learning in the low-resolution and high-resolution parts of an image simultaneously, where deep feature correspondences can be discovered by attention; this enables accurate transfer of texture features. The experiments confirm that the TTSR network can solve the partial super-resolution and inpainting tasks simultaneously. Modifications of the network (different embedding sizes, soft attention, trainable projections) are used to study the architecture's capacity to solve the specified tasks. The evaluation includes comparing the TTSR network with an inpainting network on the inpainting task.
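The abstract above relies on attention discovering correspondences between degraded and reference regions. The following is a minimal, illustrative PyTorch sketch of the patch-wise hard attention used for texture transfer in TTSR-style models; the function name, tensor shapes, and patch size are assumptions, not the thesis implementation.

```python
import torch
import torch.nn.functional as F

def transfer_texture(lr_feat: torch.Tensor, ref_feat: torch.Tensor, patch: int = 3):
    """lr_feat: features of the degraded image, (B, C, H, W);
    ref_feat: features of the high-resolution reference, (B, C, H, W)."""
    pad = patch // 2
    # Unfold both feature maps into overlapping patches: (B, C*patch*patch, N)
    q = F.normalize(F.unfold(lr_feat, patch, padding=pad), dim=1)   # queries
    k = F.normalize(F.unfold(ref_feat, patch, padding=pad), dim=1)  # keys
    v = F.unfold(ref_feat, patch, padding=pad)                      # values
    # Relevance of every reference patch to every degraded-image patch
    relevance = torch.bmm(q.transpose(1, 2), k)          # (B, N_q, N_k)
    score, index = relevance.max(dim=2)                  # hard attention
    # Copy the best-matching reference patch to each query position
    chosen = torch.gather(v, 2, index.unsqueeze(1).expand(-1, v.size(1), -1))
    out = F.fold(chosen, lr_feat.shape[-2:], patch, padding=pad)
    # Average the contributions of overlapping patches
    ones = F.fold(torch.ones_like(chosen), lr_feat.shape[-2:], patch, padding=pad)
    # `score` acts as soft attention: how much to trust the transferred texture
    return out / ones, score
```

In a full model, the features would come from learned encoders, and the transferred texture would be fused back into the restoration branch weighted by the soft-attention score.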
Item: Multi-temporal Satellite Imagery Panoptic Segmentation of Agricultural Land in Ukraine (2022). Petruk, Marian.
Remote sensing of the Earth using satellites helps analyze the Earth's resources, monitor local land-surface changes, and study global climate change. In particular, farmland information helps farmers in decision-making and planning and increases productivity, leading to better agro-ecological conditions. In this work, we primarily focus on panoptic segmentation of agricultural land, which combines two parts: 1) delineation of parcels (instance segmentation) and 2) classification of parcel crop type (semantic segmentation). Second, we explore how multi-temporal satellite imagery compares to a single-image query in segmentation performance. Third, we conduct experiments using recent advances in Deep Learning and Computer Vision that improve the performance of such systems. Finally, we show the performance of a state-of-the-art panoptic segmentation algorithm on the agricultural land of Ukraine, where the farmland market has just opened.

Item: Reinforcement Learning Agents in Procedurally-generated Environments with Sparse Rewards (2022). Nahirnyi, Oleksii.
Solving sparse-reward environments is one of the most significant challenges for state-of-the-art (SOTA) Reinforcement Learning (RL). The recent use of sparse rewards in procedurally-generated environments (PGEs), which measure an agent's generalization capabilities more adequately through randomization, makes this challenge even harder. Despite some progress of newly created exploration-based algorithms in MiniGrid PGEs, the task remains open for research in terms of improving sample complexity. We contribute to solving this task by creating a new formulation of exploratory intrinsic reward, based on a thorough review and categorization of other methods in this area. An agent that optimizes an RL objective with this formulation performs better than SOTA methods in some small- and medium-sized PGEs.
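For context on how exploratory intrinsic rewards are used in general, the sketch below shows a simple count-based bonus added to the sparse environment reward. This is a generic illustration only; it is not the formulation proposed in the thesis, and all names and constants are assumptions.

```python
from collections import defaultdict
import math

class CountBonus:
    """Decaying bonus for rarely visited observations (assumed hashable,
    e.g., a tuple of grid coordinates in MiniGrid-like environments)."""
    def __init__(self, scale: float = 0.1):
        self.scale = scale
        self.counts = defaultdict(int)

    def __call__(self, obs) -> float:
        self.counts[obs] += 1
        return self.scale / math.sqrt(self.counts[obs])

def shaped_reward(extrinsic: float, obs, bonus: CountBonus, beta: float = 1.0) -> float:
    # The agent optimizes r_total = r_extrinsic + beta * r_intrinsic, so it
    # still receives a learning signal even when the extrinsic reward is zero.
    return extrinsic + beta * bonus(obs)
```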
Item: Polyp detection and segmentation from endoscopy images (2022). Kokshaikyna, Mariia.
Endoscopy is a widely used clinical procedure for detecting diseases in internal organs of the gastrointestinal tract, such as the stomach and colon. Modern endoscopes capture high-quality video during the procedure, and computer-assisted methods can support medical specialists in detecting or segmenting anomalous regions in the images. Many datasets are available, and many methods for detecting polyp regions have been proposed. One such task is polyp segmentation in images and videos; the best results in semantic segmentation of polyps are currently achieved with fully supervised approaches. In this thesis, we describe experiments with the CaraNet model: we check its robustness via cross-validation on several publicly available datasets and a small private dataset, try a few modifications of the attention layer to improve performance, and present and discuss the results.

Item: Brain age prediction based on EEG records (2022). Klymenko, Mykola.
Ambulatory EEG is a widespread test used in hospitals for the neurological evaluation of patients. EEG waveforms are typically reviewed by a trained neurologist to classify the EEG into clinical categories, so methodologically there is a need to classify EEG recordings automatically. Ideally, the classification models should be interpretable, able to deal with EEG of varying durations, and robust to various artifacts. We aimed to test and validate a framework for EEG classification that satisfies these requirements by symbolizing EEG signals and adapting a method previously proposed in natural language processing (NLP). We considered an extensive sample of routine clinical EEG (n=5’850) with a wide range of ages between 0 and 100 years. We symbolized the multivariate EEG time series and applied a byte-pair encoding (BPE) algorithm to extract a dictionary of the most frequent patterns (tokens) reflecting the variability of EEG waveforms. To demonstrate the performance of this approach, we used the newly reconstructed EEG features to predict the biological age of patients with a Random Forest. We also correlated the relative frequencies of tokens with age. The age prediction model achieved a mean absolute error of 15.9 years, and the correlation between actual and predicted age was 0.56. The most significant correlations between token frequencies and age were observed at frontal and occipital EEG channels. Our findings demonstrate the feasibility of applying NLP methods to time series classification. Notably, the proposed algorithms could be instrumental in classifying clinical EEG, with minimal preprocessing and sensitivity to the appearance of short events such as epileptic spikes.
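The pipeline described above (symbolize the signal, build a token dictionary, use token frequencies as features for regression) can be illustrated with the hypothetical sketch below. The symbolization scheme, the use of fixed-length n-grams standing in for BPE tokens, and the hyperparameters are assumptions, not the thesis implementation.

```python
import numpy as np
from collections import Counter
from sklearn.ensemble import RandomForestRegressor

def symbolize(signal: np.ndarray, n_bins: int = 4) -> str:
    """Map each sample to a letter according to its amplitude quantile."""
    edges = np.quantile(signal, np.linspace(0, 1, n_bins + 1)[1:-1])
    return "".join(chr(ord("a") + b) for b in np.digitize(signal, edges))

def token_frequencies(symbols: str, vocab: list[str]) -> np.ndarray:
    """Relative frequency of each dictionary token within one recording
    (here, fixed-length n-grams stand in for BPE-extracted tokens)."""
    n = len(vocab[0])
    grams = Counter(symbols[i:i + n] for i in range(len(symbols) - n + 1))
    total = max(sum(grams.values()), 1)
    return np.array([grams[t] / total for t in vocab])

def fit_age_model(recordings, ages, vocab):
    """recordings: list of 1-D signals; ages: their labels (hypothetical data)."""
    X = np.stack([token_frequencies(symbolize(r), vocab) for r in recordings])
    model = RandomForestRegressor(n_estimators=300, random_state=0)
    model.fit(X, ages)
    return model
```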
Item: Object detection in automotive vehicle domain based on real and synthetic data (2022). Ilechko, Roman.
Recent years have seen incredible growth in deep learning. With increasing interest, there is also an increasing number of bold and even revolutionary studies that drive progress and boost model performance. Although the number of large, high-quality datasets is growing rapidly, models in many domains and tasks still need even more data. Additional data is needed not only for giant models: even domains like autonomous vehicles, which mainly focus on lightweight models, require extra data. Data labeling alone is not a panacea, especially for autonomous vehicles, where the provided data must have great variety and a low risk of errors. Additional synthetic data can be an excellent booster for existing approaches or even a must-have part of the training data. For example, simulators make it possible to manage a scene's complexity by controlling the number of objects, their size, and their interaction with the environment, which can be very helpful for tasks such as object detection. Currently, researchers must balance the ratio of natural and generated data intuitively while accounting for possible gaps between the two domains. The task itself is not straightforward, and constraints such as model size and the number of classes add further uncertainty. In this paper, we analyze in detail the impact of synthetic data on the training process, cover possible training strategies, and provide guidance on choosing the amount of artificial data under existing constraints.

Item: Large-scale product classification for efficient matching in procurement systems (2022). Hrysha, Ihor.
We consider the problem of recommending relevant suppliers given a detailed request context in a procurement setting. A fundamental property of recommendation in procurement systems is that a single query may have hundreds of relevant suppliers associated with it. A complicating factor is that, for most suppliers, we do not have a complete listing of product and service offerings, in contrast with most of the literature on product search. An additional difficulty is introduced by the fact that queries are generated by users operating within large procurement organizations, each building queries in idiosyncratic but internally consistent ways, and each organizing activities according to a unique internal product taxonomy. The central research question we aim to address is: can we utilize this vast but inconsistently structured set of product data to derive semantic meaning across users and contexts? We propose several fully and semi-supervised approaches and benchmark them using a proprietary dataset that includes large-scale procurement data as well as supplier-provided catalogs. Finally, and uniquely, we experimentally validate the performance of our preferred model in a live production setting.

Item: Mobile Object Tracking with Siamese Neural Network (2022). Borsuk, Vasyl.
Visual object tracking is one of the most fundamental research topics in computer vision; it aims to obtain the target object's location in a video sequence given the object's initial state in the first video frame. The recent advance of deep neural networks, specifically Siamese networks, has led to significant progress in visual object tracking. Despite being accurate and achieving high results on academic benchmarks, current state-of-the-art approaches are compute-intensive and have a large memory footprint, so they cannot satisfy the strict performance requirements of real-world applications. This work focuses on designing a novel lightweight framework for resource-efficient and accurate visual object tracking. Additionally, we introduce a new tracker efficiency benchmark and protocol in which efficiency is defined in terms of both energy consumption and execution speed on edge devices.
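As background for the last abstract, the sketch below shows the core Siamese-tracking idea in PyTorch: a shared backbone embeds the template (initial target crop) and the search region, and cross-correlation localizes the target. The backbone, shapes, and class name are placeholders, not the lightweight architecture proposed in this work.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseTracker(nn.Module):
    def __init__(self):
        super().__init__()
        # Shared (weight-tied) embedding network for template and search crops
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, template: torch.Tensor, search: torch.Tensor) -> torch.Tensor:
        z = self.backbone(template)   # (B, C, h, w)  target embedding
        x = self.backbone(search)     # (B, C, H, W)  search-region embedding
        b, c, h, w = z.shape
        # Cross-correlate each template with its own search region by treating
        # the template features as per-sample convolution kernels.
        x = x.reshape(1, b * c, *x.shape[-2:])
        response = F.conv2d(x, z.reshape(b * c, 1, h, w), groups=b * c)
        response = response.reshape(b, c, *response.shape[-2:]).sum(dim=1, keepdim=True)
        return response  # the peak of the response map indicates the target position
```

At inference time, each new frame's crop around the previous target location is passed as `search`, and the argmax of the response map gives the updated target position.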