Biblio
This study aimed to recognize emotional states from facial images using deep learning. In the study, which was approved by the ethics committee, a custom data set was created from videos of 20 male and 20 female participants simulating 7 different facial expressions (happy, sad, surprised, angry, disgusted, scared, and neutral). First, the videos were split into image frames, and then face images were segmented from the frames using the Haar library. The custom data set obtained after image preprocessing contains more than 25 thousand images. The proposed convolutional neural network (CNN) architecture, which mimics the LeNet architecture, was trained on this custom data set. In the experiments with the proposed CNN architecture, the training loss was 0.0115, the training accuracy was 99.62%, the validation loss was 0.0109, and the validation accuracy was 99.71%.
This paper proposes an advanced scheme for message security in 3D cover images using multiple layers of security. Cryptography using AES-256 is implemented in the first layer. In the second layer, edge detection is applied. Finally, LSB steganography is executed in the third layer. The efficiency of the proposed scheme is measured using a number of performance metrics, including mean square error (MSE), peak signal-to-noise ratio (PSNR), structural similarity index measure (SSIM), mean absolute error (MAE), and entropy.
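As a rough illustration of the third layer and of some of the metrics named above, the following sketch embeds bits in the least significant bit of 8-bit samples and then computes MSE, MAE, PSNR, and Shannon entropy. It is a toy under stated assumptions, not the paper's scheme: the cover is a flat list of 8-bit values rather than a 3D image, the AES-256 and edge-detection layers are omitted, and all function names are hypothetical.

```python
import math
from collections import Counter

def embed_lsb(pixels, bits):
    # Overwrite the least significant bit of the first len(bits) samples.
    stego = [(p & ~1) | b for p, b in zip(pixels, bits)]
    return stego + pixels[len(bits):]

def extract_lsb(pixels, count):
    # Read back the least significant bit of the first `count` samples.
    return [p & 1 for p in pixels[:count]]

def mse(a, b):
    # Mean square error between two equally sized sample lists.
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def mae(a, b):
    # Mean absolute error between two equally sized sample lists.
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def psnr(a, b, peak=255):
    # Peak signal-to-noise ratio in dB; infinite when the inputs are identical.
    err = mse(a, b)
    return math.inf if err == 0 else 10 * math.log10(peak ** 2 / err)

def entropy(pixels):
    # Shannon entropy (bits per sample) of the value histogram.
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(pixels).values())

cover = [10, 11, 12, 13]
stego = embed_lsb(cover, [1, 1, 0, 0])
print(stego, mse(cover, stego), mae(cover, stego), psnr(cover, stego))
```

Because LSB embedding changes each sample by at most 1, the MSE stays small and the PSNR stays high, which is why these metrics are commonly used to judge how visible the embedding is.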
ARtect is an Augmented Reality application developed with Unity 3D, envisioned as an interactive and immersive educational tool for architects, designers, researchers, and artists. This digital instrument makes it possible to visualize custom-made 3D models and 2D graphics in interior and exterior environments. The user-friendly interface offers accurate insight before any architectural project is realized, enabling evaluation of the design proposal. This practice could be integrated into the process of learning architectural design, saving the resources spent on printed drawings and 3D carton models during several stages of spatial conception.
The availability of commercial fully immersive virtual reality systems allows the proposal and development of new applications that offer novel ways to visualize and interact with multidimensional neuroimaging data. We propose a system for the visualization and interaction with Magnetic Resonance Imaging (MRI) scans in a fully immersive learning environment in virtual reality. The system extracts the different slices from a DICOM file and presents the slices in a 3D environment where the user can display and rotate the MRI scan, and select the clipping plane in all the possible orientations. The 3D environment includes two parts: 1) a cube that displays the MRI scan in 3D and 2) three panels that include the axial, sagittal, and coronal views, where it is possible to directly access a desired slice. In addition, the environment includes a representation of the brain where it is possible to access and browse directly through the slices with the controller. This application can be used both for educational purposes as an immersive learning tool, and by neuroscience researchers as a more convenient way to browse through an MRI scan to better analyze 3D data.
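The three orthogonal views described above amount to fixing one index of a 3D array. The sketch below shows this on a plain nested list, assuming the volume is stored as `volume[z][y][x]` and that z, y, and x correspond to the axial, coronal, and sagittal directions. That mapping depends on the scan orientation and is an assumption here; DICOM parsing and rendering are not shown, and the function names are hypothetical.

```python
def axial_slice(volume, z):
    # Fix the z index: returns rows over y, columns over x.
    return [row[:] for row in volume[z]]

def coronal_slice(volume, y):
    # Fix the y index: returns rows over z, columns over x.
    return [plane[y][:] for plane in volume]

def sagittal_slice(volume, x):
    # Fix the x index: returns rows over z, columns over y.
    return [[row[x] for row in plane] for plane in volume]

# Tiny 2x2x2 volume whose voxel value encodes its own (z, y, x) position.
volume = [[[100 * z + 10 * y + x for x in range(2)]
           for y in range(2)]
          for z in range(2)]

print(axial_slice(volume, 1))
print(coronal_slice(volume, 1))
print(sagittal_slice(volume, 0))
```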
Neural Style Transfer based on convolutional neural networks has produced visually appealing results for image and video data in recent years, where, for example, the content of a photo and the style of a painting are merged into a novel piece of digital art. In practical engineering development, we utilize 3D objects as the standard for optimizing digital shapes. Since these objects can be represented in a binary 3D voxel representation, we propose to extend the Neural Style Transfer method to 3D geometries in analogy to 2D pixel representations. In a series of experiments, we first evaluate traditional Neural Style Transfer on 2D binary monochromatic images. We show that this method produces reasonable results on binary images lacking color information, and that introducing a standardized Gram-matrix-based loss function for style even improves them. For an application of Neural Style Transfer to 3D voxel primitives, we trained several classifier networks, demonstrating the importance of a meaningful convolutional network architecture. The standardization of the Gram matrix again strongly contributes to visually improved, less noisy results. We conclude that Neural Style Transfer extended by a standardization of the Gram matrix is a promising approach for generating novel 3D voxelized objects, and we expect future improvements with increasing graphics memory availability for finer object resolutions.
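To make the Gram-matrix style loss concrete, here is a minimal sketch. The Gram matrix itself is the standard one (inner products between channel activations). The abstract does not say how the matrix is standardized, so the `standardize` step below (zero mean and unit variance over all entries) is only one plausible reading and is an assumption, as are the function names. Feature maps are given as a list of channels, each flattened to a list of activations.

```python
import math

def gram(features):
    # G[i][j] = inner product of channel i and channel j.
    return [[sum(a * b for a, b in zip(fi, fj)) for fj in features]
            for fi in features]

def standardize(matrix):
    # ASSUMPTION: shift to zero mean and scale to unit variance over all entries.
    flat = [v for row in matrix for v in row]
    mean = sum(flat) / len(flat)
    std = math.sqrt(sum((v - mean) ** 2 for v in flat) / len(flat))
    if std == 0:
        return [[0.0 for _ in row] for row in matrix]
    return [[(v - mean) / std for v in row] for row in matrix]

def style_loss(generated, style):
    # Mean squared difference between the standardized Gram matrices.
    g_gen = standardize(gram(generated))
    g_sty = standardize(gram(style))
    count = len(g_gen) ** 2
    return sum((a - b) ** 2
               for row_a, row_b in zip(g_gen, g_sty)
               for a, b in zip(row_a, row_b)) / count

features = [[1.0, 2.0], [3.0, 4.0]]
print(gram(features))
print(style_loss(features, [[2.0, 4.0], [6.0, 8.0]]))
```

Under this particular standardization the loss no longer depends on the overall scale of the activations, which is one way such a step could reduce noise. Whether that matches the authors' formulation is not stated in the abstract.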
This paper explores the benefits of 3D face modeling for in-the-wild facial expression recognition (FER). Since in-the-wild 3D FER datasets are limited, we first construct 3D facial data from available 2D datasets using recent advances in 3D face reconstruction. The 3D facial geometry representation is then extracted using a deep learning technique. In addition, we take advantage of manipulating the 3D face, such as using 2D projected images of the 3D face as additional input for FER. These features are then fused with those of a typical 2D FER network. By doing so, despite using common approaches, we achieve competitive recognition accuracy on the Real-World Affective Faces (RAF) database and Static Facial Expressions in the Wild (SFEW 2.0) compared with state-of-the-art reports. To the best of our knowledge, this is the first time such a deep learning combination of 3D and 2D facial modalities is presented in the context of in-the-wild FER.
This work presents the design and implementation of a large curved display system in a virtual reality (VR) environment that supports visualization of 2D datasets (e.g., images, buttons and text). By using this system, users are allowed to interact with data in front of a wide field of view and gain a high level of perceived immersion. We exhibit two use cases of this system, including (1) a virtual image wall as the display component of a 3D user interface, and (2) an inventory interface for a VR-based educational game. The use cases demonstrate capability and flexibility of curved displays in supporting varied purposes of data interaction within virtual environments.
Cross-modal hashing, which searches for nearest neighbors across different modalities in the Hamming space, has recently become a popular technique to overcome the storage and computation barriers in multimedia retrieval. Although dozens of cross-modal hashing algorithms have been proposed to yield compact binary code representations, exhaustive search over a large-scale dataset is impractical for real-time use, and Hamming distance computation yields inaccurate results. In this paper, we propose a novel indexing scheme over binary hash codes for cross-modal retrieval. The proposed indexing scheme uses a few binary bits of the hash code as the index code. Based on the index code representation, we construct an inverted index structure to accelerate retrieval and train a neural network to improve the indexing accuracy. Experiments are performed on two benchmark datasets for retrieval across image and text modalities, where hash codes are generated by three cross-modal hashing methods. Results show that the proposed method effectively boosts performance across the benchmark datasets and hash methods.
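The inverted-index idea can be sketched as follows: a few bits of each hash code serve as a bucket key, and a query only compares Hamming distances against codes in its own bucket instead of scanning the whole dataset. This is a simplified illustration under assumptions: the index code is taken to be the `k` most significant bits, the neural network that the abstract uses to improve indexing accuracy is not reproduced, and all names are hypothetical.

```python
from collections import defaultdict

def index_key(code, n_bits, k):
    # ASSUMPTION: use the k most significant bits of the code as the index code.
    return code >> (n_bits - k)

def build_index(codes, n_bits, k):
    # Inverted index: index code -> list of item ids sharing that index code.
    index = defaultdict(list)
    for item_id, code in enumerate(codes):
        index[index_key(code, n_bits, k)].append(item_id)
    return index

def hamming(a, b):
    # Number of differing bits between two integer-encoded hash codes.
    return bin(a ^ b).count("1")

def search(query, codes, index, n_bits, k, top=3):
    # Rank only the candidates in the query's bucket by full-code Hamming distance.
    candidates = index.get(index_key(query, n_bits, k), [])
    return sorted(candidates, key=lambda i: hamming(query, codes[i]))[:top]

codes = [0b10110010, 0b10111111, 0b01000001, 0b10100000]
index = build_index(codes, n_bits=8, k=2)
print(search(0b10110000, codes, index, n_bits=8, k=2, top=2))
```

A query whose true neighbors fall in a different bucket will miss them, which is presumably the kind of indexing error the learned component in the paper is meant to reduce.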
A 2D-Compressive Sensing and hyper-chaos based image compression-encryption algorithm is proposed. The 2D image is compressively sampled and encrypted using two measurement matrices. A chaos based measurement matrix construction is employed. The construction of the measurement matrix is controlled by the initial and control parameters of the chaotic system, which are used as the secret key for encryption. The linear measurements of the sparse coefficients of the image are then subjected to a hyper-chaos based diffusion which results in the cipher image. Numerical simulation and security analysis are performed to verify the validity and reliability of the proposed algorithm.
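The key-controlled matrix construction can be illustrated with a small sketch. The abstract does not name the chaotic system, so the logistic map used below is an assumption chosen for simplicity, as are the burn-in length, the mapping of values to [-1, 1], and the function names. The sparsifying transform, the second measurement matrix, and the hyper-chaos diffusion stage are not shown.

```python
def logistic_sequence(x0, mu, length, burn_in=100):
    # Iterate x <- mu * x * (1 - x); discard the first burn_in values.
    x = x0
    for _ in range(burn_in):
        x = mu * x * (1 - x)
    values = []
    for _ in range(length):
        x = mu * x * (1 - x)
        values.append(x)
    return values

def measurement_matrix(x0, mu, m, n):
    # Build an m x n matrix from the chaotic sequence, mapped from [0, 1] to [-1, 1].
    # The pair (x0, mu) plays the role of the secret key.
    seq = logistic_sequence(x0, mu, m * n)
    return [[1 - 2 * seq[r * n + c] for c in range(n)] for r in range(m)]

def measure(phi, signal):
    # Linear measurements y = phi * signal, with m < n giving compression.
    return [sum(p * s for p, s in zip(row, signal)) for row in phi]

phi = measurement_matrix(x0=0.3, mu=3.99, m=2, n=4)
print(measure(phi, [1.0, 0.0, 0.0, 2.0]))
```

Only someone holding the same `x0` and `mu` can regenerate `phi`, and a tiny change in either produces a different matrix, which is what lets the parameters act as a key.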
"Style transfer" among images has recently emerged as a very active research topic, fuelled by the power of convolutional neural networks (CNNs), and has fast become a very popular technology in social media. This paper investigates the analogous problem in the audio domain: how to transfer the style of a reference audio signal to a target audio content? We propose a flexible framework for the task, which uses a sound texture model to extract statistics characterizing the reference audio style, followed by an optimization-based audio texture synthesis to modify the target content. In contrast to mainstream optimization-based visual transfer methods, the proposed process is initialized by the target content instead of random noise, and the optimized loss concerns only texture, not structure. These differences proved key for audio style transfer in our experiments. In order to extract features of interest, we investigate different architectures, whether pre-trained on other tasks, as done in image style transfer, or engineered based on the human auditory system. Experimental results on different types of audio signals confirm the potential of the proposed approach.
With the advent of QR readers and mobile phones, the use of graphical codes such as QR codes and Data Matrix codes has become very popular. Despite their noise-like appearance, these codes offer high data capacity, damage resistance, and fast, robust decoding. The proposed system embeds an image chosen by the user to develop visually appealing QR codes with improved decoding robustness using the BCH algorithm. The QR information bits are encoded into the luminance values of the input image. The developed Picode can inspire perceptivity in multimedia applications and can ensure data security in cases such as online payments. The system is implemented in MATLAB and on an ARM Cortex-A8.
This article deals with the estimation of magnet losses in a permanent-magnet motor inserted in a nut-runner. This type of machine has interesting features such as being two-pole, slot-less and running at a high speed (30000 rpm). Two analytical models were chosen from the literature. A numerical estimation of the losses with the 2D Finite Element Method was carried out. A detailed investigation of the effect of simulation settings (e.g., mesh size, time-step, remanence flux density in the magnet, superposition of the losses, etc.) was performed. Finally, calculations of losses with 3D-FEM were also run in order to compare the calculated losses with both analytical and 2D-FEM results. The estimation of the losses focuses on a range of frequencies between 10 and 100 kHz.