Visible to the public Biblio

Filters: Keyword is Videos  [Clear All Filters]
2020-12-01
Sun, P., Yin, S., Man, W., Tao, T..  2018.  Research of Personalized Recommendation Algorithm Based on Trust and User's Interest. 2018 International Conference on Robots Intelligent System (ICRIS). :153—156.

Most traditional recommendation algorithms only consider the binary relationship between users and projects, these can basically be converted into score prediction problems. But most of these algorithms ignore the users's interests, potential work factors or the other social factors of the recommending products. In this paper, based on the existing trustworthyness model and similarity measure, we puts forward the concept of trust similarity and design a joint interest-content recommendation framework to suggest users which videos to watch in the online video site. In this framework, we first analyze the user's viewing history records, tags and establish the user's interest characteristic vector. Then, based on the updated vector, users should be clustered by sparse subspace clust algorithm, which can improve the efficiency of the algorithm. We certainly improve the calculation of similarity to help users find better neighbors. Finally we conduct experiments using real traces from Tencent Weibo and Youku to verify our method and evaluate its performance. The results demonstrate the effectiveness of our approach and show that our approach can substantially improve the recommendation accuracy.

2020-11-02
Xiong, Wenjie, Shan, Chun, Sun, Zhaoliang, Meng, Qinglei.  2018.  Real-time Processing and Storage of Multimedia Data with Content Delivery Network in Vehicle Monitoring System. 2018 6th International Conference on Wireless Networks and Mobile Communications (WINCOM). :1—4.

With the rapid development of the Internet of vehicles, there is a huge amount of multimedia data becoming a hidden trouble in the Internet of Things. Therefore, it is necessary to process and store them in real time as a way of big data curation. In this paper, a method of real-time processing and storage based on CDN in vehicle monitoring system is proposed. The MPEG-DASH standard is used to process the multimedia data by dividing them into MPD files and media segments. A real-time monitoring system of vehicle on the basis of the method introduced is designed and implemented.

2020-07-16
Biancardi, Beatrice, Wang, Chen, Mancini, Maurizio, Cafaro, Angelo, Chanel, Guillaume, Pelachaud, Catherine.  2019.  A Computational Model for Managing Impressions of an Embodied Conversational Agent in Real-Time. 2019 8th International Conference on Affective Computing and Intelligent Interaction (ACII). :1—7.

This paper presents a computational model for managing an Embodied Conversational Agent's first impressions of warmth and competence towards the user. These impressions are important to manage because they can impact users' perception of the agent and their willingness to continue the interaction with the agent. The model aims at detecting user's impression of the agent and producing appropriate agent's verbal and nonverbal behaviours in order to maintain a positive impression of warmth and competence. User's impressions are recognized using a machine learning approach with facial expressions (action units) which are important indicators of users' affective states and intentions. The agent adapts in real-time its verbal and nonverbal behaviour, with a reinforcement learning algorithm that takes user's impressions as reward to select the most appropriate combination of verbal and non-verbal behaviour to perform. A user study to test the model in a contextualized interaction with users is also presented. Our hypotheses are that users' ratings differs when the agents adapts its behaviour according to our reinforcement learning algorithm, compared to when the agent does not adapt its behaviour to user's reactions (i.e., when it randomly selects its behaviours). The study shows a general tendency for the agent to perform better when using our model than in the random condition. Significant results shows that user's ratings about agent's warmth are influenced by their a-priori about virtual characters, as well as that users' judged the agent as more competent when it adapted its behaviour compared to random condition.

2020-07-03
Kakadiya, Rutvik, Lemos, Reuel, Mangalan, Sebin, Pillai, Meghna, Nikam, Sneha.  2019.  AI Based Automatic Robbery/Theft Detection using Smart Surveillance in Banks. 2019 3rd International conference on Electronics, Communication and Aerospace Technology (ICECA). :201—204.

Deep learning is the segment of artificial intelligence which is involved with imitating the learning approach that human beings utilize to get some different types of knowledge. Analyzing videos, a part of deep learning is one of the most basic problems of computer vision and multi-media content analysis for at least 20 years. The job is very challenging as the video contains a lot of information with large differences and difficulties. Human supervision is still required in all surveillance systems. New advancement in computer vision which are observed as an important trend in video surveillance leads to dramatic efficiency gains. We propose a CCTV based theft detection along with tracking of thieves. We use image processing to detect theft and motion of thieves in CCTV footage, without the use of sensors. This system concentrates on object detection. The security personnel can be notified about the suspicious individual committing burglary using Real-time analysis of the movement of any human from CCTV footage and thus gives a chance to avert the same.

2020-04-13
M.R., Anala, Makker, Malika, Ashok, Aakanksha.  2019.  Anomaly Detection in Surveillance Videos. 2019 26th International Conference on High Performance Computing, Data and Analytics Workshop (HiPCW). :93–98.
Every public or private area today is preferred to be under surveillance to ensure high levels of security. Since the surveillance happens round the clock, data gathered as a result is huge and requires a lot of manual work to go through every second of the recorded videos. This paper presents a system which can detect anomalous behaviors and alarm the user on the type of anomalous behavior. Since there are a myriad of anomalies, the classification of anomalies had to be narrowed down. There are certain anomalies which are generally seen and have a huge impact on public safety, such as explosions, road accidents, assault, shooting, etc. To narrow down the variations, this system can detect explosion, road accidents, shooting, and fighting and even output the frame of their occurrence. The model has been trained with videos belonging to these classes. The dataset used is UCF Crime dataset. Learning patterns from videos requires the learning of both spatial and temporal features. Convolutional Neural Networks (CNN) extract spatial features and Long Short-Term Memory (LSTM) networks learn the sequences. The classification, using an CNN-LSTM model achieves an accuracy of 85%.
Nalamati, Mrunalini, Kapoor, Ankit, Saqib, Muhammed, Sharma, Nabin, Blumenstein, Michael.  2019.  Drone Detection in Long-Range Surveillance Videos. 2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS). :1–6.

The usage of small drones/UAVs has significantly increased recently. Consequently, there is a rising potential of small drones being misused for illegal activities such as terrorism, smuggling of drugs, etc. posing high-security risks. Hence, tracking and surveillance of drones are essential to prevent security breaches. The similarity in the appearance of small drone and birds in complex background makes it challenging to detect drones in surveillance videos. This paper addresses the challenge of detecting small drones in surveillance videos using popular and advanced deep learning-based object detection methods. Different CNN-based architectures such as ResNet-101 and Inception with Faster-RCNN, as well as Single Shot Detector (SSD) model was used for experiments. Due to sparse data available for experiments, pre-trained models were used while training the CNNs using transfer learning. Best results were obtained from experiments using Faster-RCNN with the base architecture of ResNet-101. Experimental analysis on different CNN architectures is presented in the paper, along with the visual analysis of the test dataset.

2019-09-04
Liang, J., Jiang, L., Cao, L., Li, L., Hauptmann, A..  2018.  Focal Visual-Text Attention for Visual Question Answering. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. :6135–6143.
Recent insights on language and vision with neural networks have been successfully applied to simple single-image visual question answering. However, to tackle real-life question answering problems on multimedia collections such as personal photos, we have to look at whole collections with sequences of photos or videos. When answering questions from a large collection, a natural problem is to identify snippets to support the answer. In this paper, we describe a novel neural network called Focal Visual-Text Attention network (FVTA) for collective reasoning in visual question answering, where both visual and text sequence information such as images and text metadata are presented. FVTA introduces an end-to-end approach that makes use of a hierarchical process to dynamically determine what media and what time to focus on in the sequential data to answer the question. FVTA can not only answer the questions well but also provides the justifications which the system results are based upon to get the answers. FVTA achieves state-of-the-art performance on the MemexQA dataset and competitive results on the MovieQA dataset.
2019-08-12
Benzer, R., Yildiz, M. C..  2018.  YOLO Approach in Digital Object Definition in Military Systems. 2018 International Congress on Big Data, Deep Learning and Fighting Cyber Terrorism (IBIGDELFT). :35–37.

Today, as surveillance systems are widely used for indoor and outdoor monitoring applications, there is a growing interest in real-time generation detection and there are many different applications for real-time generation detection and analysis. Two-dimensional videos; It is used in multimedia content-based indexing, information acquisition, visual surveillance and distributed cross-camera surveillance systems, human tracking, traffic monitoring and similar applications. It is of great importance for the development of systems for national security by following a moving target within the scope of military applications. In this research, a more efficient solution is proposed in addition to the existing methods. Therefore, we present YOLO, a new approach to object detection for military applications.

2019-01-16
Varshney, G., Bagade, S., Sinha, S..  2018.  Malicious browser extensions: A growing threat: A case study on Google Chrome: Ongoing work in progress. 2018 International Conference on Information Networking (ICOIN). :188–193.

Browser extensions are a way through which third party developers provide a set of additional functionalities on top of the traditional functionalities provided by a browser. It has been identified that the browser extension platform can be used by hackers to carry out attacks of sophisticated kinds. These attacks include phishing, spying, DDoS, email spamming, affiliate fraud, mal-advertising, payment frauds etc. In this paper, we showcase the vulnerability of the current browsers to these attacks by taking Google Chrome as the case study as it is a popular browser. The paper also discusses the technical reason which makes it possible for the attackers to launch such attacks via browser extensions. A set of suggestions and solutions that can thwart the attack possibilities has been discussed.

2018-11-19
Gupta, A., Johnson, J., Alahi, A., Fei-Fei, L..  2017.  Characterizing and Improving Stability in Neural Style Transfer. 2017 IEEE International Conference on Computer Vision (ICCV). :4087–4096.

Recent progress in style transfer on images has focused on improving the quality of stylized images and speed of methods. However, real-time methods are highly unstable resulting in visible flickering when applied to videos. In this work we characterize the instability of these methods by examining the solution set of the style transfer objective. We show that the trace of the Gram matrix representing style is inversely related to the stability of the method. Then, we present a recurrent convolutional network for real-time video style transfer which incorporates a temporal consistency loss and overcomes the instability of prior methods. Our networks can be applied at any resolution, do not require optical flow at test time, and produce high quality, temporally consistent stylized videos in real-time.

Huang, H., Wang, H., Luo, W., Ma, L., Jiang, W., Zhu, X., Li, Z., Liu, W..  2017.  Real-Time Neural Style Transfer for Videos. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). :7044–7052.

Recent research endeavors have shown the potential of using feed-forward convolutional neural networks to accomplish fast style transfer for images. In this work, we take one step further to explore the possibility of exploiting a feed-forward network to perform style transfer for videos and simultaneously maintain temporal consistency among stylized video frames. Our feed-forward network is trained by enforcing the outputs of consecutive frames to be both well stylized and temporally consistent. More specifically, a hybrid loss is proposed to capitalize on the content information of input frames, the style information of a given style image, and the temporal information of consecutive frames. To calculate the temporal loss during the training stage, a novel two-frame synergic training mechanism is proposed. Compared with directly applying an existing image style transfer method to videos, our proposed method employs the trained network to yield temporally consistent stylized videos which are much more visually pleasant. In contrast to the prior video style transfer method which relies on time-consuming optimization on the fly, our method runs in real time while generating competitive visual results.

2018-02-15
Yonetani, R., Boddeti, V. N., Kitani, K. M., Sato, Y..  2017.  Privacy-Preserving Visual Learning Using Doubly Permuted Homomorphic Encryption. 2017 IEEE International Conference on Computer Vision (ICCV). :2059–2069.

We propose a privacy-preserving framework for learning visual classifiers by leveraging distributed private image data. This framework is designed to aggregate multiple classifiers updated locally using private data and to ensure that no private information about the data is exposed during and after its learning procedure. We utilize a homomorphic cryptosystem that can aggregate the local classifiers while they are encrypted and thus kept secret. To overcome the high computational cost of homomorphic encryption of high-dimensional classifiers, we (1) impose sparsity constraints on local classifier updates and (2) propose a novel efficient encryption scheme named doublypermuted homomorphic encryption (DPHE) which is tailored to sparse high-dimensional data. DPHE (i) decomposes sparse data into its constituent non-zero values and their corresponding support indices, (ii) applies homomorphic encryption only to the non-zero values, and (iii) employs double permutations on the support indices to make them secret. Our experimental evaluation on several public datasets shows that the proposed approach achieves comparable performance against state-of-the-art visual recognition methods while preserving privacy and significantly outperforms other privacy-preserving methods.

2018-01-23
McDuff, D., Soleymani, M..  2017.  Large-scale Affective Content Analysis: Combining Media Content Features and Facial Reactions. 2017 12th IEEE International Conference on Automatic Face Gesture Recognition (FG 2017). :339–345.

We present a novel multimodal fusion model for affective content analysis, combining visual, audio and deep visual-sentiment descriptors from the media content with automated facial action measurements from naturalistic responses to the media. We collected a dataset of 48,867 facial responses to 384 media clips and extracted a rich feature set from the facial responses and media content. The stimulus videos were validated to be informative, inspiring, persuasive, sentimental or amusing. By combining the features, we were able to obtain a classification accuracy of 63% (weighted F1-score: 0.62) for a five-class task. This was a significant improvement over using the media content features alone. By analyzing the feature sets independently, we found that states of informed and persuaded were difficult to differentiate from facial responses alone due to the presence of similar sets of action units in each state (AU 2 occurring frequently in both cases). Facial actions were beneficial in differentiating between amused and informed states whereas media content features alone performed less well due to similarities in the visual and audio make up of the content. We highlight examples of content and reactions from each class. This is the first affective content analysis based on reactions of 10,000s of people.

2017-03-08
Paone, J., Bolme, D., Ferrell, R., Aykac, D., Karnowski, T..  2015.  Baseline face detection, head pose estimation, and coarse direction detection for facial data in the SHRP2 naturalistic driving study. 2015 IEEE Intelligent Vehicles Symposium (IV). :174–179.

Keeping a driver focused on the road is one of the most critical steps in insuring the safe operation of a vehicle. The Strategic Highway Research Program 2 (SHRP2) has over 3,100 recorded videos of volunteer drivers during a period of 2 years. This extensive naturalistic driving study (NDS) contains over one million hours of video and associated data that could aid safety researchers in understanding where the driver's attention is focused. Manual analysis of this data is infeasible; therefore efforts are underway to develop automated feature extraction algorithms to process and characterize the data. The real-world nature, volume, and acquisition conditions are unmatched in the transportation community, but there are also challenges because the data has relatively low resolution, high compression rates, and differing illumination conditions. A smaller dataset, the head pose validation study, is available which used the same recording equipment as SHRP2 but is more easily accessible with less privacy constraints. In this work we report initial head pose accuracy using commercial and open source face pose estimation algorithms on the head pose validation data set.

2015-05-04
Dirik, A.E., Sencar, H.T., Memon, N..  2014.  Analysis of Seam-Carving-Based Anonymization of Images Against PRNU Noise Pattern-Based Source Attribution. Information Forensics and Security, IEEE Transactions on. 9:2277-2290.

The availability of sophisticated source attribution techniques raises new concerns about privacy and anonymity of photographers, activists, and human right defenders who need to stay anonymous while spreading their images and videos. Recently, the use of seam-carving, a content-aware resizing method, has been proposed to anonymize the source camera of images against the well-known photoresponse nonuniformity (PRNU)-based source attribution technique. In this paper, we provide an analysis of the seam-carving-based source camera anonymization method by determining the limits of its performance introducing two adversarial models. Our analysis shows that the effectiveness of the deanonymization attacks depend on various factors that include the parameters of the seam-carving method, strength of the PRNU noise pattern of the camera, and an adversary's ability to identify uncarved image blocks in a seam-carved image. Our results show that, for the general case, there should not be many uncarved blocks larger than the size of 50×50 pixels for successful anonymization of the source camera.

2015-05-01
Harish, P., Subhashini, R., Priya, K..  2014.  Intruder detection by extracting semantic content from surveillance videos. Green Computing Communication and Electrical Engineering (ICGCCEE), 2014 International Conference on. :1-5.

Many surveillance cameras are using everywhere, the videos or images captured by these cameras are still dumped but they are not processed. Many methods are proposed for tracking and detecting the objects in the videos but we need the meaningful content called semantic content from these videos. Detecting Human activity recognition is quite complex. The proposed method called Semantic Content Extraction (SCE) from videos is used to identify the objects and the events present in the video. This model provides useful methodology for intruder detecting systems which provides the behavior and the activities performed by the intruder. Construction of ontology enhances the spatial and temporal relations between the objects or features extracted. Thus proposed system provides a best way for detecting the intruders, thieves and malpractices happening around us.