Biblio
Nowadays, video streaming over HTTP is one of the most dominant Internet applications, using adaptive video techniques. Network assisted approaches have been proposed and are being standardized in order to provide high QoE for the end-users of such applications. SAND is a recent MPEG standard where DASH Aware Network Elements (DANEs) are introduced for this purpose. As web-caches are one of the main components of the SAND architecture, the location and the connectivity of these web-caches plays an important role in the user's QoE. The nature of SAND and DANE provides a good foundation for software controlled virtualized DASH environments, and in this paper, we propose a cache location algorithm and a cache migration algorithm for virtualized SAND deployments. The optimal locations for the virtualized DANEs is determined by an SDN controller and migrates it based on gathered statistics. The performance of the resulting system shows that, when SDN and NFV technologies are leveraged in such systems, software controlled virtualized approaches can provide an increase in QoE.
Building natural and conversational virtual humans is a task of formidable complexity. We believe that, especially when building agents that affectively interact with biological humans in real-time, a cognitive science-based, multilayered sensing and artificial intelligence (AI) systems approach is needed. For this demo, we show a working version (through human interaction with it) our modular system of natural, conversation 3D virtual human using AI or sensing layers. These including sensing the human user via facial emotion recognition, voice stress, semantic meaning of the words, eye gaze, heart rate, and galvanic skin response. These inputs are combined with AI sensing and recognition of the environment using deep learning natural language captioning or dense captioning. These are all processed by our AI avatar system allowing for an affective and empathetic conversation using an NLP topic-based dialogue capable of using facial expressions, gestures, breath, eye gaze and voice language-based two-way back and forth conversations with a sensed human. Our lab has been building these systems in stages over the years.
Advancements in semiconductor domain gave way to realize numerous applications in Video Surveillance using Computer vision and Deep learning, Video Surveillances in Industrial automation, Security, ADAS, Live traffic analysis etc. through image understanding improves efficiency. Image understanding requires input data with high precision which is dependent on Image resolution and location of camera. The data of interest can be thermal image or live feed coming for various sensors. Composite(CVBS) is a popular video interface capable of streaming upto HD(1920x1080) quality. Unlike high speed serial interfaces like HDMI/MIPI CSI, Analog composite video interface is a single wire standard supporting longer distances. Image understanding requires edge detection and classification for further processing. Sobel filter is one the most used edge detection filter which can be embedded into live stream. This paper proposes Zynq FPGA based system design for video surveillance with Sobel edge detection, where the input Composite video decoded (Analog CVBS input to YCbCr digital output), processed in HW and streamed to HDMI display simultaneously storing in SD memory for later processing. The HW design is scalable for resolutions from VGA to Full HD for 60fps and 4K for 24fps. The system is built on Xilinx ZC702 platform and TVP5146 to showcase the functional path.
With the rapid development of information technology, video surveillance system has become a key part in the security and protection system of modern cities. Especially in prisons, surveillance cameras could be found almost everywhere. However, with the continuous expansion of the surveillance network, surveillance cameras not only bring convenience, but also produce a massive amount of monitoring data, which poses huge challenges to storage, analytics and retrieval. The smart monitoring system equipped with intelligent video analytics technology can monitor as well as pre-alarm abnormal events or behaviours, which is a hot research direction in the field of surveillance. This paper combines deep learning methods, using the state-of-the-art framework for instance segmentation, called Mask R-CNN, to train the fine-tuning network on our datasets, which can efficiently detect objects in a video image while simultaneously generating a high-quality segmentation mask for each instance. The experiment show that our network is simple to train and easy to generalize to other datasets, and the mask average precision is nearly up to 98.5% on our own datasets.
The use of real-time video streaming is increasing day-by-day, and its security has become a serious issue now. Video encryption is a challenging task because of its large frame size. Video encryption can be done with symmetric key as well as asymmetric key encryption. Among different asymmetric key encryption technique, ECC performs better than other algorithms like RSA in terms of smaller key size and faster encryption and decryption operation. In this work, we have analyzed the performance of 18 different ECC curves and suggested some suitable curves for real-time video encryption.
Acoustic speaker recognition systems are very vulnerable to spoofing attacks via replayed or synthesized utterances. One possible countermeasure is audio-visual speaker recognition. Nevertheless, the addition of the visual stream alone does not prevent spoofing attacks completely and only provides further information to assess the authenticity of the utterance. Many systems consider audio and video modalities independently and can easily be spoofed by imitating only a single modality or by a bimodal replay attack with a victim's photograph or video. Therefore, we propose the simultaneous verification of the data synchronicity and the transcription in a challenge-response setup. We use coupled hidden Markov models (CHMMs) for a text-dependent spoofing detection and introduce new features that provide information about the transcriptions of the utterance and the synchronicity of both streams. We evaluate the features for various spoofing scenarios and show that the combination of the features leads to a more robust recognition, also in comparison to the baseline method. Additionally, by evaluating the data on unseen speakers, we show the spoofing detection to be applicable in speaker-independent use-cases.
In context of Industry 4.0 Augmented Reality (AR) is frequently mentioned as the upcoming interface technology for human-machine communication and collaboration. Many prototypes have already arisen in both the consumer market and in the industrial sector. According to numerous experts it will take only few years until AR will reach the maturity level to be deployed in productive applications. Especially for industrial usage it is required to assess security risks and challenges this new technology implicates. Thereby we focus on plant operators, Original Equipment Manufacturers (OEMs) and component vendors as stakeholders. Starting from several industrial AR use cases and the structure of contemporary AR applications, in this paper we identify security assets worthy of protection and derive the corresponding security goals. Afterwards we elaborate the threats industrial AR applications are exposed to and develop an edge computing architecture for future AR applications which encompasses various measures to reduce security risks for our stakeholders.
A high definition(HD) wide dynamic video surveillance system is designed and implemented based on Field Programmable Gate Array(FPGA). This system is composed of three subsystems, which are video capture, video wide dynamic processing and video display subsystem. The images in the video are captured directly through the camera that is configured in a pattern have long exposure in odd frames and short exposure in even frames. The video data stream is buffered in DDR2 SDRAM to obtain two adjacent frames. Later, the image data fusion is completed by fusing the long exposure image with the short exposure image (pixel by pixel). The video image display subsystem can display the image through a HDMI interface. The system is designed on the platform of Lattice ECP3-70EA FPGA, and camera is the Panasonic MN34229 sensor. The experimental result shows that this system can expand dynamic range of the HD video with 30 frames per second and a resolution equal to 1920*1080 pixels by real-time wide dynamic range (WDR) video processing, and has a high practical value.
In today's world, we are surrounded by variety of computer vision applications e.g. medical imaging, bio-metrics, security, surveillance and robotics. Most of these applications require real time processing of a single image or sequence of images. This real time image/video processing requires high computational power and specialized hardware architecture and can't be achieved using general purpose CPUs. In this paper, a FPGA based generic canny edge detector is introduced. Edge detection is one of the basic steps in image processing, image analysis, image pattern recognition, and computer vision. We have implemented a re-sizable canny edge detector IP on programmable logic (PL) of PYNQ-Platform. The IP is integrated with HDMI input/output blocks and can process 1080p input video stream at 60 frames per second. As mentioned the canny edge detection IP is scalable with respect to frame size i.e. depending on the input frame size, the hardware architecture can be scaled up or down by changing the template parameters. The offloading of canny edge detection from PS to PL causes the CPU usage to drop from about 100% to 0%. Moreover, hardware based edge detector runs about 14 times faster than the software based edge detector running on Cortex-A9 ARM processor.
In recent years, more and more multimedia data are generated and transmitted in various fields. So, many encryption methods for multimedia content have been put forward to satisfy various applications. However, there are still some open issues. Each encryption method has its advantages and drawbacks. Our main goal is expected to provide a solution for multimedia encryption which satisfies the target application constraints and performs metrics of the encryption algorithm. The Advanced Encryption Standard (AES) is the most popular algorithm used in symmetric key cryptography. Furthermore, chaotic encryption is a new research direction of cryptography which is characterized by high initial-value sensitivity and good randomness. In this paper we propose a hybrid video cryptosystem which combines two encryption techniques. The proposed cryptosystem realizes the video encryption through the chaos and AES in CTR mode. Experimental results and security analysis demonstrate that this cryptosystem is highly efficient and a robust system for video encryption.
In part I of a three-part series on active surveillance using depth-sensing technology, this paper proposes an algorithm to identify outdoor intrusion activities by monitoring skeletal positions from Microsoft Kinect sensor in real-time. This algorithm implements three techniques to identify a premise intrusion. The first technique observes a boundary line along the wall (or fence) of a surveilled premise for skeletal trespassing detection. The second technique observes the duration of a skeletal object within a region of a surveilled premise for loitering detection. The third technique analyzes the differences in skeletal height to identify wall climbing. Experiment results suggest that the proposed algorithm is able to detect trespassing, loitering and wall climbing at a rate of 70%, 85% and 80% respectively.
Video surveillance has been widely adopted to ensure home security in recent years. Most video encoding standards such as H.264 and MPEG-4 compress the temporal redundancy in a video stream using difference coding, which only encodes the residual image between a frame and its reference frame. Difference coding can efficiently compress a video stream, but it causes side-channel information leakage even though the video stream is encrypted, as reported in this paper. Particularly, we observe that the traffic patterns of an encrypted video stream are different when a user conducts different basic activities of daily living, which must be kept private from third parties as obliged by HIPAA regulations. We also observe that by exploiting this side-channel information leakage, attackers can readily infer a user's basic activities of daily living based on only the traffic size data of an encrypted video stream. We validate such an attack using two off-the-shelf cameras, and the results indicate that the user's basic activities of daily living can be recognized with a high accuracy.
Multimedia authentication is an integral part of multimedia signal processing in many real-time and security sensitive applications, such as video surveillance. In such applications, a full-fledged video digital rights management (DRM) mechanism is not applicable due to the real time requirement and the difficulties in incorporating complicated license/key management strategies. This paper investigates the potential of multimedia authentication from a brand new angle by employing hardware-based security primitives, such as physical unclonable functions (PUFs). We show that the hardware security approach is not only capable of accomplishing the authentication for both the hardware device and the multimedia stream but, more importantly, introduce minimum performance, resource, and power overhead. We justify our approach using a prototype PUF implementation on Xilinx FPGA boards. Our experimental results on the real hardware demonstrate the high security and low overhead in multimedia authentication obtained by using hardware security approaches.
In this paper we present a framework for Quality of Information (QoI)-aware networking. QoI quantifies how useful a piece of information is for a given query or application. Herein, we present a general QoI model, as well as a specific example instantiation that carries throughout the rest of the paper. In this model, we focus on the tradeoffs between precision and accuracy. As a motivating example, we look at traffic video analysis. We present simple algorithms for deriving various traffic metrics from video, such as vehicle count and average speed. We implement these algorithms both on a desktop workstation and less-capable mobile device. We then show how QoI-awareness enables end devices to make intelligent decisions about how to process queries and form responses, such that huge bandwidth savings are realized.
The deployment of Voice over Internet Protocol (VoIP) in place of traditional communication facilities has helped in huge reduction in operating costs, as well as enabled adoption of next generation communication services-based IP. At the same time, cyber criminals have also started intercepting environment and creating challenges for law enforcement system in any Country. At this instant, we propose a framework for the forensic analysis of the VoIP traffic over the network. This includes identifying and analyzing of network patterns of VoIP- SIP which is used for the setting up a session for the communication, and VoIP-RTP which is used for sending the data. Our network forensic investigation framework also focus on developing an efficient packet reordering and reconstruction algorithm for tracing the malicious users involved in conversation. The proposed framework is based on network forensics which can be used for content level observation of VoIP and regenerate original malicious content or session between malicious users for their prosecution in the court.
With the pretty prompt growth in Internet content, the main usage pattern of internet is shifting from traditional host-to-host model to content dissemination model. To support content distribution, content delivery networks (CDNs) gives an ad-hoc solution and some of future internet projects suggest a clean-slate design. Web applications have become one of the fundamental internet services. How to effectively support the popular browser-based web application is one of keys to success for future internet projects. This paper proposes the IDNet-based web applications. IDNet consists of id/locator separation scheme and domain-insulated autonomous network architecture (DIANA) which redesign the future internet in the clean slate basis. We design and develop an IDNet Browser based on the open source Qt. IDNet browser enables ID fetching and rendering by both `idp:/' schemes URID (Universal Resource Identifier) and `http:/' schemes URI in HTML The experiment shows that it can well be applicable to the IDNet test topology.
The main usage pattern of internet is shifting from traditional host-to-host central model to content dissemination model. It leads to the pretty prompt growth in Internet content. CDN and P2P are two mainstream techmologies to provide streaming content services in the current Internet. In recent years, some researchers have begun to focus on CDN-P2P-hybrid architecture and ISP-friendly P2P content delivery technology. Web applications have become one of the fundamental internet services. How to effectively support the popular browser-based web application is one of keys to success for future internet projects. This paper proposes ID based browser with caching in IDNet. IDNet consists of id/locator separation scheme and domain-insulated autonomous network architecture (DIANA) which redesign the future internet in the clean slate basis. Experiment shows that ID web browser with caching function can support how to disseminate content and how to find the closet network in IDNet having identical contents.
In this paper, we investigate the performance of multiple-input multiple-output aided coded interleave division multiple access (IDMA) system for secured medical image transmission through wireless communication. We realize the MIMO profile using four transmit antennas at the base station and three receive antennas at the mobile station. We achieve bandwidth efficiency using discrete wavelet transform (DWT). Further we implement Arnold's Cat Map (ACM) encryption algorithm for secured medical transmission. We consider celulas as medical image which is used to differentiate between normal cell and carcinogenic cell. In order to accommodate more users' image, we consider IDMA as accessing scheme. At the mobile station (MS), we employ non-linear minimum mean square error (MMSE) detection algorithm to alleviate the effects of unwanted multiple users image information and multi-stream interference (MSI) in the context of downlink transmission. In particular, we investigate the effects of three types of delay-spread distributions pertaining to Stanford university interim (SUI) channel models for encrypted image transmission of MIMO-IDMA system. From our computer simulation, we reveal that DWT based coded MIMO- IDMA system with ACM provides superior picture quality in the context of DL communication while offering higher spectral efficiency and security.
The enormous size of video data of natural scene and objects is a practical threat to storage, transmission. The efficient handling of video data essentially requires compression for economic utilization of storage space, access time and the available network bandwidth of the public channel. In addition, the protection of important video is of utmost importance so as to save it from malicious intervention, attack or alteration by unauthorized users. Therefore, security and privacy has become an important issue. Since from past few years, number of researchers concentrate on how to develop efficient video encryption for secure video transmission, a large number of multimedia encryption schemes have been proposed in the literature like selective encryption, complete encryption and entropy coding based encryption. Among above three kinds of algorithms, they all remain some kind of shortcomings. In this paper, we have proposed a lightweight selective encryption algorithm for video conference which is based on efficient XOR operation and symmetric hierarchical encryption, successfully overcoming the weakness of complete encryption while offering a better security. The proposed algorithm guarantees security, fastness and error tolerance without increasing the video size.
Watermarking is a recently developed technique which is currently dominating the world of security and digital processing in order to ensure the protection of digitized trade. The purpose of this work is twofold. It is firstly to establish a state of the art that goes through the existing watermarking methods and their performances. And secondly to design, implement and evaluate a new watermarking solution that aims to optimize the compromise robustness-invisibility-capacity. The proposed approach consists on applying a frequency watermarking based on singular value decomposition (SVD) and exploiting the mosaic made from all video frames as well as inserting a double signature in order to increase watermarking algorithm capacity.
Traffic from mobile wireless networks has been growing at a fast pace in recent years and is expected to surpass wired traffic very soon. Service providers face significant challenges at such scales including providing seamless mobility, efficient data delivery, security, and provisioning capacity at the wireless edge. In the Mobility First project, we have been exploring clean slate enhancements to the network protocols that can inherently provide support for at-scale mobility and trustworthiness in the Internet. An extensible data plane using pluggable compute-layer services is a key component of this architecture. We believe these extensions can be used to implement in-network services to enhance mobile end-user experience by either off-loading work and/or traffic from mobile devices, or by enabling en-route service-adaptation through context-awareness (e.g., Knowing contemporary access bandwidth). In this work we present details of the architectural support for in-network services within Mobility First, and propose protocol and service-API extensions to flexibly address these pluggable services from end-points. As a demonstrative example, we implement an in network service that does rate adaptation when delivering video streams to mobile devices that experience variable connection quality. We present details of our deployment and evaluation of the non-IP protocols along with compute-layer extensions on the GENI test bed, where we used a set of programmable nodes across 7 distributed sites to configure a Mobility First network with hosts, routers, and in-network compute services.
The emergence of new technologies, in addition with the popularization of mobile devices and wireless communication systems, demands a variety of requirements that current Internet is not able to comply adequately. In this scenario, the innovative information-centric Entity Title Architecture (ETArch), a Future Internet (FI) clean slate approach, was design to efficiently cope with the increasing demand of beyond-IP networking services. Nevertheless, despite all ETArch capabilities, it was not projected with reliable networking functions, which limits its operability in mobile multimedia networking, and will seriously restrict its scope in Future Internet scenarios. Therefore, our work extends ETArch mobility control with advanced quality-oriented mobility functions, to deploy mobility prediction, Point of Attachment (PoA) decision and handover setup meeting both session quality requirements of active session flows and current wireless quality conditions of neighbouring PoA candidates. The effectiveness of the proposed additions were confirmed through a preliminary evaluation carried out by MATLAB, in which we have considered distinct applications scenario, and showed that they were able to outperform the most relevant alternative solutions in terms of performance and quality of service.