Biblio
Video surveillance plays an important role in our times. It is a great help in reducing the crime rate, and it can also help to monitor the status of facilities. The performance of the video surveillance system is limited by human factors such as fatigue, time efficiency, and human resources. It would be beneficial for all if fully automatic video surveillance systems are employed to do the job. The automation of the video surveillance system is still not satisfying regarding many problems such as the accuracy of the detector, bandwidth consumption, storage usage, etc. This scientific paper mainly focuses on a video surveillance system using Convolutional Neural Networks (CNN), IoT and cloud. The system contains multi nods, each node consists of a microprocessor(Raspberry Pi) and a camera, the nodes communicate with each other using client and server architecture. The nodes can detect humans using a pretraining MobileNetv2-SSDLite model and Common Objects in Context(COCO) dataset, the captured video will stream to the main node(only one node will communicate with cloud) in order to stream the video to the cloud. Also, the main node will send an SMS notification to the security team to inform the detection of humans. The security team can check the videos captured using a mobile application or web application. Operating the Object detection model of Deep learning will be required a large amount of the computational power, for instance, the Raspberry Pi with a limited in performance for that reason we used the MobileNetv2-SSDLite model.
In this paper, a merged convolution neural network (MCNN) is proposed to improve the accuracy and robustness of real-time facial expression recognition (FER). Although there are many ways to improve the performance of facial expression recognition, a revamp of the training framework and image preprocessing renders better results in applications. When the camera is capturing images at high speed, however, changes in image characteristics may occur at certain moments due to the influence of light and other factors. Such changes can result in incorrect recognition of human facial expression. To solve this problem, we propose a statistical method for recognition results obtained from previous images, instead of using the current recognition output. Experimental results show that the proposed method can satisfactorily recognize seven basic facial expressions in real time.
Advancements in semiconductor domain gave way to realize numerous applications in Video Surveillance using Computer vision and Deep learning, Video Surveillances in Industrial automation, Security, ADAS, Live traffic analysis etc. through image understanding improves efficiency. Image understanding requires input data with high precision which is dependent on Image resolution and location of camera. The data of interest can be thermal image or live feed coming for various sensors. Composite(CVBS) is a popular video interface capable of streaming upto HD(1920x1080) quality. Unlike high speed serial interfaces like HDMI/MIPI CSI, Analog composite video interface is a single wire standard supporting longer distances. Image understanding requires edge detection and classification for further processing. Sobel filter is one the most used edge detection filter which can be embedded into live stream. This paper proposes Zynq FPGA based system design for video surveillance with Sobel edge detection, where the input Composite video decoded (Analog CVBS input to YCbCr digital output), processed in HW and streamed to HDMI display simultaneously storing in SD memory for later processing. The HW design is scalable for resolutions from VGA to Full HD for 60fps and 4K for 24fps. The system is built on Xilinx ZC702 platform and TVP5146 to showcase the functional path.
Many mobile apps, including augmented-reality games, bar-code readers, and document scanners, digitize information from the physical world by applying computer-vision algorithms to live camera data. However, because camera permissions for existing mobile operating systems are coarse (i.e., an app may access a camera's entire view or none of it), users are vulnerable to visual privacy leaks. An app violates visual privacy if it extracts information from camera data in unexpected ways. For example, a user might be surprised to find that an augmented-reality makeup app extracts text from the camera's view in addition to detecting faces. This paper presents results from the first large-scale study of visual privacy leaks in the wild. We build CamForensics to identify the kind of information that apps extract from camera data. Our extensive user surveys determine what kind of information users expected an app to extract. Finally, our results show that camera apps frequently defy users' expectations based on their descriptions.
A high definition(HD) wide dynamic video surveillance system is designed and implemented based on Field Programmable Gate Array(FPGA). This system is composed of three subsystems, which are video capture, video wide dynamic processing and video display subsystem. The images in the video are captured directly through the camera that is configured in a pattern have long exposure in odd frames and short exposure in even frames. The video data stream is buffered in DDR2 SDRAM to obtain two adjacent frames. Later, the image data fusion is completed by fusing the long exposure image with the short exposure image (pixel by pixel). The video image display subsystem can display the image through a HDMI interface. The system is designed on the platform of Lattice ECP3-70EA FPGA, and camera is the Panasonic MN34229 sensor. The experimental result shows that this system can expand dynamic range of the HD video with 30 frames per second and a resolution equal to 1920*1080 pixels by real-time wide dynamic range (WDR) video processing, and has a high practical value.