Home

Welcome to the AICV Lab

Artificial Intelligence & Computer Vision

About us:

The AICV Lab in the AI College at National Chiao Tung University was founded in 2019 and is directed by Dr. Jun-Wei Hsieh. The lab is dedicated to intelligent image/video processing, multimedia standards, and related technologies. At present, the research group comprises 13 full-time graduate students and 2 postdoctoral research fellows.

Lab Activities:

< more albums >

Project Demos:

< more demos >

Lab Research:

  • Artificial Intelligence (AI)
  • Deep Learning
  • Smart Drone
  • Intelligent Transport Systems (ITS)
  • Reinforcement Learning (RL)
  • Neural Architecture Search (NAS)
  • Vehicle Make and Model Recognition
  • Kinect-Based Online Handwriting Recognition System
  • People Counting System 
  • Behavior Analysis
  • License Plate Detection and Recognition System

Recent Research:

CSPNet: A New Backbone that can Enhance Learning Capability of CNN

CVPR 2020 Workshop
https://arxiv.org/abs/1911.11929

Neural networks have enabled state-of-the-art approaches to achieve incredible results on computer vision tasks such as object detection. However, such success greatly relies on costly computation resources, which hinders people with cheap devices from appreciating the advanced technology. In this paper, we propose Cross Stage Partial Network (CSPNet) to mitigate the problem that previous works require heavy inference computations from the network architecture perspective. We attribute the problem to the duplicate gradient information within network optimization. The proposed networks respect the variability of the gradients by integrating feature maps from the beginning and the end of a network stage, which, in our experiments, reduces computations by 20% with equivalent or even superior accuracy on the ImageNet dataset, and significantly outperforms state-of-the-art approaches in terms of AP50 on the MS COCO object detection dataset. The CSPNet is easy to implement and general enough to cope with architectures based on ResNet, ResNeXt, and DenseNet.
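The cross-stage partial idea described above can be sketched with a toy example: split the stage input along channels, send only one part through the (expensive) dense block, and concatenate the untouched part back at the end, so the skipped channels carry their own gradient path without being re-processed. Everything below (the toy `dense_block`, layer counts, the 50/50 split) is illustrative and not the paper's actual implementation.

```python
import numpy as np

def dense_block(x, num_layers=2, growth=4, rng=None):
    """Toy DenseNet-style block: each layer's output is concatenated
    channel-wise to its input, mimicking dense feature reuse."""
    rng = rng or np.random.default_rng(0)
    for _ in range(num_layers):
        c = x.shape[0]
        w = rng.standard_normal((growth, c)) * 0.1
        # 1x1 convolution (as a channel mix) followed by ReLU
        y = np.maximum(0.0, np.einsum('oc,chw->ohw', w, x))
        x = np.concatenate([x, y], axis=0)
    return x

def csp_stage(x, split=0.5):
    """Cross-stage partial connection: only part of the channels go
    through the heavy block; the rest skip to the end of the stage
    and are merged back by concatenation."""
    c = int(x.shape[0] * split)
    part1, part2 = x[:c], x[c:]          # split along channels
    part2 = dense_block(part2)           # heavy computation on one part only
    return np.concatenate([part1, part2], axis=0)  # cross-stage merge

x = np.random.default_rng(1).standard_normal((8, 16, 16))  # (C, H, W)
out = csp_stage(x)
print(out.shape)  # channels grow only in the processed half: (16, 16, 16)
```

Because half the channels bypass the dense block entirely, the stage repeats far fewer channel-mixing operations, which is the source of the reported computation savings.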

Modeling and Recognizing Action Contexts in Persons Using Sparse Representation

Journal of Visual Communication and Image Representation, Volume 30, July 2015, Pages 252–265

This paper proposes a novel dynamic sparse representation-based classification scheme for analyzing interaction actions between persons. The occlusion problem and the difficulty of modeling complicated interactions are the major challenges in person-to-person action analysis. To address the occlusion problem, the proposed scheme represents an action sample in an over-complete dictionary whose base elements are the training samples themselves. This representation is naturally sparse and makes errors (caused by environmental changes such as lighting or occlusions) appear sparsely across the training library. Because of this sparsity, the scheme is robust to occlusions and lighting changes. The difficulty of modeling complicated actions can be tackled by adding more examples to the over-complete dictionary. Thus, even when the interaction relations are complicated, the proposed method still recognizes them successfully and can be easily extended to analyze action events among multiple persons.
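The dictionary-based classification described above can be illustrated with a simplified residual test: the query is reconstructed from each class's training columns alone, and the class with the smallest reconstruction residual wins. This is a minimal sketch of the sparse-representation-classification idea (here using per-class least squares rather than a true sparse solver), not the paper's algorithm.

```python
import numpy as np

def src_classify(D, labels, y):
    """Simplified sparse-representation classification: reconstruct the
    query y from each class's training columns alone and return the
    class with the smallest reconstruction residual."""
    best, best_res = None, np.inf
    for c in set(labels):
        Dc = D[:, [i for i, l in enumerate(labels) if l == c]]
        coef, *_ = np.linalg.lstsq(Dc, y, rcond=None)
        res = np.linalg.norm(y - Dc @ coef)
        if res < best_res:
            best, best_res = c, res
    return best

# toy dictionary: columns are training samples of two "action" classes
rng = np.random.default_rng(0)
a = rng.standard_normal(6)
b = rng.standard_normal(6)
D = np.column_stack([a, a + 0.01 * rng.standard_normal(6),
                     b, b + 0.01 * rng.standard_normal(6)])
labels = [0, 0, 1, 1]
print(src_classify(D, labels, a + 0.05 * rng.standard_normal(6)))  # → 0
```

Because the dictionary columns are the training samples themselves, adding more examples of a complicated interaction simply appends columns, which matches the abstract's point about extending the dictionary.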

Human movement analysis around a view circle using time-order similarity distributions

Journal of Visual Communication and Image Representation, Volume 30, July 2015, Pages 22–34

This paper presents a new behavior classification system to analyze human movements around a view circle using time-order similarity distributions. To maintain view invariance, an action is represented not only in its spatial domain but also in its temporal domain. After that, a novel alignment scheme is proposed to align each action to a fixed view. With the best view, behavior analysis becomes a string matching problem. One novel idea proposed in this paper is to code a posture using not only its best-matched key posture but also the other, unmatched key postures, forming various similarity distributions. Recognizing two actions then becomes a problem of matching two time-ordered distributions, which can be solved very effectively by comparing their KL distance via a dynamic programming scheme.
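The matching step at the end can be sketched as a dynamic program over the two frame sequences, with a KL-based distance as the local cost. The symmetrized KL variant and the DTW-style recurrence below are illustrative choices, assuming each frame is a discrete similarity distribution over key postures; the paper's exact formulation may differ.

```python
import numpy as np

def sym_kl(p, q, eps=1e-9):
    """Symmetrized KL divergence between two discrete distributions."""
    p, q = p + eps, q + eps
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

def dp_action_distance(A, B):
    """Align two time-ordered sequences of similarity distributions with
    a DTW-style dynamic program whose local cost is the KL distance."""
    n, m = len(A), len(B)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = sym_kl(A[i - 1], B[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # skip a frame of A
                                 cost[i, j - 1],      # skip a frame of B
                                 cost[i - 1, j - 1])  # match frames
    return cost[n, m]

# each frame: a distribution of similarities over three key postures
walk = [np.array([0.7, 0.2, 0.1]), np.array([0.6, 0.3, 0.1])]
run  = [np.array([0.1, 0.2, 0.7]), np.array([0.1, 0.3, 0.6])]
print(dp_action_distance(walk, walk) < dp_action_distance(walk, run))  # → True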

Vehicle make and model recognition using sparse representation and symmetrical SURFs

Pattern Recognition, Volume 48, Issue 6, June 2015, Pages 1979–1998

This paper presents a new symmetrical SURF descriptor to detect vehicles on roads and then proposes a novel sparsity-based classification scheme to recognize their makes and models. First, for vehicle detection, this paper proposes a symmetry transformation on SURF points to detect all possible matching pairs of symmetrical SURF points. Then, each desired vehicle ROI can be located very accurately from the set of symmetrical matching pairs through a projection technique. The advantages of this scheme are that it requires no background subtraction and is extremely efficient for real-time detection tasks. After that, two challenges in vehicle make and model recognition (MMR) must be addressed: the multiplicity and ambiguity problems. The multiplicity problem stems from one vehicle model often having different shapes on the road. The ambiguity problem means that vehicles made by different companies often share similar shapes. To treat these two problems, a dynamic sparse representation scheme is proposed to represent a vehicle model in an over-complete dictionary whose base elements are the training samples themselves. With this dictionary, a novel Hamming distance classification scheme is proposed to classify vehicle makes and models into detailed classes. Because of the sparsity of the representation and the high noise tolerance of Hamming codes, different vehicle makes and models can be recognized with high accuracy.
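The symmetry-matching step can be sketched as a voting scheme: a pair of keypoints matches when one point's descriptor is close to the *mirrored* descriptor of the other, and each match votes with the pair's x-midpoint; the consensus of those votes gives the vertical symmetry axis, and the matched pairs bound the vehicle ROI. The function and data below are a hypothetical simplification (real symmetrical SURF descriptors and the projection technique are more involved).

```python
import numpy as np

def symmetry_axis(points, descs, mirrored_descs, tol=0.1):
    """Vote for a vertical symmetry axis: pair (i, j) matches when the
    descriptor of i is close to the mirrored descriptor of j; each match
    votes with the x-midpoint of the pair."""
    votes = []
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(descs[i] - mirrored_descs[j]) < tol:
                votes.append((points[i][0] + points[j][0]) / 2.0)
    return float(np.median(votes)) if votes else None

d = np.array([1.0, 0.0, 0.5, 0.2])
points = [(10.0, 5.0), (30.0, 5.0)]  # a symmetric pair about x = 20
descs = [d, d[::-1]]                 # the right point sees the mirrored pattern
mirrored_descs = [d[::-1], d]        # descriptors after the mirror transform
print(symmetry_axis(points, descs, mirrored_descs))  # → 20.0
```

Because matching works directly on mirrored descriptor pairs, no background model is needed, which is consistent with the abstract's claim about avoiding background subtraction.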