Tutorial: Understanding human actions with 2D and 3D sensors

Zicheng Liu (Microsoft Research Redmond) and Junsong Yuan (Nanyang Technological University)

Time: 9:00AM – 12:20PM, April 22, 2013 (Monday)

Outline:

9:00AM - 10:30AM - Part 1: Understanding Human Actions in Videos
Presenter: Dr. Junsong Yuan
      1. Video-based action representation and matching
            a. Spatio-temporal interest point (STIP) features
            b. Other types of video features
            c. Mutual information maximization for action matching
      2. Detection of human actions and activities
            a. Spatio-temporal window search for action localization (see the sketch below)
            b. Beyond spatio-temporal window localization
            c. Propagative Hough-voting for action localization
      3. Datasets and case studies
            a. Action detection in crowded scenes
            b. Action search by example
            c. Action recognition and prediction
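
The window search in Part 1, item 2a can be made concrete with a few lines of code. The sketch below is a deliberately simplified, exhaustive stand-in for discriminative subvolume search: every interest point carries a per-point score (for example, a mutual-information-based vote for the action class, as in item 1c), and the detector returns the spatio-temporal box whose summed score is maximal. The point format, grid step, and brute-force enumeration are illustrative assumptions; the actual method prunes the search with a branch-and-bound strategy instead of enumerating all subvolumes.

    import itertools

    def best_subvolume(points, video_shape, step=16):
        """Exhaustive stand-in for discriminative subvolume search.
        points: list of (x, y, t, score) tuples, where score may be
        positive (action-like) or negative (background-like).
        Returns ((x0, x1, y0, y1, t0, t1), total_score)."""
        W, H, T = video_shape
        xs = list(range(0, W + 1, step))
        ys = list(range(0, H + 1, step))
        ts = list(range(0, T + 1, step))
        best_box, best_score = None, float("-inf")
        for x0, x1 in itertools.combinations(xs, 2):
            for y0, y1 in itertools.combinations(ys, 2):
                for t0, t1 in itertools.combinations(ts, 2):
                    s = sum(sc for (x, y, t, sc) in points
                            if x0 <= x < x1 and y0 <= y < y1 and t0 <= t < t1)
                    if s > best_score:
                        best_box, best_score = (x0, x1, y0, y1, t0, t1), s
        return best_box, best_score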

10:30AM - 10:50AM - Break

10:50AM - 12:20PM - Part 2: Understanding Human Actions with 3D Sensors
Presenter: Dr. Zicheng Liu
      1. Introduction
            a. Activities
                i. Hand gesture, action, activity
            b. 3D sensors
                i. Laser scanners
                ii. Structured light systems
                iii. Time-of-flight cameras
            c. Depth maps
                i. Noise, holes, foreground/background occlusions (see the sketch below)
            d. Skeleton tracking and its limitations
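
To make the depth-map issues in item 1c concrete, the sketch below back-projects a depth map to a 3D point cloud with the standard pinhole camera model and drops the zero-depth pixels that appear as holes in structured-light data. The default intrinsics are placeholder values, not the calibration of any particular sensor.

    import numpy as np

    def depth_to_points(depth_mm, fx=575.0, fy=575.0, cx=319.5, cy=239.5):
        """Back-project a depth map (uint16, millimeters) into an (N, 3)
        array of 3D points in meters.  Zero-depth pixels are holes (no
        sensor reading) and are dropped; real pipelines typically also
        median-filter the remaining depth noise."""
        h, w = depth_mm.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        z = depth_mm.astype(np.float32) / 1000.0   # mm -> m
        valid = z > 0                              # discard holes
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy
        return np.stack([x[valid], y[valid], z[valid]], axis=1)
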
      2. Features
            a. Skeleton-based features
                i. Joint angle trajectory
                ii. EigenJoints (sketched below)
                iii. SMIJ
                iv. Ho3DJoints
                v. Fourier temporal pyramid
            b. Depth-map-based features
                i. HOG
                ii. Bag of 3D points
                iii. Space-time occupancy patterns
                iv. DMM-HOG
                v. Local occupancy pattern
                vi. Local depth pattern
                vii. Histogram of oriented 3D normals
                viii. Histogram of 3D facets
            c. RGB + depth
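
As one concrete skeleton-based feature from item 2a, the sketch below computes a simplified EigenJoints-style descriptor: for each frame it concatenates pairwise joint differences within the frame (posture) and differences against the previous frame (motion), then reduces the dimensionality with PCA. The omission of offsets to the initial frame and the PCA size are illustrative simplifications of the published method.

    import numpy as np

    def eigenjoints(skeleton, n_components=32):
        """skeleton: (T, J, 3) array of J joint positions over T frames.
        Returns a (T-1, k) sequence of PCA-reduced frame descriptors."""
        T, J, _ = skeleton.shape
        iu = np.triu_indices(J, k=1)               # unordered joint pairs
        feats = []
        for t in range(1, T):
            cur = skeleton[t]
            # static posture: pairwise joint differences within the frame
            f_cc = (cur[:, None, :] - cur[None, :, :])[iu].ravel()
            # motion: differences between current and previous frame joints
            f_cp = (cur[:, None, :] - skeleton[t - 1][None, :, :]).ravel()
            feats.append(np.concatenate([f_cc, f_cp]))
        X = np.asarray(feats)
        X -= X.mean(axis=0)
        # PCA via SVD, keeping the leading principal components
        _, _, Vt = np.linalg.svd(X, full_matrices=False)
        k = min(n_components, Vt.shape[0])
        return X @ Vt[:k].T
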
      3. Hand segmentation and feature extraction
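
A minimal illustration of item 3: in close-range gesture capture the hand is usually the object nearest the camera, so a common first step is to keep only pixels within a fixed depth band of the nearest valid reading. The band width below is an assumed value, not a parameter from the tutorial; a real pipeline would follow this with connected-component cleanup before extracting features.

    import numpy as np

    def segment_hand(depth_mm, band_mm=150):
        """Return a boolean mask of pixels within band_mm of the nearest
        valid depth reading, a crude but common hand-segmentation
        heuristic for frontal, close-range gesture data."""
        valid = depth_mm > 0                       # zero depth = no reading
        near = depth_mm[valid].min()
        return valid & (depth_mm <= near + band_mm)
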
      4. Recognition paradigms
            a. Direct classification (global descriptors)
            b. Bag-of-features framework (interest points + local descriptors; sketched below)
            c. Actionlet ensemble
            d. Random occupancy patterns
            e. Online recognition
                i. Temporal segmentation
                ii. Action graph
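
The bag-of-features paradigm in item 4b works the same way for 2D and 3D data: quantize local descriptors against a learned codebook, describe each sequence as a codeword histogram, and classify the histograms. The sketch below uses scikit-learn's KMeans and LinearSVC as stand-ins for whatever quantizer and classifier a particular paper adopts.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.svm import LinearSVC

    def bof_histograms(descriptor_sets, codebook):
        """Map each sequence's (Ni, D) local-descriptor set to a
        normalized histogram over the codebook's k visual words."""
        k = codebook.n_clusters
        hists = []
        for desc in descriptor_sets:
            words = codebook.predict(desc)
            h = np.bincount(words, minlength=k).astype(np.float64)
            hists.append(h / max(h.sum(), 1.0))
        return np.asarray(hists)

    def train_bof(train_descs, train_labels, k=200):
        """Learn a codebook on the pooled training descriptors, then fit
        a linear SVM on the resulting sequence histograms."""
        codebook = KMeans(n_clusters=k, n_init=10).fit(np.vstack(train_descs))
        clf = LinearSVC().fit(bof_histograms(train_descs, codebook), train_labels)
        return codebook, clf
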
      5. Datasets and experiment results
            a. MSR Action 3D dataset
            b. MSR Daily Activity dataset
            c. MSR Gesture 3D dataset
            d. RGBD-HuDaAct dataset