PSYC 80103  Human and Computer Vision

Instructors: Professor Tony Ro (Psychology) and Professor Zhigang Zhu (Computer Science)

Wednesdays, 4:15 PM to 6:15 PM, Room 3308
The CUNY Graduate Center

Course Update Information 

August 30 (Wednesday), 2017: First class meeting of this course.

Course Objectives

Although seeing usually seems effortless, the ability to process visual information relies upon sophisticated biological and in silico hardware. For example, more than half of the primate brain is involved in visual processing, and it is only within the past 10 years that computers and algorithms have become powerful enough to “recognize” visual images with a reasonable degree of accuracy. This course will cover historical and contemporary knowledge of human and computer vision.


Given the interdisciplinary nature of the course, there are no prerequisites. However, students who have a strong foundation in neuroscience and computer science will benefit most from this course.

Course Syllabus and Tentative Schedule (mm/dd)

Week 1 (Ro) Visual neuroanatomy: from retina to primary visual cortex (V1) - 08/30

Hubel & Wiesel, 1977

Week 2 (Zhu & Ro) Depth perception, 3D, RGB-D cameras (slides) - 09/06

Finlayson, Zhang, & Golomb, 2017;  Livingstone & Hubel, 1988

Week 3 (Ro) Visual neuroanatomy: From V1 to higher cerebral cortex - 09/13

Felleman & Van Essen, 1991

Week 4 (Ro) Attentional biasing, training - 09/27  (September 20 NO CLASS)

Corbetta & Shulman, 2011; Itti, Koch, & Niebur, 1998

Week 5 (Zhu) Deep neural networks I: From SVM to DBN (slides) - 10/04

Boser, Guyon, & Vapnik, 1992

Hinton, 2009; Salakhutdinov & Hinton, 2009; Bengio, 2009;
Hinton & Salakhutdinov, 2006; Norman, Polyn, Detre, & Haxby, 2006

Week 6 (Zhu) Deep neural networks II: From Linear Regression to CNN, RNN and LSTM (slides) - 10/11

LeCun & Bengio, 1995;
Krizhevsky, Sutskever, & Hinton, 2012;
Yamins & DiCarlo, 2016

Week 7 (Zhu & Ro) Motion detection and perception, optic flow (slides) - 10/18

Huk, Dougherty, & Heeger, 2002

Week 8 (Ro & Zhu) Color perception (slides) - 10/25

Conway, Moeller, & Tsao, 2007

Week 9 (Ro) Object recognition - 11/01

Kanwisher, 2010

Week 10 (Zhu) Multimodal classification with SVM, DBN and CNN (slides) - 11/08

Dinerstein, Dinerstein, & Ventura, 2007; Wang et al., 2015;

Srivastava & Salakhutdinov, 2014; Srivastava & Salakhutdinov, 2012; Ngiam et al., 2011

Week 11 (Ro) Human face perception - 11/15

Grill-Spector, Weiner, Kay, & Gomez, 2017

Week 12 (Zhu & Ro) Face and emotion recognition with deep models (slides) - 11/22

Li, Abtahi, & Zhu, 2017; Li, Abtahi, Zhu, & Yin, 2017; Li, Abtahi, Tsangouri, & Zhu, 2016;
Li, Abtahi, & Zhu, 2015; Li, Su, Li, & Zhu, 2015

Week 13 (Ro & Zhu) Visual substitution devices and prosthetics - 11/29

Week 14 (Students) Reading and Project Presentations - 12/06

Reading List

  1. Bengio, Y. (2009). Learning deep architectures for AI. Foundations and Trends® in Machine Learning, 2(1), 1–127.
  2. Boser, B. E., Guyon, I. M., & Vapnik, V. N. (1992). A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory. ACM.
  3. Conway, B. R., Moeller, S., & Tsao, D. Y. (2007). Specialized color modules in macaque extrastriate cortex. Neuron, 56(3), 560–573.
  4. Corbetta, M., & Shulman, G. L. (2011). Spatial neglect and attention networks. Annual Review of Neuroscience, 34, 569–599.
  5. Dinerstein, S., Dinerstein, J., & Ventura, D. (2007). Robust multi-modal biometric fusion via multiple SVMs. In IEEE International Conference on Systems, Man and Cybernetics (ISIC 2007). IEEE.
  6. Felleman, D. J., & Van Essen, D. C. (1991). Distributed hierarchical processing in the primate cerebral cortex. Cerebral Cortex, 1(1), 1–47.
  7. Finlayson, N. J., Zhang, X., & Golomb, J. D. (2017). Differential patterns of 2D location versus depth decoding along the visual hierarchy. NeuroImage, 147, 507–516.
  8. Grill-Spector, K., Weiner, K. S., Kay, K., & Gomez, J. (2017). The functional neuroanatomy of human face perception. Annual Review of Vision Science.
  9. Hinton, G. E. (2009). Deep belief networks. Scholarpedia, 4(5), 5947.
  10. Hinton, G. E., & Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786), 504–507.
  11. Hubel, D. H., & Wiesel, T. N. (1977). Ferrier lecture: Functional architecture of macaque monkey visual cortex. Proceedings of the Royal Society of London. Series B, Biological Sciences, 198(1130), 1–59.
  12. Huk, A. C., Dougherty, R. F., & Heeger, D. J. (2002). Retinotopy and functional subdivision of human areas MT and MST. Journal of Neuroscience, 22(16), 7195–7205.
  13. Itti, L., Koch, C., & Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11), 1254–1259.
  14. Kanwisher, N. (2010). Functional specificity in the human brain: A window into the functional architecture of the mind. Proceedings of the National Academy of Sciences of the United States of America, 107(25), 11163–11170.
  15. Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 1097–1105.
  16. LeCun, Y., & Bengio, Y. (1995). Convolutional networks for images, speech, and time series. In The handbook of brain theory and neural networks.
  17. Li, W., Abtahi, F., & Zhu, Z. (2017). Action unit detection with region adaptation, multi-labeling learning and optimal temporal fusing. IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), July 21–26, 2017, Honolulu, Hawaii, USA.
  18. Li, W., Abtahi, F., Zhu, Z., & Yin, L. (2017). EAC-Net: A region-based deep enhancing and cropping approach for facial action unit detection. 12th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2017), May 30–June 3, 2017, Washington, DC.
  19. Li, W., Abtahi, F., Tsangouri, C., & Zhu, Z. (2016). Towards an "in-the-wild" emotion dataset using a game-based framework. Affect in the Wild Workshop, IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW 2016).
  20. Li, W., Abtahi, F., & Zhu, Z. (2015). A deep feature based multi-kernel learning approach for video emotion recognition. Emotion Recognition in the Wild (EmotiW) Challenge 2015, 17th ACM International Conference on Multimodal Interaction (ICMI 2015), November 9–13, 2015, Seattle, USA.
  21. Li, W., Su, Z., Li, M., & Zhu, Z. (2015). A deep-learning approach to facial expression recognition with candid images. 14th IAPR Conference on Machine Vision Applications (MVA 2015), May 18–22, 2015, Tokyo.
  22. Livingstone, M. S., & Hubel, D. H. (1988). Segregation of form, color, movement and depth: Anatomy, physiology and perception. Science, 240, 740–749.
  23. Ngiam, J., et al. (2011). Multimodal deep learning. In Proceedings of the 28th International Conference on Machine Learning (ICML-11).
  24. Norman, K. A., Polyn, S. M., Detre, G. J., & Haxby, J. V. (2006). Beyond mind-reading: multi-voxel pattern analysis of fMRI data. Trends in Cognitive Sciences, 10(9), 424–430.
  25. Salakhutdinov, R., & Hinton, G. E. (2009). Deep Boltzmann machines. In International Conference on Artificial Intelligence and Statistics.
  26. Srivastava, N., & Salakhutdinov, R. (2012). Learning representations for multimodal data with deep belief nets. International Conference on Machine Learning Workshop.
  27. Srivastava, N., & Salakhutdinov, R. R. (2014). Multimodal learning with deep Boltzmann machines. Journal of Machine Learning Research, 15, 2949–2980.
  28. Wang, A., et al. (2015). Large-margin multi-modal deep learning for RGB-D object recognition. IEEE Transactions on Multimedia, 17(11), 1887–1898.
  29. Yamins, D. L. K., & DiCarlo, J. J. (2016). Using goal-driven deep learning models to understand sensory cortex. Nature Neuroscience, 19(3), 356–365.

Copyright © Tony Ro and Zhigang Zhu, Fall 2017