PSYC 80103 Human and Computer Vision
Instructors: Professor Tony Ro (Psychology) and Professor Zhigang Zhu (Computer Science)
Wednesdays, 4:15 PM to 6:15 PM, Room 3308
The CUNY Graduate Center
Course Update Information
August 30 (Wednesday), 2017: first class meeting of this course.
Course Objectives
Although seeing usually seems effortless,
the ability to process visual information relies upon
sophisticated biological and in silico hardware. For example,
more than half of the primate brain is involved with visual
processing, and it is only within the past 10 years that
computers and algorithms have gotten powerful enough to
“recognize” visual images with a reasonable degree of
accuracy. This course will cover historical and contemporary
knowledge of human and computer vision.
Prerequisites
Given the interdisciplinary nature of the course, there are no
prerequisites. However, students who have a strong foundation in
neuroscience and computer science will benefit most from this
course.
Course Syllabus and Tentative Schedule (mm/dd)
Week 1 (Ro) Visual neuroanatomy: from retina to primary visual cortex (V1) - 08/30
Hubel & Wiesel, 1977
Week 2 (Zhu & Ro) Depth perception, 3D, RGB-D cameras (slides) - 09/06
Finlayson, Zhang, & Golomb, 2017; Livingstone & Hubel, 1988
Week 3 (Ro) Visual neuroanatomy: from V1 to higher cerebral cortex - 09/13
Felleman & Van Essen, 1991
Week 4 (Ro) Attentional biasing, training - 09/27 (no class on September 20)
Corbetta & Shulman, 2011; Itti, Koch, & Niebur, 1998
Week 5 (Zhu) Deep neural networks I: from SVM to DBN (slides) - 10/04
Boser, Guyon, & Vapnik, 1992; Hinton, 2009; Salakhutdinov & Hinton, 2009; Bengio, 2009; Hinton & Salakhutdinov, 2006; Norman, Polyn, Detre, & Haxby, 2006
Week 6 (Zhu) Deep neural networks II: from linear regression to CNN, RNN and LSTM (slides) - 10/11
LeCun & Bengio, 1995; Krizhevsky, Sutskever, & Hinton, 2012; Yamins & DiCarlo, 2016
Week 7 (Zhu & Ro) Motion detection and perception, optic flow (slides) - 10/18
Huk, Dougherty, & Heeger, 2002
Week 8 (Ro & Zhu) Color perception (slides) - 10/25
Conway, Moeller, & Tsao, 2007
Week 9 (Ro) Object recognition - 11/01
Kanwisher, 2010
Week 10 (Zhu) Multimodal classification with SVM, DBN and CNN (slides) - 11/08
Dinerstein, Dinerstein, & Ventura, 2007; Wang et al., 2015; Srivastava & Salakhutdinov, 2014; Srivastava & Salakhutdinov, 2012; Ngiam et al., 2011
Week 11 (Ro) Human face perception - 11/15
Grill-Spector, Weiner, Kay, & Gomez, 2017
Week 12 (Zhu & Ro) Face and emotion recognition with deep models (slides) - 11/22
Li, Abtahi, & Zhu, 2017; Li, Abtahi, Zhu, & Yin, 2017; Li, Abtahi, Tsangouri, & Zhu, 2016; Li, Abtahi, & Zhu, 2015; Li, Su, Li, & Zhu, 2015
Week 13 (Ro & Zhu) Visual substitution devices and prosthetics - 11/29
Week 14 (Students) Reading and project presentations - 12/06
Reading List
- Bengio, Y. (2009). Learning deep architectures for AI.
Foundations and Trends® in Machine Learning, 2(1), 1–127.
- Boser, B. E., Guyon, I. M., & Vapnik, V. N. (1992). A training
algorithm for optimal margin classifiers. In Proceedings of the
Fifth Annual Workshop on Computational Learning Theory. ACM.
- Conway, B. R., Moeller, S., & Tsao, D. Y. (2007).
Specialized color modules in macaque extrastriate cortex.
Neuron, 56(3), 560–573.
https://doi.org/10.1016/j.neuron.2007.10.008
- Corbetta, M., & Shulman, G. L. (2011). Spatial neglect and
attention networks. Annual Review of Neuroscience, 34, 569–599.
https://doi.org/10.1146/annurev-neuro-061010-113731
- Dinerstein, S., Dinerstein, J., & Ventura, D. (2007). Robust
multi-modal biometric fusion via multiple SVMs. In IEEE
International Conference on Systems, Man and Cybernetics (ISIC
2007). IEEE.
- Felleman, D. J., & Van Essen, D. C. (1991). Distributed
hierarchical processing in the primate cerebral cortex. Cerebral
Cortex, 1(1), 1–47.
- Finlayson, N. J., Zhang, X., & Golomb, J. D. (2017).
Differential patterns of 2D location versus depth decoding along
the visual hierarchy. NeuroImage, 147, 507–516.
https://doi.org/10.1016/j.neuroimage.2016.12.039
- Grill-Spector, K., Weiner, K. S., Kay, K., & Gomez, J.
(2017). The functional neuroanatomy of human face perception.
Annual Review of Vision Science.
https://doi.org/10.1146/annurev-vision-102016-061214
- Hinton, G. E. (2009). Deep belief networks. Scholarpedia,
4(5), 5947.
- Hinton, G. E., & Salakhutdinov, R. R. (2006). Reducing the
dimensionality of data with neural networks. Science, 313(5786),
504–507. https://doi.org/10.1126/science.1127647
- Hubel, D. H., & Wiesel, T. N. (1977). Ferrier lecture:
Functional architecture of macaque monkey visual cortex.
Proceedings of the Royal Society of London. Series B, Biological
Sciences, 198(1130), 1–59.
- Huk, A. C., Dougherty, R. F., & Heeger, D. J. (2002).
Retinotopy and functional subdivision of human areas MT and MST.
Journal of Neuroscience, 22(16), 7195–7205.
- Itti, L., Koch, C., & Niebur, E. (1998). A model of
saliency-based visual attention for rapid scene analysis. IEEE
Transactions on Pattern Analysis and Machine Intelligence,
20(11), 1254–1259.
- Kanwisher, N. (2010). Functional specificity in the human
brain: A window into the functional architecture of the mind.
Proceedings of the National Academy of Sciences of the United
States of America, 107(25), 11163–11170.
https://doi.org/10.1073/pnas.1005062107
- Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet
classification with deep convolutional neural networks. Advances
in Neural Information Processing Systems, 1097–1105.
- LeCun, Y., & Bengio, Y. (1995). Convolutional networks for
images, speech, and time series. In The Handbook of Brain Theory
and Neural Networks, 3361(10).
- Li, W., Abtahi, F., & Zhu, Z. (2017). Action unit detection
with region adaptation, multilabeling learning and optimal
temporal fusing. In IEEE Conference on Computer Vision and Pattern
Recognition (CVPR 2017), July 21–26, 2017, Honolulu, Hawaii, USA.
- Li, W., Abtahi, F., Zhu, Z., & Yin, L. (2017). EAC-Net: A
region-based deep enhancing and cropping approach for facial
action unit detection. In The 12th IEEE International Conference
on Automatic Face and Gesture Recognition (FG 2017), May 30–June 3,
2017, Washington, DC.
- Li, W., Abtahi, F., Tsangouri, C., & Zhu, Z. (2016). Towards an
"in-the-wild" emotion dataset using a game-based framework. In
Affect in the Wild Workshop, 2016 IEEE Conference on Computer
Vision and Pattern Recognition (CVPR'16) Workshops (CVPRW).
- Li, W., Abtahi, F., & Zhu, Z. (2015). A deep feature based
multi-kernel learning approach for video emotion recognition. In
Emotion Recognition in the Wild (EmotiW) Challenge 2015, 17th ACM
International Conference on Multimodal Interaction (ICMI 2015),
November 9–13, 2015, Seattle, USA.
- Li, W., Su, Z., Li, M., & Zhu, Z. (2015). A deep-learning
approach to facial expression recognition with candid images. In
The 14th IAPR Conference on Machine Vision Applications (MVA
2015), May 18–22, 2015, Tokyo.
- Livingstone, M. S., & Hubel, D. H. (1988). Segregation of
form, color, movement and depth: Anatomy, physiology and
perception. Science, 240, 740–749.
- Ngiam, J., et al. (2011). Multimodal deep learning. In
Proceedings of the 28th International Conference on Machine
Learning (ICML-11).
- Norman, K. A., Polyn, S. M., Detre, G. J., & Haxby, J. V.
(2006). Beyond mind-reading: multi-voxel pattern analysis of
fMRI data. Trends in Cognitive Sciences, 10(9), 424–430.
https://doi.org/10.1016/j.tics.2006.07.005
- Salakhutdinov, R., & Hinton, G. E. (2009). Deep Boltzmann
machines. In International Conference on Artificial Intelligence
and Statistics.
- Srivastava, N., & Salakhutdinov, R. (2012). Learning
representations for multimodal data with deep belief nets. In
International Conference on Machine Learning Workshop.
- Srivastava, N., & Salakhutdinov, R. R. (2014). Multimodal
learning with deep Boltzmann machines. Journal of Machine
Learning Research, 15, 2949–2980.
- Wang, A., et al. (2015). Large-margin multi-modal deep learning
for RGB-D object recognition. IEEE Transactions on Multimedia,
17(11), 1887–1898.
- Yamins, D. L. K., & DiCarlo, J. J. (2016). Using
goal-driven deep learning models to understand sensory cortex.
Nature Neuroscience, 19(3), 356–365.
https://doi.org/10.1038/nn.4244
Copyright © Tony Ro and Zhigang Zhu, Fall 2017