Project and Mentoring Page

Students have produced projects during independent studies, as part of one of my courses, or as lab projects. Examples include:

I have also informally provided technical mentoring to several Zahn Prize winners: Jeremy Neiman (though not on his winning 2013 Zahn project), Amali Nassereddine and Teona Lazashvili, who won the Zahn Prize in 2014, and Shawn Augustine, whose buildonthego won the Zahn Prize in 2015.

Mentoring Guidelines

Frequently students want to work with me on data science or web development projects. Before you ask for an appointment, consider the following:

  1. Your time: You will need to commit a significant amount of time if you want to get anything but frustration out of this experience. I can promise, though, that you will learn a lot if you commit to it. If you can't devote a significant amount of time to working in the lab on a project, you won't be able to make much progress. Anticipate spending somewhere between 5 and 20 hrs every week on lab work, with at least 4 hrs per week sitting in the lab on a regular schedule. If you can only come in once a week for an hour between classes, you are not going to learn anything useful, so don't bother. You will need to be around the lab to talk to other students, especially senior students who are familiar with the work.

    Some students looking for a project try to "sample" working in multiple research labs at the same time. They may also try to work largely from home and come in only occasionally, or to do this while balancing a full load of classes. Overloading like this has so far not been successful for any of the students I have seen. The pattern is wild enthusiasm for 1 month, followed by gradually increasing stress, followed by vanishing as classes become more demanding.

  2. My time: I am usually in my lab, NAC 7/311, Monday through Thursday; early in the morning is often the best time to catch me. Make an appointment by emailing grossberg@cs.ccny.cuny.edu

    My inbox is often full and sometimes it takes me a while to get back, so you may have to follow up a few times. If you are expecting to be able to drift in late in the afternoon, I will often be busy or gone. Plan accordingly. Again, if you decide you want to take on a project, please carefully evaluate your ongoing time commitments. It is a big waste of everybody's time, and disheartening for you, to start and then leave. Be decisive and stick to it.

  3. Experience required: I almost never have well-defined tasks that you can jump into with little or no experience. I often provide toy versions of the problem, but these are essentially "homework exercises" to familiarize you with the data and tools specific to a given project. If they take more than 1-2 weeks to complete, then there is no point in completing them, as they are intended solely to get you quickly up to speed. While they are instructive for learning, they have often been done numerous times in the past and are therefore not projects in and of themselves.

  4. Progress and review: Project goals will often be vague. A fundamental difference between research projects and homework exercises is that not only is the answer not in the back of the book, but we often start out asking the wrong question. One only finds this out by getting some kind of answer (quickly) and seeing if it is in the right direction. This is why short feedback loops are critical.

    You need to be prepared to:

    • do some research
    • build something
    • show me
    • get feedback

    That loop should ideally happen on a weekly basis. If necessary for your project, you will also be working with a senior student in the lab. You are responsible for keeping me updated on your progress; this can be accomplished through emails, meetings, repository check-ins, and other methods as appropriate for the project.

Project Guidelines

  1. Prior work: For any project you work on, you will want to thoroughly research previous work on the topic. If you find that it has been done before, you must have a good reason for why you are redoing it. Often, a tiny technical reason (like implementing the project in a new language) is not a good one. You need to improve on the existing work somehow: by adding important features, making it open source, reducing the resources it uses, improving the user interface, etc.

    In searching for prior work, make sure you don't limit yourself to the few keywords you first think of. Try synonyms and more general topics. Read articles, even on Wikipedia, to get some idea of the area around your project topic; this will also yield more keywords to search for.

    Don't just rely on Google. Use other search engines as well such as Bing. Also explore: vlrc and iseek. For academic papers and research:

  2. Big Tools: Don't get lost in big tools. Some students get very involved in learning a system, a programming language, or a library and never see their way out. Be critically skeptical about completely diving into some new thing because it is new and you think it will magically solve all your problems.

    Very often it solves some problems, but the ones it solves may not be core to what you want to get done. Sometimes an old and dusty library in Fortran actually does almost exactly what you need. Figure out why you need this thing, for what, and how much of it you really need. Only learn as much as you need to.

    For example, if you are doing a data science project, then you don't need to learn everything about Python software development. It may not even be absolutely essential that you master classes, and it is likely you won't need to know Python properties. In this situation, a general programming book for the language is a poor place to start. Look at the examples related to your task list and flip back to general references to figure out what you don't understand.

  3. Avoid the "Not Invented Here" Syndrome: NIH is often a problem with stronger developers who "just want to code." They don't want to spend hours learning some third-party library. They also feel they will learn more if they try to build something from the ground up. At least in this context (projects here), you don't want to write a line of code that you don't have to; in other words, be lazy in the right way.

    For example: if somebody has already built a good machine learning library, that's a whole lot of code you won't have to debug. If it is an established library, its authors have often already hit the dead ends and made the mistakes, so you don't have to. If it doesn't do what you need, or has bugs or gaps the authors didn't think of, then you can always extend it and learn that way. If it is open source, you can also contribute back to the project, thereby improving a tool used by many people.
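
As a rough illustration of points 2 and 3 above, the sketch below does a small analysis task using only plain functions, a dictionary, and Python's standard `csv` and `statistics` modules: no classes, no properties, and no hand-written averaging code. The station names and readings are made up for illustration, and `mean_by_group` is a hypothetical helper, not part of any lab code.

```python
import csv
import io
import statistics
from collections import defaultdict

# Made-up readings for illustration only; a real project would read a file.
RAW = """station,temp_c
A,20.0
A,22.0
B,15.0
B,17.0
B,19.0
"""


def mean_by_group(text, key, value):
    """Return {group: mean of value} for CSV text, using only the stdlib."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        groups[row[key]].append(float(row[value]))
    # statistics.mean is already written and tested; no need to hand-roll it.
    return {name: statistics.mean(vals) for name, vals in groups.items()}


result = mean_by_group(RAW, "station", "temp_c")
print(result)
```

In a real project the same pattern would more likely lean on pandas or a similar established library; the standard library is used here only to keep the sketch self-contained.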

Technical Resources

We mostly work with some flavor of unix (Mac OSX included) in my lab. Unix skills are not optional.

Linux

We mostly use Ubuntu in our own lab, but some servers run CentOS, which is also what the CS department uses. I recommend Ubuntu as it is very popular, well supported, and generally well documented. Learn Command Line the Hard Way is a good tutorial on the typical way of working with the Linux terminal (the shell).

Windows

Depending on the project, you may be able to work solely on Windows. If you do not need a full Unix environment, Cygwin and Git Bash provide excellent Unix-like terminal environments, and Anaconda Python is the easiest way to get a working scientific Python stack on Windows.

Sometimes, though, you will need to run a Linux operating system on Windows. You can use VirtualBox to emulate your hardware and then use Ubuntu as your guest OS. You can also find pre-built VirtualBox images. Alternatively, you will get better performance if you dual boot, but that may involve tweaking your hardware.

OS X

OS X is a Unix (BSD) based operating system, not a Linux one; all the typical Unix terminal commands are therefore built in. You should access them using the Terminal or, better yet, install iTerm2. Also install the Homebrew package manager and avoid "fink" or MacPorts, as they have become problematic over the years.

Software Carpentry

The Software Carpentry web site has excellent, highly condensed materials for learning scientific programming. If you can, go through them quickly and do the hands-on exercises before you come in for a project. Make sure you go through the sections on:

• The Unix Shell
• Version Control with Git
• Programming with Python

Data Science

What you need to know will depend on the project. It will probably help if you install Anaconda Python and become familiar with using Jupyter notebooks.

A good place to start with scientific Python is the Scipy Lecture Notes. You are going to want to have the basics of numpy, scipy, and matplotlib, so go through:

• chapter 1: all sections
• chapter 2: 2.6 "Image manipulation and processing using Numpy and Scipy"
• chapter 3: most of our projects involve some of:
    • 3.1 pandas: Statistics in Python
    • 3.3 scikit-image: image processing
    • 3.6 scikit-learn: machine learning in Python

Remote Sensing and Climate:

Most of our work that deals with remote sensing requires putting things on a map, so you should also know how to use the basemap plotting library. Go through:

• Visualizing Earthquakes
• Plotting Satellite images
• Plotting netcdf data

If you want to dive deeper into basemap, there is the basemap tutorial. For some great free courses on climate and remote sensing, check out MetEd.

Medical Imaging

For medical imaging work you will need to be able to read DICOM files, so please go through the pydicom tutorial. If you can get it installed, MedPy is a new and promising library.

Web Development

The best introduction would be my class CSc47300 Web Site Design (Web Development).

Most of the projects use a Python-based backend, usually either Django or Flask. For the front end, we have mostly been using jQuery and D3.js for 2D visualization, and Three.js for 3D. You should also be comfortable with HTML5/CSS3, and with either Twitter Bootstrap or the Foundation framework. At least a passing familiarity with React.js would also be beneficial.