7 Steps to Understanding Deep Learning

Deep learning is a branch of machine learning, employing numerous similar, yet distinct, deep neural network architectures to solve various problems in natural language processing, computer vision, and bioinformatics, among other fields. Deep learning has experienced a tremendous recent research resurgence, and has been shown to deliver state-of-the-art results in numerous applications.

In essence, deep learning is the implementation of neural networks with more than a single hidden layer of neurons. This is, however, a very simplistic view of deep learning, and not one that is unanimously agreed upon. These “deep” architectures also vary quite considerably, with different implementations being optimized for different tasks or goals. The sheer volume of research being produced in this area is revealing new and innovative deep learning models at an ever-increasing pace.
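
To make the “more than a single hidden layer” idea concrete, here is a minimal sketch of my own (not from any of the resources below) of a forward pass through a network with two hidden layers, using NumPy and arbitrary layer sizes chosen purely for illustration:

    import numpy as np

    rng = np.random.default_rng(0)

    # arbitrary sizes: 4 inputs, two hidden layers of 8 and 6 units, 3 outputs
    W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
    W2, b2 = rng.normal(size=(8, 6)), np.zeros(6)
    W3, b3 = rng.normal(size=(6, 3)), np.zeros(3)

    x = rng.normal(size=(1, 4))      # a single input example
    h1 = np.tanh(x @ W1 + b1)        # first hidden layer
    h2 = np.tanh(h1 @ W2 + b2)       # second hidden layer -- the "more than one" part
    output = h2 @ W3 + b3            # output layer
    print(output.shape)              # (1, 3)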

Currently a white hot research topic, deep learning seems to be impacting all areas of machine learning and, by extension, data science. A look over recent papers in the relevant arXiv categories makes it easy to see that a large amount of what is being published is deep learning-related. Given the impressive results being produced, many researchers, practitioners, and laypeople alike are wondering if deep learning is the edge of “true” artificial intelligence.

This collection of reading materials and tutorials aims to provide a path for a deep neural networks newcomer to gain some understanding of this vast and complex topic. Though I do not assume any real understanding of neural networks or deep learning, I will assume some degree of familiarity with general machine learning theory and practice. To overcome any deficiency you may have in the general areas of machine learning theory or practice, you can consult the recent KDnuggets post 7 Steps to Mastering Machine Learning With Python. Since we will also see examples implemented in Python, some familiarity with the language will be useful. Introductory and review resources are also available in the previously mentioned post.

This post will utilize freely-available materials from around the web in a cohesive order to first gain some understanding of deep neural networks at a theoretical level, and then move on to some practical implementations. As such, credit for the materials referenced lies solely with the creators, who will be noted alongside the resources. If you see that someone has not been properly credited for their work, please alert me to the oversight so that I may swiftly rectify it.

A stark and honest disclaimer: deep learning is a complex and quickly-evolving field of both breadth and depth (pun unintended?), and as such this post does not claim to be an all-inclusive manual to becoming a deep learning expert; such a transformation would take greater time, many additional resources, and lots of practice building and testing models. I do, however, believe that utilizing the resources herein could help get you started on just such a path.

Step 1: Introducing Deep Learning

If you are reading this and interested in the topic, then you are probably already familiar with what deep neural networks are, even if only at a basic level. Neural networks have a storied history, but we won’t be getting into that. We do, however, want a common high-level understanding to begin with.

First, have a look at the fantastic introductory videos from DeepLearning.tv. At the time of this writing there are 14 videos; watch them all if you like, but definitely watch the first 5, covering the basics of neural nets and some of the more common architectures.

Next, read over the NIPS 2015 Deep Learning Tutorial by Geoff Hinton, Yoshua Bengio, and Yann LeCun for an introduction at a slightly lower level.

To round out our first step, read the first chapter of Neural Networks and Deep Learning, the fantastic, evolving online book by Michael Nielsen, which goes a step further but still keeps things fairly light.

Step 2: Getting Technical

Deep neural nets rely on a mathematical foundation of algebra and calculus. While this post will not produce any theoretical mathematicians, gaining some understanding of the basics before moving on would be helpful.

First, watch Andrew Ng’s linear algebra review videos. While not absolutely necessary, those wanting to go deeper on this subject can consult the Linear Algebra Review and Reference from Ng’s Stanford course, written by Zico Kolter and Chuong Do.

Then look at this Introduction to the Derivative of a Function video by Professor Leonard. The video is succinct, the examples are clear, and it provides some understanding of what is actually going on during backpropagation from a mathematical standpoint. More on that soon.

Next have a quick read over the Wikipedia entry for the Sigmoid function, a bounded, differentiable function often employed as an activation function by individual neurons in a neural network.
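
For quick reference, the sigmoid squashes any real-valued input into the range (0, 1), and its derivative can be written in terms of the function itself, which is part of what makes it so convenient for the backpropagation step coming up. A minimal sketch:

    import numpy as np

    def sigmoid(x):
        # sigma(x) = 1 / (1 + e^(-x)), bounded between 0 and 1
        return 1.0 / (1.0 + np.exp(-x))

    def sigmoid_prime(x):
        # the derivative in terms of the function itself: sigma'(x) = sigma(x) * (1 - sigma(x))
        s = sigmoid(x)
        return s * (1.0 - s)

    print(sigmoid(0.0))        # 0.5
    print(sigmoid_prime(0.0))  # 0.25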

Finally, take a break from the maths and read this Deep Learning Tutorial by Google research scientist Quoc Le.

Step 3: Backpropagation and Gradient Descent

An important part of neural networks, including modern deep architectures, is the backward propagation of errors through a network in order to update the weights used by neurons closer to the input. This is, quite bluntly, where neural networks derive their “power,” for lack of a better term. Called backpropagation for short (or even “backprop”), this procedure computes the gradient of the loss function with respect to the network’s weights; it is paired with an optimization method which uses these gradients to adjust the weights and thereby minimize the loss function. A common optimization method in deep neural networks is gradient descent.

First, read these introductory notes on gradient descent by Marc Toussaint of the University of Stuttgart.
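
To make the idea concrete alongside the notes, here is a minimal sketch of gradient descent applied to a simple one-variable function, f(x) = (x - 3)^2, whose minimum we know is at x = 3; the starting point, learning rate, and iteration count are arbitrary choices for the example:

    def f(x):
        return (x - 3.0) ** 2

    def f_prime(x):
        # analytic derivative of f
        return 2.0 * (x - 3.0)

    x = 0.0                # arbitrary starting point
    learning_rate = 0.1    # arbitrary step size

    for step in range(50):
        x = x - learning_rate * f_prime(x)   # step against the gradient

    print(x)  # close to 3.0, the minimizer of f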

Next, have a look at this step-by-step example of backpropagation in action, written by Matt Mazur.

Moving on, read Jeremy Kun’s informative blog post on coding backpropagation in Python. Having a look over the complete code is also suggested, as is attempting to replicate it yourself.
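
If you would like a compact reference while doing so, the following is a minimal sketch of my own (not Kun’s code) of backpropagation and gradient descent in plain NumPy, training a single-hidden-layer network on the XOR problem; the layer size, learning rate, and iteration count are arbitrary choices:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # toy dataset: XOR
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)

    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))   # input -> hidden
    W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))   # hidden -> output
    lr = 0.5

    for _ in range(10000):
        # forward pass
        h = sigmoid(X @ W1 + b1)
        out = sigmoid(h @ W2 + b2)

        # backward pass: propagate the error from the output back to the hidden layer
        out_delta = (out - y) * out * (1 - out)      # chain rule through the output sigmoid
        h_delta = (out_delta @ W2.T) * h * (1 - h)   # chain rule through the hidden sigmoid

        # gradient descent update on every weight and bias
        W2 -= lr * (h.T @ out_delta)
        b2 -= lr * out_delta.sum(axis=0, keepdims=True)
        W1 -= lr * (X.T @ h_delta)
        b1 -= lr * h_delta.sum(axis=0, keepdims=True)

    print(out.round(2))  # should end up close to [[0], [1], [1], [0]]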

Finally, read the second part of the Deep Learning Tutorial by Quoc Le, in order to get introduced to some specific common deep architectures and their uses.

Step 4: Getting Practical

The specific neural network architectures that will be introduced in the following steps include practical implementations using some of the most popular Python deep learning libraries in use in research today. Since different libraries are, in some cases, optimized for particular neural network architectures, and have established footholds in certain fields of research, we will be making use of 3 separate deep learning libraries. This is not redundant; keeping up with the latest libraries for particular areas of practice is a critical part of learning. The following exercises will also allow you to evaluate the different libraries for yourself, and form an intuition as to which to use for which problems.

At this point you are welcome to choose any library or combination of libraries to install, and move forward implementing those tutorials which pertain to your choice. If you are looking to try one library and use it to implement one of each of the following steps’ tutorials, I would recommend TensorFlow, for a few reasons. I will mention the most relevant (at least, in my view): it performs auto-differentiation, meaning that you (or, rather, the tutorial) do not have to worry about implementing backpropagation from scratch, which likely makes the code easier to follow (especially for a newcomer).
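
To make the auto-differentiation point concrete, here is a minimal sketch; note that it assumes a current TensorFlow 2.x installation and its GradientTape API, a newer interface than the graph-and-session style TensorFlow exposed when this post was written, but the point is the same: the gradient is computed for you.

    import tensorflow as tf

    # a single weight and a toy squared-error loss
    w = tf.Variable(2.0)
    x, y = 3.0, 15.0

    with tf.GradientTape() as tape:
        y_pred = w * x
        loss = (y_pred - y) ** 2

    # TensorFlow differentiates the loss with respect to w for us --
    # no hand-written backpropagation required
    grad = tape.gradient(loss, w)
    print(grad.numpy())  # d(loss)/dw = 2 * (w*x - y) * x = -54.0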

I wrote about TensorFlow when it first came out in the post TensorFlow Disappoints – Google Deep Learning Falls Shallow, the title of which suggests that I had more disappointment with it than I actually did; I was primarily focused on its lack of GPU cluster-enabled network training (which is likely soon on its way). Anyhow, if you are interested in reading more about TensorFlow without consulting the whitepaper listed below, I would suggest reading my original article, and then following up with Zachary Lipton’s well-written piece, TensorFlow is Terrific – A Sober Take on Deep Learning Acceleration.

TensorFlow

Google’s TensorFlow is an all-purpose machine learning library based on a data flow graph representation.

Theano

Theano is actively developed by the LISA group at the University of Montreal.

Caffe

Caffe is developed by the Berkeley Vision and Learning Center (BVLC) at UC Berkeley. While Theano and TensorFlow can be considered “general-purpose” deep learning libraries, Caffe, having been developed by a computer vision group, is mostly thought of for such tasks; however, it is also a fully general-purpose library for building various deep learning architectures for different domains.

Keep in mind that these are not the only popular libraries in use today. In fact, there are many, many others to choose from, and these were selected based on the prevalence of tutorials and documentation, and their acceptance in research in general.

Other deep learning library options include:

  • Keras – a high-level, minimalist Python neural network library for Theano and TensorFlow
  • Lasagne – a lightweight Python neural network library built atop Theano
  • Torch – Lua machine learning algorithm library
  • Deeplearning4j – open source, distributed deep learning library for Java and Scala
  • Chainer – a flexible, intuitive Python neural network library
  • Mocha – a deep learning framework for Julia

With libraries installed, we now move on to practical implementation.

Step 5: Convolutional Neural Nets and Computer Vision

Computer vision deals with the processing and understanding of images and their symbolic information. Most of the field’s recent breakthroughs have come from the use of deep neural networks. In particular, convolutional neural networks have played a very important role in computer vision of late.

First, read this deep learning with computer vision tutorial by Yoshua Bengio, in order to gain an understanding of the topic.

Next, if you have TensorFlow installed, take a look at, and implement, this tutorial, which classifies CIFAR-10 images using a convolutional neural network.
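
For orientation (this is not the tutorial’s code, just a sketch of the general shape of such a model, assuming the Keras API that ships with recent TensorFlow versions), a small convolutional network for 32x32 colour images might look like this:

    import tensorflow as tf
    from tensorflow.keras import layers, models

    # a small convolutional network for 32x32 RGB images and 10 classes,
    # in the spirit of (but much simpler than) the CIFAR-10 tutorial
    model = models.Sequential([
        layers.Input(shape=(32, 32, 3)),
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(10),  # one logit per class
    ])

    model.compile(optimizer="adam",
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  metrics=["accuracy"])
    model.summary()
    # training would then be a single call, for example:
    # model.fit(train_images, train_labels, epochs=10, validation_data=(test_images, test_labels))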

If you have Caffe installed, as an alternative to the above tutorial (or alongside), implement a convolutional neural network in Caffe for classifying MNIST dataset images.

Here is a Theano tutorial which is roughly equivalent to the above Caffe exercise.

Afterward, read a seminal convolutional neural network paper by Krizhevsky, Sutskever, and Hinton for additional insight.

Step 6: Recurrent Nets and Language Processing

Natural language processing (NLP) is another domain which has seen benefits from deep learning. Concerned with understanding natural (human) languages, NLP has had a number of its most recent successes come by way of recurrent neural networks (RNN).

Andrej Karpathy has a fantastic blog post titled “The Unreasonable Effectiveness of Recurrent Neural Networks” which outlines the effectiveness of RNNs in training character-level language models. The code it references is written in Lua using Torch, so you can skip over that; the post is still useful on a purely conceptual level.

This tutorial implements a recurrent neural network in TensorFlow for language modeling.
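
Again for orientation only (this is not the linked tutorial’s code), here is a minimal sketch of the shape of a word-level recurrent language model, assuming the Keras API bundled with recent TensorFlow versions; the vocabulary size, sequence length, and layer sizes are placeholder values:

    import numpy as np
    import tensorflow as tf
    from tensorflow.keras import layers, models

    vocab_size = 10000   # placeholder: number of words in the vocabulary
    seq_length = 35      # placeholder: words per training sequence

    # an embedding layer feeds a recurrent (LSTM) layer, which produces
    # a score over the vocabulary for the next word at every position
    model = models.Sequential([
        layers.Embedding(vocab_size, 128),
        layers.LSTM(256, return_sequences=True),
        layers.Dense(vocab_size),  # logits over the vocabulary
    ])

    # a dummy batch of word-id sequences, just to show the shapes involved
    dummy_batch = np.random.randint(0, vocab_size, size=(4, seq_length))
    logits = model(dummy_batch)
    print(logits.shape)  # (4, 35, 10000): next-word logits at every position
    # training would pair each input sequence with the same sequence shifted by one word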

You can then use Theano and try your hand at this tutorial, which implements a recurrent neural network with word embeddings.

Finally, you can read Yoon Kim’s Convolutional Neural Networks for Sentence Classification for another application of CNNs in language processing. Denny Britz has a blog post titled “Implementing A CNN For Text Classification in TensorFlow,” which does just as it suggests using movie review data.

Step 7: Further Topics

The previous steps have progressed from theoretical to practical topics in deep learning. By installing the libraries and implementing convolutional and recurrent neural nets in the previous 2 steps, it is hoped that you have gained a preliminary appreciation for their power and functionality. As prevalent as CNNs and RNNs are, there are numerous other deep architectures in existence, with additional architectures emerging from research on a regular basis.

There are also numerous other considerations for deep learning beyond those presented in the earlier theoretical steps, and as such, the following is a quick survey of some of these additional architectures and concerns.

For a further understanding of a particular type of recurrent neural network suited to time series prediction, the Long Short-Term Memory (LSTM) network, read this article by Christopher Olah.

This blog post by Denny Britz is a great tutorial on RNNs using LSTMs and Gated Recurrent Units (GRUs). See this paper for a further discussion of GRUs and LSTMs.
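
As a small practical aside, in Keras-style code the two gated layer types discussed in these posts are largely interchangeable at the API level; swapping one for the other is typically a one-line change (the layer size here is an arbitrary placeholder):

    from tensorflow.keras import layers

    # the two gated recurrent layer types; either could drop into the
    # language model sketch from Step 6
    lstm_layer = layers.LSTM(256, return_sequences=True)
    gru_layer = layers.GRU(256, return_sequences=True)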

This clearly does not cover all deep learning architectures. Restricted Boltzmann Machines are an obvious exclusion which comes to mind, as are autoencoders, and a whole series of related generative models including Generative Adversarial Networks. However, a line had to be drawn somewhere, or this post would continue ad infinitum.

For those interested in learning more about various deep learning architectures, I suggest this lengthy survey paper by Yoshua Bengio.

For our final number, and for something a bit different, have a look at A Statistical Analysis of Deep Learning by Shakir Mohamed of Google DeepMind. It is more theoretical and (no surprise) more statistical than much of the other material we have encountered, but worth looking at for a different approach to familiar matter. Shakir wrote the series of articles over the course of 6 months, and it is presented as testing widely-held beliefs, highlighting statistical connections, and exploring the unseen implications of deep learning. There is a combined PDF of all posts as well.

It is hoped that enough information has been presented to give the reader an introductory overview of deep neural networks, as well as provide some incentive to move forward and learn more on the topic.

Original source: http://www.kdnuggets.com/2016/01/seven-steps-deep-learning.html/1
