7 Steps to Understanding Deep Learning

Deep learning is a branch of machine learning, employing numerous similar, yet distinct, deep neural network architectures to solve various problems in natural language processing, computer vision, and bioinformatics, among other fields. Deep learning has experienced a tremendous recent research resurgence, and has been shown to deliver state of the art results in numerous applications.

In essence, deep learning is the implementation of neural networks with more than a single hidden layer of neurons. This is, however, a very simplistic view of deep learning, and not one that is unanimously agreed upon. These “deep” architectures also vary quite considerably, with different implementations being optimized for different tasks or goals. The vast research being produced at such a constant rate is revealing new and innovative deep learning models at an ever-increasing pace.

Neural Network

Currently a white-hot research topic, deep learning seems to be impacting all areas of machine learning and, by extension, data science. A look over recent papers in the relevant arXiv categories makes it easy to see that a large amount of what is being published is deep learning-related. Given the impressive results being produced, many researchers, practitioners, and laypeople alike are wondering if deep learning is the edge of “true” artificial intelligence.

This collection of reading materials and tutorials aims to provide a path for a deep neural networks newcomer to gain some understanding of this vast and complex topic. Though I do not assume any real understanding of neural networks or deep learning, I will assume your familiarity with general machine learning theory and practice to some degree. To overcome any deficiency you may have in the general areas of machine learning theory or practice you can consult the recent KDnuggets post 7 Steps to Mastering Machine Learning With Python. Since we will also see examples implemented in Python, some familiarity with the language will be useful. Introductory and review resources are also available in the previously mentioned post.

This post will utilize freely-available materials from around the web in a cohesive order to first gain some understanding of deep neural networks at a theoretical level, and then move on to some practical implementations. As such, credit for the materials referenced lies solely with the creators, who will be noted alongside the resources. If you see that someone has not been properly credited for their work, please alert me to the oversight so that I may swiftly rectify it.

A stark and honest disclaimer: deep learning is a complex and quickly-evolving field of both breadth and depth (pun unintended?), and as such this post does not claim to be an all-inclusive manual to becoming a deep learning expert; such a transformation would take greater time, many additional resources, and lots of practice building and testing models. I do, however, believe that utilizing the resources herein could help get you started on just such a path.

Step 1: Introducing Deep Learning

If you are reading this and interested in the topic, then you are probably already familiar with what deep neural networks are, if even at a basic level. Neural networks have a storied history, but we won’t be getting into that. We do, however, want a common high level of understanding to begin with.

First, have a look at the fantastic introductory videos from DeepLearning.tv. At the time of this writing there are 14 videos; watch them all if you like, but definitely watch the first 5, covering the basics of neural nets and some of the more common architectures.

Next, read over the NIPS 2015 Deep Learning Tutorial by Geoff Hinton, Yoshua Bengio, and Yann LeCun for an introduction at a slightly lower level.

To round out our first step, read the first chapter of Neural Networks and Deep Learning, the fantastic, evolving online book by Michael Nielsen, which goes a step further but still keeps things fairly light.

Step 2: Getting Technical

Deep neural nets rely on a mathematical foundation of algebra and calculus. While this post will not produce any theoretical mathematicians, gaining some understanding of the basics before moving on would be helpful.

First, watch Andrew Ng’s linear algebra review videos. While not absolutely necessary, for those finding they want something deeper on this subject, consult the Linear Algebra Review and Reference from Ng’s Stanford course, written by Zico Kolter and Chuong Do.

Then look at this Introduction to the Derivative of a Function video by Professor Leonard. The video is succinct, the examples are clear, and it provides some understanding of what is actually going on during backpropagation from a mathematical standpoint. More on that soon.
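To make the limit definition of the derivative concrete, here is a minimal Python sketch (my own toy example, not from the video) that approximates a derivative numerically with a small but finite step:

```python
def numerical_derivative(f, x, h=1e-6):
    # Central-difference approximation of f'(x): the limit definition of
    # the derivative, evaluated with a small but finite step h.
    return (f(x + h) - f(x - h)) / (2 * h)

# The slope of f(x) = x**2 at x = 3 should be f'(3) = 6.
print(numerical_derivative(lambda x: x ** 2, 3.0))  # approximately 6.0
```

Checks like this, comparing an analytic gradient to a finite difference, are also a standard way of verifying backpropagation implementations (so-called gradient checking).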

Next have a quick read over the Wikipedia entry for the Sigmoid function, a bounded differentiable function often employed by individual neurons in a neural network.
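As a small illustration, the sigmoid and its derivative fit in a few lines of plain Python; the derivative's simple closed form, s(z) * (1 - s(z)), is part of why the function has historically been so convenient for neural networks:

```python
import math

def sigmoid(z):
    # Bounded to (0, 1) and differentiable everywhere.
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    # The derivative reuses the forward value: s(z) * (1 - s(z)),
    # which makes backpropagation through sigmoid units cheap.
    s = sigmoid(z)
    return s * (1.0 - s)

print(sigmoid(0.0))        # 0.5
print(sigmoid_prime(0.0))  # 0.25
```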

Finally, take a break from the maths and read this Deep Learning Tutorial by Google research scientist Quoc Le.

Gradient Descent

Step 3: Backpropagation and Gradient Descent

An important part of neural networks, including modern deep architectures, is the backward propagation of errors through a network in order to update the weights used by neurons closer to the input. This is, quite bluntly, where neural networks derive their “power,” for lack of a better term. Backpropagation for short (or even “backprop”) is paired with an optimization method, which uses the gradients backpropagation computes to adjust the weights so as to minimize the loss function. A common optimization method in deep neural networks is gradient descent.
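As a minimal illustration of gradient descent itself (a one-dimensional toy, not a neural network), repeatedly stepping a single weight against its gradient drives the loss toward its minimum:

```python
# Gradient descent on the toy loss L(w) = (w - 4)**2, minimized at w = 4.
# The same update rule, applied to every weight with gradients supplied
# by backpropagation, is what trains a neural network.

def grad(w):
    # Analytic gradient dL/dw = 2 * (w - 4).
    return 2.0 * (w - 4.0)

w = 0.0    # initial weight
lr = 0.1   # learning rate (step size)
for _ in range(100):
    w -= lr * grad(w)

print(w)  # converges toward 4.0
```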

First, read these introductory notes on gradient descent by Marc Toussaint of the University of Stuttgart.

Next, have a look at this step by step example of backpropagation in action written by Matt Mazur.

Moving on, read Jeremy Kun‘s informative blog post on coding backpropagation in Python. Having a look over the complete code is also suggested, as is attempting to replicate it yourself.
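Independently of the linked posts, here is a small self-contained sketch of backpropagation for a one-hidden-layer sigmoid network in NumPy, with one backpropagated gradient checked against a finite difference (toy random data; all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

X = rng.normal(size=(5, 3))   # 5 toy examples, 3 features
y = rng.normal(size=(5, 1))   # toy targets

W1 = rng.normal(size=(3, 4))  # input -> hidden weights
W2 = rng.normal(size=(4, 1))  # hidden -> output weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(W1, W2):
    # Squared-error loss of the two-layer network.
    out = sigmoid(sigmoid(X @ W1) @ W2)
    return 0.5 * np.sum((out - y) ** 2)

# Backward pass: apply the chain rule layer by layer, output to input.
h = sigmoid(X @ W1)
out = sigmoid(h @ W2)
d_out = (out - y) * out * (1 - out)   # error signal at the output layer
grad_W2 = h.T @ d_out
d_h = (d_out @ W2.T) * h * (1 - h)    # error propagated back to the hidden layer
grad_W1 = X.T @ d_h

# Gradient check: compare one backprop gradient entry to a finite difference.
eps = 1e-6
W1p, W1m = W1.copy(), W1.copy()
W1p[0, 0] += eps
W1m[0, 0] -= eps
numeric = (loss(W1p, W2) - loss(W1m, W2)) / (2 * eps)
print(abs(numeric - grad_W1[0, 0]))  # tiny: backprop agrees with the numerics
```

A full training loop would simply subtract a learning rate times these gradients from W1 and W2 on each iteration.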

Finally, read the second part of the Deep Learning Tutorial by Quoc Le, in order to get introduced to some specific common deep architectures and their uses.

Step 4: Getting Practical

The specific neural network architectures that will be introduced in the following steps will include practical implementations using some of the most popular Python deep learning libraries present in research today. Since different libraries are, in some cases, optimized for particular neural network architectures, and have established footholds in certain fields of research, we will be making use of 3 separate deep learning libraries. This is not redundant; keeping up with the latest libraries for particular areas of practice is a critical part of learning. The following exercises will also allow you to evaluate different libraries for yourself, and form an intuition as to which to use for which problems.

At this point you are welcome to choose any library or combination of libraries to install, and move forward implementing those tutorials which pertain to your choice. If you are looking to try one library and use it to implement one of each of the following steps’ tutorials, I would recommend TensorFlow, for a few reasons. I will mention the most relevant (at least, in my view): it performs auto-differentiation, meaning that you (or, rather, the tutorial) do not have to worry about implementing backpropagation from scratch, likely making code easier to follow (especially for a newcomer).
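To see what auto-differentiation means (conceptually; this is a toy of my own, not how TensorFlow is actually implemented), here is a minimal reverse-mode differentiator: each operation records its local derivatives, and gradients flow backward through the recorded graph, so the user never writes backpropagation by hand:

```python
# A toy reverse-mode automatic differentiator. Frameworks like TensorFlow
# automate exactly this bookkeeping, at much larger scale.

class Var:
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents   # (parent Var, local derivative) pairs
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

    def backward(self, upstream=1.0):
        # Accumulate d(output)/d(self) via the chain rule.
        self.grad += upstream
        for parent, local in self.parents:
            parent.backward(upstream * local)

x = Var(3.0)
y = Var(4.0)
z = x * y + x          # z = x*y + x, so dz/dx = y + 1 = 5, dz/dy = x = 3
z.backward()
print(x.grad, y.grad)  # 5.0 3.0
```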

I wrote about TensorFlow when it first came out in the post TensorFlow Disappoints – Google Deep Learning Falls Shallow, the title of which suggests that I had more disappointment with it than I actually did; I was primarily focused on its lack of GPU cluster-enabled network training (which is likely soon on its way). Anyhow, if you are interested in reading more about TensorFlow without consulting the whitepaper listed below, I would suggest reading my original article, and then following up with Zachary Lipton’s well-written piece, TensorFlow is Terrific – A Sober Take on Deep Learning Acceleration.


Google’s TensorFlow is an all-purpose machine learning library based on data flow graph representation.


Theano is actively developed by the LISA group at the University of Montreal.


Caffe is developed by the Berkeley Vision and Learning Center (BVLC) at UC Berkeley. While Theano and TensorFlow can be considered “general-purpose” deep learning libraries, Caffe, being developed by a computer vision group, is mostly thought of for just such tasks; however, it is also a fully general-purpose library for building various deep learning architectures for different domains.

Keep in mind that these are not the only popular libraries in use today. In fact, there are many, many others to choose from, and these were selected based on the prevalence of tutorials, documentation, and acceptance among research in general.

Other deep learning library options include:

  • Keras – a high-level, minimalist Python neural network library for Theano and TensorFlow
  • Lasagne – a lightweight Python library built atop Theano
  • Torch – Lua machine learning algorithm library
  • Deeplearning4j – open source, distributed deep learning library for Java and Scala
  • Chainer – a flexible, intuitive Python neural network library
  • Mocha – a deep learning framework for Julia

With libraries installed, we now move on to practical implementation.

Step 5: Convolutional Neural Nets and Computer Vision

Computer vision deals with the processing and understanding of images and their symbolic information. Most of the field’s recent breakthroughs have come from the use of deep neural networks. In particular, convolutional neural networks have played a very important role in computer vision of late.
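Before diving into the tutorials, the core operation of a convolutional layer can be sketched in a few lines of NumPy: slide a small filter across an image, taking a dot product at each position to produce a feature map (a toy illustration, not how production libraries implement convolution):

```python
import numpy as np

def conv2d(image, kernel):
    # "Valid" 2D cross-correlation: slide the kernel over the image and
    # take elementwise products summed at each position.
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.zeros((6, 6))
image[:, 3:] = 1.0                 # right half bright, left half dark
kernel = np.array([[1.0, -1.0]])   # a simple vertical-edge detector
fmap = conv2d(image, kernel)
print(fmap)  # nonzero only at the column where brightness changes
```

A convolutional layer learns the kernel values instead of hand-specifying them, and stacks many such filters; the sliding-window weight sharing is what makes these nets so effective on images.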

Convolutional Neural Net

First, read this deep learning with computer vision tutorial by Yoshua Bengio, in order to gain an understanding of the topic.

Next, if you have TensorFlow installed, take a look at, and implement, this tutorial, which classifies CIFAR-10 images using a convolutional neural network.

If you have Caffe installed, as an alternative to the above tutorial (or alongside), implement a convolutional neural network in Caffe for classifying MNIST dataset images.

Here is a Theano tutorial which is roughly equivalent to the above Caffe exercise.

Afterward, read a seminal convolutional neural network paper by Krizhevsky, Sutskever, and Hinton for additional insight.

Step 6: Recurrent Nets and Language Processing

Natural language processing (NLP) is another domain which has seen benefits from deep learning. Concerned with understanding natural (human) languages, NLP has had a number of its most recent successes come by way of recurrent neural networks (RNN).

Andrej Karpathy has a fantastic blog post titled “The Unreasonable Effectiveness of Recurrent Neural Networks” which outlines the effectiveness of RNNs in training character-level language models. The code it references is written in Lua using Torch, so you can skip over that; the post is still useful on a purely conceptual level.
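The idea behind such character-level models can be sketched independently of Torch: a vanilla RNN reuses one set of weights at every position, carrying context forward in its hidden state (toy sizes and untrained random weights here, so the next-character probabilities are near uniform):

```python
import numpy as np

rng = np.random.default_rng(0)

vocab = sorted(set("hello"))        # ['e', 'h', 'l', 'o']
V, H = len(vocab), 8                # vocabulary size, hidden size

Wxh = rng.normal(0, 0.1, (H, V))    # input -> hidden
Whh = rng.normal(0, 0.1, (H, H))    # hidden -> hidden (the recurrence)
Why = rng.normal(0, 0.1, (V, H))    # hidden -> output scores

def one_hot(ch):
    x = np.zeros(V)
    x[vocab.index(ch)] = 1.0
    return x

# Forward pass over a character sequence: the same three weight matrices
# are applied at every step, with h carrying context between steps.
h = np.zeros(H)
for ch in "hell":
    h = np.tanh(Wxh @ one_hot(ch) + Whh @ h)

scores = Why @ h
probs = np.exp(scores) / np.sum(np.exp(scores))  # softmax over next char
print(dict(zip(vocab, np.round(probs, 3))))
```

Training adjusts the three weight matrices (by backpropagation through time) so that these probabilities favor the actual next character.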

This tutorial implements a recurrent neural network in TensorFlow for language modeling.

You can then use Theano and try your hand at this tutorial, which implements a recurrent neural network with word embeddings.

Finally, you can read Yoon Kim’s Convolutional Neural Networks for Sentence Classification for another application of CNNs in language processing. Denny Britz has a blog post titled “Implementing A CNN For Text Classification in TensorFlow,” which does just as it suggests using movie review data.

Step 7: Further Topics

The previous steps have progressed from theoretical to practical topics in deep learning. By installing and implementing convolutional neural nets and recurrent neural nets in the previous 2 steps, it is hoped that one has gained a preliminary appreciation for their power and functionality. As prevalent as CNNs and RNNs are, there are numerous other deep architectures in existence, with more emerging from research on a regular basis.

There are also numerous other considerations for deep learning beyond those presented in the earlier theoretical steps, and as such, the following is a quick survey of some of these additional architectures and concerns.

For a further understanding of a particular type of recurrent neural network suited for time series prediction, the Long Short-Term Memory (LSTM) network, read this article by Christopher Olah.
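The gating computation at the heart of an LSTM can be sketched in NumPy (toy sizes, untrained random weights; all names are illustrative): the forget, input, and output gates control what the cell state erases, writes, and exposes at each step.

```python
import numpy as np

rng = np.random.default_rng(0)

X, H = 3, 5   # input size, hidden size

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One weight matrix and bias per gate: forget, input, output, candidate.
W = {g: rng.normal(0, 0.1, (H, X + H)) for g in "fioc"}
b = {g: np.zeros(H) for g in "fioc"}

def lstm_step(x, h_prev, c_prev):
    z = np.concatenate([x, h_prev])
    f = sigmoid(W["f"] @ z + b["f"])    # forget gate: what to erase
    i = sigmoid(W["i"] @ z + b["i"])    # input gate: what to write
    o = sigmoid(W["o"] @ z + b["o"])    # output gate: what to reveal
    c_tilde = np.tanh(W["c"] @ z + b["c"])
    c = f * c_prev + i * c_tilde        # updated cell state
    h = o * np.tanh(c)                  # new hidden state
    return h, c

h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(4, X)):       # run over a short input sequence
    h, c = lstm_step(x, h, c)
print(h.shape, c.shape)
```

The additive update to the cell state (c = f * c_prev + i * c_tilde) is what lets gradients flow over long time spans, the key advantage over a vanilla RNN.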

This blog post by Denny Britz is a great tutorial on RNNs using LSTMs and Gated Recurrent Units (GRUs). See this paper for a further discussion of GRUs and LSTMs.

This clearly does not cover all deep learning architectures. Restricted Boltzmann Machines are an obvious exclusion which comes to mind, as are autoencoders, and a whole series of related generative models including Generative Adversarial Networks. However, a line had to be drawn somewhere, or this post would continue ad infinitum.

For those interested in learning more about various deep learning architectures, I suggest this lengthy survey paper by Yoshua Bengio.

For our final number, and for something a bit different, have a look at A Statistical Analysis of Deep Learning by Shakir Mohamed of Google DeepMind. It is more theoretical and (surprise) statistical than much of the other material we have encountered, but worth looking at for a different approach to familiar matter. Shakir wrote the series of articles over the course of 6 months; it is presented as testing widely held beliefs, highlighting statistical connections, and exploring the unseen implications of deep learning. There is a combined PDF of all posts as well.

It is hoped that enough information has been presented to give the reader an introductory overview of deep neural networks, as well as provide some incentive to move forward and learn more on the topic.

Original source: http://www.kdnuggets.com/2016/01/seven-steps-deep-learning.html/1

Deep Learning Libraries and Frameworks

At the end of 2015, all eyes were on the year’s accomplishments, as well as forecasting technology trends of 2016 and beyond. One particular field that has frequently been in the spotlight during the last year is deep learning, an increasingly popular branch of machine learning, which looks to continue to advance further and infiltrate an increasing number of industries and sectors. Here is a list of deep learning libraries and frameworks that will gain momentum in 2016.
1. Theano is a Python library for defining and evaluating mathematical expressions with numerical arrays. It makes it easy to write deep learning algorithms in Python. Many other libraries are built on top of Theano.
· Keras is a minimalist, highly modular neural network library in the spirit of Torch, written in Python, that uses Theano under the hood for optimized tensor manipulation on GPU and CPU.
· Pylearn2 is a library that wraps a lot of models and training algorithms, such as stochastic gradient descent, that are commonly used in deep learning. Its functional libraries are built on top of Theano.
· Lasagne is a lightweight library to build and train neural networks in Theano. It is governed by the principles of simplicity, transparency, modularity, pragmatism, focus, and restraint.
· Blocks is a framework that helps you build neural network models on top of Theano.
2. Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by the Berkeley Vision and Learning Center (BVLC) and by community contributors. Google’s DeepDream is based on the Caffe framework. This framework is a BSD-licensed C++ library with a Python interface.
3. nolearn contains a number of wrappers and abstractions around existing neural network libraries, most notably Lasagne, along with a few machine learning utility modules.
4. Gensim is a deep learning toolkit implemented in the Python programming language, intended for handling large text collections using efficient algorithms.
5. Chainer bridges the gap between algorithms and implementations of deep learning. It is powerful, flexible, and intuitive, and is considered a flexible framework for deep learning.
6. deepnet is a GPU-based python implementation of deep learning algorithms like Feed-forward Neural Nets, Restricted Boltzmann Machines, Deep Belief Nets, Autoencoders, Deep Boltzmann Machines and Convolutional Neural Nets.
7. Hebel is a library for deep learning with neural networks in Python using GPU acceleration with CUDA through PyCUDA. It implements the most important types of neural network models and offers a variety of different activation functions and training methods such as momentum, Nesterov momentum, dropout, and early stopping.
8. CXXNET is a fast, concise, distributed deep learning framework based on MShadow. It is a lightweight and easily extensible C++/CUDA neural network toolkit with a friendly Python/MATLAB interface for training and prediction.
9. DeepPy is a Pythonic deep learning framework built on top of NumPy.
10. DeepLearning is a deep learning library developed in C++ and Python.
11. Neon is Nervana’s Python-based deep learning framework.
12. ConvNet implements convolutional neural nets, a type of deep learning classifier that can learn useful features from raw data on its own by tuning its weights.
13. DeepLearnToolBox is a MATLAB/Octave toolbox for deep learning and includes deep belief nets, stacked autoencoders, and convolutional neural nets.
14. cuda-convnet is a fast C++/CUDA implementation of convolutional (or more generally, feed-forward) neural networks. It can model arbitrary layer connectivity and network depth. Any directed acyclic graph of layers will do. Training is done using the backpropagation algorithm.
15. MatConvNet is a MATLAB toolbox implementing convolutional neural networks (CNNs) for computer vision applications. It is simple, efficient, and can run and learn state-of-the-art CNNs.
16. eblearn is an open-source C++ library of machine learning by New York University’s machine learning lab, led by Yann LeCun. In particular, it offers implementations of convolutional neural networks and energy-based models, along with a GUI, demos, and tutorials.
17. SINGA is designed to be general enough to implement the distributed training algorithms of existing systems. It is supported by the Apache Software Foundation.
18. NVIDIA DIGITS is a new system for developing, training and visualizing deep neural networks. It puts the power of deep learning into an intuitive browser-based interface, so that data scientists and researchers can quickly design the best DNN for their data using real-time network behavior visualization.
19. Intel® Deep Learning Framework provides a unified framework for accelerating deep convolutional neural networks on Intel® platforms.
20. N-Dimensional Arrays for Java (ND4J) is a scientific computing library for the JVM. It is meant to be used in production environments, which means routines are designed to run fast with minimum RAM requirements.
21. Deeplearning4j is the first commercial-grade, open-source, distributed deep-learning library written for Java and Scala. It is designed to be used in business environments, rather than as a research tool.
22. Encog is an advanced machine learning framework which supports support vector machines, artificial neural networks, Bayesian networks, hidden Markov models, genetic programming, and genetic algorithms.
23. Convnet.js is a JavaScript library for training deep learning models (mainly neural networks) entirely in a browser. No software requirements, no compilers, no installations, no GPUs, no sweat.
24. Torch is a scientific computing framework with wide support for machine learning algorithms. It is easy to use and efficient, thanks to the fast scripting language LuaJIT and an underlying C/CUDA implementation. Torch is based on the Lua programming language.
25. Mocha is a deep learning framework for Julia, inspired by the C++ framework Caffe. Efficient implementations of general stochastic gradient solvers and common layers in Mocha can be used to train deep or shallow (convolutional) neural networks, with optional unsupervised pre-training via (stacked) auto-encoders. Its best features include a modular architecture, a high-level interface, portability with speed, and compatibility, among others.
26. Lush (Lisp Universal Shell) is an object-oriented programming language designed for researchers, experimenters, and engineers interested in large-scale numerical and graphic applications. It comes with a rich set of deep learning libraries as part of its machine learning libraries.
27. DNNGraph is a deep neural network model generation DSL in Haskell.
28. Accord.NET is a .NET machine learning framework combined with audio and image processing libraries completely written in C#. It is a complete framework for building production-grade computer vision, computer audition, signal processing, and statistics applications.
29. The darch package can be used for generating neural networks with many layers (deep architectures). Training methods include pre-training with the contrastive divergence method and fine-tuning with commonly known training algorithms like backpropagation or conjugate gradient.
30. deepnet implements some deep learning architectures and neural network algorithms, including BP, RBM, DBN, deep autoencoder and so on.

List of Machine Learning tutorials

The following is an extensive list of machine learning tutorial topics. I will keep updating it.


Interview Resources

Artificial Intelligence

Genetic Algorithms


Useful Blogs

Resources on Quora

Kaggle Competitions WriteUp

Cheat Sheets


Linear Regression

Logistic Regression

Model Validation using Resampling

Deep Learning

Natural Language Processing

Computer Vision

Support Vector Machine

Reinforcement Learning

Decision Trees

Random Forest / Bagging



Stacking Models

Vapnik–Chervonenkis Dimension

Bayesian Machine Learning

Semi Supervised Learning


How can I be as great as Bill Gates, Steve Jobs, Elon Musk, Richard Branson?

Answer by Justine Musk:

Extreme success results from an extreme personality and comes at the cost of many other things. Extreme success is different from what I suppose you could just consider 'success', so know that you don't have to be Richard or Elon to be affluent and accomplished and maintain a great lifestyle. Your odds of happiness are better that way. But if you're extreme, you must be what you are, which means that happiness is more or less beside the point. These people tend to be freaks and misfits who were forced to experience the world in an unusually challenging way. They developed strategies to survive, and as they grow older they find ways to apply these strategies to other things, and create for themselves a distinct and powerful advantage. They don't think the way other people think. They see things from angles that unlock new ideas and insights. Other people consider them to be somewhat insane.

Be obsessed.

Be obsessed.

Be obsessed.

If you're not obsessed, then stop what you're doing and find whatever does obsess you. It helps to have an ego, but you must be in service to something bigger if you are to inspire the people you need to help you  (and make no mistake, you will need them). That 'something bigger' prevents you from going off into the ether when people flock round you and tell you how fabulous you are when you aren't and how great your stuff is when it isn't. Don't pursue something because you "want to be great". Pursue something because it fascinates you, because the pursuit itself engages and compels you. Extreme people combine brilliance and talent with an *insane* work ethic, so if the work itself doesn't drive you, you will burn out or fall by the wayside or your extreme competitors will crush you and make you cry.

Follow your obsessions until a problem starts to emerge, a big meaty challenging problem that impacts as many people as possible, that you feel hellbent to solve or die trying. It might take years to find that problem, because you have to explore different bodies of knowledge, collect the dots and then connect and complete them.

It helps to have superhuman energy and stamina. If you are not blessed with godlike genetics, then make it a point to get into the best shape possible. There will be jet lag, mental fatigue, bouts of hard partying, loneliness, pointless meetings, major setbacks, family drama, issues with the Significant Other you rarely see, dark nights of the soul, people who bore and annoy you, little sleep, less sleep than that. Keep your body sharp to keep your mind sharp. It pays off.

Learn to handle a level of stress that would break most people.

Don't follow a pre-existing path, and don't look to imitate your role models. There is no "next step". Extreme success is not like other kinds of success; what has worked for someone else, probably won't work for you. They are individuals with bold points of view who exploit their very particular set of unique and particular strengths. They are unconventional, and one reason they become the entrepreneurs they become is because they can't or don't or won't fit into the structures and routines of corporate life. They are dyslexic, they are autistic, they have ADD, they are square pegs in round holes, they piss people off, get into arguments, rock the boat, laugh in the face of paperwork. But they transform weaknesses in ways that create added advantage — the strategies I mentioned earlier — and seek partnerships with people who excel in the areas where they have no talent whatsoever.

They do not fear failure — or they do, but they move ahead anyway. They will experience heroic, spectacular, humiliating, very public failure but find a way to reframe until it isn't failure at all. When they fail in ways that other people won't, they learn things that other people don't and never will. They have incredible grit and resilience.

They are unlikely to be reading stuff like this. (This is *not* to slam or criticize people who do; I love to read this stuff myself.) They are more likely to go straight to a book: perhaps a biography of Alexander the Great or Catherine the Great* or someone else they consider Great. Surfing the 'Net is a deadly timesuck, and given what they know their time is worth — even back in the day when technically it was not worth that — they can't afford it.

I could go on, it's a fascinating subject, but you get the idea. I wish you luck and strength and perhaps a stiff drink should you need it.

* One person in the comments section appears not to know who Catherine the Great is, suggesting that this is "an utter lie" of mine + "feminist stupidity". But Catherine's ability to rise, and strategize around discrimination, holds interesting lessons for anyone.


How do I begin analyzing data using Python?

Answer by William Chen:

Check out Harvard's free data science course.

The homeworks (with solutions) walk you through a number of data analysis, mining, scraping, manipulation problems with Python and iPython notebook!

Check out Coursera's free data science course

Link: Coursera

To specifically play with data science and python, check out their Twitter Sentiment Analysis in Python assignment.

Check out my more comprehensive answer at William Chen's answer to How do I become a data scientist?

I curate material on learning data science at Learn Data Science. Follow that blog or my blog at Storytelling with Statistics to get updated of new content!


When companies such as Facebook, Google, YouTube, Twitter and Quora started off, were the founders aware from day one that they may change…

Answer by Dustin Moskovitz:

Of course not. It took at least 6 or 7.

Seriously though, Facebook was such a phenomenon *right away* at Harvard that 80% of the students were using it within the first week. It was very easy to see that there was nothing particularly special about Harvard that meant real-identity social networking would be successful there and not other places. We weren't completely confident that we would be the ones to replicate the model, but we were absolutely certain that some product like Facebook would become extremely popular and change the world in all the ways it has.
