The Journey of Artificial Intelligence (Part II)

By Parul Shukla, Published on: 11th September 2018

“Well begun is half done” is a quote familiar to all of us. The journey of AI that started almost 70 years ago has now entered an exciting phase, often referred to as the ‘AI revolution’, driven by advances in deep learning. What started as a simple question, “Are there imaginable digital computers which would do well in the imitation game?”, has now reached a stage where machines can play complex games, such as Go, better than humans. We are witnessing a new dawn of technology in which machines can generate realistic images and artwork and converse in a human-like manner. This blog highlights some of the recent amazing ‘creations’ and applications of AI.

Speech synthesis

In this domain, there has been a lot of research into creating sounds from text, audio question-answering, voice search, and voice-activated assistants. Apple’s Siri, Google Assistant, Microsoft’s Cortana, and Amazon’s Alexa are some of the best-known voice assistants.

In another development, systems were able to add sounds to silent video clips, potentially benefiting thousands of people with visual impairments. Over 1000 video samples were recorded of drumsticks striking various surfaces and producing different sounds. A model based on deep Convolutional Neural Nets (CNNs) and Recurrent Neural Nets (RNNs) then learns to associate a video frame with a database of pre-recorded sounds. Given a video, the sound that best matches each frame is played.
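
The matching step described above can be pictured as a nearest-neighbour lookup. The following is only a minimal sketch under stated assumptions: the feature vectors, the sound names, and the use of cosine similarity are all made up for illustration, whereas the real system learns its features with a CNN and an RNN.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def best_matching_sound(frame_features, sound_bank):
    """Return the name of the stored sound most similar to the frame."""
    return max(
        sound_bank,
        key=lambda name: cosine_similarity(frame_features, sound_bank[name]),
    )

# Hypothetical database of pre-recorded sounds (name -> feature vector).
sound_bank = {
    "wood_tap": [0.9, 0.1, 0.0],
    "metal_clang": [0.1, 0.9, 0.2],
    "leaf_rustle": [0.0, 0.2, 0.9],
}

# Hypothetical per-frame features; a trained network would produce these.
video_frames = [
    [0.8, 0.2, 0.1],
    [0.1, 0.7, 0.3],
    [0.1, 0.1, 0.8],
]

# Pick one sound per frame, in order.
soundtrack = [best_matching_sound(frame, sound_bank) for frame in video_frames]
print(soundtrack)
```

Cosine similarity is just one plausible way to score a match; any distance measure over the learned features could take its place.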

Game playing

In 2015, a computer program named ‘AlphaGo’ defeated the then reigning three-time European champion, Fan Hui, 5-0 at the highly complex game of Go. In March 2016, history was made when AlphaGo beat the top professional Lee Sedol 4-1 in a match watched by over 200 million people worldwide. This landmark achievement earned AlphaGo an honorary 9 dan professional ranking, the highest certification in this field. While AlphaGo learned from games played by human amateurs and professionals, its successor, AlphaGo Zero, learned to play Go by playing against itself repeatedly. AlphaGo Zero went on to surpass all previous versions and become the strongest Go player. It is a landmark scientific development: a machine has not only learned a task traditionally considered human territory but has also surpassed human performance.

By using deep reinforcement learning with CNNs, machines are able to play Atari games in an almost human-like manner. The machines even learned some tricks akin to those humans use. In the game of Breakout, the algorithm learned on its own that the maximum score can be achieved by creating a tunnel through the wall of bricks. This key advancement was instrumental in the success of subsequent reinforcement learning research in gaming.
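
To give a feel for how such an agent learns from rewards, here is a rough sketch of a single Q-learning update on a small lookup table. This is a deliberately simpler relative of the deep approach: the Atari agents replace the table with a CNN that estimates the values from screen pixels. The state names, actions, reward, and hyperparameters below are all hypothetical.

```python
def q_update(q_table, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update and return the new value.

    alpha is the learning rate and gamma the discount factor (both assumed).
    """
    # Value of the best action available in the next state (0 if unknown).
    best_next = max(q_table[next_state].values()) if next_state in q_table else 0.0
    # Target: immediate reward plus discounted future value.
    target = reward + gamma * best_next
    # Move the current estimate part of the way toward the target.
    q_table[state][action] += alpha * (target - q_table[state][action])
    return q_table[state][action]

# Hypothetical two-state table for a Breakout-like game.
q_table = {
    "ball_low": {"left": 0.0, "right": 0.0},
    "ball_high": {"left": 0.0, "right": 1.0},
}

new_value = q_update(q_table, "ball_low", "right", reward=1.0, next_state="ball_high")
print(new_value)
```

Repeating this update over many played games is what lets rewarding strategies, such as the tunnel trick, gradually earn higher estimated values.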

Machine translation

Machine translation refers to the automatic translation of words or phrases from one language into another. The input can be text or an image. Stacked layers of Recurrent Neural Nets are used to translate text. For images, a deep CNN first reads and identifies the characters and converts them into text, which is then passed to stacked RNNs for translation.

Image creation and analysis

Image understanding and analysis have traditionally been considered complex tasks for machines. However, recent advancements in machine learning, and deep learning in particular, have made these tasks easier. Many applications are now possible, such as colorization of black-and-white images, object recognition, image captioning, and even image generation. In image captioning, a sentence or caption is generated that describes the contents of a given image. In semantic segmentation, every pixel is given a label based on the object to which it belongs.
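
The per-pixel labelling in semantic segmentation can be illustrated with a toy example. The sketch below assumes a network has already produced a score for every class at every pixel; the class names and the scores are invented, and choosing the highest-scoring class is a common convention, not something specific to any one system.

```python
# Hypothetical class list; index i corresponds to score position i.
CLASS_NAMES = ["background", "cat", "grass"]

def segment(score_map):
    """Turn per-pixel class scores into a map of class indices.

    score_map[row][col] is a list holding one score per class.
    """
    return [
        [max(range(len(scores)), key=scores.__getitem__) for scores in row]
        for row in score_map
    ]

# Made-up scores for a tiny 2x2 image.
scores = [
    [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]],
    [[0.2, 0.1, 0.7], [0.1, 0.6, 0.3]],
]

label_map = segment(scores)
named_map = [[CLASS_NAMES[index] for index in row] for row in label_map]
print(named_map)
```

Each entry of the result names the object that the corresponding pixel belongs to, which is exactly what a colour-coded segmentation figure visualizes.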

The success stories in this domain are largely attributed to the use of deep CNNs and access to faster hardware such as GPUs. For example, using CNNs and RNNs, we can now automatically generate captions for different images. The following figures illustrate results on image caption generation and semantic image segmentation.

Automatic image caption generation (Andrej Karpathy et al., 2015)
Automatic semantic segmentation of images (Image: Stanford CS231)

Deep CNNs have even gone on to generate image ‘art’. The following figure presents visualizations of the various features learned by a deep CNN.

Visualizations of features learned by CNN (Yosinski et al., 2014)

Deep learning has further been used to generate images in an almost artistic way. Taking a content image and applying the style of an art image to it results in a stylized image, as shown in the following figure:

Some more examples of artistic images created using deep neural nets:
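
This post does not say which method produced the stylized images. One well-known formulation of neural style transfer (Gatys et al.) describes ‘style’ through correlations between CNN feature channels, collected in a so-called Gram matrix. The sketch below is a simplified illustration of that idea only: the feature values are made up, and the averaging used in the loss is an assumption chosen for readability.

```python
def gram_matrix(features):
    """Channel-by-channel correlations of CNN activations.

    features[c] is the flattened activation map of channel c.
    """
    return [
        [sum(a * b for a, b in zip(fi, fj)) for fj in features]
        for fi in features
    ]

def style_loss(features_a, features_b):
    """Mean squared difference between two Gram matrices (simplified)."""
    gram_a = gram_matrix(features_a)
    gram_b = gram_matrix(features_b)
    n = len(gram_a)
    total = sum(
        (gram_a[i][j] - gram_b[i][j]) ** 2 for i in range(n) for j in range(n)
    )
    return total / (n * n)

# Hypothetical activations: 2 channels, 3 spatial positions each.
style_features = [[1.0, 0.0, 1.0], [0.0, 2.0, 0.0]]
generated_features = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]

loss = style_loss(style_features, generated_features)
print(loss)
```

In that formulation, the generated image is adjusted step by step so that a loss of this kind shrinks while its content features stay close to those of the content image.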

These inspiring examples demonstrate the wide range of applications of deep neural nets. However, this is in no way an exhaustive list of the success stories of deep learning. Self-driving cars, robotics, analytical systems, drug discovery in the medical domain, and natural language processing are some other domains that are benefiting greatly from the advancements in deep learning. In the near future, we shall see many more applications of deep learning in real life.
