Author: Ajit Jaokar
2019 and 2020 have seen rapid strides in NLP.
What’s driving the rapid strides in NLP and will this trend continue?
Here is a simple way to explain the rise and rise of NLP.
Today, GPT-3 is displaying some amazing results, and some even describe it as being closer to AGI (Artificial General Intelligence). GPT-3 was created by OpenAI, which has received a large investment from Microsoft. GPT stands for Generative Pre-trained Transformer.
These three words offer clues to the success and future trajectory of NLP:
- Let’s start with ‘Transformer’. Introduced in 2017, the Transformer is a deep learning architecture designed for NLP. Like recurrent neural networks (RNNs), Transformers handle sequential data. Unlike RNNs, however, Transformers use an attention mechanism that lets every position in a sequence look directly at every other position, so the data does not have to be processed one step at a time. This allows far more parallelization than RNNs do, and that parallelization during training in turn makes it practical to train on much larger datasets.
- This has led to the second benefit of Transformers: pre-trained models. Pre-training is similar to transfer learning in CNNs: it allows you to build more complex or more specialized models on top of existing ones. The best-known early example is BERT (Bidirectional Encoder Representations from Transformers). BERT itself led to models trained for specific domains, such as BioBERT, a pre-trained biomedical language representation model for biomedical text mining.
- Finally, the model is Generative, and GPT-3 is the best example of this. GPT-3 is a Transformer-based model with 175 billion parameters, trained on a corpus filtered down from roughly 45TB of raw text. Its generative ability can seem magical: demos have shown it producing everything from SQL queries to basic UIs.
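To make the parallelization point concrete, here is a minimal sketch in plain Python. It assumes single-head scaled dot-product self-attention (the form introduced with the Transformer in 2017) with no masking, no multi-head split and hypothetical toy weight matrices; it is an illustration, not a production implementation. Each attention output row reads only the shared Q, K and V matrices, so rows can be computed in any order or in parallel. The toy RNN next to it cannot be parallelized that way, because each hidden state needs the previous one.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python product of an (n x k) matrix and a (k x m) matrix."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_row(i: int, q: Matrix, k: Matrix, v: Matrix) -> List[float]:
    """Output for position i: softmax(q_i . k_j / sqrt(d)) over all j weights the rows of V."""
    d = len(q[0])
    scores = [sum(q[i][c] * k[j][c] for c in range(d)) / math.sqrt(d) for j in range(len(k))]
    weights = softmax(scores)
    return [sum(weights[j] * v[j][c] for j in range(len(v))) for c in range(len(v[0]))]


def self_attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Matrix:
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    # Each row depends only on the shared Q, K, V matrices, never on another
    # row's output, so the rows could be computed in any order or in parallel.
    return [attention_row(i, q, k, v) for i in range(len(x))]


def rnn(x: Matrix, w_in: Matrix, w_h: Matrix) -> Matrix:
    """Minimal tanh RNN: h_t needs h_{t-1}, so the steps must run in order."""
    hidden = len(w_h)
    h = [0.0] * hidden
    states = []
    for x_t in x:
        h = [math.tanh(sum(x_t[p] * w_in[p][j] for p in range(len(x_t)))
                       + sum(h[p] * w_h[p][j] for p in range(hidden)))
             for j in range(hidden)]
        states.append(h)
    return states
```

For example, with `x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]` and identity weight matrices, computing the attention rows in reverse order gives exactly the same result, whereas the RNN loop cannot start step t until step t-1 has finished.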
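The pre-training idea can be sketched in the same spirit. This is a toy illustration, not how BERT is actually fine-tuned: `PRETRAINED_W` is a hypothetical set of fixed weights standing in for a pre-trained encoder, and only a small logistic-regression head is trained for the new task, which is the simplest form of transfer learning.

```python
import math
from typing import List, Tuple

# Hypothetical stand-in for a pre-trained encoder (3 inputs -> 2 features).
# In a real system these weights would come from large-scale pre-training.
PRETRAINED_W = [[0.8, -0.2], [0.1, 0.9], [-0.5, 0.4]]


def encode(x: List[float]) -> List[float]:
    """Frozen feature extractor: PRETRAINED_W is never updated below."""
    return [math.tanh(sum(x[i] * PRETRAINED_W[i][j] for i in range(len(x))))
            for j in range(len(PRETRAINED_W[0]))]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_head(data: List[List[float]], labels: List[int],
               epochs: int = 500, lr: float = 0.5) -> Tuple[List[float], float]:
    """Fit only a small logistic-regression head on top of the frozen features."""
    w, b = [0.0] * len(PRETRAINED_W[0]), 0.0
    feats = [encode(x) for x in data]  # computed once, because the encoder is frozen
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            p = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b)
            err = p - y  # gradient of binary cross-entropy with respect to the logit
            w = [wi - lr * err * fi for wi, fi in zip(w, f)]
            b -= lr * err
    return w, b


def predict(w: List[float], b: float, x: List[float]) -> float:
    return sigmoid(sum(wi * fi for wi, fi in zip(w, encode(x))) + b)
```

In practice, fine-tuning models such as BERT usually updates the pre-trained weights as well, typically with a small learning rate; freezing the encoder, as here, is simply the easiest variant to show.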
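‘Generative’ in GPT refers to autoregressive generation: the model predicts the next token, appends it, and feeds the longer sequence back in. The sketch below shows only that loop. As an assumption for illustration, a bigram counter built from a made-up SQL-flavoured corpus replaces the Transformer; GPT-3 instead conditions each prediction on the whole preceding context, but the generate-append-repeat structure is the same.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List


def train_bigram(corpus: List[str]) -> Dict[str, Counter]:
    """Count which token follows which: a toy stand-in for a learned next-token model."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def generate(model: Dict[str, Counter], prompt: List[str],
             max_new_tokens: int = 10, greedy: bool = True, seed: int = 0) -> List[str]:
    """Autoregressive loop: predict one token, append it, repeat on the longer sequence."""
    rng = random.Random(seed)
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        # A GPT model would condition on all of `tokens`; this toy uses only the last one.
        dist = model.get(tokens[-1])
        if not dist:
            break
        if greedy:
            nxt = dist.most_common(1)[0][0]
        else:
            choices, weights = zip(*dist.items())
            nxt = rng.choices(choices, weights=weights, k=1)[0]
        if nxt == "</s>":  # end-of-sequence token
            break
        tokens.append(nxt)
    return tokens


corpus = [  # made-up SQL-flavoured training text, for illustration only
    "select name from users",
    "select id from orders",
    "select name from users where id = 1",
    "select name from users",
]
model = train_bigram(corpus)
print(generate(model, ["select"]))  # ['select', 'name', 'from', 'users']
```

Greedy decoding follows the most frequent continuation at each step; passing `greedy=False` samples from the counts instead, which is closer in spirit to the sampling commonly used with GPT-style models.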
Conclusions
The Transformer architecture is the main innovation driving NLP. Transformers enable new models to be built on the foundations of existing ones (as transfer learning does for CNNs). As the ability to train on larger corpora grows, Transformer-based models like GPT will become even more ‘magical’.
With contributions from Vineet Jaiswal
Image source: OpenAI