A few days back, I was writing a custom data loader for a computer vision experiment using the PyTorch deep learning framework. Since the dataset was large, I was looking for ways to optimise the data loading process.
Giving Artificial Intelligence a chance, I just typed my question into ChatGPT…
Within seconds, ChatGPT produced a complete guide to optimising my data loader and even a sample code snippet that I could get started with! It saved me at least an hour of extensive browsing time.
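For context, the advice was along the lines of the usual PyTorch DataLoader tuning knobs. Here is a minimal sketch of that kind of snippet; the dataset path, image size and batch size are placeholders rather than my actual experiment:

```python
# A minimal sketch of DataLoader tuning for an image dataset (illustrative only).
# The dataset path and transform below are placeholders.
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# Placeholder dataset: an image folder laid out as <root>/<class>/<image>
dataset = datasets.ImageFolder("path/to/images", transform=transform)

loader = DataLoader(
    dataset,
    batch_size=64,
    shuffle=True,
    num_workers=4,            # parallel worker processes for loading/decoding
    pin_memory=True,          # page-locked memory for faster host-to-GPU copies
    prefetch_factor=2,        # batches pre-fetched per worker
    persistent_workers=True,  # keep workers alive between epochs
)

for images, labels in loader:
    images = images.to("cuda", non_blocking=True)  # overlap copy with compute
    # ... training step ...
```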
Amid this hype, I wanted to discuss the usage and underlying mechanics of ChatGPT: Large Language Models (LLMs), and the applicability of these giants to intelligent application development.
Introducing ChatGPT
ChatGPT is a conversational AI model developed by OpenAI.
It uses the GPT-3 architecture, which is based on Transformer neural networks.
It is also a large language model with about 175 billion parameters, trained on a huge corpus of text data (about 45 TB) to generate human-like responses to text inputs. Most of the training data was harvested from the public internet. ChatGPT can perform a variety of language tasks such as answering questions, generating text, translating languages, and more.
ChatGPT is only one part of a much larger research effort. The underlying power is the GPT-3 neural network architecture, one of the latest Large Language Models (LLMs) in the domain.
What are Large Language Models (LLMs)? Why are they important?
LLMs are deep learning algorithms that can recognise, summarise, translate, predict, and generate text and other content based on knowledge gained from massive datasets. As the name suggests, these language models are trained on enormous amounts of textual data using unsupervised learning (yes, there is no data labelling involved). BLOOM from Hugging Face, ESMFold from Meta AI, Gato from DeepMind, BERT from Google, MT-NLG from Nvidia & Microsoft, and GPT-3 from OpenAI are some of the LLMs in the AI space.
Large language models are among the most successful applications of transformer models. They are used not just for teaching machines human languages, but also for understanding proteins, writing software code and much more.
Transformers: the main building block of GPT-3
Transformers? Are we going to talk about Bumblebee here? No, not quite!
Transformers are a type of neural network architecture, like Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNN) etc. designed for processing sequential data such as text, speech, or time-series data. They were introduced in the 2017 research paper “Attention is All You Need“. Transformers use self-attention mechanisms to process the input sequence and compute a weighted sum of the features at each position, allowing the model to efficiently process sequences of varying lengths and capture long-range dependencies. They have been successful in many natural language processing tasks such as machine translation and have become a popular choice in recent years.
For a deep learning enthusiast, this may sound similar to the RNN architecture, which is mostly used for learning sequential tasks. Unlike RNNs, transformers can capture long-range dependencies, which makes them capable of complex natural language processing tasks.
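To make the "weighted sum of the features at each position" idea concrete, here is a minimal sketch of single-head scaled dot-product self-attention in PyTorch. It is purely illustrative (no masking, no multi-head logic), not the exact GPT-3 implementation:

```python
# Minimal single-head scaled dot-product self-attention (illustrative only).
import math
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_model); w_q, w_k, w_v: (d_model, d_k) projection matrices."""
    q = x @ w_q                                  # queries
    k = x @ w_k                                  # keys
    v = x @ w_v                                  # values
    scores = q @ k.T / math.sqrt(k.shape[-1])    # pairwise similarities, scaled
    weights = F.softmax(scores, dim=-1)          # attention weights per position
    return weights @ v                           # weighted sum of values

seq_len, d_model, d_k = 5, 16, 8
x = torch.randn(seq_len, d_model)                # a toy "sequence" of 5 tokens
w_q, w_k, w_v = (torch.randn(d_model, d_k) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)           # shape: (seq_len, d_k)
print(out.shape)
```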
GPT stands for "Generative Pre-trained Transformer." As the name implies, it is built on top of the transformer architecture.
Interesting facts about GPT-3
- GPT-3 is one successful innovation among LLMs (it is not the only LLM in the world).
- The GPT-3 model itself has no knowledge; its strength lies in its ability to predict the subsequent word(s) in a sequence. It is a pre-trained machine learning model. You cannot download or retrain the model since it is massive! (Fine-tuning with your own data is possible.)
- GPT-3 is accessible only through OpenAI's closed API, and you need an API key to use it.
- GPT-3 performs best on English-language tasks.
- A bit of a downside: the outputs can be biased and abusive, since the model learns from data fetched from the public internet.
If you are really interested in learning the science behind GPT-3, I would recommend looking at the paper "Language Models are Few-Shot Learners".
What about OpenAI – where does that come in?
OpenAI, a research organisation founded in 2015, created the GPT-3 architecture.
GPT-3 is not the only interesting innovation from OpenAI. If you have seen AI-generated art created from natural-language phrases, it was most probably produced by the DALL-E 2 neural network, which is also from OpenAI.
OpenAI also offers a set of APIs (Application Programming Interfaces) that developers can easily adopt in their intelligent application development tasks.
Check the OpenAI APIs here: https://beta.openai.com/overview
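As a quick taste of the developer experience, here is a minimal sketch of a completion request using the openai Python package (pre-1.0 style). The model name and prompt are only examples; check the OpenAI documentation for the models and API versions currently available:

```python
# Minimal sketch of an OpenAI Completion request (openai Python package, pre-1.0 style).
# The model name and prompt are illustrative examples only.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]  # API key from your OpenAI account

response = openai.Completion.create(
    model="text-davinci-003",  # example GPT-3 model name
    prompt="Summarise in one sentence: transformers are a neural network "
           "architecture built around self-attention.",
    max_tokens=60,
    temperature=0.2,           # lower temperature = more deterministic output
)

print(response["choices"][0]["text"].strip())
```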
The use-cases of GPT-3
We all know ChatGPT is ground-breaking. Our focus here is exploring the ways in which we can use its underlying architecture (GPT-3) in application development.
Since the early days of deep neural networks, there has been a vast amount of research and innovation in the computer vision space. Networks like ResNet are ground-breaking and even surpass human-level accuracy in tasks like image classification on the ImageNet dataset. We have long had the advantage of pre-trained, state-of-the-art networks for computer vision tasks without worrying about large training datasets.
LLMs like GPT-3 fill the gap left by the lack of such networks for natural language analysis tasks. Simply put, GPT-3 is a massive pre-trained knowledge base that can understand language.
There are many interesting use cases of GPT-3 as a language model, including but not limited to:
- Dynamic chatbots for customer service, which provide more human-like interaction with users.
- Intelligent document management through smart tagging, paraphrasing and summarisation of textual documents.
- Content generation for websites, news articles, educational materials etc.
- Advanced text classification tasks.
- Sentiment analysis.
- Semantic search capabilities with natural language queries.
- Text translation, keyword identification etc.
- Programming code generation and code optimisation.
Since GPT-3 can be fine-tuned with a given set of training data, the possibilities are limitless given its natural language understanding capabilities. You can be creative and produce the next big idea that improves the productivity of your business.
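As one illustration of the use cases above, here is a small sketch of sentiment analysis done purely through few-shot prompting, using the same openai package style as before. The example reviews and labels are made up for illustration:

```python
# Few-shot sentiment classification via prompting (illustrative sketch).
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

FEW_SHOT_PROMPT = """Classify the sentiment of each review as Positive or Negative.

Review: The delivery was quick and the product works perfectly.
Sentiment: Positive

Review: It broke after two days and support never replied.
Sentiment: Negative

Review: {review}
Sentiment:"""

def classify_sentiment(review: str) -> str:
    response = openai.Completion.create(
        model="text-davinci-003",                     # example model name
        prompt=FEW_SHOT_PROMPT.format(review=review),
        max_tokens=3,
        temperature=0,                                # deterministic label output
    )
    return response["choices"][0]["text"].strip()

print(classify_sentiment("Great value for money, I would buy it again."))
```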
How would Azure OpenAI benefit your business?
Azure OpenAI is a collaboration between Microsoft’s Azure cloud platform and OpenAI, aimed at providing cloud-based access to OpenAI’s innovative AI models and tools. The partnership provides a seamless platform for developers and organizations to build, deploy, and scale AI applications and services, leveraging the computing resources and technology of the Azure cloud.
Users can access the service through REST APIs, Python SDK, or the Azure OpenAI Service Studio, which is the web-based interface dedicated to OpenAI services.
In enterprise application development scenarios, using OpenAI services through Azure makes it much easier to integrate with existing systems and to meet enterprise security standards.
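For illustration, here is a minimal sketch of calling Azure OpenAI through the same openai Python package. The resource name, deployment name and API version below are placeholders; check the Azure OpenAI documentation for the values that apply to your own service:

```python
# Minimal sketch of an Azure OpenAI completion request (openai Python package, pre-1.0 style).
# Resource name, deployment name and API version are placeholders.
import os
import openai

openai.api_type = "azure"
openai.api_base = "https://<your-resource-name>.openai.azure.com/"  # Azure endpoint
openai.api_version = "2022-12-01"                                   # example API version
openai.api_key = os.environ["AZURE_OPENAI_KEY"]                     # key from the Azure portal

response = openai.Completion.create(
    engine="<your-deployment-name>",  # the model deployment created in Azure OpenAI Studio
    prompt="Write a one-line product description for a smart thermostat.",
    max_tokens=60,
)

print(response["choices"][0]["text"].strip())
```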
Azure OpenAI reached general availability very recently, and I am sure the product will see vast improvements in the coming weeks.
Let us keep our eyes open and start innovating on ways in which we can use this superpower wisely.
If you want to know how to use the power of AI for your business, contact us at Arinco.
This article is also posted here