The Other NLP

Recently I've been playing with some deep learning software - OpenAI's GPT-2 and GPT-3, and EleutherAI's GPT-J-6B. These are NLP algorithms. No, not that discredited garbage Neuro-Linguistic Programming - in this case NLP stands for Natural Language Processing.

The basic idea of these recent efforts in deep learning is to take a piece of software that has been written to guess the next word in a sequence, and train it on a huge corpus of data. It turns out that the internet is a great source of natural language, and a lot of it is very easy to scrape and feed into one of these algorithms. So these pieces of code are trained on lots and lots of internet text.
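To make "guess the next word" concrete, here's a minimal sketch of what that prediction step looks like, using the freely downloadable GPT-2 model through the Hugging Face transformers library. This is just an illustration of the idea - the prompt and setup here are made up for the example, not taken from any particular project:

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load the (already trained) small GPT-2 model and its tokenizer.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "The quickest way to debunk a psychic is to"
inputs = tokenizer(prompt, return_tensors="pt")

# The model produces a score for every word in its vocabulary at every
# position; the scores at the last position are its guesses for the NEXT word.
logits = model(**inputs).logits
next_token_id = logits[0, -1].argmax().item()

print(tokenizer.decode([next_token_id]))
```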

This training is very processor intensive, needing thousands of hours on specialised hardware - typically expensive graphics cards with dedicated AI acceleration. However, it only needs to be done once: the result is a set of trained model weights - just a few hundred megabytes for the smaller GPT-2 models - that can be quickly loaded into memory. At this point the software can be used to predict the next word in a sequence, and can keep doing so - creating whole sentences and paragraphs that actually make grammatical and logical sense. We will see below what these general NLP algorithms can do.
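Producing whole passages is just that same next-word step applied over and over, feeding each new word back in as part of the prompt. With the same library as above, that loop is wrapped up in a single call - again, a sketch rather than a recipe:

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "Homeopathy is based on the idea that"
inputs = tokenizer(prompt, return_tensors="pt")

# Generate up to ~60 tokens, sampling from the more probable next words each
# time rather than always picking the single most likely one, which keeps the
# output varied instead of repetitive.
output_ids = model.generate(
    **inputs,
    max_length=60,
    do_sample=True,
    top_k=50,
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,
)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```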

The same software can also be fine-tuned by giving it a smaller, more specific set of data. Using its ability to put together coherent sentences, the software can then emulate the dataset it's been fine-tuned on, as sketched below. So far I've been working on a couple of fun skeptical projects with this, although I have more ideas.
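Fine-tuning is conceptually simple: you continue the same next-word training, but only on your own small collection of text, so the model's output drifts towards that style. Here's a bare-bones sketch of one way to do it - the file name and training settings are placeholders for illustration, not my actual setup:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Hypothetical corpus: one short passage per line, in the style to imitate.
passages = open("my_corpus.txt", encoding="utf-8").read().splitlines()

for epoch in range(3):
    for text in passages:
        enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
        # For a language model the labels are just the input itself; the
        # library shifts them so each position predicts the following word.
        loss = model(**enc, labels=enc["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

# Save the fine-tuned weights so they can be reloaded for generation later.
model.save_pretrained("gpt2-finetuned")
```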

Anyway, it's probably easiest if I just give you some brief intros and show you the kinds of results I've been getting. Enjoy!