We often think of artificial intelligence as a field bread of mathematicians and computer scientists. And while this has been true for many years, it is changing rapidly.
I come from a place of personal experience here. I am not a mathematician nor a computer scientist and I have been experimenting with artificial intelligence for the past couple of years. [With the help of a few talented colleagues] I intend to publish a paper leveraging AI to solve problems in my own field of organizational psychology sometime this year.
This is all thanks to the hard work of those mathematicians and computer scientists leading to the advent of pre-trained models. GPT-3 for example [which literally stands for generative pre-trained transformer 3], is a language model that anyone can begin using for research and experimentation right out of the box. Because it is capable of a wide array of natural language processing [NLP] tasks, its applications stretch across disciplines. Since its release, we have seen a slew of new AI tools become available. Most notably OpenAI Codex and Github Copilot, which [in my view] are the future of programming.
The beauty of what OpenAI has done is that you can simply instruct GPT-3 on what to do with some text, and it should understand what you want [given enough clarity–which can sometimes be an experiment in itself].
While there are multiple ways to go about this, the most optimal seems to be giving GPT-3 a few examples of what you want it to do and then having it recognize and repeat the pattern. This approach is called few-shot learning, and it is what GPT-3 has been designed to do straight out of the box.
GPT-3 is far from perfect, however. As realistic[?] a conversation you may have with the AI, we are still a ways away from artificial general intelligence [AGI]. Though some today may argue that any sufficiently advanced AGI would have some form of a GPT model layered into its neural network to be able to communicate through natural language.
The problem with GPT-3 today is that, as good as it might be out of the box, for most problems, it still needs some fine-tuning. And from here one quickly realizes how easily AI problems boil down to data problems.
To give you an example, tl;dr papers recently launched as a GPT-3 powered research paper summary tool. While the summaries promoted on Twitter seem useful, if you plug in a paper yourself, you may find that the output is not quite as helpful. Many will criticize the application for this, but it's really no fault of its own. This is a data problem. And currently, there is little to no data available to help GPT-3 do this more effectively.
I know this because I've built this very thing myself but gave up before ever launching a product. As cool as it would be to get this right, I just didn't have the patience to collect enough research paper summaries to use in a training set for fine-tuning GPT-3 [without scraping someone else work]. It's also a somewhat expensive problem. Research papers are long and usage costs can pile up quickly if one isn't mindful.
But GPT-3 is only one type of pre-trained model that recently became available to the public. VQGAN and CLIP are another set of transformer models that work together to generate images from text. Recently they've gained traction with a bit of help from the NFT community and projects like Botto. There are now many available open-source notebooks for experimenting with these models and generating images of your own. Today, anybody can be an AI artist.
Of course, art is just one application of models like VQGAN and CLIP. For example, while CLIP can take descriptive text and produce an image, it can also do the reverse: take an image and generate text to describe it.
This opens up the door to powerful new image search tools which have the potential for applications across industries. And the best part is, you don't need to develop any of these models on your own, all you need to learn is how to utilize and train them for your particular use case. And if the worst problem you encounter is needing to develop a training set from scratch, well, all you need is a little patience [or a lot in my case].
This is a huge leap from what was once only available to AI scientists to what is now available to practitioners across many disciplines with just a tad bit of technical know-how.
Why I think this is such a big deal can be summarized with the Medici effect:
Where the combination of skills and knowledge from multiple fields, cultures, or disciplines results in novel ideas, and ultimately innovations.
With the field of artificial intelligence becoming more readily available to hobbyists and multi-disciplinary specialists alike, the doors to what seems like a limitless plain of discovery have been opened.
This isn't to take away from the kinds of genius mathematicians and computer scientists who develop these models but to highlight that those are just two [arguably one] classes of people who think in a particular way. It's for that reason that diversity of thought is so important. As smart as any one person might be there is always value to be gained from a fresh perspective.
Thus, my challenge to you, readers and hobbyists: take some time to ponder your own fields. Ask yourself, what problems could be solved with AI? Then ask yourself this: what is stopping you from solving them?
The next big breakthrough is anyone's game.
Special thanks to OpenAI for all the work they are doing. Without them, AI hobbyists like myself would be without many of our favorite toys.
- Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., ... & Amodei, D. (2020). Language models are few-shot learners.
- GPT-3 is not free to use, but generally, costs are negligible [< a few cents per call] for most problems or experimentation. If this is a concern, consider exploring GPT-2 first.
- I've probably oversimplified how VQGAN+CLIP works here. It's important to highlight that these are separate models. CLIP is a pre-trained computer vision model that was introduced by OpenAI in their paper, Learning Transferable Visual Models From Natural Language Supervision. VQGAN is a general adversarial network that can generate novel images and was first introduced in the paper, Taming Transformers for High-resolution Image Synthesis. [At risk of oversimplifying again] they work together by taking text prompt for VQGAN to iterate on while CLIP guides the image generation by adjusting the model weights based on the degree of deviation from the initial prompt.
- Here are a couple of my favorite VQGAN+CLIP notebooks to get you started: AI Art Machine & Zoetrope 5.5
- I only discuss GPT-3, CLIP, and VQGAN here because I've used them before. There are many other open-source models available. The huggingface docs are a great place to start exploring these.