Generative artificial intelligence: a new world full of creativity

Editor’s note: What most distinguishes humans from other creatures is the ability to analyze and to create, that is, the capacity for advanced thought. Over the past ten years, driven jointly by better models, more computing power and more data, artificial intelligence first became good at analytical tasks such as recognition (speech, images and so on), and has recently begun to stand out at creating expressive and beautiful things. This is known as generative artificial intelligence. This article analyzes the trend and looks ahead. The article is a compiled translation.

Humans are good at analyzing things, but machines are even better. A machine can analyze a set of data and find patterns in it that apply to a wide range of use cases, whether that is detecting fraud or spam, predicting the ETA (estimated time of arrival) of a delivery, or predicting which TikTok video to show you next. Machines keep getting smarter at these tasks. This is so-called "Analytical AI", or traditional artificial intelligence.

But humans are not only good at analyzing things; we are also good at creating. We write poems, design products, develop games and write code. Until recently, machines had no chance of competing with humans at creative work and were limited to analytical and rote cognitive labor. Now machines are starting to become good at creating expressive and beautiful things. This new category is called "Generative AI": the machine generates something new rather than analyzing something that already exists.

Generative artificial intelligence is not only becoming faster and cheaper; in some cases it already creates things better than humans do. From social media to games, advertising to architecture, coding to graphic design, product design to law, and marketing to sales, every industry that depends on original human work is up for reinvention. Some functions in these industries may be replaced entirely by generative artificial intelligence, while others are more likely to thrive on the tighter, more frequent creative iteration that human-machine collaboration brings. Across a wide range of end markets, though, generative artificial intelligence should unlock better, faster and cheaper creation. The dream is that it drives the marginal cost of creation and knowledge work toward zero, generating enormous labor productivity and economic value, and commensurate market value.

Generative artificial intelligence touches two fields, knowledge work and creative work, that together employ billions of workers. It can make these workers at least 10% more efficient and/or creative: they become not only faster and more efficient, but also more capable than before. Generative artificial intelligence therefore has the potential to generate trillions of dollars in economic value.

Generative artificial intelligence has the same "why now" as artificial intelligence more broadly: better models, more data and more compute. The field is changing so quickly that we cannot capture every development, but it is worth summarizing its recent history to put the present moment in context.

More than five years ago, small models were considered the "state of the art" for understanding language. These small models are good at analytical tasks and were deployed for jobs ranging from delivery-time prediction to fraud classification. However, they are not expressive enough for general-purpose generative tasks. Generating human-level writing or code remained a pipe dream.

Google Research published a landmark paper (Attention is All You Need) that describes a new neural network architecture for natural language understanding, called the transformer. Transformers can produce higher-quality language models while being more parallelizable and requiring significantly less time to train. These models are few-shot learners and can be customized for specific domains relatively easily.
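To make the idea of attention a little more concrete, here is a minimal sketch of scaled dot-product attention, the core operation of the transformer, written in plain Python. The function names and the toy vectors are illustrative assumptions and do not come from the paper's code; a real model also uses learned projections, multiple heads and optimized matrix libraries.

```python
import math

def softmax(scores):
    """Turn a list of raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # The output is the weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs

# Toy example: three tokens with 2-dimensional embeddings (made-up numbers).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixed = attention(tokens, tokens, tokens)
```

Each pass through the loop over `queries` is independent of the others, which hints at why the architecture parallelizes well: all positions can be processed at the same time.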

As the models get bigger and bigger, their performance first matches that of humans and then, inevitably, surpasses it. From 2015 to 2020, the amount of compute used to train these models increased by six orders of magnitude, and their results exceeded human performance benchmarks in handwriting, speech and image recognition, reading comprehension and language understanding. OpenAI’s GPT-3 stands out: it is a great leap in performance over GPT-2 and delivers compelling Twitter demos of tasks ranging from code generation to writing satirical jokes.

Despite this progress in fundamental research, these models are not yet widespread. They are large and difficult to run (they require GPU orchestration), not broadly accessible (unavailable or limited to closed beta), and expensive to use as a cloud service. Despite these limitations, the earliest generative artificial intelligence applications have begun to enter the fray.

As AI models have grown progressively larger, they have begun to surpass major human performance benchmarks.

Compute gets cheaper. New techniques, such as diffusion models, reduce the cost of training and of running inference. The research community continues to develop better algorithms and larger models. Developer access expands from closed beta to open beta and, in some cases, to open source.

For developers who had been unable to access LLMs (large language models), the floodgates are now open for exploration and application development. Applications are beginning to bloom everywhere.

As the platform layer solidifies, models continue to get better, faster and cheaper, and model access trends toward free and open source, the application layer is ripe for an explosion of creativity.

Just as mobile devices unleashed new types of apps through new capabilities such as GPS, cameras and on-the-go connectivity, we expect these large models to stimulate a new wave of generative artificial intelligence applications. And just as the mobile inflection point opened the market for a handful of killer apps a decade ago, we expect killer apps to emerge for generative artificial intelligence. The race is on.

The following diagram outlines the platform layers that will support each category and the potential application types that can be developed based on them.

  • Text is the most advanced domain. However, natural language is hard to get right, and quality matters. Today, the models are good at generic short- and medium-form writing (but even so, they are typically used for iteration or first drafts). Over time, as the models get better, we should expect to see higher-quality output, longer-form content and better vertical-specific tuning.

  • Code generation is likely to have a significant impact on developer productivity in the near term, as GitHub Copilot shows. It will also make creative use of code more accessible to non-developers.

  • Image generation is a more recent phenomenon, but it has already gone viral: generated images are more fun to share on Twitter than text! We are seeing the emergence of image models with different aesthetic styles, along with different techniques for editing and modifying generated images.

  • Speech synthesis has been around for some time (hello Siri!), but consumer and enterprise applications are only now getting good. For high-end applications such as film and podcasts, the bar is quite high for instantly generated speech that does not sound mechanical. But just as with images, today’s models provide a starting point for further refinement or for final output in practical applications.

  • Video and 3D models are coming up the curve quickly. People are excited about their potential to unlock large creative markets such as film, games, VR, architecture and physical product design. Research organizations are releasing foundational 3D and video models as we speak.

  • Other domains: foundation models are being developed in many fields, from audio and music to biology and chemistry (generative proteins and molecules, anyone?).

The following figure illustrates how we might expect foundation models to progress, and the timeline on which the associated applications become possible. Everything from 2025 onward is just a guess.

Predicted development timeline for different types of generative artificial intelligence applications. Orange marks first attempts, yellow is almost there, and green is ready for prime time.

Here are some of the applications we are excited about. There are far more than we have listed here, and we are fascinated by the creative applications that founders and developers are dreaming up.

  • Copywriting: The growing need for personalized web and email content to fuel sales and marketing strategies, as well as customer support, is a perfect application for language models. The short form and stylized nature of the wording, combined with the time and cost pressures on these teams, should drive demand for automated and augmented solutions.

  • Vertical-specific writing assistants: Most writing assistants today are horizontal; we believe there is an opportunity to build much better generative applications for specific end markets, such as legal contract writing or screenwriting. Product differentiation here lies in fine-tuning the models and the UX patterns for particular workflows.

  • Code generation: Current applications make developers more powerful and more productive: GitHub Copilot generates nearly 40% of the code in the projects where it is installed. But the even bigger opportunity may be giving consumers the ability to code. Learning to prompt may become the ultimate high-level programming language.

  • Generative art: The entire world of art history and popular culture is now encoded in these large models, allowing anyone to explore themes and styles that previously would have taken a lifetime to master.

  • Games: The dream in this field is to use natural language to create complex, controllable scenes or models; that end state is probably a long way off, but there are more immediate options that are feasible in the near term, such as generating textures and skybox art.

  • Media/Advertising: Imagine the potential to automate agency work and to optimize ad copy and creative on the fly for consumers. This is a great opportunity for multimodal generation that pairs sales messages with complementary visuals.

  • Design: Prototyping digital and physical products is a labor-intensive, iterative process. High-fidelity renderings from rough sketches and prompts are already a reality. As 3D models become available, the generative design process will extend through manufacturing and production, from text to object. Your next iPhone app or pair of sneakers may be designed by a machine.

  • Social media and digital communities: Are there new ways of expressing ourselves using generative tools? New applications like Midjourney are creating new social experiences as consumers learn to create in public.

What will a generative artificial intelligence application look like? Here are some predictions.

Generative artificial intelligence applications are built on top of large models such as GPT-3 or Stable Diffusion. As these applications gather more user data, they can fine-tune their models to: 1) improve model quality/performance for their specific problem domain; 2) reduce model size/cost.

We can think of a generative artificial intelligence application as a UI layer plus a "little brain" that sits on top of the "big brain" of a large general-purpose model.
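One way to picture this layering is the rough sketch below: a thin, domain-specific application wraps a general-purpose model behind a plain function call. Everything in it is hypothetical. `fake_general_model` is a stand-in for a hosted large model, and `CopywritingApp` and its prompt template are invented for illustration; none of it is a real provider's API.

```python
from typing import Callable

def fake_general_model(prompt: str) -> str:
    """Hypothetical stand-in for a hosted general model (the "big brain")."""
    return f"[model output for: {prompt}]"

class CopywritingApp:
    """Thin domain-specific layer (the "little brain") over a general model."""

    def __init__(self, model: Callable[[str], str]):
        self.model = model

    def build_prompt(self, product: str, tone: str) -> str:
        # Domain knowledge lives in the prompt template, not in the model.
        return f"Write a short {tone} marketing email for {product}."

    def generate(self, product: str, tone: str = "friendly") -> str:
        # The UI would call this method; the model does the generation.
        return self.model(self.build_prompt(product, tone))

app = CopywritingApp(fake_general_model)
draft = app.generate("running shoes")
```

Because the application depends only on a callable that maps a prompt to text, the general model could later be swapped for a smaller fine-tuned one without changing the UI layer.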

Today, generative artificial intelligence applications exist largely as plug-ins in existing software ecosystems. Code completion happens in your IDE; image generation happens in Figma or Photoshop; even Discord bots are a tool for injecting generative artificial intelligence into digital/social communities.

There are also a few independent generative artificial intelligence web applications, such as Jasper and Copy.ai for copywriting, Runway for video editing and Mem for taking notes.

Plug-ins may be an effective wedge for bootstrapping an application. They may be a smart way to overcome the "chicken or egg" problem of user data and model quality (an application needs distribution to get enough usage to improve its models, but it needs good models to attract users). We have seen this distribution strategy pay off in other market categories, such as the consumer/social field.

Today, most demonstrations of generative artificial intelligence are "one-and-done": given an input, the machine spits out an output, and you can keep that output or discard it and try again. Increasingly, however, the models are becoming more iterative: you can work with an output to modify it, refine it, upscale it and generate variations.

Today, the output of generative artificial intelligence is used as a prototype or a first draft. Applications are very good at throwing out many different ideas to keep the creative process moving (for example, different options for a logo or an architectural design), and they are also very good at suggesting first drafts that the user then has to refine to reach the final state (for example, blog posts or code autocompletions). As the models get smarter, partly on the strength of user data, we should expect these drafts to get better and better until they are good enough to be used as the final product.

The best generative artificial intelligence companies can create a sustainable competitive advantage by relentlessly driving the flywheel between user engagement/data and model performance. To win, teams must get this flywheel spinning: 1) achieve excellent user engagement → 2) turn more user engagement into better model performance (prompt improvements, model fine-tuning, and user choices used as labeled training data) → 3) use excellent model performance to drive more user growth and engagement. They are likely to go into specific problem areas (for example, code, design or games) rather than trying to be everything to everyone. They are likely to first integrate deeply into existing applications for leverage and distribution, and later try to replace those applications with AI-native workflows. It takes time to build these applications the right way and to accumulate users and data, but we believe the best ones will be durable and have the chance to become very large.
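As a small illustration of step 2, the sketch below turns logged user choices into labeled examples that could later feed fine-tuning. It rests on assumptions: the `Interaction` record, its field names and the sample log are all hypothetical, and a real pipeline would add consent handling, deduplication and quality filtering.

```python
from dataclasses import dataclass

@dataclass
class Interaction:
    prompt: str       # what the user asked for
    candidates: list  # outputs the model offered
    chosen: int       # index of the candidate the user kept, or -1 if none

def to_training_examples(log):
    """Keep only the interactions where the user picked a candidate."""
    examples = []
    for item in log:
        if 0 <= item.chosen < len(item.candidates):
            examples.append({
                "prompt": item.prompt,
                "completion": item.candidates[item.chosen],
            })
    return examples

# Hypothetical usage log from a logo-generation app.
log = [
    Interaction("logo for a bakery", ["wheat icon", "cupcake icon"], chosen=1),
    Interaction("logo for a gym", ["dumbbell icon"], chosen=-1),  # all rejected
]
examples = to_training_examples(log)
```

The more users engage, the larger this set of examples grows, which is the link between engagement and model performance that the flywheel relies on.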

Although generative artificial intelligence has great potential, many problems in business models and technology remain to be worked out. Important issues such as copyright, trust and safety, and cost are far from settled.

Generative artificial intelligence is still at a very early stage. The platform layer is only just getting good, and the application space has barely gotten going.

To be clear, we don’t need large language models to write a Tolstoy novel to make good use of generative artificial intelligence. These models are good enough today to write first drafts of blog posts and to generate prototypes of logos and product interfaces. A great deal of value can be created in the near to medium term.

The first wave of generative artificial intelligence applications resembles the mobile app landscape when the iPhone first came out: somewhat gimmicky and unreliable, with unclear competitive differentiation and business models. However, some of these applications offer a glimpse of what the future may hold. Once you have seen a machine produce complex, functioning code or beautiful images, it is hard to imagine a future in which machines do not play a fundamental role in how we work and create.

If we allow ourselves to dream decades ahead, it is easy to imagine generative artificial intelligence deeply embedded in how we work, create and play: memos that write themselves; 3D printing anything you can imagine; turning text into a Pixar movie; Roblox-like gaming experiences that generate rich worlds as quickly as we can dream them up. These experiences may look like science fiction today, but progress is very fast: within a few years, we have gone from narrow language models to code autocompletion. If this rate of change continues and follows a "Moore’s Law" of large models, these far-fetched scenarios may enter the realm of the possible.

PS: This article was co-written with GPT-3. GPT-3 did not generate the whole article, of course, but it was responsible for beating writer’s block, generating complete sentences and paragraphs, and brainstorming different use cases for generative artificial intelligence. Writing this article with GPT-3 offered a taste of the kind of human-computer interaction that may become the new normal. We also illustrated this article with Midjourney. We have to say, it was a lot of fun!

Translator: boxi.
