Alphabet, the parent firm of Google, has unveiled Gemini, its largest and most capable artificial intelligence (AI) model to date, to take on rivals such as OpenAI's GPT-4 and Meta's Llama 2. Initially teased by company CEO Sundar Pichai at the annual Google I/O developer conference in June, the AI model is now being officially rolled out to the public. Gemini will be offered in three distinct sizes: Ultra, tailored for highly intricate tasks; Pro, designed to scale across a broad spectrum of tasks; and Nano, specialised for on-device tasks.
Gemini AI May Redefine Text, Audio, Code And More
Google's Gemini AI has been built from the ground up with a "multimodal" approach, allowing it to comprehend and process various forms of information concurrently, including text, code, audio, image, and video. Presently available only in English, Gemini is expected to support other languages soon. Pichai envisions the model being integrated into Google's search engine, ad products, the Chrome browser, and more worldwide, heralding it as the future of Google, arriving precisely when needed.
"Today, we’re a step closer to this vision as we introduce Gemini, the most capable and general model we’ve ever built. Gemini is the result of large-scale collaborative efforts by teams across Google, including our colleagues at Google Research. It was built from the ground up to be multimodal, which means it can generalize and seamlessly understand, operate across and combine different types of information including text, code, audio, image and video," Demis Hassabis, CEO and Co-Founder of Google DeepMind, on behalf of Gemini team, wrote in a blog post.
Google is rolling out the AI model through various channels: Bard is now powered by Gemini Pro, and Google Pixel 8 Pro users will get new features courtesy of Gemini Nano, while Gemini Ultra is slated for release in 2024. Starting December 13, developers and enterprise customers can access Gemini Pro via Google Generative AI Studio or Vertex AI in Google Cloud.
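For developers, a basic Gemini Pro call looks roughly like the minimal sketch below, using Google's google-generativeai Python SDK; the model name and API-key placeholder reflect the launch announcement and are illustrative, and Vertex AI offers an equivalent path for enterprise customers.

```python
# Minimal sketch: a text-only request to Gemini Pro via Google's Python SDK.
# Assumes the google-generativeai package is installed and an API key has
# been created; "YOUR_API_KEY" is a hypothetical placeholder.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

model = genai.GenerativeModel("gemini-pro")  # text-in, text-out model
response = model.generate_content("Explain what a multimodal model is in one sentence.")
print(response.text)
```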
Currently, Gemini's base models take text as input and produce text as output, while more advanced versions such as Gemini Ultra can also handle images, video and audio.
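Under the same assumptions as the sketch above, a multimodal request mixes, say, an image with a text instruction in a single prompt; the vision-capable model name and the local file path here are illustrative placeholders.

```python
# Sketch of a multimodal request: an image plus a text instruction.
# "gemini-pro-vision" is the launch-era vision-capable model name;
# "chart.png" is a hypothetical local image file.
import google.generativeai as genai
import PIL.Image

genai.configure(api_key="YOUR_API_KEY")

image = PIL.Image.open("chart.png")
model = genai.GenerativeModel("gemini-pro-vision")
response = model.generate_content([image, "Summarise what this chart shows."])
print(response.text)
```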
Gemini Ultra is currently available to a limited audience, comprising select customers, developers, partners, and safety and responsibility experts for initial experimentation and feedback. A wider release to developers and enterprise customers is scheduled for early next year.
According to Hassabis, Gemini is poised to evolve beyond these modalities, encompassing areas like action and touch, more akin to robotics. He envisions Gemini gaining additional senses over time, becoming more aware, more accurate and better grounded in the process. While these models may still exhibit hallucinations, biases, and other issues, Hassabis asserts that they improve as their knowledge of the world expands.