Large Multimodal Model (LMM) with Gemini
TL;DR

Google’s Gemini accepts text, images, video, and audio as input. This makes many new applications possible, and makes existing ones easier to implement or more powerful. In this post, I will introduce the Large Multimodal Model (LMM) using Gemini.

Introduction

We have already seen many amazing applications of Large Language Models (LLMs), e.g. ChatGPT and GPT-3. But most of them take only text as input. Some advanced models, like GPT-4o, can take text and images as input, but Gemini can take text, images, video, and audio. These new capabilities open up a lot of new possibilities. ...