Posts

Large Multimodal Model (LMM) with Gemini

TL;DR Google’s Gemini accepts text, image, video, and audio as input. This makes many new applications possible, easier to implement, or more powerful. In this post, I will introduce the Large Multimodal Model (LMM) with Gemini. Introduction: We have already seen many amazing applications of Large Language Models (LLMs) (e.g. ChatGPT, GPT-3), but most of them take only text as input. Some advanced LLMs, like GPT-4o, accept text and images as input, but Gemini accepts text, image, video, and audio. These new features open up many new possibilities. ...

Brief History of TensorFlow Programming Interface

TL;DR TensorFlow has gone through several major evolutions of its user programming interface since its release in 2015. This article briefly introduces these evolutions and discusses the motivations behind them, to help the reader understand TensorFlow’s design thinking at each stage. Notes: By user programming interface, we mean the programming interface that TensorFlow provides to users, not the internal programming interface of TensorFlow. We discuss TensorFlow as a generalized concept that includes both TensorFlow and JAX, which are machine learning frameworks from Google based on the same underlying layers. TensorFlow 0.x era (2015 - 2017). Source code: https://github.com/tensorflow/tensorflow/releases/tag/0.12.1 ...

The decoding process of ChatGPT and the various parameters in it

TL;DR OpenAI’s ChatGPT provides the range and meaning of various parameters in its official documentation (https://platform.openai.com/docs/api-reference/chat/create). We will discuss ChatGPT’s generation process and how these parameters implement its generation effects. ChatGPT’s Decoding Process: We assume minGPT (equivalent to GPT-2) and ChatGPT have the same decoding process: https://github.com/karpathy/minGPT/blob/master/mingpt/model.py#LL283C12-L283C12. The overall process can be summarized as the following steps: (1) Expand the user’s request from 1 to a batch size of num_samples. (2) Perform model inference to obtain logits. (3) Perform temperature mapping: logits = logits / temperature. (4) [Optional] Perform topk processing: logits = topk_func(logits, top_k). (5) Map logits to probabilities: probs = softmax(logits). (6) Choose the next token, depending on whether to sample. Sample: idx_next = multinomial_sample(probs, num_samples=1). Don’t sample: idx_next = topk_func(probs, k=1). (7) Repeat the above process max_new_tokens times. Decoding Parameters of ChatGPT: temperature. The official definition of the temperature parameter is: ...
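The decoding steps summarized in this excerpt can be sketched in plain Python. This is only an illustrative sketch, not minGPT's or OpenAI's actual code: it handles a single sequence rather than a batch of num_samples, and the names `top_k_filter`, `softmax`, `next_token`, `generate`, and the `model` callable are hypothetical stand-ins for the `topk_func` and `multinomial_sample` placeholders used in the steps.

```python
import math
import random


def top_k_filter(logits, k):
    """Keep the k largest logits and set the rest to -inf.

    Ties at the threshold are all kept, so slightly more than k
    entries may survive when values are equal.
    """
    threshold = sorted(logits, reverse=True)[k - 1]
    return [x if x >= threshold else float("-inf") for x in logits]


def softmax(logits):
    """Map logits to probabilities (max-subtracted for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def next_token(logits, temperature=1.0, top_k=None, do_sample=True, rng=random):
    """Pick one token index from the logits of a single decoding step."""
    # Temperature mapping: logits = logits / temperature
    scaled = [x / temperature for x in logits]
    # Optional top-k processing
    if top_k is not None:
        scaled = top_k_filter(scaled, min(top_k, len(scaled)))
    probs = softmax(scaled)
    if do_sample:
        # Sample one index in proportion to its probability
        return rng.choices(range(len(probs)), weights=probs, k=1)[0]
    # No sampling: take the most probable index (greedy decoding)
    return max(range(len(probs)), key=probs.__getitem__)


def generate(model, tokens, max_new_tokens, **kwargs):
    """Repeat the single-step procedure max_new_tokens times.

    `model` is a hypothetical callable mapping a token list to next-token logits.
    """
    tokens = list(tokens)
    for _ in range(max_new_tokens):
        tokens.append(next_token(model(tokens), **kwargs))
    return tokens
```

With `do_sample=False` the result is deterministic. With sampling enabled, a lower temperature sharpens the distribution toward the largest logit and a higher one flattens it.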

Solution for TensorBoard embedding blocked when loading metadata

TL;DR Passing a relative path as the projector’s metadata_path prevents TensorBoard from finding the metadata. The correct approach is to use an FQPN (fully-qualified path name, aka absolute path) ...
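One way to follow this advice is to convert the path to an absolute one before assigning it to metadata_path. The sketch below uses only the Python standard library. The helper name `resolve_metadata_path`, its `base_dir` parameter, and the file name `metadata.tsv` are assumptions for illustration and are not part of TensorBoard.

```python
import os


def resolve_metadata_path(metadata_path, base_dir=None):
    """Return an absolute version of metadata_path.

    Relative paths are resolved against base_dir, which defaults to the
    current working directory. Absolute paths are only normalized.
    """
    if os.path.isabs(metadata_path):
        return os.path.normpath(metadata_path)
    base = base_dir if base_dir is not None else os.getcwd()
    return os.path.abspath(os.path.join(base, metadata_path))


# Hypothetical usage: compute the absolute path, then use it as metadata_path.
absolute_path = resolve_metadata_path("metadata.tsv")
```

Resolving the path explicitly means the result does not depend on the directory from which TensorBoard is later launched.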

Introduction to the implementation of Whisper: the time-series database

TL;DR This article shows how Whisper works and some of the Linux programming tricks it uses. ...