DraganSr: OpenAI Tokenizer

Sunday, April 07, 2024

OpenAI Tokenizer

OpenAI Platform

OpenAI's large language models (sometimes referred to as GPTs) process text using tokens, which are common sequences of characters found in a set of text. The models learn the statistical relationships between these tokens and excel at producing the next token in a sequence of tokens.

The tokenizer tool on the OpenAI Platform page shows how a piece of text might be tokenized by a language model, along with the total count of tokens in that piece of text.
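To make "common sequences of characters" concrete, here is a minimal sketch of byte pair encoding on a toy corpus. It is an illustration under simplifying assumptions, not OpenAI's tokenizer: it works on characters rather than bytes, splits only on whitespace, and learns its merge rules from a few words instead of loading a pre-trained vocabulary. The function names (`learn_merges`, `merge_pair`, `tokenize`) are hypothetical and chosen only for this example.

```python
from collections import Counter


def merge_pair(symbols: tuple, pair: tuple) -> tuple:
    """Replace every adjacent occurrence of `pair` in `symbols` with one joined symbol."""
    merged = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_merges(corpus: str, num_merges: int) -> list:
    """Learn up to `num_merges` merge rules from a small training text."""
    # Each whitespace-separated word starts as a tuple of single characters.
    words = Counter(tuple(word) for word in corpus.split())
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in words.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the new merge rule to every word in the corpus.
        updated = Counter()
        for symbols, freq in words.items():
            updated[merge_pair(symbols, best)] += freq
        words = updated
    return merges


def tokenize(text: str, merges: list) -> list:
    """Split `text` into tokens by replaying the learned merges in order."""
    tokens = []
    for word in text.split():
        symbols = tuple(word)
        for pair in merges:
            symbols = merge_pair(symbols, pair)
        tokens.extend(symbols)
    return tokens


# Toy training text (an assumption for illustration only).
merges = learn_merges("low low low lower lowest", num_merges=3)
tokens = tokenize("low lower slow", merges)
print(tokens, "token count:", len(tokens))
```

Frequent sequences such as "low" end up as a single token, while rarer words are split into several pieces, which is why the token count of a text usually differs from its word count. For real token counts, the tiktoken library listed in the links below is the tool to use.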

OpenAI - Wikipedia

Byte pair encoding - Wikipedia

How to count tokens with Tiktoken | OpenAI Cookbook

tiktoken/README.md at main · openai/tiktoken

gpt-3-encoder - npm
