> <i>What does the input side of the neural network look like? Is it enough bits to represent N tokens where N is the context size?</i><p>Not quite. The raw text is converted by the tokenizer into IDs corresponding to tokens. Each token ID then maps onto a vector via a so-called embedding lookup (I always thought "embedding" was a weird word choice, but it's standard).<p>This vector is then augmented with further information, such as positional and relational information, inside the model.<p>So the context is not a bitfield of tokens. It's a collection of vectors that the model annotates with additional information. A model's context size is the maximum usable sequence length, not a fixed-size input array.
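The embedding lookup is just row indexing into a weight matrix. A minimal sketch with made-up sizes and random weights (illustrative only, not any real model's tokenizer or embedding table), using learned-style absolute positional vectors as one example of the positional information added afterwards:

```python
import numpy as np

# Hypothetical, tiny sizes for illustration
vocab_size, d_model, max_len = 1000, 8, 16

rng = np.random.default_rng(0)
tok_emb = rng.normal(size=(vocab_size, d_model))  # one vector per token ID
pos_emb = rng.normal(size=(max_len, d_model))     # one vector per position

token_ids = np.array([5, 42, 7])      # tokenizer output: integer IDs, not bits
x = tok_emb[token_ids]                # embedding lookup -> shape (3, 8)
x = x + pos_emb[:len(token_ids)]      # augment with positional information
```

Note that `x` has one row per token actually present (3 here), not `max_len` rows: the sequence can be anywhere up to the maximum length, which is why the context is a variable-length collection of vectors rather than a fixed input array.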