The model employs State Space Duality (SSD) from Mamba-2 to achieve both training and inference efficiency. During inference, it uses a step-by-step (recurrent) decoding approach that keeps memory consumption at roughly 4 MB and, for the first time, brings CPU latency down to about 10 ms. This makes local inference inside a web browser, without a GPU, practical.
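The constant-memory property comes from the recurrent view of a state-space model: each step updates a fixed-size hidden state rather than attending over the whole history. The sketch below illustrates this with a simplified diagonal SSM in NumPy; the dimensions, random parameters, and the `step` function are illustrative assumptions, not the model's actual weights or API.

```python
import numpy as np

# Illustrative sizes (hypothetical, not the real model's).
d_state = 16
rng = np.random.default_rng(0)

A = -np.abs(rng.standard_normal(d_state))  # stable diagonal decay (continuous-time)
B = rng.standard_normal(d_state)           # input projection
C = rng.standard_normal(d_state)           # output projection
dt = 0.1                                   # step size
decay = np.exp(dt * A)                     # simplified discretization of A

def step(h, x):
    """One inference step: update the fixed-size state, emit one output.

    Only `h` (d_state floats) is carried between steps, so memory is
    constant in sequence length -- the property that lets a real model
    fit its entire inference state in a few megabytes."""
    h = decay * h + dt * B * x             # recurrent state update
    y = float(C @ h)                       # output for this step
    return h, y

h = np.zeros(d_state)
outputs = []
for x in [1.0, 0.5, -0.3]:                 # toy input stream
    h, y = step(h, x)
    outputs.append(y)
```

During training, the same model can be computed in a parallel (matrix) form; SSD is precisely the framework that connects the two views, which is how the model gets both efficient training and cheap step-by-step inference.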