The path to ubiquitous AI | Taalas:

Taalas’ silicon Llama achieves 17K tokens/sec per user, nearly 10X faster than the current state of the art, while costing 20X less to build, and consuming 10X less power.

  1. Hot dang, that’s fast. Their live demo is so fast it seems fake.
  2. It’s still a 2.5 kW server, but that’s not very far from residential power levels! No word on cost…
  3. I wonder when the models they can bake into hardware will get “good enough” for the churn to stop. The chips are crazy fast, but spinning new silicon for every new model sounds insane at the current pace of progress.
Jerry Towler @jatowler