Lecture 9. MoE, DeepSeek, Qwen3

  1. 00:16 GPT-4 with the MoE (Mixture of Experts) architecture
  2. 03:06 MoE: Sparsity
  3. 07:50 Switch Transformer
  4. 13:49 Mixtral 8x7B
  5. 18:16 MoE: total vs. active parameters
  6. 25:06 Mixture of Experts: takeaways
  7. 28:38 DeepSeek
  8. 40:23 DeepSeek: parameters
  9. 52:04 GRPO vs PPO
  10. 56:08 Qwen3