Lecture 9. MoE, DeepSeek, Qwen3
- 00:16 GPT-4 and the MoE (Mixture of Experts) architecture
- 03:06 MoE: sparsity
- 07:50 Switch Transformer
- 13:49 Mixtral 8x7B
- 18:16 MoE parameters: total vs. active
- 25:06 Mixture of Experts: takeaways
- 28:38 DeepSeek
- 40:23 DeepSeek: parameters
- 52:04 GRPO vs. PPO
- 56:08 Qwen3
