In the ever-evolving landscape of machine learning, one of the most exciting frontiers is meta-learning—the art of teaching models not just to solve tasks, but to quickly adapt to new tasks with minimal data. In this long-form blog post, we’ll explore three influential algorithms in this domain: Model-Agnostic Meta-Learning (MAML), First-Order MAML (FOMAML), and Reptile.
Along the way, we’ll revisit the key questions that arose during our discussion, unpack ablation studies, and point you to the seminal papers that introduced these methods.
Traditional supervised learning trains a model on a large, fixed dataset and then evaluates it on held-out data. But in many real-world scenarios—medical diagnosis, robotics, personalization—we need models that can adapt rapidly when faced with a new task and only a handful of labeled examples. This is the realm of few-shot learning, and meta-learning offers a principled way to achieve it.
At its heart, meta-learning asks: Can we learn a learning procedure? Instead of just optimizing a model to perform well on one task, we optimize it so that fine-tuning on new tasks is quick and effective.
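Before diving into the algorithms, it helps to pin down what a “task” means in the few-shot setting. A common construction is the N-way K-shot episode: sample N classes, give the learner K labeled examples per class (the support set), and evaluate on held-out examples of those same classes (the query set). Here is a minimal, hypothetical sketch of such a sampler—the function name and dataset format are illustrative, not taken from any of the papers:

```python
import random
from collections import defaultdict

def sample_episode(dataset, n_way=5, k_shot=1, n_query=15):
    """Build one few-shot task (episode) from a pool of (example, label) pairs.

    Assumes every class has at least k_shot + n_query examples.
    """
    by_class = defaultdict(list)
    for x, y in dataset:
        by_class[y].append(x)

    classes = random.sample(list(by_class), n_way)  # pick N classes at random
    support, query = [], []
    for new_label, c in enumerate(classes):
        examples = random.sample(by_class[c], k_shot + n_query)
        support += [(x, new_label) for x in examples[:k_shot]]   # K shots per class
        query += [(x, new_label) for x in examples[k_shot:]]     # held-out queries
    return support, query
```

Each call to this sampler yields one task T_i; meta-learning algorithms consume a stream of such tasks.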
Introduced by Finn et al. (2017) in “Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks” (arXiv:1703.03400), MAML is one of the most popular meta-learning methods. It is “model-agnostic” because it can be applied to any differentiable model—classification, regression, or even reinforcement learning.
1. Random Initialization: Start with parameters θ.
2. Sample Tasks: Draw a batch of tasks T_i ∼ p(T), each with its own small support set (K-shot) and query set.
3. Inner Loop (Adaptation):
θ'_i = θ - α ∇_θ L_T_i^train(θ)
(Few gradient steps on each task’s support set.)
4. Outer Loop (Meta-Update):
Evaluate L_T_i^val(θ'_i) on query sets.
θ = θ - β ∇_θ Σ_i L_T_i^val(θ'_i)
(Backprop through inner loop; second-order gradients.)
Inner loop: Task-specific fine-tuning (few-shot adaptation on the support set).
Outer loop: Meta-optimization to improve the starting point θ for all tasks.
Both loops appear in the code sketch below.
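To make the two loops concrete, here is a minimal PyTorch sketch of second-order MAML on a toy 1-D regression problem. Everything task-specific (the linear model, the random-slope tasks, the learning rates α and β, the loop sizes) is illustrative rather than taken from the paper; the essential MAML mechanics are the create_graph=True inner step and the meta-gradient taken with respect to θ.

```python
import torch

# Toy 1-D regression: the "model" is y = w*x + b, with parameters kept as
# plain tensors so we can differentiate through the inner-loop update.
def model(params, x):
    w, b = params
    return w * x + b

def loss_fn(params, x, y):
    return ((model(params, x) - y) ** 2).mean()

alpha, beta = 0.01, 0.001   # inner / outer learning rates (illustrative values)
theta = [torch.zeros(1, requires_grad=True), torch.zeros(1, requires_grad=True)]

for step in range(1000):
    meta_loss = 0.0
    for _ in range(4):      # batch of tasks T_i ~ p(T)
        # Hypothetical task: y = a*x with a random slope; 5-shot support and query sets.
        a = torch.rand(1) * 4 - 2
        x_s, x_q = torch.randn(5, 1), torch.randn(5, 1)
        y_s, y_q = a * x_s, a * x_q

        # Inner loop: one gradient step on the support set. create_graph=True
        # keeps the graph so the meta-update can backprop through this step.
        grads = torch.autograd.grad(loss_fn(theta, x_s, y_s), theta, create_graph=True)
        theta_prime = [p - alpha * g for p, g in zip(theta, grads)]

        # Outer objective: the adapted parameters evaluated on the query set.
        meta_loss = meta_loss + loss_fn(theta_prime, x_q, y_q)

    # Outer loop: meta-gradient of the summed query losses w.r.t. theta.
    meta_grads = torch.autograd.grad(meta_loss, theta)
    with torch.no_grad():
        for p, g in zip(theta, meta_grads):
            p -= beta * g
```

The expensive part is that final autograd.grad call: differentiating the query loss with respect to θ means backpropagating through the inner update, which is exactly the second-order cost that FOMAML and Reptile avoid.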
In the experimental section of the MAML paper, the authors ran ablations to test whether the meta-learned initialization itself is responsible for fast adaptation, comparing against alternatives such as pretraining on all tasks and then fine-tuning at test time. They demonstrated on Omniglot and Mini-ImageNet that MAML consistently outperforms these alternatives, confirming the power of learning to learn.
While MAML is elegant, the need for second-order derivatives can be computationally heavy. Enter First-Order MAML (FOMAML), a simplification first noted by Finn et al. (2017) and analyzed in depth by Nichol et al. (2018) in “On First-Order Meta-Learning Algorithms” (arXiv:1803.02999).
1. Inner Loop (same as MAML):
θ'_i = θ - α ∇_θ L_T_i^train(θ)
2. Outer Loop:
g_i = ∇_{θ'_i} L_T_i^val(θ'_i)
θ = θ - β Σ_i g_i
(Treat θ'_i as a constant with respect to θ; no second-order gradients.)
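In code, the change from the MAML sketch above is tiny. The following hypothetical drop-in replacement for the per-task body of that earlier loop computes the inner step without create_graph, detaches θ'_i, and applies the query-set gradient g_i directly to θ, dropping the second-order terms that make full MAML expensive (names like loss_fn, theta, alpha, beta, x_s, y_s, x_q, y_q are reused from that sketch):

```python
# Inner loop: a plain gradient step; no create_graph, so no second-order graph is kept.
grads = torch.autograd.grad(loss_fn(theta, x_s, y_s), theta)
# Treat theta'_i as a constant w.r.t. theta by detaching it from the graph.
theta_prime = [(p - alpha * g).detach().requires_grad_(True) for p, g in zip(theta, grads)]

# Outer loop: g_i is the query-set gradient evaluated at theta'_i...
g_i = torch.autograd.grad(loss_fn(theta_prime, x_q, y_q), theta_prime)
# ...and is applied directly to theta, as if theta'_i did not depend on theta.
with torch.no_grad():
    for p, g in zip(theta, g_i):
        p -= beta * g
```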
Introduced in the same Nichol et al. (2018) paper, Reptile sidesteps meta-gradients altogether by using a simple vector-difference update.
1. Inner Loop: Sample a task T_i and run k steps of SGD on its support set, starting from θ, to obtain θ'_i.
2. Outer Loop (Meta-Update):
θ = θ + ε (θ'_i - θ)
(No losses or gradients in the outer loop; just move θ toward θ'_i, as in the sketch below.)
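Reptile is correspondingly short to implement. Below is a minimal sketch on the same toy regression setup as the MAML example (loss_fn is reused from that sketch; ε, k, and the inner learning rate α are illustrative values, not the paper’s):

```python
import torch

epsilon, k, alpha = 0.1, 5, 0.01   # illustrative hyperparameters
theta = [torch.zeros(1, requires_grad=True), torch.zeros(1, requires_grad=True)]

for step in range(1000):
    # Sample a task T_i: y = a*x with a random slope, plus a small support set.
    a = torch.rand(1) * 4 - 2
    x_s = torch.randn(10, 1)
    y_s = a * x_s

    # Inner loop: run k steps of SGD on the support set from a copy of theta.
    theta_i = [p.clone().detach().requires_grad_(True) for p in theta]
    for _ in range(k):
        grads = torch.autograd.grad(loss_fn(theta_i, x_s, y_s), theta_i)
        with torch.no_grad():
            for p, g in zip(theta_i, grads):
                p -= alpha * g

    # Outer update: move theta toward the adapted parameters theta'_i.
    with torch.no_grad():
        for p, p_i in zip(theta, theta_i):
            p += epsilon * (p_i - p)
```

Despite its simplicity, Nichol et al.’s analysis shows that the expected Reptile update contains a term that aligns gradients across minibatches of the same task, which is why it behaves like a meta-learning method rather than plain joint training.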
Meta-learning represents a powerful paradigm shift: teaching models how to learn rather than what to learn. From MAML’s elegant bi-level optimization to FOMAML’s pragmatic first-order approximation and Reptile’s sheer simplicity, these methods span a spectrum that balances computational cost against adaptation performance.
By understanding these algorithms, their updates, and their trade-offs, you’ll be well-equipped to tackle few-shot and continual learning challenges in your own projects. Happy meta-learning!