Oak Lab

The goal of Oak Lab is to discover and implement the computational principles that allow agents to learn to achieve their goals in big worlds.

Prior work
- Talk: The OaK Architecture. A concise tour of the OaK agent design—how perception, model, policy, and value fit together for learning in big worlds.
- Paper: The Alberta plan for AI research. How the Alberta groups frame AI: continual learning, big worlds, and building agents that keep improving over a lifetime—open to all who share the direction.
- Paper: The Big World Hypothesis and its Ramifications for AI. The world outgrows any agent: prior training is never enough, continual learning matters, and cheap, reliable learning algorithms are essential.
- Paper: SwiftTD: A Fast and Robust Algorithm for Temporal Difference Learning. A temporal-difference method aimed at fast, stable credit assignment when learning from long, noisy streams of experience (a plain TD(λ) baseline is sketched after this list).
- Paper: Step-size optimization for continual learning. Meta-gradient methods like IDBD tune step sizes toward the true objective; common optimizers use heuristics that can drift away from better step sizes—relevant for lifelong learning in neural nets (IDBD is sketched after this list).
- Paper: Reward-respecting subtasks for model-based reinforcement learning. Subtasks and options are defined to respect the main reward, so the discovered temporal abstractions help planning with a world model rather than serving only as auxiliary or bottleneck goals.
- Paper: The quest for a common model of the intelligent decision maker. A cross-discipline view of the intelligent agent: inputs, goals, and internal structure for perception, world models, and choice—naming what psychology, control, and AI already share.
- Paper: Reward is enough. A position paper: maximizing reward is enough, in principle, to yield perception, language, and other hallmarks of intelligence through reinforcement learning and trial-and-error experience.
- Paper: Planning with expectation models. When a value function is linear in features, planning with a learned expectation model can match planning with a full distribution model—plus sound model-based policy evaluation in stochastic settings (sketched after this list).
- Paper: On the role of tracking in stationary environments. Tracking the best policy can beat any converging algorithm even on stationary problems (e.g. the Black and White problem, computer Go), with links to metalearning and IDBD for step-size adaptation.
- Paper: Scalable Real-time Recurrent Learning using Columnar-constructive Networks. Constraining RTRL so that online recurrent learning scales linearly in the number of parameters—avoiding the gradient bias of truncated BPTT-style shortcuts, at the cost of some network flexibility (sketched after this list).
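Code sketches

SwiftTD's own machinery (per-feature step-size optimization, step-size decay, bounded eligibility) is in the paper; as a point of reference, here is a minimal sketch of linear TD(λ) with accumulating traces, the classical learner that SwiftTD builds on. The function name and stream format are illustrative, not from the paper.

```python
import numpy as np

def td_lambda(stream, n_features, alpha=0.01, gamma=0.99, lam=0.9):
    """Linear TD(lambda) with accumulating traces.

    stream yields (x, r, x_next, done) with feature vectors x, x_next.
    Returns weights w such that v(s) is approximated by w @ x(s).
    """
    w = np.zeros(n_features)   # value weights
    e = np.zeros(n_features)   # eligibility trace
    for x, r, x_next, done in stream:
        v_next = 0.0 if done else w @ x_next
        delta = r + gamma * v_next - w @ x    # TD error
        e = gamma * lam * e + x               # accumulate trace
        w += alpha * delta * e                # credit recently active features
        if done:
            e[:] = 0.0                        # traces do not cross episodes
    return w
```

The single scalar alpha here is exactly the kind of fixed step size that SwiftTD replaces with learned per-feature step sizes.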
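IDBD (Sutton, 1992) is the canonical meta-gradient step-size method the step-size entry referses to. A minimal sketch for the linear supervised (LMS) setting follows: each weight carries a log step size beta_i, itself adapted by gradient descent on the squared error, using a decaying trace h_i of recent updates. Variable names are mine.

```python
import numpy as np

def idbd(examples, n_features, beta0=np.log(0.05), theta=0.01):
    """IDBD: per-weight step sizes alpha_i = exp(beta_i), meta-learned online.

    examples yields (x, y) pairs; the prediction is w @ x.
    """
    w = np.zeros(n_features)
    beta = np.full(n_features, beta0)   # log step sizes
    h = np.zeros(n_features)            # trace of recent weight updates
    for x, y in examples:
        delta = y - w @ x                     # prediction error
        beta += theta * delta * x * h         # meta-gradient step on log alphas
        alpha = np.exp(beta)                  # per-weight step sizes
        w += alpha * delta * x                # base LMS update
        # decay the trace where the step size is large, then add the new update
        h = h * np.clip(1.0 - alpha * x * x, 0.0, None) + alpha * delta * x
    return w
```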
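The key observation behind expectation-model planning: when the value function is linear in features, v(x) = w @ x, the expected next value equals the value of the expected next features, E[w @ x'] = w @ E[x']. So the model need only predict expected next features, not a distribution. A minimal sketch of one planning sweep, assuming a learned linear model (matrix F for expected next features, vector b for expected reward; both names are illustrative):

```python
import numpy as np

def planning_sweep(w, feature_buffer, F, b, alpha=0.01, gamma=0.99):
    """One policy-evaluation planning sweep with a linear expectation model.

    F @ x approximates E[x' | x]; b @ x approximates E[r | x].
    Because v is linear (v = w @ x), E[v(x')] = w @ (F @ x), so the
    expectation model suffices; no distribution over next states needed.
    """
    for x in feature_buffer:          # replay stored feature vectors
        x_next = F @ x                # expected next features under the model
        delta = b @ x + gamma * (w @ x_next) - w @ x   # model-based TD error
        w = w + alpha * delta * x
    return w
```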
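RTRL is exact but ordinarily costs roughly O(n^4) compute per step for n fully recurrent units. The columnar constraint makes each unit's state depend only on the inputs and its own previous state, so every parameter needs just one scalar sensitivity trace and exact online gradients cost O(#parameters). A minimal sketch for a single self-recurrent tanh unit, the building block a columnar network stacks many of; names and the squared-error objective are my assumptions:

```python
import numpy as np

def columnar_rtrl_unit(stream, n_inputs, alpha=0.01):
    """Exact online gradients for one self-recurrent tanh unit.

    h_t = tanh(w_in @ x_t + w_rec * h_{t-1}); prediction is v * h_t.
    Since h depends only on the inputs and its own past, dh/dw_in and
    dh/dw_rec are per-parameter scalar traces: RTRL in O(#params) per step.
    stream yields (x, target) pairs.
    """
    rng = np.random.default_rng(0)
    w_in = rng.normal(0.0, 0.1, n_inputs)  # input weights
    w_rec, v, h = 0.0, 0.0, 0.0            # recurrent weight, output weight, state
    p = np.zeros(n_inputs)                 # traces dh/dw_in
    q = 0.0                                # trace dh/dw_rec
    for x, target in stream:
        h_prev = h
        h = np.tanh(w_in @ x + w_rec * h_prev)
        g = 1.0 - h * h                    # derivative of tanh at the new state
        p = g * (x + w_rec * p)            # exact sensitivity recursions
        q = g * (h_prev + w_rec * q)
        err = v * h - target               # squared-error gradient pieces
        grad_v, grad_in, grad_rec = err * h, err * v * p, err * v * q
        v -= alpha * grad_v
        w_in -= alpha * grad_in
        w_rec -= alpha * grad_rec
    return w_in, w_rec, v
```

Allowing cross-unit recurrent weights would reintroduce full sensitivity matrices; restricting recurrence to within a column is exactly the flexibility traded away for linear scaling.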