Oak Lab

The goal of Oak Lab is to discover and implement the computational principles that allow agents to learn to achieve their goals in big worlds.

Prior work
- Talk: The OaK Architecture. A concise tour of the OaK agent design—how perception, model, policy, and value fit together for learning in big worlds.
- Paper: The Alberta plan for AI research. How the Alberta groups frame AI: continual learning, big worlds, and building agents that keep improving over a lifetime—open to all who share the direction.
- Paper: The Big World Hypothesis and its Ramifications for AI. The world outgrows any agent: prior training is never enough, continual learning matters, and cheap, reliable learning algorithms are essential.
- Paper: SwiftTD: A Fast and Robust Algorithm for Temporal Difference Learning. A temporal-difference method aimed at fast, stable credit assignment when learning from long, noisy streams of experience (a plain TD(λ) baseline is sketched after this list).
- Paper: Step-size optimization for continual learning. Meta-gradient methods like IDBD tune step sizes toward the true objective; common optimizers use heuristics that can drift away from better step sizes—relevant for lifelong learning in neural nets (IDBD is sketched after this list).
- Paper: Reward-respecting subtasks for model-based reinforcement learning. Subtasks and options are defined to respect the main reward, so the discovered temporal abstractions help planning with a world model rather than serving only as auxiliary or bottleneck goals.
- Paper: The quest for a common model of the intelligent decision maker. A cross-discipline view of the intelligent agent: inputs, goals, and internal structure for perception, world models, and choice—naming what psychology, control, and AI already share.
- Paper: Reward is enough. A position paper: maximizing reward is enough, in principle, to yield perception, language, and other hallmarks of intelligence through reinforcement learning and trial-and-error experience.
- Paper: Planning with expectation models. When a value function is linear in features, planning with a learned expectation model can match planning with a full distribution model—plus sound model-based policy evaluation in stochastic settings (sketched after this list).
- Paper: On the role of tracking in stationary environments. Tracking the best policy can beat any converging algorithm even on stationary problems (e.g. the Black and White problem, computer Go), with links to metalearning and IDBD for step-size adaptation.
- Paper: Scalable Real-time Recurrent Learning using Columnar-constructive Networks. Constraining RTRL so that online recurrent learning scales linearly in the number of parameters—avoiding the gradient bias of truncated BPTT-style shortcuts, at the cost of some network flexibility (sketched after this list).
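Code sketches

SwiftTD's own machinery (per-feature step-size optimization, step-size decay, bounded eligibility) is in the paper; as a point of reference, here is a minimal sketch of linear TD(λ) with accumulating traces, the classical learner that SwiftTD builds on. The function name and stream format are illustrative, not from the paper.

```python
import numpy as np

def td_lambda(stream, n_features, alpha=0.01, gamma=0.99, lam=0.9):
    """Linear TD(lambda) with accumulating traces.

    stream yields (x, r, x_next, done) with feature vectors x, x_next.
    Returns weights w such that v(s) is approximated by w @ x(s).
    """
    w = np.zeros(n_features)   # value weights
    e = np.zeros(n_features)   # eligibility trace
    for x, r, x_next, done in stream:
        v_next = 0.0 if done else w @ x_next
        delta = r + gamma * v_next - w @ x    # TD error
        e = gamma * lam * e + x               # accumulate trace
        w += alpha * delta * e                # credit recently active features
        if done:
            e[:] = 0.0                        # traces do not cross episodes
    return w
```

The single scalar alpha here is exactly the kind of fixed step size that SwiftTD replaces with learned per-feature step sizes.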
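IDBD (Sutton, 1992) is the canonical meta-gradient step-size method the step-size entry referses to. A minimal sketch for the linear supervised (LMS) setting follows: each weight carries a log step size beta_i, itself adapted by gradient descent on the squared error, using a decaying trace h_i of recent updates. Variable names are mine.

```python
import numpy as np

def idbd(examples, n_features, beta0=np.log(0.05), theta=0.01):
    """IDBD: per-weight step sizes alpha_i = exp(beta_i), meta-learned online.

    examples yields (x, y) pairs; the prediction is w @ x.
    """
    w = np.zeros(n_features)
    beta = np.full(n_features, beta0)   # log step sizes
    h = np.zeros(n_features)            # trace of recent weight updates
    for x, y in examples:
        delta = y - w @ x                     # prediction error
        beta += theta * delta * x * h         # meta-gradient step on log alphas
        alpha = np.exp(beta)                  # per-weight step sizes
        w += alpha * delta * x                # base LMS update
        # decay the trace where the step size is large, then add the new update
        h = h * np.clip(1.0 - alpha * x * x, 0.0, None) + alpha * delta * x
    return w
```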
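The key observation behind expectation-model planning: when the value function is linear in features, v(x) = w @ x, the expected next value equals the value of the expected next features, E[w @ x'] = w @ E[x']. So the model need only predict expected next features, not a distribution. A minimal sketch of one planning sweep, assuming a learned linear model (matrix F for expected next features, vector b for expected reward; both names are illustrative):

```python
import numpy as np

def planning_sweep(w, feature_buffer, F, b, alpha=0.01, gamma=0.99):
    """One policy-evaluation planning sweep with a linear expectation model.

    F @ x approximates E[x' | x]; b @ x approximates E[r | x].
    Because v is linear (v = w @ x), E[v(x')] = w @ (F @ x), so the
    expectation model suffices; no distribution over next states needed.
    """
    for x in feature_buffer:          # replay stored feature vectors
        x_next = F @ x                # expected next features under the model
        delta = b @ x + gamma * (w @ x_next) - w @ x   # model-based TD error
        w = w + alpha * delta * x
    return w
```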
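RTRL is exact but ordinarily costs roughly O(n^4) compute per step for n fully recurrent units. The columnar constraint makes each unit's state depend only on the inputs and its own previous state, so every parameter needs just one scalar sensitivity trace and exact online gradients cost O(#parameters). A minimal sketch for a single self-recurrent tanh unit, the building block a columnar network stacks many of; names and the squared-error objective are my assumptions:

```python
import numpy as np

def columnar_rtrl_unit(stream, n_inputs, alpha=0.01):
    """Exact online gradients for one self-recurrent tanh unit.

    h_t = tanh(w_in @ x_t + w_rec * h_{t-1}); prediction is v * h_t.
    Since h depends only on the inputs and its own past, dh/dw_in and
    dh/dw_rec are per-parameter scalar traces: RTRL in O(#params) per step.
    stream yields (x, target) pairs.
    """
    rng = np.random.default_rng(0)
    w_in = rng.normal(0.0, 0.1, n_inputs)  # input weights
    w_rec, v, h = 0.0, 0.0, 0.0            # recurrent weight, output weight, state
    p = np.zeros(n_inputs)                 # traces dh/dw_in
    q = 0.0                                # trace dh/dw_rec
    for x, target in stream:
        h_prev = h
        h = np.tanh(w_in @ x + w_rec * h_prev)
        g = 1.0 - h * h                    # derivative of tanh at the new state
        p = g * (x + w_rec * p)            # exact sensitivity recursions
        q = g * (h_prev + w_rec * q)
        err = v * h - target               # squared-error gradient pieces
        grad_v, grad_in, grad_rec = err * h, err * v * p, err * v * q
        v -= alpha * grad_v
        w_in -= alpha * grad_in
        w_rec -= alpha * grad_rec
    return w_in, w_rec, v
```

Allowing cross-unit recurrent weights would reintroduce full sensitivity matrices; restricting recurrence to within a column is exactly the flexibility traded away for linear scaling.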