# Next

<div class="pills-container"> <span class="pill">Last Updated: April 21, 2026</span> </div>

## The next five years

AI safety is producing tons of important work. What it isn't producing enough of is the infrastructure that makes that work robust (e.g., evals that generalize across contexts, findings that get replicated, pipelines that can bring more people into credible contribution). Without that layer, the field stays fragile in specific ways:

- Results are harder to validate and findings are harder to build on; and
- The work stays concentrated in a small network of institutions within a specific geographic area.

I think we are underinvesting in the maintenance infrastructure of AI safety (as a field) relative to its importance, and it's where I want to focus. I'm particularly interested in the science of evals, replication infrastructure, and the field-building pipelines that expand the pool of people doing credible AI safety work.

I'm doing this from the Philippines, which shapes the specific form it takes. [The obvious thing](https://www.lesswrong.com/posts/Zpqhds4dmLaBwTcnp/trying-the-obvious-thing) when local bottlenecks are mostly institutional is doing policy reform work, since that's what can actually unlock local AI safety capacity.[^1] I've also been thinking seriously about [middle power governance](https://forum.effectivealtruism.org/posts/qo8CmZeCAJRweesMf/middle-powers-in-ai-governance-potential-paths-to-impact-and) (especially in [[#Middle power governance (especially in Asia)|Asia]]) since most governance frameworks assume that the regulator and the developer are in the same jurisdiction, and they're not, for most of the world.

These are high-trust domains, and to be effective in them I need credibility I don't fully have yet. ***If I want downstream impact, I need upstream positioning first.***

My near-term focus is depth over breadth for the next 2-3 years.[^2] If you're working on related problems, I'd like to hear from you.

## Bets worth taking

These are directional bets. I'm pursuing them in a rough priority order, and I'd rather go deep on one or two than spread across all of them. If any of these interest you, please work on them too. Note that I could also be wrong about some of the assumptions here, as [[Ethos#Mistakes I've made|I've been wrong many times before]], and I haven't done a comprehensive literature review for all of these. If you want to work on any of these, [reach out on LinkedIn](https://www.linkedin.com/in/llenzl/).

### Science of agentic evaluations

If our evals don’t measure what we think they measure, then conclusions about model safety and alignment drawn from them may be wrong. That’s an [ecological validity](https://en.wikipedia.org/wiki/Ecological_validity) and [construct validity](https://en.wikipedia.org/wiki/Construct_validity) problem, and I think it’s underappreciated. Some pathways under this umbrella that might be promising are:

- **Developing multi-agent and multi-objective evals for both capabilities and alignment.** I did some [initial work](https://docs.google.com/presentation/d/1ePaTc4qq4Ec8eZQV-V4Ev1NfK5x-Ky3P8JmpwA2XDp0/edit?usp=sharing) on this in [AI Safety Camp 10](https://www.aisafety.camp/). I generally think pluralistic alignment has a lot of promise, by which I mean training agents to balance multiple bounded objectives rather than maximizing a single unbounded one. This draws on principles from biology (homeostasis) and economics (diminishing returns) to build benchmarks that expose the failure modes that current evals miss. Single-agent benchmarks tell us about isolated behavior, but multi-agent systems produce emergent behaviors that those benchmarks can’t capture. If we can’t predict how capabilities develop in multi-agent settings, we also can’t assess alignment in them.
- **Building interp-based and environment-based evals.** Compared to static evals, these are much harder to game (I think).[^3] There’s already a decent number of environment-based evals being built, but interp-based evals (like using linear probes or crosscoders to eval a behavior) may actually be promising. [AbliterationBench](https://github.com/gpiat/AIAE-AbliterationBench/), which uses steering vectors, is an example of this.
- **Debugging the evals ported to [Inspect](https://inspect.aisi.org.uk/).** I worked on this along with [Hugo Save](https://github.com/HugoSave) for our capstone in [ARENA 6.0](https://arena.education/). We focused on Google DeepMind’s Dangerous Capabilities CTF suite and found bugs in the most-used eval in the whole repo. I’d guess there are more (and bigger) bugs in the 90+ evals ported to Inspect, especially the ones that don’t get used much.
- **Docent/Transluce for interp-/environment-based evals.** Making transcript analysis easier makes debugging evals easier. Current tools don’t apply to interp- or environment-based evals (at least those that don’t rely on LLM transcripts), which means debugging them is much harder than it should be.

### Middle power governance (especially in Asia)

Most AI governance frameworks are written by and for countries with a domestic AI industry. Asia mostly doesn’t have that. Most countries in the region are consumer states (i.e., they rely on imported AI), which means the standard enforcement mechanisms (liability for developers, compute governance, model audits) are either unenforceable or inapplicable.

- **Direct work in policy.** The first-generation AI policies in many Asian countries are being written now. Once regulatory frameworks are set, they’re hard to change, which means the next 2-3 years are unusually high-leverage for anyone who can contribute credibly. I’m most interested in efforts within China, India, Singapore, and Taiwan, though the rest of Southeast Asia is catching up.
- **Consumer state governance.** How does a country’s reliance on imported AI constrain its policy options? Most governance frameworks assume the regulator and the developer are in the same legal jurisdiction (they’re not, for most of the world). Cybercrime governance offers a precedent: the UN Convention against Cybercrime has articles that are unimplementable in Southeast Asia due to conflicting legal language and limited judicial capacity for ICT cases. These patterns are likely to recur in AI treaties. The more interesting question is what governance looks like when you start from the leverage points a consumer state actually has (i.e., procurement decisions, deployment conditions, data governance, and labor market protections) rather than borrowing enforcement mechanisms designed for a different institutional context.

### Field-building and advocacy

The lack of localized advocacy is a huge barrier to AI policy reform. Policy doesn’t move without people who can translate technical risk into terms that resonate with local institutions. Right now, there aren’t enough of those people, especially outside the West. To be clear, my main argument isn't that we need more people in these spaces. We do, but I'm more concerned about the quality of the people who enter them.

- **Building advocacy projects.** Writing about AI safety and building communication tools for it is essential in a "this-could-make-or-break-support-in-policy-reforms" way. [AI 2027](https://ai-2027.com/) is a good example of what happens when you make an accessible forecast out of what x-risk researchers have been worried about for years. It moved people. More of that needs to happen locally, in languages and frames that resonate with non-Western policymakers.
- **Designing quicker feedback loops.** The gap between ideas and prototyping is narrowing, but that opportunity mostly exists for senior researchers and the small pool of junior researchers they can take in.
Infrastructure designed for faster iteration, like [Apart Sprints](https://apartresearch.com/sprints), is promising. Strong middle layers can move high-agency people from interest toward genuine contribution faster than the current pathway does, and without requiring full-time commitment to do it.
- **Fast-tracking the pipeline to senior-level AI safety folks.** We have plenty of junior talent in AI safety. However, we do lack a stable supply of senior-level researchers who can mentor and give feedback. [Other field-builders are noticing this too.](https://matsresearch.substack.com/p/ai-safety-talent-needs-in-2026-insights) There is a gap we need to close between someone with zero knowledge of AI safety and governance and someone with genuine research taste who can work autonomously with minimal supervision.

[^1]: Why is AI policy the obvious thing? I live in the Philippines and I still want to work in AI safety. I could argue that working on policy reforms is the best way for me to make an impact in my field. There's not a lot of technical work going on locally, and the bottlenecks for it can be unlocked by (surprise, surprise) policy reforms. So if we want more local safety work, then we need reforms to make it happen. While field-building is obviously a big part of this, I'm a bit hesitant about relying on funding that comes from the West.

[^2]: I initially thought I'd spend the next 5 years upskilling, but that's quite a long time to be upskilling for such an urgent issue, so I'll have to make an accelerated career path a feasible option.

[^3]: This is my working hypothesis so far. I could be proven wrong.