# Next
<div class="pills-container">
<span class="pill">Last Updated: April 3, 2026</span>
</div>
## The next five years
When people ask me what the next 3-5 years look like for me, they almost always want an answer like "In 5 years, I want to be a lawyer" or "In 5 years, I want to be in a more managerial role." I feel uneasy answering that. I don't believe in job titles, and I'm more comfortable thinking in terms of the gravity of the contribution.
In 5 years, I want to be doing the most impactful work I *can*. That would mean working on projects or products that affect people's lives at scale (and ideally for the better). Right now, since I'm staying in the Philippines, [the obvious thing](https://www.lesswrong.com/posts/Zpqhds4dmLaBwTcnp/trying-the-obvious-thing) would be to work on policy reforms.[^1] If I end up moving to an AI hub, then the next best thing would be transitioning to multi-agent safety research or doing [more work in technical AI governance](https://cdn.governance.ai/Open_Problems_in_Technical_AI_Governance.pdf). Recently, I've also considered [working on middle power governance](https://forum.effectivealtruism.org/posts/qo8CmZeCAJRweesMf/middle-powers-in-ai-governance-potential-paths-to-impact-and) (especially in [[#Middle power governance (especially in Asia)|Asia]]) since it seems to be highly neglected, important, and very tractable given my background.
Now, these high-impact domains are also high-trust. To even be taken seriously among these circles, I need to build both **credibility** and **capability**. If I want downstream impact, I need upstream positioning. Thus, my near-term focus is to **spend the next 2-3 years building the career capital and depth in my AI safety work, and hopefully position myself to lead in my chosen niche**.[^2]
## Bets worth taking
I enumerate here some questions and ideas I think are worth validating (even by others). This is a living document. The things written here reflect my current thinking. If you are interested in working on any of these ideas, notify me on [LinkedIn](https://www.linkedin.com/in/llenzl/).
**A very important note:** I could be wrong about certain assumptions, as [[Ethos#Mistakes I've made|I've been wrong countless times before]]. Note that I have not done a comprehensive literature review for some of these, so it would be good to check the existing literature before starting work on any of them.
### Science of agentic evaluations
I'm particularly interested in looking into the [ecological validity](https://en.wikipedia.org/wiki/Ecological_validity) and [construct validity](https://en.wikipedia.org/wiki/Construct_validity) of eval setups to make them more reproducible.
- **Developing multi-agent and multi-objective evals for both capabilities and alignment.** I did some [initial work](https://docs.google.com/presentation/d/1ePaTc4qq4Ec8eZQV-V4Ev1NfK5x-Ky3P8JmpwA2XDp0/edit?usp=sharing) on this in [AI Safety Camp 10](https://www.aisafety.camp/). I think this line of work is generally underdeveloped compared to single-agent benchmarks. We don't know much about how multi-agent or multi-objective systems align, but we also don't know much about how they even develop certain capabilities, since behaviors become emergent in these settings.
- **Building interp-based and environment-based evals.** Compared to static evals, these are much harder to game (I think).[^3] There's already a decent number of environment-based evals being built, but interp-based evals (like using linear probes or crosscoders to evaluate a behavior) may actually be promising. [AbliterationBench](https://github.com/gpiat/AIAE-AbliterationBench/), which uses steering vectors, is an example of this.
- **Debugging the evals ported to [Inspect](https://inspect.aisi.org.uk/).** I worked on this along with [Hugo Save](https://github.com/HugoSave) for our capstone project in [ARENA 6.0](https://arena.education/). In our case, we focused on Google DeepMind's Dangerous Capabilities Capture-the-Flag suite and found some bugs, and this was the most used (a.k.a. most maintained) eval at the time we were working on it. I would guess that there are probably a bunch more (and bigger) bugs in the 90+ evals ported to Inspect, especially those that aren't used much.
- **Docent/Transluce for interp-/environment-based evals.** Making transcript analysis easier makes debugging evals easier. At the moment, existing tools just don't apply to interp-/environment-based evals (at least those that do not rely on LLM transcripts).
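To make the interp-based eval idea above concrete, here is a minimal, hypothetical sketch of using a linear probe as an eval signal: train a logistic-regression probe on (synthetic, toy) activations labelled by whether a target behavior occurred, then score held-out activations with it. Everything here — the activation dimension, the planted "behavior direction," and the data itself — is a stand-in assumption, not a real model's internals.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # toy activation dimension (assumption, not a real model)
direction = rng.normal(size=d)  # "behavior direction" planted in the toy data

def make_activations(n: int, behaves: bool) -> np.ndarray:
    """Fake hidden-state activations; positives are shifted along `direction`."""
    base = rng.normal(size=(n, d))
    return base + (2.0 * direction if behaves else 0.0)

# Labelled "trajectories": 1 = behavior present, 0 = absent.
X = np.vstack([make_activations(200, True), make_activations(200, False)])
y = np.concatenate([np.ones(200), np.zeros(200)])

# Fit a logistic-regression probe with plain gradient descent.
w, b = np.zeros(d), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w -= 0.1 * (X.T @ (p - y)) / len(y)
    b -= 0.1 * float(np.mean(p - y))

# The "eval": fraction of held-out positive samples the probe flags.
X_test = make_activations(100, True)
score = float(np.mean(1.0 / (1.0 + np.exp(-(X_test @ w + b))) > 0.5))
print(f"probe flags the behavior on {score:.0%} of positive held-out samples")
```

The point of the sketch is the shape of the pipeline (collect activations, fit a probe, report probe hit-rate as the metric), not the probe itself; a real version would read activations from a model's residual stream rather than sampling them.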
### Middle power governance (especially in Asia)
While frontier AI safety within western AIS hubs is obviously very relevant, I'm also quite concerned about AI safety field-building in Asia. The region is very diverse, so policy interventions would look a lot different in this space compared to what we see in the US, UK, or Europe. There are also far more compute deserts and compute-south countries in the region, which can be a concern.
My core thesis is that Asia's capacity for AI safety work is a huge bottleneck. Policy reforms can unlock some of this capacity, but only alongside complementary infrastructure and field-building work.
- **Direct work in policy.** There are many emerging opportunities to work in AI policy within Asian countries right now. I'm most interested in efforts within China, India, Singapore, and Taiwan. The rest of Southeast Asia also seems to be catching up, though, so it's worth looking out for efforts in these spaces.
- **Consumer state governance.** How does a country's reliance on imported AI constrain its policy options? It is unclear what non-frontier states can regulate when AI is developed abroad. How effective are international AI agreements under institutional constraints? Drawing from cybercrime governance, the UN Convention against Cybercrime (UNCC) has articles that are unimplementable in Southeast Asia due to conflicting legal language and limited judicial capacity for ICT cases. These patterns are likely to recur in AI treaties.
### Field-building and advocacy
Drawing from my experience working on policy and the things I've read on policy work in AI safety, I'm quite convinced that the lack of localized advocacy has been a huge barrier in pushing for AI policy reforms. On top of that, people who are willing to engage in the space don't even have a venue to contribute in it.
- **Building advocacy projects.** Simple contributions like writing about AI safety and creating more communication tools for it are essential. And I mean it in a "this-could-possibly-make-or-break-support-in-policy-reforms" essential. Just look at how far [AI 2027](https://ai-2027.com/) was able to go because they decided to make an easy-to-understand forecast of what x-risk folks have been worried about since forever.
- **Designing quicker feedback loops.** The gap between ideas and prototyping is getting narrower over time. But this sort of opportunity exists mostly for senior-level researchers and the small pool of junior-level researchers they can take in. Young folks from halfway across the world usually don't have access to these types of opportunities. But infra designed for this, like the [Apart Sprints](https://apartresearch.com/sprints), is promising. I think we should have more of these across different sub-areas of AI safety. Not everyone needs to be a full-time AI safety researcher. I think strong middle layers can help fill the vacuum between interest and expertise, and can actually move high-agency people towards doing research that is 4x more impactful than the normative output with more FTEs.
- **Fast-tracking the pipeline to senior-level AI safety folks.** I've realized (and I think [other field-builders are seeing this too](https://matsresearch.substack.com/p/ai-safety-talent-needs-in-2026-insights)) that a huge bottleneck in AI safety work is the lack of senior-level folks. We have an abundance of junior-level talent, but virtually no one to mentor them. That means field-builders should be filling the gap between someone with completely zero knowledge and someone with research taste who can work autonomously with minimal supervision.
[^1]: Why is AI policy the obvious thing? I live in the Philippines and I still want to work in AI safety. I could argue that working on policy reforms is the best way for me to make an impact in my field. There's not a lot of technical stuff going on locally, and the bottlenecks for those can be unlocked by (surprise, surprise) policy reforms. So if we want more local safety work, then we need reforms to make it happen. While field-building is obviously a big part of this, I'm a bit wary of being reliant on funding that comes from the West.
[^2]: I initially thought I'd want to spend the next 5 years upskilling, but that's quite a long time to be upskilling for such an urgent issue; so I'll have to make pursuing an accelerated career path a feasible option.
[^3]: This is my working hypothesis so far. I could be proven wrong.