About our research
Ornis Research is an independent AI alignment laboratory. We study the behavior of large language models under realistic deployment pressures, the post-training choices that produce that behavior, and the relationships between these systems and the people who rely on them. Our work focuses on capability evaluation, alignment methodology, and human–AI interaction. We share the field's view that AI safety is a genuinely open problem, and we believe the field benefits when its questions are pursued from a wider set of research traditions than the current literature reflects.
Research themes
Our work currently clusters around the following themes.
Capability evaluation and elicitation
Some failure modes — deception, scheming, sandbagging, motivated reasoning — resist naive measurement, because their absence under one elicitation condition does not imply absence under another. We design paired benchmarks with directional controls that isolate the construct of interest from broader instruction-following, and we treat measurement that survives adversarial scrutiny as the standard worth reaching.
Alignment methodology
Post-training pathways shape model behavior in ways that are often opaque from the outside and are not captured by standard capability benchmarks. We study how different post-training choices — instruction tuning, reasoning distillation, constitutional approaches, RLHF and its successors — produce different deployment-relevant policies, and what this implies for how alignment claims should be reported and audited.
Human–AI interaction and trust
AI safety is also a question about systems that include humans — reviewers, auditors, decision-makers, end users — who form expectations about model behavior and adjust their actions accordingly. We study how these expectations form, where they break under load, and how evaluation procedures can account for the human side of the loop without diluting the technical content of the alignment problem.
Current work
A manuscript on stated prior-commitment pressure in open-weight large language models is in preparation for submission. Where possible, we release pre-registrations, code, and full result tables in advance of publication, so that the work is auditable and reusable by the wider research community.
How we work
We operate as an independent research group, unaffiliated with any large company or university. Most of our work is published openly (papers, blog posts, pre-registrations, code, and result tables), and we do not undertake proprietary research. We place high weight on methodological rigor and reproducibility: paired controls, pre-registered verdict criteria, public datasets and analysis scripts, and a willingness to walk back claims when replication fails. The lab writes for the international academic literature, and we welcome correspondence from researchers working on similar questions in alignment, evaluation, and methodology.