Research

My work sits at the intersection of mathematics and machine learning theory. I study the dynamics of attention in transformers: how tokens move, cluster, and represent information.

Papers

Normalization in Attention Dynamics

Nikita Karagodin, Yury Polyanskiy · NeurIPS 2025 (39th Conference on Neural Information Processing Systems)

We unify different normalization schemes (LayerNorm, RMSNorm, etc.) under the umbrella of token geometry, showing that the choice of normalization acts as a speed control on attention dynamics. We prove convergence rates that show which normalization schemes converge faster.

Clustering in Causal Attention Masking

Nikita Karagodin, Yury Polyanskiy · NeurIPS 2024 (38th Conference on Neural Information Processing Systems)

We provide the first treatment of autoregressive (causal) attention dynamics, establishing convergence guarantees and predicting how Value matrices influence the final token configuration.
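To build intuition for this setting, here is a toy simulation in plain Python. It is a sketch under stated assumptions, not code from the paper. Tokens live on the unit sphere. The Query, Key and Value matrices are all taken to be the identity. The inverse temperature `beta`, the step size and the time horizon are arbitrary illustrative choices. Time is discretized with explicit Euler steps followed by renormalization. Because of the causal mask, token i only attends to tokens 1..i, so the first token sees only itself and never moves.

```python
import math


def normalize(v):
    """Rescale a vector to unit length (project back onto the sphere)."""
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def angle(u, v):
    """Geodesic distance between two unit vectors, in radians."""
    return math.acos(max(-1.0, min(1.0, dot(u, v))))


def causal_attention_step(xs, beta, dt):
    """One explicit Euler step of a toy causal attention flow.

    Assumption (illustrative only): Q = K = V = identity, so token i moves
    along the sphere toward the softmax-weighted average of its causal prefix.
    """
    new_xs = []
    for i, xi in enumerate(xs):
        # Causal mask: token i only sees tokens 0..i (0-based indexing).
        logits = [beta * dot(xi, xs[j]) for j in range(i + 1)]
        m = max(logits)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in logits]
        z = sum(weights)
        avg = [sum(w * xs[j][k] for j, w in enumerate(weights)) / z
               for k in range(len(xi))]
        # Keep only the component of the average tangent to the sphere at x_i.
        c = dot(xi, avg)
        velocity = [a - c * x for a, x in zip(avg, xi)]
        new_xs.append(normalize([x + dt * v for x, v in zip(xi, velocity)]))
    return new_xs


def simulate(tokens, beta=1.0, dt=0.01, t_max=30.0):
    xs = [normalize(x) for x in tokens]
    for _ in range(int(round(t_max / dt))):
        xs = causal_attention_step(xs, beta, dt)
    return xs


# Hypothetical initial configuration of five tokens in three dimensions.
tokens = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [-0.5, 0.5, 0.7], [0.3, -0.8, 0.5]]
final = simulate(tokens)
# Largest distance from the (fixed) first token; expected to be close to 0.
spread = max(angle(final[0], x) for x in final)
print(spread)
```

In this toy run the later tokens end up close to the first one. The general Value matrices that the paper studies are not modeled here; swapping the identity for another matrix would change the update for `avg`.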

Other fascinating things

Papers and ideas I didn't write but find remarkable.

The Unreasonable Effectiveness of Mathematics in the Natural Sciences

Eugene Wigner, 1960

The classic essay on why mathematical concepts developed in pure abstraction turn out to describe physical reality with uncanny precision.

Neural Ordinary Differential Equations

Chen et al., NeurIPS 2018

The paper that started the neural ODE revolution: it replaces a stack of discrete residual layers with continuous-depth dynamics, where a neural network defines the derivative of the hidden state and an ODE solver computes the output.
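The paper builds on the observation that a residual block h + f(h) is one explicit Euler step of dh/dt = f(h), and this correspondence can be checked numerically. This is a toy sketch, not the paper's code. `f` is a hypothetical fixed two-dimensional vector field with made-up weights `W` and `B` standing in for a trained network. A hand-written fixed-step RK4 loop plays the role of the ODE solver. Neither adaptive step sizes nor adjoint-based training is shown. Each residual layer is scaled by dt = 1/num_layers so that the whole stack covers time 1.

```python
import math

# Hypothetical fixed parameters standing in for a trained network f(h; theta).
W = [[0.0, -1.0], [1.0, 0.0]]
B = [0.1, 0.0]


def f(h):
    """Toy vector field dh/dt = tanh(W h + b) on R^2."""
    return [math.tanh(sum(W[i][j] * h[j] for j in range(2)) + B[i])
            for i in range(2)]


def residual_net(h, num_layers, total_time=1.0):
    """num_layers residual blocks h <- h + dt * f(h), i.e. explicit Euler steps."""
    dt = total_time / num_layers
    for _ in range(num_layers):
        h = [x + dt * v for x, v in zip(h, f(h))]
    return h


def rk4_solve(h, total_time=1.0, steps=1000):
    """Integrate the same ODE with the classical fixed-step Runge-Kutta method."""
    dt = total_time / steps
    for _ in range(steps):
        k1 = f(h)
        k2 = f([x + 0.5 * dt * k for x, k in zip(h, k1)])
        k3 = f([x + 0.5 * dt * k for x, k in zip(h, k2)])
        k4 = f([x + dt * k for x, k in zip(h, k3)])
        h = [x + dt / 6 * (a + 2 * b + 2 * c + d)
             for x, a, b, c, d in zip(h, k1, k2, k3, k4)]
    return h


h0 = [1.0, 0.0]
reference = rk4_solve(h0)
errors = {n: math.dist(residual_net(h0, n), reference) for n in (1, 10, 100, 1000)}
for n, err in errors.items():
    print(f"{n:5d} layers: distance to ODE solution = {err:.2e}")
```

The gap shrinks roughly in proportion to 1/num_layers, which is the sense in which a very deep residual stack approximates continuous dynamics.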

Saint Petersburg (Санкт-Петербург)

A city of white nights, bridges rising over the Neva, and the quiet beauty that shaped a part of who I am.