New RNN Architecture Surpasses Transformer: Each Hidden State Is a Model, First Author Says It Fundamentally Changes Language Models
I read the claim that a new RNN architecture treats each hidden state as a model in its own right and beats Transformers. Ops take: if it doesn't fit our GPU budget, it's just another lab demo.