Research Publications & Contributions
As a Staff Gen AI/ML researcher and former CTO, I bridge the gap between cutting-edge research and production systems. My work focuses on making advanced AI accessible and efficient for real-world applications.
Published Research
RWKV: Reinventing RNNs for the Transformer Era
Linear Transformers Can Do Real-Time Language Modeling
Paper: arXiv:2305.13048
My Contributions: Core contributor; I studied infinite-context-length behavior and efficient linear attention mechanisms.
Impact: This groundbreaking work enables language models with theoretically unbounded context while keeping O(n) time complexity and constant memory per generated token at inference, versus the O(n²) cost of standard self-attention; for long sequences this makes it 10-100x more efficient than traditional transformers.
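To make the complexity claim concrete, here is a minimal sketch of a linear-attention recurrence: each token updates a fixed-size state, so per-token cost is independent of sequence length. This is an illustrative toy with a simplified feature map of my own choosing, not RWKV's actual time-mixing, which adds learned decay and gating.

```python
import numpy as np

def linear_attention_step(state, norm, k, v, q):
    """One step of a linear-attention recurrence.

    `state` accumulates sum_i phi(k_i) v_i^T and `norm` accumulates
    sum_i phi(k_i), so each new token costs O(d^2) work and O(d^2)
    memory no matter how many tokens came before.
    """
    phi_k = np.maximum(k, 0.0) + 1e-6     # toy positive feature map
    state = state + np.outer(phi_k, v)    # fixed-size key/value memory
    norm = norm + phi_k
    phi_q = np.maximum(q, 0.0) + 1e-6
    out = (phi_q @ state) / (phi_q @ norm)
    return state, norm, out

d = 8
state, norm = np.zeros((d, d)), np.zeros(d)
for _ in range(100_000):                  # stream as many tokens as you like
    k, v, q = np.random.randn(3, d)
    state, norm, out = linear_attention_step(state, norm, k, v, q)
```

A standard transformer would instead recompute attention over all previous tokens at every step, which is where the O(n²) cost comes from.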
Research Interests
Efficient Language Models
- Linear attention mechanisms and state-space models
- Model compression and quantization techniques (see the quantization sketch after this list)
- Edge deployment optimization
- Resource-constrained AI systems
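As a flavor of the compression work above, here is a minimal symmetric int8 quantization sketch. The function names are my own; real pipelines usually quantize per-channel and calibrate activations, but the round/clip/rescale core is the same.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: w ≈ scale * q."""
    scale = max(float(np.abs(w).max()), 1e-12) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(256, 256).astype(np.float32)
q, scale = quantize_int8(w)
err = np.abs(w - dequantize_int8(q, scale)).max()
print(f"4x smaller than float32, max abs error {err:.4f}")
```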
Production AI Systems
- Scalable inference architectures
- Real-time model serving (a dynamic-batching sketch follows this list)
- Cost-efficient deployment strategies
- Multi-model orchestration
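One concrete pattern behind several of these items is dynamic batching: hold incoming requests for a few milliseconds and serve them in a single model call to amortize per-call overhead. A minimal sketch, with hypothetical names (`batching_worker`, `run_model`) and toy reply queues; production systems add padding, deadlines, and backpressure.

```python
import queue, threading, time

def batching_worker(requests, run_model, max_batch=8, max_wait_s=0.01):
    """Collect up to max_batch requests within max_wait_s, then run one batched call."""
    while True:
        batch = [requests.get()]               # block until the first request arrives
        deadline = time.monotonic() + max_wait_s
        while len(batch) < max_batch:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                batch.append(requests.get(timeout=remaining))
            except queue.Empty:
                break
        outputs = run_model([text for text, _ in batch])   # one forward pass
        for (_, reply), out in zip(batch, outputs):
            reply.put(out)                     # hand each caller its own result

# Toy usage: each request is (input, reply_queue); the "model" just upper-cases.
reqs = queue.Queue()
threading.Thread(target=batching_worker,
                 args=(reqs, lambda xs: [x.upper() for x in xs]),
                 daemon=True).start()
reply = queue.Queue()
reqs.put(("hello", reply))
print(reply.get())                             # -> HELLO
```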
Generative AI Applications
- Multi-agent systems and coordination
- RAG (Retrieval-Augmented Generation) optimization (a minimal retrieval sketch follows this list)
- Long-context processing
- Voice synthesis and storytelling AI
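For the RAG item above, the core loop is small enough to sketch: embed the question, retrieve the most similar passages, and splice them into the prompt. The `embed` function here is a stand-in for whatever embedding model is in use, and this omits chunking, reranking, and context budgeting.

```python
import numpy as np

def cosine_top_k(query_vec, doc_vecs, k=3):
    """Indices of the k most similar documents by cosine similarity."""
    q = query_vec / (np.linalg.norm(query_vec) + 1e-9)
    d = doc_vecs / (np.linalg.norm(doc_vecs, axis=1, keepdims=True) + 1e-9)
    return np.argsort(d @ q)[::-1][:k]

def build_rag_prompt(question, docs, doc_vecs, embed, k=3):
    """Retrieve top-k passages for `question` and splice them into a prompt."""
    idx = cosine_top_k(embed(question), doc_vecs, k=min(k, len(docs)))
    context = "\n\n".join(docs[i] for i in idx)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# Toy usage with a hash-based stand-in embedding (for illustration only;
# it produces arbitrary vectors, not meaningful similarity).
def embed(text, d=64):
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(d)

docs = ["RWKV uses linear attention.", "Transformers use softmax attention."]
doc_vecs = np.stack([embed(t) for t in docs])
print(build_rag_prompt("How does RWKV attend?", docs, doc_vecs, embed, k=1))
```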
Industry Impact
My research contributions have directly influenced:
- FakeYou.com: Advanced voice synthesis using efficient models
- Open Source: Democratizing access to efficient AI through RWKV
Collaborations & Speaking
I’m open to research collaborations, speaking engagements, and consulting on:
- Efficient transformer architectures
- Production AI system design
- Small language model deployment
- Real-world AI applications