Research LLMs @allen_ai: eval, synthetic data, reasoning, agents. WildBench, Magpie, SwiftSage. Ex: @GoogleAI & FAIR; fun fact: found the 13.11 vs 13.8 problem