Blog

Technical notes, experiments, and longer writeups on AI, distributed systems, product work, and whatever else was worth keeping.

March 15, 2026

My Current AI Opinions

My current thoughts on AI and how to be good at it.

ai, opinions, current

March 7, 2026

TinySafe v3: How I Built a Near-SOTA Safety LLM for Under $100

A 4B parameter model fine-tuned with QLoRA gets within 0.008 F1 of SOTA on ToxicChat, then hits a wall. Five versions, $100, and a lot of lessons about why 0.83 is so hard to break.

ai, machine learning, safety, nlp, experiment, distillation, qlora

February 28, 2026

I Built TinySafe, a Safety Model that Beats 8B Guard Models with 71M Parameters for $37

Building a sub-2ms safety classifier that outperforms LlamaGuard, ShieldGemma, and every encoder-based model in its weight class.

ai, machine learning, safety, nlp, experiment

January 25, 2026

Which AI Models Lean in Which Political Direction?

I asked the top LLMs 20 different political compass questions to figure out how left/right/libertarian/authoritarian they were.

ai, llm, political compass, bias, experiment, politics, alignment

September 22, 2025

Are Computer Science Majors "Cooked" because of AI?

A speculative deep dive on whether AI will augment or replace new software engineers.

ai, computer science, job market, tech industry, software engineering, unemployment, career

September 14, 2025

I Trained AI on 70k Clash Royale Battles to Settle the Ultimate Debate: Does Your Deck Actually Matter?

Using machine learning on 70,000+ real battles from the Clash Royale API to definitively answer whether deck composition actually predicts victory, or if it's just skill and luck.

ai, machine learning, clash royale, game analysis, data science, lightgbm, pytorch, decks, experiment

September 7, 2025

The Nutrition Prediction Benchmark: Testing LLMs on Google Cafeteria Menus

Testing large language models on their ability to predict accurate nutritional information from Google cafeteria dishes, revealing surprising insights about AI's understanding of food science.

ai, llm, benchmark, nutrition, google, experiment

August 31, 2025

The Semantic Diversity Benchmark: A New Way to Test AI Language Models

A simple but powerful benchmark for testing AI language models by asking them to generate maximally semantically unrelated words, revealing surprising insights about model capabilities.

ai, llm, benchmark, semantic diversity, language models, experiment

August 2, 2025

New Grad Job Search for Software Engineers

A quick guide for new grads looking to break into the software engineering industry, including networking tips, technical preparation advice, and insights into why the current job market is more challenging.

career, job search, new grad, software engineering, tech industry

August 1, 2025

Transformers are Limited

The transformer architecture is fundamentally limited for true reasoning.

transformers, limitation, reasoning, llm

June 9, 2025

Welcome to My Blog

Welcome to my blog!

welcome, blog, jdleo