John Leonardo

Blog

Technical notes, experiments, and longer writeups on AI, distributed systems, product work, and whatever else was worth keeping.

My Current AI Opinions

My current thoughts on AI and how to be good at it.

ai, opinions, current

TinySafe v3: How I Built a Near-SOTA Safety LLM for Under $100

A 4B parameter model fine-tuned with QLoRA gets within 0.008 F1 of SOTA on ToxicChat, then hits a wall. Five versions, $100, and a lot of lessons about why 0.83 is so hard to break.

ai, machine learning, safety, nlp, experiment, distillation, qlora

I Built TinySafe, a Safety Model that Beats 8B Guard Models with 71M Parameters for $37

Building a sub-2ms safety classifier that outperforms LlamaGuard, ShieldGemma, and every encoder-based model in its weight class.

ai, machine learning, safety, nlp, experiment

Which AI Models Lean in Which Political Direction?

I asked the top LLMs 20 different political compass questions to figure out where they fall on the left/right and libertarian/authoritarian axes.

ai, llm, political compass, bias, experiment, politics, alignment

Are Computer Science Majors "Cooked" because of AI?

A speculative deep dive on whether AI will augment or replace new software engineers.

ai, computer science, job market, tech industry, software engineering, unemployment, career

I Trained AI on 70k Clash Royale Battles to Settle the Ultimate Debate: Does Your Deck Actually Matter?

Using machine learning on 70,000+ real battles from the Clash Royale API to definitively answer whether deck composition actually predicts victory, or if it's just skill and luck.

ai, machine learning, clash royale, game analysis, data science, lightgbm, pytorch, decks, experiment

The Nutrition Prediction Benchmark: Testing LLMs on Google Cafeteria Menus

Testing large language models on their ability to predict accurate nutritional information from Google cafeteria dishes, revealing surprising insights about AI's understanding of food science.

ai, llm, benchmark, nutrition, google, experiment

The Semantic Diversity Benchmark: A New Way to Test AI Language Models

A simple but powerful benchmark for testing AI language models by asking them to generate maximally semantically unrelated words, revealing surprising insights about model capabilities.

ai, llm, benchmark, semantic diversity, language models, experiment

New Grad Job Search for Software Engineers

A quick guide for new grads looking to break into the software engineering industry, including networking tips, technical preparation advice, and insights into why the current job market is more challenging.

career, job search, new grad, software engineering, tech industry

Transformers are Limited

The transformer architecture is fundamentally limited for true reasoning.

transformers, limitation, reasoning, llm

Welcome to My Blog

Welcome to my blog!

welcome, blog, jdleo