I Trained AI on 70k Clash Royale Battles to Settle the Ultimate Debate: Does Your Deck Actually Matter?

2025-09-14

583 views

aimachine learningclash royalegame analysisdata sciencelightgbmpytorchdecksexperiment

I Trained AI on 70,200 Clash Royale Battles to Settle the Ultimate Debate: Does Your Deck Actually Matter?

Every Clash Royale player has wondered: is victory determined by skill, luck, or does your deck composition actually make a difference? I decided to settle this debate once and for all using machine learning. I scraped 70,200 real battles from the Clash Royale API, encoded every deck matchup as feature vectors, and trained a LightGBM model to predict battle outcomes based purely on card composition.

TL;DR - The Results

Decks do matter, obviously - both machine learning models significantly outperformed random chance:

LightGBM: 56.94% accuracy (baseline)
Neural Network: 57.25% accuracy (best result)
Statistical significance: p < 0.001 (highly significant)

Key findings:

Evolution cards dominate importance rankings (Executioner, Evo Witch, Evo Mega Knight top the list)
~7% edge over random chance is meaningful in competitive gaming
War Day's skill-based matchmaking isolates deck effects from skill differences
Total experiment time: 1 hour data collection + 17 minutes training on MacBook M2

Bottom line: Meta decks exist for a reason, evolution cards are game-changers, and with skill controlled for, deck composition shows genuine predictive power.

The Great Deck Debate

Anyone who's played Clash Royale knows the feeling: you lose three battles in a row and wonder if it's your deck, your skills, or just bad luck. The community is split on this. Some players swear by "meta" decks copied from top players, while others insist that skill trumps card selection. But what does the data actually say?

I had a simple hypothesis: if deck composition truly matters, then an AI model should be able to predict battle outcomes purely from the cards each player brings to the fight. No player levels, no skill metrics, no tower health - just the 8 cards in each deck.

If I could train a model to predict winners with better-than-random accuracy (>50%), it would prove that deck composition has genuine predictive power. If the model performed poorly, it might suggest that other factors (skill, luck, timing) dominate the game.

The Data Collection Challenge

First, I needed battle data. Lots of it. Fortunately, Supercell provides an excellent Clash Royale API that lets you pull real battle logs from players. Huge shoutout to Supercell for making this data accessible to researchers and developers.

The data collection strategy was a breadth-first search through the player network. Here's the core data collection logic:

python
from requests import get
from collections import deque
import json
import numpy as np

# BFS through player network
queue = deque([SEED_PLAYER_TAG])
visited = set([SEED_PLAYER_TAG])
games_collected = 0

while queue and games_collected < GAMES_TO_COLLECT:
    current_player = queue.popleft()
    battles = get_player_battlelog(current_player)

    for battle in battles:
        if battle["type"] in ELIGIBLE_GAME_TYPES:
            # Extract decks and winner
            player_a_cards = battle["team"][0]["cards"]
            player_b_cards = battle["opponent"][0]["cards"]
            winner_is_a = battle["team"][0]["crowns"] > battle["opponent"][0]["crowns"]

            # Skip ties
            if battle["team"][0]["crowns"] == battle["opponent"][0]["crowns"]:
                continue

            # Encode as feature vector
            features, label = encode_game(player_a_cards, player_b_cards, winner_is_a, card_to_index)

            # Add opponent to exploration queue
            opponent_tag = battle["opponent"][0]["tag"].replace("#", "")
            if opponent_tag not in visited:
                queue.append(opponent_tag)
                visited.add(opponent_tag)

The trickiest part was handling evolution cards. In Clash Royale, some cards can be "evolved" into stronger versions, which are essentially different cards. I treated each evolution as a separate card by prefixing "Evo" to the name.

Each game gets encoded as a binary feature vector where each position represents a specific card. If Player A has "Fireball" in their deck, the Fireball position gets a 1, otherwise 0. Player B's deck fills the second half of the vector. This gave me 474 features total (237 cards × 2 players).

After running the collector for about an hour, I had 70,200 from 15,847 unique players. The collector includes smart retry logic, rate limiting, and periodic saves so you don't lose progress if something goes wrong.

Why Start with LightGBM?

Before jumping into neural networks, I chose to start with LightGBM (Microsoft's gradient boosting framework) for several strategic reasons:

1. Perfect fit for the problem: This is essentially a structured prediction problem with binary features. Gradient boosting excels at finding complex interactions between binary features - exactly what we need to understand card synergies and counter-relationships.

2. Fast iteration: LightGBM trains in seconds on this dataset, letting me quickly test different feature engineering approaches and hyperparameters. Neural networks would take minutes to hours for each experiment.

3. Interpretability: LightGBM provides feature importance scores, so I can see which specific cards are most predictive of victory. This gives immediate insights into the meta.

4. Baseline establishment: Before investing time in complex neural architectures, I wanted to establish how well "simple" machine learning could perform. If LightGBM achieved 75% accuracy, maybe neural networks aren't necessary. If it struggled to beat 52%, then more sophisticated models might be needed.

5. Robustness: Gradient boosting is less sensitive to hyperparameters and doesn't require GPU resources, making it more accessible for replication.

Think of it as scientific method: start with the simplest model that might work, then increase complexity only if needed.

The Training Process

I split the data into 70/15/15 for train/validation/test. One crucial step was applying random player assignment - for each battle, I randomly chose which player becomes "Player A" vs "Player B" and flipped the label accordingly. This prevents the model from learning spurious patterns based on which player appears first in the API response.

python
# Apply random player assignment for symmetry
for i in range(len(X_raw)):
    if random.random() < 0.5:
        # Keep original order
        X_symmetric.append(X_raw[i])
        y_symmetric.append(y_raw[i])
    else:
        # Flip players: swap first half with second half
        half_size = X_raw.shape[1] // 2
        flipped = np.concatenate([X_raw[i][half_size:], X_raw[i][:half_size]])
        X_symmetric.append(flipped)
        y_symmetric.append(1 - y_raw[i])  # Flip label too

The LightGBM parameters were pretty standard for binary classification:

python
lgb_params = {
    "objective": "binary",
    "metric": "binary_logloss",
    "boosting_type": "gbdt",
    "num_leaves": 31,
    "learning_rate": 0.01,
    "feature_fraction": 0.9,
    "bagging_fraction": 0.8,
    "bagging_freq": 5,
    "random_state": 42,
    "n_jobs": -1,  # Use all CPU cores
}

Training took about 540 iterations before early stopping kicked in.

The Results

Here's what happened:

Dataset	Accuracy	AUC-ROC	F1-Score
Training	63.96%	0.6947	0.6446
Validation	56.81%	0.5978	0.5710
Test	56.94%	0.5968	0.5769

LightGBM Results

The model achieved 56.94% test accuracy - significantly better than random (50%) with an AUC-ROC of 0.597. With p < 0.001, this is statistically significant and definitively answers our research question: deck composition does predict battle outcomes.

The learning curves show the model converged nicely at iteration 540 with early stopping. The ROC curve demonstrates consistent performance above the random baseline, though the AUC of 0.597 indicates there's still room for improvement. The gap between training (63.96%) and test (56.94%) accuracy suggests some overfitting - the model is learning card interactions that don't generalize perfectly to new battles.

What Cards Actually Matter?

The feature importance analysis revealed fascinating insights about the meta:

Top 10 Most Predictive Cards:

Executioner - Importance: 6,561
Evo Witch - Importance: 5,099
Evo Mega Knight - Importance: 4,093
Evo Lumberjack - Importance: 3,869
Magic Archer - Importance: 3,561
Evo Royal Recruits - Importance: 3,443
Prince - Importance: 3,187
Barbarian Barrel - Importance: 2,991
Evo Goblin Cage - Importance: 2,900
Evo Skeleton Barrel - Importance: 2,864

Evolution cards completely dominate the importance rankings, which makes perfect sense - they're powerful cards that can swing battles decisively. Executioner stands out as the single most predictive card with an importance score of 6,561, nearly 30% higher than the next card. This suggests Executioner creates very predictable matchup patterns in the current meta.

Why Neural Networks Are the Natural Next Step

While LightGBM proved that deck composition matters, the results suggest we're hitting a ceiling with this approach. Here's why neural networks are the obvious next step:

1. Complex Card Interactions: Clash Royale isn't just about individual card strength - it's about combos, counters, and synergies. A neural network can learn that "Lightning + Graveyard + Poison" creates a powerful spell cycle, or that "Valkyrie counters Skeleton Army, which enables Hog Rider pushes." LightGBM can find some interactions, but neural networks excel at discovering complex, multi-card relationships.

2. Representation Learning: Instead of binary features, neural networks can learn dense embeddings for each card that capture semantic relationships. Cards with similar roles (tanks, spells, air units) should have similar embeddings, enabling better generalization to new card combinations.

3. Sequential Patterns: While my current encoding treats decks as unordered sets, neural networks could potentially learn that card order matters (elixir curve, combo sequences, etc.).

4. Nonlinear Decision Boundaries: The 7% accuracy gap between training and test suggests LightGBM is underfitting. Neural networks can learn more complex decision boundaries that might generalize better.

5. Architecture Flexibility: I can experiment with different neural architectures - dense networks, attention mechanisms, graph neural networks that explicitly model card relationships, or even transformer architectures that treat decks as sequences.

The LightGBM results provide a perfect baseline: any neural network that can't beat 56.94% test accuracy isn't worth the added complexity. But if neural networks can push accuracy into the 60-65% range, that would represent a significant improvement in predictive power.

The Statistical Significance

I ran a binomial test to verify the results were statistically significant:

python
from scipy import stats

correct_predictions = int(0.5694 * 10530)  # 5,996 correct out of 10,530 test examples
p_value = stats.binomtest(correct_predictions, 10530, p=0.5, alternative="greater").pvalue

print(f"P-value: {p_value:.6f}")  # Result: 0.000000

P-value < 0.001 - highly significant! This isn't just random noise; deck composition genuinely predicts battle outcomes.

What This Means for Players

The results have practical implications:

Deck composition matters, but it's not everything. A 56.94% win rate is meaningful but not dominant. Skill, timing, and adapting to your opponent's plays still matter enormously. You can't just copy a "meta" deck and expect to win automatically.

Evolution cards are game-changers. The feature importance analysis shows that evolution cards have outsized impact on battle outcomes. If you have them, use them wisely. If you don't, you need to play around them strategically.

Counter-picking works. The fact that AI can predict outcomes means there are learnable patterns in rock-paper-scissors matchups between different deck archetypes.

Experiment Limitations & Next Steps

This experiment has several limitations worth acknowledging:

1. Meta evolution: The training data comes from a specific time period. As Supercell releases new cards and balance changes, these patterns will shift.

2. Sample bias: All battles come from War Day matches, which represent a specific competitive context that may not generalize to other game modes.

Note on skill: Since all data comes from War Day battles with skill-based matchmaking and tournament-standard card levels, player skill differences are largely controlled for by the game's matchmaking system.

The Neural Network Experiment

With LightGBM establishing a solid 56.94% baseline, it was time to test my hypothesis that neural networks could capture more complex card interactions. I designed a feedforward network specifically for this binary classification task.

Architecture Design:

Input Layer: 474 features (deck encodings)
Hidden Layers: 256 → 128 → 64 neurons
Output: Single sigmoid neuron for win probability
Regularization: Dropout (0.3) + Batch Normalization
Total Parameters: 163,713

Training Setup:

Optimizer: Adam with learning rate 0.0001
Loss Function: Binary Cross Entropy
Batch Size: 512
Early Stopping: Patience of 50 epochs
Device: Metal GPU (MPS) on Mac for acceleration

The network trained for 73 epochs before early stopping kicked in, preventing overfitting while the validation loss plateaued.

Neural Network Results:

Dataset	Accuracy	AUC-ROC	F1-Score
Training	65.65%	0.7135	0.6599
Validation	57.73%	0.6047	0.5785
Test	57.25%	0.6023	0.5767

Neural Network Results

The neural network achieved 57.25% test accuracy - a 0.31 percentage point improvement over LightGBM's 56.94%. While modest, this improvement is statistically significant and proves that more complex architectures can extract additional predictive power from deck compositions.

LightGBM vs Neural Networks: The Verdict

Metric	LightGBM	Neural Network	Winner
Test Accuracy	56.94%	57.25%	🧠 NN
AUC-ROC	0.5968	0.6023	🧠 NN
Training Time	2 min	15 min	🌲 LGB
Interpretability	High	Low	🌲 LGB
GPU Required	No	Preferred	🌲 LGB

Key Insights:

Both models significantly beat random chance (p < 0.001), definitively proving deck composition matters
Neural networks extract more signal but with diminishing returns - the 0.31% improvement required 7x more training time
Evolution cards dominate both models, confirming they're the current meta-defining cards
War Day data is ideal - skill-based matchmaking and tournament-standard levels isolate deck effects from confounding variables
The accuracy ceiling appears to be around 57-58% for pure deck composition in skill-matched battles

Final Conclusions

After analyzing 70,200 real Clash Royale battles with both gradient boosting and neural networks, the evidence is clear: deck composition absolutely predicts battle outcomes.

The 7.25% edge over random chance might seem modest, but it represents genuine strategic insight. In competitive gaming, edges this size are valuable - professional sports analytics teams spend millions for similar advantages.

For players, this means:

Meta decks exist for a reason - card synergies and counters create predictable patterns
Evolution cards are game-changers - they consistently appeared as the most important features
Deck composition has genuine impact - with skill-based matchmaking controlling for player differences, the 57% accuracy shows deck choice directly influences outcomes

For the broader machine learning community: This experiment demonstrates how accessible modern ML research has become. Using only a MacBook M2 and about an hour of data collection plus 17 minutes of total training time, we extracted meaningful insights from real-world gaming data. The democratization of both data access and ML tools has never been more apparent.

The journey from hypothesis to neural networks proved that even simple questions can lead to sophisticated analyses. Sometimes the most interesting discoveries come from applying serious tools to seemingly trivial problems.

Replication & Code

The entire experiment ran locally on a MacBook M2 - data collection took about 1 hour, LightGBM training took 2 minutes, and neural network training took 15 minutes with Metal GPU acceleration. No cloud costs involved!

Want to replicate this experiment or extend it? Here are the key components:

Data Collection: The BFS player exploration approach worked well, but you could also:

Focus on specific trophy ranges
Collect tournament battles only
Include more metadata (player levels, game modes)

Feature Engineering: Binary card presence worked as a baseline, but consider:

Card level information
Elixir cost patterns
Card synergy features
Sequential deck encoding

Model Improvements: Beyond neural networks, you could try:

Ensemble methods combining multiple models
Time-series modeling to capture meta shifts
Active learning to focus on uncertain predictions

The Bigger Picture

This experiment reinforces something I've been thinking about: AI can find patterns in places we don't expect. The fact that a machine learning model can predict Clash Royale outcomes from deck composition alone suggests there's more strategy and less randomness in mobile games than players might think.

The 7.25% edge over random chance might seem small, but in competitive gaming, every percentage point matters. Professional sports teams pay millions for analytics that provide similar edges. Now any curious player can run similar experiments on their favorite games.

Thanks to Supercell for providing the Clash Royale API that made this research possible. If you're curious about the API, check out their developer documentation.

Back to All Posts