Guess Who: A Recommender System with fcaR

Author

fcaR team

Published

2 March 2026

Introduction

Can we use Formal Concept Analysis (FCA) to build a game-playing agent? In this demo, we will recreate the logic behind games like “Guess Who”, using the fcaR package as our artificial intelligence engine.

This demo was initially an idea from my son Daniel, who asked me if we could use FCA to build a “Guess Who” agent. He had already implemented an Akinator-like agent using FCA, and he thought it would be fun to recreate the logic for “Guess Who”.

The core of our agent relies on Simplification Logic (SL), a robust mathematical framework that allows us not only to deduce hidden information but also to drastically reduce the search space (the number of remaining questions) as the game progresses.

Let’s see how it works!

1. Preparing the Formal Context

First, we need a dataset describing our characters. The fcaR package provides a dataset called guesswho, containing the 24 classic characters from the original board game and featuring 19 binary attributes (Hair color, Gender, Accessories, etc.).

However, to build a conversational system that can reason about both “YES” and “NO” answers, we must dichotomize our context. This means that for every attribute like Glasses, we explicitly create its negative counterpart no-Glasses.

Code
# 1. Load the package and the Guess Who matrix
library(fcaR)
data("guesswho")

# 2. Dichotomize the context
I <- guesswho
I_neg <- 1 - I
colnames(I_neg) <- paste0("no-", colnames(I))

# Bind them together
I_final <- cbind(I, I_neg)

# Create the formal context
fc <- FormalContext$new(I_final)

By doing this, our context now contains 24 objects (characters) and 38 attributes (both positive and negative).

        Male Female Hat no-Male no-Female no-Hat
Alex       1      0   0       0         1      1
Alfred     1      0   0       0         1      1
Anita      0      1   0       1         0      1
Anne       0      1   0       1         0      1
Bernard    1      0   1       0         1      0

2. Extracting Knowledge as Implications

The “brain” of our game is the set of implication rules extracted from the context. These rules encode the logical dependencies between attributes.

Code
fc$find_implications()

We have extracted 298 implications. To understand what these rules mean, look, for instance, at the rule {Moustache} -> {Male}, which states that anyone with a moustache is male. Let’s save the full set of implications to use during our game:

Code
imps <- fc$implications$clone()

3. The Game Engine: Simplification Logic

How do we decide which question to ask next? Instead of randomly guessing or using simple probabilities, we can leverage the rules themselves.

A powerful heuristic is to evaluate which attribute simplifies our problem the most. Mathematically, given a set of active implications \Sigma, we can compute the “importance” of an attribute a by counting how many times it appears in the left-hand side (LHS) of the rules with positive support:

\text{Importance}(a) = \sum_{A \to B \in \Sigma} \mathbb{1}(a \in A)

where \mathbb{1}(a \in A) is the indicator function, equal to 1 when a belongs to the LHS A and 0 otherwise. The more often an attribute is needed to trigger a deduction (LHS), the more useful it is to ask about it!

We will select the attribute that maximizes \text{Importance}(a) + \text{Importance}(\text{no-}a) among the remaining unknown features.
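
As a quick illustration, this counting can be sketched in base R on a toy LHS incidence matrix (a mock stand-in for the one fcaR maintains internally; the attribute names and rule contents here are invented for the example):

```r
# Toy LHS incidence matrix: rows = attributes, columns = rules.
# Cell (a, r) is 1 when attribute a appears in the LHS of rule r.
lhs <- matrix(
  c(1, 1, 0,   # "Moustache" appears in the LHS of rules 1 and 2
    0, 1, 1,   # "Beard" appears in the LHS of rules 2 and 3
    0, 0, 1),  # "Hat" appears in the LHS of rule 3
  nrow = 3, byrow = TRUE,
  dimnames = list(c("Moustache", "Beard", "Hat"), NULL)
)

# Importance(a) = number of rule LHSs containing attribute a
importance <- rowSums(lhs)
importance[["Moustache"]]  # 2
importance[["Hat"]]        # 1
```

With the real rule base, the same `rowSums` trick is applied to the LHS matrix of the active implications, as shown later in this post.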

A Simulated Turn

Let’s simulate a game where the user has thought of the character Richard, who has the attributes Male, BrownHair, BrownEyes, Bald, Moustache, and Beard.

At the beginning, we don’t know anything. We initialize our knowledge as a vector of zeros:

Code
# x stores our current knowledge: x[a] = 1 means we know attribute a holds
# (this applies to both positive and "no-" attributes); 0 means still unknown.
# It gets updated both by the user's answers and by deductions from the rules.
x <- rep(0, length(fc$attributes))
names(x) <- fc$attributes

Asking Questions

Using the importance heuristic described above, our system determines the most promising attribute and asks about it.

Let’s say the heuristic tells us to ask: “Does your character have the attribute ‘Sad’?” The user answers: NO.

We register this in our state vector:

Code
x["no-Sad"] <- 1

The Magic of reduce = TRUE

Here is where Simplification Logic (SL) shines. We convert our knowledge x into an fcaR Set and compute its closure against the implications.

But we don’t just want the closure. By using the parameter reduce = TRUE, the SL engine will:

  1. Infer all logical consequences of our current knowledge.
  2. Remove and simplify the rules that are no longer relevant or have already been triggered, returning a highly reduced set of active rules for the next turn!
Code
S <- fcaR::as_Set(x)
closure <- imps$closure(S, reduce = TRUE)

# 1. Update our knowledge with FCA deductions
x <- as.numeric(fcaR::as_vector(closure$closure))
names(x) <- fc$attributes

# 2. Update our active rules!
imps <- closure$implications

How many rules do we have left now?

Code
imps$cardinality()
[1] 282

Just by answering “NO” to “Sad”, the Simplification Logic automatically pruned 16 obsolete rules (from 298 down to 282)!
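
Conceptually, the deduction half of that closure call is forward chaining: keep firing rules whose LHS is satisfied until nothing new is deduced. Here is a minimal base-R sketch on a toy two-rule base (our own list representation, not fcaR's internals, and without the rule-simplification part that reduce = TRUE adds):

```r
# Each rule: a list with lhs (attributes required) and rhs (attributes deduced)
rules <- list(
  list(lhs = "Moustache",       rhs = "Male"),
  list(lhs = c("Male", "Bald"), rhs = "no-BlondeHair")
)

# Naive closure: keep firing rules until nothing new can be deduced
close_under <- function(known, rules) {
  repeat {
    added <- FALSE
    for (r in rules) {
      if (all(r$lhs %in% known) && !all(r$rhs %in% known)) {
        known <- union(known, r$rhs)
        added <- TRUE
      }
    }
    if (!added) return(known)
  }
}

close_under("Moustache", rules)            # "Moustache" "Male"
close_under(c("Moustache", "Bald"), rules) # adds "Male", then "no-BlondeHair"
```

Note how the second call triggers a small chain: deducing Male enables the second rule. The SL engine performs this kind of reasoning, but it additionally rewrites the rule base as it goes.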

Peeking at the Reduced Implications

To understand how the system chooses the next question, let’s look at the remaining rules that have actual support among the remaining candidates.

Code
# Filter the active rules (support > 0)
active_rules <- imps[imps$support() > 0]

# Let's peek at the first 3 active rules:
active_rules[1:3]
Implication set with 3 implications.
Rule 1: {no-Beard, no-BigNose, no-Earrings} -> {no-BrownHair}
Rule 2: {no-Moustache, no-BigNose, no-Earrings} -> {no-BrownHair}
Rule 3: {no-Moustache, no-RosyCheeks, no-BigNose} -> {no-BigMouth}

Notice how certain attributes continuously appear on the left-hand side (LHS) of these rules. Our Importance(a) heuristic works precisely by counting these occurrences. If an attribute appears heavily on the LHS, answering a question about it will likely trigger multiple rule deductions, drastically reducing our search space.

Let’s query the system to count which unknown attributes appear the most in the LHS of these active rules:

Code
# Count occurrences of each attribute in the LHS
lhs_counts <- Matrix::rowSums(active_rules$get_LHS_matrix())

# Remove attributes we already know
unknown_counts <- lhs_counts[x == 0]

# Show the top 5 most promising questions!
head(sort(unknown_counts, decreasing = TRUE), 5)
  no-Earrings no-RosyCheeks    no-RedHair   no-BigMouth      no-Beard 
          102            98            91            91            90 

As we can see, attributes like no-Earrings and no-RosyCheeks appear at the very top: they are the most critical pieces of information missing from the puzzle. By adding the positive and negative weights for each feature, the heuristic will confidently select Earrings as the optimal next question.

A Chain Reaction

The true power of this method appears when a sequence of answers triggers a chain of implications. Below is the complete trace of the system trying to guess Richard:

========================================================
 STARTING GAME: Trying to guess 'Richard'
========================================================

[Question 1] Does your character have 'Sad'?
  -> Answer: NO
  -> Remaining rules in the base: 282
  -> Remaining candidates (22)

[Question 2] Does your character have 'Earrings'?
  -> Answer: NO
  -> Remaining rules in the base: 231
  -> Remaining candidates (19)

[Question 3] Does your character have 'RosyCheeks'?
  -> Answer: NO
  -> Remaining rules in the base: 196
  -> Remaining candidates (15)
  
[Question 4] Does your character have 'Beard'?
  -> Answer: YES

When we register Beard = YES and run the closure with reduce = TRUE, a massive chain reaction occurs:

  -> [FCA Simplification]: By closure, automatically deduced: Male, BrownEyes, no-Female, no-BlackHair, no-WhiteHair, no-RedHair, no-BlueEyes, no-Hat, no-Glasses, no-BigNose, no-BigMouth

Without asking any further questions, the system deduced 11 additional attributes! The rules implicitly encoded that everyone with a beard in this game is a male with brown eyes.

And the number of remaining rules plummets:

  -> Remaining rules in the base: 51
  -> Remaining candidates (2): David, Richard

Our next heuristic pick is the deciding factor between the two:

[Question 5] Does your character have 'Bald'?
  -> Answer: YES
  -> [FCA Simplification]: By closure, automatically deduced: BrownHair, Moustache, no-BlondeHair
  -> Remaining rules in the base: 37
  -> Remaining candidates (1): Richard

========================================================
GAME OVER! The character is 'Richard'. Guessed in 5 questions.
Result: SUCCESS.
========================================================
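
For completeness, the “remaining candidates” counts in the trace come from a consistency check against the dichotomized context: a character survives as long as it has every attribute we currently know to hold. A base-R sketch with a tiny mock context (`candidates` is our own helper, not an fcaR function):

```r
# Tiny mock dichotomized context: 3 characters x 4 attributes
ctx <- rbind(
  David   = c(Male = 1, Beard = 1, Bald = 0, `no-Bald` = 1),
  Richard = c(Male = 1, Beard = 1, Bald = 1, `no-Bald` = 0),
  Anne    = c(Male = 0, Beard = 0, Bald = 0, `no-Bald` = 1)
)

# A character stays a candidate if it has every attribute known to hold
candidates <- function(ctx, known) {
  cols <- names(known)[known == 1]
  keep <- apply(ctx[, cols, drop = FALSE], 1, function(row) all(row == 1))
  rownames(ctx)[keep]
}

state <- c(Male = 1, Beard = 1, Bald = 0, `no-Bald` = 0)  # Bald still unknown
candidates(ctx, state)  # "David" "Richard"

state["Bald"] <- 1      # answer to Question 5: YES
candidates(ctx, state)  # "Richard"
```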

Conclusion

By combining Formal Concept Analysis, dichotomized contexts, and Simplification Logic (reduce = TRUE), fcaR behaves like an expert recommender system: it actively shrinks its knowledge space, makes many questions redundant by deducing their answers automatically, and converges to the solution in very few steps, mimicking human deductive reasoning.

Disclaimer

I thought Daniel’s idea was great, and I hope you enjoy it!