Beyond certainty: Defining and analyzing relevance measures for formal implications

FCA
Implications
Association rules
Knowledge discovery
Data mining
Publication idea

This paper addresses the “informativity” problem of large canonical bases in FCA. We propose a framework of relevance measures (like lift, improvement, and stability) adapted from association rule mining to filter and rank formal implications, making them more interpretable and useful for domain experts.

Author

Domingo López Rodríguez, Kira Adaricheva

Published

16 November 2025

Keywords

Relevance measures, Implication stability, Lift, Improvement, Canonical basis, Interpretability

While formal implications are, by definition, 100% certain, they are not all equally informative or relevant to a human expert. A canonical basis for a real-world dataset can contain thousands of rules, making it uninterpretable without filtering or ranking. This paper addresses this “informativity” problem by proposing a systematic framework of relevance measures for formal implications. We adapt well-known measures from association rule mining—such as lift, improvement, and conviction—to the specific context of formal implications. We analyze the algebraic properties of these measures and the preference relations they induce. Furthermore, we introduce the concept of “implication stability,” analogous to concept stability, which quantifies a rule’s robustness to data subsampling. We provide efficient algorithms for computing these measures and demonstrate through a case study on a biological dataset how they can be used to filter and rank implications, revealing the most significant and robust knowledge for domain experts.

Introduction

A key promise of Formal Concept Analysis (FCA) is its ability to provide an exhaustive and sound representation of knowledge. The canonical basis of implications is the most concise such representation. However, exhaustiveness often comes at the cost of interpretability. In a complex domain, such as modeling dengue zoonosis, the canonical basis may contain thousands of logically valid but practically trivial or uninteresting rules, overwhelming the domain experts.

The problem is that all implications are treated equally, as they all have a confidence of 1. What is missing are criteria to distinguish between an implication that captures a deep, non-obvious dependency and one that is a trivial consequence of the data’s structure.

This paper fills this gap by formalizing measures of relevance for formal implications. Our contributions are:

  1. The adaptation of established quality measures from association rule mining (e.g., lift, improvement) to the context of formal implications.
  2. The introduction of a novel measure, implication stability, to quantify rule robustness.
  3. A formal analysis of the algebraic properties of the preference relations induced by these measures.
  4. A demonstration of how these measures can be used in a real-world workflow to aid expert knowledge discovery.

Methodology

Our methodology involves defining a suite of relevance measures and analyzing their properties. Let \mathbb{K}=(G,M,I) be a formal context and A \to B a valid implication in \mathbb{K}.
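For concreteness, the derivation operators and the validity check used throughout can be sketched as follows. The toy context and all helper names (`CONTEXT`, `extent`, `intent`, `holds`) are illustrative assumptions, not notation from the paper.

```python
# Toy formal context K = (G, M, I): object -> set of attributes (an assumption).
CONTEXT = {
    "g1": {"a", "b"},
    "g2": {"a", "b", "c"},
    "g3": {"b", "c"},
    "g4": {"c"},
}
ALL_ATTRS = set().union(*CONTEXT.values())  # the attribute set M

def extent(attrs):
    """X^down: objects of G having every attribute in X."""
    return {g for g, row in CONTEXT.items() if attrs <= row}

def intent(objs):
    """Y^up: attributes shared by every object in Y (all of M if Y is empty)."""
    attrs = set(ALL_ATTRS)
    for g in objs:
        attrs &= CONTEXT[g]
    return attrs

def holds(A, B):
    """A -> B is a valid implication iff B is a subset of A'' = (A^down)^up."""
    return B <= intent(extent(A))

print(holds({"a"}, {"b"}))  # True: every object with "a" also has "b"
print(holds({"c"}, {"b"}))  # False: g4 has "c" but not "b"
```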

Adapting association rule measures

Unlike for general association rules, the confidence of A \to B is always 1. However, other measures can be adapted. Let supp(X) = |X^\downarrow|/|G|.

  • Support: The support of the rule, supp(A \cup B), remains a primary measure of generality.
  • Lift: Lift is defined as \frac{conf(A \to B)}{supp(B)} = \frac{1}{supp(B)}. An implication with a high lift represents a dependency that is much more frequent than would be expected if the consequent were independent of the antecedent.
  • Improvement: Defined as \min_{A' \subset A} (conf(A \to B) - conf(A' \to B)). Since conf(A \to B) = 1, this becomes \min_{A' \subset A} (1 - conf(A' \to B)). A high improvement value indicates that the full antecedent A is truly necessary for the implication to hold with certainty.
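The three adapted measures can be sketched on a toy context; the dataset and helper names below are illustrative assumptions. Note how `improvement` drops to 0 when the antecedent contains a redundant attribute:

```python
from itertools import combinations

# Toy context: object -> attributes (an illustrative assumption).
CONTEXT = {
    "g1": {"a", "b"},
    "g2": {"a", "b", "c"},
    "g3": {"b", "c"},
    "g4": {"c"},
}

def extent(attrs):
    """X^down: objects having every attribute in X."""
    return {g for g, row in CONTEXT.items() if attrs <= row}

def supp(attrs):
    """supp(X) = |X^down| / |G|."""
    return len(extent(attrs)) / len(CONTEXT)

def conf(A, B):
    """Confidence of A -> B (1 by convention when A has empty extent)."""
    denom = len(extent(A))
    return len(extent(A | B)) / denom if denom else 1.0

def lift(A, B):
    """For a valid implication, lift = conf(A -> B) / supp(B) = 1 / supp(B)."""
    return 1.0 / supp(B)

def improvement(A, B):
    """min over proper subsets A' of A of (1 - conf(A' -> B))."""
    proper = (set(c) for r in range(len(A)) for c in combinations(A, r))
    return min(1.0 - conf(Ap, B) for Ap in proper)

print(round(lift({"a"}, {"b"}), 3))    # 1 / supp({b}) = 4/3 -> 1.333
print(improvement({"a"}, {"b"}))       # 0.25: this antecedent is minimal
print(improvement({"a", "c"}, {"b"}))  # 0.0: "c" is redundant, {a} alone implies b
```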

Implication stability

Analogous to concept stability, we define the stability of an implication A \to B as the probability that the implication remains valid in a sub-context formed by randomly selecting a subset of objects from G. This measures the rule’s resilience to noise or changes in the data.
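A Monte Carlo sketch of this definition follows. One subtlety: an implication valid in \mathbb{K} stays (possibly vacuously) valid in every object subcontext, so the sketch additionally requires the antecedent to have a nonempty extent in the sample; that refinement, like the toy context and all names, is an assumption made here for illustration. Exact stability would enumerate all 2^{|G|} object subsets.

```python
import random

# Toy context: object -> attributes (an illustrative assumption).
CONTEXT = {
    "g1": {"a", "b"},
    "g2": {"a", "b", "c"},
    "g3": {"b", "c"},
    "g4": {"c"},
}

def holds_nonvacuously(A, B, context):
    """A -> B holds in `context` and A has a nonempty extent there."""
    ext = [g for g, row in context.items() if A <= row]
    return bool(ext) and all(B <= context[g] for g in ext)

def stability(A, B, context, n_samples=4000, seed=0):
    """Monte Carlo estimate: fraction of random object subsets (each object
    kept independently with probability 1/2) on which A -> B holds
    non-vacuously."""
    rng = random.Random(seed)
    objs = list(context)
    hits = 0
    for _ in range(n_samples):
        sub = {g: context[g] for g in objs if rng.random() < 0.5}
        hits += holds_nonvacuously(A, B, sub)
    return hits / n_samples

print(stability({"a"}, {"b"}, CONTEXT))  # ~0.75: P(at least one of g1, g2 sampled)
```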

Formal analysis

For each measure \mu, we will study the binary relation \succeq_\mu on the set of implications, where \varphi_1 \succeq_\mu \varphi_2 if \mu(\varphi_1) \geq \mu(\varphi_2). We will analyze properties like transitivity, and whether different measures induce similar or conflicting orderings.
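To see why comparing the induced orderings matters, a minimal sketch (the toy rules, context, and helper names are assumptions) counts discordant pairs, i.e., pairs of implications that two measures rank in strictly opposite directions. Here support prefers the broad rule while lift prefers the one with the rare consequent:

```python
from itertools import combinations

# Toy context (an assumption); "e" is a rare attribute held only by g2.
CONTEXT = {
    "g1": {"a", "b"},
    "g2": {"a", "b", "c", "e"},
    "g3": {"b", "c"},
    "g4": {"c"},
}

def supp(attrs):
    return sum(attrs <= row for row in CONTEXT.values()) / len(CONTEXT)

def rule_supp(A, B):
    """Support of the rule: supp(A u B)."""
    return supp(A | B)

def lift(A, B):
    """For valid implications, lift = 1 / supp(B)."""
    return 1.0 / supp(B)

# Two valid implications in this context.
RULES = [
    (frozenset({"a"}), frozenset({"b"})),       # broad: supp 0.5, lift 4/3
    (frozenset({"a", "c"}), frozenset({"e"})),  # narrow: supp 0.25, lift 4
]

def discordant_pairs(mu1, mu2, rules):
    """Rule pairs that mu1 and mu2 order in strictly opposite directions."""
    return [
        (r1, r2)
        for r1, r2 in combinations(rules, 2)
        if (mu1(*r1) - mu1(*r2)) * (mu2(*r1) - mu2(*r2)) < 0
    ]

print(len(discordant_pairs(rule_supp, lift, RULES)))  # 1: supp and lift conflict here
```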

Case study

We will apply these measures to the dengue zoonosis dataset mentioned in the research memorandum. We will compute the canonical basis, rank the implications using each of the proposed measures, and, in collaboration with biologists, evaluate which measures best align with their expert intuition of what constitutes an “important” rule.

Work plan

  • Months 1-3: Formal definition of all measures and rigorous analysis of their algebraic properties. This will be the focus of the collaboration with Prof. Kira Adaricheva.
  • Months 4-5: Develop efficient algorithms for computing each measure for all implications in a basis.
  • Months 6-8: Apply the methods to the case study. Analyze the rankings and work with domain experts to interpret the results.
  • Months 9-12: Write the manuscript, combining the formal definitions, the algorithmic aspects, and the compelling results from the case study.

Potential target journals

  1. Knowledge-Based Systems (Q1): An ideal target, as it welcomes papers on novel knowledge representation techniques and their practical application for expert systems.
  2. Data Mining and Knowledge Discovery (Q1): A top journal that would be interested in the formal connection between association rule mining metrics and FCA.
  3. Journal of Biomedical Informatics (Q1): A strong option if the results of the dengue zoonosis case study are particularly impactful and clearly demonstrate the method’s utility in a real-world biomedical problem.

Minimum viable article (MVA) strategy

This topic can be divided into a foundational paper and an application paper.

  • Paper 1 (The MVA - the framework paper):
    • Scope: Introduce and formally define the suite of relevance measures (support, lift, improvement, stability). Provide a thorough analysis of their mathematical properties and the relationships between them. Include the algorithms for their computation. The case study can be used as a small, illustrative example.
    • Goal: To establish a formal framework for ranking and filtering implications in the FCA literature.
    • Target venue: A journal with a theoretical-yet-practical focus like International Journal of Approximate Reasoning or a submission to the ICFCA conference.
  • Paper 2 (The in-depth case study):
    • Scope: This paper would be a full-fledged application paper. It would briefly cite Paper 1 for the definitions, but its focus would be a deep dive into a complex real-world problem (e.g., dengue zoonosis, e-learning data). It would demonstrate the entire workflow: data preprocessing, basis computation, ranking by multiple measures, and, crucially, a detailed interpretation of the top-ranked rules in collaboration with domain experts, showing how the method leads to new insights.
    • Goal: To provide a compelling demonstration of the practical necessity and power of relevance measures for making FCA useful to non-experts.
    • Target venue: An applied, domain-specific journal like Journal of Biomedical Informatics or an application-focused AI journal like Expert Systems with Applications.