Rule embeddings via lattice geometry: An algebraic approach to continuous representation of formal implications
This research introduces Rule2Vec_Alg, a novel, deterministic method to embed FCA’s logical implications into a continuous vector space. The algebraic approach preserves the Galois closure and lattice structure by design, bridging symbolic AI with geometric deep learning.
Rule2Vec, Embeddings, Formal implications, Canonical basis, Algebraic embedding, Galois operator, Lattice geometry, PCA
The success of modern machine learning is fundamentally tied to continuous vector representations (embeddings). However, structured logical knowledge, such as the implication rules derived from Formal Concept Analysis (FCA), remains confined to discrete, symbolic space. This paper proposes Rule2Vec, a novel, deterministic, and purely algebraic embedding that maps the canonical basis of implications to a continuous vector space $\mathbb{R}^d$. Our approach is based on a structural morphism that explicitly encodes the Galois closure operator and the reticular position of the pseudo-intent, thereby preserving the fundamental logical and lattice-theoretic properties in the vector space. We formally demonstrate how the vector sum preserves the closure operation, and through a comprehensive case study on a planetary dataset, we use Principal Component Analysis (PCA) to show how Euclidean distance reflects logical separation and how major variance axes encode specific algebraic features of the rule set. This work bridges symbolic knowledge discovery (FCA) and geometric learning (Deep Learning), paving the way for vector arithmetic-based Conceptual Exploration.
Introduction
Formal Concept Analysis (FCA) provides a powerful algebraic framework for data mining based on lattice theory […]. The core output of FCA is the concept lattice and the canonical basis of implications (Stem Base) […], which concisely captures all dependencies within a dataset. While powerful, this output resides in a discrete, symbolic domain, hindering its integration with modern continuous representation models.
Our goal is to define a function that moves the logical structure into a geometric space. Unlike conventional embeddings (e.g., Word2Vec, Node2Vec) that rely on sampling and stochastic optimization, our Rule2Vec is a direct algebraic construction designed to satisfy key lattice-theoretic constraints a priori.
The algebraic Rule2Vec formalism (Rule2Vec)
Rule2Vec is a deterministic, non-learned map based on the indicator representation of the attribute space $\{0,1\}^m$. Given $m$ (the number of attributes) and $k$ (the number of meet-irreducible intents), the total dimension is $d = 2m + k$.
Component I: Premise vector ($\mathbf{v}_P$)
This component directly encodes the pseudo-intent $P$ as a standard indicator vector in $\{0,1\}^m$: $(\mathbf{v}_P)_i = 1$ if attribute $a_i \in P$, and $0$ otherwise.
Component II: Gap vector ($\mathbf{v}_G$)
This component encodes the logical residue $P'' \setminus P$, which is the non-redundant part of the closure; it is the algebraic tension of the implication: $(\mathbf{v}_G)_i = 1$ if attribute $a_i \in P'' \setminus P$, and $0$ otherwise.
Component III: Reticular position vector ($\mathbf{v}_R$)
This component anchors the rule to the global geometry of the concept lattice using the meet-irreducible intents $M_1, \dots, M_k$. It measures the proximity of the premise $P$ to each generator via the Galois-based Jaccard similarity: $(\mathbf{v}_R)_j = \dfrac{|P \cap M_j|}{|P \cup M_j|}$.
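As an illustrative sketch, the three components can be assembled directly from a binary formal context. The function names, the toy context, and the meet-irreducible intents below are our own illustration, not the paper's reference implementation (which targets `fcaR`):

```python
import numpy as np

def closure(P, context):
    """Galois closure P'' of an attribute set P in a binary context (objects x attributes)."""
    if P:
        extent = np.all(context[:, sorted(P)] == 1, axis=1)  # objects having all of P
    else:
        extent = np.ones(context.shape[0], dtype=bool)
    if not extent.any():                         # empty extent: closure is all attributes
        return set(range(context.shape[1]))
    return set(np.where(context[extent].all(axis=0))[0])

def rule2vec(P, context, M_intents):
    """Concatenate premise, gap, and reticular components (dimension 2m + k)."""
    m = context.shape[1]
    P_closed = closure(P, context)
    v_p = np.array([float(i in P) for i in range(m)])              # Component I
    v_g = np.array([float(i in P_closed - P) for i in range(m)])   # Component II
    # Component III: Jaccard similarity of the premise with each meet-irreducible intent
    v_r = np.array([len(P & Mj) / max(len(P | Mj), 1) for Mj in M_intents])
    return np.concatenate([v_p, v_g, v_r])
```

For the planetary context of the case study, $m = 7$ and $k = 7$ would give vectors in $\mathbb{R}^{21}$, matching Table 1.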
Theoretical properties
Theorem 1: Preservation of the closure operator
The geometric sum of the premise and gap vectors exactly reconstructs the vector of the closed intent in the continuous space, proving a homomorphic property for the closure operation.
Theorem 1 (Closure preservation). For any canonical implication $P \to P''$, the vector sum of its premise and gap components yields the vector representation of the closed intent $P''$:

$$\mathbf{v}_P + \mathbf{v}_G = \mathbf{v}_{P''}$$
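The theorem holds because the premise $P$ and the gap $P'' \setminus P$ are disjoint, so their indicator vectors sum to the indicator of $P''$. A minimal numeric check, using the attribute order of Table 1 (small, near, far, moon, no_moon, large, medium) and the premise/gap pattern of rule R1:

```python
import numpy as np

m = 7
P = {4}                  # premise: {no_moon} (index 4), as in rule R1 of Table 1
P_closed = {0, 1, 4}     # its closure: {small, near, no_moon}
ind = lambda S: np.array([float(i in S) for i in range(m)])

# P and the gap P'' \ P are disjoint, so their indicator vectors sum
# exactly to the indicator vector of the closed intent:
assert np.array_equal(ind(P) + ind(P_closed - P), ind(P_closed))
```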
Theorem 2: Reflection of the reticular metric (conjecture)
The Euclidean distance between two Rule2Vec vectors reflects the logical separation of the rules’ underlying intents within the concept lattice. The reticular component is key to ensuring that rules are separated not only by their attributes but also by their position relative to the structural generators of the lattice.
Theorem 2 (Reticular metric reflection, conjecture). The Euclidean distance between the embeddings of two rules is strongly correlated with the logical distance between their closed intents $P_1''$ and $P_2''$, effectively mapping the partial order structure onto a geometric metric.
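One concrete candidate for this logical distance (our assumption; the text does not fix one) is the cardinality of the symmetric difference of the closed intents, which on the binary blocks of the embedding coincides exactly with the squared Euclidean distance of the indicator vectors:

```python
import numpy as np

ind = lambda S, m: np.array([float(i in S) for i in range(m)])

A, B = {0, 1, 3}, {1, 2}                 # two hypothetical closed intents
d2 = np.sum((ind(A, 5) - ind(B, 5)) ** 2)
assert d2 == len(A ^ B)                  # squared distance = |A symmetric-difference B| = 3
```

The conjecture is stronger than this identity: it also involves the real-valued reticular block, which is why it remains a conjecture rather than a theorem.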
Case study: Analysis of a planetary implication base
We analyze a complete set of 10 implications derived from a formal context with $m = 7$ attributes and $k = 7$ meet-irreducible intents. The total vector dimension is $d = 2m + k = 21$.
The canonical basis and vector construction
The full set of canonical implications is:

1. (…)
2. (…)
3. (…)
4. (…)
5. (…)
6. (…)
7. (…)
8. (…)
9. (…)
10. (…)
Table 1 details the full 21-dimensional vector components for three illustrative rules.
| Component | R1 | R4 | R6 |
|---|---|---|---|
| Premise components ($\mathbf{v}_P$) | | | |
| small | 0.00 | 0.00 | 0.00 |
| near | 0.00 | 0.00 | 0.00 |
| far | 0.00 | 0.00 | 1.00 |
| moon | 0.00 | 0.00 | 1.00 |
| no_moon | 1.00 | 0.00 | 0.00 |
| large | 0.00 | 1.00 | 1.00 |
| medium | 0.00 | 0.00 | 1.00 |
| Gap components ($\mathbf{v}_G$) | | | |
| small | 1.00 | 0.00 | 1.00 |
| near | 1.00 | 0.00 | 1.00 |
| far | 0.00 | 1.00 | 0.00 |
| moon | 0.00 | 1.00 | 0.00 |
| no_moon | 0.00 | 0.00 | 1.00 |
| large | 0.00 | 0.00 | 0.00 |
| medium | 0.00 | 0.00 | 0.00 |
| Reticular components ($\mathbf{v}_R$) | | | |
| M1 | 0.00 | 0.00 | 0.25 |
| M2 | 0.00 | 0.00 | 0.50 |
| M3 | 0.00 | 0.33 | 0.75 |
| M4 | 0.00 | 0.00 | 0.75 |
| M5 | 0.00 | 0.00 | 0.00 |
| M6 | 0.00 | 0.00 | 0.00 |
| M7 | 0.33 | 0.00 | 0.00 |
Table 1: Full Rule2Vec vector components for selected rules ($d = 21$)
Analysis of principal components (PCA)
Principal Component Analysis (PCA) was applied to the 10 vectors in $\mathbb{R}^{21}$ to identify the primary axes of variance, summarizing the algebraic features of the rule set.
The first two components capture a significant portion of the total variance:
* PC1: 36.99% of the variance.
* PC2: 24.53% of the variance.
* Total: 61.52% captured by the 2D projection.
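As a sketch of the procedure (with random stand-in data rather than the actual 10 Rule2Vec vectors), the variance ratios, the 2D scores of Table 3, and the loadings of Table 2 can all be obtained from an SVD of the centered matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((10, 21))            # stand-in for the 10 Rule2Vec vectors in R^21
Xc = X - X.mean(axis=0)             # center each dimension
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

explained = s**2 / np.sum(s**2)     # variance ratio per principal component
coords = Xc @ Vt[:2].T              # 2-D scores per rule (the analogue of Table 3)
loadings = Vt[:2].T                 # per-dimension loadings (the analogue of Table 2)
```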
Table 2 details the correlation (loadings) of the original 21 dimensions with the two principal components.
| Component | PC1 | PC2 |
|---|---|---|
| Premise loadings (P) | ||
| P: small | 0.361 | 0.247 |
| P: near | 0.059 | 0.414 |
| P: far | 0.339 | -0.232 |
| P: moon | 0.423 | 0.059 |
| P: no_moon | -0.047 | 0.222 |
| P: large | 0.067 | -0.267 |
| P: medium | 0.067 | -0.267 |
| Gap loadings (G) | ||
| G: small | -0.088 | -0.151 |
| G: near | 0.214 | -0.317 |
| G: far | -0.189 | 0.194 |
| G: moon | -0.273 | -0.097 |
| G: no_moon | 0.398 | -0.180 |
| G: large | 0.249 | 0.303 |
| G: medium | 0.249 | 0.303 |
| Reticular loadings (R) | ||
| R: M1 | 0.106 | 0.015 |
| R: M2 | 0.175 | -0.068 |
| R: M3 | 0.169 | -0.142 |
| R: M4 | 0.169 | -0.142 |
| R: M5 | 0.090 | 0.062 |
| R: M6 | 0.074 | 0.185 |
| R: M7 | 0.051 | 0.221 |
Table 2: PCA loadings and algebraic interpretation of components
| Rule | PC1 | PC2 |
|---|---|---|
| R1 | -0.908 | -0.143 |
| R2 | -0.737 | -0.428 |
| R3 | -0.979 | 0.458 |
| R4 | -1.342 | -0.188 |
| R5 | -1.342 | -0.188 |
| R6 | 0.784 | -1.570 |
| R7 | 0.317 | 1.987 |
| R8 | 1.403 | 0.995 |
| R9 | 1.401 | -0.462 |
| R10 | 1.401 | -0.462 |
Table 3: Rule2Vec coordinates in the 2D principal component space
Algebraic interpretation of principal components
PC1: The axis of logical density
PC1 is dominated by positive loadings for high-density components and negative loadings for low-density ones.
* Positive PC1: highly correlated with the premise loadings of ‘moon’ and ‘far’ and the gap loadings of ‘no_moon’ and ‘large/medium’ (Table 2). This axis captures the implication strength. The rules on the far positive side (R8, R9, R10, R6) are the dense rules, defined by complex premises and a closure on the trivial concept (the full attribute set).
* Negative PC1: correlated with the loadings of simple, specific attributes. Rules R4 and R5 are maximally negative, reflecting rules based on minimal pseudo-intents.
PC1 separates rules based on the size of their closed intent (logical strength).
PC2: The axis of thematic contrast
PC2 is defined by the thematic opposition between ‘near/small/no_moon’ and ‘far/large/medium’.
* Positive PC2: highly correlated with the premise loading of ‘near’ and the gap loadings of ‘large/medium’. Rules R7 and R8 are maximally positive, representing implications linked to the small/near sub-lattice.
* Negative PC2: highly correlated with the premise loadings of ‘large/medium’ and the gap loading of ‘near’. Rules R6, R9, and R10 are strongly negative, representing rules linked to the large/far sub-lattice.
PC2 separates rules based on the thematic cluster of attributes they concern, effectively reflecting the decomposition of the concept lattice.
Implications of PCA projection
The PCA confirms that Rule2Vec successfully translates the structure of the canonical basis into a metric space:
* Clustering: Rules R4 and R5 (and likewise R9 and R10) are numerically identical in the 2D space (their PC coordinates coincide in Table 3), validating that the embedding captures their equivalence as structural ‘siblings’ in the lattice.
* Rule discovery: The inverse mapping of PC components allows for the geometric discovery of prototype rules, where the PC vectors encode the average structure of the dense and sparse rule sets, respectively.
Applications and extensions
Vector arithmetic for conceptual exploration
The most compelling application is using vector arithmetic to suggest new valid implications, a task critical for automated conceptual exploration […]. The inverse mapping from the continuous space to the discrete canonical basis allows for algebraic rule manipulation and automated discovery of rules by analogy.
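A minimal sketch of such an analogy query over hypothetical 4-dimensional embeddings (the `nearest_rule` helper and the toy vectors are our own illustration): compute $\mathbf{v}_a - \mathbf{v}_b + \mathbf{v}_c$ and map it back to the closest stored rule.

```python
import numpy as np

def nearest_rule(v, rule_vectors):
    """Index of the stored rule embedding closest to v (Euclidean distance)."""
    return int(np.argmin(np.linalg.norm(rule_vectors - v, axis=1)))

# Toy analogy query over hypothetical 4-D rule embeddings:
R = np.array([[1, 0, 1, 0],
              [0, 1, 1, 0],
              [1, 0, 0, 1],
              [0, 1, 0, 1]], dtype=float)
query = R[0] - R[1] + R[3]          # "rule 0 is to rule 1 as ? is to rule 3"
print(nearest_rule(query, R))       # prints 2
```

In the full method, the recovered vector would still have to be validated against the closure operator before being accepted as an implication.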
Generalization to fuzzy relational systems
Building on the work of Bělohlávek on fuzzy relational systems […], Rule2Vec can be directly generalized by replacing the binary vectors with membership-degree vectors in $[0,1]^m$, resulting in a fully continuous embedding. This extension allows the embedding to handle the residuum operation central to residuated lattices.
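As a sketch, choosing the Łukasiewicz residuum as one standard instance (an assumption on our part; other residuated structures are possible), membership-grade vectors support an elementwise residuum in place of crisp implication:

```python
import numpy as np

def luk_residuum(a, b):
    """Łukasiewicz residuum a -> b = min(1, 1 - a + b), applied elementwise."""
    return np.minimum(1.0, 1.0 - a + b)

# Fuzzy premise and conclusion membership vectors in [0, 1]^m:
a = np.array([0.8, 0.3, 1.0])
b = np.array([0.5, 0.9, 1.0])
print(luk_residuum(a, b))   # approximately [0.7, 1.0, 1.0]
```

Note that on crisp $\{0,1\}$ inputs the residuum reduces to classical implication, so the binary embedding is recovered as a special case.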
Conclusion
We have formalized Rule2Vec, an algebraic and deterministic method for embedding the canonical basis of implications from FCA into a continuous vector space. The planetary case study, utilizing all 10 implications and a detailed PCA, provides empirical support: logical proximity is directly translated into geometric proximity. The PCA loadings were shown to correspond directly to specific algebraic features (density and thematic contrast) of the rules. Future work involves rigorous proof of the metric reflection theorem and the implementation of vector arithmetic for automated rule discovery.
Work plan
- Months 1-3: Formalize the Rule2Vec_Alg components (premise, gap, reticular). Prove Theorem 1 (closure preservation).
- Months 4-6: Develop the proof (or formal bounds) for the reticular metric conjecture (Theorem 2). This is the key theoretical challenge.
- Months 7-9: Implement the embedding generator as a new module in `fcaR`. Apply it to the planetary dataset and other standard UCI datasets.
- Months 10-12: Run the PCA and other geometric analyses. Write the manuscript focusing on the algebraic-to-geometric mapping.
Potential target journals
- Information Sciences (Q1): An excellent venue for a novel FCA algorithm that bridges symbolic and sub-symbolic AI.
- Knowledge-Based Systems (Q1): A great fit for the focus on knowledge representation and its integration with machine learning.
- ICFCA Conference (CORE B): The ideal place to present the core idea to the specialized FCA community and get feedback.
Minimum viable article (MVA) strategy
The split between the binary and fuzzy cases is the most natural.
- Paper 1 (The MVA - the binary algebraic embedding):
- Scope: This exact paper. Introduce Rule2Vec_Alg for binary contexts. Provide the proof for Theorem 1 and the empirical validation (PCA) for the conjecture (Theorem 2).
- Goal: To establish the first deterministic, algebraic embedding for formal implications.
- Target venue: ICFCA conference, followed by Information Sciences.
- Paper 2 (The fuzzy generalization):
- Scope: As described in the “Generalization to fuzzy relational systems” section. This paper would extend Rule2Vec_Alg to L-FCA, replacing binary vectors with membership-grade vectors. The core contribution would be proving that this fuzzy embedding preserves (or approximates) the fuzzy closure operator and residuum.
- Goal: To generalize the symbolic-geometric bridge to fuzzy logic systems.
- Target venue: Fuzzy Sets and Systems or IEEE Transactions on Fuzzy Systems.