Rule embeddings via lattice geometry: An algebraic approach to continuous representation of formal implications
This research introduces Rule2Vec_Alg, a novel, deterministic method to embed FCA’s logical implications into a continuous vector space. The algebraic approach preserves the Galois closure and lattice structure by design, bridging symbolic AI with geometric deep learning.
Rule2Vec, Embeddings, Formal implications, Canonical basis, Algebraic embedding, Galois operator, Lattice geometry, PCA
The success of modern machine learning is fundamentally tied to continuous vector representations (embeddings). However, structured logical knowledge, such as the implication rules derived from Formal Concept Analysis (FCA), remains confined to discrete, symbolic space. This paper proposes Rule2Vec, a novel, deterministic, and purely algebraic embedding that maps the canonical basis of implications to a continuous vector space $\mathbb{R}^d$. Our approach is based on a structural morphism that explicitly encodes the Galois closure operator and the reticular position of the pseudo-intent, thereby preserving the fundamental logical and lattice-theoretic properties in the vector space. We formally demonstrate how the vector sum preserves the closure operation, and through a comprehensive case study on a planetary dataset, we use Principal Component Analysis (PCA) to show how Euclidean distance reflects logical separation and how major variance axes encode specific algebraic features of the rule set. This work bridges symbolic knowledge discovery (FCA) and geometric learning (Deep Learning), paving the way for vector arithmetic-based Conceptual Exploration.
Introduction
Formal Concept Analysis (FCA) provides a powerful algebraic framework for data mining based on lattice theory […]. The core output of FCA is the concept lattice and the canonical basis of implications (Stem Base) […], which concisely captures all dependencies within a dataset. While powerful, this output resides in a discrete, symbolic domain, hindering its integration with modern continuous representation models.
Our goal is to define a function that moves the logical structure into a geometric space. Unlike conventional embeddings (e.g., Word2Vec, Node2Vec) that rely on sampling and stochastic optimization, our Rule2Vec is a direct algebraic construction designed to satisfy key lattice-theoretic constraints a priori.
The algebraic Rule2Vec formalism (Rule2Vec)
Rule2Vec is a deterministic, non-learned map based on the indicator representation of the attribute space $\{0,1\}^m$. Given $m$ (the number of attributes) and $k$ (the number of meet-irreducible intents), the total dimension is $d = 2m + k$.
Component I: Premise vector ($\mathbf{v}_P$)
This component directly encodes the pseudo-intent $P$ as a standard indicator vector in $\{0,1\}^m$: $(\mathbf{v}_P)_i = 1$ if attribute $a_i \in P$, and $0$ otherwise.
Component II: Gap vector ($\mathbf{v}_G$)
This component encodes the logical residue $P'' \setminus P$, which is the non-redundant part of the closure; it is the algebraic tension of the implication: $(\mathbf{v}_G)_i = 1$ if attribute $a_i \in P'' \setminus P$, and $0$ otherwise.
Component III: Reticular position vector ($\mathbf{v}_R$)
This component anchors the rule to the global geometry of the concept lattice using the meet-irreducible intents $M_1, \dots, M_k$. It measures the proximity of the premise $P$ to each generator via the Galois-based Jaccard similarity: $(\mathbf{v}_R)_j = \dfrac{|P \cap M_j|}{|P \cup M_j|}$.
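As an illustrative sketch, the three components can be assembled directly from a binary formal context. The function names, the toy context, and the meet-irreducible intents below are our own illustration, not the paper's reference implementation (which targets `fcaR`):

```python
import numpy as np

def closure(P, context):
    """Galois closure P'' of an attribute set P in a binary context (objects x attributes)."""
    if P:
        extent = np.all(context[:, sorted(P)] == 1, axis=1)  # objects having all of P
    else:
        extent = np.ones(context.shape[0], dtype=bool)
    if not extent.any():                         # empty extent: closure is all attributes
        return set(range(context.shape[1]))
    return set(np.where(context[extent].all(axis=0))[0])

def rule2vec(P, context, M_intents):
    """Concatenate premise, gap, and reticular components (dimension 2m + k)."""
    m = context.shape[1]
    P_closed = closure(P, context)
    v_p = np.array([float(i in P) for i in range(m)])              # Component I
    v_g = np.array([float(i in P_closed - P) for i in range(m)])   # Component II
    # Component III: Jaccard similarity of the premise with each meet-irreducible intent
    v_r = np.array([len(P & Mj) / max(len(P | Mj), 1) for Mj in M_intents])
    return np.concatenate([v_p, v_g, v_r])
```

For the planetary context of the case study, $m = 7$ and $k = 7$ would give vectors in $\mathbb{R}^{21}$, matching Table 1.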
Theoretical properties
Theorem 1: Preservation of the closure operator
The geometric sum of the premise and gap vectors exactly reconstructs the vector of the closed intent in the continuous space, proving a homomorphic property for the closure operation.
Theorem 1 (Closure preservation). For any canonical implication $P \to P''$, the vector sum of its premise and gap components yields the vector representation of the closed intent $P''$:

$$\mathbf{v}_P + \mathbf{v}_G = \mathbf{v}_{P''}$$
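The theorem holds because the premise $P$ and the gap $P'' \setminus P$ are disjoint, so their indicator vectors sum to the indicator of $P''$. A minimal numeric check, using the attribute order of Table 1 (small, near, far, moon, no_moon, large, medium) and the premise/gap pattern of rule R1:

```python
import numpy as np

m = 7
P = {4}                  # premise: {no_moon} (index 4), as in rule R1 of Table 1
P_closed = {0, 1, 4}     # its closure: {small, near, no_moon}
ind = lambda S: np.array([float(i in S) for i in range(m)])

# P and the gap P'' \ P are disjoint, so their indicator vectors sum
# exactly to the indicator vector of the closed intent:
assert np.array_equal(ind(P) + ind(P_closed - P), ind(P_closed))
```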
Theorem 2: Reflection of the reticular metric (conjecture)
The Euclidean distance between two Rule2Vec vectors reflects the logical separation of the rules’ underlying intents within the concept lattice. The reticular component is key to ensuring that rules are separated not only by their attributes but also by their position relative to the structural generators of the lattice.
Theorem 2 (Reticular metric reflection, conjecture). The Euclidean distance between the embeddings of two rules is strongly correlated with the logical distance between their closed intents $P_1''$ and $P_2''$, effectively mapping the partial order structure onto a geometric metric.
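One concrete candidate for this logical distance (our assumption; the text does not fix one) is the cardinality of the symmetric difference of the closed intents, which on the binary blocks of the embedding coincides exactly with the squared Euclidean distance of the indicator vectors:

```python
import numpy as np

ind = lambda S, m: np.array([float(i in S) for i in range(m)])

A, B = {0, 1, 3}, {1, 2}                 # two hypothetical closed intents
d2 = np.sum((ind(A, 5) - ind(B, 5)) ** 2)
assert d2 == len(A ^ B)                  # squared distance = |A symmetric-difference B| = 3
```

The conjecture is stronger than this identity: it also involves the real-valued reticular block, which is why it remains a conjecture rather than a theorem.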
Case study: Analysis of a planetary implication base
We analyze a complete set of 10 implications derived from a formal context with $m = 7$ attributes and $k = 7$ meet-irreducible intents. The total vector dimension is $d = 2m + k = 21$.
The canonical basis and vector construction
The full set of canonical implications is:

1. (…)
2. (…)
3. (…)
4. (…)
5. (…)
6. (…)
7. (…)
8. (…)
9. (…)
10. (…)
Table 1 details the full 21-dimensional vector components for three illustrative rules.
| Component | R1 | R4 | R6 |
|---|---|---|---|
| Premise components ($\mathbf{v}_P$) | | | |
| small | 0.00 | 0.00 | 0.00 |
| near | 0.00 | 0.00 | 0.00 |
| far | 0.00 | 0.00 | 1.00 |
| moon | 0.00 | 0.00 | 1.00 |
| no_moon | 1.00 | 0.00 | 0.00 |
| large | 0.00 | 1.00 | 1.00 |
| medium | 0.00 | 0.00 | 1.00 |
| Gap components ($\mathbf{v}_G$) | | | |
| small | 1.00 | 0.00 | 1.00 |
| near | 1.00 | 0.00 | 1.00 |
| far | 0.00 | 1.00 | 0.00 |
| moon | 0.00 | 1.00 | 0.00 |
| no_moon | 0.00 | 0.00 | 1.00 |
| large | 0.00 | 0.00 | 0.00 |
| medium | 0.00 | 0.00 | 0.00 |
| Reticular components ($\mathbf{v}_R$) | | | |
| M1 | 0.00 | 0.00 | 0.25 |
| M2 | 0.00 | 0.00 | 0.50 |
| M3 | 0.00 | 0.33 | 0.75 |
| M4 | 0.00 | 0.00 | 0.75 |
| M5 | 0.00 | 0.00 | 0.00 |
| M6 | 0.00 | 0.00 | 0.00 |
| M7 | 0.33 | 0.00 | 0.00 |
Table 1: Full Rule2Vec vector components for selected rules ($d = 21$)
Analysis of principal components (PCA)
Principal Component Analysis (PCA) was applied to the 10 vectors in $\mathbb{R}^{21}$ to identify the primary axes of variance, summarizing the algebraic features of the rule set.
The first two components capture a significant portion of the total variance:
* PC1: 36.99% of the variance.
* PC2: 24.53% of the variance.
* Total: 61.52% captured by the 2D projection.
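As a sketch of the procedure (with random stand-in data rather than the actual 10 Rule2Vec vectors), the variance ratios, the 2D scores of Table 3, and the loadings of Table 2 can all be obtained from an SVD of the centered matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((10, 21))            # stand-in for the 10 Rule2Vec vectors in R^21
Xc = X - X.mean(axis=0)             # center each dimension
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

explained = s**2 / np.sum(s**2)     # variance ratio per principal component
coords = Xc @ Vt[:2].T              # 2-D scores per rule (the analogue of Table 3)
loadings = Vt[:2].T                 # per-dimension loadings (the analogue of Table 2)
```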
Table 2 details the correlation (loadings) of the original 21 dimensions with the two principal components.
| Component | PC1 | PC2 |
|---|---|---|
| Premise loadings (P) | ||
| P: small | 0.361 | 0.247 |
| P: near | 0.059 | 0.414 |
| P: far | 0.339 | -0.232 |
| P: moon | 0.423 | 0.059 |
| P: no_moon | -0.047 | 0.222 |
| P: large | 0.067 | -0.267 |
| P: medium | 0.067 | -0.267 |
| Gap loadings (G) | ||
| G: small | -0.088 | -0.151 |
| G: near | 0.214 | -0.317 |
| G: far | -0.189 | 0.194 |
| G: moon | -0.273 | -0.097 |
| G: no_moon | 0.398 | -0.180 |
| G: large | 0.249 | 0.303 |
| G: medium | 0.249 | 0.303 |
| Reticular loadings (R) | ||
| R: M1 | 0.106 | 0.015 |
| R: M2 | 0.175 | -0.068 |
| R: M3 | 0.169 | -0.142 |
| R: M4 | 0.169 | -0.142 |
| R: M5 | 0.090 | 0.062 |
| R: M6 | 0.074 | 0.185 |
| R: M7 | 0.051 | 0.221 |
Table 2: PCA loadings and algebraic interpretation of components
| Rule | PC1 | PC2 |
|---|---|---|
| R1 | -0.908 | -0.143 |
| R2 | -0.737 | -0.428 |
| R3 | -0.979 | 0.458 |
| R4 | -1.342 | -0.188 |
| R5 | -1.342 | -0.188 |
| R6 | 0.784 | -1.570 |
| R7 | 0.317 | 1.987 |
| R8 | 1.403 | 0.995 |
| R9 | 1.401 | -0.462 |
| R10 | 1.401 | -0.462 |
Table 3: Rule2Vec coordinates in the 2D principal component space
Algebraic interpretation of principal components
PC1: The axis of logical density
PC1 is dominated by positive loadings for high-density components and negative loadings for low-density ones.
* Positive PC1: highly correlated with the premise loadings of ‘moon’ and ‘far’ and the gap loadings of ‘no_moon’ and ‘large/medium’ (Table 2). This axis captures the implication strength. The rules on the far positive side (R8, R9, R10, R6) are the dense rules, defined by complex premises and a closure on the trivial concept (the full attribute set).
* Negative PC1: correlated with the loadings of simple, specific attributes. Rules R4 and R5 are maximally negative, reflecting rules based on minimal pseudo-intents.
PC1 separates rules based on the size of their closed intent (logical strength).
PC2: The axis of thematic contrast
PC2 is defined by the thematic opposition between ‘near/small/no_moon’ and ‘far/large/medium’.
* Positive PC2: highly correlated with the premise loading of ‘near’ and the gap loadings of ‘large/medium’. Rules R7 and R8 are maximally positive, representing implications linked to the small/near sub-lattice.
* Negative PC2: highly correlated with the premise loadings of ‘large/medium’ and the gap loading of ‘near’. Rules R6, R9, and R10 are strongly negative, representing rules linked to the large/far sub-lattice.
PC2 separates rules based on the thematic cluster of attributes they concern, effectively reflecting the decomposition of the concept lattice.
Implications of PCA projection
The PCA confirms that Rule2Vec successfully translates the structure of the canonical basis into a metric space:
* Clustering: Rules R4 and R5 (and likewise R9 and R10) are numerically identical in the 2D space (their PC coordinates coincide in Table 3), validating that the embedding captures their equivalence as structural ‘siblings’ in the lattice.
* Rule discovery: The inverse mapping of PC components allows for the geometric discovery of prototype rules, where the PC vectors encode the average structure of the dense and sparse rule sets, respectively.
Applications and extensions
Vector arithmetic for conceptual exploration
The most compelling application is using vector arithmetic to suggest new valid implications, a task critical for automated conceptual exploration […]. The inverse mapping from the continuous space to the discrete canonical basis allows for algebraic rule manipulation and automated discovery of rules by analogy.
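A minimal sketch of such an analogy query over hypothetical 4-dimensional embeddings (the `nearest_rule` helper and the toy vectors are our own illustration): compute $\mathbf{v}_a - \mathbf{v}_b + \mathbf{v}_c$ and map it back to the closest stored rule.

```python
import numpy as np

def nearest_rule(v, rule_vectors):
    """Index of the stored rule embedding closest to v (Euclidean distance)."""
    return int(np.argmin(np.linalg.norm(rule_vectors - v, axis=1)))

# Toy analogy query over hypothetical 4-D rule embeddings:
R = np.array([[1, 0, 1, 0],
              [0, 1, 1, 0],
              [1, 0, 0, 1],
              [0, 1, 0, 1]], dtype=float)
query = R[0] - R[1] + R[3]          # "rule 0 is to rule 1 as ? is to rule 3"
print(nearest_rule(query, R))       # prints 2
```

In the full method, the recovered vector would still have to be validated against the closure operator before being accepted as an implication.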
Generalization to fuzzy relational systems
Building on the work of Bělohlávek on fuzzy relational systems […], Rule2Vec can be directly generalized by replacing the binary vectors with membership-degree vectors in $[0,1]^m$, resulting in a fully continuous embedding. This extension allows the embedding to handle the residuum operation central to residuated lattices.
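As a sketch, choosing the Łukasiewicz residuum as one standard instance (an assumption on our part; other residuated structures are possible), membership-grade vectors support an elementwise residuum in place of crisp implication:

```python
import numpy as np

def luk_residuum(a, b):
    """Łukasiewicz residuum a -> b = min(1, 1 - a + b), applied elementwise."""
    return np.minimum(1.0, 1.0 - a + b)

# Fuzzy premise and conclusion membership vectors in [0, 1]^m:
a = np.array([0.8, 0.3, 1.0])
b = np.array([0.5, 0.9, 1.0])
print(luk_residuum(a, b))   # approximately [0.7, 1.0, 1.0]
```

Note that on crisp $\{0,1\}$ inputs the residuum reduces to classical implication, so the binary embedding is recovered as a special case.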
Conclusion
We have formalized Rule2Vec, an algebraic and deterministic method for embedding the canonical basis of implications from FCA into a continuous vector space. The planetary case study, utilizing all 10 implications and a detailed PCA, provides empirical support: logical proximity is directly translated into geometric proximity. The PCA loadings were shown to correspond directly to specific algebraic features (density and thematic contrast) of the rules. Future work involves rigorous proof of the metric reflection theorem and the implementation of vector arithmetic for automated rule discovery.
Work plan
- Months 1-3: Formalize the Rule2Vec_Alg components (premise, gap, reticular). Prove Theorem 1 (closure preservation).
- Months 4-6: Develop the proof (or formal bounds) for the reticular metric conjecture (Theorem 2). This is the key theoretical challenge.
- Months 7-9: Implement the embedding generator as a new module in `fcaR`. Apply it to the planetary dataset and other standard UCI datasets.
- Months 10-12: Run the PCA and other geometric analyses. Write the manuscript focusing on the algebraic-to-geometric mapping.
Potential target journals
- Information Sciences (Q1): An excellent venue for a novel FCA algorithm that bridges symbolic and sub-symbolic AI.
- Knowledge-Based Systems (Q1): A great fit for the focus on knowledge representation and its integration with machine learning.
- ICFCA Conference (CORE B): The ideal place to present the core idea to the specialized FCA community and get feedback.
Minimum viable article (MVA) strategy
The split between the binary and fuzzy cases is the most natural.
- Paper 1 (The MVA - the binary algebraic embedding):
- Scope: This exact paper. Introduce Rule2Vec_Alg for binary contexts. Provide the proof for Theorem 1 and the empirical validation (PCA) for the conjecture (Theorem 2).
- Goal: To establish the first deterministic, algebraic embedding for formal implications.
- Target venue: ICFCA conference, followed by Information Sciences.
- Paper 2 (The fuzzy generalization):
- Scope: As described in the “Generalization to fuzzy relational systems” section. This paper would extend Rule2Vec_Alg to L-FCA, replacing binary vectors with membership-grade vectors. The core contribution would be proving that this fuzzy embedding preserves (or approximates) the fuzzy closure operator and residuum.
- Goal: To generalize the symbolic-geometric bridge to fuzzy logic systems.
- Target venue: Fuzzy Sets and Systems or IEEE Transactions on Fuzzy Systems.