fca-core: A high-performance C++ backend for multi-language formal concept analysis

FCA

Software

C++

Python

Performance

Publication idea

This paper presents fca-core, a new high-performance C++ library for FCA algorithms. It is designed with a C API to provide lightweight wrappers for languages like R (powering the new fcaR) and Python. Benchmarks show this backend provides speedups of one to two orders of magnitude, making large-scale FCA more accessible.

Author

Domingo López Rodríguez, Dominik Dürrschnabel, Johannes Hirth, Tobias Hille, and the GIMAC group

Published

2 December 2025

Keywords

fca-core, fcaR, pyfca, High-performance computing, C++ backend, Software architecture, JOSS, R journal

The practical application of Formal Concept Analysis (FCA) is often hampered by the performance limitations of implementations in high-level scripting languages. To address this, we present `fca-core`, a new, open-source, high-performance C++ library for fundamental FCA algorithms. The library is designed for efficiency, scalability, and extensibility, providing optimized implementations for concept lattice construction (e.g., InClose, FastCbO), implication basis computation, and more. To ensure broad accessibility, `fca-core` is designed with a clean C API, enabling the creation of lightweight wrappers for various high-level languages. We describe the architecture of the C++ backend and present the existing wrappers for R (as the new backend for the popular `fcaR` package) and Python. Benchmarks demonstrate that this new architecture provides speedups of one to two orders of magnitude compared to previous pure-R implementations, making large-scale FCA accessible to a wider community of data scientists and researchers.

Introduction

As datasets grow, the need for efficient computational tools becomes paramount. While languages like R and Python offer excellent ecosystems for data science, their interpreted nature can be a bottleneck for computationally intensive algorithms like those in FCA. The `fcaR` package, while successful, faces performance challenges on large contexts.

To solve this problem, we have initiated a collaborative project to re-engineer the computational heart of FCA tooling. This paper introduces `fca-core`, the result of this effort. It is not just another FCA library, but a foundational backend designed to power a new generation of multi-language FCA tools.

Our contributions are:

- The design and implementation of `fca-core`, a header-only C++ library with highly optimized FCA algorithms.
- A stable C Application Binary Interface (ABI) that allows for easy integration with any programming language that can call C functions.
- The presentation of two wrapper libraries: an updated `fcaR` that uses `fca-core` as its backend, and a new Python package, `pyfca`.
- A comprehensive performance benchmark showing significant speedups over existing tools.

The entire ecosystem is available under a permissive open-source license at fcarepository.org.

System architecture

The architecture is designed in two layers for maximum performance and flexibility.

Layer 1: The C++ core (`fca-core`)

The core is a modern C++ (C++17) header-only library. This choice allows for compile-time optimizations and easy integration into other C++ projects. Key features include:

* Efficient data structures: Custom bitset representations for formal contexts and concepts to maximize speed and minimize memory usage.
* Optimized algorithms: State-of-the-art implementations of InClose, FastCbO, LinCbO, and other key algorithms, with a focus on cache efficiency and performance.
* Templated design: The library is templated to allow for easy extension to different data types, forming the basis for future work on L-FCA.
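To make the bitset idea concrete, the following is a minimal sketch of how a formal context and its two derivation operators could be stored and computed with packed 64-bit words. It is not the `fca-core` implementation: every name here (`Bitset`, `Context`, `relate`, `extent`, `intent`, `closure`) is hypothetical and chosen only for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration only; these names are NOT the fca-core API.
// A set over n elements is packed into 64-bit words, one bit per element.
using Bitset = std::vector<std::uint64_t>;

inline Bitset make_bitset(std::size_t n_bits, bool fill = false) {
    Bitset b((n_bits + 63) / 64, fill ? ~std::uint64_t{0} : std::uint64_t{0});
    // Clear the unused high bits of the last word so set comparisons stay exact.
    if (fill && n_bits % 64 != 0)
        b.back() &= (std::uint64_t{1} << (n_bits % 64)) - 1;
    return b;
}

inline void set_bit(Bitset& b, std::size_t i) {
    b[i / 64] |= std::uint64_t{1} << (i % 64);
}

inline bool test_bit(const Bitset& b, std::size_t i) {
    return ((b[i / 64] >> (i % 64)) & 1u) != 0;
}

// True when every element of a is also in b (a is a subset of b).
// Both bitsets must have the same number of words.
inline bool is_subset(const Bitset& a, const Bitset& b) {
    for (std::size_t w = 0; w < a.size(); ++w)
        if (a[w] & ~b[w]) return false;
    return true;
}

struct Context {
    std::size_t n_objects;
    std::size_t n_attributes;
    std::vector<Bitset> rows;  // rows[g] holds the attributes of object g

    Context(std::size_t g, std::size_t m)
        : n_objects(g), n_attributes(m), rows(g, make_bitset(m)) {}

    // Record that object g has attribute m.
    void relate(std::size_t g, std::size_t m) { set_bit(rows[g], m); }

    // Extent: the objects that have every attribute in `attrs`.
    Bitset extent(const Bitset& attrs) const {
        Bitset out = make_bitset(n_objects);
        for (std::size_t g = 0; g < n_objects; ++g)
            if (is_subset(attrs, rows[g])) set_bit(out, g);
        return out;
    }

    // Intent: the attributes shared by every object in `objs`.
    Bitset intent(const Bitset& objs) const {
        Bitset out = make_bitset(n_attributes, true);
        for (std::size_t g = 0; g < n_objects; ++g)
            if (test_bit(objs, g))
                for (std::size_t w = 0; w < out.size(); ++w) out[w] &= rows[g][w];
        return out;
    }

    // Closure of an attribute set: intent of its extent.
    Bitset closure(const Bitset& attrs) const { return intent(extent(attrs)); }
};
```

With this layout, subset tests and intersections touch 64 attributes per machine instruction, which is the kind of saving the bitset representation is meant to deliver. A production library would likely add further tuning (for example, column-wise storage or SIMD), which this sketch does not attempt.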

Layer 2: The C API and wrappers

To be universally accessible, the C++ functionality is exposed through a stable C API. This API handles memory management and provides simple function calls for high-level languages.

* R wrapper (`fcaR` v2.0): The existing `fcaR` package has been refactored. The R code now primarily handles data preparation and visualization, while all heavy computation is delegated to `fca-core` via R’s `.C` interface.
* Python wrapper (`pyfca`): A new Python package has been created using tools like Cython or pybind11. It exposes the core functionality to the Python ecosystem, allowing seamless integration with libraries like NumPy, pandas, and scikit-learn.
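The usual way to put a C API in front of a C++ core is the opaque-handle pattern, sketched below. The function names (`fca_context_create`, `fca_context_set`, `fca_context_closure`, `fca_context_free`) and the byte-per-attribute calling convention are assumptions made for this example; the actual `fca-core` API may look quite different.

```cpp
#include <cstddef>
#include <vector>

// Internal C++ type, hidden from C callers. Hypothetical; not fca-core code.
namespace sketch {
struct Context {
    std::size_t n_objects;
    std::size_t n_attributes;
    std::vector<unsigned char> incidence;  // row-major, 1 = object has attribute
    Context(std::size_t g, std::size_t m)
        : n_objects(g), n_attributes(m), incidence(g * m, 0) {}
};
}  // namespace sketch

extern "C" {

// Opaque handle: callers only ever hold a pointer to this incomplete type.
// (In a real C header, std::size_t would be spelled size_t.)
typedef struct fca_context fca_context;

// Returns NULL on failure. Exceptions must never cross the C boundary.
fca_context* fca_context_create(std::size_t n_objects, std::size_t n_attributes) {
    try {
        return reinterpret_cast<fca_context*>(
            new sketch::Context(n_objects, n_attributes));
    } catch (...) {
        return nullptr;
    }
}

// Marks object g as having attribute m. Returns 0 on success, -1 on bad input.
int fca_context_set(fca_context* h, std::size_t g, std::size_t m) {
    auto* ctx = reinterpret_cast<sketch::Context*>(h);
    if (!ctx || g >= ctx->n_objects || m >= ctx->n_attributes) return -1;
    ctx->incidence[g * ctx->n_attributes + m] = 1;
    return 0;
}

// Writes the closure of attrs_in into attrs_out. Both buffers hold
// n_attributes bytes (0 or 1), are owned by the caller, and must not overlap.
int fca_context_closure(const fca_context* h, const unsigned char* attrs_in,
                        unsigned char* attrs_out) {
    auto* ctx = reinterpret_cast<const sketch::Context*>(h);
    if (!ctx || !attrs_in || !attrs_out) return -1;
    const std::size_t n_attr = ctx->n_attributes;
    for (std::size_t m = 0; m < n_attr; ++m) attrs_out[m] = 1;
    for (std::size_t g = 0; g < ctx->n_objects; ++g) {
        const unsigned char* row = ctx->incidence.data() + g * n_attr;
        bool has_all = true;
        for (std::size_t m = 0; m < n_attr; ++m)
            if (attrs_in[m] && !row[m]) { has_all = false; break; }
        if (!has_all) continue;
        // Intersect the running result with this object's attributes.
        for (std::size_t m = 0; m < n_attr; ++m) attrs_out[m] &= row[m];
    }
    return 0;
}

// Releases a handle obtained from fca_context_create. NULL is accepted.
void fca_context_free(fca_context* h) {
    delete reinterpret_cast<sketch::Context*>(h);
}

}  // extern "C"
```

Only plain pointers, integers, and caller-owned buffers cross the boundary, so a wrapper in R or Python needs no knowledge of C++ types. Errors are reported through return codes because a C caller cannot catch C++ exceptions.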

Performance evaluation

We benchmarked the new `fcaR` (with the C++ backend) against the previous pure-R version. Experiments covered lattice construction on contexts of increasing size and density. The results show speedups ranging from 10x to over 100x, with the largest gains on larger, more complex contexts. This performance leap extends FCA from small-to-medium datasets to considerably larger ones.
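A benchmark of this shape can be sketched as follows: generate random contexts at a chosen size and density, then time lattice construction on each. This is not the benchmark code behind the figures above. The helper names (`random_context`, `count_concepts`, `time_ms`) are hypothetical, the workload is a plain textbook Close-by-One enumeration rather than InClose or FastCbO, and contexts are capped at 64 attributes so that each object fits in one machine word.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical benchmark helpers; not part of fca-core or fcaR.
// Each object's attribute set is one 64-bit word (at most 64 attributes).
using Row = std::uint64_t;

// Random context: each (object, attribute) pair is related with
// probability `density`, which must lie in [0, 1].
std::vector<Row> random_context(std::size_t n_objects, unsigned n_attributes,
                                double density, std::uint64_t seed) {
    std::mt19937_64 rng(seed);
    std::bernoulli_distribution related(density);
    std::vector<Row> rows(n_objects, 0);
    for (auto& row : rows)
        for (unsigned m = 0; m < n_attributes; ++m)
            if (related(rng)) row |= Row{1} << m;
    return rows;
}

// Closure of an attribute set: attributes shared by all objects containing it.
Row closure(const std::vector<Row>& rows, unsigned n_attributes, Row attrs) {
    Row out = n_attributes >= 64 ? ~Row{0} : (Row{1} << n_attributes) - 1;
    for (Row row : rows)
        if ((attrs & ~row) == 0) out &= row;
    return out;
}

// Textbook Close-by-One: counts the concepts reachable from `intent` by
// adding attributes with index >= first.
std::size_t count_from(const std::vector<Row>& rows, unsigned n_attributes,
                       Row intent, unsigned first) {
    std::size_t count = 1;  // `intent` itself is a concept intent
    for (unsigned j = first; j < n_attributes; ++j) {
        const Row bit = Row{1} << j;
        if (intent & bit) continue;
        const Row next = closure(rows, n_attributes, intent | bit);
        // Canonicity test: the closure may not add any attribute below j.
        if (((next ^ intent) & (bit - 1)) == 0)
            count += count_from(rows, n_attributes, next, j + 1);
    }
    return count;
}

std::size_t count_concepts(const std::vector<Row>& rows, unsigned n_attributes) {
    return count_from(rows, n_attributes, closure(rows, n_attributes, 0), 0);
}

// Wall-clock time of one call to `work`, in milliseconds.
template <class F>
double time_ms(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

A driver would loop over sizes and densities, calling `time_ms` around `count_concepts(random_context(...), n_attributes)` for each configuration and repeating each run several times to smooth out timer noise. Fixing the seed keeps the contexts identical across the implementations being compared.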

Work plan (for the project)

- Phase 1 (completed): Design of C++ core, implementation of key binary FCA algorithms. Creation of R and Python wrappers.
- Phase 2 (ongoing): Extension of the C++ core to support L-FCA natively. This involves templating the algorithms to work with different lattice structures.
- Phase 3 (future): Integration of algorithms for mixed contexts, bonds, and other advanced FCA structures.

This paper focuses on describing the completed Phase 1 and its impact.

Potential target journals

- The R Journal (Q2): An ideal venue, as it serves as a follow-up to the original `fcaR` paper and announces a major upgrade to the R community.
- Journal of Open Source Software (JOSS): A highly respected journal for publishing papers that describe significant and reusable scientific software.
- SoftwareX (Q2): Similar to JOSS, it focuses on the publication of impactful scientific software.

Minimum viable article (MVA) strategy

A software paper is already an MVA. The goal is to publish the system and its impact.

Paper 1 (The MVA - system description):
- Scope: The paper as described above. It introduces the C++ core, the wrapper architecture, and presents compelling benchmarks demonstrating the performance gains. It serves as the official academic citation for the new software ecosystem.
- Goal: To announce the new high-performance FCA toolkit to the world and make it citable.
- Target venue: The R Journal is the top choice because it builds on the existing `fcaR` brand. JOSS is an excellent second choice for its focus on software quality and impact.