Spatio-B-RAG - Methods for Knowledge Retrieval from Spatially Annotated Heterogeneous Data for Predictive Damage Assessment in Bridge Structures
The transition from reactive to prescriptive maintenance strategies for preserving and extending the life cycle of bridge structures requires not only modern sensing and acquisition technologies but also structured processing of, and targeted access to, the as-built data and inspection history of the assets. The vast amount of information collected over decades harbors immense latent knowledge about bridge conditions (e.g., “water damage at the left lower corner of the front abutment”), degradation processes, and damage patterns. Leveraging this knowledge enables the derivation of condition indicators and the development of predictive models (see Fig. 1). A retrospective analysis of past maintenance actions further makes it possible to learn from previous interventions, assess their impact on structural longevity, and ultimately improve decision-making and recommendation processes. In addition, automated and systematic knowledge extraction from legacy data makes this data machine-searchable and helps preserve knowledge that is often bound to individuals or specific projects.

However, legacy data on bridge structures is typically available only in semi-structured or unstructured form, and only qualified experts can interpret it and integrate it into a holistic analytical view of the asset (including BIM models, measurement data, point clouds, etc.). Cross-sectional analyses of such legacy data across thousands of bridges, aimed at detecting correlations and degradation trends, cannot be carried out manually, even by domain experts. Such analyses therefore require an explicit, machine-readable representation of the contextualized data and their interrelations.
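To illustrate why an explicit representation matters, the following minimal Python sketch assumes that damage findings have already been extracted into uniform records. The `DamageRecord` fields, the `damage_frequency` helper, and the sample values are hypothetical and are not real inspection data. Once the data are explicit, a cross-sectional question such as “which damage types occur on abutments across all bridges?” reduces to a simple aggregation.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical explicit damage record; field names are illustrative assumptions,
# not the project's actual schema.
@dataclass(frozen=True)
class DamageRecord:
    bridge_id: str
    component: str    # e.g. "abutment", "superstructure"
    damage_type: str  # e.g. "water damage", "spalling"
    year: int

def damage_frequency(records, component):
    """Count damage types on one component class across all bridges."""
    return Counter(r.damage_type for r in records if r.component == component)

# Toy values for illustration only.
records = [
    DamageRecord("B1", "abutment", "water damage", 2015),
    DamageRecord("B2", "abutment", "water damage", 2018),
    DamageRecord("B2", "superstructure", "spalling", 2018),
    DamageRecord("B3", "abutment", "spalling", 2020),
]
print(damage_frequency(records, "abutment"))
```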
Over decades, inspection reports, damage documentation, and condition measurements have been produced by various SMEs, resulting in a fragmented, heterogeneous, and considerably noisy data landscape. Nationwide, inspection data are primarily documented in the SIB-Bauwerke relational database (WPM Ingenieure GmbH, 2020), which relies on predefined text blocks and implicitly localized damage descriptions (“bottom left,” “third beam from the left,” etc.). Construction and rehabilitation plans, whether hand-drawn or digital, use project-specific referencing schemes (e.g., “Axis A, view toward Aachen”). In addition, photographs documenting damage can often be located only by project participants, because their spatial references are described in non-standardized, unstructured text.
Moreover, a significant portion of asset information is contained in free-text documents such as inspection reports, maintenance descriptions, and detailed damage narratives.
For systematic processing and evaluation, the spatial relations of these heterogeneous data, which are often only implicit, must therefore be made explicit and machine-readable, enabling integrated and holistic analyses (see Fig. 2).
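As a minimal sketch of this step, the following Python function turns a relative location phrase of the kind found in SIB-Bauwerke descriptions into explicit fields. The `ExplicitLocation` structure, the keyword tables, and `make_explicit` are illustrative assumptions, not the project's schema, which relies on ontology-based annotation.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical explicit location record (illustrative assumption only).
@dataclass(frozen=True)
class ExplicitLocation:
    component: str
    horizontal: Optional[str] = None    # "left" or "right"
    vertical: Optional[str] = None      # "top" or "bottom"
    index: Optional[int] = None         # ordinal position, e.g. 3
    counted_from: Optional[str] = None  # side the ordinal is counted from

ORDINALS = {"first": 1, "second": 2, "third": 3, "fourth": 4, "fifth": 5}
VERTICAL = {"top": "top", "upper": "top", "bottom": "bottom", "lower": "bottom"}

def make_explicit(component: str, text: str) -> ExplicitLocation:
    """Turn an implicit, relative location phrase into explicit fields."""
    t = text.lower()
    # Ordinal references such as "third beam from the left".
    m = re.search(r"\b(" + "|".join(ORDINALS) + r")\b.*?\bfrom the (left|right)\b", t)
    if m:
        return ExplicitLocation(component, index=ORDINALS[m.group(1)],
                                counted_from=m.group(2))
    # Relative positions such as "bottom left" or "left lower corner".
    horizontal = next((w for w in ("left", "right") if re.search(rf"\b{w}\b", t)), None)
    vertical = next((VERTICAL[w] for w in VERTICAL if re.search(rf"\b{w}\b", t)), None)
    return ExplicitLocation(component, horizontal=horizontal, vertical=vertical)

print(make_explicit("abutment", "bottom left"))
print(make_explicit("beam", "third beam from the left"))
```

A hand-written rule table like this covers only the phrasings it anticipates, which is a known limitation of purely rule-based processing.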

In the first phase of the SPP subproject RaumLink, the research investigated how the spatial superposition of heterogeneous data can serve as a basis for automatically deriving links between diverse resources and aligning their information accordingly. Documents occupying the same (project) space are semantically related to varying degrees: they may describe the same object (component, defect) or one of its sub-aspects, or provide contextual information.
To this end, rule-based and formal-logical approaches for symbolic knowledge modeling were developed (Göbels & Beetz, 2024; Schulz & Beetz, 2024) to map the heterogeneous datasets of a bridge into an overarching three-dimensional project space (Schulz et al., 2023), compute their spatial overlaps, and thereby generate explicit interconnections between datasets (Göbels et al., 2024). A foundational spatial metadata schema was developed, together with document-type and content-specific rules for its application. The resulting data are stored and linked within spatial, container-based environments such as Common Data Environments (CDEs) (Schapke et al., 2018; Werbrouck, 2024) or digital archives (e.g., ICDD) (Senthilvel, 2024).
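The overlap computation can be sketched in Python under a strong simplification: each resource's spatial annotation is reduced to an axis-aligned bounding box in the shared project coordinate system, and two resources are linked whenever their boxes intersect. The `Box` and `Resource` classes, the URIs, and the coordinates are hypothetical; the actual RaumLink schema and rules (Göbels et al., 2024) are richer than this.

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass(frozen=True)
class Box:
    """Axis-aligned 3D extent in the shared project coordinate system (assumed)."""
    min_xyz: tuple
    max_xyz: tuple

    def overlaps(self, other: "Box") -> bool:
        # Boxes overlap iff their intervals overlap on every axis; touching counts.
        return all(a_min <= b_max and b_min <= a_max
                   for a_min, a_max, b_min, b_max
                   in zip(self.min_xyz, self.max_xyz, other.min_xyz, other.max_xyz))

@dataclass(frozen=True)
class Resource:
    uri: str       # hypothetical identifier of a document, photo, or plan
    doc_type: str
    extent: Box

def derive_links(resources):
    """Return (uri_a, uri_b) pairs whose spatial extents overlap."""
    return [(a.uri, b.uri) for a, b in combinations(resources, 2)
            if a.extent.overlaps(b.extent)]

resources = [
    Resource("report/2019#d12", "damage_record", Box((0, 0, 0), (2, 1, 3))),
    Resource("photo/IMG_0042", "photo", Box((1, 0, 2), (3, 1, 4))),
    Resource("plan/axis_B", "drawing", Box((10, 0, 0), (12, 5, 6))),
]
print(derive_links(resources))
```

The pairwise comparison is quadratic in the number of resources; for large collections, a spatial index such as an R-tree would typically be used to find candidate pairs first.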
Given the high individuality of (as-built) structural data, symbolic processing reaches its limits in scalability and in robustness to inconsistent, manually produced data. While spatial localization of the (semi-)structured SIB-Bauwerke data could be fully automated, type categorization and extraction of spatial references from unstructured data (texts, plans, and photographs) have so far required manual effort. Purely rule-based processing also risks information loss, because it captures only data that conform to predefined rules. Data outside these formal constraints are ignored, leaving gaps in the knowledge graph that hinder logical inference and reasoning.
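The information-loss problem can at least be made visible rather than silent. In this minimal Python sketch, keyword rules assign a document type, and every input that no rule matches is collected in an explicit residue list instead of being discarded. The rules, type names, and file names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical keyword rules (document type, trigger substrings), checked in order.
RULES: List[Tuple[str, Tuple[str, ...]]] = [
    ("inspection_report", ("inspection",)),
    ("drawing", ("plan", "section", "elevation")),
    ("photo", (".jpg", ".jpeg", ".png")),
]

def categorize(name: str) -> Optional[str]:
    """Return the first matching document type, or None if no rule fires."""
    lowered = name.lower()
    for doc_type, triggers in RULES:
        if any(t in lowered for t in triggers):
            return doc_type
    return None

def categorize_all(names: List[str]) -> Tuple[Dict[str, str], List[str]]:
    """Split inputs into categorized items and an explicit residue of unmatched ones."""
    categorized: Dict[str, str] = {}
    residue: List[str] = []
    for name in names:
        doc_type = categorize(name)
        if doc_type is None:
            residue.append(name)
        else:
            categorized[name] = doc_type
    return categorized, residue

files = ["Inspection_2019_main.pdf", "Abutment_A_section.dwg", "IMG_0042.JPG", "note_site_visit.docx"]
categorized, residue = categorize_all(files)
print(f"coverage: {len(categorized) / len(files):.0%}, unmatched: {residue}")
```

Reporting the residue and the resulting coverage shows where the rules fail; these unmatched inputs are candidates for the subsymbolic methods that rule-based processing cannot replace.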
The follow-up project Spatio-B-RAG focuses on developing a Retrieval-Augmented Generation (RAG) system that leverages domain-specific spatial knowledge to enable predictive damage assessment of bridge structures. By integrating heterogeneous, multimodal data, such as inspection reports, construction drawings, photographic documentation, and point clouds, within a spatial context, a dynamic knowledge system is established. Its foundation consists of the spatially annotated knowledge graphs resulting from the RaumLink project, in which semi-structured bridge inspection data from the SIB-Bauwerke database were modeled using the ReLoc Ontology, based on the Anweisung Straßeninformationsbank, Teilsystem Bauwerksdaten (ASB-ING). These graphs were developed and validated on the reference bridge in Worms and now serve as training data for domain-specific RAG systems for bridge models. The RAG system will combine symbolic and subsymbolic methods, including Named Entity Recognition (NER) with BERT, multimodal models such as VisualBERT and LXMERT, and Graph Neural Networks (GNNs), to efficiently process textual, visual, and 3D data. Integrating spatial relations and domain ontologies enables efficient information retrieval and provides the foundation for predictive modeling and maintenance decision support.
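The retrieval step of such a system can be sketched in Python: candidate text chunks are first filtered by spatial containment relative to a queried region and then ranked by similarity to the query. This is a hedged illustration only: the `Chunk` structure, the tuple-based location hierarchy, and the sample texts are hypothetical, and simple bag-of-words cosine similarity stands in for the learned embeddings (e.g., BERT) the project intends to use. Generation is not shown; `build_prompt` merely assembles the context that would be passed to a language model.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Chunk:
    text: str
    location: tuple  # hypothetical containment path, e.g. ("bridge_1", "front_abutment", "left")

def within(location: tuple, region: tuple) -> bool:
    """True if location lies inside region in the assumed containment hierarchy."""
    return location[:len(region)] == region

def _vector(text: str) -> Counter:
    # Bag-of-words term counts; a stand-in for learned embeddings.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, region: tuple, chunks: list, k: int = 2) -> list:
    """Apply the spatial filter first, then rank the remaining chunks by similarity."""
    q = _vector(query)
    candidates = [c for c in chunks if within(c.location, region)]
    candidates.sort(key=lambda c: _cosine(q, _vector(c.text)), reverse=True)
    return candidates[:k]

def build_prompt(query: str, retrieved: list) -> str:
    """Assemble retrieved chunks into a context block for a (not shown) generator."""
    context = "\n".join(f"- [{'/'.join(c.location)}] {c.text}" for c in retrieved)
    return f"Context:\n{context}\n\nQuestion: {query}"

chunks = [
    Chunk("water damage with efflorescence at the lower left corner", ("bridge_1", "front_abutment", "left")),
    Chunk("spalling with exposed reinforcement on the third beam", ("bridge_1", "superstructure", "beam_3")),
    Chunk("blocked drainage outlet and moisture on the wall", ("bridge_1", "front_abutment", "right")),
]
hits = retrieve("water damage on the abutment", ("bridge_1", "front_abutment"), chunks)
print(build_prompt("water damage on the abutment", hits))
```

Filtering spatially before ranking keeps textually similar but spatially unrelated findings (here, the beam damage) out of the generator's context.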
Publications
Göbels, A., Schulz, O., & Beetz, J. (2024). Towards a Common Digital Space: Proposing a Schema for Spatially Linking Heterogeneous Resources. Proceedings of the 41st International Conference of CIB W78, Marrakech, Morocco. https://itc.scix.net/paper/w78-2024-13
Schapke, S.-E., Beetz, J., König, M., Koch, C., & Borrmann, A. (2018). Collaborative Data Management. In A. Borrmann, M. König, C. Koch, & J. Beetz (Eds.), Building Information Modeling (pp. 251–277). Springer International Publishing. https://doi.org/10.1007/978-3-319-92862-3_14
Schulz, O., Werbrouck, J., & Beetz, J. (2023). Towards Scene Graph Descriptions for Spatial Representations in the Built Environment. Proceedings of the 30th EG-ICE International Workshop on Intelligent Computing in Engineering. EG-ICE 2023: International Conference on Intelligent Computing in Engineering, London, United Kingdom.



