Automating borehole data extraction for Swisstopo
Challenge: Bringing structure, scale, and standardization to geological archives
The Swiss Federal Office of Topography (swisstopo) manages borehole data that is critical to understanding Switzerland’s subsurface geology. However, much of this data, spanning more than 150 years, is stored in unstructured PDF documents, including scanned images of historical records. More than 30,000 boreholes are already registered, and new documents constantly arrive from cantonal offices, federal corporations, and archival sources. Digitizing and structuring this information by hand is therefore slow and labor-intensive.
The complexity is further compounded by:
- Diverse document formats: Documents vary widely in layout, structure, and language.
- Low standardization: Geological terms and field naming conventions vary significantly across sources, making it difficult to unify data into a consistent schema.
- Historical content: Many files are legacy scans, requiring interpretation of low-quality images or handwritten text.
- Scaling needs: The constantly increasing number of documents demands automation that is both accurate and scalable.
How we helped
To address these challenges, Visium assisted swisstopo in designing and implementing an advanced document processing pipeline, tailored specifically to the nuances of geological borehole data. The approach combines techniques in computer vision, natural language processing (NLP), and rule-based heuristics to automatically extract, interpret, and classify relevant borehole information from PDF documents.
Key components include:
- Computer vision methods, such as object and line detection, to parse visual layouts and identify structural elements within scanned documents.
- Domain-specific rules and heuristics to extract and classify data reliably, even when documents contain ambiguous or inconsistent information.
- Transformer-based large language models (LLMs) to interpret and classify geological terminology in multilingual textual descriptions.
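To make the rule-based component more concrete, here is a minimal sketch (not the project's actual code) of how a heuristic might pull a depth interval out of a layer description and map a multilingual soil term onto a unified label. The regular expression, the keyword table, and the names `Layer`, `parse_layer`, and `classify_lithology` are all hypothetical, chosen only for illustration. A real pipeline would need to handle many more layouts and would pass ambiguous terms to a language model instead of giving up.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical keyword table: raw terms in German, French, Italian and English
# mapped to a unified lithology label. Not swisstopo's actual schema.
LITHOLOGY_KEYWORDS = {
    "clay": ["ton", "argile", "argilla", "clay"],
    "silt": ["silt", "limon", "limo"],
    "sand": ["sand", "sable", "sabbia"],
    "gravel": ["kies", "gravier", "ghiaia", "gravel"],
}
# Reverse lookup: raw term -> unified label.
TERM_TO_LABEL = {
    term: label for label, terms in LITHOLOGY_KEYWORDS.items() for term in terms
}

# Depth interval such as "0.00 - 1.20 m" or "1,2-3,5m" (comma or dot decimals,
# hyphen or en dash as separator). The pattern is an assumption for this sketch.
INTERVAL_RE = re.compile(
    r"(?P<start>\d+(?:[.,]\d+)?)\s*[-\u2013]\s*(?P<end>\d+(?:[.,]\d+)?)\s*m\b"
)


@dataclass
class Layer:
    start_m: float
    end_m: float
    lithology: Optional[str]
    description: str


def classify_lithology(description: str) -> Optional[str]:
    """Return the label of the first known term in the description.

    Assumes the main constituent is named first, a simplification made
    for this sketch only.
    """
    for token in re.findall(r"\w+", description.lower()):
        if token in TERM_TO_LABEL:
            return TERM_TO_LABEL[token]
    return None


def parse_layer(line: str) -> Optional[Layer]:
    """Extract one depth interval and its classified description from a line."""
    match = INTERVAL_RE.search(line)
    if match is None:
        return None
    start = float(match.group("start").replace(",", "."))
    end = float(match.group("end").replace(",", "."))
    if end <= start:  # reject implausible intervals instead of guessing
        return None
    description = line[match.end():].strip(" :;")
    return Layer(start, end, classify_lithology(description), description)


if __name__ == "__main__":
    print(parse_layer("0.00 - 1.20 m: Kies, sandig, grau"))
    # Layer(start_m=0.0, end_m=1.2, lithology='gravel', description='Kies, sandig, grau')
```

Rejecting an inverted interval rather than silently swapping its bounds reflects the idea of routing doubtful cases to expert validation instead of storing a guess.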
Because the solution is developed as open source, its impact extends beyond Switzerland: other geological surveys and public institutions can adopt and adapt the same approach for their own data.
The impact: Modernizing geological data infrastructure with AI
The project establishes a scalable, automated pipeline for processing geological borehole documents, significantly reducing the manual workload for swisstopo. By combining machine learning with expert validation workflows, the solution is designed to deliver both efficiency and data quality.
Expected benefits include:
- Accelerated data ingestion, unlocking access to decades of previously underutilized geological information.
- Improved data consistency, with structured storage of both metadata and subsurface observations.
- Long-term maintainability and scalability, allowing the system to keep pace with the growing number of borehole entries from various public and private sources.
- Enhanced accessibility for end-users, including researchers, infrastructure planners, and governmental agencies, through a unified and searchable interface.
This initiative represents a significant step toward modernizing Switzerland’s geological data infrastructure and sets a precedent for similar efforts internationally.