Artificial Intelligence for Descriptive Geology
There is a fundamental challenge in geology: the discipline relies heavily on descriptive data, yet humans are often inconsistent and inefficient at producing it. In 2015, advancements in deep learning, such as AlexNet and the rise of computer vision, inspired us to consider applying these methods to unstructured geological data. This gave rise to a broad program of research within our group that continues to this day.
Size matters. The diagram on the right shows the results of our investigation into the impact of dataset size (Dawson et al., 2023). Spoiler alert: the best network architecture to use depends on the size of the dataset available to you. There is no one-size-fits-all solution.
Earth Science data often consist of images. This is notably the case for core photographs and thin sections. We pioneered the use of transfer learning with deep neural networks for classifying carbonate textures (Dawson et al., 2023). This research systematically evaluated convolutional neural network architectures and dataset sizes, revealing that even datasets considered large in Earth Sciences (7,000 images) can lead to overfitting, despite the use of transfer learning. This work, cited numerous times despite being relatively recent, has influenced applied AI research across disciplines.
Another common problem in geosciences is the need to quantify different types of minerals, or grains. We worked extensively on that issue. Two examples include classifying zircon (image below) in Cu-bearing batholiths, and quantifying carbonate grains.
We demonstrated that convolutional networks with transfer learning are a good solution for classifying zircon automatically (Nathwani et al., 2023). This has important implications for clean energy: zircons are often associated with rare metals, in our case Cu, which are crucial for the batteries needed in the energy transition.
In addition, object detection allows the identification and quantification of individual grains within a rock. In Dawson and John (2024; image below), we show that this is not only hundreds of times faster than a human counting individual grains, but also allows data to be gathered at much higher resolution (so that subtle trends are not missed) and avoids errors.
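To illustrate how detections become quantitative data, here is a minimal sketch in Python. It assumes a detector has already returned one (label, score) pair per predicted grain; the function name, the grain labels and the 0.5 confidence threshold are hypothetical choices for illustration, not the pipeline used in Dawson and John (2024).

```python
from collections import Counter


def grain_proportions(detections, min_score=0.5):
    """Count detected grains per class and convert the counts to proportions.

    `detections` is a list of (label, score) pairs, one per predicted box.
    Detections scoring below `min_score` are discarded before counting.
    """
    counts = Counter(label for label, score in detections if score >= min_score)
    total = sum(counts.values())
    if total == 0:
        return {}, {}
    proportions = {label: n / total for label, n in counts.items()}
    return dict(counts), proportions


# Hypothetical detector output for a single thin-section image.
example = [
    ("ooid", 0.93),
    ("ooid", 0.88),
    ("peloid", 0.71),
    ("bioclast", 0.42),  # below the threshold, so it is ignored
    ("ooid", 0.65),
]

counts, proportions = grain_proportions(example, min_score=0.5)
print(counts)
print(proportions)
```

Running the same function on every image in a core or thin-section series gives a grain-abundance profile at whatever sampling interval the images allow.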
Overcoming the Small Dataset Problem with Semi-Supervised Learning
Subsequent work in our lab focused on strategies to overcome the ‘small dataset’ problem common in geosciences. To this end, we explored the use of generative artificial intelligence for synthetic image creation, and more advanced frameworks such as semi-supervised machine learning. We recently demonstrated that the semi-supervised SimCLR framework could leverage unlabelled core data to surpass transfer learning even with only 5,000 labelled images (Mamode et al., 2025). This opens the door to applications of deep learning to common Earth Science problems where few human-labelled data exist, but a large corpus of unlabelled data is available.
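For readers curious about how SimCLR learns from unlabelled images, the sketch below shows the contrastive (NT-Xent) loss at its core, written in plain Python. Two augmented views of the same image are pulled together in embedding space while views of different images are pushed apart. The function names, the toy two-dimensional embeddings and the temperature of 0.5 are assumptions for illustration only; this is not the implementation used in Mamode et al. (2025), which would rely on a deep learning library and an image encoder.

```python
import math


def l2_normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def nt_xent_loss(z_a, z_b, temperature=0.5):
    """NT-Xent loss over a batch of paired embeddings.

    z_a[i] and z_b[i] are embeddings of two augmented views of image i.
    """
    z = [l2_normalize(v) for v in z_a + z_b]
    n = len(z_a)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the other view of the same image
        # Cosine similarity to every embedding, scaled by the temperature.
        sims = [
            sum(a * b for a, b in zip(z[i], z[k])) / temperature
            for k in range(2 * n)
        ]
        # Log-sum-exp over every other sample (k != i), computed stably.
        others = [s for k, s in enumerate(sims) if k != i]
        m = max(others)
        log_denominator = m + math.log(sum(math.exp(s - m) for s in others))
        total += log_denominator - sims[pos]
    return total / (2 * n)


# Toy embeddings: each pair in `aligned` comes from the "same image".
view_a = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.1], [0.1, 1.0]]
shuffled = [[0.1, 1.0], [1.0, 0.1]]

print(nt_xent_loss(view_a, aligned))
print(nt_xent_loss(view_a, shuffled))
```

The loss is lower when paired views agree, and it needs no labels. In a typical SimCLR workflow, the labelled images are only used afterwards, to train a classifier on top of the pretrained encoder.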
Semi-supervised learning is a more advanced deep learning approach that allows one to use images that are unlabelled, i.e. that were not previously classified by a human. We demonstrated in our paper “Do more with less: Exploring semi-supervised learning for geological image classification” (Mamode et al., 2025) that this approach was able to overcome the