## Geoscience Day

**Lessons from historical data exploration in a long-lived consolidated mineral belt**

by *Joao Gabriel Mottas*, University of Exeter

Mineral exploration uses a range of surface- or sub-surface-based techniques to acquire invaluable data on the properties (e.g., chemistry, physics, categorical groupings) of target geological bodies across varied spatial scales. These data represent a critical asset for government agencies, mineral companies, and research development. They are often acquired over prolonged time frames, with surveying carried out under different exploration philosophies, skill levels, technical limitations (hardware and software), management directions, and budgetary conditions. That landscape translates into several potential issues, such as: poor documentation of the surveying and data processing carried out at the time; integrity issues with physical and digital storage; faulty spatial continuity in the data itself; uncertainty over the spatial position of sampling stations; and limited positioning and measurement accuracy and overall data quality, all of which directly impact reliability. The accumulation of these issues might render historical data unusable or reduce the number of useful entries to critical levels. The judicious task of assembling the data sources in a homogeneous format and structure, along with backtracking storage and documentation and building viable solutions, might be time-consuming. Considering that the output of a computer intelligence system relates directly to the quality and reliability of the input data and workflow, dedicating time to inspecting historical data for issues safeguards the integrity of any future interpretation and decision-making. Potential pitfalls, mitigation strategies, and good practices leading to the upcycling of legacy/historical data are presented and discussed in the light of project management, focussing on their use in data-intensive ecosystems. Lessons from dealing with industry, research, and government geoscience data repositories and from non-mineral exploration markets are presented.
The ongoing, steady assimilation of data science and machine intelligence systems in exploration programs makes it crucial to use pre-existing data while acknowledging and mitigating their deficiencies and, furthermore, to secure the diligent preservation of data being acquired at present.

**Using Legacy Core Material to Support the Future of Data Science** by *Kirstie Wright*, North Sea Core CIC, Aberdeen, UK

Core material provides a unique dataset with which to understand the subsurface. In the UK we have been collecting offshore core since exploration drilling started in the 1960s. Once collected, the core is divided between the operating company and the British Geological Survey (BGS), who keep an archive. As traditional exploration and production across the UK Continental Shelf begins to decrease, operator-held core samples are being released. North Sea Core has been working to collect and redistribute this unwanted core material, which was originally destined to be thrown away. Often working in collaboration with external companies and consultants, we have been creating new open-access data to support the geoscience community and facilitate research. During this talk I will present an overview of existing subsurface data, including our own data, and the potential for future developments, both in how we utilise legacy material and in its use in data science.

**Advances in experimental mechanics: opportunities, challenges, and limitations**

by *Elma Charalampidou*, Heriot-Watt University

The aim of this talk is to discuss advances in experimental mechanics and provide some open questions to trigger further discussion on how experimental mechanics can affect data and/or model uncertainties.

I will first discuss some full-field experimental methods along with their sensitivity and resolution. I will then illustrate their use, commenting on resolved micro-processes taking place in Hard Soils – Soft Rocks (HSSR) and/or Rocks. Although these materials have been studied for many decades, their complex structure, a result of depositional, diagenetic and deformational processes, makes them irresistible research targets. Examples will draw on mechanical experiments (i.e., the micro-processes that occurred) and flow experiments (i.e., advancement of fluids, single- and two-phase flow in homogeneous and heterogeneous samples). I will then discuss challenges and limitations in experimental mechanics, aiming to understand how these can be of interest to the data science community.

**Statistical characteristics of flow in heterogeneous random porous media**

by *Anna Isaeva*, Faculty of Physics, Moscow State University

Geological heterogeneities affect the flow of reservoir fluids. However, different levels of heterogeneity are important for different types of fluid (e.g., dry gas, wet gas, light oil, heavy oil) and production mechanisms. This fact underlies a simple rule of thumb, ‘Flora’s Rule’, which determines the critical level of permeability contrast for a given fluid type and displacement process. For example, according to this rule, gas reservoirs are only sensitive to 3 orders of magnitude of permeability variation. This means that less detailed reservoir models with a relatively coarse grid are useful for predicting fluid flow in gas reservoirs under depletion. Since ‘Flora’s Rule’ has the status of a rule of thumb, it seems promising to put these statements on a fundamental basis.

Here we consider fluid flow in macroscopically heterogeneous porous media (the standard Darcy law controls the flow). We model the spatially heterogeneous permeability of a porous medium using a geostatistical algorithm. We demonstrate that our approach allows us to generate spatially heterogeneous petrophysical properties with a given variogram function and anisotropy. We then numerically simulate the fluid flow in the generated heterogeneous porous media and analyze the spatial properties of the simulated flow: we calculate the empirical variogram for the fluid velocity components, estimate its anisotropy, etc. Since we control the spatial heterogeneity of the porous medium properties, our approach enables us to link the spatial heterogeneity of permeability to the resulting heterogeneity of flow. In this way we test ‘Flora’s Rule’ for a series of simulations.
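As a rough illustration of the empirical variogram mentioned above, the following minimal sketch computes the semivariance of a synthetic 1-D field on a regular grid. The field generator (a moving average of white noise) and the function names `moving_average_field` and `empirical_variogram` are assumptions made for this example only; they are not the geostatistical algorithm used in the talk.

```python
import random

def moving_average_field(n, window, seed=0):
    """Hypothetical correlated field: moving average of white noise.

    Scaling by sqrt(window) gives unit variance and a correlation range
    of `window` grid cells.
    """
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n + window - 1)]
    return [sum(noise[i:i + window]) / window ** 0.5 for i in range(n)]

def empirical_variogram(values, max_lag):
    """Semivariance gamma(h) = 0.5 * mean((z[i+h] - z[i])^2) for h = 1..max_lag."""
    gammas = []
    for h in range(1, max_lag + 1):
        sq_diffs = [(values[i + h] - values[i]) ** 2 for i in range(len(values) - h)]
        gammas.append(0.5 * sum(sq_diffs) / len(sq_diffs))
    return gammas

field = moving_average_field(n=5000, window=10)
gamma = empirical_variogram(field, max_lag=20)  # gamma[h - 1] is the value at lag h
```

For this synthetic field the semivariance should rise roughly linearly up to the window length and then level off near the field variance (the sill). Applying the same estimator to a simulated velocity component would give the kind of flow variogram the abstract refers to.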

**Some applications of global sensitivity analysis in geosciences**

by *Dmitriy Kolyukhin*, Trofimuk Institute of Petroleum Geology and Geophysics, Novosibirsk, SB RAS

The lack of measurement data and the complexity of the studied objects and phenomena lead to the need to use statistical models. Global sensitivity analysis (GSA) makes it possible to estimate how much the variability of each random parameter contributes to the total uncertainty of the model. GSA is usually used to study complex nonlinear models and may be used to distinguish essential from non-essential parameters, to reduce the dimension of the problem where possible, and to improve understanding of model behavior.

This presentation provides an overview and comparison of the main existing GSA methods. Estimation of Sobol indices (SI) for some geoscience problems is considered. In particular, we illustrate this technique for the analysis of connectivity characteristics of a discrete fracture network (DFN). In conclusion, a recently developed method for estimating SI for hierarchical statistical models using the Monte Carlo method with double randomization is described.
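To make the idea of Sobol indices concrete, here is a minimal sketch of a plain Monte Carlo estimate of first-order indices using the standard two-matrix (A, B) sampling scheme. The toy model `toy_model` and the function name `first_order_sobol` are assumptions for illustration. This is not the double-randomization method described in the talk.

```python
import random

def first_order_sobol(model, dim, n, seed=0):
    """Estimate first-order Sobol indices for inputs uniform on [0, 1].

    Uses V_i ~ mean(f(B) * (f(A_B^i) - f(A))), where A_B^i is matrix A
    with column i taken from matrix B.
    """
    rng = random.Random(seed)
    A = [[rng.random() for _ in range(dim)] for _ in range(n)]
    B = [[rng.random() for _ in range(dim)] for _ in range(n)]
    fA = [model(x) for x in A]
    fB = [model(x) for x in B]
    outputs = fA + fB
    mean = sum(outputs) / len(outputs)
    variance = sum((y - mean) ** 2 for y in outputs) / (len(outputs) - 1)
    indices = []
    for i in range(dim):
        fABi = [model(a[:i] + [b[i]] + a[i + 1:]) for a, b in zip(A, B)]
        v_i = sum(yb * (yab - ya) for yb, yab, ya in zip(fB, fABi, fA)) / n
        indices.append(v_i / variance)
    return indices

def toy_model(x):
    # Hypothetical additive model; the third input has no effect.
    return x[0] + 2.0 * x[1] + 0.0 * x[2]

sobol = first_order_sobol(toy_model, dim=3, n=20000)
```

For this additive toy model the exact first-order indices are 0.2, 0.8 and 0, so the estimates can be checked directly. In a real study the model would be replaced by, for example, a connectivity measure computed on a simulated fracture network.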

**Estimating permeability field by combining data from wireline logs and hydrodynamic well tests**

by *Vladimir Vannovskiy*, Skoltech, Moscow

One of the big challenges in geoscience is to obtain a coherent model of the subsurface by combining data from various sources and of different locality. We present a two-dimensional approach based on kernel regression that is able to combine the interpretation of wireline logs with hydrodynamic well test data. The former data are strongly local (on the scale of meters), while the latter represent integral field properties (on the scale of hundreds of meters). The form of the proposed kernel makes it possible to take the different data locality into account. The kernel parameters are tuned according to a leave-one-out metric. The possibility of incorporating seismic data into the approach will be discussed, as well as other open challenges.
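The leave-one-out tuning mentioned above can be sketched with a generic Nadaraya-Watson kernel regression in one dimension. The Gaussian kernel, the synthetic data and the names `nw_predict` and `loo_error` are assumptions for illustration. The kernel proposed in the talk, which accounts for different data locality, is not reproduced here.

```python
import math

def nw_predict(x0, xs, ys, bandwidth, skip=None):
    """Nadaraya-Watson estimate at x0 with a Gaussian kernel.

    `skip` is the index of a sample to leave out (used for leave-one-out).
    """
    num = den = 0.0
    for j, (x, y) in enumerate(zip(xs, ys)):
        if j == skip:
            continue
        w = math.exp(-0.5 * ((x0 - x) / bandwidth) ** 2)
        num += w * y
        den += w
    return num / den

def loo_error(xs, ys, bandwidth):
    """Mean squared leave-one-out prediction error for a given bandwidth."""
    errors = [(ys[i] - nw_predict(xs[i], xs, ys, bandwidth, skip=i)) ** 2
              for i in range(len(xs))]
    return sum(errors) / len(errors)

# Synthetic, noise-free samples on a regular grid (assumed data).
xs = [i / 20 for i in range(41)]
ys = [math.sin(3 * x) for x in xs]

candidates = [0.02, 0.05, 0.1, 0.3, 1.0]
best_bandwidth = min(candidates, key=lambda h: loo_error(xs, ys, h))
```

The same pattern extends to two dimensions and to kernels with several parameters: each candidate parameter set is scored by predicting every sample from all the others, and the set with the lowest error is kept.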

## Data Science Day

**Graph discovery and Bayesian filtering in state-space models for temporal data**

by *Victor Elvira*, University of Edinburgh

Modeling and inference in multivariate time series is central in data sciences, including statistics, signal processing, and machine learning, with applications in Earth observation. The linear-Gaussian state-space model is a common way to describe a time series through the evolution of a hidden state, with the advantage of presenting a simple inference procedure due to the celebrated Kalman filter. A fundamental question when analyzing multivariate sequences is the search for relationships between their entries (or the modeled hidden states), especially when the inherent structure is a directed (causal) graph. In such a context, graphical modeling combined with parsimony constraints makes it possible to limit the proliferation of parameters and enables a compact data representation which is easier to interpret in applications, e.g., in inferring causal relationships of physical processes. We propose a novel expectation-maximization algorithm for estimating the linear matrix operator in the state equation of a linear-Gaussian state-space model.
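For readers unfamiliar with the Kalman filter referred to above, here is a minimal scalar sketch for the model x_t = a·x_{t-1} + noise (variance q), y_t = h·x_t + noise (variance r). The scalar `a` stands in for the matrix operator of the state equation. The function name `kalman_filter` and all parameter values are assumptions for illustration, and the expectation-maximization algorithm proposed in the talk is not shown.

```python
def kalman_filter(observations, a, h, q, r, mean0=0.0, var0=1.0):
    """Scalar Kalman filter; returns the filtered means and the final variance."""
    mean, var = mean0, var0
    filtered_means = []
    for y in observations:
        # Prediction step: propagate the state estimate through the dynamics.
        mean_pred = a * mean
        var_pred = a * a * var + q
        # Update step: correct the prediction with the new observation.
        gain = var_pred * h / (h * h * var_pred + r)
        mean = mean_pred + gain * (y - h * mean_pred)
        var = (1.0 - gain * h) * var_pred
        filtered_means.append(mean)
    return filtered_means, var

means, final_var = kalman_filter([1.0], a=1.0, h=1.0, q=0.0, r=1.0)
```

With a unit-variance prior centred at zero and a unit-variance observation of 1.0, the filter splits the difference: the filtered mean is 0.5 and the variance halves to 0.5.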

**Reinforcement learning for fluid control**

by *Ahmed Elsheikh*, Institute of GeoEnergy Engineering, Heriot-Watt University

Reinforcement learning (RL) is a promising tool for solving optimal subsurface flow control problems where the model parameters are highly uncertain and the observations are sparse and noisy. RL algorithms rely on performing a large number of flow simulations, and this could easily become computationally intractable for large-scale models. In order to address this computational bottleneck, we introduce an adaptive multi-grid RL (MGRL) framework which is inspired by the principles of geometric multi-grid methods used in iterative numerical algorithms. In MGRL, control policies are initially learned using computationally efficient low-fidelity simulations. Subsequently, the simulation fidelity is increased adaptively towards higher-fidelity simulations. The proposed MGRL framework is demonstrated using a model-free, policy-based RL algorithm, namely the Proximal Policy Optimisation (PPO) algorithm, for two case studies of robust optimal well control problems which are often encountered in subsurface reservoir management. We observe prominent computational gains using the proposed MGRL framework, with savings of around 60-70% compared to a single fine-grid RL counterpart.

**Graph deep learning in reservoir conditioning to production data** by *Gleb Shishaev*, Tomsk Polytechnic University

Generative deep learning is becoming a widely used approach in geological modelling, especially in solving inverse problems like history matching. The basic idea of applying generative deep learning to reservoir modelling and history matching is to train an adversarial reservoir model generator and find an ensemble of optimal solutions under geological constraints. Different approaches have been developed that utilize various modifications of Generative Adversarial Networks or Variational Autoencoders to implement reservoir generators, but most of them utilize “conventional” convolutional neural networks, hence all data have to be regular (rectangular) in structure. This should be recognized as a limitation because many reservoir models are based on unstructured grids to account for geological structural discontinuities and structural uncertainty.

In this talk, I will introduce a novel approach for reservoir modelling with Variational Autoencoders (VAE) based on graph convolutions as opposed to “conventional” convolutions. In this approach a reservoir model is considered as a graph, i.e., a data type that is not lattice-structured. Graph convolutions can handle these data types and link up with the Variational Autoencoder to generate unstructured reservoir models and populate the model property distribution. Variational Autoencoders demonstrate the ability to implicitly parameterize geological representations into a latent space of reduced dimensionality and provide ways to solve the history matching problem based on production profiling across multiple geological concepts and to quantify uncertainty with an ensemble of history-matched models.

**Performance of Machine Learning Algorithms and the Value of Information**

by *Roman Belavkin*, Faculty of Science and Technology, Middlesex University, London

Various data analysis and machine learning algorithms allow us to extract information about various phenomena and create useful models (e.g., for recognition, classification, prediction, etc.). Given a limited amount of data, what is the best possible performance that can be achieved by these models? Or given a minimum performance level, what is the smallest amount of data and information required to achieve this performance? These questions are related to variational problems of the value of information (VoI) theory, which originated in the works of Claude Shannon on rate distortion and was later developed in the 1960s by Ruslan Stratonovich and his colleagues. I will outline the main mathematical ideas about different types of information and their values, and then show how Shannon’s VoI can be computed using entropy and the objective function (e.g., utility or cost) in a simple example. The VoI function can be used to tune hyperparameters of an algorithm to maximize its performance. I will show how this approach was used to find optimal control of mutation rates in genetic algorithms. I will also discuss how the value of information is maximized in multilayer neural networks.

**Data science approaches in geology**

by *Alexey Antonov*, *Mikhail Karpushin*, Faculty of Mathematics and Mechanics, Moscow State University

The talk is devoted to an overview of various applications of machine and deep learning in geology. Examples are given both from the personal experience of the authors and some existing projects. Key ideas and tools needed to solve such problems will be discussed.

**Machine Learning Approaches in Geological Mapping – A case study from Northern Ireland**

by *Zeinab Smillie*, University of Stirling

Geological mapping has enormous economic, environmental and societal value. In addition, geological maps support decisions regarding land use and infrastructure development. However, mapping techniques are expensive, time-consuming and reliant on geologists’ own experience. Valid integration of remotely sensed data and modern data approaches, including machine learning, leads to rapid and practical solutions that are easily communicated among a broader range of scientists and decision-makers, especially when aiming to map more expansive areas.

The self-organising map (SOM) is an unsupervised classification tool trained by competitive learning. The method helps analyse and visualise high-dimensional data based on the principles of vector quantization of similarities and clustering in a high-dimensional space. In addition, the technique can perform prediction, estimation, and pattern recognition on large data sets. Here we apply the technique to integrate geological, topographical, and geophysical data (airborne geophysical data acquired through the Tellus project). The data characterise the K, U and Th distribution associated with the natural geological features in Northern Ireland.
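The competitive learning behind a SOM can be sketched with a very small one-dimensional map trained on synthetic two-cluster data. The decay schedules, the map size, the data and the names `train_som` and `best_matching_unit` are assumptions for illustration. They do not reflect the configuration or the Tellus data used in the study.

```python
import math
import random

def best_matching_unit(weights, x):
    """Index of the map node whose weight vector is closest to sample x."""
    return min(range(len(weights)),
               key=lambda k: sum((w - v) ** 2 for w, v in zip(weights[k], x)))

def train_som(data, n_nodes, epochs=50, lr0=0.5, sigma0=None, seed=0):
    """Train a 1-D SOM with linearly decaying learning rate and neighbourhood."""
    rng = random.Random(seed)
    sigma0 = sigma0 or n_nodes / 2
    weights = [list(rng.choice(data)) for _ in range(n_nodes)]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)
            sigma = max(sigma0 * (1.0 - frac), 0.5)
            bmu = best_matching_unit(weights, x)
            for k, w in enumerate(weights):
                # Nodes close to the winner on the map move more towards x.
                influence = math.exp(-((k - bmu) ** 2) / (2.0 * sigma ** 2))
                for d in range(len(w)):
                    w[d] += lr * influence * (x[d] - w[d])
            step += 1
    return weights

# Two synthetic, well-separated clusters (assumed data).
rng = random.Random(1)
cluster_a = [[rng.gauss(0.0, 0.3), rng.gauss(0.0, 0.3)] for _ in range(50)]
cluster_b = [[rng.gauss(5.0, 0.3), rng.gauss(5.0, 0.3)] for _ in range(50)]
som_weights = train_som(cluster_a + cluster_b, n_nodes=4)
nodes_a = {best_matching_unit(som_weights, x) for x in cluster_a}
nodes_b = {best_matching_unit(som_weights, x) for x in cluster_b}
```

After training, samples from the two clusters should map to different nodes, which is the property that lets a SOM act as an unsupervised classifier. In a mapping application each sample would be a vector of co-located measurements rather than a 2-D point.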

Different experiments (iterations) were used to assess the impacts of various features on controlling the mapping patterns.

## Application Day

**Improving marine renewable energy industry, Floating Solar Plant in particular, by linking marine geoscience with data science** by *Amir Honaryar*, Queen’s University, Belfast

Three-fourths of our planet is covered by ocean; however, the contribution of such a great mass of water to worldwide renewable energy production is negligible. A relative lack of detailed marine geoscience research data is a formidable challenge that the marine renewable energy industry still faces. These data include offshore wind, wave, tidal current, bathymetry, and seabed soil classification. Solar and wind, known as the two primary renewable sources of energy, are now being harnessed on floating platforms at sea (floating solar and floating wind, respectively) to free up precious land for other purposes such as agriculture. Design challenges for the former are even more serious, as most meteorological wind data are available for altitudes of 10 m or more above the sea, whereas a floating solar plant operates below 10 m. I will talk about how we could use data science approaches such as numerical modelling, experimental testing, and time- and frequency-domain statistical methods to better design marine renewable energy systems, particularly the floating solar plant.

**Domaining of downhole geochemical data – an automated approach applied to the Northern Limb of the Bushveld Complex** by *Tom Buckle*, University of Exeter

The Northern Limb (NL) of the Bushveld Complex is host to some of the largest platinum group element (PGE) deposits in the world; however, there is a lack of understanding of the controls on the style and spatial distribution of mineralisation. Geochemical domains can be used as an initial step in understanding magmatic processes, or for generating 3D orebody models. Boundary detection using the continuous wavelet transform (CWT) method was first used in the geosciences to identify boundaries in downhole geophysical data by Cooper & Cowan (2009). Hill et al. (2021) further developed the methodology by allowing for multivariate inputs, and by improving the visualisation of the CWT scalogram, enabling interpretation of boundaries across multiple spatial scales. This multiscale CWT method can be effectively used to domain downhole bulk geochemical data from a series of exploration drillholes from the NL, facilitating repeatable domaining that takes downhole spatial continuity into consideration. The attributes describing these domains can then be clustered to allow for spatial correlation across drillholes. The results of this workflow are shown to compare well with the results of manual geochemical domaining of NL drillhole data as performed by a geologist, with the additional benefits of minimising human bias, enabling faster (re)interpretation of drillhole data, and allowing different inputs to be used to identify domains for specific purposes or at different scales of observation.
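The basic principle of wavelet-based boundary detection can be sketched at a single scale: a downhole signal is correlated with a derivative-of-Gaussian wavelet, and local maxima of the coefficient magnitude mark candidate boundaries. The wavelet choice, the edge padding, the threshold and the names `dog_wavelet`, `cwt_at_scale` and `detect_boundaries` are assumptions for illustration, and this is not the multivariate, multiscale implementation of Hill et al. (2021).

```python
import math

def dog_wavelet(scale):
    """First derivative of a Gaussian, sampled on integer offsets."""
    half = int(4 * scale)
    return [-(t / scale ** 2) * math.exp(-0.5 * (t / scale) ** 2)
            for t in range(-half, half + 1)]

def cwt_at_scale(signal, scale):
    """Correlate the signal with the wavelet, repeating edge values as padding."""
    kernel = dog_wavelet(scale)
    half = len(kernel) // 2
    n = len(signal)
    coeffs = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - half, 0), n - 1)
            acc += w * signal[j]
        coeffs.append(acc)
    return coeffs

def detect_boundaries(signal, scale, threshold):
    """Indices where |coefficient| is a local maximum above the threshold."""
    mag = [abs(c) for c in cwt_at_scale(signal, scale)]
    return [i for i in range(1, len(mag) - 1)
            if mag[i] >= threshold and mag[i] >= mag[i - 1] and mag[i] > mag[i + 1]]

# Synthetic downhole log with a single step between samples 49 and 50 (assumed data).
log = [1.0] * 50 + [3.0] * 50
boundaries = detect_boundaries(log, scale=3, threshold=0.5)
```

Repeating the detection over a range of scales would give a scalogram-like picture in which strong boundaries persist from fine to coarse scales. That persistence is what allows domains to be interpreted at different scales of observation.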

**Agent based modelling for subsurface dynamic screening**

by *Bastian Steffens, Quentin Corlay*, Heriot-Watt University

Understanding subsurface hydrocarbon migration is a crucial task for petroleum geoscientists. Hydrocarbons are released from deeply buried and heated source rocks, such as shales with a high organic content. They then migrate upwards through the overlying lithologies. Some hydrocarbons become trapped in suitable geological structures that, over a geological timescale, produce viable hydrocarbon reservoirs. Here we investigate how intelligent agent models can mimic these complex natural subsurface processes and account for geological uncertainty. Physics-based approaches are commonly used in petroleum system modelling and flow simulation software to identify migration pathways from source rocks to traps. However, these simulations are computationally demanding, making them infeasible for extensive uncertainty quantification. They are also not suited to rapid screening of seismic cubes for potential areas of hydrocarbon accumulation. We present a novel dynamic screening tool for secondary hydrocarbon migration that relies on agent-based modelling. It is fast and is therefore suitable for uncertainty quantification or screening procedures, before petroleum system modelling software is used for a more accurate evaluation of migration scenarios.

With this presentation, we want to showcase how and where agent-based modelling fits into the data science landscape and give an example of how geoscientists can utilise it in the subsurface modelling domain.

**Machine learning approach to reduce uncertainties in reservoir description using gravity and magnetic data**, by *Aleksandra Volkova*, Tomsk Polytechnic University

The use of 3D seismic is usually associated with big data, high costs, many complicated processing and machine learning (ML) techniques, and good resolution. Continuous improvement in seismic quality should not lead us to discount field methods such as gravity and magnetics, which are cheap and cover a large area. ML provides a way to integrate gravity and magnetics with seismic to help reduce geological uncertainty. I will demonstrate the value of these geophysical methods with an example where seismic data alone cannot help to reduce uncertainties, and with another example where elementary ML helps reduce uncertainty through the use of gravity and magnetic data. The presentation leads the way towards developing new data science tools for integrating gravity and magnetics data in geological modelling.

**Classification of lithology (autointerpretation) and empty logs restoration by means of Machine Learning**, by *Vladlen Sakhnyuk*, *Alexander Sharifullin* and *Eugeny Novikov*, Moscow State University

The penetration of Machine Learning and Data Science into all industries nowadays should not be underestimated. There are many areas where ML algorithms can be applied: marketing, telecommunications, economics, etc. The objective of the current study is to assess the applicability of mathematical models and data analysis techniques to petroleum geology tasks, particularly lithology prediction based on well-log data, restoration of missing curve values, and the influence of that restoration on the metrics of the classification models.

The input data comprise 349 wells with the available well-log curves (Gamma Ray, Spontaneous Potential, Resistivity, etc.). The collected input data were preprocessed, discretized and split for validation purposes. Testing was conducted on one well, while the remaining data were used for training.

Three ML models were used in this work: CatBoost (an algorithm from the gradient boosting family of ML models based on decision trees), a Multilayer Perceptron (4 fully connected layers) and an LSTM (a recurrent neural network suited to sequence data tasks). The efficiency of the models was evaluated with an accuracy score for the classification task and RMSE for the regression task.
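The evaluation set-up described above (one well held out for testing, accuracy for classification, RMSE for regression) can be sketched as follows. The sample records, the field names and the helper names `split_by_well`, `accuracy` and `rmse` are hypothetical and serve only to illustrate the metrics. They are not the authors' code or data.

```python
import math

def split_by_well(samples, test_well):
    """Hold out all samples from one well for testing; train on the rest."""
    train = [s for s in samples if s["well"] != test_well]
    test = [s for s in samples if s["well"] == test_well]
    return train, test

def accuracy(y_true, y_pred):
    """Fraction of correctly predicted class labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted curve values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical records: one dictionary per depth sample.
samples = [
    {"well": "W1", "gamma_ray": 45.0, "lithology": "sand"},
    {"well": "W1", "gamma_ray": 110.0, "lithology": "shale"},
    {"well": "W2", "gamma_ray": 50.0, "lithology": "sand"},
    {"well": "W2", "gamma_ray": 120.0, "lithology": "shale"},
]
train, test = split_by_well(samples, test_well="W2")

lithology_score = accuracy(["sand", "shale", "sand", "shale"],
                           ["sand", "sand", "sand", "shale"])
curve_error = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Splitting by well rather than by individual sample keeps depth-correlated samples from the same borehole out of both the training and test sets, which would otherwise inflate the scores.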

The results make it possible to conclude that ML models may generally be applicable to geology. The performance metrics highlighted the superiority of the CatBoost model over the other models. Finally, some advice and further conclusions based on the results are given.