– At Aker BP, we have started testing Microsoft Copilot as a virtual assistant that you can chat with in the Teams app. It can, for example, prepare an action list after a Teams meeting or find and summarize relevant documents in your SharePoint. However, finding and summarizing technical information is not that simple. What is more, current AI struggles to make sense of data spread across multiple databases and formats, says Tomasz Wiktorski, Data Delivery Manager at Aker BP and Professor at UiS.

ChatGPT started receiving tremendous attention at the end of 2022. The Atlantic featured ChatGPT and other large language models (LLMs) in its 2022 “Breakthrough of the Year” article. Since then, LLMs have excelled at answering general questions, writing code, and transforming and translating text. However, using LLMs for domain-specific factual responses remains a challenge.

AI for drilling data

– We wanted to develop a custom question-answering bot that can help drilling personnel make informed decisions promptly. Developing the chatbot started as a side project but has quickly become a significant part of my Ph.D. within machine learning for drilling, says Felix James Cardano Pacis, a Ph.D. candidate at the University of Stavanger and DigiWells.

Pacis is currently developing the chatbot with Tomasz Wiktorski, Sergey Alyaev, a senior researcher from NORCE, and Gilles Pelfrene, also a senior researcher from NORCE.

Felix Pacis during a DigiWells collaboration visit to Aker BP in 2022. Photo by S. Alyaev

– Felix started working on a data filtering and visualization dashboard before he started his Ph.D. He showed the first chatbot functionality with it at the DigiWells Seminar in 2022. That was before ChatGPT took the world by storm, says Alyaev, Deep-Learning-epic leader and Pacis’s Ph.D. co-supervisor. – We got a lot of interest from the consortium and encouraged Felix to dig deeper into the problem. That was very much in the agile spirit of DigiWells. We definitely could not have foreseen this when we announced the Ph.D. position, he adds.

Pacis was on an extended research stay at the Colorado School of Mines in 2023, initially planning to explore data-based rate-of-penetration optimization. At the same time, he was improving the chatbot prototype for an upcoming conference. The chatbot project was so exciting that he decided to return early, as the group in Colorado did not have the right competence in LLMs.

Rigorous evaluation of available LLMs on industry jargon

– We used a zero-shot learning technique that relies on an LLM’s ability to generate responses for tasks outside its training, explains Pacis.

In machine learning, zero-shot learning occurs when an AI model is given tasks that were not part of the training and needs to make an inference about these tasks. This generally works by providing context information together with such a task. For example, an AI model trained to recognize horses can recognize a zebra given the context: “Zebras look like striped horses.”

For the recent study, Pacis implemented a controlled zero-shot “in-context” learning procedure that sends the user’s query, augmented with relevant text data, to an LLM as input.

– This implementation encourages the LLM to take the answer from the data while leveraging its pre-trained contextual-learning capability. And we documented the pre-trained LLMs’ ability to provide correct answers and identify petroleum industry jargon from the collated dataset, says Pacis.
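The in-context idea can be illustrated with a minimal Python sketch. It is not the study's implementation: the function name, the prompt wording, and the example context are assumptions made for illustration, and the call to an actual LLM is left out.

```python
def build_in_context_prompt(question: str, context: str) -> str:
    """Combine a user question with supporting text into a single prompt.

    The instruction wording is a hypothetical example, not the prompt
    used in the study.
    """
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, reply 'I don't know'.\n\n"
        f"Context:\n{context.strip()}\n\n"
        f"Question: {question.strip()}\n"
        "Answer:"
    )


# Illustrative context (a generic glossary-style definition, not taken
# from the study's dataset).
example_context = (
    "Rate of penetration (ROP) is the speed at which the drill bit "
    "advances through the formation."
)
example_prompt = build_in_context_prompt("What does ROP measure?", example_context)
print(example_prompt)
```

The resulting string would be sent to the LLM of choice. Because the supporting text travels with the question, the model is nudged to answer from the supplied data instead of relying only on what it learned during pre-training.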

DigiWells researchers gathered and collated text data from publicly available sources, such as the Norwegian Offshore Directorate, annual reports of different companies, and the petroleum glossary. They used the dataset to create a domain-specific benchmark of multiple-choice questions. Their study, presented at the IADC/SPE International Drilling Conference and Exhibition, evaluates seven commercial and open-source LLMs on this benchmark. The evaluation results helped the researchers select the best-performing components for the petroleum chatbot.
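Scoring a model on a multiple-choice benchmark can be sketched roughly as follows. Everything here is an assumption made for illustration: the `MultipleChoiceItem` structure, the prompt format, the strict answer parsing, and the stand-in model are hypothetical and do not reproduce the study's benchmark or evaluation code.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

LETTERS = "ABCDEFGH"


@dataclass(frozen=True)
class MultipleChoiceItem:
    question: str
    options: Sequence[str]
    correct_index: int


def format_item(item: MultipleChoiceItem) -> str:
    """Render a question and its lettered options as one prompt."""
    lines = [item.question]
    lines += [f"{LETTERS[i]}. {opt}" for i, opt in enumerate(item.options)]
    lines.append("Answer with a single letter.")
    return "\n".join(lines)


def parse_choice(reply: str, n_options: int) -> Optional[int]:
    """Map a reply to an option index, or None if it does not start with a valid letter."""
    first = reply.strip()[:1].upper()
    valid = LETTERS[:n_options]
    return valid.index(first) if first and first in valid else None


def accuracy(items: Sequence[MultipleChoiceItem],
             ask_model: Callable[[str], str]) -> float:
    """Fraction of items for which the model picks the correct option."""
    if not items:
        return 0.0
    correct = sum(
        parse_choice(ask_model(format_item(item)), len(item.options)) == item.correct_index
        for item in items
    )
    return correct / len(items)


# Toy items for illustration only; they are not from the study's benchmark.
toy_items = [
    MultipleChoiceItem("What does ROP stand for in drilling?",
                       ["Rate of penetration", "Return on production"], 0),
    MultipleChoiceItem("What does BHA stand for in drilling?",
                       ["Basic hole alignment", "Bottom hole assembly"], 1),
]


def always_a(prompt: str) -> str:
    """Stand-in for a real LLM call: always answers 'A'."""
    return "A"


toy_accuracy = accuracy(toy_items, always_a)
print(toy_accuracy)
```

Replacing `always_a` with a function that calls a real model would let several models be compared on the same items. Replies that do not begin with a valid option letter are counted as wrong here, which is a deliberately strict design choice.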

Towards a fully functional chatbot

– We tailored the algorithm for drilling documents, which makes it preferable to Google and ChatGPT. Unlike our chatbot, ChatGPT and Google search could not provide specific and direct answers to industry-specific questions, says Pacis.

Figure. Three-way comparison between Google search, OpenAI ChatGPT-4, and the DigiWells chatbot. Performed in Feb. 2024.

The work on the chatbot continues with the development of the ranking model. The researchers are also working to identify specific cases where the bot excels or needs improvement.

– We are including more data, increasing the response rate, and performing further testing and validation. We are also generating artificial data through simulated interactions with the chatbot as supplementary data, explains Pacis.

The DigiWells researchers also work with several public and private partners.

– We started working with the Norwegian Offshore Directorate on a new extensive dataset of all their public data. We’ll use it to train LLMs for offshore data exploration based on open-source offline models. We sorted out “in-context” question-answering in our SPE study. Now, for the full chatbot, we need an improved ranking model that finds the correct context document from any large knowledge base. We think this model can also be an LLM, says Sergey Alyaev.
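To show what a ranking step does, here is a minimal lexical baseline in Python. It is explicitly not the researchers' ranking model, which they suggest could itself be an LLM. It is a simple word-overlap scorer with smoothed inverse document frequency, and the function names and example documents are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Sequence


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_documents(query: str, documents: Sequence[str]) -> List[int]:
    """Return document indices ordered from most to least relevant.

    Each query word found in a document adds a smoothed IDF weight, so
    words that are rare across the collection count more than common ones.
    """
    doc_tokens = [set(tokenize(doc)) for doc in documents]
    n_docs = len(documents)
    doc_freq = Counter(tok for toks in doc_tokens for tok in toks)
    query_tokens = set(tokenize(query))

    def score(tokens: set) -> float:
        return sum(
            math.log((n_docs + 1) / (doc_freq[tok] + 1)) + 1.0
            for tok in query_tokens
            if tok in tokens
        )

    # sorted() is stable, so documents with equal scores keep their input order.
    return sorted(range(n_docs), key=lambda i: score(doc_tokens[i]), reverse=True)


# Illustrative documents only; they do not come from the actual knowledge base.
toy_documents = [
    "The annual report summarizes production volumes and financial results.",
    "Rate of penetration is the speed at which the drill bit advances.",
    "A glossary entry describing drilling mud and its circulation.",
]
toy_ranking = rank_documents("What is rate of penetration?", toy_documents)
print(toy_ranking)
```

The top-ranked document would then be passed as context to the question-answering step. A baseline like this only matches exact words, which is one reason a learned ranking model may be needed for a large knowledge base.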

Large language models are here to stay

– What started as a Ph.D. sub-project is gaining momentum. There will be spin-off activities in DigiWells as early as 2024. We are discussing applications of Felix’s methodology with several of NORCE’s industrial partners. Maybe this summer we’ll get funding to apply it outside drilling, fingers crossed, concludes Alyaev.

The chatbot team visiting the Norwegian Offshore Directorate in 2024. Photo by S. Alyaev

Reference

Pacis, F. J., Alyaev, S., Pelfrene, G., and T. Wiktorski. "Enhancing Information Retrieval in the Drilling Domain: Zero-Shot Learning with Large Language Models for Question-Answering." Paper presented at the IADC/SPE International Drilling Conference and Exhibition, Galveston, Texas, USA, March 2024. doi: https://doi.org/10.2118/217671-MS