Drilling Data Management: WITSML to Cloud

Dr. Mehrdad Shirangi | Published by Groundwork Analytics LLC

Editorial disclosure

This article reflects the independent analysis and professional opinion of the author, informed by published research, vendor documentation, and hands-on experience with drilling data systems. No vendor reviewed or influenced this content prior to publication.

Every well drilled in North America generates between 50 and 200 gigabytes of raw data. Surface drilling parameters at one-second intervals. Downhole MWD surveys every 90 feet. LWD measurements every six inches. Mud logging data. Gas chromatography readings. Daily drilling reports. Morning reports. BHA records. Bit records. Fluid reports. The volume of data produced during a single well's drilling phase rivals what many companies' IT departments managed across their entire organization two decades ago.

And yet, most operators cannot easily answer a straightforward question: What was our average connection time in the Wolfcamp across all wells drilled last year?

The reason is not a lack of data. It is a lack of data infrastructure. Drilling data is generated by multiple systems from multiple vendors, transmitted through inconsistent pathways, stored in incompatible formats, and often siloed by well, by rig, or by service company. The data exists. The infrastructure to make it usable -- consistently, reliably, and at scale -- is the bottleneck.

This article examines the state of drilling data infrastructure: the standards that govern it, the vendors that provide it, the persistent challenges, and what needs to change for AI and machine learning to deliver on their promise in drilling operations.


The Foundation: Electronic Drilling Recorders

All drilling data management starts at the rig floor, with the Electronic Drilling Recorder (EDR). The EDR is the primary data acquisition system for surface drilling parameters. It captures the core measurements that define what is happening during drilling: hookload, block position, rotary RPM, rotary torque, standpipe pressure, pump strokes, flow rate, and rate of penetration.

Pason EDR

Pason dominates the North American land drilling market, with EDR installations on roughly 60% of active rigs. Pason's system captures data at configurable intervals (typically one to five seconds) and transmits it to Pason's DataHub, a cloud-based data platform that makes the data available for remote monitoring and third-party applications.

Pason's DataHub is significant because it serves as a de facto data aggregation layer for much of the North American drilling industry. Many third-party analytics platforms (Corva being the most prominent example) receive their data feed from Pason rather than connecting directly to rig-floor equipment.

Totco / NOV

NOV's Totco brand provides EDR systems that compete with Pason, particularly on NOV-equipped rigs. The integration between NOV EDR and NOV's NOVOS rig operating system provides a tighter coupling between data acquisition and rig automation compared to the Pason/third-party model. However, Totco's market share in North America is smaller than Pason's.

Offshore and International

Offshore and international rigs typically use EDR systems from a wider range of providers, including Kongsberg, Epsis (now part of Kongsberg), SLB, and various rig OEM-provided systems. Data standardization is generally weaker in these environments, partly because the installed base is more diverse and partly because offshore rigs often run legacy systems that predate modern data standards.


WITSML: The Standard That Almost Solved the Problem

What WITSML Is

WITSML (Wellsite Information Transfer Standard Markup Language) is an XML-based data exchange standard developed by Energistics, the upstream oil and gas data standards consortium. WITSML defines standardized data objects for drilling data -- wellbore trajectories, logs, mudlog data, cement jobs, fluid reports, BHA runs, and more -- along with a server/client architecture for transmitting this data in real time.

The goal of WITSML is straightforward: enable drilling data to flow between systems from different vendors in a consistent, standardized format. A directional drilling company's MWD system should be able to transmit survey data to an operator's monitoring system using the same data format and protocol, regardless of which MWD tool or which monitoring platform is being used.

WITSML Versions

WITSML has evolved through several versions:

  • WITSML 1.3.1 -- The version most widely deployed in production systems. It uses a SOAP-based web services protocol for data exchange. Despite being technically superseded, version 1.3.1 remains the de facto standard in many environments because it has the largest installed base.
  • WITSML 1.4.1 -- An incremental update with additional data objects and improved data model definitions. Adoption is mixed; some operators and vendors upgraded, others stayed on 1.3.1.
  • WITSML 2.0 -- A major architectural overhaul that moved from SOAP to RESTful APIs and aligned with the Energistics Transfer Protocol (ETP) for real-time data streaming. WITSML 2.0 also integrated with the RESQML (reservoir) and PRODML (production) standards under a common framework. Adoption has been slow, partly because upgrading from 1.x to 2.0 requires significant changes to both server and client implementations.
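To make the 1.x exchange model concrete, the sketch below assembles a WMLS_GetFromStore request, the STORE API operation a WITSML 1.3.1 client uses to poll a server. The operation name, parameters, and query-by-example pattern follow the WITSML API specification, but the server URL, credentials, and well identifiers are placeholders, and any real integration should be checked against the target server's WSDL.

```python
# Minimal sketch of a WITSML 1.3.1 STORE API query. WMLS_GetFromStore and
# its WMLtypeIn/QueryIn/OptionsIn parameters come from the WITSML API spec;
# the server URL, credentials, and uids below are hypothetical.
import base64
import urllib.request

SOAP_TEMPLATE = """<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <WMLS_GetFromStore xmlns="http://www.witsml.org/wsdl/120">
      <WMLtypeIn>{wml_type}</WMLtypeIn>
      <QueryIn><![CDATA[{query}]]></QueryIn>
      <OptionsIn>returnElements=all</OptionsIn>
      <CapabilitiesIn></CapabilitiesIn>
    </WMLS_GetFromStore>
  </soap:Body>
</soap:Envelope>"""

def build_getfromstore(wml_type: str, query_xml: str) -> str:
    """Assemble the SOAP envelope for a WMLS_GetFromStore call."""
    return SOAP_TEMPLATE.format(wml_type=wml_type, query=query_xml)

# Query-by-example: an empty <log> element under the given well/wellbore
# asks the server to return all matching log objects.
log_query = (
    '<logs xmlns="http://www.witsml.org/schemas/131" version="1.3.1.1">'
    '<log uidWell="W-001" uidWellbore="WB-001"/></logs>'
)

def fetch_logs(server_url: str, user: str, password: str) -> str:
    """POST the envelope to a WITSML server and return the raw response."""
    envelope = build_getfromstore("log", log_query)
    creds = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(
        server_url,
        data=envelope.encode("utf-8"),
        headers={
            "Content-Type": "text/xml; charset=utf-8",
            "SOAPAction": "http://www.witsml.org/action/120/Store.WMLS_GetFromStore",
            "Authorization": "Basic " + creds,
        },
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8")
```

Note that this is a request/response poll: the client must call repeatedly to pick up new data, which is exactly the latency and overhead problem that ETP's WebSocket streaming in WITSML 2.0 was designed to remove.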

What WITSML Gets Right

WITSML solved a real problem. Before standardized data exchange, every connection between a service company's system and an operator's system was a custom integration. Moving data from a rig to an operations center required point-to-point connections that broke whenever either side changed their system. WITSML provided a common language.

For basic drilling data exchange -- transmitting EDR data, MWD surveys, and mudlog information from the rig to monitoring centers -- WITSML works. Thousands of wells are drilled every year with WITSML data feeds flowing successfully from rig to office.

Where WITSML Falls Short

The limitations of WITSML become apparent when you try to use it as the foundation for analytics and AI.

Data quality is not guaranteed. WITSML defines the format but does not enforce data quality. A WITSML server can transmit data that is technically valid XML but contains sensor errors, missing values, duplicate timestamps, or physically impossible measurements. The standard defines what the data should look like, not whether the data is correct.
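This is why consumers of WITSML feeds typically run their own validation layer. A minimal sketch of such a screen is shown below; the channel names and physical limits are illustrative examples, not values defined by the standard.

```python
# Illustrative quality screen for a WITSML-delivered time log. Channel
# names and sanity ranges are examples and would be tuned per rig.
from datetime import datetime

PHYSICAL_LIMITS = {
    "rop_ft_hr":    (0.0, 1500.0),   # rate of penetration
    "hookload_klb": (0.0, 1000.0),   # hookload
    "spp_psi":      (0.0, 10000.0),  # standpipe pressure
}

def screen_rows(rows):
    """Flag duplicate timestamps, missing values, and out-of-range readings.

    `rows` is a list of dicts with a 'time' key plus channel values.
    Returns (row_index, issue) tuples. Every row here could still be
    perfectly valid WITSML XML -- the standard does not catch any of this.
    """
    issues = []
    seen_times = set()
    for i, row in enumerate(rows):
        t = row.get("time")
        if t is None:
            issues.append((i, "missing timestamp"))
            continue
        if t in seen_times:
            issues.append((i, "duplicate timestamp"))
        seen_times.add(t)
        for chan, (lo, hi) in PHYSICAL_LIMITS.items():
            v = row.get(chan)
            if v is None:
                issues.append((i, f"missing {chan}"))
            elif not (lo <= v <= hi):
                issues.append((i, f"{chan} out of range: {v}"))
    return issues
```

In practice checks like these run at ingestion time, so bad samples are flagged (or quarantined) before they reach dashboards or model training sets.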

Real-time performance varies. WITSML 1.x uses a polling model -- the client periodically requests new data from the server. This introduces latency and overhead compared to streaming protocols. WITSML 2.0 / ETP addresses this with WebSocket-based streaming, but adoption of ETP in production drilling environments is still limited.

Coverage is incomplete. WITSML defines data objects for common drilling data types, but many important data categories are either poorly covered or missing entirely. Drilling fluid properties, equipment maintenance records, formation evaluation interpretations, and operational context (why the driller pulled out of hole) are either absent from the standard or defined in ways that few vendors fully implement.

Semantic consistency is weak. Even within WITSML-compliant data, the same measurement can appear with different units, different naming conventions, or different reference frames. One vendor's "WOB" might be measured at the surface (hookload-derived). Another's might include buoyancy corrections. Both are valid WITSML, but they are not directly comparable without context.
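Bridging these differences usually means maintaining an explicit mapping layer. The sketch below shows the shape of such a layer; the alias table and conversion factors are illustrative (the factors themselves are standard unit conversions), and a real system would key them per vendor and per tool.

```python
# Sketch of a normalization layer mapping vendor channel mnemonics and
# units onto one canonical convention. Aliases are illustrative examples.
CHANNEL_ALIASES = {
    "WOB": "weight_on_bit", "WOBA": "weight_on_bit", "SWOB": "weight_on_bit",
    "SPP": "standpipe_pressure", "SPPA": "standpipe_pressure",
    "ROP": "rate_of_penetration", "ROPA": "rate_of_penetration",
}

# Conversion factors into the canonical units (kN, kPa, m/hr).
UNIT_FACTORS = {
    ("klbf", "kN"):  4.4482216,
    ("psi", "kPa"):  6.894757,
    ("ft/hr", "m/hr"): 0.3048,
}

def normalize(channel: str, value: float, unit: str, target_unit: str):
    """Return (canonical_name, converted_value, target_unit)."""
    name = CHANNEL_ALIASES.get(channel, channel.lower())
    if unit == target_unit:
        return name, value, unit
    return name, value * UNIT_FACTORS[(unit, target_unit)], target_unit
```

Note that a table like this resolves names and units but not semantics: it cannot tell a surface-derived WOB from a buoyancy-corrected one. That distinction has to come from channel metadata, which is exactly the context the standard under-specifies.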

Adoption is uneven. Major service companies generally support WITSML, but the depth of that support varies. Some vendors export only a subset of WITSML data objects. Some support only older versions. Some implement the server side but not the client side, or vice versa. The result is that WITSML interoperability, while better than nothing, is not plug-and-play.


Key Vendors in Drilling Data Management

Petrolink

Petrolink occupies a specialized niche in drilling data management: real-time data aggregation and quality control at the wellsite. Petrolink's systems sit between the various data sources on a rig (EDR, MWD, mud logging) and the operator's monitoring systems, providing data validation, normalization, and transmission.

Petrolink's wellsite data hub aggregates data from multiple sources into a single stream, applying quality checks and ensuring consistent timestamps, units, and channel naming. For operators running multiple rigs with different EDR systems and different service companies, Petrolink provides a consistency layer that raw WITSML feeds often lack.

Petrolink has also invested in cloud-based data management, offering hosted data historian services and analytics dashboards. Their position at the rig site gives them a data quality advantage -- they can catch and correct data issues at the source rather than after the data has been transmitted to the office.

Strengths: Wellsite presence for real-time data quality, vendor-neutral aggregation, strong WITSML expertise, experience across diverse rig environments.

Limitations: Petrolink adds a layer (and a cost) to the data chain that some operators view as unnecessary if their EDR-to-cloud pipeline is already functional. The company's analytics capabilities are less extensive than pure-play analytics platforms like Corva.

Corva (Data Platform)

While Corva is primarily known as a drilling analytics platform, its underlying data infrastructure is a significant part of its value proposition. Corva ingests data from EDR systems (primarily via Pason), normalizes it into a consistent data model, and stores it in a cloud-native database optimized for time-series analytics.

For operators who use Corva across multiple rigs, the platform effectively becomes a drilling data warehouse -- a single, consistent repository of drilling data spanning multiple wells, rigs, and time periods. This is valuable not just for real-time analytics but for historical analysis and machine learning model training.

Corva's API layer allows operators and third-party developers to build applications on top of the normalized data, creating a data ecosystem around the drilling data asset.

Strengths: Cloud-native architecture, automated data normalization, growing historical dataset, API-first design, app marketplace for data consumption.

Limitations: Primarily focused on North American land drilling with Pason EDR integration. Offshore and international environments with diverse EDR systems are less well supported. Data quality is dependent on the quality of the incoming EDR feed.

Kongsberg Digital

Kongsberg (through its Kongsberg Digital subsidiary) provides drilling data management solutions primarily for offshore and international drilling environments. Their SiteCom platform handles wellsite data acquisition and transmission, while their cloud-based solutions provide data historian and analytics capabilities.

Kongsberg's strength lies in its heritage in offshore automation and control systems. The company understands the unique challenges of offshore data management -- intermittent satellite communications, diverse legacy systems on older rigs, and the need for high-reliability data transmission in safety-critical environments.

Strengths: Offshore expertise, robust communication infrastructure, integration with Kongsberg rig automation systems, international presence.

Limitations: Less penetration in the North American land drilling market where Pason dominates, solutions can be complex for simpler rig environments.

SLB (Data Management)

SLB's drilling data management capabilities are embedded within the broader Delfi platform. The platform provides data aggregation, historian services, and analytics tools, with the advantage of tight integration with SLB's downhole tools and drilling services.

SLB has invested heavily in data lakes and data mesh architectures for managing drilling data at enterprise scale. Their approach emphasizes integration across the well lifecycle -- connecting drilling data with geological models, completion data, and production data in a single platform.

Strengths: Integration with SLB tools and services, enterprise-scale data management, investment in modern data architectures, global presence.

Limitations: Best suited for operators who use SLB services extensively. Data from non-SLB sources requires additional integration work. The Delfi platform is a significant undertaking to adopt and may be more than smaller operators need.


The Data Silo Problem

The most persistent challenge in drilling data management is not any individual technology or standard. It is the fragmentation of data across organizational boundaries.

A typical well involves data from:

  • The rig contractor (EDR data, equipment records)
  • The directional drilling company (surveys, geosteering data)
  • The mud logging company (lithology, gas readings)
  • The mud company (fluid properties, mud checks)
  • The operator's drilling engineering team (well plan, design calculations)
  • The operator's geology team (formation tops, pore pressure estimates)

Each of these organizations maintains its own data systems. Data exchange between them happens through a combination of WITSML feeds, email attachments, shared drives, and phone calls. Even when WITSML connections are in place, they typically cover only a subset of the data being generated.

The result is that no single system contains a complete picture of what happened during drilling. The directional drilling company knows the wellbore trajectory in detail but not the mud properties. The mud logger knows the gas readings but not the drilling parameters. The operator's engineer has the well plan but may not have real-time access to downhole vibration data.

For AI and machine learning applications, this fragmentation is a fundamental obstacle. A model that predicts stuck pipe risk needs surface drilling parameters (from the EDR), mud properties (from the mud company), formation data (from the mud logger and geologist), and wellbore geometry (from the directional driller). If these data sources cannot be integrated in real time, the model cannot function as intended.


Cloud Migration: Where the Industry Stands

The drilling data management landscape is in the middle of a migration from on-premise and rig-site systems to cloud-based platforms. This migration is driven by several factors:

Remote operations. The COVID-19 pandemic accelerated the adoption of remote drilling operations centers, which require cloud-accessible data. This trend has proven durable -- many operators now run hybrid remote/rig-site operations as standard practice.

Analytics scalability. Machine learning model training and large-scale historical analysis require compute resources that are impractical to maintain on-premise. Cloud platforms provide elastic compute that can scale up for analytics workloads and scale down when not needed.

Data consolidation. Cloud platforms provide a natural aggregation point for data from multiple rigs, wells, and time periods. This consolidated dataset is far more valuable for analytics than data scattered across rig-site systems.

Vendor ecosystem. Cloud-native platforms with open APIs enable a broader ecosystem of analytics applications and integrations than closed, on-premise systems.

However, the migration is not complete, and several challenges remain:

Bandwidth at the wellsite. Many land rig locations have limited connectivity, particularly in remote areas. Transmitting high-frequency EDR data from the rig to the cloud requires reliable bandwidth that is not always available. Satellite and cellular connectivity have improved but are not universal.

Latency for real-time applications. Some drilling applications (particularly automation and safety-critical monitoring) cannot tolerate the latency introduced by a rig-to-cloud-to-user data path. These applications need edge computing at the rig site with cloud synchronization for analytics and historical storage.
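The usual pattern for this edge/cloud split is store-and-forward: samples land in a durable local buffer at the rig and are drained to the cloud whenever the link is up. A minimal sketch, with the uplink as a stand-in for a real transport (HTTPS, ETP, or a message queue):

```python
# Store-and-forward buffer sketch for the edge/cloud split described
# above. SQLite stands in for any durable local store; uplink() is a
# placeholder for the real rig-to-cloud transport.
import json
import sqlite3

class EdgeBuffer:
    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS samples "
            "(id INTEGER PRIMARY KEY, payload TEXT)")

    def append(self, sample: dict):
        """Record one sample locally; always succeeds, link up or down."""
        self.db.execute("INSERT INTO samples (payload) VALUES (?)",
                        (json.dumps(sample),))
        self.db.commit()

    def drain(self, uplink, batch=100) -> int:
        """Push buffered samples via uplink(list_of_dicts) -> bool.

        Rows are deleted only after a successful send, so a connectivity
        outage never loses data. Returns the number of samples sent.
        """
        sent = 0
        while True:
            rows = self.db.execute(
                "SELECT id, payload FROM samples ORDER BY id LIMIT ?",
                (batch,)).fetchall()
            if not rows:
                break
            if not uplink([json.loads(p) for _, p in rows]):
                break  # link is down; retry on the next drain cycle
            self.db.execute("DELETE FROM samples WHERE id <= ?",
                            (rows[-1][0],))
            self.db.commit()
            sent += len(rows)
        return sent
```

Safety-critical monitoring and automation read from the local buffer directly at rig-site latency; the cloud copy, arriving whenever bandwidth allows, serves analytics and historical storage.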

Data ownership and governance. When drilling data moves to a vendor's cloud platform, questions of data ownership become important. Who owns the data generated by a rig contractor's equipment, processed by a service company's software, and transmitted to a third-party analytics platform? These questions are often addressed contractually but inconsistently.


The AI Opportunity: What Better Data Infrastructure Would Enable

The purpose of drilling data infrastructure is not data management for its own sake. It is to make drilling operations safer, faster, and cheaper. The most significant opportunity for improving those outcomes lies in AI and machine learning -- but AI requires a data foundation that most operators have not yet built.

Opportunity 1: Continuous Learning Across Wells

With a properly architected drilling data warehouse, every well drilled becomes a training data point for AI models. Stuck pipe prediction models get better with every incident. ROP optimization improves with every foot drilled. Connection time benchmarks update automatically. This requires a consistent, high-quality, well-contextualized historical dataset -- exactly what current fragmented data infrastructure fails to deliver.

Opportunity 2: Real-Time Multi-Source Fusion

The most valuable drilling analytics require fusing data from multiple sources in real time: surface parameters + downhole measurements + mud properties + formation data + wellbore geometry. WITSML provides a standard for parts of this, but true multi-source data fusion requires a data platform that can ingest, align, and serve data from heterogeneous sources with consistent timestamps and semantics.
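The core mechanical step in that alignment is an as-of join: for each high-frequency surface sample, attach the most recent reading from a slower source (a mud check, a survey) that is not newer than the sample. A pure-Python sketch of the operation is below; production systems would do the same thing with a time-series database or pandas' merge_asof, and the channel names are illustrative.

```python
# Pure-Python as-of join: align a slow, sparse source (e.g. mud checks)
# against a fast one (e.g. 1 Hz surface data). This is the time-alignment
# step that multi-source fusion depends on.
from bisect import bisect_right

def asof_join(fast, slow, key="time"):
    """fast, slow: lists of dicts, each sorted ascending by `key`.

    Returns one merged row per fast sample, carrying forward the latest
    slow-source values at or before that sample's timestamp.
    """
    slow_times = [r[key] for r in slow]
    merged = []
    for row in fast:
        i = bisect_right(slow_times, row[key]) - 1
        context = slow[i] if i >= 0 else {}
        merged.append(
            {**{k: v for k, v in context.items() if k != key}, **row})
    return merged
```

The hard part in real operations is not the join itself but getting both sides onto trustworthy, consistent timestamps in the first place -- clock skew between rig systems is a routine source of silently misaligned data.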

Opportunity 3: Digital Drilling Twins

A true digital twin of a drilling operation -- a real-time model that represents the current state of the wellbore, formation, and equipment -- requires continuous data feeds from every sensor and system on the rig, integrated into a single model. This is the ultimate expression of drilling data infrastructure, and it remains largely aspirational. The individual components exist (physics-based models, real-time data feeds, cloud computing), but the integration layer is not yet mature.

At Groundwork Analytics, we view the data infrastructure challenge as the critical enabler for AI in drilling. Our experience building data pipelines for drilling analytics has shown that 70-80% of the effort in any AI project goes into data engineering -- cleaning, normalizing, contextualizing, and integrating data from disparate sources. The organizations that invest in this unglamorous but essential work are the ones that can actually deploy AI models that work in production, not just in pilot projects.


Practical Recommendations

For operators looking to improve their drilling data infrastructure:

Standardize on WITSML where possible, but do not assume it solves the problem. WITSML is a necessary but insufficient foundation. You still need data quality monitoring, normalization, and contextualization on top of the standard.

Invest in a drilling data warehouse. Whether you build it on a cloud platform, use a vendor like Corva or Petrolink, or deploy something custom, you need a single, consistent repository of drilling data that spans all your wells and all your data sources.

Define data quality requirements explicitly. What channels are required? At what frequency? What quality checks must pass? These requirements should be in your drilling contracts, not assumed.
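Once those requirements are written down, they can be enforced mechanically. The sketch below turns a contract-style requirement (required channels, maximum sampling interval, maximum gap) into an executable check; the specific values and channel names are illustrative, not industry-standard numbers.

```python
# Sketch: contractual data requirements expressed as an executable check.
# Channel names and thresholds are illustrative examples only.
REQUIREMENTS = {
    "required_channels": {"hookload", "block_position", "rotary_rpm",
                          "standpipe_pressure", "rop"},
    "max_interval_s": 1.0,   # e.g. a "1 Hz surface data" clause
    "max_gap_s": 30.0,       # longest tolerable dropout
}

def check_feed(channels, timestamps, req=REQUIREMENTS):
    """channels: set of channel names present in the feed.
    timestamps: sorted epoch-second timestamps for one channel.
    Returns a list of violation strings (empty means compliant)."""
    violations = []
    missing = req["required_channels"] - set(channels)
    if missing:
        violations.append(f"missing channels: {sorted(missing)}")
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if gaps:
        # median interval is a robust estimate of the actual sample rate
        med = sorted(gaps)[len(gaps) // 2]
        if med > req["max_interval_s"]:
            violations.append(f"median interval {med:.1f}s exceeds "
                              f"{req['max_interval_s']}s")
        worst = max(gaps)
        if worst > req["max_gap_s"]:
            violations.append(f"gap of {worst:.1f}s exceeds "
                              f"{req['max_gap_s']}s")
    return violations
```

Running a check like this daily per rig, and tying the result back to the contract, turns "we expect good data" into something a service company can actually be held to.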

Establish data governance early. Who owns the data? Who can access it? How long is it retained? These questions are easier to answer before you have 500 wells of data scattered across three vendors' cloud platforms.

Plan for AI from the start. If you know you want to deploy machine learning models in the future, design your data infrastructure for that use case now. That means consistent channel naming, complete metadata, rich contextual information (why was the well shut in? why did we change mud weight?), and time-aligned multi-source data.

The drilling data management landscape is not glamorous. It is plumbing. But like all plumbing, when it works, everything built on top of it functions smoothly. When it does not, nothing else matters.


Dr. Mehrdad Shirangi is the founder of Groundwork Analytics and holds a PhD from Stanford University in Energy Systems Optimization. He has been building AI solutions for the energy industry since 2018. Connect on X/Twitter and LinkedIn, or reach out at info@petropt.com.

