Editorial disclosure: This article reflects the independent analysis and professional opinion of the author, informed by published research, vendor documentation, industry surveys, and practitioner experience. No vendor reviewed or influenced this content prior to publication. Product capabilities described are based on publicly available information and may not reflect the latest release.
Ask someone in oil and gas about their "data architecture" and you will get a different answer depending on which building they sit in. The drilling engineer describes a world of WITSML feeds and one-second EDR data. The completions engineer talks about high-frequency pressure and rate data from the frac fleet. The production engineer points to SCADA dashboards and morning reports. The service company operations manager explains how they track 50 pump units across three basins. The midstream scheduler mentions custody transfer measurement and pipeline integrity monitoring.
They are all correct. And they are all describing fundamentally different data architectures that were designed independently, evolved separately, and remain stubbornly disconnected.
This disconnect is not merely an IT inconvenience. It is the single biggest reason why well lifecycle analytics -- using data from drilling through completions through production to inform the next well -- remains aspirational at most operators. The data exists. But it was generated by different systems, in different formats, at different frequencies, owned by different teams, and stored in different databases that do not talk to each other.
This article maps the actual data architecture for each upstream segment, names the specific vendors and systems in use, and confronts the integration challenge that prevents operators from treating a well as a single data object rather than a collection of disconnected datasets.
Drilling Operations: The WITSML-Centric Architecture
Drilling operations run on the most mature real-time data infrastructure in upstream oil and gas. The architecture is built around a single standard -- WITSML (Wellsite Information Transfer Standard Markup Language) -- that has been the backbone of rig-to-office data transfer since the early 2000s.
How Data Flows
The primary data source on a land rig is the Electronic Drilling Recorder (EDR). Pason EDR systems are installed on roughly 60% of active North American land rigs. The EDR captures surface drilling parameters -- weight on bit, rotary RPM, pump pressure, flow rate, torque, rate of penetration -- at intervals as fine as one second. That is 86,400 data points per channel per day, across dozens of channels.
The data flows from the EDR through Pason's DataHub, which aggregates and normalizes the raw signals. From DataHub, the data is transmitted to cloud or on-premises WITSML servers where it becomes accessible to third-party platforms. Corva, the dominant independent drilling analytics platform, ingests this data primarily through Pason integration. SLB's DrillOps, Halliburton's Digital Well Operations, and NOV's MAX platform each maintain their own ingestion pathways, typically tied to their own equipment and service delivery.
Alongside surface data, Measurement While Drilling (MWD) and Logging While Drilling (LWD) tools generate downhole data -- directional surveys every 90 feet, formation evaluation logs every six inches. This data is transmitted to surface via mud pulse telemetry or electromagnetic telemetry at rates between 1 and 40 bits per second -- a bandwidth constraint that shapes the entire downhole data architecture.
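What the surface feed looks like in practice: a WITSML log carries channel mnemonics, units, and comma-separated data rows. The sketch below parses a small WITSML 1.4-style payload into a table. It is illustrative only -- the sample XML is invented for the example, real payloads carry namespaces and many more channels, and WITSML 2.1 feeds stream channels over ETP rather than delivering XML documents:

```python
# Minimal sketch: parse a WITSML 1.4-style time log into a pandas DataFrame.
# The XML sample is invented; real payloads carry namespaces and dozens of
# channels, and WITSML 2.1/ETP feeds stream channel data instead.
import xml.etree.ElementTree as ET
import pandas as pd

WITSML_XML = """<log>
  <logData>
    <mnemonicList>TIME,WOB,RPM,SPP,ROP</mnemonicList>
    <unitList>unitless,klb,rpm,psi,ft/h</unitList>
    <data>2025-01-01T06:00:00Z,22.4,118,3450,212</data>
    <data>2025-01-01T06:00:01Z,22.6,119,3462,215</data>
  </logData>
</log>"""

def parse_witsml_log(xml_text: str) -> pd.DataFrame:
    log_data = ET.fromstring(xml_text).find("logData")
    mnemonics = log_data.find("mnemonicList").text.split(",")
    rows = [row.text.split(",") for row in log_data.findall("data")]
    df = pd.DataFrame(rows, columns=mnemonics)
    df["TIME"] = pd.to_datetime(df["TIME"])
    return df.set_index("TIME").astype(float)

print(parse_witsml_log(WITSML_XML))  # two rows, one second apart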
The Typical Drilling Data Stack
| Layer | Technology | Notes |
|---|---|---|
| Data acquisition | Pason EDR | 1-second surface data; ~60% of North American land rigs |
| Data aggregation | Pason DataHub | Normalizes, stores, transmits EDR data |
| Data transport | WITSML 2.1 over ETP | Standard protocol; replaces older SOAP-based transfer |
| Real-time analytics | Corva | Cloud-native; benchmarking, ILT detection, ROP optimization |
| Well planning | SLB DrillPlan or Halliburton DecisionSpace | Trajectory, casing, hydraulics modeling |
| Directional drilling | Service company proprietary | Halliburton Landmark COMPASS, SDI, SLB in-house tools |
| Drilling automation | NOV NOVOS (on NOV rigs) | Connection automation, slide-rotate optimization |
| Post-well database | WellView (Peloton) | Final well records, BHA records, time-depth data |
| Dashboards | Corva + Spotfire or Power BI | Real-time operations center + engineering analysis |
What Service Companies Provide vs. What Operators Build
This distinction matters and is often misunderstood. The service companies -- SLB, Halliburton, Baker Hughes -- provide integrated drilling software platforms that work best with their own equipment and services. Operators who switch directional drilling providers every few wells need a vendor-neutral analytics layer. This is where Corva fills a critical gap. Corva ingests data regardless of which service company is on the rig, providing consistent analytics across the operator's drilling program.
The result is a hybrid architecture: service-company-provided tools for real-time operations on the current well, and operator-owned tools for cross-well analytics and historical benchmarking.
Real-Time Requirements
Drilling is the most time-sensitive segment. Surface drilling parameters are sampled at one-second intervals. The drilling engineer monitoring from a remote operations center needs to see those parameters within seconds of measurement -- not minutes, not hours. A stuck pipe event can go from manageable to catastrophic in the time it takes to brew coffee.
This real-time requirement shapes the entire architecture. Data flows through WITSML in near-real-time using the Energistics Transfer Protocol (ETP), which replaced the older SOAP-based polling mechanism. Edge computing at the rig site handles local alarms and automated responses. Cloud-based analytics platforms provide the heavier computational workloads -- benchmarking, predictive models, pattern recognition -- with latency tolerances of seconds to minutes.
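As a concrete illustration of the edge-alarm layer, here is a minimal sketch that flags sustained standpipe-pressure deviations from a rolling baseline on one-second data. The window length and threshold are illustrative rather than field-tuned, and a real stuck-pipe or washout detector would combine multiple channels:

```python
# Minimal sketch of an edge-style alarm on 1 Hz surface data: flag samples
# where standpipe pressure departs sharply from its recent rolling baseline.
import pandas as pd

def pressure_anomaly_flags(spp: pd.Series,
                           window: str = "300s",
                           z_threshold: float = 4.0) -> pd.Series:
    """spp: standpipe pressure at ~1 Hz, indexed by timestamp."""
    baseline = spp.rolling(window).mean()
    spread = spp.rolling(window).std().clip(lower=1e-6)  # avoid divide-by-zero
    z_score = (spp - baseline) / spread
    return z_score.abs() > z_threshold  # boolean alarm series
```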
Completions and Frac Operations: The Event-Driven Architecture
If drilling data architecture is a continuous stream, completions data architecture is a series of intense bursts. A horizontal well might take 15 to 25 days to drill, generating continuous data throughout. The completions phase on that same well might involve 40 to 60 frac stages pumped over 5 to 10 days, each stage generating a massive pulse of high-frequency data followed by a pause.
How Data Flows
During a frac stage, the pressure pumping company monitors treating pressure, pump rates (clean and slurry), proppant concentration, fluid volume, and bottomhole pressure (if gauges are available) at sub-second intervals. A single stage pumped at 90 barrels per minute for two hours generates thousands of data points per second across all channels.
The frac company's data acquisition system -- which is proprietary to each company -- captures this data on location. Liberty Energy uses custom-built systems. ProPetro runs AccuFrac, which includes Job Center for remote control and data acquisition, plus Power Center for monitoring their FORCE electric fleet's 165 MW of power generation. NexTier operates NexHub Digital Center for 24/7 monitoring and the EOS platform for real-time workflows, often integrating Corva for visualization.
Real-Time Frac Monitoring
Modern completions monitoring extends well beyond basic treating parameters:
Fiber optic monitoring (DAS/DTS). Distributed acoustic sensing and distributed temperature sensing provide continuous measurements along the entire wellbore during pumping. Fiber shows fluid distribution across perforation clusters, fracture initiation from individual clusters, and inter-stage communication. A single frac stage can generate gigabytes of DAS data.
Frac hit monitoring. Pressure gauges in offset wells (parent wells) detect when hydraulic fractures from the new well communicate with existing production. Liberty Energy built a proprietary system called WellWatch specifically for frac hit monitoring.
Stage-by-stage analytics. Each frac stage produces a discrete dataset: treating pressures, rates, volumes, proppant placement, ISIP (instantaneous shut-in pressure), closure pressure from DFIT analysis, and any downhole measurements available.
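A sketch of that stage rollup, assuming 1 Hz treating data and invented column names. The ISIP here is a crude proxy (average pressure in the first seconds after the pumps stop), not a replacement for proper pressure-falloff analysis:

```python
# Minimal sketch: collapse one stage of treating data into summary metrics.
import pandas as pd

def stage_summary(df: pd.DataFrame) -> dict:
    """df: one stage at ~1 Hz with a datetime index and columns
    'pressure_psi', 'rate_bpm', 'prop_conc_ppa' (lb proppant per gallon)."""
    pumping = df[df["rate_bpm"] > 1.0]
    shut_in = df.loc[pumping.index[-1]:].iloc[1:]  # rows after pumps stop
    min_per_row = 1 / 60.0  # row spacing in minutes, assuming 1 Hz data
    return {
        "max_pressure_psi": pumping["pressure_psi"].max(),
        "avg_rate_bpm": pumping["rate_bpm"].mean(),
        "slurry_volume_bbl": (pumping["rate_bpm"] * min_per_row).sum(),
        "proppant_lb": (pumping["rate_bpm"] * 42  # 42 gallons per barrel
                        * pumping["prop_conc_ppa"] * min_per_row).sum(),
        "isip_psi_approx": shut_in["pressure_psi"].iloc[:5].mean(),
    }
```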
The Typical Completions Data Stack
| Layer | Technology | Notes |
|---|---|---|
| Data acquisition | Service company proprietary | Liberty custom, ProPetro AccuFrac, NexTier EOS |
| Real-time monitoring | Service company data center | NexHub (NexTier), remote monitoring centers |
| Fiber optic | OptaSense, SLB, Halliburton, Silixa | DAS/DTS; gigabytes per stage |
| Frac design | FracPro/Gohfer (Halliburton), StimPlan (NSI), Kinetix (SLB) | Pre-job design + real-time model calibration |
| Completions analytics | Novi Labs, Mangrove (SLB), operator custom | Data-driven design optimization |
| Post-job database | FracTrends (Liberty, 60K+ well database), operator custom | Treatment records, stage data |
| Dashboards | Corva, Spotfire, Power BI | Stage-by-stage analysis, well comparison |
The PDF Problem
At the end of a frac job, the pressure pumping company delivers a job report to the operator. At many companies, this report is a PDF. Not a structured data file. Not an API call. A PDF.
That PDF contains the treating parameters, stage summaries, proppant volumes, fluid volumes, and pressure data for the entire well. The operator's completions engineer reviews it, files it in a shared drive, and may manually enter key parameters into a spreadsheet or database. The rich, high-frequency treating data that was captured at sub-second intervals during the job is compressed into summary statistics and static charts in a document format that no analytics platform can ingest without manual effort.
This is not universal -- larger operators and more sophisticated service companies have moved to structured data delivery. But it remains common enough that it deserves explicit mention. If your completions data arrives as a PDF, your analytics program starts with a data entry problem before it reaches a data science problem.
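When the PDF is at least text-based rather than a scan, that data entry problem can be partially automated. A minimal sketch with pdfplumber, assuming the report contains genuine text tables -- vendor layouts vary enough that any real pipeline needs per-vendor templates:

```python
# Minimal sketch: pull tables out of a text-based PDF job report.
# Scanned reports need OCR first; table layouts differ by vendor.
import pdfplumber
import pandas as pd

def tables_from_job_report(path: str) -> list[pd.DataFrame]:
    frames = []
    with pdfplumber.open(path) as pdf:
        for page in pdf.pages:
            for table in page.extract_tables():
                header, *rows = table
                frames.append(pd.DataFrame(rows, columns=header))
    return frames
```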
Production Operations: The SCADA-Centric Architecture
Production operations data architecture is the most varied across operator sizes, because the production phase lasts years or decades (unlike the weeks of drilling and completions), and the systems in place often reflect technology decisions made a decade or more ago.
How Data Flows
The foundational data source for production operations is SCADA (Supervisory Control and Data Acquisition). RTUs (Remote Terminal Units) installed at wellheads and facilities measure pressure, temperature, flow rate, tank level, and equipment status. These measurements are transmitted to a central SCADA host via radio, cellular, or satellite communication, depending on field infrastructure and remoteness.
The major SCADA platforms in upstream production include Emerson (OpenEnterprise SCADA with ROC RTUs), Weatherford CygNet, eLynx, zdSCADA (acquired by Quorum in March 2025), and Inductive Automation's Ignition. The installed base reflects operator size: larger operators tend to run CygNet or Emerson. Mid-size operators increasingly adopt eLynx (at roughly $10 per asset per month) or zdSCADA.
From SCADA, data flows to a historian -- a time-series database optimized for storing high-frequency sensor data. AVEVA PI System dominates this layer, used by an estimated 85% of top oil and gas companies. PI System is arguably the single most entrenched piece of data infrastructure in the industry. Any modern analytics platform that wants to be relevant in upstream production must integrate with PI.
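For a sense of what PI integration involves, the sketch below pulls a day of recorded values through PI Web API's documented /points and /streams endpoints. The host, tag path, and credentials are placeholders, and many sites use Kerberos rather than basic auth:

```python
# Minimal sketch: fetch a day of historian data via PI Web API.
import requests

BASE = "https://historian.example.com/piwebapi"   # hypothetical host
TAG = r"\\PISERVER\WELL_001.TUBING_PRESSURE"      # hypothetical tag path

def recorded_values(session: requests.Session, tag_path: str) -> list[dict]:
    point = session.get(f"{BASE}/points", params={"path": tag_path}).json()
    resp = session.get(f"{BASE}/streams/{point['WebId']}/recorded",
                       params={"startTime": "*-1d", "endTime": "*"})
    return resp.json()["Items"]  # [{'Timestamp': ..., 'Value': ...}, ...]

session = requests.Session()
session.auth = ("svc_account", "secret")  # site-specific; often Kerberos/NTLM
```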
The Morning Report Workflow
The morning report deserves specific attention because it is the most universal workflow in production operations and because it illustrates how the data architecture actually functions (or fails to function) in practice.
Every production engineering team in the industry produces some version of a daily morning report. This report typically includes: which wells are currently shut in and why, production volumes for the previous day (or most recent test data), wells that are underperforming relative to expectation, artificial lift issues, facility status, and a prioritized list of field activities for the day.
Building this report requires pulling data from multiple systems: SCADA for real-time well status, production accounting software for allocated volumes, artificial lift monitoring for equipment health, and the engineer's own knowledge of field conditions that are not captured in any database. At many mid-size operators, this process takes two to three hours every morning and involves a significant amount of manual work.
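The mechanical core of that work is a set of joins across systems that do not share keys. A minimal sketch, with table and column names invented to stand in for SCADA, production accounting, and forecast exports, all pre-normalized to a common well identifier:

```python
# Minimal sketch: the joins behind a morning report.
import pandas as pd

def morning_report(scada: pd.DataFrame, allocated: pd.DataFrame,
                   forecast: pd.DataFrame) -> pd.DataFrame:
    """All inputs keyed on a normalized 'api14' well identifier."""
    report = (scada[["api14", "status", "shut_in_reason"]]
              .merge(allocated[["api14", "oil_bbl", "gas_mcf"]], on="api14")
              .merge(forecast[["api14", "expected_oil_bbl"]], on="api14"))
    report["oil_variance_pct"] = 100 * (
        report["oil_bbl"] / report["expected_oil_bbl"] - 1)
    return report.sort_values("oil_variance_pct")  # worst offenders first
```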
The Typical Production Data Stack
| Layer | Technology | Notes |
|---|---|---|
| Field sensors | Pressure, temperature, flow, level, vibration | Wellhead, separator, tank battery, compressor |
| RTU/communication | Emerson ROC, ABB, Allen-Bradley | Radio, cellular, satellite backhaul |
| SCADA host | CygNet, eLynx, zdSCADA, Ignition, Emerson OpenEnterprise | Central monitoring and control |
| Historian | AVEVA PI System (85% market share) | Time-series storage; years of data |
| Production surveillance | OspreyData, Weatherford ForeSite, Ambyint | AI anomaly detection, workflow management |
| Artificial lift optimization | Ambyint (rod pump/ESP), SLB Lift IQ (ESP), Weatherford | Continuous parameter optimization |
| Production accounting | ProdView (Peloton), Quorum ODA, OGsys | Allocation, regulatory reporting |
| Economics/forecasting | PHDWin, Aries, Enverus | Decline curve analysis, reserves |
| Dashboards | Spotfire + Power BI (co-dominant) | Engineering analysis + distribution |
Artificial Lift Surveillance
The majority of producing wells in North America require artificial lift -- rod pumps, ESPs, gas lift, plunger lift, or progressive cavity pumps. Each lift type generates characteristic data:
Rod pumps produce dynamometer cards (surface and downhole) that show the pump's load-displacement cycle. These cards are diagnostic -- an experienced engineer can identify gas interference, fluid pound, rod parting, tubing leak, and other problems from the card shape. Ambyint and others have applied machine learning to automate card classification and optimize pump settings.
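One of the simplest shape features such classifiers use is the enclosed area of the load-displacement loop, which corresponds to the work done per stroke. A sketch using the shoelace formula -- real systems extract many such features or classify the card image directly:

```python
# Minimal sketch: enclosed area of a dynamometer card via the shoelace formula.
import numpy as np

def card_area(position_in: np.ndarray, load_lb: np.ndarray) -> float:
    """Points trace the card loop in order; np.roll closes the loop."""
    x, y = np.asarray(position_in), np.asarray(load_lb)
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))
```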
ESPs generate motor current, intake pressure, discharge pressure, vibration, and temperature data from downhole gauges. SLB's Lift IQ provides the deepest diagnostics for SLB-manufactured ESPs.
Gas lift optimization requires surface injection rates and pressures, casing pressure measurements, and production rate data. The optimization problem -- allocating limited gas lift gas across multiple wells to maximize total production -- is a classical constrained optimization problem well suited to AI approaches.
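To make that concrete, here is a toy allocation: three wells with diminishing-returns lift responses and a fixed gas supply, solved with scipy. The response curves and numbers are invented stand-ins for measured gas lift performance curves:

```python
# Minimal sketch: allocate a fixed gas-lift supply across wells.
import numpy as np
from scipy.optimize import minimize

# Per well: (max incremental oil bbl/d, injection scale mscf/d). Invented.
WELLS = [(450.0, 800.0), (300.0, 500.0), (620.0, 1200.0)]
GAS_AVAILABLE = 1500.0  # mscf/d of lift gas to allocate

def total_production(inj: np.ndarray) -> float:
    # Diminishing-returns response: q = q_max * (1 - exp(-inj / scale))
    return sum(q_max * (1 - np.exp(-g / scale))
               for g, (q_max, scale) in zip(inj, WELLS))

result = minimize(
    lambda g: -total_production(g),  # maximize by minimizing the negative
    x0=np.full(len(WELLS), GAS_AVAILABLE / len(WELLS)),
    bounds=[(0.0, GAS_AVAILABLE)] * len(WELLS),
    constraints=[{"type": "ineq",
                  "fun": lambda g: GAS_AVAILABLE - g.sum()}],
)
print(result.x.round(0))  # mscf/d per well, summing to <= 1,500
```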
The artificial lift monitoring layer is typically separate from the general SCADA/historian architecture, running on the lift equipment manufacturer's software. This creates yet another data silo that the production engineer must check alongside the main SCADA dashboard.
Service Companies: Fleet Management and the Data Handoff Problem
Service companies operate a fundamentally different data architecture than operators, because their business model is fundamentally different. An operator manages a portfolio of wells that produce for years. A service company manages a fleet of equipment that moves from job to job, serving multiple operators across a basin.
The Data Handoff Problem
The most consequential data architecture failure in the upstream industry is not within any single segment -- it is between segments. And nowhere is this more visible than in the data handoff from service company to operator.
When a frac company finishes a well, they possess high-frequency, sub-second treating data across every stage. When a directional drilling company finishes a well, they possess detailed survey data, geosteering logs, and MWD/LWD measurements. When an artificial lift company installs an ESP, they possess the equipment specifications, installation parameters, and initial operating conditions.
How much of this data reaches the operator in a structured, machine-readable format? In many cases, not much. The frac company delivers a PDF job report. The directional drilling company delivers survey files in a proprietary format. The artificial lift company provides an installation report and connects the equipment to a monitoring platform that operates independently from the operator's SCADA system.
The root cause is economic, not technical. Structured data delivery costs money -- data engineering, API development, quality assurance, customer integration support. For service companies operating on thin margins, investing in data delivery infrastructure has historically been seen as a cost center, not a revenue driver. This is changing. Operators are increasingly selecting service companies that bring data capabilities alongside their equipment.
Midstream: Custody Transfer, Compression, and Pipeline Integrity
Midstream operations -- gathering, processing, and transporting hydrocarbons from the wellhead to market -- have data architecture requirements that differ substantially from upstream production, even though the two are physically connected.
Different Data Requirements
Measurement accuracy matters more. In upstream production, a 5% uncertainty in a well's flow rate is normal and manageable. In midstream custody transfer, a 0.5% measurement error can mean millions of dollars in misallocated revenue per year across a gathering system.
Linear asset management. Upstream production manages point assets (wells, facilities). Midstream manages linear assets (pipelines) that can extend for hundreds of miles. The data architecture must associate measurements with specific locations along a pipeline and track conditions over distance.
Compression is the critical variable. In midstream gathering systems, compression availability determines throughput. A compressor station going down can affect dozens of upstream producers. Predictive maintenance for reciprocating compressors is a high-value AI application.
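The custody transfer stakes above are easy to verify with back-of-envelope arithmetic; the throughput and price below are illustrative:

```python
# Why 0.5% matters at custody transfer (illustrative numbers).
throughput_mcf_per_day = 200_000   # gathering system gas throughput
price_per_mcf = 3.00               # $/mcf, assumed
bias = 0.005                       # 0.5% measurement error

annual_misallocation = throughput_mcf_per_day * 365 * price_per_mcf * bias
print(f"${annual_misallocation:,.0f} per year")  # -> $1,095,000 per year
```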
Segment Comparison: Side-by-Side
This table puts the five segments next to each other. The differences in data sources, frequencies, storage, and analytics tools are striking -- and they explain why cross-segment data integration is so difficult.
| Dimension | Drilling | Completions / Frac | Production | Service Companies | Midstream |
|---|---|---|---|---|---|
| Primary data source | Pason EDR, MWD/LWD tools | Frac data acquisition, fiber optic (DAS/DTS) | SCADA RTUs, artificial lift controllers | Equipment telemetry, job execution systems | Custody transfer meters, compressor monitors |
| Update frequency | 1 second (surface); 90 ft (surveys) | Sub-second during pumping; event-driven between stages | 15 sec to 15 min (SCADA); daily (production accounting) | Real-time during jobs; daily for fleet | 1-15 sec (SCADA); hourly (measurement accounting) |
| Data volume per well | 50-200 GB | 10-100+ GB (treating + fiber) | 1-5 GB/year ongoing | Captured per job, not per well | Per pipeline segment |
| Primary standard | WITSML 2.1 / ETP | None widely adopted | PRODML (limited); OPC-UA | Proprietary | API/AGA measurement standards |
| Data lifespan | Weeks (active drilling) | Days (active frac) | Years to decades | Per job (weeks) | Decades (infrastructure life) |
The Integration Challenge: How Data Flows Between Segments
The most valuable analytics in upstream oil and gas require data that crosses segment boundaries. Did our completions design changes improve production performance? How does drilling quality affect completions execution? Which artificial lift strategy works best given the well's completion design and reservoir characteristics?
These questions require connecting drilling data to completions data to production data -- and the industry is remarkably bad at doing this.
The Well Lifecycle Data Problem
A well passes through three operational phases -- drilling, completions, production -- each managed by different teams using different systems. In a well-organized operator, the data handoff looks like this:
Drilling to completions. The drilling engineer delivers a wellbore: trajectory surveys, formation tops, mud log, cement evaluation log, casing tallies. This data should inform completions design. In practice, the completions engineer often re-derives much of this information from raw data because the drilling database and the completions design tools do not share a common data model.
Completions to production. The completions engineer delivers a stimulated well: stage locations, treating parameters per stage, proppant volumes, fluid volumes, ISIP data, any fiber optic or microseismic results. In practice, the production engineer often receives a summary (or a PDF) and builds their own expectations from type curves and offset wells rather than from the actual completions data for that specific well.
Production back to drilling and completions. Over time, the producing well generates the data that reveals whether the drilling and completions decisions were correct. But feeding production performance back into drilling and completions design for the next well requires connecting production databases with drilling databases and completions databases. This cross-database connection is where most operators' data architecture fails.
Why Integration Is So Hard
The barriers are not primarily technological. They are organizational and structural:
- Different teams, different systems. Drilling engineering, completions engineering, and production engineering are typically separate teams with separate budgets, separate software licenses, and separate data management practices.
- Different time scales. Drilling data is generated over weeks. Completions data is generated over days. Production data is generated over years. The time-series characteristics of each dataset are fundamentally different.
- Different vendors, different data models. Pason's data model for drilling parameters does not map naturally to Liberty's data model for frac treating parameters, which does not map naturally to CygNet's data model for production SCADA.
- Service company boundaries. The drilling data belongs partly to the drilling service companies. The completions data belongs partly to the pressure pumping company. The production data belongs to the operator. Assembling a complete well dataset requires collecting data from multiple entities.
- No universal well identifier. There is no single, consistent identifier that connects a well across all systems. The API number should serve this purpose, but in practice, wells are identified differently in different systems. The first step in any cross-segment analysis is building a well-matching table, which is exactly as tedious as it sounds (a sketch of the identifier normalization follows this list).
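The easy part of that table -- normalizing identifier formats to a 14-digit API number (state, county, well, wellbore, completion event) -- looks like this; the hard part, fuzzy-matching wells that lack clean API numbers, is left out:

```python
# Minimal sketch: normalize well identifiers to a 14-digit API number.
import re

def to_api14(raw: str) -> str:
    digits = re.sub(r"\D", "", raw)  # strip dashes, spaces, prefixes
    if len(digits) == 10:
        return digits + "0000"       # assume original wellbore + completion
    if len(digits) == 12:
        return digits + "00"
    if len(digits) == 14:
        return digits
    raise ValueError(f"not an API number: {raw!r}")

assert to_api14("42-123-45678") == "42123456780000"
assert to_api14("42123456780100") == "42123456780100"
```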
What Operators Who Get It Right Actually Do
The operators who have solved (or at least significantly improved) the cross-segment data integration problem share several characteristics:
- A unified well data model. They have invested in a common data model that represents a well as a single object with drilling, completions, and production attributes; a minimal sketch follows this list. Peloton's WellView Allez platform is designed for this purpose. OSDU (Open Subsurface Data Universe) aspires to be the industry standard, but adoption remains limited.
- Data engineering investment. They employ data engineers (not just petroleum engineers who write Python scripts) to build and maintain the pipelines that connect segment-specific databases. Permian Resources' investment in Databricks, Dagster, and dbt is a concrete example.
- Structured data delivery requirements in service contracts. They contractually require service companies to deliver data in structured, machine-readable formats with agreed-upon field naming conventions and delivery timelines.
- A single source of truth for well attributes. They maintain a master data management system where well identifiers, locations, formation tops, and key operational dates are managed centrally and referenced by all other systems.
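In code, that unified data model can start as simply as one object per well with segment-specific attribute groups. The field names below are illustrative, not OSDU or WellView schema:

```python
# Minimal sketch of a unified well object keyed on one identifier.
from dataclasses import dataclass, field

@dataclass
class WellRecord:
    api14: str
    drilling: dict = field(default_factory=dict)     # spud_date, TD, surveys...
    completions: dict = field(default_factory=dict)  # stage_count, proppant_lb...
    production: dict = field(default_factory=dict)   # first_prod_date, cum_oil...

well = WellRecord(api14="42123456780000",
                  drilling={"spud_date": "2024-03-01"},
                  completions={"stage_count": 52})
```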
Practical Implications: What This Means for Your Organization
If You Are an Operator
- Audit the handoff points. Map exactly how data flows from drilling to completions to production within your organization. Where does structured data become a PDF? Where does an engineer manually re-enter numbers? Those handoff points are where data value is destroyed and where automation investment will pay the highest returns.
- Standardize well identifiers. Pick one well identification scheme and enforce it across all systems. This is an IT governance problem, not a technology problem, and it is a prerequisite for any cross-segment analytics initiative.
- Require structured data from service companies. Put data delivery requirements in your service contracts. Specify format (CSV, Parquet, API access), field naming conventions, delivery timeline, and data quality expectations.
- Start with the drilling-to-completions link. Of the three cross-segment handoffs, drilling-to-completions is the most tractable and has the most immediate value.
If You Are a Service Company
- Structured data delivery is becoming a competitive requirement. Operators are selecting service companies partly on data capabilities. If your data delivery is a PDF, you are at a disadvantage against competitors who provide API access or structured data files.
- Your proprietary data is your most undervalued asset. The data you accumulate across thousands of jobs is a strategic asset that most service companies underinvest in. Building analytics on that data is the highest-ROI technology investment most service companies can make.
If You Are Building Technology for the Industry
- Integration is the product. The individual segments are reasonably well served by existing software. What the industry lacks is the connective tissue between segments. Products that bridge WITSML drilling data to completions treatment data to SCADA production data address the most painful gap in the current technology landscape.
- PI System integration is mandatory. With 85% market penetration among major operators, any analytics product that cannot ingest data from AVEVA PI System is not a serious product for the upstream market.
The most impactful AI projects in oil and gas are not the ones with the most sophisticated algorithms. They are the ones that successfully connect data across segment boundaries -- linking drilling data to completions data to production data into a unified well lifecycle view. That data architecture -- the mundane, unglamorous work of well matching, time alignment, format conversion, and quality assurance -- is where the real competitive advantage lies.
Need help building cross-segment data architecture for your operations? Get in touch.