Editorial disclosure
This article reflects the independent analysis and professional opinion of the author, informed by published research, vendor documentation, industry surveys, and practitioner experience. No vendor reviewed or influenced this content prior to publication. Product capabilities described are based on publicly available information and may not reflect the latest release.
A Permian Basin well pad generates roughly 1-2 GB of raw sensor data per day. That includes tubing pressure sampled every few seconds, casing pressure, flow rates from multiphase meters, rod pump controller diagnostics, tank levels, ambient temperature, and gas detection readings. Multiply that by 500 wells across a lease, and you have a fire hose of data pointed at a cellular modem with the bandwidth of a 2005 home internet connection.
This is the fundamental tension in oilfield data ingestion: the places that generate the most data have the least infrastructure to move it. A rig in the Delaware Basin is not a data center in Dallas. It is a remote site with unreliable cellular coverage, no fiber, extreme temperatures, and equipment that was installed before anyone was talking about the "digital oilfield." Getting data from that site into a cloud analytics environment where it can actually be useful is a harder engineering problem than most people outside the industry realize.
This article is the second in an eight-part series on oil and gas data architecture. The first article covered data sources and field systems -- the SCADA, RTU, EDR, and sensor systems that generate the data. This article covers what happens next: how that data gets from the wellsite to the cloud. It covers the protocols, brokers, gateways, edge compute platforms, and cloud ingestion services that make up Layer 2 of the upstream data pipeline, along with the bandwidth constraints, architecture patterns, and practical trade-offs that determine what actually works in the field.
The Ingestion Problem in Oil & Gas
Before we get into specific technologies, it is worth understanding why data ingestion in upstream oil and gas is fundamentally different from data ingestion in most other industries.
Distance and isolation. Wells spread across hundreds of square miles of remote terrain. No wired internet. Inconsistent power. Many sites run on solar panels and batteries.
Bandwidth constraints. A well pad with line-of-sight to a cell tower might get 10-20 Mbps on LTE. A pad behind a ridge might get 500 Kbps or lose connectivity for hours. Most wellsite data systems assume connectivity is unreliable and bandwidth is limited.
Protocol fragmentation. A Modbus RTU from 2010 communicates differently than an OPC-UA controller from 2024. Drilling data flows through WITSML. Production data might come through Modbus TCP, HART, or proprietary serial protocols. Getting all of this into a common format requires protocol translation at the edge.
Mixed latency requirements. A gas detection alarm needs to trigger in seconds. A rod pump failure prediction needs data within minutes. A decline curve update can wait for the daily allocation run. The architecture must support multiple latency tiers simultaneously.
Protocols: The Language of Oilfield Data
Understanding the protocol landscape is essential before evaluating any ingestion technology. The protocols determine what data you can access, how fast you can access it, and how much engineering work is required to move it.
Modbus (TCP and RTU)
Modbus has been around since 1979 and is everywhere. The majority of RTUs and PLCs at wellsites still speak Modbus in some form. It is simple, reliable, and has minimal overhead. It is also completely unstructured -- Modbus gives you register addresses and raw values with no metadata, no timestamps from the source, and no data model. You need to know that register 40001 is tubing pressure in PSI, because the protocol will not tell you. Any system pulling data from Modbus devices needs a register map, and managing these maps across hundreds of wellsites with different RTU configurations is a significant operational burden.
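The register-map problem can be made concrete with a short sketch. The addresses, tag names, and scale factors below are hypothetical -- every RTU vendor and site configuration has its own map, which is exactly the maintenance burden described above:

```python
# Hypothetical register map for one RTU model. In practice each vendor and
# site configuration has its own, and it must be version-controlled per site.
REGISTER_MAP = {
    40001: {"tag": "tubing_pressure", "unit": "psi", "scale": 0.1},
    40002: {"tag": "casing_pressure", "unit": "psi", "scale": 0.1},
    40003: {"tag": "flow_rate", "unit": "mcf_d", "scale": 1.0},
}

def decode_registers(raw: dict[int, int]) -> dict[str, float]:
    """Turn anonymous Modbus register values into named, scaled readings."""
    readings = {}
    for address, value in raw.items():
        meta = REGISTER_MAP.get(address)
        if meta is None:
            continue  # unmapped register -- log and investigate in production
        readings[meta["tag"]] = value * meta["scale"]
    return readings

# Raw values exactly as a Modbus poll returns them: unitless 16-bit integers.
print(decode_registers({40001: 2375, 40002: 5120, 40003: 850}))
```

Without the map, `2375` is just a number; with it, it is 237.5 PSI of tubing pressure. Keeping hundreds of these maps accurate is the real work.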
OPC-UA (Unified Architecture)
OPC-UA is the industry standard for structured, secure, real-time data exchange between industrial systems and higher-level applications. Unlike Modbus, OPC-UA carries rich metadata: name, description, data type, unit of measure, quality indicator, and timestamp for each data point. Its PubSub extension (Part 14) enables MQTT bridging, which is critical for modern edge-to-cloud architectures. In upstream, OPC-UA bridges the gap between legacy field equipment (Modbus/HART) and modern cloud platforms.
MQTT (Message Queuing Telemetry Transport)
MQTT is the protocol that changed oilfield data ingestion. Designed in 1999 for monitoring oil pipelines via satellite, MQTT was purpose-built for the constraints that upstream operators face daily: low bandwidth, unreliable connectivity, limited device resources. MQTT 5.0, standardized in 2019 and now widely adopted, adds features like shared subscriptions, message expiry, and topic aliases that make it even more suitable for large-scale oilfield deployments.
Key characteristics: publish/subscribe model that decouples producers from consumers; three QoS levels (at-most-once for bulk sensor data, at-least-once for production data, exactly-once for critical alarms); persistent sessions that queue messages during connectivity outages; and 2-5 byte packet headers vs. hundreds for HTTP -- a difference that matters on metered cellular connections across thousands of data points.
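The header-size difference sounds trivial until you multiply it out. A back-of-envelope calculation (the message counts and byte sizes are illustrative assumptions, not measurements) shows why it matters on metered cellular:

```python
def daily_overhead_mb(messages_per_day: int, header_bytes: int) -> float:
    """Protocol overhead alone, excluding payload, in MB per day."""
    return messages_per_day * header_bytes / 1_000_000

# Assume 500 wells x 50 tags, one reading per tag per minute.
msgs = 500 * 50 * 60 * 24            # 36,000,000 messages/day
mqtt = daily_overhead_mb(msgs, 5)    # ~5 bytes of MQTT fixed header + topic alias
http = daily_overhead_mb(msgs, 300)  # ~300 bytes of typical HTTP/1.1 headers

print(f"MQTT: {mqtt:.0f} MB/day, HTTP: {http:.0f} MB/day")
# MQTT: 180 MB/day, HTTP: 10800 MB/day
```

Under these assumptions, HTTP spends more than 10 GB per day just on headers -- roughly the entire daily capacity of a 1 Mbps link -- before a single byte of sensor data moves.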
SparkplugB
SparkplugB is a specification built on top of MQTT that adds standardized topic namespaces, Protocol Buffers payloads, device birth/death certificates, and state management. Think of it as "MQTT with an opinion about how industrial data should be organized." Adoption is growing, particularly with Ignition (Inductive Automation) deployments that have built-in SparkplugB support.
WITSML and ETP
For drilling data, WITSML remains the industry standard, with version 2.1 the current release. The newer Energistics Transfer Protocol (ETP 1.2) replaces WITSML's original SOAP/XML transfer with WebSocket-based near-real-time streaming. This is covered in detail in our WITSML deep dive.
MQTT Brokers: The Hub of Oilfield Data Flow
An MQTT broker is the central message routing system that receives data from field devices (publishers) and distributes it to consuming applications (subscribers). For an operator running hundreds or thousands of wells, the MQTT broker is one of the most critical pieces of infrastructure -- if it goes down, data stops flowing from the field.
HiveMQ
HiveMQ is the enterprise MQTT broker most commonly referenced in oil and gas. It provides clustering for high availability, an enterprise bridge for connecting brokers across sites, native Kafka integration (the architecture pattern progressive operators are moving toward), full MQTT 5.0 support, and persistence that survives broker restarts. Licensing is based on connected devices and throughput, aligning well with the operator model of scaling by wells.
EMQX
EMQX is the leading open-source alternative, capable of handling millions of concurrent MQTT connections per cluster. Both community and enterprise editions are available, with the enterprise version offering clustering and persistence comparable to HiveMQ. For operators with engineering teams comfortable managing open-source infrastructure, EMQX is a strong option at significantly lower cost.
AWS IoT Core and Azure IoT Hub
Both major cloud providers offer managed MQTT services. AWS IoT Core and Azure IoT Hub provide MQTT endpoints with automatic integration into their respective cloud ecosystems. The advantage is zero infrastructure management. The disadvantage is that they require direct internet connectivity and do not provide local buffering during outages unless paired with an edge gateway. For operators with reliable cellular, cloud-managed MQTT is the simplest architecture. For spotty coverage areas, a local broker with store-and-forward to the cloud is more resilient.
Mosquitto
Eclipse Mosquitto is a lightweight, open-source broker that runs on minimal hardware, ideal as a local broker at the edge that aggregates data from multiple devices at a pad before forwarding to a central broker or cloud service. Many architectures use Mosquitto at the wellsite and HiveMQ or a cloud service at the hub.
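As a sketch of that pattern, the bridge section of the pad broker's `mosquitto.conf` might look like the following (the hostname and topic prefix are placeholders; TLS and authentication directives are omitted for brevity):

```
# mosquitto.conf on the pad broker -- bridge to the central/cloud broker.
connection pad07-to-hub
address hub.example.internal:8883
topic field/pad07/# out 1
cleansession false
```

The `topic` line forwards everything under the pad's namespace outbound at QoS 1, and `cleansession false` keeps queued messages across reconnects, which is what provides store-and-forward behavior during cellular outages.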
OPC-UA Gateways: Bridging Legacy Equipment to Modern Protocols
The reality at most wellsites is that existing RTUs and PLCs speak Modbus, HART, or proprietary protocols, not MQTT or OPC-UA. OPC-UA gateways are the translation layer that converts legacy protocols into structured, metadata-rich data that modern ingestion systems can consume.
Kepware (KEPServerEX) -- PTC
KEPServerEX is the most widely deployed OPC gateway in the industry, supporting over 150 device drivers for virtually every industrial protocol. It acts as an OPC-UA server and includes an IoT Gateway module that publishes directly to MQTT brokers or cloud services. In a typical upstream deployment, Kepware runs on a Windows server at a central facility, connects to RTUs via Modbus, translates data to OPC-UA tags, and forwards to MQTT or cloud ingestion. The main limitation is Windows dependency, though PTC has been adding containerized deployment options.
Softing
Softing's uaGate series are dedicated hardware gateways that convert Modbus, PROFINET, and EtherNet/IP to OPC-UA and MQTT. Unlike Kepware's software-on-PC approach, Softing devices are hardened for industrial environments -- ideal for operators who want a simple gateway at each facility without managing Windows PCs in the field.
HMS Networks (Anybus) and Matrikon (Honeywell)
HMS Networks' Anybus X-gateway provides hardware-based, configuration-driven protocol translation (Modbus to OPC-UA, Modbus to MQTT). Matrikon, now part of Honeywell, specializes in OPC-UA tunneling for environments where data must traverse firewalls or WAN connections securely -- common when connecting remote sites to central SCADA.
The Gateway Decision Matrix
| Scenario | Recommended Approach |
|---|---|
| Single facility, many protocol types | KEPServerEX (breadth of drivers) |
| Remote wellsite, no Windows infrastructure | Softing uaGate or Anybus hardware gateway |
| Multiple sites, need secure WAN traversal | Matrikon OPC-UA tunneling |
| New installation, clean-sheet design | Native MQTT/SparkplugB devices (skip OPC-UA translation) |
| Edge compute already deployed | Software gateway (Kepware or open-source) running on edge device |
Edge Computing: Intelligence at the Wellsite
Edge computing in oil and gas means running compute workloads at or near the wellsite rather than sending all data to the cloud for processing. The motivations are straightforward: reduce bandwidth consumption, enable real-time local decision-making, maintain operation during connectivity outages, and filter or compress data before transmission.
Why Edge Matters in Upstream
Consider a rod pump controller generating vibration data at 1 kHz (1,000 samples per second) for predictive maintenance. Sending that raw data to the cloud would require roughly 86 MB per day per sensor -- multiply by 10 sensors per well and 500 wells, and you need 430 GB per day of upstream bandwidth. That is not feasible over cellular. But if an edge device at the wellsite runs a simple vibration signature model, it can reduce the data to a few KB of summary statistics and anomaly flags per hour. The raw data stays at the edge (or is discarded) and only actionable information goes to the cloud.
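A minimal sketch of that reduction, using only standard-library math (the feature set and window size are illustrative; a production model would be tuned to the pump and sensor in question):

```python
import math
import statistics

def summarize_window(samples: list[float]) -> dict[str, float]:
    """Reduce one window of raw vibration samples to a few summary features.
    These floats replace thousands of raw samples on the cellular uplink."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    peak = max(abs(s) for s in samples)
    return {
        "rms": rms,
        "peak": peak,
        "crest_factor": peak / rms,   # spikiness -- rises with impact events
        "mean": statistics.fmean(samples),
    }

# One second of simulated 1 kHz data: low-level 25 Hz baseline vibration
# plus one injected transient spike, standing in for a mechanical impact.
window = [0.01 * math.sin(2 * math.pi * 25 * t / 1000) for t in range(1000)]
window[500] = 0.5  # injected impact event
features = summarize_window(window)
print({k: round(v, 4) for k, v in features.items()})
```

One thousand raw samples become four floats; a high crest factor flags the anomaly, and the decision about whether to upload the raw window can then be made at the edge.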
This pattern -- compute at the edge, transmit summaries to the cloud -- is the fundamental architecture for bandwidth-constrained oilfield environments.
Azure IoT Edge
Azure IoT Edge is the dominant edge platform in upstream oil and gas, a direct result of Microsoft Azure's 57% cloud market share among operators. Azure IoT Edge runs containerized workloads on Linux or Windows devices at the edge, managed from the Azure IoT Hub cloud service.
Key capabilities: ML model deployment at the edge (train in Azure ML, deploy as containers via IoT Hub), local data buffering during connectivity outages, pre-built OPC-UA Publisher module, and device management across hundreds of sites.
Chevron has been the highest-profile adopter, scaling from pilot to multi-site deployment. The Chevron pattern -- Azure IoT Edge at wellsites, IoT Hub for management, Event Hubs for streaming, ADLS Gen2 for storage -- has become the reference architecture. The limitation is that Azure IoT Edge requires Linux/Windows on industrial PCs, with higher per-site hardware cost than simple MQTT gateways.
AWS IoT Greengrass
AWS IoT Greengrass is Amazon's edge computing platform and the primary alternative for operators whose cloud strategy is AWS-based. Greengrass runs on Linux devices and provides local compute, messaging, ML inference, and data sync capabilities similar to Azure IoT Edge.
Greengrass includes Lambda at the edge, a local MQTT broker for device-to-device communication without cloud connectivity, ML inference via SageMaker Neo, and stream management with configurable batching and compression. OSDU on AWS uses Greengrass for edge collection, and Baker Hughes' Leucipa runs on AWS, making Greengrass the natural choice for operators in that ecosystem. Lower adoption than Azure IoT Edge overall, but significant among AWS-committed operators.
FogHorn Lightning (Johnson Controls)
FogHorn, acquired by Johnson Controls, is a specialized edge AI platform built specifically for industrial environments. Where Azure IoT Edge and Greengrass are general-purpose edge platforms adapted for industrial use, FogHorn was designed from the ground up for streaming sensor data analytics.
FogHorn's CEP engine evaluates multi-variable rules on streaming sensor data -- for example, triggering when tubing pressure drops below 200 PSI AND casing pressure exceeds 500 PSI AND rod pump load matches an anomaly signature, all at the edge without cloud connectivity. It runs on minimal hardware including ARM devices. The trade-off: less cloud ecosystem integration than Azure IoT Edge or Greengrass.
Litmus Edge
Litmus Edge connects to over 250 types of PLCs and industrial controllers out of the box, solving the "connect everything" problem for operators with diverse field equipment. It provides data normalization, edge analytics, multi-cloud connectors, and no-code configuration. Litmus replaces the combination of OPC-UA gateways, MQTT brokers, and edge compute in a single platform -- simpler architecture, but single vendor dependency.
Emerson Plantweb Edge and SignalFire
Emerson's edge offering (Plantweb Optics) is tightly integrated with DeltaV DCS and ROC RTUs -- the natural choice for Emerson-standardized operators, but less protocol-agnostic for mixed environments.
SignalFire takes a fundamentally different approach: low-power wireless sensors with built-in LTE-M/NB-IoT modems that publish directly to MQTT/SparkplugB brokers. No RTU, no gateway, no edge computer. For wellsites with no existing monitoring (still common among small operators), this is the fastest path to data.
Edge Platform Comparison
| Platform | Best For | ML at Edge | Cloud Tie-In |
|---|---|---|---|
| Azure IoT Edge | Azure-committed operators | Yes (Azure ML) | Azure |
| AWS Greengrass | AWS-committed operators | Yes (SageMaker Neo) | AWS |
| FogHorn Lightning | Real-time pattern detection | Yes (CEP engine) | Cloud-agnostic |
| Litmus Edge | Multi-vendor field equipment | Basic | Multi-cloud |
| Emerson Edge | Emerson DeltaV/ROC shops | Yes | Multi-cloud |
| SignalFire | Greenfield monitoring | No | MQTT broker |
Cloud Ingestion: Landing Data in the Analytics Environment
Once data leaves the edge -- whether via MQTT, REST APIs, or direct upload -- it needs to land in a cloud environment where it can be stored, processed, and analyzed. The cloud ingestion layer handles receiving, buffering, and routing this data.
Azure Event Hubs (Dominant)
Azure Event Hubs is the most widely used cloud streaming ingestion service in upstream oil and gas, driven by Azure's overall 57% market share in the sector. Event Hubs is a fully managed event streaming platform capable of receiving millions of events per second with low latency.
Event Hubs receives MQTT data via IoT Hub, OPC-UA data via the Edge OPC Publisher module, batch uploads via REST APIs, and WITSML data via custom connectors. It also supports Apache Kafka protocol, so existing Kafka applications connect without modification.
The typical pattern: field devices publish MQTT to IoT Hub, which routes to Event Hubs, where Stream Analytics or Databricks Structured Streaming processes data in near-real-time before landing in Azure Data Lake Storage Gen2. Equinor's Omnia platform uses exactly this architecture.
Apache Kafka / Confluent
Apache Kafka deserves special attention in the oil and gas context because of one feature that is critical for unreliable network environments: persistent, replayable logs. When data is published to a Kafka topic, it is stored on disk with a configurable retention period (days, weeks, or forever). Any consumer can read from any point in the log, replay data, or recover from failures without data loss. This is fundamentally different from a message queue where messages are deleted after consumption.
This means no data loss during outages (data persists in the log), multiple independent consumers from one stream (alerting, allocation, and ML training all read the same data), and replay for debugging (re-read raw data when an anomaly model misfires). Kafka Connect provides pre-built connectors for OPC-UA, MQTT, Modbus, and databases. Confluent Cloud removes the operational burden of self-managed Kafka, which is notoriously complex.
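To make the replay and multi-consumer semantics concrete, here is a toy in-memory log -- not Kafka itself, just an illustration of the consumption model that distinguishes a log from a queue:

```python
class ToyLog:
    """Minimal append-only log illustrating Kafka's consumption model:
    records persist after being read, and each consumer group tracks
    its own offset independently of every other group."""

    def __init__(self):
        self._records = []
        self._offsets = {}  # consumer group -> next offset to read

    def append(self, record):
        self._records.append(record)

    def poll(self, group: str, max_records: int = 10):
        start = self._offsets.get(group, 0)
        batch = self._records[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch

    def seek(self, group: str, offset: int):
        """Replay: rewind a consumer group to any retained offset."""
        self._offsets[group] = offset

log = ToyLog()
for psi in (2375, 2380, 2390):
    log.append({"tag": "tubing_pressure", "psi": psi})

print(log.poll("alerting"))       # alerting reads all three records
print(log.poll("ml_training"))    # an independent group reads the same data
log.seek("alerting", 0)
print(log.poll("alerting"))       # replay from the beginning
```

In a queue, the second consumer would find nothing and the replay would be impossible; in a log, both are routine. That is the property that matters when a model misfires and you need to re-run last week's raw data.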
The pragmatic Azure guidance: use Event Hubs for straightforward sensor data ingestion and add Kafka when you need replay, multi-consumer patterns, or complex stream processing.
AWS Kinesis and Google Pub/Sub
AWS Kinesis is the equivalent for AWS-based operators. Kinesis Data Firehose is remarkably simple for basic needs (collect sensor data, land in S3), while Kinesis Data Streams paired with Lambda provides real-time processing. OSDU on AWS uses Kinesis as the ingestion layer.
Google Cloud Pub/Sub is the least common in upstream (13% cloud market share), though the Aramco CNTXT joint venture (Google Cloud reseller for MENA) and OSDU partnership may drive adoption. Technically capable, but fewer reference architectures for upstream operations.
AVEVA PI Connectors
For operators running AVEVA PI System (in use at 85% of the top oil and gas companies), PI Connectors and Adapters represent a distinct ingestion path. PI includes over 225 protocol-specific connectors that collect data from field systems and write it directly to the PI historian. AVEVA Adapters extend this to modern protocols including OPC-UA, MQTT, and Modbus TCP, and run on Windows, Linux, and Docker.
This is worth highlighting because many operators do not need to build a separate ingestion pipeline -- their PI System already handles data collection from the field. The challenge is getting data OUT of PI and into a modern analytics environment (data lake, Databricks, etc.), which is a different problem covered in the data lake and processing layers of this series. But for the ingestion layer specifically, PI's native connectors often mean that the "build vs. buy" decision is already made.
Cognite Data Fusion Extractors
Cognite Data Fusion (used by Equinor, Saudi Aramco, BP, OMV, Wintershall Dea) provides extractors for PI historians, OPC-UA, MQTT, databases, and files. CDF contextualizes raw data with asset hierarchies and metadata. For Cognite-committed operators, this replaces the need for a separate ingestion pipeline.
Real-Time vs. Batch: What Actually Needs to Be Real-Time?
One of the most common mistakes in oilfield data architecture is treating all data as if it needs real-time ingestion. Most analysis happens on daily or hourly aggregations, not sub-second streams.
Data Latency Tiers
| Tier | Latency | Use Cases | Ingestion Method |
|---|---|---|---|
| Critical alerts | < 5 seconds | Gas detection, pressure exceedances, safety shutdowns | Edge processing + MQTT QoS 1/2 |
| Operational monitoring | 1-15 minutes | Rod pump status, ESP performance, flow rates | MQTT QoS 0/1, polling |
| Production accounting | 1-24 hours | Daily production volumes, allocation, reporting | Batch upload, scheduled extraction |
| Engineering analysis | Daily-weekly | Decline curves, type curves, reservoir modeling | Batch upload, file transfer |
| Regulatory reporting | Monthly-quarterly | State production reports, emissions reports | Batch export from production database |
Only the critical-alert tier actually requires sub-minute streaming at the edge. The bottom three tiers -- production accounting, engineering analysis, and regulatory reporting -- work perfectly with batch ingestion. An operator running 500 wells does not need Kafka for decline curve analysis -- Event Hubs for alerts plus a nightly batch job from PI covers 90% of analytical needs. The fully streaming architecture is appropriate only when running real-time ML models at scale.
The Hybrid Pattern
The architecture that works for most operators combines streaming and batch:
CRITICAL PATH (streaming):
Wellsite sensors → Edge (alert rules) → MQTT → Cloud broker → Stream processor → Alert dashboard
OPERATIONAL PATH (near-real-time):
SCADA RTU → MQTT (15-min intervals) → Cloud ingestion → Time-series DB → Monitoring dashboard
ANALYTICAL PATH (batch):
PI Historian → Nightly extraction job → Cloud data lake → dbt transformation → Analytics DB → Spotfire/Power BI
This hybrid approach uses streaming infrastructure only where latency matters, batch infrastructure where it doesn't, and keeps costs and complexity manageable.
Bandwidth Constraints and the Starlink Factor
The Bandwidth Reality
Cellular connectivity in major producing basins varies dramatically by location and carrier:
| Connection Type | Typical Bandwidth | Latency | Monthly Cost |
|---|---|---|---|
| LTE (good signal) | 10-50 Mbps | 30-50 ms | $50-100/device |
| LTE-M / NB-IoT | 0.1-1 Mbps | 100-500 ms | $5-15/device |
| VSAT satellite | 1-10 Mbps | 600+ ms | $200-500+/month |
| Licensed radio | 0.1-1 Mbps | 10-50 ms | Capital cost, no monthly |
| Starlink | 50-200 Mbps | 25-60 ms | $120-500/month |
The key takeaway: most wellsite data systems are designed for 1-10 Mbps effective bandwidth. This is enough for time-series sensor data (small messages, high frequency) but not enough for raw video, high-resolution images, or bulk file transfers. Edge computing exists precisely to keep the heavy data local and send only summaries over the constrained link.
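A quick budget calculation shows why time-series data fits and video does not. The tag counts, publish interval, and message size below are illustrative assumptions:

```python
def daily_mb(tags: int, interval_s: float, bytes_per_msg: int) -> float:
    """Daily transmitted volume for a site publishing each tag on an interval."""
    msgs_per_day = tags * 86_400 / interval_s
    return msgs_per_day * bytes_per_msg / 1_000_000

# Hypothetical 8-well pad: 50 tags/well, 1-minute publishes, ~60 bytes/message.
pad_mb = daily_mb(tags=8 * 50, interval_s=60, bytes_per_msg=60)

# A 1 Mbps link sustained for 24 hours moves about 10,800 MB.
link_capacity_mb = 1_000_000 / 8 * 86_400 / 1_000_000

print(f"pad: {pad_mb:.2f} MB/day vs link: {link_capacity_mb:.0f} MB/day")
# pad: 34.56 MB/day vs link: 10800 MB/day
```

Under these assumptions the pad's sensor traffic uses well under 1% of even a poor 1 Mbps link -- but a single 2 Mbps camera stream would consume roughly double that link's entire daily capacity, which is why raw video stays at the edge.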
Starlink's Impact on Oilfield Data Architecture
SpaceX's Starlink has changed the connectivity equation in remote oilfield locations more dramatically than any technology in the last decade. Where a wellsite previously had two options -- unreliable cellular or expensive, high-latency VSAT -- Starlink provides 50-200 Mbps with 25-60 ms latency at a fraction of VSAT cost.
The practical impact on data architecture:
What Starlink enables: streaming video from wellsite cameras, full WITSML streams from drilling EDR, remote desktop access to field SCADA, and cloud-hosted SCADA where field devices connect directly via Starlink.
What Starlink does NOT change: the need for edge buffering (Starlink goes down too), the economic case for edge computing in bandwidth reduction (raw 1 kHz vibration data is still wasteful to transmit when a local model can reduce it to anomaly flags), and the power challenge at remote sites (Starlink dishes draw 75-100W, significant for solar-powered sites running 20-50W total).
Operators are deploying Starlink at high-value sites (rigs, multi-well pads, central facilities) while remote single-well sites continue on LTE-M/NB-IoT. The result is a tiered architecture:
| Site Type | Connectivity | Rationale |
|---|---|---|
| Drilling rigs | Starlink + LTE backup | High data volume, high value, crew on site |
| Multi-well pads (8+ wells) | Starlink or LTE | Enough value to justify $120-500/month |
| Central facilities | Starlink or fiber | Aggregation point, high bandwidth needs |
| Remote single wells | LTE-M / NB-IoT | Low power, low cost, limited data needs |
| Zero-infrastructure wells | SignalFire or manual gauging | Lowest cost, battery-operated |
Architecture Patterns by Company Size
The right ingestion architecture depends on the operator's scale, budget, existing infrastructure, and analytical ambitions. Here are the patterns that work for each segment.
Supermajors and Large Independents (5,000+ wells)
Budget: $20M-$1B+ annual digital/IT spend
Pattern: Full streaming architecture with enterprise edge deployment.
[Wellsite RTU/PLC]
↓ (Modbus/HART)
[OPC-UA Gateway (Kepware/Softing)]
↓ (OPC-UA)
[Edge Compute (Azure IoT Edge)]
↓ (MQTT)
[Cloud IoT Hub (Azure IoT Hub)]
↓
[Cloud Streaming (Event Hubs / Kafka)]
↓
[Stream Processing (Databricks / Stream Analytics)]
↓
[Data Lake (ADLS Gen2)]
Real examples:
- Chevron: Azure IoT Edge at wellsites, scaling multi-site. Azure IoT Hub, Event Hubs, ADLS Gen2.
- Equinor: Omnia platform on Azure. Cognite Data Fusion extractors for PI, OPC-UA, MQTT. Azure Event Hubs for streaming.
- Shell: Azure IoT, C3 IoT for predictive maintenance, Azure Databricks for ML.
- ExxonMobil: Microsoft Azure + IoT, IBM data platforms, Palantir Foundry.
Mid-Size Operators (500-5,000 wells)
Budget: $5-20M annual IT spend
Pattern: Hybrid ingestion with selective edge deployment.
[Wellsite RTU]
↓ (Modbus TCP)
[SCADA Host (CygNet/zdSCADA/eLynx)]
↓ (PI Connectors or direct DB)
[PI Historian or SCADA Database (on-prem)]
↓ (Scheduled extraction)
[Cloud Ingestion (Event Hubs or direct to Data Lake)]
↓
[Data Lake (ADLS Gen2 or S3)]
The biggest mistake mid-size operators make: Trying to build a supermajor-style streaming architecture without the team to maintain it. A scheduled batch extraction from PI or SCADA to a cloud data lake, combined with MQTT alerting for critical events, covers 90% of analytical use cases at 20% of the infrastructure cost.
Small Operators (50-500 wells)
Budget: $100K-$2M annual IT spend
Pattern: SaaS-hosted SCADA with minimal custom infrastructure.
[Wellsite RTU]
↓ (Cellular)
[Hosted SCADA Service (eLynx / zdSCADA cloud)]
↓ (API or data export)
[Spreadsheet or Simple Database (SQL Server / Excel)]
The right advice for small operators: Do not build infrastructure. Use eLynx or zdSCADA's hosted platform and focus your limited IT budget on getting data OUT of those systems and into an analytics tool (Power BI at $10/user/month). The ingestion problem is solved by the SCADA vendor. The analytics problem is what needs your attention.
PE-Backed Operators (Post-Acquisition)
Post-acquisition operators inherit multiple SCADA systems, historians, and data models. The ingestion problem is not "how do I get data from the field to the cloud" -- it is "how do I get data from five different SCADA systems into one dashboard."
The practical approach: do not rip and replace field SCADA immediately. Standardize at the cloud layer using Kafka or Event Hubs as a common ingestion point, normalize data in the processing layer, and migrate field systems gradually as equipment reaches end-of-life. This is one of the highest-value consulting engagements in the mid-market.
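The "standardize at the cloud layer" step usually starts with tag-name normalization. A minimal sketch, where the source-system naming conventions and rules are hypothetical (real inherited SCADA systems each need their own mapping, discovered by inspection):

```python
# Hypothetical tag-name conventions from two inherited SCADA systems,
# normalized to one canonical schema at the cloud layer -- field systems
# keep publishing their native names and are migrated later.
TAG_RULES = {
    "cygnet": lambda t: t.lower().replace(".", "_"),
    "elynx":  lambda t: t.lower().replace("-", "_").removeprefix("well_"),
}

def normalize(source: str, tag: str, well_id: str) -> dict:
    """Map a source-specific tag name onto the canonical schema."""
    canonical = TAG_RULES[source](tag)
    return {"well_id": well_id, "tag": canonical, "source_system": source}

# The same measurement, named differently by each inherited system:
print(normalize("cygnet", "TBG.PRESS", "42-123-45678"))
print(normalize("elynx", "WELL_TBG-PRESS", "42-123-45678"))
```

Both records land with `tag == "tbg_press"`, so one dashboard query covers wells from both acquisitions, while `source_system` is retained for lineage and debugging.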
Common Ingestion Failures and How to Avoid Them
No store-and-forward at the edge. When cellular drops, data is lost if the field device does not buffer locally. MQTT persistent sessions and edge platforms handle this, but someone has to configure and test it. Operators have lost days of production data because the RTU's buffer was set to 2 hours and a cellular outage lasted 3 days.
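The buffering logic itself is simple; the failure mode is sizing and testing it. A minimal sketch (the `send` callback stands in for a real MQTT publish, and the buffer bound is the parameter that must match your longest plausible outage):

```python
import collections

class StoreAndForward:
    """Bounded local buffer: queue while the uplink is down, drain
    oldest-first when it returns. Size max_messages for the longest
    plausible outage, not the typical one."""

    def __init__(self, max_messages: int):
        self._queue = collections.deque(maxlen=max_messages)

    def publish(self, message, uplink_ok: bool, send):
        if uplink_ok:
            while self._queue:           # drain the backlog first, in order
                send(self._queue.popleft())
            send(message)
        else:
            self._queue.append(message)  # oldest messages drop if this fills

sent = []
buf = StoreAndForward(max_messages=100_000)
buf.publish({"psi": 2375}, uplink_ok=False, send=sent.append)  # outage
buf.publish({"psi": 2380}, uplink_ok=False, send=sent.append)  # outage
buf.publish({"psi": 2390}, uplink_ok=True, send=sent.append)   # link restored
print(sent)  # all three messages arrive, in original order
```

The 2-hour-buffer failure described above corresponds to setting `max_messages` too small: once the deque fills, the oldest readings silently roll off, which is exactly the data loss that only an outage longer than the buffer will ever reveal.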
Timestamp confusion. Oilfield data has at least three timestamps: sensor measurement, RTU collection, and cloud receipt. If your pipeline uses the wrong one, your time-series analysis produces wrong answers. Always capture and propagate the source timestamp, and use NTP/GPS time sync on edge devices. More detail in our SCADA data quality checklist.
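One way to avoid the confusion is to make all three timestamps explicit in the message envelope so no layer can silently substitute its own clock. A sketch (field names and the example values are illustrative):

```python
from dataclasses import dataclass, asdict

@dataclass
class Reading:
    """Envelope carrying all three timestamps forward. Time-series
    analytics should key on source_ts; the other two exist for
    diagnosing latency and clock drift, not for analysis."""
    tag: str
    value: float
    source_ts: str      # when the sensor measured (NTP/GPS-synced at the edge)
    collected_ts: str   # when the RTU or poller read it
    received_ts: str    # when the cloud ingestion layer accepted it

r = Reading("tubing_pressure", 237.5,
            source_ts="2025-06-01T06:00:00Z",
            collected_ts="2025-06-01T06:00:04Z",
            received_ts="2025-06-01T06:02:31Z")
print(asdict(r))
```

In this example the reading arrived in the cloud two and a half minutes after it was measured -- harmless if analysis keys on `source_ts`, but a two-and-a-half-minute error in every derived rate and trend if the pipeline keys on `received_ts`.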
Building for peak instead of average. Sizing infrastructure for worst-case leads to expensive systems running at 5% utilization. Use auto-scaling cloud services (Event Hubs throughput units, Kinesis shards) that scale with demand.
Ignoring the existing PI historian. If you already have PI collecting field data (85% of top operators do), extract from PI rather than building a parallel edge-to-cloud pipeline. AVEVA's PI Cloud connectors make this straightforward.
Vendor lock-in at the protocol level. Insist on MQTT and/or OPC-UA support for any new field equipment. For existing equipment, use OPC-UA gateways to translate proprietary protocols to standards-based ones.
What Comes Next
This article covered the ingestion and edge computing layer -- getting data from field sensors through protocols, brokers, gateways, and edge compute into cloud ingestion services. The next article in this series covers data lakes and raw storage -- where the data lands once it arrives in the cloud, how to organize it, and the choices between ADLS Gen2, S3, Databricks Delta Lake, and Snowflake.
For related topics, see:
- The Complete Data Pipeline for Oil & Gas -- the master article for this series
- Oil & Gas Data Sources: SCADA Sensors to Drilling EDR -- Sub-Article 1 on field systems and sensors
- SCADA Data Quality for AI: The Audit Checklist -- data quality issues that originate in the ingestion layer
- Drilling Data Management: WITSML to Cloud -- deep dive on WITSML-specific ingestion