Software & Data

Databases and Historians for Oil & Gas: AVEVA PI, Snowflake, PostgreSQL, and Everything In Between

Dr. Mehrdad Shirangi | 24 min read

Editorial disclosure: This article reflects the independent analysis and professional opinion of the author, informed by published research, vendor documentation, and practitioner experience. No vendor reviewed or influenced this content prior to publication. Product capabilities described are based on publicly available information and may not reflect the latest release.

Every upstream oil and gas operator has the same fundamental problem: data lives in too many places, in too many formats, and nobody agrees on which system is the source of truth.

The SCADA historian has the real-time sensor data. The production database has the allocation volumes. The economics software has the decline curves. The ERP has the costs. The land database has the lease terms. The geologist's workstation has the well logs. And the field engineer's laptop has the spreadsheet that somehow reconciles all of it -- at least for the wells they personally manage.

This fragmentation is not a technology problem. Every piece of software in that stack was designed for a specific purpose and does that job reasonably well. The problem is that modern analytics, AI, and decision support require data from across all of these systems, and the storage layer was never designed for cross-system queries.

This article maps the database and historian landscape in upstream oil and gas: what each system does, where it fits, and where it falls short. Whether you are a production engineer wondering why your PI data does not match your production accounting numbers, or a data architect designing a modern analytics stack, this is the practical guide to the storage and query layer.


Process Historians: The Backbone of Operational Data

AVEVA PI System (Formerly OSIsoft PI)

If you work in upstream oil and gas and you deal with operational data, you almost certainly deal with the PI System. AVEVA PI (formerly OSIsoft PI, before AVEVA acquired OSIsoft in 2021 for $5 billion) is used by approximately 85% of the top oil and gas companies worldwide. That number is not marketing -- it reflects a genuine installed base built over more than three decades.

PI is a process historian: a specialized database designed to store high-frequency time-series data from industrial sensors. It ingests data from SCADA systems, DCS controllers, flow meters, pressure transmitters, and other field instruments through more than 225 protocol connectors, and stores it in a compressed, proprietary format optimized for sequential writes and time-range queries.

What PI does well:

Compression and storage efficiency. PI uses a lossy compression algorithm (swinging door compression) that reduces storage requirements by 90% or more while preserving the meaningful shape of the data. A wellhead pressure signal sampled every second generates 86,400 data points per day per tag. PI compresses that to a few hundred points while maintaining the pressure profile that an engineer would care about. This is why operators can store decades of data for thousands of sensors without exotic storage infrastructure.
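The idea behind swinging door compression can be sketched in a few lines. The function below is a simplified illustration of the algorithm, not AVEVA's implementation; the `dev` parameter plays the role of PI's compression deviation (CompDev) setting, and timestamps are assumed strictly increasing.

```python
def swinging_door(points, dev):
    """Simplified swinging-door compression sketch.

    points: list of (timestamp, value), timestamps strictly increasing.
    dev:    compression deviation in engineering units (CompDev analog).
    Returns the subset of points the archive would keep.
    """
    if len(points) < 3:
        return list(points)
    kept = [points[0]]
    anchor = points[0]                       # last archived point
    slope_max, slope_min = float("inf"), float("-inf")
    prev = points[0]
    for t, v in points[1:]:
        dt = t - anchor[0]
        # The two "doors" pivot on the anchor, offset by +/- dev.
        slope_max = min(slope_max, (v + dev - anchor[1]) / dt)
        slope_min = max(slope_min, (v - dev - anchor[1]) / dt)
        if slope_min > slope_max:
            # Doors closed: the trend broke, so archive the previous
            # point and restart the doors from it.
            kept.append(prev)
            anchor = prev
            dt = t - anchor[0]
            slope_max = (v + dev - anchor[1]) / dt
            slope_min = (v - dev - anchor[1]) / dt
        prev = (t, v)
    kept.append(prev)                        # always keep the latest point
    return kept

# A perfectly linear ramp of 100 raw points compresses to just 2,
# while a step change keeps every point that defines the step.
ramp = [(float(i), 2.0 * i) for i in range(100)]
assert swinging_door(ramp, dev=0.5) == [ramp[0], ramp[-1]]
```

The key property is visible in the assertion: any signal that stays within the deviation band of a straight line collapses to its endpoints, which is why steady wellhead pressure costs almost nothing to store while transients are preserved.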

Ingestion reliability. PI was designed for environments where data must not be lost. Its store-and-forward architecture buffers data at the source during network outages and backfills when connectivity is restored. For upstream operations with intermittent field communications, this is not a nice-to-have -- it is essential.
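The store-and-forward pattern itself is simple to sketch. The class below is a toy in-memory illustration (real PI collectors buffer to disk); `send` is a hypothetical uplink callable that raises `ConnectionError` when the network drops.

```python
from collections import deque

class StoreAndForward:
    """Toy store-and-forward buffer: readings queue locally whenever the
    uplink is down and are replayed in original order on reconnect."""

    def __init__(self, send):
        self.send = send          # delivers one reading upstream (hypothetical)
        self.online = True
        self.buffer = deque()     # a real collector persists this to disk

    def publish(self, reading):
        if self.online:
            try:
                self.send(reading)
                return
            except ConnectionError:
                self.online = False   # uplink just dropped; start buffering
        self.buffer.append(reading)

    def reconnect(self):
        """Called when connectivity is restored: backfill oldest-first."""
        self.online = True
        while self.buffer:
            self.send(self.buffer.popleft())
```

The important detail is the oldest-first replay on reconnect: the historian receives the backfilled history in timestamp order, so downstream trends have no gaps, only late arrivals.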

Broad protocol support. PI Connectors and PI Interfaces support virtually every industrial protocol in use: OPC-UA, OPC-DA, Modbus TCP/RTU, MQTT, and dozens of vendor-specific protocols. If your SCADA system or DCS can output data, PI can probably ingest it.

The installed base itself. Decades of tag configurations, asset frameworks, and institutional knowledge are embedded in PI deployments. The cost of replacing PI is not the software license -- it is the thousands of hours of configuration that maps physical equipment to data tags and the organizational knowledge of how to interpret the data.

What PI does not do well -- and this is where operators get into trouble:

PI is not an analytics engine. PI was designed for time-series storage and retrieval. It can calculate averages, totals, and interpolated values over time ranges, but it cannot perform the kinds of multi-dimensional queries, joins, and aggregations that analytics workloads require. Asking PI to correlate production rates with weather data, completions parameters, and artificial lift settings across 500 wells is asking it to do something it was never designed for.

PI is not a data warehouse. Despite this, many operators treat it like one. They build downstream reporting and dashboards that query PI directly for operational KPIs, daily production summaries, and performance benchmarking. This works at small scale but creates performance problems and architectural fragility at scale. PI's query interfaces (PI SQL clients, Asset Framework analytics) expose only a limited subset of SQL and do not support the kinds of complex queries that production engineers actually need.

PI's data model is flat. A PI tag is a name-value-timestamp triple. The relationships between tags (this pressure sensor belongs to this wellhead, which belongs to this well, which is in this field) are maintained in the AF (Asset Framework), but AF is a configuration layer, not a relational database. Complex queries that traverse the asset hierarchy are possible but slow and awkward compared to a properly modeled relational or dimensional database.
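The flatness is easy to see in code. In the sketch below, the tag names and the FIELD.WELL.MEASUREMENT convention are hypothetical; the point is that the "hierarchy" exists only as a naming convention (or an AF configuration) layered on top of name-value-timestamp triples, so traversing it means parsing strings.

```python
# A flat historian stores name -> (value, timestamp) triples. Any asset
# hierarchy lives in a naming convention (hypothetical: FIELD.WELL.MEAS).
tags = {
    "MIDLAND.WELL_12.WHP":  (1850.0, "2026-01-07T06:00:00Z"),
    "MIDLAND.WELL_12.TEMP": (146.0,  "2026-01-07T06:00:00Z"),
    "MIDLAND.WELL_30.WHP":  (910.0,  "2026-01-07T06:00:00Z"),
}

def wells_in_field(field):
    """'Traverse' the hierarchy the only way a flat store allows:
    by parsing tag names against the naming convention."""
    return sorted({name.split(".")[1]
                   for name in tags
                   if name.startswith(field + ".")})

# wells_in_field("MIDLAND") -> ['WELL_12', 'WELL_30']
```

A relational model would express the same relationship as a foreign key and answer the question with a join; here it depends entirely on every tag having been named consistently.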

Honeywell PHD and AspenTech IP.21

Honeywell PHD is the second-most-common process historian in the energy industry, with particular strength in downstream refining and chemical processing. In upstream operations, PHD appears primarily at larger production facilities, gas processing plants, and offshore platforms where Honeywell Experion PKS is the DCS platform. AspenTech IP.21 occupies a similar niche -- strongest in refining and chemicals, appearing upstream at gas processing plants and larger facilities. Inmation (acquired by AspenTech) offers a more modern, cloud-native architecture and is also marketed under Emerson's Plantweb Optics brand.

For operators evaluating alternatives to PI, the practical difference is the ecosystem: PI has more third-party connectors, more analytics tools that integrate natively (Seeq was designed PI-first), and a larger community of practitioners. PHD and IP.21 are competent historians but lack the integration breadth.

Canary Labs and dataPARC

Canary Labs and dataPARC represent the next tier -- smaller companies offering historian plus analytics plus visualization in more integrated packages. They are growing as alternatives to PI for operators who find AVEVA's licensing and support model frustrating (a sentiment that has grown since Schneider Electric's acquisition of AVEVA). Canary in particular has gained traction with mid-size operators looking for a historian that is easier to deploy and manage than PI.


Production Databases: Aries, PHDWin, and OFM

Process historians store sensor data. Production databases store a different kind of data: well-level production volumes, reserves estimates, decline curves, economic analyses, and regulatory reporting data. These are the databases that petroleum engineers and reservoir engineers live in every day.

PHDWin (S&P Global)

PHDWin (formerly from TRC Consultants, now S&P Global) is the leading petroleum economics and reserves evaluation software in North America. It stores production forecasts, decline curves, economic assumptions (pricing, operating costs, capital costs), reserves categories, and regulatory data in a proprietary database format.

PHDWin's strength is speed and flexibility for reserves evaluation workflows. A reserves engineer can load production data, fit decline curves, assign reserves categories, and generate SEC-compliant reserves reports from a single application. The database can handle thousands of wells, and the application is purpose-built for the workflow that reserve evaluators actually perform.

PHDWin's weakness is that it is a siloed application database, not a general-purpose data store. Getting data out of PHDWin for use in other analytics tools typically requires exports to CSV or Excel, or use of PHDWin's data access APIs (which exist but are not widely used). The database format is proprietary, and integration with modern data stacks requires custom work.

Aries (Halliburton)

Aries is PHDWin's primary competitor for petroleum economics and reserves evaluation. It offers similar capabilities -- decline curve analysis, economic modeling, reserves reporting -- with a different data architecture and user experience. Aries uses a SQL Server backend, which makes it somewhat more accessible for integration with other systems compared to PHDWin's proprietary format.

The PHDWin vs. Aries choice is often determined by organizational history and preference rather than objective technical superiority. Both are mature, capable tools. PHDWin has a slightly larger market share in North American independents; Aries has strength in companies with existing Halliburton software relationships.

OFM -- Oil Field Manager (SLB)

OFM occupies a different niche from PHDWin and Aries. While those tools focus on economics and reserves, OFM focuses on production surveillance and analysis: decline curves, rate plots, bubble maps, cross-plots, and ad-hoc production data analysis. OFM is the tool a production engineer uses to look at how wells are performing, not the tool a reserves engineer uses to book reserves.

OFM connects to various data sources (including PI, SQL Server databases, and flat files) and provides a flexible environment for production data visualization and analysis. It has been a staple of production engineering workflows for decades, though its age shows in the interface and architecture.

SLB has been migrating OFM functionality into its cloud-based Delfi platform, but the desktop version of OFM remains widely used, particularly among operators who are not yet on Delfi.

Where Production Databases Fit in the Modern Stack

The key limitation of all three tools -- PHDWin, Aries, and OFM -- is that they are application databases, not analytical databases. They store data in formats optimized for their specific applications, not for cross-functional analytics. Getting PHDWin reserves data, OFM production data, and PI sensor data into a single query requires either manual exports and spreadsheet reconciliation (the current state for most operators) or a purpose-built data integration layer.

This is exactly the problem that cloud data warehouses and modern data platforms are designed to solve, which is why the industry is gradually moving production database data into centralized analytical stores.


Relational Databases in Oil and Gas

Microsoft SQL Server

SQL Server is the workhorse relational database of the upstream oil and gas industry. It runs behind production accounting systems, land management databases, regulatory reporting systems, well databases, and countless custom applications built by IT departments over the past two decades.

Devon Energy uses SQL Server as a core part of its data infrastructure. Matador Resources built SiteView, its custom production monitoring system, on SQL Server. Hundreds of mid-size operators run their production accounting, regulatory reporting, and well management on SQL Server databases.

SQL Server's ubiquity in oil and gas stems from Microsoft's dominance in enterprise IT. Operators already have Windows servers, Active Directory, and SQL Server licenses as part of their Microsoft enterprise agreements. The database is familiar to IT staff, supported by a large ecosystem of tools and consultants, and good enough for most transactional workloads.

The limitation is scale and analytical performance. SQL Server can handle the transactional workloads of production accounting and well management, but it was not designed for the analytical workloads that modern data teams need: scanning billions of rows of time-series data, performing complex aggregations across thousands of wells, or supporting concurrent analytical queries from dozens of users.

PostgreSQL

PostgreSQL is the open-source relational database that is gaining ground in oil and gas, particularly in modern data stacks and cloud-native applications. Its appeal is straightforward: enterprise-grade capabilities, no licensing cost, excellent extension ecosystem (PostGIS for spatial data, TimescaleDB for time-series), and strong community support.

PostgreSQL appears in newer applications: custom web-based production dashboards, cloud-native microservices, geospatial data management, and as the backend for open-source tools. For small operators building a data stack from scratch, PostgreSQL offers relational database capabilities without the SQL Server or Oracle licensing costs.

The practical limitation for PostgreSQL in oil and gas is ecosystem support. Most petroleum engineering software assumes SQL Server or Oracle as the backend. PHDWin, OFM, and most production accounting systems do not natively connect to PostgreSQL. Using PostgreSQL in an oil and gas environment typically means using it for new, custom applications rather than as a replacement for existing vendor databases.


Cloud Data Warehouses: The Modern Analytical Layer

Cloud data warehouses represent the most significant shift in oil and gas data architecture in the past decade. They solve the fundamental problem that relational databases and historians do not: providing a single, scalable, SQL-queryable store where data from all sources -- SCADA, production accounting, economics, land, ERP -- can be combined for analytics.

Snowflake

Snowflake's most significant oil and gas reference is ExxonMobil. According to published reports, Snowflake sits at the heart of ExxonMobil's data ecosystem strategy, serving as the centralized data hub that aggregates data from across the company's global operations. For an industry that pays attention to what the supermajors do, ExxonMobil's choice of Snowflake as its central data platform is a powerful signal.

Peloton, the well lifecycle data management company, also delivers oil and gas well data through Snowflake's data marketplace, making production data, drilling reports, and well status information available as shared datasets.

Snowflake's strengths for oil and gas:

  • Separation of storage and compute. Data engineers can load data without worrying about query performance, and analysts can run heavy queries without affecting data loading. This is important in oil and gas where data ingestion is continuous (sensor data) but analytical queries are bursty (morning report runs, weekly reviews).
  • SQL interface. Petroleum engineers and data analysts already know SQL. Snowflake does not require learning a new query language.
  • Data sharing. Snowflake's data sharing and marketplace features enable operators to share data with service companies, regulators, and partners without file exports.
  • Semi-structured data support. Well logs, completion records, and sensor metadata often arrive as JSON, XML, or other semi-structured formats. Snowflake handles these natively.
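The semi-structured point is concrete enough to show. Below, a hypothetical completion record arrives as nested JSON and is flattened to one row per stage in plain Python -- the same shape that a warehouse query over a semi-structured column (e.g. Snowflake's LATERAL FLATTEN over a VARIANT) would produce. All field names and values are illustrative.

```python
import json

# Hypothetical completion record as it might arrive from a vendor feed.
raw = json.loads("""{
  "api14": "42329000000000",
  "stages": [
    {"stage": 1, "proppant_lb": 450000, "fluid_bbl": 9800},
    {"stage": 2, "proppant_lb": 470000, "fluid_bbl": 10100}
  ]
}""")

# Flatten the nested stage array into tabular rows, repeating the
# well identifier on each row so the result is join-ready.
rows = [{"api14": raw["api14"], **stage} for stage in raw["stages"]]
# rows -> one dict per stage, each carrying the parent api14
```

Once flattened, the stage-level rows can be joined against production or cost tables like any other table, which is the practical payoff of native semi-structured support.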

Databricks SQL and the Lakehouse

Databricks has gained significant traction in oil and gas, with confirmed deployments at Permian Resources, Devon Energy, and Shell. The Databricks lakehouse architecture -- combining data lake storage (Delta Lake on cloud object storage) with SQL query capabilities -- offers an alternative to Snowflake that is better suited for organizations that also need data engineering (ETL/ELT), machine learning, and streaming workloads alongside SQL analytics.

Permian Resources' stack is particularly instructive: Databricks for the lakehouse, Dagster for orchestration, dbt for transformations, Spotfire and Power BI for visualization. This is a modern data stack that would be familiar to a Silicon Valley data team, deployed by a Permian Basin oil company. It works, and it represents where progressive mid-size operators are heading.

The Databricks vs. Snowflake choice for oil and gas comes down to workload mix. If the primary need is SQL analytics and data sharing, Snowflake is simpler. If the organization also needs data engineering, ML model training, and streaming workloads, Databricks offers a more unified platform. Both are legitimate choices, and some large operators use both for different workloads.

Azure Synapse Analytics

Azure Synapse is Microsoft's cloud data warehouse, integrated with the Azure ecosystem and Power BI. For operators already committed to Azure (and 57% of oil and gas operators are, per Kimberlite survey data), Synapse offers a natural extension of their existing cloud infrastructure. The integration with Power BI -- which is rapidly becoming the default enterprise BI tool in oil and gas -- is Synapse's primary advantage over Snowflake or Databricks for Microsoft-heavy shops.

Google BigQuery and Amazon Redshift

BigQuery and Redshift have smaller footprints in upstream oil and gas. BigQuery appears in some midstream and downstream deployments; Redshift is used by AWS-native operators. Neither has established the reference customer presence that Snowflake and Databricks have in the upstream space.


Time-Series Databases: When a Historian Is Not Enough

Process historians like AVEVA PI were built in an era when sensor data arrived at relatively low frequencies (seconds to minutes) from dozens to hundreds of sensors per facility. Modern upstream operations generate data at higher frequencies, from more sensors, and increasingly need to combine sensor data with non-time-series data in ways that historians were not designed to support.

Time-series databases -- purpose-built for high-frequency, high-volume time-stamped data -- offer an alternative or complement to traditional historians.

InfluxDB

InfluxDB is the most widely deployed open-source time-series database and has a growing presence in oil and gas IoT applications. Terega, the French gas transport operator, built a cloud-native historian on InfluxDB to replace legacy infrastructure. InfluxDB's strengths are fast ingestion rates, flexible schema (no need to pre-define tags), and increasingly standard query options (InfluxQL is SQL-like, Flux is a functional query language, and InfluxDB 3.0 moves to native SQL).

For operators building new monitoring systems -- particularly for IoT sensors, emissions monitoring, or edge computing -- InfluxDB offers a modern alternative to PI without the licensing cost or architectural baggage.

TimescaleDB

TimescaleDB is a PostgreSQL extension that adds time-series capabilities to the world's most popular open-source relational database. This is significant for oil and gas because it means you can store time-series sensor data and relational production data in the same database, queryable with standard SQL.

Consider a common query: "Show me the average wellhead pressure for wells in the Delaware Basin that produced more than 500 BOE/d last month." In a PI + SQL Server environment, this query requires pulling time-series data from PI, production volumes from the SQL Server production database, and joining them manually (often in a spreadsheet). In a TimescaleDB environment, both datasets can live in the same PostgreSQL database and be joined with a single SQL query.
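That single-query claim can be demonstrated end-to-end. In the sketch below, Python's built-in sqlite3 stands in for PostgreSQL/TimescaleDB, and the table names, columns, and values are all illustrative -- the point is one SQL statement spanning both datasets.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE wellhead_pressure  (well_id TEXT, ts TEXT, psi REAL);
    CREATE TABLE monthly_production (well_id TEXT, basin TEXT, boed REAL);
""")
db.executemany("INSERT INTO wellhead_pressure VALUES (?,?,?)", [
    ("W1", "2026-01-01T00:00", 1800.0),
    ("W1", "2026-01-01T01:00", 1750.0),
    ("W2", "2026-01-01T00:00",  900.0),
])
db.executemany("INSERT INTO monthly_production VALUES (?,?,?)", [
    ("W1", "Delaware", 820.0),   # above the 500 BOE/d cutoff
    ("W2", "Delaware", 310.0),   # below -- filtered out by the join
])

# Average wellhead pressure for Delaware Basin wells over 500 BOE/d:
rows = db.execute("""
    SELECT p.well_id, AVG(p.psi) AS avg_whp
    FROM wellhead_pressure p
    JOIN monthly_production m ON m.well_id = p.well_id
    WHERE m.basin = 'Delaware' AND m.boed > 500
    GROUP BY p.well_id
""").fetchall()
# rows -> [('W1', 1775.0)]
```

In a TimescaleDB deployment the pressure table would be a hypertable ingesting live sensor data, but the query shape is identical -- no export, no spreadsheet merge.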

When to Use a Time-Series Database vs. a Historian

The practical guidance:

Keep using PI if you have an existing PI deployment, your sensor data needs are standard (pressures, temperatures, flow rates at second-to-minute intervals), and your primary consumers are engineers using PI-native tools (Seeq, PI Vision, AF analytics).

Add a time-series database if you need to combine sensor data with non-time-series data in SQL queries, if you are building new monitoring systems (emissions, IoT, edge), if you need a historian without PI's licensing cost, or if your data volumes and query patterns exceed what PI handles efficiently.

Use a cloud data warehouse instead if your primary need is historical analytics rather than real-time operational monitoring. Snowflake and Databricks can handle time-series data at analytical query latency (seconds) even if they cannot match a historian's real-time ingestion and sub-second query performance.


The Historian vs. Data Warehouse Debate

This is the single most common architectural misunderstanding in oil and gas data infrastructure: treating the PI historian as a data warehouse.

The pattern looks like this: An operator deploys PI to collect SCADA data. Engineers start building PI ProcessBook displays and PI Vision dashboards to monitor operations. Managers want KPI dashboards, so IT builds Power BI reports that query PI directly. Someone wants to correlate production data with weather data, so they write a custom application that pulls from PI, merges with a weather API, and outputs to Excel. Over time, PI becomes the de facto data warehouse -- the system everything queries for operational data.

This works until it does not. PI queries slow down as the number of concurrent analytical consumers grows. Complex queries that join PI data with external datasets require fragile custom code. PI's tag-based data model does not map well to the dimensional model that BI tools expect. Engineers who need historical data for machine learning projects find that PI's compression has discarded the high-frequency detail their models need.

The solution is not to replace PI. PI is excellent at what it does: ingesting, compressing, and serving real-time and near-real-time operational data. The solution is to stop asking PI to be something it is not. The modern architecture uses PI as the operational data store (what happened in the last 24 hours, is this well flowing normally right now) and a cloud data warehouse as the analytical data store (what was the average uptime last quarter, how does this well's performance compare to offsets, what factors correlate with ESP failures).

This means building a data pipeline from PI to the data warehouse -- extracting data from PI, transforming it into an analytical model, and loading it into Snowflake, Databricks, or Synapse. AVEVA provides PI Cloud and AVEVA CONNECT for cloud integration, and third-party tools like Seeq, Cognite Data Fusion, and custom Kafka-based pipelines can bridge the gap.
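One common shape for that pipeline is watermark-based incremental extraction: each scheduled run pulls only data newer than the last successful load. The sketch below is a minimal illustration; `extract` and `load` are hypothetical stand-ins for a PI Web API recorded-values read and a warehouse bulk load (COPY/MERGE).

```python
def run_pipeline(extract, load, watermark):
    """One incremental run: pull rows newer than the watermark, load
    them, then advance the watermark -- only after the load succeeds,
    so a failed run is simply retried from the same point."""
    batch = extract(since=watermark)          # e.g. PI Web API call (hypothetical)
    if batch:
        load(batch)                           # e.g. warehouse bulk load (hypothetical)
        watermark = max(ts for ts, _ in batch)
    return watermark
```

Run on a schedule (Airflow, Dagster, or a plain cron job) with the watermark persisted between runs, this makes the PI-to-warehouse feed restartable and idempotent: a crash mid-run never advances the watermark, so no history is silently skipped.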

The operators who get this right treat PI as the operational layer and the cloud data warehouse as the analytical layer. The operators who struggle are the ones who try to make PI do both.


Cloud vs. On-Prem in 2026: Where the Industry Actually Is

The cloud migration narrative in oil and gas has been running for nearly a decade. Here is where things actually stand in 2026:

What Has Moved to the Cloud

Analytics and BI. Power BI, Spotfire Cloud, and cloud data warehouses (Snowflake, Databricks) are increasingly cloud-hosted. This is the workload that has migrated most completely, because it does not have real-time latency requirements and benefits from cloud scalability.

Data lakes and long-term storage. Azure Data Lake Storage Gen2 (the dominant choice), AWS S3, and Google Cloud Storage serve as the landing zone for data that was previously stuck on on-prem file servers and NAS devices. Seismic data, well logs, historical production data, and engineering documents are migrating to cloud object storage.

SaaS applications. Cloud SCADA (Quorum zdSCADA, eLynx), cloud production monitoring (OspreyData, Ambyint), and cloud engineering tools (Corva, whitson) are SaaS-native. New deployments are overwhelmingly cloud.

AI/ML workloads. Training machine learning models on sensor data, production data, and geological data requires compute that is easier to provision in the cloud. Databricks, Azure ML, and AWS SageMaker are the platforms operators use for ML, and they are all cloud services.

What Is Staying On-Prem (And Why)

SCADA and control systems. Real-time process control requires sub-second latency and cannot depend on internet connectivity. SCADA host systems, DCS controllers, and safety systems remain on-prem (or at the edge) and will stay there. This is not conservatism -- it is physics and safety.

Process historians. PI historian servers remain on-prem at most operators, even as AVEVA pushes PI Cloud. The reasons are practical: the historian needs to be close to the SCADA system for reliable data ingestion, historical data volumes are large enough that cloud egress costs matter, and IT teams are cautious about moving operational infrastructure to the cloud. Hybrid deployments (on-prem PI with cloud replication) are the emerging pattern.

ERP and sensitive data. SAP and Oracle ERP migrations are multi-year, multi-million-dollar projects. Some operators keep certain data on-prem for data sovereignty, regulatory compliance, or security policy reasons.

The Practical Reality

McKinsey's 2024 finding that 70% of oil and gas digital transformation initiatives remain stuck in pilot phase tells the real story. Most mid-size operators have not completed a full cloud migration. They have a hybrid environment: on-prem SCADA and historians, some cloud storage, a few SaaS applications, and a lot of manual data movement between systems.

The trend is clearly toward cloud, but the timeline is years, not months. The operators who are farthest along (ExxonMobil with Snowflake, Equinor with Azure Omnia, Permian Resources with Databricks) are the exceptions, not the norm.


Database Choices by Company Size

Supermajors (ExxonMobil, Chevron, Shell, BP, Equinor)

The supermajor stack is extensive:

  • Process historian: AVEVA PI System (near-universal), often with massive deployments spanning thousands of facilities
  • Cloud data warehouse: Snowflake (ExxonMobil), Azure Databricks (Shell), Azure-based custom platforms (Equinor Omnia)
  • Data platform: Cognite Data Fusion (Equinor, Aramco), Palantir Foundry (BP, ExxonMobil)
  • ERP: SAP S/4HANA or Oracle (enterprise-wide)
  • Standards: OSDU adoption leaders, pushing for data interoperability
  • Budget: $100M to $1B+ annual digital/IT spend
  • Staff: In-house data science teams of 100-500+ people

Large Independents (Devon, EOG, Diamondback, ConocoPhillips)

  • Process historian: AVEVA PI System
  • Relational database: SQL Server (Devon confirmed), SAP HANA (Devon, Diamondback)
  • Cloud data warehouse: Databricks (Devon confirmed), Snowflake (some)
  • BI: Power BI + Spotfire
  • Engineering: PHDWin or Aries, OFM, WellView
  • Budget: $20-100M annual digital/IT spend

Devon's stack (Databricks, SQL Server, SAP HANA) reflects a transitional architecture -- modern cloud analytics layered on top of established on-prem systems.

Mid-Size Operators (Permian Resources, Matador, Crescent, Ring)

  • Process historian: AVEVA PI (some), or no historian at all
  • Relational database: SQL Server (common), sometimes custom (Matador's SiteView)
  • Cloud data warehouse: Ranges from Databricks (Permian Resources) to none
  • BI: Spotfire and/or Power BI, heavy Excel usage
  • Engineering: PHDWin, Enverus, WellView, Corva
  • Budget: $5-20M annual IT spend

The spread within this category is enormous. Permian Resources runs a modern data stack (Databricks, Dagster, dbt) that would not be out of place at a tech company. Matador runs SiteView on SQL Server with a "CTO" who is actually a reservoir engineer. Most mid-size operators are closer to Matador's profile than Permian Resources'.

The typical mid-size operator stack in 2026 looks like this:

[Basic SCADA or manual gauging]
    → [PI historian on-prem (maybe)]
    → [SQL Server production database]
    → [Spotfire + Excel for analysis]
    → [Manual morning report via email]

The modern target state looks like this:

[Cloud SCADA (zdSCADA/eLynx)]
    → [PI historian or cloud ingestion]
    → [Cloud data warehouse (Databricks/Snowflake)]
    → [dbt for transformations]
    → [Spotfire + Power BI + Grafana]
    → [Automated alerts and reports]

Small Operators (100-500 Wells)

  • Process historian: None
  • Database: Excel. Maybe SQL Server. Maybe Access (seriously).
  • Cloud data warehouse: None
  • SCADA: eLynx ($10/asset/month) or manual gauging
  • Production: GreaseBook (lowest cost), OGsys
  • Economics: PHDWin or spreadsheets
  • BI: Excel, maybe Power BI ($10/user/month)
  • Budget: $100K-$2M annual IT spend

For small operators, the database question is not "Snowflake vs. Databricks" -- it is "can we stop running our production data in Excel." The affordable path forward is: eLynx for SCADA, PostgreSQL for a production database (free), Power BI for dashboards ($10/user/month), and Enverus for public well data (subscription). Total cost: under $50K/year for a meaningful upgrade from the spreadsheet-driven status quo.


Integration Patterns: Getting Data Where It Needs to Be

The most valuable -- and most underestimated -- part of oil and gas data architecture is the integration layer. Getting data from where it lives (historians, production databases, engineering tools) to where it needs to be (analytics platforms, AI models, dashboards) is where most projects fail.

PI to Cloud Data Warehouse

This is the most common integration pattern for operators with existing PI deployments:

AVEVA PI Cloud / CONNECT. AVEVA's own cloud offering replicates PI data to the cloud. This is the path of least resistance for operators who want to keep PI as the operational layer and add cloud analytics on top. The limitation is that it keeps you in the AVEVA ecosystem.

Cognite Data Fusion extractors. Cognite provides PI extractors as part of its data platform. This is the right choice if you are adopting Cognite as your data contextualization layer (Equinor, Aramco, and others have taken this path).

Custom Kafka/MQTT pipeline. For operators who want full control, a Kafka-based pipeline that reads from PI (via PI Web API or AVEVA Adapters), transforms the data, and writes to Snowflake or Databricks is the most flexible option. This requires data engineering expertise but avoids vendor lock-in.

Seeq as a bridge. Seeq connects natively to PI and can push analysis results and cleansed data to cloud destinations. This is useful if Seeq is already in the analytics stack.

SCADA to Cloud Pipeline

For operators with cloud-native SCADA (zdSCADA, eLynx) or MQTT-enabled field devices:

[Field sensors/RTUs]
    → [MQTT broker (HiveMQ/EMQX/Azure IoT Hub)]
    → [Stream processing (Kafka/Event Hubs)]
    → [Cloud data warehouse (Databricks/Snowflake)]

This pattern bypasses the traditional historian entirely, landing SCADA data directly in a cloud-native store. It is simpler and cheaper than PI for operators who do not have PI and do not need its specific capabilities (compression, store-and-forward, AF hierarchy).
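Between the stream-processing stage and the warehouse there is almost always a micro-batching step, because warehouses are efficient at bulk inserts and terrible at row-at-a-time writes. The class below is a simplified sketch of that buffering logic; `write_batch` is a hypothetical stand-in for a bulk insert into Databricks or Snowflake, and the size/age thresholds are illustrative.

```python
import time

class MicroBatcher:
    """Sketch of the buffer between a stream consumer (MQTT/Kafka) and a
    warehouse writer: readings accumulate until either the batch size or
    the flush interval is reached, then land as one bulk insert."""

    def __init__(self, write_batch, max_rows=500, max_age_s=10.0,
                 clock=time.monotonic):
        self.write_batch = write_batch   # bulk-insert callable (hypothetical)
        self.max_rows = max_rows
        self.max_age_s = max_age_s
        self.clock = clock
        self.rows = []
        self.opened = None               # time the current batch started

    def add(self, row):
        if not self.rows:
            self.opened = self.clock()
        self.rows.append(row)
        if (len(self.rows) >= self.max_rows
                or self.clock() - self.opened >= self.max_age_s):
            self.flush()

    def flush(self):
        if self.rows:
            self.write_batch(self.rows)
            self.rows = []
```

The two thresholds trade latency for load efficiency: a larger `max_rows` means fewer, cheaper warehouse writes; a smaller `max_age_s` bounds how stale a dashboard can be.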

Production Database to Analytics Layer

Getting data out of PHDWin, Aries, OFM, and production accounting systems into a central analytics store:

PHDWin and Aries typically require scheduled exports (CSV, database dumps) or API-based extraction. Neither tool was designed for real-time data sharing, so batch extraction (nightly or weekly) is the practical approach.

OFM can connect to various data sources but is primarily a consumer, not a provider. Getting OFM data into a cloud warehouse typically means extracting from OFM's source databases (SQL Server, PI) rather than from OFM itself.

Production accounting systems (Quorum, P2 Energy Solutions, Enertia) generally have SQL Server backends that can be extracted via standard database replication or ETL tools.

The pattern for all of these is:

[Source application database]
    → [ETL tool (Dagster/Airflow/Azure Data Factory)]
    → [Staging area in cloud storage]
    → [dbt transformation]
    → [Cloud data warehouse analytical tables]

Enverus and Novi Labs: Where Third-Party Analytics Databases Fit

Not all data lives inside the operator's infrastructure. Third-party analytics databases provide curated, enriched data from public sources and proprietary models.

Enverus (Formerly DrillingInfo)

Enverus is the dominant provider of public well data, production data, permits, land records, and market intelligence for the upstream oil and gas industry, serving more than 300 financial institutions and 1,000+ operators. Enverus aggregates data from state regulatory agencies, FracFocus, county records, and proprietary sources into a searchable, queryable database.

For operators, Enverus serves as the benchmarking and market intelligence layer -- comparing your wells' performance against offsets, evaluating acquisition targets, scouting competitor activity, and tracking drilling permits. The data is available through web applications, APIs, and increasingly through data feeds that can be integrated into operators' own data warehouses.

Enverus fits in the modern stack as an external data source that feeds into the cloud data warehouse alongside internal production and operational data. The integration pattern is typically API-based extraction on a daily or weekly schedule.
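The extraction job behind that pattern is typically a paginated pull. The helper below sketches a generic cursor-pagination loop; it deliberately does not assume Enverus's actual endpoint names or pagination scheme, which vary by product -- the page fetcher is injected as a callable.

```python
def pull_all_pages(fetch_page, page_size: int = 1000) -> list:
    """Drain a cursor-paginated API. `fetch_page(cursor, page_size)` must
    return (records, next_cursor), with next_cursor=None on the last page."""
    cursor, records = None, []
    while True:
        batch, cursor = fetch_page(cursor, page_size)
        records.extend(batch)
        if cursor is None:
            return records

# Stand-in fetcher simulating a paginated vendor endpoint (illustrative)
def fake_fetch(cursor, page_size):
    pages = {
        None: (["permit_1", "permit_2"], "p2"),
        "p2": (["permit_3"], None),
    }
    return pages[cursor]

records = pull_all_pages(fake_fetch)
```

In production the fetcher would be a thin wrapper around the vendor's HTTP API with authentication and retry logic; separating pagination from transport keeps the loop testable without network access.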

Novi Labs

Novi Labs occupies a specialized niche: AI-driven production forecasting and completions optimization, trusted by Shell, ExxonMobil, Chevron, and Devon among others (the company raised $35 million in a funding round in June 2025). Novi's platform uses machine learning models trained on large public datasets to predict well performance based on geological and completions parameters.

Novi is not a database in the traditional sense -- it is an analytics platform with its own curated dataset. But it functions as an external analytical database that operators query for well performance predictions, acreage quality assessments, and completions benchmarking. Novi's data is complementary to an operator's internal data: Novi provides basin-wide statistical models while the operator's internal data provides proprietary operational context.

How Third-Party Data Fits the Architecture

The modern pattern is to treat third-party data sources as additional inputs to the cloud data warehouse:

[Enverus API]    ──→                          ←── [Internal SCADA/PI data]
[Novi Labs]      ──→  [Cloud Data Warehouse]  ←── [Internal production data]
[WellDatabase]   ──→                          ←── [Internal land/ERP data]
                              ↓
                  [Unified analytical layer]
                              ↓
                [BI tools, ML models, reports]

This unified view -- internal operational data combined with external market intelligence and AI-driven predictions -- is where operators extract the most analytical value. But it requires the cloud data warehouse as the integration point, which brings us back to the central argument of this article: the storage and query layer matters, and getting it right is the prerequisite for everything else.


Practical Recommendations

If you have PI, do not rip it out. PI is excellent at operational data storage. Instead, build a pipeline from PI to a cloud data warehouse for analytical workloads. Stop querying PI directly from BI tools for anything beyond real-time operational dashboards.

If you do not have PI, think carefully before buying it. For new deployments, especially at mid-size and small operators, a cloud-native SCADA-to-cloud pipeline (MQTT to Kafka/Event Hubs to Snowflake/Databricks) may be simpler, cheaper, and more analytically capable than deploying PI. PI's value is in its installed base and integration ecosystem, not in its architecture.

Pick Snowflake or Databricks, not both (at first). If your primary need is SQL analytics and you do not have a strong data engineering team, start with Snowflake. If you need data engineering, ML, and analytics on one platform and you have Python-literate staff, start with Databricks. You can always add the other later.

Do not underestimate PostgreSQL. For small operators who need a relational database without licensing costs, PostgreSQL with the TimescaleDB extension gives you both relational and time-series capabilities in a single, free database. It will not replace PI at a supermajor, but it can replace SQL Server and a time-series database at a 200-well operator.
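A sketch of what that looks like in practice: a plain relational table promoted to a TimescaleDB hypertable with `create_hypertable()`, plus an hourly rollup query using `time_bucket()` (both documented TimescaleDB functions). Table and column names are illustrative, and `run_setup` assumes a live psycopg2-style connection to a PostgreSQL instance with the TimescaleDB extension installed.

```python
# DDL for a sensor table stored as a TimescaleDB hypertable, partitioned on ts
DDL = """
CREATE TABLE sensor_readings (
    well_id  TEXT             NOT NULL,
    ts       TIMESTAMPTZ      NOT NULL,
    tag      TEXT             NOT NULL,
    value    DOUBLE PRECISION
);
SELECT create_hypertable('sensor_readings', 'ts');
"""

# Hourly averages per well -- the kind of rollup a historian does natively
HOURLY_ROLLUP = """
SELECT well_id, time_bucket('1 hour', ts) AS hour, avg(value) AS avg_value
FROM sensor_readings
WHERE tag = 'tubing_pressure'
GROUP BY well_id, hour
ORDER BY well_id, hour;
"""

def run_setup(conn):
    """Execute the DDL against a live connection (e.g. psycopg2)."""
    with conn.cursor() as cur:
        cur.execute(DDL)
    conn.commit()
```

The point of the pattern: the same database holds the relational side (wells, leases, allocations) and the time-series side, so a single SQL join can answer questions that would otherwise span two systems.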

Invest in the integration layer, not just the storage layer. The database you choose matters less than the data pipelines that feed it. Budget at least as much for ETL/ELT tooling (Dagster, Airflow, dbt, Azure Data Factory) as you do for the database itself.

Start with the morning report. Every operator produces a daily production report. Automating that report -- from data ingestion through transformation to dashboard delivery -- is the fastest way to prove the value of a modern data architecture. It touches SCADA data, production volumes, artificial lift status, and field operations data. If your data architecture can automate the morning report, it can support everything else.
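To make the shape of that automation concrete, the sketch below joins yesterday's allocated volumes with the latest SCADA reading per well -- the core transformation behind an automated morning report. All input shapes, well names, and thresholds are illustrative assumptions.

```python
def morning_report(alloc_rows, scada_latest):
    """Build morning-report rows from allocated volumes (per-well tuples)
    and the latest SCADA reading per well. Wells with no recent SCADA
    reading are flagged for manual review rather than silently omitted."""
    report = []
    for well, oil_bbl in alloc_rows:
        pressure = scada_latest.get(well)
        report.append({
            "well": well,
            "oil_bbl": oil_bbl,
            "tubing_press": pressure,
            "flag": "CHECK" if pressure is None else "OK",
        })
    return report

report = morning_report(
    [("WELL_1H", 298.5), ("WELL_2H", 540.2)],
    {"WELL_1H": 412.7},  # WELL_2H has no recent SCADA reading
)
```

In a real deployment these inputs would come from the warehouse tables fed by the pipelines above, and the output would drive a dashboard or emailed report -- but the join logic itself stays this simple.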

The storage and query layer is not the most exciting part of oil and gas data architecture. Nobody puts "migrated from PI to Snowflake" on a conference slide. But it is the foundation that everything else -- analytics, AI, automation, decision support -- depends on. Getting the database architecture right does not guarantee success, but getting it wrong guarantees failure.


Need help selecting and integrating database infrastructure for your operations? Get in touch.
