Editorial disclosure
This is a personal perspective piece. The views expressed here are my own, informed by my academic research, industry experience, and work building AI solutions for the upstream oil and gas industry. No vendor, university, or organization reviewed or influenced this content.
I defended my PhD at Stanford in energy systems optimization, and within six months I understood that almost nothing I learned in graduate school had prepared me for the actual work of deploying AI in the oilfield.
That is not a criticism of Stanford. It is a statement about the distance between academic research and industrial reality -- a distance that is, in petroleum engineering, wider than most people realize. I have spent the years since founding Groundwork Analytics in 2018 trying to bridge that gap, and the lessons have been humbling, surprising, and occasionally painful.
This article is not a sales pitch. It is a reflection on what I have learned, written for the PE students, faculty, and early-career engineers who are considering the AI path. I wish someone had written something like this when I was starting out.
What a Stanford PhD in Energy Systems Optimization Actually Means
When people hear "Stanford PhD," they hear prestige. When they hear "energy systems optimization," they usually nod politely and change the subject. So let me be specific about what the work actually involved.
My research focused on optimization under uncertainty for petroleum reservoir systems. In practical terms, that means: given a reservoir with uncertain properties (permeability, porosity, fluid saturations), how do you make well placement and completion design decisions that perform well across a range of possible reservoir realizations? The tools are computational -- ensemble-based optimization, proxy modeling, sensitivity analysis -- but the problems are physical. You are optimizing real systems governed by real physics: multiphase flow through porous media, wellbore hydraulics, surface facility constraints.
The work I published through SPE on prescriptive analytics for well completion design sits at the intersection of reservoir simulation, statistical modeling, and decision theory. It is intellectually demanding. It involves serious mathematics. And it produces results that, in a controlled academic setting, are genuinely useful.
The key phrase there is "in a controlled academic setting."
In graduate school, you work with simulation models that have been carefully constructed. The grid is well-defined. The boundary conditions are known. The fluid properties come from a PVT report that has already been quality-checked. You run your optimization algorithm, produce beautiful convergence plots, and write a paper.
In the oilfield, the data is late, incomplete, contradictory, and sometimes just wrong. The reservoir model was built three years ago by an engineer who has since left the company. The completion data lives in a PDF that was scanned from a handwritten field report. The production data has a six-month gap because someone changed the SCADA configuration and forgot to tell anyone.
That is the gap. Not a gap in mathematical sophistication -- a gap in operational reality.
What Surprised Me About the Industry
The Physics Is Genuinely Hard
There is a narrative in Silicon Valley that oil and gas is a "legacy" industry, waiting to be disrupted by smart technologists who will bring modern AI to bear on antiquated problems. I bought into that narrative more than I should have during graduate school.
The reality is that petroleum engineering involves some of the most challenging physics in any engineering discipline. Multiphase flow in fractured porous media. Geomechanics coupled with fluid flow at reservoir pressures and temperatures. The thermodynamics of hydrocarbon mixtures under conditions that would make a chemical engineer nervous. The uncertainty quantification required when your measurement points are spaced a mile apart and your reservoir is three miles underground.
The domain experts in this field are not waiting for someone to explain gradient descent to them. Many of them have more mathematical depth than the data scientists who show up to "disrupt" their workflow. What they lack is not intelligence or sophistication -- it is time, modern software tools, and organizational support for adopting new computational methods.
Understanding this changed how I approach every project. I stopped trying to bring AI to petroleum engineers and started trying to build AI with petroleum engineers. The distinction matters more than anything else I have learned.
Oilfield Data Is a Different Animal
In graduate school, I worked with datasets that were clean by design. Simulation output is inherently structured. Even when you add noise for uncertainty quantification, the noise is well-characterized.
Oilfield data is not like this. I have written extensively about the data quality challenges in production operations and drilling data management, but let me give you the visceral version.
On one of my early projects, I spent three weeks building a production forecasting model. The model performed beautifully on the training data. When I deployed it on new wells, the predictions were nonsensical. The problem was not the model. The problem was that the production data for the training wells had been manually entered by a pumper who used a consistent convention for reporting -- and the new wells were reported by a different pumper who used a different convention. Same field. Same operator. Same database. Completely different data semantics.
This is not an edge case. This is Tuesday. Across the industry, the SCADA data quality reality is that 20-40% of sensor readings at any given moment are missing, stale, or otherwise unreliable. Building AI on top of that requires a level of data engineering rigor that no academic program teaches.
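To make that concrete, here is a minimal sketch of the kind of screening a SCADA stream needs before any model sees it. The thresholds, the stale-tag heuristic, and the reading layout are all illustrative assumptions, not a production implementation:

```python
from datetime import datetime

# Hypothetical SCADA readings for one tag: (timestamp, value); None = missed poll.
READINGS = [
    (datetime(2024, 5, 1, 0, 0), 182.4),
    (datetime(2024, 5, 1, 1, 0), 182.4),
    (datetime(2024, 5, 1, 2, 0), 182.4),   # third identical poll: likely a stale tag
    (datetime(2024, 5, 1, 3, 0), 455.0),   # step change beyond any physical rate limit
    (datetime(2024, 5, 1, 4, 0), None),    # missing poll
]

def flag_reading(prev, curr, stale_repeats, max_step=50.0):
    """Classify a reading as 'missing', 'stale', 'suspect', or 'ok'."""
    _, value = curr
    if value is None:
        return "missing"
    if stale_repeats >= 2:  # same value three polls in a row
        return "stale"
    if prev is not None and prev[1] is not None and abs(value - prev[1]) > max_step:
        return "suspect"    # jump larger than the process could physically produce
    return "ok"

def screen(readings):
    """Walk the stream in order, tracking consecutive repeats of the same value."""
    flags, repeats, prev = [], 0, None
    for curr in readings:
        if prev is not None and curr[1] is not None and curr[1] == prev[1]:
            repeats += 1
        else:
            repeats = 0
        flags.append(flag_reading(prev, curr, repeats))
        prev = curr
    return flags
```

None of this is sophisticated; the point is that it has to run before the model does, on every tag, every day.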
Trust Is Everything
The most technically elegant model in the world is worthless if the field engineer does not trust it. And field engineers have good reasons to be skeptical. They have seen vendor demos where the AI performs miracles on cherry-picked datasets. They have seen "predictive analytics" tools that generate false alarms every hour until someone turns them off. They have been burned.
Building trust requires three things that have nothing to do with algorithm selection:
First, transparency. The model needs to explain why it is making a recommendation, in terms that map to the engineer's mental model of the system. "The model predicts ESP failure in 72 hours because motor temperature has been trending 15 degrees above baseline for the past week" builds trust. "The model's anomaly score exceeded the threshold" does not.
Second, humility. The model needs to know when it does not know. Calibrated uncertainty intervals are not a nice-to-have -- they are a requirement. An engineer will trust a model that says "I am 60% confident this well needs intervention" far more than one that says "this well will fail" with false certainty.
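One assumption-light way to get calibrated intervals is split conformal prediction: hold out some wells, measure the model's absolute errors on them, and use an empirical quantile of those errors as the interval width. A sketch, with made-up residuals:

```python
import math

def conformal_interval(calibration_errors, prediction, alpha=0.4):
    """Split-conformal interval: take roughly the (1 - alpha) quantile of
    absolute errors on held-out calibration wells and report
    prediction +/- that band. With alpha = 0.4 the band aims to cover
    about 60% of future errors -- an honest '60% confident' statement."""
    errors = sorted(abs(e) for e in calibration_errors)
    n = len(errors)
    # Conservative finite-sample rank for (1 - alpha) coverage.
    rank = min(n - 1, math.ceil((n + 1) * (1 - alpha)) - 1)
    q = errors[rank]
    return prediction - q, prediction + q

# Hypothetical residuals (bbl/d) from wells held out of training.
residuals = [4.0, -2.5, 7.1, 1.2, -5.8, 3.3, -0.9, 6.4, 2.2, -3.7]
lo, hi = conformal_interval(residuals, prediction=120.0, alpha=0.4)
```

The appeal of this approach is that the coverage guarantee does not depend on the model being right about its own uncertainty; it only depends on the calibration wells resembling the wells you score next.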
Third, track record. The model needs to be right enough, often enough, on enough wells, that the engineer's personal experience confirms the tool's value. This takes months. There is no shortcut.
Lessons Learned Building AI for Oil & Gas
Physics-Informed Models Beat Pure ML for Most PE Problems
This is the hill I will die on, and I have the scars to prove it.
Early in my career, I got caught up in the excitement around deep learning. If a neural network could classify images and translate languages, surely it could predict well performance, right? So I built pure ML models for production forecasting. They fit the training data beautifully. They generalized poorly. They violated material balance. They produced physically impossible negative flow rates on edge cases.
The problem is fundamental. Pure ML models learn statistical correlations from data. They do not know that oil cannot flow uphill without energy input. They do not know that cumulative production cannot exceed the original oil in place. They do not know that decline rates are bounded by physical constraints.
I have written about this in detail in my article on physics-informed approaches to decline curve analysis. The short version: for most petroleum engineering problems, the right approach is to use physics to define the structure of the model and machine learning to calibrate the parameters. The physics provides the guardrails. The ML provides the flexibility. Together, they produce models that are both accurate and trustworthy.
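To make the "physics as guardrails" idea concrete, here is a sketch of fitting an Arps hyperbolic decline where the data calibrates the parameters but the search space enforces the physical bounds (qi > 0, di > 0, 0 < b <= 2). The grid-search fitter and the synthetic well are illustrative, not our production code:

```python
def arps_hyperbolic(t, qi, di, b):
    """Arps hyperbolic decline: q(t) = qi / (1 + b*di*t)**(1/b)."""
    return qi / (1.0 + b * di * t) ** (1.0 / b)

def fit_decline(times, rates, b_grid=None, di_grid=None):
    """Least-squares fit with the physics baked into the search space:
    di > 0 and 0 < b <= 2 (Arps' admissible range), so the fitted curve
    can never produce negative rates or physically impossible growth."""
    b_grid = b_grid or [i / 20 for i in range(1, 41)]        # b in (0, 2]
    di_grid = di_grid or [i / 1000 for i in range(1, 201)]   # di in (0, 0.2] 1/day
    qi = rates[0]  # anchor the initial rate to the first observation
    best = None
    for b in b_grid:
        for di in di_grid:
            sse = sum((arps_hyperbolic(t, qi, di, b) - q) ** 2
                      for t, q in zip(times, rates))
            if best is None or sse < best[0]:
                best = (sse, qi, di, b)
    return best[1], best[2], best[3]

# Synthetic well: qi = 500 bbl/d, di = 0.05 1/day, b = 0.8.
times = [0, 30, 60, 90, 120, 180, 240, 360]
rates = [arps_hyperbolic(t, 500.0, 0.05, 0.8) for t in times]
qi, di, b = fit_decline(times, rates)
```

A pure ML regressor fit to the same eight points could extrapolate anywhere; this one cannot leave the family of physically admissible decline curves, no matter how noisy the data gets.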
This is not a novel insight -- the research community has been moving in this direction for years. But the gap between "physics-informed ML is better" as an academic statement and "here is how you actually implement physics-informed ML on messy oilfield data with incomplete boundary conditions" as a practical reality is enormous. Bridging that gap is most of what Groundwork Analytics does.
The Last Mile Problem Is Real
In my experience, getting a model from a Jupyter notebook to a production deployment that actually runs reliably on an operator's infrastructure accounts for 70-80% of the total project effort. The modeling itself -- the part that gets published in SPE papers and presented at conferences -- is maybe 20%.
The last mile includes:
- Data pipeline engineering. Getting data from SCADA, production databases, drilling systems, and completion records into a format the model can consume. Reliably. Every day. Without manual intervention.
- Edge case handling. What does the model do when it receives null values? What about when a well comes back online after a workover and the historical pattern no longer applies? What about when someone changes the sensor configuration?
- Monitoring and retraining. Models drift. The reservoir changes. New wells come online in areas with different geology. You need infrastructure to detect when predictions are degrading and retrain the model with updated data.
- User interface. The field engineer does not interact with a Jupyter notebook. They need a dashboard, alerts, and recommendations integrated into the tools they already use.
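As one concrete example of edge case handling, here is a defensive wrapper that refuses to score wells the model has no business scoring, instead of silently producing a number. The field names and thresholds are hypothetical:

```python
def safe_predict(model, well):
    """Guard the model against inputs it was never trained to handle.
    All field names and thresholds here are illustrative."""
    rates = well.get("daily_rates")
    if not rates or any(r is None for r in rates):
        return {"status": "skipped", "reason": "missing production data"}
    if any(r < 0 for r in rates):
        return {"status": "skipped", "reason": "negative rate: check allocation"}
    if well.get("days_since_workover", 9999) < 30:
        # Post-workover behavior breaks the historical pattern the model learned.
        return {"status": "skipped", "reason": "recent workover: history not representative"}
    return {"status": "ok", "prediction": model(rates)}

def mean_rate(rates):
    """Stand-in for a real forecasting model."""
    return sum(rates) / len(rates)

good = safe_predict(mean_rate, {"daily_rates": [100.0, 98.0, 96.0]})
bad = safe_predict(mean_rate, {"daily_rates": [100.0, None, 96.0]})
```

An explicit "skipped, and here is why" is also part of the trust story from the previous section: the engineer sees the tool declining to guess.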
None of this is glamorous. None of it gets published. All of it is essential. The production operations software landscape is littered with AI tools that demonstrated impressive results in a pilot and then died in full deployment because the last mile was not solved.
Data Integration Is the Real Bottleneck
Ask an AI vendor what the biggest challenge in deploying AI for oil and gas is, and they will probably talk about model accuracy or compute requirements. Ask an operator, and they will say data.
Not data volume -- operators have plenty of data. Data integration. Getting drilling data, completion data, production data, geological data, and facility data to talk to each other. These systems were built by different vendors, at different times, with different data models, and they do not naturally interoperate.
I have seen operators where the drilling data lives in a Pason EDR, the completion data is in a spreadsheet, the production data is in an OFM database, the geological data is in a Petrel project, and the facility data is in an Emerson SCADA system. Building an AI model that uses all five data sources requires building custom integrations for each one. That takes months of engineering effort before you even start on the AI.
This is why I believe the Model Context Protocol is the most important technology development for AI in oil and gas in the past several years. Not because MCP is technically revolutionary -- it is a protocol specification, not a breakthrough algorithm. But because it standardizes the interface between AI systems and data sources. Build an MCP server for your production database once, and every AI agent can use it. Build an MCP server for your WITSML drilling data once, and it works with any LLM-based tool.
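For readers who have not seen MCP on the wire: a tool invocation is just a JSON-RPC 2.0 request with method "tools/call". The tool name and arguments below are hypothetical, but the envelope follows the MCP specification:

```python
import json

# What an MCP tool invocation looks like on the wire (JSON-RPC 2.0).
# The tool name and arguments are hypothetical -- the point is that any
# MCP-speaking agent can call any MCP server through this same shape.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "get_production_history",  # hypothetical tool on a PE data server
        "arguments": {"well_id": "WELL-042", "start": "2024-01-01"},
    },
}
wire = json.dumps(request)

# The server replies with a result whose content the agent can reason over.
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"content": [{"type": "text", "text": "(production records here)"}]},
}
```

That is the whole trick: the protocol is boring on purpose, which is exactly why every vendor and every model can implement it.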
The integration problem is the problem. MCP is the infrastructure layer that makes the solution scalable.
The Industry Needs Bridges, Not Disruptors
Every year, I see new startups announce that they are going to "disrupt" oil and gas with AI. Most of them are gone within two years.
The ones that survive are the ones that understood something fundamental: this industry does not need disruption. It needs bridges. Bridges between legacy systems and modern analytics. Bridges between domain expertise and computational methods. Bridges between the field and the cloud. Bridges between the engineer who has 30 years of experience watching pumping unit cards and the data scientist who can build a convolutional neural network.
Building bridges is slower and less exciting than disruption. It requires patience, humility, and a genuine respect for the knowledge that already exists in the industry. It means sitting with a production engineer and understanding their workflow before suggesting changes. It means building tools that fit into existing processes rather than demanding that everyone adopt a new platform.
This is not the kind of work that gets venture capital excited. But it is the kind of work that actually delivers value.
Why I Built petro-mcp as Open Source
In 2025, I released petro-mcp, an open-source MCP server for petroleum engineering data. It provides standardized tools for accessing production data, well information, decline curve analysis, and other common PE data operations through the Model Context Protocol.
The decision to make it open source was deliberate, and it reflects several convictions I have developed over the years.
The PE Python Ecosystem Is Small and Fragmented
If you work in data science outside of oil and gas, you take for granted that there are mature, well-maintained open-source libraries for everything. pandas for data manipulation. scikit-learn for machine learning. matplotlib for visualization. These tools have thousands of contributors, extensive documentation, and active communities.
The petroleum engineering Python ecosystem has nothing comparable. There are scattered libraries for specific tasks -- reading LAS files, basic decline curve fitting, PVT calculations -- but nothing approaching the breadth and maturity of what exists in other domains. The upstream software landscape is dominated by proprietary tools with closed data formats and limited APIs.
This fragmentation hurts everyone. Every PE data scientist reinvents the same wheel. Every operator builds custom scripts for the same data access patterns. The collective effort wasted on duplicated work is staggering.
petro-mcp is a small step toward changing that. By providing a standardized, open-source foundation for PE data access, it gives the community something to build on rather than starting from scratch.
Open Source Builds Trust
In an industry that has been burned by proprietary vendor lock-in repeatedly, open source is a statement. It says: here is the code. Read it. Audit it. Understand exactly what it does. If you do not like something, change it. If you find a bug, fix it. You are not dependent on my release schedule or my business decisions.
For a small company like Groundwork Analytics, open source also serves a practical purpose. It demonstrates competence more credibly than any marketing material. You can read the code and judge for yourself whether we know what we are doing.
The Real Value Is What You Build on Top
The MCP server itself is infrastructure. It is plumbing. The value is in the AI agents, workflows, and applications that become possible when the data access layer is solved.
When an operator can deploy an AI agent that automatically pulls production data, runs decline curve analysis, identifies underperforming wells, and generates a prioritized workover candidate list -- all through standardized MCP tool calls -- the value of the infrastructure becomes obvious. But the infrastructure has to exist first.
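A toy version of the "identify underperforming wells" step might look like this. In the real workflow the expected rates would come from decline curve analysis over MCP-fetched production data; here they are simply given, and the threshold is an illustrative assumption:

```python
def rank_workover_candidates(wells, shortfall_threshold=0.15):
    """Rank wells whose actual rate falls short of the expected (decline
    curve) rate by more than the threshold fraction, worst first."""
    candidates = []
    for w in wells:
        shortfall = (w["expected_bopd"] - w["actual_bopd"]) / w["expected_bopd"]
        if shortfall > shortfall_threshold:
            candidates.append({"well": w["name"],
                               "shortfall_pct": round(100 * shortfall, 1)})
    return sorted(candidates, key=lambda c: c["shortfall_pct"], reverse=True)

wells = [
    {"name": "A-1", "expected_bopd": 120.0, "actual_bopd": 118.0},  # on forecast
    {"name": "B-7", "expected_bopd": 200.0, "actual_bopd": 140.0},  # 30% short
    {"name": "C-3", "expected_bopd": 80.0,  "actual_bopd": 64.0},   # 20% short
]
ranked = rank_workover_candidates(wells)
```

The ranking logic is trivial; the hard part, as the rest of this article argues, is reliably producing the "expected" and "actual" numbers it consumes.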
I would rather give away the plumbing and build the applications than try to sell the plumbing as a product. The agentic AI opportunity in upstream oil and gas is large enough that commoditizing the infrastructure layer makes strategic sense.
Advice for Students and Early-Career Engineers
I get emails from PE students and young engineers asking about the AI path. Here is what I tell them.
The Combination Is Rare and Valuable
The number of people who deeply understand petroleum engineering and can write production-quality code for data science applications is vanishingly small. I do not say this to be elitist -- I say it because it is a market reality.
Most data scientists who enter oil and gas do not have the domain knowledge to ask the right questions. They build models that are statistically impressive and operationally useless. Most petroleum engineers who want to use AI do not have the software engineering skills to build robust, deployable solutions. They build prototypes in Excel that cannot scale.
If you can do both -- if you can look at a production dataset and simultaneously see the engineering physics and the statistical patterns -- you are extraordinarily valuable. The skills gap in petroleum engineering is real, and it is not closing quickly.
You Do Not Need a PhD
I have a PhD, and I would not trade the experience. The depth of thinking, the rigor of methodology, the experience of pushing the boundary of human knowledge on a narrow problem -- these shaped how I approach every project.
But you do not need a PhD to do this work. You need three things:
Curiosity about both domains. Not just tolerance -- genuine curiosity. You need to find reservoir simulation interesting and gradient descent interesting. If you only care about one side, you will always be limited.
Willingness to be bad at something. If you are a PE, your first Python code will be terrible. If you are a data scientist, your first reservoir engineering analysis will be embarrassing. That is fine. Competence comes from persistence, not talent.
A portfolio of real work. Not Kaggle competitions. Not course projects. Real work on real oilfield data, solving real problems. Contribute to an open-source project. Build a tool that solves a problem you encountered in a field study. Write an article analyzing a real dataset. The breaking into oil and gas guide I wrote covers this in more detail, but the short version is: demonstrated ability to solve real problems beats credentials every time.
The Industry Needs You
This is not cheerleading. It is arithmetic.
The AI opportunity in oil and gas is measured in billions of dollars. The number of people who can capture that opportunity -- who can build, deploy, and maintain AI systems that work in the operational reality of the oilfield -- is measured in hundreds, maybe low thousands. The demand-supply mismatch is enormous.
The major service companies -- SLB, Halliburton, Baker Hughes -- are all investing heavily in AI. The operators are hiring data scientists and analytics engineers. The digital oilfield startups need people who can talk to both the engineering team and the software team.
And beyond the large organizations, there is a massive opportunity for mid-size operators who cannot afford enterprise AI platforms but desperately need analytics capabilities. These operators need solutions that are practical, affordable, and built by people who understand their specific challenges. That is a market that is wide open.
The Road Ahead
Eight years after founding Groundwork Analytics, I am more optimistic about AI in oil and gas than I have ever been. Not because the hype is justified -- most of it is not. But because the foundational problems are finally being addressed.
Data integration is getting better, driven by standards like MCP and OSDU. Physics-informed machine learning is maturing from research papers into deployable tools. The talent pipeline is slowly growing as more PE programs incorporate data science into their curricula. And the industry itself is more receptive to computational approaches than it was even five years ago.
The work that remains is unglamorous. It is data engineering. It is building trust with field teams. It is writing robust code that handles the edge cases that academic papers ignore. It is sitting in an operator's office and understanding their workflow before proposing a solution.
It is bridge-building. And it is the most rewarding work I have ever done.
If you are a student or early-career engineer reading this and wondering whether the AI path is worth pursuing -- it is. The problems are real, the impact is tangible, and the field needs more people who can do the work. Reach out if you want to talk about it. I am always happy to help someone who is genuinely curious about this space.
Dr. Mehrdad Shirangi is the founder of Groundwork Analytics and holds a PhD from Stanford University in Energy Systems Optimization. His SPE-published research on prescriptive analytics for well completion design bridges the gap between academic optimization theory and operational petroleum engineering. He founded Groundwork Analytics in 2018 to bring physics-informed AI to the upstream oil and gas industry. Connect on X/Twitter and LinkedIn, or reach out at info@petropt.com.
Related Articles
- The Petroleum Engineering Skills Gap -- The structural gap between PE education and industry needs.
- Agentic AI for Upstream Oil & Gas -- The broader context for deploying AI agents in upstream operations.
- From Spreadsheets to AI Agents -- A practical career roadmap for engineers who want to follow the AI path.