Editorial disclosure
This article reflects the independent analysis and professional perspective of the author, informed by a decade of experience at the intersection of petroleum engineering, optimization, and AI. No university, operator, or vendor reviewed or influenced this content prior to publication. Where we reference Groundwork Analytics' open-source tools, we say so explicitly.
You passed your reservoir simulation final. You can run a nodal analysis. You know the difference between Arps hyperbolic and harmonic decline. You graduated with a PE degree, and maybe you already landed your first job -- or you are interviewing for one.
Congratulations. You have the foundation. But I need to be honest with you about what comes next.
The petroleum engineering profession is going through a structural transformation that has nothing to do with oil prices. The tools, workflows, and expectations that define what it means to be a "good engineer" at an E&P company are changing faster than any PE curriculum can keep up with. The engineers who recognize this early -- and deliberately build the right skills alongside their domain expertise -- will have career trajectories that look very different from those who do not.
This is not a "learn to code or die" article. It is a concrete, level-by-level roadmap. Five stages, each with specific skills, tools, timelines, and project ideas you can start this weekend using publicly available data. Think of it as the career map a mentor would draw on a whiteboard if you asked: "What should I actually be learning right now?"
The Reality Check: Why This Matters More Than You Think
Before we get to the roadmap, let me lay out the numbers that define the landscape you are entering.
The talent pipeline is shrinking. PE enrollment has dropped roughly 75% since its 2014 peak, yet 90%+ of seniors still have job offers before graduation. That sounds like a seller's market -- and it is, for now. But operators are hiring from a smaller pool, which makes them increasingly selective about which skills those graduates bring.
You are entering the highest-paid engineering discipline for new grads. The average starting salary for PE graduates is approximately $100,750, consistently topping engineering salary surveys. That buys you time to invest in skills that compound over a career. Do not waste it.
The workforce is aging out. The average O&G worker age is 56. Only about 12% are under 30. Industry projections estimate a 40,000-worker shortage in the coming years. Operators do not need to replace the retiring generation one-for-one. They need engineers who can do the work of two or three people, augmented by technology.
The AI skills gap is widening. AI job postings in oil and gas are up 78% year over year, but the AI-skilled talent pool grew only 24%. Only 15% of reservoir engineers frequently use machine learning, and 45% of companies offer zero formal AI or data training.
As I covered in detail in The Petroleum Engineering Skills Gap, the gap between what universities teach and what operators need is structural, not cyclical. The question is whether you close that gap on your own terms, on your own timeline, or whether it closes on you.
Where You Are Now: The Spreadsheet Floor
Let us be honest about the starting point. If you are in your first year or two as a petroleum engineer -- or still in school -- your daily tools probably look something like this:
- Excel. Lots of Excel. Production data in Excel. Decline curves in Excel. Economics in Excel. If it exists, someone has put it in a spreadsheet.
- Copy-paste workflows. Pulling production data from one system, pasting it into another, formatting it, emailing it. Every morning, every week, every month.
- Manual reporting. Daily production reports assembled by hand. Variance reports that take hours. Monthly reviews built on numbers that were stale by the time you present them.
- Vendor software you do not fully understand. You click buttons in OFM, PHDWin, or Aries. You know which inputs produce which outputs. But the software is a black box, and you cannot customize it beyond what the menus allow.
There is nothing wrong with this starting point. Every petroleum engineer begins here. The problem is staying here. Every hour copying production data from a SCADA export into an Excel template is an hour you are not spending on actual engineering -- the interpretation, the decisions, the judgment calls that make you valuable.
The engineers who break out of this cycle learn to automate the tedious parts and redirect that time toward higher-value work. That is what the next four levels are about.
Level 1: Python Basics -- Automate the Tedious Stuff
Timeline: 2-4 months of evening/weekend work
Goal: Eliminate repetitive manual tasks from your daily workflow
This is where most PE programs are starting to catch up. Texas A&M, Colorado School of Mines, OU, and others are adding data analytics certificates and Python-based courses to their curricula. If your program included one of these, you have a head start. If it did not, you can close the gap in a few months.
What to Learn
Python fundamentals. Variables, data types, loops, functions, file I/O. You do not need to learn computer science theory. You need to learn enough to read a CSV file, manipulate the data, and write the results somewhere useful.
pandas. This is the library that makes Python useful for engineers. It handles tabular data -- the kind of data you work with every day. Loading Excel files, filtering rows, calculating aggregates, joining datasets. If you learn one Python library, make it pandas.
matplotlib and plotly. Visualization. The ability to generate production plots, decline curves, and comparison charts programmatically instead of manually formatting Excel charts.
Basic file handling. Reading and writing CSV, Excel, and JSON files. Interacting with directories. Scheduling scripts to run automatically.
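To make this concrete, here is a minimal sketch of the pandas workflow described above. The data is a synthetic stand-in for a monthly production export (well identifiers and values are made up for illustration); the same load-filter-aggregate pattern applies to a real RRC download.

```python
import pandas as pd

# Synthetic stand-in for a monthly production export (API number, month, oil)
df = pd.DataFrame({
    "api": ["42-001-00001"] * 3 + ["42-001-00002"] * 3,
    "month": ["2025-01", "2025-02", "2025-03"] * 2,
    "oil_bbl": [5000, 4200, 3700, 8000, 6900, 6100],
})

# The core pandas moves for production data: group, aggregate, rank
cum = (df.groupby("api")["oil_bbl"]
         .sum()
         .sort_values(ascending=False))
print(cum)
```

With a real file, the only change is swapping the inline DataFrame for `pd.read_csv(...)` or `pd.read_excel(...)`.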
Starter Projects Using Public Data
These projects use freely available data. No proprietary access required.
1. Texas RRC production data automation. The Texas Railroad Commission publishes monthly production data for every well and lease in Texas. Download the production data files, write a Python script that loads them into a pandas DataFrame, filters for a specific operator or county, calculates per-well monthly decline rates, and generates a summary plot. What would take 2 hours in Excel takes 30 seconds once the script is written -- and it runs identically every month.
2. FracFocus completion data analysis. FracFocus publishes chemical disclosure data for hydraulic fracturing operations nationwide. Download a regional dataset, parse the CSV files, and build a summary of completion trends by operator -- total fluid volume, proppant mass, number of stages, average lateral length. Plot these metrics over time to identify how completion designs have evolved.
3. Automated decline curve analysis. Take the Texas RRC production data from Project 1, fit Arps decline models (exponential, hyperbolic, harmonic) to individual well production histories using scipy's curve_fit, and generate a report that compares forecasted vs. actual production. This is a project that directly applies your PE fundamentals through code.
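As a sketch of Project 3's core step, the snippet below fits an Arps hyperbolic model to a synthetic 24-month rate history with scipy's curve_fit. The parameter values and noise level are invented for illustration; with real RRC data you would loop this over wells and compare forecast vs. actual.

```python
import numpy as np
from scipy.optimize import curve_fit

def arps_hyperbolic(t, qi, di, b):
    """Arps hyperbolic rate decline: q(t) = qi / (1 + b*di*t)^(1/b)."""
    return qi / np.power(1.0 + b * di * t, 1.0 / b)

# Synthetic 24-month history generated from known parameters plus 2% noise
t = np.arange(24.0)
rng = np.random.default_rng(7)
q_true = arps_hyperbolic(t, qi=1000.0, di=0.15, b=0.8)
q_obs = q_true * (1 + rng.normal(0, 0.02, t.size))

# Fit with physically sensible bounds: qi > 0, di > 0, 0 < b <= 2
popt, _ = curve_fit(arps_hyperbolic, t, q_obs,
                    p0=[900.0, 0.1, 0.5],
                    bounds=([1.0, 1e-4, 1e-3], [1e5, 5.0, 2.0]))
qi_fit, di_fit, b_fit = popt
print(f"qi={qi_fit:.0f}  di={di_fit:.3f}  b={b_fit:.2f}")
```

Note the bounds: constraining b to (0, 2] is exactly the kind of PE-informed guardrail that separates a useful fit from a numerically valid but physically meaningless one.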
Tools
- Python 3.x (install via Anaconda or directly from python.org)
- Jupyter Notebooks (for exploration and learning)
- VS Code (for writing reusable scripts)
- pandas, numpy, matplotlib, scipy (core libraries)
What Success Looks Like
You know you have completed Level 1 when you can look at a repetitive task in your daily workflow and think: "I can write a script for this." Not every time. Not for everything. But for the common patterns -- pulling data, reformatting it, generating a standard plot, calculating a standard metric -- you have the ability to automate instead of manually grinding.
Level 2: Data Engineering -- Understand Where the Data Lives
Timeline: 3-6 months, ideally with on-the-job exposure
Goal: Navigate production databases, SCADA systems, and data quality issues
Level 1 gets you automating individual tasks. Level 2 gets you understanding the data infrastructure that underlies your entire operation. This is where most petroleum engineers hit a wall -- not because the concepts are hard, but because nobody teaches them this in school.
What to Learn
SQL. The language of databases. Your company's production data, well headers, completion records, and equipment histories almost certainly live in a relational database somewhere -- even if you only ever see them through a GUI. Learning SQL means you can query that data directly, join tables, aggregate across wells and leases, and get answers to questions that would take hours of manual filtering in Excel.
SCADA systems and historians. SCADA systems collect real-time data from wellsites -- pressures, temperatures, flow rates, pump parameters, tank levels. Understanding how this data flows from the wellsite to the historian to your screen is essential context for any data or AI work. Our article on SCADA Data Quality for AI covers the specific data quality issues you will encounter.
Data quality awareness. The unsexy but critical skill. Real production data is messy. Sensors freeze. Values get recorded in wrong units. Wells go offline and the database records zeros instead of nulls. The difference between an engineer who can work with real data and one who cannot is almost entirely about handling data quality issues.
APIs and data access. State regulatory databases, commodity price feeds, and other sources are increasingly accessible through APIs. Learning to make HTTP requests and parse JSON responses opens up automated data ingestion.
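The data quality issues described above lend themselves to simple programmatic checks. Here is a minimal sketch, on an invented daily rate series, of two classic problems: a frozen sensor (identical repeated values) and downtime recorded as zeros instead of nulls.

```python
import numpy as np
import pandas as pd

# Synthetic daily oil rate with two injected data quality problems:
# a frozen sensor reading and zeros recorded during downtime
rate = pd.Series(
    [520, 515, 510, 510, 510, 510, 510, 0, 0, 495, 490],
    name="oil_rate_bopd",
)

# Frozen sensor: flag the readings that complete a run of N identical values
N = 4
frozen = rate.diff().eq(0).rolling(N - 1).sum().eq(N - 1)

# Downtime zeros: treat exact zeros as missing rather than real rates
cleaned = rate.mask(rate == 0, np.nan)

print(f"frozen readings flagged: {int(frozen.sum())}")
print(f"zeros converted to NaN:  {int(cleaned.isna().sum())}")
```

The thresholds here (N identical readings, exact zeros) are simplifications; on real SCADA data you would tune them per tag and per well.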
Starter Projects Using Public Data
4. Build a SQL-based well database. Download well header data from a state regulator (Texas RRC, COGCC, or North Dakota's DMR). Load it into a SQLite database using Python. Write SQL queries to answer questions like: How many horizontal wells were permitted in Reeves County in 2025? What is the average initial production rate by operator in the DJ Basin? Which operators have the longest average lateral lengths?
5. Data quality audit on public production data. Take a year of Texas RRC production data and systematically identify data quality issues. How many wells show zero production for more than 3 consecutive months followed by a sudden spike? How many wells have reported production that exceeds their stated allowable? How many records have missing fields? Write a script that generates a data quality report card.
6. Multi-source data integration. Combine well header data from one source, production data from another, and completion data from FracFocus. Join them on API number. Build a unified dataset that lets you answer questions that require information from all three sources -- for example, "What is the correlation between proppant loading per foot and 12-month cumulative production for Bone Spring wells completed in 2024?"
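A minimal sketch of the Project 4/6 pattern: load tables into SQLite from pandas, then answer a question with one SQL query. The well headers and volumes here are invented stand-ins for real regulator downloads.

```python
import sqlite3
import pandas as pd

# Synthetic well header and production tables, joined on API number --
# the same pattern applies to real Texas RRC and FracFocus downloads
headers = pd.DataFrame({
    "api": ["42-389-00001", "42-389-00002"],
    "operator": ["Operator A", "Operator B"],
    "lateral_ft": [9800, 12400],
})
prod = pd.DataFrame({
    "api": ["42-389-00001"] * 2 + ["42-389-00002"] * 2,
    "month": ["2025-01", "2025-02"] * 2,
    "oil_bbl": [21000, 18500, 30000, 27000],
})

conn = sqlite3.connect(":memory:")
headers.to_sql("well_headers", conn, index=False)
prod.to_sql("production", conn, index=False)

# One SQL query replaces hours of manual filtering in Excel
query = """
SELECT h.operator,
       h.lateral_ft,
       SUM(p.oil_bbl) AS cum_oil_bbl
FROM well_headers h
JOIN production p ON p.api = h.api
GROUP BY h.api
ORDER BY cum_oil_bbl DESC
"""
result = pd.read_sql(query, conn)
print(result)
```

Swapping `:memory:` for a file path gives you a persistent database you can keep refreshing with monthly downloads.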
Tools
- SQLite (embedded database, no server needed) or PostgreSQL (if you want to learn a production-grade database)
- DBeaver or DB Browser for SQLite (GUI tools for exploring databases)
- requests (Python library for API calls)
- sqlalchemy (Python library for database interaction)
What Success Looks Like
You can trace the path of a production data point from the wellsite sensor to the report on your screen. You know what happens at each step, where data quality issues creep in, and how to query the underlying database directly when the GUI does not give you what you need.
Level 3: ML/AI Literacy -- Know What Is Possible
Timeline: 4-8 months of structured learning
Goal: Understand machine learning well enough to evaluate AI tools, work with data scientists, and identify where ML can (and cannot) help
This is where the roadmap diverges from "technical skill-building" into something closer to "strategic literacy." You do not necessarily need to build production-grade ML models yourself. But you absolutely need to understand what machine learning can do, what it cannot do, and how to evaluate the AI tools that vendors and internal data science teams are putting in front of you.
Why? Because the petroleum engineer who can bridge the gap between domain expertise and data science is absurdly valuable. Data scientists know algorithms but rarely understand wellbore hydraulics. Petroleum engineers understand wellbore hydraulics but rarely know which algorithm to apply. The person who speaks both languages becomes the translator -- and translators get promoted.
What to Learn
Core ML concepts. Supervised vs. unsupervised learning. Regression vs. classification. Overfitting and the bias-variance tradeoff. Train/test splits. Cross-validation. Feature engineering. Understand these at an intuitive level, not at the level of deriving the math.
Common algorithms and when to use them. Linear regression, decision trees, random forests, gradient boosting (XGBoost), neural networks. For each: What kind of problem is it good for? What are the failure modes?
Domain-specific applications. Decline curve prediction, production optimization, artificial lift troubleshooting, sweet spot identification, completion design optimization. Our article on physics-informed decline curve AI goes deep on one of the most practical applications.
AI tool evaluation. How to ask the right questions when a vendor presents an "AI-powered" solution: What data does it train on? How is the model validated? What is the out-of-sample accuracy? What happens outside the training distribution?
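The train/test discipline described above can be sketched in a few lines. The "completion parameters drive cumulative production" relationship below is entirely synthetic (invented coefficients and noise, for illustration only); the point is the gap between train and test scores as an overfitting signal.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Hypothetical relationship: cum production from lateral length and
# proppant loading, plus noise (synthetic data for illustration)
rng = np.random.default_rng(42)
n = 400
lateral_ft = rng.uniform(5000, 13000, n)
proppant_lb_ft = rng.uniform(1000, 3000, n)
cum_oil = 0.04 * lateral_ft + 0.1 * proppant_lb_ft + rng.normal(0, 30, n)

X = np.column_stack([lateral_ft, proppant_lb_ft])
X_train, X_test, y_train, y_test = train_test_split(
    X, cum_oil, test_size=0.25, random_state=0)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# The train/test gap is your overfitting signal
r2_train = r2_score(y_train, model.predict(X_train))
r2_test = r2_score(y_test, model.predict(X_test))
print(f"train R^2 = {r2_train:.2f}, test R^2 = {r2_test:.2f}")
```

If a vendor only ever quotes in-sample accuracy, this is exactly the number they are hiding.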
Starter Projects Using Public Data
7. ML-based decline curve prediction. Using the Texas RRC dataset, train a random forest model to predict 24-month cumulative production based on completion parameters (lateral length, proppant loading, fluid volume) and well location. Compare the ML model's predictions against traditional Arps decline fits. Where does each approach win?
8. Completion design clustering. Use unsupervised learning (k-means or DBSCAN) on FracFocus completion data to identify distinct completion design clusters in a specific basin. Are there clear groupings by operator? By vintage? Do clusters correlate with production outcomes?
9. Anomaly detection on production data. Train an isolation forest or autoencoder on historical production data to automatically flag anomalous production behavior -- sudden drops, unlikely spikes, gradual degradation patterns. Compare the algorithm's flags against known events (workovers, shut-ins, equipment failures) to evaluate how well it detects real operational issues.
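A minimal sketch of Project 9's approach: an isolation forest over a synthetic rate history with two injected anomalies. The decline parameters, noise level, and contamination setting are all invented for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic daily rates: smooth exponential decline plus noise, with
# two injected anomalies (a sudden drop and an unlikely spike)
rng = np.random.default_rng(3)
days = np.arange(365)
rate = 800 * np.exp(-0.002 * days) * (1 + rng.normal(0, 0.02, days.size))
rate[120] = 50      # injected sudden drop (e.g. unplanned shut-in)
rate[250] = 1500    # injected spike (e.g. meter error)

# Features: rate level plus day-over-day change, so jumps matter too
delta = np.diff(rate, prepend=rate[0])
X = np.column_stack([rate, delta])

clf = IsolationForest(contamination=0.02, random_state=0)
labels = clf.fit_predict(X)          # -1 = anomaly, 1 = normal
flagged = np.where(labels == -1)[0]
print("flagged days:", flagged)
```

Comparing `flagged` against a log of known workovers and shut-ins is where the evaluation discipline from the "What to Learn" list pays off.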
Tools
- scikit-learn (the standard ML library for Python)
- XGBoost (gradient boosting -- the workhorse of tabular data ML)
- Jupyter Notebooks (essential for iterative ML work)
- Google Colab (free GPU access for neural network experiments)
What Success Looks Like
A vendor demos an "AI-powered production optimization" platform. Instead of nodding along, you ask: "What is your training data? How do you handle wells with less than six months of production history? What is your RMSE on holdout wells? Does the model account for offset well interference?" You get useful answers -- or you expose that they do not have them.
Level 4: AI Agent Builder -- Deploy Autonomous Workflows
Timeline: 3-6 months, building on everything before
Goal: Build AI agents that autonomously execute engineering workflows using real data
This is the frontier. And it is where the career leverage becomes exponential.
An AI agent is not a chatbot. It is not a dashboard. It is an autonomous workflow that can perceive data, reason about what it means, plan a course of action, and execute it -- using tools, APIs, and databases -- within boundaries you define. If Level 3 was about understanding what AI can do, Level 4 is about building systems that actually do it.
As I detailed in Agentic AI for Upstream Oil & Gas, the energy industry is at an inflection point. SLB's Tela platform, Baker Hughes' Leucipa agents, and Cognite's Atlas AI are bringing agentic capabilities to the largest operators. But mid-size and small operators -- the companies that employ a disproportionate share of entry-level engineers -- do not have access to these enterprise platforms. That is where engineer-built agents fill the gap.
What to Learn
The Model Context Protocol (MCP). MCP is the open standard for connecting AI models to external data sources and tools -- a universal interface that lets any LLM-powered application connect to databases, APIs, and file systems. The ecosystem has grown to over 8,600 servers, but only 3 exist for energy and oil & gas data. Massive whitespace. Our article on MCP Servers for Oilfield Data explains the architecture in detail.
LLM fundamentals. How large language models work conceptually. Prompting strategies. Context windows. Tool use (function calling). Retrieval-augmented generation (RAG). You do not need to train models. You need to use them effectively and understand their limitations.
Agent frameworks. LangChain, CrewAI, and the Anthropic Agent SDK provide scaffolding for building agents that use LLMs as reasoning engines connected to real data. The real value you bring is domain knowledge -- knowing which workflows to automate and which guardrails to enforce.
Engineering guardrails. AI agents in petroleum engineering must operate within physics-based and regulatory constraints. A production optimization agent cannot recommend a choke setting that exceeds the wellhead pressure rating. A decline curve agent cannot forecast negative production. Translating your PE knowledge into machine-enforceable boundaries is where domain expertise meets AI capability.
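What a machine-enforceable boundary looks like in practice: a hypothetical guardrail layer (the names and limits below are invented for illustration) that every agent recommendation must pass before execution.

```python
from dataclasses import dataclass

# Hypothetical guardrail layer: an agent's recommended setpoint must pass
# physics- and equipment-based checks before it can be executed
@dataclass
class WellLimits:
    max_wellhead_pressure_psi: float
    max_choke_64ths: int

def validate_choke_recommendation(choke_64ths: int,
                                  predicted_whp_psi: float,
                                  limits: WellLimits) -> tuple[bool, str]:
    """Return (approved, reason). The agent may only act when approved."""
    if choke_64ths > limits.max_choke_64ths:
        return False, "choke exceeds maximum allowed setting"
    if predicted_whp_psi > limits.max_wellhead_pressure_psi:
        return False, "predicted WHP exceeds wellhead rating"
    return True, "within limits"

limits = WellLimits(max_wellhead_pressure_psi=5000, max_choke_64ths=48)
ok, reason = validate_choke_recommendation(64, 3200, limits)
print(ok, reason)
```

The LLM reasons; deterministic code like this decides what it is allowed to do. Writing these checks is pure petroleum engineering.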
Starter Projects
10. Production reporting agent. Build an agent that automatically pulls daily production data, compares it against forecasts, identifies wells with significant variance, generates a written summary, and produces formatted plots -- all triggered by a single command. Our article on AI agents for production reporting walks through exactly this use case.
11. Regulatory data research agent. Build an agent that queries public regulatory databases (Texas RRC, FracFocus, COGCC) to answer ad hoc questions: "What were the average 30-day IPs for Wolfcamp A wells in Loving County completed in Q4 2025?" "Which operators in the DJ Basin had the highest water cut increase year over year?" The agent should pull the data, perform the analysis, and return a written answer with supporting charts.
12. Hands-on with petro-mcp. Groundwork Analytics maintains an open-source MCP server for petroleum engineering data at github.com/petropt/petro-mcp. It includes tools for decline curve analysis, production data processing, and well log parsing. Install it, connect it to an AI assistant like Claude Desktop, and use it to analyze public well data. Then extend it -- add a new tool, connect a new data source, build a workflow that solves a real problem you face at work.
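The deterministic core of a reporting agent like Project 10 can be sketched in a few lines of pandas: compare actuals to forecast and surface wells outside a variance band. Well names, rates, and the threshold are invented for illustration; the agent's LLM layer would call logic like this as a tool and then write the narrative summary.

```python
import pandas as pd

# Synthetic daily snapshot: forecast vs. actual rates per well
data = pd.DataFrame({
    "well": ["A-1", "A-2", "B-1", "B-2"],
    "forecast_bopd": [500, 320, 750, 410],
    "actual_bopd": [480, 150, 760, 300],
})
data["variance_pct"] = (
    100 * (data["actual_bopd"] - data["forecast_bopd"]) / data["forecast_bopd"]
)

THRESHOLD = 15.0  # flag wells off forecast by more than 15%
flagged = data[data["variance_pct"].abs() > THRESHOLD]
print(flagged[["well", "variance_pct"]])
```

Keeping the variance math in plain code, and leaving only the summary writing to the LLM, is what makes the report trustworthy and auditable.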
Tools
- Claude, GPT-4, or other capable LLMs (for agent reasoning)
- MCP SDKs (Python and TypeScript)
- petro-mcp (open-source petroleum engineering MCP server)
- LangChain, CrewAI, or Anthropic Agent SDK (agent orchestration frameworks)
- Docker (for deploying agents reliably)
What Success Looks Like
Your team lead asks: "Can you put together a variance report for the Midland Basin wells?" Instead of spending three hours in Excel, you say: "The agent runs every morning at 6 AM. Here is today's report." You just saved three hours of manual work, every day, forever. And the report is more thorough and more consistent than any manual version could be.
The Progression Is the Strategy
Let me zoom out from the individual levels and talk about why this specific progression matters.
Each level multiplies the value of the one before it. Python without data engineering knowledge means you can automate files on your desktop but cannot access the data that matters. Data engineering without ML literacy means you can build clean datasets but cannot extract non-obvious insights from them. ML knowledge without agent-building skills means you can evaluate AI tools but cannot deploy them. And agent-building without PE fundamentals means you are just another programmer who does not understand the domain.
The combination is what creates disproportionate career value. Not any single skill in isolation.
You do not need to be an expert at every level. The goal is not to become a professional software engineer, a database administrator, a machine learning researcher, and a petroleum engineer simultaneously. The goal is to be a petroleum engineer who is competent at each level and fluent enough to bridge the gap between the domain and the technology. That bridging function is where the industry has its most acute shortage.
The timeline is realistic. If you start at Level 1 today and work through the progression at a reasonable pace -- a few hours per week, with projects that connect to your actual job -- you can reach Level 4 competency within 18-24 months. That is fast enough to matter for your career trajectory and slow enough to build genuine understanding rather than surface-level familiarity.
What the Industry Looks Like From the Other Side
I want to paint a picture of what your career can look like if you follow this progression, because the endpoint matters as much as the path.
The petroleum engineer who combines domain expertise with data and AI skills does not just get promoted faster (though they do). They become a fundamentally different kind of contributor to their organization.
You become the person who solves problems nobody else can. Data quality issues in SCADA? They call you. Evaluating whether an ML-based decline curve tool is legitimate? They call you. Management wants to know if a vendor's "AI-powered" product is worth the price tag? They call you. You are not replaceable by someone who only knows PE fundamentals, and you are not replaceable by someone who only knows data science.
You have optionality. Engineers with both domain knowledge and technical skills have options that pure PE engineers do not: data engineering roles at operators, AI product roles at service companies, technical consulting, and the ability to build tools for an industry that desperately needs them. The 127 agent and MCP opportunities we have identified in oil and gas are just the beginning.
You age-proof your career. The engineers who replace the retiring workforce will not do the same job the same way. They will work augmented by AI, with smaller teams managing more wells, with agents handling routine monitoring while engineers focus on judgment calls. If you are already building and deploying those agents, you are not waiting for the future. You are building it.
Where to Start This Weekend
If you have read this far and you are motivated but overwhelmed, here is your first assignment. Pick one. Do it this weekend.
If you have never written Python: Install Anaconda. Open Jupyter Notebook. Go to the Texas RRC production data page. Download a county-level production file. Load it into a pandas DataFrame. Calculate the top 10 producing wells in that county by cumulative oil production. Plot their production histories. That is it. You just did something in code that would have taken 45 minutes in Excel, and you can do it again for any county in 30 seconds.
If you already know Python basics: Pick Project 4 or 5 from Level 2 above. Get your hands dirty with data quality issues. Learn SQL. These skills are more valuable per hour of learning investment than almost anything else you can do right now.
If you already have data skills: Read the MCP Servers for Oilfield Data article. Install petro-mcp. Connect it to Claude Desktop. Ask it to analyze a decline curve. Then think about what tools are missing -- and build one.
The gap between "petroleum engineer who uses Excel" and "petroleum engineer who deploys AI agents" is not talent. It is not IQ. It is not access to expensive tools or elite university programs. It is the decision to start, and the discipline to keep going.
The industry needs 40,000 workers. But it does not need 40,000 more spreadsheet jockeys. It needs engineers who can think like engineers and build like technologists.
You have everything you need to be one of them. Start this weekend.
Dr. Mehrdad Shirangi is the founder of Groundwork Analytics and holds a PhD from Stanford University in Energy Systems Optimization. He has been building AI solutions for the energy industry since 2018. Connect on X/Twitter and LinkedIn, or reach out at info@petropt.com.
Related Articles
- State of Oil & Gas Hiring 2026 -- Data on who is hiring, salary benchmarks, and the AI skills premium.
- Breaking Into Oil & Gas in 2026 -- Practical guidance for entering the industry as a student or career changer.
- 5 Open-Source Projects Every PE Student Should Contribute To -- Specific projects where contributions build both skills and portfolio.
Looking for O&G Jobs?
Petro-Jobs uses AI to match your resume to 79+ curated oil & gas positions.
Try Petro-Jobs
Have questions about this topic? Get in touch.