Predicting ESP and Rod Pump Failures Before They Happen: A Practical Guide to AI-Driven Artificial Lift Optimization

Dr. Mehrdad Shirangi | Published by Groundwork Analytics LLC

Editorial disclosure

This article reflects the independent analysis and professional opinion of the author, informed by published research, vendor documentation, and practitioner experience. No vendor reviewed or influenced this content prior to publication. Product capabilities described are based on publicly available information and may not reflect the latest release.

Roughly 85% of oil wells in the United States require some form of artificial lift. Of those, approximately 85% are on rod pump, 10% on gas lift, and 5% on ESPs or hydraulic pumps. These are not optional systems. When they fail, production stops.

And yet most operators still run their artificial lift equipment reactively. The pump fails. The well goes down. Someone dispatches a workover rig. The rig costs $30,000-50,000 per day. The workover takes two to five days. Deferred production adds another $20,000-100,000 depending on the well's rate. Equipment replacement -- a new ESP, a rod string, a downhole pump -- adds more. All in, a single artificial lift failure can cost $50,000-200,000 per event.

Multiply that across a portfolio of 500 or 2,000 wells, and you are looking at one of the largest controllable cost categories in upstream operations. The question is no longer whether AI can help predict and prevent these failures. The question is why most operators have not deployed it yet.

This article covers what works, what does not, and what it actually takes to move from reactive lift management to predictive optimization -- with real numbers from operators who have done it.


The True Cost of Reactive Lift Management

Before diving into the AI approaches, it is worth quantifying what reactive management actually costs. Most operators track workover spend and failure frequency, but they undercount the real impact because deferred production, accelerated wear from running equipment outside optimal ranges, and the opportunity cost of engineering time spent on fire drills rarely show up in the same spreadsheet.

Direct Costs Per Failure Event

Cost Category | Rod Pump | ESP | Gas Lift
Workover rig mobilization + daily rate | $60,000-150,000 | $100,000-250,000 | $15,000-40,000
Equipment replacement | $15,000-40,000 | $80,000-200,000 | $5,000-20,000
Deferred production (3-10 days) | $15,000-80,000 | $30,000-150,000 | $10,000-50,000
Total per event | $90,000-270,000 | $210,000-600,000 | $30,000-110,000

ESP failures are the most expensive because they require pulling the entire completion string. A single ESP failure in a high-rate well can easily exceed $400,000 when you account for the full cycle: failure detection, rig scheduling (which can take days or weeks in a tight market), pulling the completion, running a new ESP, and the production lost during the entire window.

The Hidden Costs

The numbers above capture direct costs. They miss several categories that often dwarf the workover bill:

  • Cascade failures. A rod pump running with gas interference or fluid pound does not just underperform -- it accelerates wear on the rod string, tubing, and downhole pump. What starts as a correctable operating condition becomes a rod part if left unaddressed for weeks.
  • Overpumping damage. Chord Energy discovered that two-thirds of their rod lift wells were overpumping before deploying AI optimization. Overpumping does not just waste electricity -- it drives rods into compression, accelerates tubing wear, and shortens run life.
  • Engineering time. A production engineer manually reviewing dynacard data, ESP sensor trends, and well test results across 200+ wells cannot consistently catch problems early. The math does not work: even a good engineer can review perhaps 20-30 wells per day with any depth, leaving problems in the other 170+ wells to develop unmonitored.
  • Suboptimal production. Wells running at non-optimal pump speeds, injection rates, or cycle times produce less than they could. This is not a failure -- it is chronic underperformance that never shows up as a line item.

Rod Pump AI: Dynacard Classification and Beyond

Rod pump (sucker rod pump or beam pump) wells represent the largest share of the artificial lift population, and they also generate the richest diagnostic data through dynamometer cards -- the load-versus-position plots that capture the mechanical behavior of the pumping system on every stroke.

How Dynacard Classification Works

A dynamometer card -- or dynacard -- plots the load on the polished rod against the position of the polished rod through one complete pump stroke. The shape of this card tells an experienced production engineer what is happening downhole: normal pumping, fluid pound, gas interference, worn pump, leaking traveling valve, leaking standing valve, rod part, tubing movement, and a dozen other conditions.

Traditionally, this classification has been done by human experts. A skilled rod pump engineer can look at a dynacard and diagnose the well condition in seconds. The problem is scale. With SCADA systems now collecting dynacards every few minutes across thousands of wells, no human team can keep up.

Machine learning models -- particularly convolutional neural networks (CNNs) -- can classify dynacards with accuracy that matches or exceeds human experts:

  • CNN-based models have achieved 99.5% accuracy in classifying dynacard patterns across 12 distinct fault conditions, including fluid pound, gas interference, worn pump barrel, rod part, and tubing anchor failure.
  • Transfer learning approaches using pre-trained architectures like GoogLeNet have demonstrated strong performance even with limited labeled training data -- an important consideration since most operators do not have tens of thousands of labeled cards when they start.
  • Real-time deployment has reduced average downtime from 4.8 hours per month per well to 1.7 hours per month per well (a 64% reduction) and cut deferred production from 57 barrels per month to 29 barrels per month per well.
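The intuition behind card classification can be sketched without a CNN. A minimal, illustrative approach extracts simple shape features from the load-vs-position loop -- the enclosed area is proportional to work done per stroke, and a low ratio against the ideal full card suggests incomplete pump fillage. The function names and thresholds below are assumptions for illustration, not any vendor's algorithm; production systems learn far richer features from the card image itself.

```python
# Minimal sketch: diagnose a dynacard from simple shape features.
# Real deployments use CNNs on card images; this illustrates the kind of
# signal those models learn. All thresholds here are illustrative.

def card_area(positions, loads):
    """Enclosed area of the load-vs-position loop (shoelace formula).
    Proportional to the work done per stroke."""
    n = len(positions)
    area = 0.0
    for i in range(n):
        j = (i + 1) % n
        area += positions[i] * loads[j] - positions[j] * loads[i]
    return abs(area) / 2.0

def classify_card(positions, loads, full_card_area):
    """Crude diagnostic: compare the measured loop area to the area of an
    ideal full card. Low fillage suggests gas interference or fluid pound."""
    fillage = card_area(positions, loads) / full_card_area
    if fillage > 0.85:
        return "normal"
    elif fillage > 0.5:
        return "incomplete fillage (possible gas interference)"
    return "severe underfill (possible fluid pound or pump-off)"
```

Feeding in an idealized rectangular card returns "normal"; a card whose load drops early on the downstroke returns an underfill diagnosis. Real cards are noisier, which is precisely why learned classifiers outperform hand-tuned rules at scale.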

Surface Cards vs. Downhole Pump Cards

An important distinction: most SCADA systems capture surface dynamometer cards, measured at the polished rod. The downhole pump card -- what is actually happening at the pump -- must be computed from the surface card using wave equation models that account for rod stretch, damping, and fluid loads along the rod string.

The quality of your downhole pump card calculation directly affects the quality of your AI diagnostics. Physics-based wave equation solvers (like those in ChampionX's XSPOC platform) remain essential even in an ML-driven workflow. The best approaches combine physics-based downhole card synthesis with ML-based pattern classification -- a hybrid approach that is more robust than either method alone.
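To make the surface-to-downhole transform concrete, here is a drastically simplified static sketch: subtract buoyant rod weight from the surface load and remove Hooke's-law rod stretch from the position. This is an assumption-laden toy, not the Gibbs damped wave equation method that production solvers such as XSPOC implement -- real solvers model dynamics and damping along the full rod string.

```python
# Drastically simplified surface-to-downhole card transform. Production
# solvers use the damped wave equation (Gibbs method); this static sketch
# only removes buoyant rod weight from load and elastic stretch from
# position, ignoring rod dynamics and damping entirely.

E_STEEL = 30.5e6  # Young's modulus of steel, psi (approximate)

def downhole_card(surface_pos, surface_load, rod_weight_lbf,
                  rod_length_ft, rod_area_in2):
    """Return (position, load) lists at the pump, static approximation.
    Positions in inches, loads in lbf."""
    k = E_STEEL * rod_area_in2 / (rod_length_ft * 12.0)  # rod stiffness, lbf/in
    dh_pos, dh_load = [], []
    for x, f in zip(surface_pos, surface_load):
        pump_load = f - rod_weight_lbf          # remove rod weight
        stretch = pump_load / k                 # Hooke's law stretch, in
        dh_pos.append(x - stretch)
        dh_load.append(pump_load)
    return dh_pos, dh_load
```

Even this toy makes the point in the text: the downhole card is a computed artifact, and errors in the physics model propagate directly into whatever ML classifier consumes it.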

Beyond Classification: Predictive Failure Detection

Classification tells you what is happening now. Prediction tells you what will happen next. The more advanced rod pump AI systems go beyond classifying the current card shape to detecting trends that precede failure:

  • Gradual load changes -- A slow increase in peak polished rod load over days or weeks may indicate scale buildup, sand accumulation, or progressive tubing wear.
  • Intermittent fault patterns -- A well that shows gas interference cards during certain hours but normal cards during others may have a developing gas handling problem.
  • Rod compression tracking -- Monitoring the percentage of the stroke where the rod string goes into compression provides an early indicator of rod fatigue and potential rod part.

Field case studies have demonstrated detection of "hole in barrel" conditions and rod part precursors 1-2 days before production decline occurs -- enough lead time to schedule intervention proactively rather than reactively.
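A gradual load-change detector of the kind described above can be sketched as a rolling least-squares slope on daily peak polished rod loads, with an alert when the load climbs faster than a configurable rate. The window length and threshold below are illustrative assumptions; field systems tune these per well and combine many such trend signals.

```python
# Sketch of a failure-precursor trend detector: fit a least-squares slope
# to a trailing window of daily peak polished rod loads and alert when the
# load is rising faster than a configurable rate (illustrative threshold).

def trend_slope(values):
    """Least-squares slope of values vs. sample index (units per sample)."""
    n = len(values)
    x_mean = (n - 1) / 2.0
    y_mean = sum(values) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den

def load_trend_alert(daily_peak_loads, window=14, max_lbf_per_day=25.0):
    """Alert when the recent rise in peak load suggests scale buildup,
    sand accumulation, or progressive tubing wear."""
    if len(daily_peak_loads) < window:
        return False
    return trend_slope(daily_peak_loads[-window:]) > max_lbf_per_day
```

A flat load history stays quiet; a load climbing 40 lbf/day trips the alert well before the trend becomes a failure.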

Chord Energy: 2,500 Wells, 38% Fewer Failures

The most publicly documented large-scale rod pump AI deployment is Chord Energy's use of Ambyint's InfinityRL platform across 2,500 rod lift wells in the Bakken.

The deployment connected approximately 2,400 VFD/POC (variable frequency drive / pump-off controller) wells for autonomous min/max SPM (strokes per minute) optimization, plus roughly 100 POC wells for idle time optimization. Results over a six-month measurement period:

  • 38% reduction in failure rates across the portfolio
  • $1 million in estimated operational cost savings
  • 7% increase in oil production
  • 19% increase in liquid production
  • 28% reduction in SPM from deployment baseline (meaning less wear on equipment while producing more)
  • 19% reduction in rods-in-compression events

The key insight from Chord's deployment: the system discovered that two-thirds of their wells were overpumping. The AI was not just preventing failures -- it was correcting chronic misoperation that human-driven surveillance had not caught at scale.

A separate Ambyint deployment for an Eagle Ford operator found a similar pattern: automated dynacard analysis and setpoint optimization gave engineers visibility into problems across wells they simply could not monitor manually, shifting their time from reactive troubleshooting to proactive decision-making.


ESP AI: Current Signatures, Vibration, and Thermal Trends

Electrical submersible pumps present a different AI challenge than rod pumps. ESPs do not generate dynacards. Instead, they produce a continuous stream of electrical and sensor data that contains failure signatures -- if you know what to look for.

The ESP Sensor Suite

A modern ESP completion with downhole gauges typically provides:

  • Motor current and voltage (three-phase) -- The primary diagnostic signal. Changes in current draw reflect changes in pump loading, gas slugging, and motor degradation.
  • Motor temperature -- Rising temperature trends indicate insufficient cooling (typically from reduced fluid flow past the motor) and precede motor burnout.
  • Intake pressure -- Pressure at the pump intake, critical for monitoring drawdown and detecting gas interference.
  • Discharge pressure -- Pressure at the pump outlet, useful for detecting pump wear (declining differential pressure at constant speed).
  • Vibration (X and Y axes) -- Indicates shaft misalignment, bearing wear, scale buildup on impellers, and sand damage.
  • Wellhead pressure and temperature -- Surface measurements that complement downhole data.

What AI Models Can Detect

The research and field deployments on ESP predictive maintenance have converged on several high-value prediction targets:

Motor Failure Prediction. XGBoost and LSTM models trained on ESP operational data can predict motor failures 7 days before the event with F1-scores exceeding 0.71. While a 0.71 F1-score may not sound impressive in a machine learning context, in operational terms it means catching the majority of failures with an acceptable false alarm rate -- and each caught failure avoids a $200,000-600,000 workover.

Gas Slug Detection. ESPs do not handle free gas well. Gas slugging causes current fluctuations, pump cycling, and accelerated wear. ML models trained on high-frequency current data can detect gas interference patterns and recommend speed adjustments or gas handler activation before efficiency drops.

Pump Degradation Tracking. By monitoring the relationship between pump speed, intake pressure, and production rate over time, ML models can track pump performance degradation curves and predict when a pump will no longer maintain target production rates -- allowing planned workovers during optimal windows rather than emergency interventions.

Scale and Sand Detection. Vibration signature analysis using frequency-domain features can distinguish between scale buildup (which causes gradual vibration increase) and sand damage (which causes characteristic high-frequency vibration patterns). Early detection enables chemical treatment rather than pump replacement.

ESP Market Context

The global ESP market was valued at approximately $6.4 billion in 2025 and is projected to grow to $13-14.4 billion by 2035. AI-integrated monitoring, predictive analytics, and variable speed drive controls are rapidly becoming standard features. Predictive failure analytics and remote surveillance have cut unplanned ESP pullouts by 17% in documented deployments.

Neural Networks for Autonomous ESP Control

The frontier of ESP AI is closed-loop autonomous control. Neural network models trained on ESP datasets in the Permian Basin now recommend and directly write optimal pump setpoints, delivering 2-4% oil production uplift and longer run life while demonstrating full self-pumping capability. This is a meaningful shift: from models that advise engineers to systems that control equipment directly.


Gas Lift AI: Injection Rate Optimization and Valve Diagnostics

Gas lift optimization is mathematically different from rod pump or ESP optimization. The core problem is allocation: given a fixed amount of lift gas (constrained by compressor capacity or gas availability), how do you distribute injection rates across dozens or hundreds of wells to maximize total field production?

The Optimization Problem

Each gas lift well has a characteristic performance curve: oil production rate as a function of gas injection rate. This curve is not linear -- production increases with injection rate up to a point, then flattens, and eventually decreases as excess gas causes flow instabilities. The optimal injection rate sits at the knee of this curve.

The challenge is that these curves change over time as reservoir conditions evolve, water cuts increase, and wellbore conditions change. Static allocation based on well tests performed months ago leaves significant production on the table.
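The allocation logic described here can be sketched in a few lines: fit a concave performance curve per well from multirate test data, then hand out lift gas in small increments to whichever well currently offers the highest marginal gain. The quadratic curve form, increment size, and example coefficients are illustrative assumptions; field systems use richer curve models plus compressor and stability constraints.

```python
import numpy as np

# Sketch of greedy marginal gas lift allocation over fitted per-well
# performance curves. Curve form and step size are illustrative.

def fit_curve(inj_rates, oil_rates):
    """Quadratic fit of oil rate vs. injection rate (concave near the
    optimum). Returns coefficients (a, b, c)."""
    return np.polyfit(inj_rates, oil_rates, 2)

def allocate(curves, total_gas, step=0.1):
    """Greedy equal-slope allocation: repeatedly give one increment of gas
    to the well with the largest marginal production gain."""
    alloc = [0.0] * len(curves)
    remaining = total_gas
    while remaining >= step:
        gains = [np.polyval(c, g + step) - np.polyval(c, g)
                 for c, g in zip(curves, alloc)]
        best = int(np.argmax(gains))
        if gains[best] <= 0:        # no well benefits from more gas
            break
        alloc[best] += step
        remaining -= step
    return alloc
```

With two wells whose curves peak at different injection rates, the greedy loop naturally gives more gas to the steeper curve until the marginal gains equalize -- the classic equal-slope condition for constrained gas lift allocation.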

What ML Brings to Gas Lift

A closed-loop gas lift optimization workflow deployed across more than 1,300 wells in the Permian Basin demonstrated what AI can deliver at scale. The system conducts automated multirate tests through remote control of gas lift injection rate setpoints combined with automated well data acquisition, then uses ML models to determine optimal allocation.

Key results from documented deployments:

  • 2.0% average oil production uplift across the optimized well population -- modest per well, but significant across 1,300+ wells.
  • Gradient Boosting models achieved prediction of optimal gas injection rate with a mean absolute error of 7.0% and corresponding liquid production rate prediction with an MAE of just 1.3%.
  • Artificial Neural Networks achieved R-squared scores of 0.9959, 0.9972, and 0.9977 for oil, water, and gas rate predictions respectively.

Intermittent Gas Lift Scheduling

For lower-rate wells or fields with limited gas supply, intermittent gas lift -- where gas is injected in cycles rather than continuously -- introduces a scheduling optimization problem. AI models optimize injection timing, duration, and volume for each cycle, adapting to changing well conditions automatically.

Valve Diagnostics

Gas lift valve performance degrades over time. Valves may not fully close (causing gas bypass), may not open at design pressures (reducing lift efficiency), or may have erosion damage. ML models trained on casing pressure and tubing pressure data during injection cycles can identify valve performance problems without pulling the completion -- enabling targeted workovers rather than trial-and-error troubleshooting.


What Data Infrastructure You Actually Need

The most common reason artificial lift AI projects stall is not the model -- it is the data. Here is what you need before the ML discussion even starts.

SCADA Requirements

Lift Type | Minimum Sensors | Recommended Sampling | Critical Data
Rod Pump | Polished rod load cell, position sensor, motor current | Dynacard every 5-15 min; surface parameters every 1-5 min | Full-resolution dynacards (not just summary cards)
ESP | Motor current (3-phase), intake pressure, motor temperature | 1-30 second intervals for predictive models; 1-minute for monitoring | High-frequency current waveforms for motor diagnostics
Gas Lift | Casing pressure, tubing pressure, injection rate (meter) | 1-5 minute intervals | Wellhead pressures during rate changes for curve building

The Sampling Frequency Question

Higher-frequency data enables more sophisticated models but generates significant data volumes. A practical approach:

  • 1-second data is necessary for ESP motor current signature analysis and vibration diagnostics, but can be processed at the edge and stored as feature summaries rather than raw waveforms.
  • 5-15 minute dynacard intervals capture the operating envelope for rod pump classification. More frequent cards (every 1-2 minutes) help during transient events (well startups, rod string changeouts) but are not necessary for steady-state monitoring.
  • 1-5 minute intervals for gas lift pressures and flow rates provide sufficient resolution for performance curve updates and valve diagnostics.
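The edge-processing idea in the first bullet can be sketched simply: collapse each minute of 1-second ESP current samples into a compact feature row before it ever leaves the wellsite, so the historian stores features rather than raw waveforms. The feature set below is an illustrative assumption.

```python
# Sketch of edge-side summarization: reduce 60 one-second ESP current
# samples to a per-minute feature row. The feature set is illustrative.

def summarize_minute(samples):
    """Collapse one minute of samples into a compact feature dict."""
    n = len(samples)
    mean = sum(samples) / n
    rms = (sum(s * s for s in samples) / n) ** 0.5
    return {
        "mean": mean,
        "rms": rms,
        "peak": max(samples),
        "ripple": max(samples) - min(samples),  # crude gas slugging indicator
    }
```

At 1-second sampling this turns ~86,400 raw values per well per day into 1,440 feature rows -- a 60x reduction that still preserves the signatures downstream models need.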

Data Quality Over Data Quantity

The real bottleneck is usually data quality, not data volume:

  • Sensor calibration. A load cell that has drifted 5% produces dynacards that look like a different well condition. Pressure transducers without regular calibration create phantom trends.
  • Timestamp alignment. When SCADA data, well test data, and workover records exist in different systems with different timestamps, correlating events becomes unreliable. If your well test says the well makes 200 BOPD but your SCADA-derived rate estimate says 150, every model built on that data will be wrong.
  • Failure labeling. Supervised ML models need labeled examples: "this card pattern preceded a rod part by 3 days." If your failure records are in a spreadsheet maintained by a pumper with inconsistent coding, your training data is noise.
  • Completions and workover history. Pump changes, rod string changes, tubing replacements, and chemical treatments all create step changes in the data. Without linking these events to the sensor data, the model cannot distinguish a legitimate trend from a hardware change.
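A data quality audit of the kind described above starts with mechanical checks. A minimal sketch of one such check -- timestamp gap detection on a SCADA series, with an illustrative 15-minute threshold:

```python
# Sketch of a pre-ML data audit step: find timestamp gaps in a SCADA
# series. The gap threshold is illustrative.

def find_gaps(timestamps, max_gap_s=900):
    """Return (start, end) pairs where consecutive samples are further
    apart than max_gap_s seconds. Timestamps are sorted epoch seconds."""
    gaps = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        if cur - prev > max_gap_s:
            gaps.append((prev, cur))
    return gaps
```

Similar one-function checks for flatlined sensors, out-of-range values, and step changes around workover dates form the backbone of the Phase 1 audit described later.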

Integration Architecture

Most operators already have SCADA data flowing from wellsites to a central historian or cloud platform. The AI layer needs to sit on top of this existing infrastructure, not replace it:

  1. Data ingestion from SCADA historian (OSIsoft PI, Aveva, Ignition, or cloud-native platforms like AWS IoT)
  2. Feature engineering -- computing derived parameters (pump fillage, downhole pump cards, ESP performance index) from raw sensor data
  3. Model inference -- running classification and prediction models on the computed features
  4. Alert and recommendation delivery -- surfacing results through existing dashboards, mobile apps, or SCADA alarm systems
  5. Closed-loop control (optional, advanced) -- writing optimized setpoints back to VFDs and pump-off controllers
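The five stages above can be sketched as one surveillance cycle with each stage injected as a plain function, so vendor platforms or in-house code can be swapped in per stage. All names here are placeholders, not any platform's API.

```python
# Skeleton of the five-stage pipeline: ingest -> features -> inference ->
# delivery, with optional closed-loop setpoint writes. All function names
# are placeholders for whatever system fills each role.

def run_cycle(fetch_scada, compute_features, infer, deliver,
              write_setpoints=None):
    """Run one surveillance cycle and return the inference results."""
    raw = fetch_scada()                      # 1. ingestion
    features = compute_features(raw)         # 2. feature engineering
    results = infer(features)                # 3. model inference
    deliver(results)                         # 4. alerts / recommendations
    if write_setpoints is not None:          # 5. closed loop is opt-in
        write_setpoints(results)
    return results
```

Keeping stage 5 as an optional parameter mirrors the advisory-mode-first rollout recommended in the implementation roadmap below: the same pipeline runs with or without closed-loop control enabled.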

Tools like petro-mcp, our open-source MCP server for production engineering calculations, can serve as the computational bridge between raw well data and AI-ready feature sets -- handling unit conversions, IPR calculations, and nodal analysis that feed into lift optimization models.


Proven Deployments: Who Is Actually Doing This

The gap between conference paper results and field deployment remains wide in upstream oil and gas. Here are the deployments with publicly documented results:

Chord Energy + Ambyint (Bakken, 2,500 wells)

As detailed above: 38% failure rate reduction, $1M savings, 7% oil production increase across 2,500 rod lift wells. The deployment uses physics-based models combined with AI for autonomous setpoint optimization through closed-loop SCADA control. This is the most comprehensive publicly documented rod pump AI deployment in North America.

Baker Hughes Leucipa + Expand Energy (Marcellus, Utica, Haynesville)

Baker Hughes deployed its Leucipa automated field production platform across Expand Energy's natural gas portfolio -- thousands of wells across the Marcellus, Utica, and Haynesville shales. The deployment includes Production Management and Field Optimizer services, deployed as a SaaS platform on AWS.

Expand Energy is also piloting "Lucy," the Leucipa AI Production Assistant -- a generative AI-powered conversational interface that provides real-time analysis of production data and simplifies field decision-making. This represents the next evolution: from dashboards and alerts to natural language interaction with production data.

Baker Hughes + Repsol (Leucipa AI-Powered Functionality)

Baker Hughes and Repsol are jointly launching new AI-powered functionality within the Leucipa platform, expanding the system's autonomous capabilities for production optimization. Details of specific lift optimization features are still emerging, but the partnership signals that major international operators -- not just US independents -- are moving toward AI-driven production management.

ChampionX XSPOC + LOOKOUT

ChampionX's XSPOC platform provides physics-based diagnostics and AI for rod pump, ESP, and gas lift optimization. The XSPOC 3.2 release expanded AI-driven autonomous control capabilities, added uplift and economic opportunity identification for rod and gas lift wells, and introduced plunger lift analytics. Their LOOKOUT service provides remote monitoring and optimization with human-in-the-loop expert oversight -- a pragmatic approach for operators not ready for fully autonomous control.

Permian Basin Gas Lift (1,300+ wells)

A documented closed-loop gas lift optimization workflow across 1,300+ wells in the Permian achieved 2.0% average oil production uplift through automated multirate testing and ML-based injection rate optimization.


Implementation Roadmap: Pilot to Scale for a Mid-Size Operator

For a mid-size operator running 300-1,000 artificial lift wells, here is a realistic path from reactive lift management to AI-driven optimization.

Phase 1: Data Foundation (Months 1-3)

Objective: Get your data house in order before touching any ML.

  • Audit SCADA coverage. What percentage of your wells have SCADA? What data is being collected? At what frequency? Are dynacards being captured, or only summary parameters?
  • Assess data quality. Pull 90 days of data for 50 representative wells. Check for gaps, sensor drift, timestamp issues, and calibration problems.
  • Link operational records. Connect failure records, workover histories, and well test data to the SCADA time series. This creates the labeled dataset your models will need.
  • Identify quick wins. Even before ML, you will likely find wells with obvious problems visible in the data -- pumps running at wrong speeds, excessive cycling, failed sensors that nobody has noticed.

Cost: $50,000-100,000 (internal engineering time + possible data integration consulting)
Expected outcome: Clean, labeled dataset for 50+ wells; list of immediate operational fixes.

Phase 2: Pilot Deployment (Months 3-6)

Objective: Deploy AI on a subset of wells and validate results against a control group.

  • Select 50-100 pilot wells -- ideally a mix of lift types (rod pump, ESP, gas lift) and problem histories (frequent failures, chronic underperformers, wells with good sensor coverage).
  • Deploy a commercial platform (Ambyint, ChampionX XSPOC, or a custom solution) or build internal models using your labeled dataset.
  • Run in advisory mode first. The system generates recommendations; engineers decide whether to act. This builds trust and catches model errors before they affect operations.
  • Measure against control group. Compare failure rates, production volumes, and operating costs for pilot wells versus non-pilot wells over the same period.

Cost: $100,000-300,000 (platform licensing, integration, engineering time)
Expected outcome: Validated 15-30% failure rate reduction on pilot wells; quantified production uplift; identified model gaps.

Phase 3: Controlled Expansion (Months 6-12)

Objective: Scale to the full well population with increasing automation.

  • Expand to all SCADA-connected wells based on pilot results.
  • Enable closed-loop control on well categories where the pilot showed reliable model performance. Start with SPM optimization on rod pumps (lowest risk) before moving to ESP setpoint control (higher consequence of errors).
  • Build internal ML operations capability. Model performance degrades over time as well conditions change. You need a process for monitoring model accuracy, retraining on new data, and deploying updated models -- not a one-time science project.
  • Integrate with planning workflows. AI predictions should feed into workover scheduling, equipment procurement, and rig planning -- not just alarm dashboards.

Cost: $200,000-500,000 (platform expansion, SCADA upgrades for uncovered wells, ML ops)
Expected outcome: Portfolio-wide failure rate reduction of 25-40%; 2-7% production uplift; measurable reduction in workover spend.

Phase 4: Autonomous Operations (Year 2+)

Objective: Move from AI-assisted to AI-driven operations.

  • Autonomous setpoint optimization across all lift types.
  • Predictive workover scheduling -- the system recommends when to pull a well based on predicted remaining run life, rig availability, and economic optimization.
  • Cross-well optimization -- allocating gas lift, managing field-level power consumption, and balancing production targets across the portfolio.
  • Continuous learning -- models automatically retrain on new failure events, well interventions, and production changes.

This is where companies like Chord Energy are now: AI running on 99% of rod lift wells, with human oversight focused on exceptions rather than routine surveillance.


What Does Not Work

For honesty's sake, here is what the vendor presentations usually skip:

  • ML without domain knowledge. A pure data-driven approach that treats dynacards as generic image classification will fail on edge cases that a production engineer would catch immediately. The best systems combine physics (wave equation, IPR models, pump performance curves) with ML (pattern classification, anomaly detection).
  • Deploying to wells without adequate SCADA. If your dynacard data has 4-hour gaps, your ESP current data is sampled every 30 minutes, or your gas lift injection rates are manually recorded once a day, no amount of ML sophistication will help. Fix the data infrastructure first.
  • One-time model deployment. Wells change. Reservoirs deplete. Pump conditions degrade. Water cuts increase. A model trained on 2024 data and never retrained will lose accuracy by mid-2025. ML ops -- continuous monitoring, retraining, and validation -- is not optional.
  • Ignoring the human workflow. The best model in the world is useless if the alert goes to an inbox nobody checks, or if the recommended setpoint change requires a field visit to implement because the well does not have remote control capability.
  • Expecting immediate ROI on every well. Some wells simply do not fail often enough or produce enough oil to justify the monitoring cost. Focus AI efforts on the wells where the economic case is clear: high-rate wells, frequent failers, and wells with expensive interventions.

The Bottom Line

Artificial lift AI is not a research project anymore. Chord Energy is running it on 2,500 wells. Baker Hughes is deploying Leucipa across thousands of Expand Energy wells. ChampionX's XSPOC platform is processing dynacards and ESP data with AI at scale. A gas lift optimization workflow has delivered 2% production uplift across 1,300 Permian Basin wells.

The results are consistent across deployments: 25-40% failure rate reductions, 2-7% production increases, and payback periods measured in months rather than years. For a mid-size operator with 500+ artificial lift wells, the annual value of these improvements is typically $2-5 million -- primarily from avoided workovers and recovered deferred production.

The barriers are not technical. They are organizational: data infrastructure that has not kept up with SCADA capabilities, siloed operational data that has never been linked to failure records, and engineering teams that are understaffed for the surveillance workload they already have.

AI does not replace the production engineer. It gives the production engineer superhuman surveillance capacity -- the ability to watch every well, every minute, and catch the problems that would otherwise develop unnoticed until they become failures.

The operators who have figured this out are already seeing the results. The ones who have not are still paying $200,000 per ESP failure and wondering why their lifting costs keep climbing.


For production engineering calculations and well analysis tools, explore petro-mcp, our open-source MCP server for petroleum engineering. For more on software platforms across the upstream value chain, see our guides on drilling operations software, production operations and AI, and reservoir management software.


Dr. Mehrdad Shirangi is the founder of Groundwork Analytics and holds a PhD from Stanford University in Energy Systems Optimization, with research focused on computational methods for reservoir management and production optimization. He has been building AI solutions for the energy industry since 2018. Connect on X/Twitter and LinkedIn, or reach out at info@petropt.com.


Have questions about this topic? Get in touch.