The Engineer's Checklist for Evaluating AI Vendors in Oil and Gas: 20 Questions to Ask Before You Sign

Dr. Mehrdad Shirangi | Published by Groundwork Analytics LLC

Editorial disclosure

This article is vendor-agnostic by design. The author is founder of Groundwork Analytics, an AI solutions provider for the energy industry. Every question in this checklist applies equally to Groundwork and its competitors. No vendor reviewed or influenced this content prior to publication.

If you are a VP of Engineering or VP of Operations at a mid-size operator right now, you are probably being pitched by AI vendors. A lot of them. Ten, maybe fifteen different companies, each with a polished demo, a compelling case study, and a sales team that speaks just enough petroleum engineering to be dangerous.

The AI in oil and gas market hit $4.28 billion in 2026 and is growing at 13% CAGR. That kind of money attracts real innovation -- and real noise. For every vendor with a genuinely useful product, there are three whose demos were built on cherry-picked data from a single well in a cooperative dataset. The demo looks great. The deployment does not.

The problem is not that AI does not work in oil and gas. It does. Physics-informed decline curve models outperform traditional DCA in many basins. Machine learning-based anomaly detection catches failures that rule-based systems miss. Production optimization agents are reducing the daily triage burden on overworked production engineers. The technology is real.

The problem is that evaluating AI vendors is hard, especially if you have never deployed machine learning in a production environment. Vendor demos are designed to impress, not to inform. The questions that matter -- about data requirements, failure modes, integration complexity, and commercial lock-in -- are exactly the questions that sales teams are trained to deflect.

This article gives you 20 specific questions to ask before you sign. They are organized into five categories: data requirements, model and methodology, deployment and integration, results and validation, and commercial terms. For each question, I explain why it matters and what a red flag answer looks like.

This checklist is not designed to help you pick a specific vendor. It is designed to help you avoid getting burned.


Before You Start: Set the Ground Rules

Before any vendor demo, establish two ground rules with your evaluation team:

Ground rule 1: Bring your own data to the demo. Ask the vendor to run their model on your data, from your wells, in your basin. Any vendor that refuses -- or says they "need to prepare the data first" and will get back to you in two weeks -- is telling you something important about how their product handles real-world data.

Ground rule 2: Include a skeptic. Put your most detail-oriented reservoir or production engineer in the room. The person who asks uncomfortable questions about assumptions. The person who will notice when the vendor's "predicted vs. actual" chart conveniently stops before the model diverged. That person is your most valuable asset in vendor evaluation.

With those ground rules set, here are the 20 questions.


Category 1: Data Requirements

Every AI model is only as good as the data it consumes. Data is where most AI deployments in oil and gas fail -- not because the model is wrong, but because the data pipeline was never properly scoped. The questions in this section expose whether a vendor has actually deployed in messy, real-world operator environments or only in sanitized datasets.

Question 1: What specific data do you need, and at what frequency and history depth?

Why it matters: Vague answers here indicate a product that has not been battle-tested. A vendor that has deployed successfully at mid-size operators should be able to tell you exactly which data streams they need (surface pressures, production volumes, well tests, completions data, SCADA tags), at what frequency (daily, hourly, sub-minute), and how much historical data is required to train a useful model.

Red flag: "We can work with whatever data you have." This sounds flexible. It usually means the vendor has not done enough deployments to know what they actually need. A credible vendor will give you a specific data manifest -- and will be honest about what their model cannot do if certain data streams are unavailable.
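To make "specific data manifest" concrete: it can be as simple as a structured list of streams, frequencies, and history depths that the vendor hands you before the pilot. A hypothetical sketch (the streams and values below are illustrative, not a real vendor's requirements):

```python
# Hypothetical data manifest a credible vendor might provide up front.
# Each entry names a required stream, its sampling frequency, and how
# much history the model needs to train usefully.
DATA_MANIFEST = {
    "production_volumes": {"frequency": "daily",   "history": "24 months"},
    "surface_pressures":  {"frequency": "hourly",  "history": "12 months"},
    "well_tests":         {"frequency": "monthly", "history": "36 months"},
    "completions":        {"frequency": "static",  "history": "all wells"},
}

for stream, spec in DATA_MANIFEST.items():
    print(f"{stream}: {spec['frequency']}, {spec['history']}")
```

If a vendor cannot produce something at least this specific, treat it as a signal that they have not scoped real deployments.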

Question 2: What format does the data need to be in, and who is responsible for data preparation?

Why it matters: Data preparation -- cleaning, normalizing, mapping field names, reconciling time zones, handling unit conversions -- consumes 60-80% of the effort in most AI deployments. If the vendor assumes you will deliver clean, formatted data through an API, and your data lives in spreadsheets, SCADA historians, and production accounting systems that do not talk to each other, you have a deployment timeline problem before the project starts.

For more context on the data fragmentation challenge, see Drilling Data Management: WITSML, Cloud, and the Integration Problem.

Red flag: "Just give us a CSV export and we'll handle it." This trivializes what is often the hardest part of the project. Ask specifically: who writes the data extraction scripts? Who maps your SCADA tag names to their model's expected inputs? Who handles the inevitable data quality issues?
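To give a sense of what "we'll handle it" actually hides: even the routine parts of data preparation -- tag mapping, unit conversion, time-zone reconciliation -- are real engineering work that someone must own. A minimal sketch in pandas, with hypothetical tag names and columns:

```python
import pandas as pd

# Hypothetical mapping from an operator's SCADA tag names to the
# model's expected input fields -- every deployment needs one of these.
TAG_MAP = {
    "WHP_PSI_05A": "wellhead_pressure_psi",
    "TBG_PRESS":   "tubing_pressure_psi",
    "OIL_BBL_DAY": "oil_rate_bopd",
}

KPA_TO_PSI = 0.1450377  # unit conversion for fields reporting in kPa

def prepare(raw: pd.DataFrame) -> pd.DataFrame:
    """Rename tags, normalize units, and standardize timestamps to UTC."""
    df = raw.rename(columns=TAG_MAP)
    # Reconcile time zones: SCADA historians often log in local field time.
    df["timestamp"] = pd.to_datetime(df["timestamp"], utc=True)
    # Convert any kPa pressure columns to psi before the model sees them.
    if "casing_pressure_kpa" in df.columns:
        df["casing_pressure_psi"] = df["casing_pressure_kpa"] * KPA_TO_PSI
        df = df.drop(columns=["casing_pressure_kpa"])
    return df
```

Multiply this by every data source, every field, and every historical format change, and the 60-80% figure above stops looking surprising. Ask the vendor who writes and maintains this layer.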

Question 3: What happens when our data is messy, incomplete, or inconsistent?

Why it matters: Every operator's data is messy. Wells go offline and SCADA stops recording. Production volumes get manually entered with typos. Gauge pressures drift. Well tests are months apart. Completions records from 2015 are in a different format than records from 2023. A vendor that has not built robust data quality handling into their pipeline will deliver a model that works beautifully on the training set and falls apart in production.

For a deeper look at data quality issues specific to production environments, see SCADA Data Quality and AI Readiness: A Practical Checklist.

Red flag: "Our model is robust to noisy data." Press on this. Ask for specifics. What percentage of missing data can the model tolerate? How does it handle sensor drift? What does the output look like when it receives obviously bad input -- does it flag it, impute it, or silently produce a bad prediction?
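If you want to press with something concrete, ask the vendor to describe their data-quality gate: the checks that run before data reaches the model. A toy sketch of what such a gate might look like (the thresholds are illustrative, not industry standards):

```python
import pandas as pd

# Illustrative thresholds -- a real deployment would tune these per stream.
MAX_MISSING_FRAC = 0.20   # reject a stream if more than 20% of values are missing
FLATLINE_WINDOW = 12      # this many identical consecutive readings = suspected stuck gauge

def quality_flags(series: pd.Series) -> dict:
    """Return explicit quality flags instead of silently imputing bad input."""
    flags = {}
    flags["missing_frac"] = float(series.isna().mean())
    flags["too_sparse"] = flags["missing_frac"] > MAX_MISSING_FRAC
    # Detect flat-lined sensors: long runs where the value never changes.
    run_lengths = series.groupby((series != series.shift()).cumsum()).transform("size")
    flags["flatlined"] = bool((run_lengths >= FLATLINE_WINDOW).any())
    return flags
```

The point is not these specific checks; it is that a vendor who has deployed in real environments will have a concrete answer at this level of detail, including what the system does when a flag fires.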

Question 4: Do you access our raw data, or does the data stay within our environment?

Why it matters: Data governance is a real concern. Some vendors require you to push data to their cloud environment. Others can operate within your infrastructure. Neither approach is inherently wrong, but you need to know the answer before your IT security team kills the project six months in. If the vendor needs your data in their cloud, ask where it is stored, who can access it, and what happens to your data if you terminate the contract.

Red flag: Evasive answers about data residency or unwillingness to sign a clear data processing agreement. Also watch for vendors who say they need your data "to improve the model" -- this may mean your proprietary production data is being used to train models that benefit their other clients.


Category 2: Model and Methodology

This is where demos are most misleading. A well-designed demo can make any model look brilliant. These questions force the vendor to explain what is actually happening under the hood -- and whether it will hold up in your operating environment.

Question 5: What type of model are you using, and why did you choose it?

Why it matters: "We use AI" is not an answer. There is a meaningful difference between a gradient-boosted decision tree, a neural network, a physics-informed model, and a large language model. Each has different data requirements, different failure modes, different explainability characteristics, and different computational costs. A vendor should be able to explain their model choice in terms you understand and justify why it is appropriate for your specific use case.

Red flag: "Our proprietary AI engine" with no further explanation. Proprietary is fine. Opaque is not. If the vendor cannot or will not explain the general class of model they use, they are either hiding something or their technical team is not involved in the sales process -- both are problems.

Question 6: How explainable are the model's recommendations?

Why it matters: In oil and gas, engineers need to understand why a model is recommending a specific action. A production engineer will not change a rod pump's stroke length because a black box said so. They need to see which variables drove the recommendation and whether the reasoning aligns with their field experience. Explainability is not a nice-to-have; it is a deployment requirement.

Red flag: "The model just works -- trust the output." No experienced operator will trust AI recommendations they cannot interrogate. Ask to see the feature importance rankings. Ask what happens when the model's recommendation contradicts engineering judgment. If the vendor has no answer for that scenario, they have not deployed in a real operating environment.
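The "feature importance rankings" mentioned above are standard tooling, not a special favor. A minimal sketch using scikit-learn's permutation importance on synthetic data (the feature names are hypothetical; a real evaluation would use your own wells):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance

# Synthetic data: the target depends on the first two features only.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))  # columns: [tubing_pressure, water_cut, stroke_rate]
y = 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=500)

model = GradientBoostingRegressor(random_state=0).fit(X, y)
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)

for name, imp in zip(["tubing_pressure", "water_cut", "stroke_rate"],
                     result.importances_mean):
    print(f"{name}: {imp:.3f}")
# tubing_pressure should dominate; stroke_rate (pure noise) should rank near zero.
```

If a vendor cannot show you something equivalent for their model -- which inputs drove a given recommendation, and by how much -- treat the product as unexplainable regardless of what the slide deck says.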

Question 7: Is the model physics-informed, pure machine learning, or a hybrid?

Why it matters: Pure ML models trained on historical data can interpolate well but extrapolate poorly. If your operating conditions change -- new completion design, different artificial lift, infill drilling that changes reservoir pressure -- a pure ML model trained on old data may produce unreliable predictions. Physics-informed models that incorporate reservoir mechanics, nodal analysis, or material balance tend to generalize better to new conditions, but they require more domain expertise to build and calibrate.

For a detailed comparison of these approaches, see Decline Curve Analysis Meets Machine Learning: Physics-Informed AI for Production Forecasting.

Red flag: A vendor that does not know the difference, or that claims pure ML is always superior to physics-based approaches. The best solutions in oil and gas are almost always hybrid -- machine learning for pattern recognition, physics for constraints and extrapolation.
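The hybrid pattern is easy to illustrate: let the physics model carry the trend and the extrapolation, and let ML learn only the residual. A toy sketch on synthetic data, using Arps hyperbolic decline as the physics component (all parameter values are invented for illustration):

```python
import numpy as np
from scipy.optimize import curve_fit
from sklearn.ensemble import RandomForestRegressor

def arps_hyperbolic(t, qi, di, b):
    """Arps hyperbolic decline: q(t) = qi / (1 + b*di*t)^(1/b)."""
    return qi / np.power(1.0 + b * di * t, 1.0 / b)

# Synthetic production: a decline trend plus a seasonal operational effect.
t = np.arange(1, 365, dtype=float)
q = arps_hyperbolic(t, qi=1000, di=0.004, b=0.8) + 20 * np.sin(t / 30.0)

# Step 1: fit the physics model to capture the decline trend.
popt, _ = curve_fit(arps_hyperbolic, t, q, p0=[900, 0.003, 0.5],
                    bounds=([1, 1e-5, 0.01], [5000, 1.0, 2.0]))
trend = arps_hyperbolic(t, *popt)

# Step 2: ML learns only the residual -- the part physics can't explain.
resid_model = RandomForestRegressor(random_state=0).fit(t.reshape(-1, 1), q - trend)
forecast = trend + resid_model.predict(t.reshape(-1, 1))
```

The design choice matters for extrapolation: beyond the training window, the ML residual fades in reliability, but the fitted decline curve still produces a physically plausible forecast. A pure ML model has no such backstop.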

Question 8: How does the model handle sparse data or edge cases?

Why it matters: Most mid-size operators do not have thousands of wells with complete, high-frequency data histories. They have a few hundred wells, some with good data and some with gaps. They have edge cases: wells with unusual completions, wells that were shut in for years, wells on the boundary of the field with different geology. A model that only works well on the fat part of the distribution is not useful if your most expensive problems are in the tails.

Red flag: "We need at least 500 wells to train the model." If you have 150 wells, this vendor is not for you -- and they should have told you that before the third demo. A credible vendor will tell you honestly what their minimum data requirements are and what performance trade-offs come with smaller datasets.


Category 3: Deployment and Integration

This is where projects die. The model works in the lab. The demo was impressive. But deploying it into your actual operating environment -- with your SCADA system, your production accounting software, your IT security policies, and your field engineers who are already overworked -- is a different challenge entirely.

Question 9: Cloud, on-premise, or hybrid -- and what does each option require?

Why it matters: Cloud deployment is faster and cheaper to start, but it requires reliable connectivity from field locations and comfort with data leaving your environment. On-premise deployment keeps data internal but requires your IT team to maintain infrastructure. Hybrid approaches exist but add complexity. There is no universally right answer, but the vendor should be able to support your preferred architecture.

Red flag: "Cloud only, no exceptions." Some operators have legitimate security or connectivity constraints that make cloud-only solutions impractical. A vendor that offers only one deployment model is either early-stage (limited engineering resources) or inflexible (not a good long-term partner).

Question 10: What is the realistic deployment timeline, from contract to first useful output?

Why it matters: Vendor timelines are almost always optimistic. "Up and running in four weeks" usually means "the software is installed in four weeks; useful, validated output takes four months." Ask specifically about each phase: data integration, model training, validation, user training, and iterative refinement. Ask what the longest deployment they have done was, and why it took longer than expected.

Red flag: A timeline that does not include a data integration phase. If the vendor says deployment takes six weeks and none of that time is allocated to connecting to your data systems, they are either assuming you will do that work or they have not thought about it.

Question 11: What IT resources do we need to provide?

Why it matters: If the vendor needs a dedicated data engineer on your side for six months, a VPN tunnel to your SCADA network, and a Kubernetes cluster for model serving -- and your IT team is three people managing everything from email to ERP -- you have a resourcing problem. Get the IT requirements in writing before you sign. Compare them honestly against your internal capacity.

Red flag: "Minimal IT involvement required." This is almost never true. Push for a specific list of IT tasks, estimated hours, and required skill sets. If the vendor cannot provide this, they have not done enough deployments to know.

Question 12: Is this a standalone application, or does it integrate with our existing systems via API?

Why it matters: The best AI tool in the world is useless if it lives in a separate browser tab that no one opens after the first month. Integration with your existing workflows -- SCADA dashboards, production reporting systems, daily morning meetings -- is what determines whether the tool actually gets used. Ask whether the vendor provides APIs, webhooks, or native integrations with common oil and gas platforms.

The distinction between open and proprietary integration approaches matters here. Open standards like the Model Context Protocol (MCP) allow AI systems to connect to diverse data sources through standardized interfaces, rather than requiring custom integrations for each data system. Projects like petro-mcp demonstrate the open-source approach: the integration layer is transparent, auditable, and not locked to a single vendor. Compare this with proprietary integration layers where you cannot see how data flows through the system.

Red flag: "Our platform replaces your existing tools." Operators do not rip and replace production accounting systems or SCADA platforms for an AI overlay. A vendor that requires you to abandon existing systems is either naive about enterprise deployments or positioning for maximum lock-in.


Category 4: Results and Validation

This is where you separate vendors who have real deployments from vendors who have PowerPoint deployments. The questions in this section are uncomfortable -- and that is the point.

Question 13: Show me a case study from a company similar to mine -- similar size, similar basin, similar data maturity.

Why it matters: A case study from a supermajor with a dedicated data science team, a centralized data lake, and 10,000 wells tells you nothing about what the vendor can do for your 200-well operation with data in three different SCADA systems and no data engineer on staff. Relevance matters more than impressiveness.

For context on how company size and digital maturity affect AI readiness, see The Mid-Size Operator's Guide to AI: Where to Start, What to Skip.

Red flag: "We can't share specific client names due to NDAs, but trust us, it works." NDAs are real, but a vendor with successful deployments can share anonymized case studies with enough detail to be verifiable -- basin, well count, problem type, timeline, quantified results. If they cannot do this, they may not have the deployments they claim.

Question 14: What is the typical ROI timeline, and how do you measure it?

Why it matters: AI projects in oil and gas typically take 6-12 months to deliver measurable ROI, after the initial deployment and validation period. A vendor claiming ROI in 30 days is either defining ROI very loosely (cost avoidance on a single event) or being dishonest. Ask how they define ROI. Is it production uplift? Cost reduction? Avoided failures? And how is it measured -- against a control group, a baseline period, or a counterfactual model?

Red flag: "Our customers typically see 10x ROI in the first quarter." Extraordinary claims require extraordinary evidence. Ask for the methodology behind the ROI calculation. Often, vendor ROI numbers count the value of the most dramatic single event (one avoided ESP failure, one optimized well) and extrapolate across the entire asset, ignoring all the wells where the model had no impact.

Question 15: How do you validate that the model is actually working, not just fitting historical patterns?

Why it matters: Overfitting is the silent killer of ML projects. A model that perfectly matches historical data but fails on new data is worthless. Ask the vendor how they validate: out-of-sample testing, walk-forward validation, blind tests on wells the model has never seen? Ask what happens during the validation phase -- who decides the model is "good enough" to deploy?

Red flag: "Our model has 95% accuracy." Accuracy on what? Training data? A held-out test set from the same wells? A completely different set of wells? The number is meaningless without context. A vendor who leads with a single accuracy metric and cannot explain the validation methodology is either statistically unsophisticated or deliberately obscuring poor generalization.
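Walk-forward validation is simple enough to sketch, and asking a vendor to walk you through their version of it is revealing. A minimal illustration (the split parameters are arbitrary, and a real pipeline would validate across wells as well as across time):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def walk_forward_errors(t, y, n_splits=5, min_train=50):
    """Train only on the past, test only on the future, slide the split forward."""
    errors = []
    fold = (len(t) - min_train) // n_splits
    for i in range(n_splits):
        cut = min_train + i * fold
        model = LinearRegression().fit(t[:cut].reshape(-1, 1), y[:cut])
        pred = model.predict(t[cut:cut + fold].reshape(-1, 1))
        errors.append(float(np.mean(np.abs(pred - y[cut:cut + fold]))))
    return errors  # one error per fold -- watch for degradation over time
```

A vendor quoting a single accuracy number should be able to tell you which of these folds it came from. If the answer is "the training data," the number is worthless.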

Question 16: What happens when the model is wrong?

Why it matters: Every model is wrong sometimes. The question is whether the system fails gracefully -- flagging low-confidence predictions, alerting users to data quality issues, falling back to simpler methods -- or fails silently, producing confident-looking predictions that are quietly incorrect. In oil and gas, a wrong prediction can mean an unnecessary workover, a missed failure, or a safety incident.

Red flag: "Our model is rarely wrong." This is either a lie or a sign that the vendor is not monitoring model performance in production. Ask specifically: what is the false positive rate? The false negative rate? What is the process for detecting model degradation over time? Who is responsible for retraining?
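One concrete follow-up: ask what their degradation alarm actually computes. A toy sketch of the simplest useful version -- recent error compared against a validation-era baseline (the window and 1.5x factor are illustrative choices, not standards):

```python
import numpy as np

def degradation_alarm(abs_errors, baseline_mae, window=30, factor=1.5):
    """Flag suspected model drift when the recent mean absolute error
    exceeds `factor` times the MAE observed during validation."""
    recent = np.asarray(abs_errors[-window:], dtype=float)
    return bool(recent.mean() > factor * baseline_mae)
```

If the vendor has no mechanism like this in production -- however it is implemented -- then "rarely wrong" means "we don't know when we're wrong."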


Category 5: Commercial Terms and Support

The technical evaluation is only half the battle. Commercial terms determine whether a successful pilot becomes a long-term partnership or a costly dependency.

Question 17: What is your pricing model, and how does it scale?

Why it matters: AI vendor pricing in oil and gas varies wildly: per-well per-month, per-user, per-API-call, fixed platform fee, or some combination. Understand how costs scale as you add wells, users, or use cases. A product that costs $50,000 for a 50-well pilot may cost $500,000 when you roll it out to 500 wells -- or it may cost $75,000. The difference matters for your business case.
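The scaling question is worth working through numerically before the negotiation. A toy comparison of two hypothetical pricing structures -- flat per-well versus volume tiers (all rates invented for illustration) -- shows why a 50-well pilot price tells you little about the 500-well rollout cost:

```python
def flat_per_well(wells, rate=1000):
    """Hypothetical flat pricing: a fixed $/well/year rate."""
    return wells * rate

def tiered(wells):
    """Hypothetical volume tiers: the rate drops as well count grows."""
    cost, remaining = 0, wells
    for tier_size, rate in [(50, 1000), (200, 600), (float("inf"), 300)]:
        used = min(remaining, tier_size)
        cost += used * rate
        remaining -= used
        if remaining <= 0:
            break
    return cost
```

Under these invented numbers, both models charge $50,000 for a 50-well pilot, but at 500 wells the flat model costs $500,000 while the tiered model costs $245,000. The pilot price is identical; the rollout economics are not. Get the scaling curve in writing.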

Red flag: "Let's discuss pricing after you see the demo." Pricing should be transparent early in the process. A vendor that hides pricing until you are emotionally committed to the product is using a sales tactic, not building a partnership. Also watch for pricing models that penalize data volume -- if the cost increases every time you add a SCADA tag, you will eventually have to choose between data coverage and budget.

Question 18: What does vendor lock-in look like, and what happens if we cancel?

Why it matters: This is the question most operators forget to ask until it is too late. If you cancel the contract, what do you keep? Can you export your data, your models, your dashboards, your custom configurations? Or does everything disappear? Some vendors build their products so that your data and workflows become inseparable from their platform. That is not a partnership; it is a dependency.

Red flag: "Your data is always yours." This sounds reassuring but means nothing without specifics. Ask: can I export the trained model? Can I export the feature engineering pipeline? Can I export the dashboards and reports in a usable format? If the answer to all three is no, you are renting, not buying -- and you should price that into your evaluation.

Question 19: Who owns the model that was trained on our data?

Why it matters: If a vendor trains a model on your production data from 200 wells in the Delaware Basin, does that model belong to you, the vendor, or both? Can the vendor use the model (or insights derived from your data) to serve your competitors? This is not a hypothetical concern -- some AI vendors explicitly aggregate client data to improve their models, which means your operational data may be contributing to a product that your neighbor is also using.

Red flag: "We use aggregated, anonymized data to improve our models." This is technically common but commercially significant. If the vendor is using your data to build better models for everyone, including your competitors in the same basin, you should at least be aware of it -- and potentially negotiate a pricing discount that reflects the value you are contributing.

Question 20: What does ongoing support look like, and what is the SLA?

Why it matters: AI models are not static. They degrade as operating conditions change, as new wells come online, as completions designs evolve. A vendor that deploys a model and walks away is selling software, not a solution. Ask about model monitoring, retraining frequency, support response times, and who is responsible when model performance degrades. And raise the documented industry-wide problem that roughly 45% of companies report receiving zero AI training from their vendors -- ask whether that applies to their customers.

Red flag: "We have a customer success team." Ask how many customers each customer success manager supports. If the answer is 50+, your "dedicated support" is a shared email queue. Ask for the SLA in writing: response time for critical issues, scheduled model reviews, retraining commitments, and what happens when your point of contact leaves the company.


How to Use This Checklist

Do not send these 20 questions to every vendor and ask them to fill out a spreadsheet. That produces marketing-approved answers that tell you nothing.

Instead:

  1. Use the first demo to listen. Let the vendor present. Take notes on what they show and -- more importantly -- what they do not show.
  2. Use the second meeting to ask. Pick the 8-10 questions from this checklist that are most relevant to your situation. Ask them in conversation, not as a written questionnaire. Watch for hesitation, deflection, and answers that start with "typically" (which means "it depends, and I don't want to tell you on what").
  3. Bring your data to the third meeting. Ask the vendor to run their model on your actual data, on your wells, with your data quality issues. This single step eliminates more bad vendors than all 20 questions combined.
  4. Check references independently. Ask the vendor for three client references -- then find two more on your own. Talk to operators at SPE events, PBOG meetups, or through your professional network. The references the vendor gives you are curated. The ones you find yourself are not.
  5. Run a paid pilot before a full commitment. A 90-day pilot on 20-50 wells, with clearly defined success metrics agreed upon before the pilot starts, is the only reliable way to evaluate an AI vendor. If the vendor will not agree to a pilot with defined exit criteria, that tells you something.

The Uncomfortable Truth

Most AI vendors in oil and gas are not trying to deceive you. They genuinely believe their product works. The problem is that "works" in a controlled demo environment and "works" in your operating environment are very different things. The gap between the two is filled with data integration challenges, change management resistance, IT security reviews, and the messy reality of field operations.

The vendors who will be your best long-term partners are the ones who acknowledge this gap honestly. They tell you what their product cannot do. They give you a realistic timeline that includes the hard parts. They show you case studies with warts included, not just the highlight reel. They welcome your toughest questions because they have already answered them at other operators.

The vendors who will burn you are the ones who make everything sound easy.

This checklist will not tell you which vendor to choose. But it will help you identify which vendors deserve a serious evaluation -- and which ones deserve a polite "thank you, we'll be in touch."

Your wells are too important, and your engineering team's time is too valuable, to learn these lessons the hard way.

Dr. Mehrdad Shirangi is the founder of Groundwork Analytics and holds a PhD from Stanford University in Energy Systems Optimization. He has been building AI solutions for the energy industry since 2018. He is also the creator of petro-mcp, an open-source MCP server for petroleum engineering data. Connect on X/Twitter and LinkedIn, or reach out at info@petropt.com.

Have questions about this topic? Get in touch.