- What "BOM extraction" actually means
- The OCR layer is the boring part
- The hard part: mapping descriptions to part numbers
- A real example
- Edge cases that break naive BOM extraction
Every mechanical engineer has done it. You receive a drawing pack — sometimes a clean PDF set, sometimes a stack of scans with handwritten annotations — and you have to produce a bill of materials. You sit down with SOLIDWORKS or AutoCAD open on one monitor, your ERP part-master on the other, and you copy line items one by one. For a moderately complex assembly, this takes a working day.
This is the workflow that AI eats first, because it is high-volume, produces structured output, and has auditable accuracy requirements. Let us look at how getting a bill of materials from drawing inputs actually works in production, what fails, and what the realistic time savings look like.
What “BOM extraction” actually means
There are three flavors of BOM extraction and people use the term sloppily:
- BOM-block extraction. The drawing has a BOM table embedded as part of the drawing sheet. The AI reads the table cells and emits a structured list. This is the easiest case — it is essentially OCR with table-structure understanding.
- Callout-driven extraction. The drawing has balloon callouts pointing to parts in the views, and an associated parts table. AI must associate the balloon number with the table row, then extract the part description, material, and quantity.
- No-BOM-block extraction. The drawing shows parts in views with annotated dimensions and notes but no formal BOM table. AI must infer parts from views, identify standard hardware (bolts, washers, gaskets) by visual recognition, and assemble a BOM from scratch. This is the hardest.
Most real engineering drawing packs are a mixture. A top-level assembly drawing has a BOM block. The detail drawings underneath do not. The customer sends a P&ID with a sketchy equipment list. You need to extract a complete bill of materials from drawing inputs across all three.
The OCR layer is the boring part
For structured BOM blocks, OCR has been good enough since around 2020. Tesseract on a clean scan hits 99 percent character accuracy. Modern transformer-based models (Donut, TrOCR) hit 99.5 percent or better on the same data. This is solved.
What is not solved is everything that happens after OCR.
The hard part: mapping descriptions to part numbers
A real BOM line in an engineering drawing looks like this:
4 GASKET, SPIRAL WOUND, 4" 150# RF, SS316/GRAFOIL GS-104-150-SS316 8 EA
The ERP part-master has a corresponding entry:
Part Number: GSK-04-150-SS316-GRF
Description: Gasket Spiral Wound 4 inch 150# Raised Face SS316 Graphite Filler
UOM: EA
Material: SS316
The drawing wrote GS-104-150-SS316. The ERP wants GSK-04-150-SS316-GRF. Same physical part, different identifier. The drawing description is shorthand; the ERP description is canonical. A junior engineer would spend two minutes on Google plus a vendor catalog to verify. Times 200 line items, that is your day.
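Before any matching can happen, the raw OCR text of a row has to be split into fields. The following is a minimal sketch of that step, assuming the column order ITEM, DESCRIPTION, PART NUMBER, QTY, UOM seen in the example line above. The `BomLine` record and `parse_bom_row` helper are hypothetical names introduced for illustration; real drawing packs will need per-template column rules.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class BomLine:
    item: int
    description: str
    drawing_part_number: str
    quantity: int
    uom: str


# Assumed column order: ITEM  DESCRIPTION  PART-NUMBER  QTY  UOM.
# The part number is assumed to be hyphen-separated alphanumeric groups.
BOM_ROW = re.compile(
    r"^(?P<item>\d+)\s+"
    r"(?P<description>.+?)\s+"
    r"(?P<part>[A-Z0-9]+(?:-[A-Z0-9]+)+)\s+"
    r"(?P<qty>\d+)\s+"
    r"(?P<uom>[A-Z]+)$"
)


def parse_bom_row(text: str) -> Optional[BomLine]:
    """Split one OCR'd BOM row into fields, or return None if it does not fit."""
    m = BOM_ROW.match(text.strip())
    if m is None:
        return None  # caller should flag the row for human review
    return BomLine(
        item=int(m["item"]),
        description=m["description"],
        drawing_part_number=m["part"],
        quantity=int(m["qty"]),
        uom=m["uom"],
    )


row = parse_bom_row(
    '4 GASKET, SPIRAL WOUND, 4" 150# RF, SS316/GRAFOIL GS-104-150-SS316 8 EA'
)
```

Rows that do not fit the pattern come back as `None` rather than being guessed at, which keeps them visible for review.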
This is where AI earns its place. The model does fuzzy matching across description fields, weighs material code, weighs size and rating, and emits a candidate ERP part number with a confidence score. A typical implementation looks like:
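```python
# Sketch: part_master, tokenize, and ai_rerank are placeholders for your own
# search index, tokenizer, and reranking model.
AUTO_ACCEPT_CONFIDENCE = 0.92

for line in extracted_bom_lines:
    # Narrow the part master to plausible candidates using structured fields.
    candidates = part_master.search(
        keywords=tokenize(line.description),
        size=line.size,
        material=line.material,
        class_rating=line.rating,
    )
    # Rerank the candidates against the full drawing line.
    ranked = ai_rerank(candidates, line)
    if ranked and ranked[0].confidence > AUTO_ACCEPT_CONFIDENCE:
        # High confidence: accept the top match without review.
        line.part_number = ranked[0].part_number
        line.review_required = False
    else:
        # Low confidence or no match: keep the top three for the engineer.
        line.candidates = ranked[:3]
        line.review_required = True
```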
The engineer reviews only the lines flagged as review_required. On a clean dataset that is typically 10 to 20 percent of line items. On a legacy scan with bad ink it can be 40 percent. Either way, the engineer is reviewing 20 to 80 lines instead of typing all 200 from scratch.
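To make the accept-or-flag logic concrete, here is a self-contained sketch that can be run as-is. It swaps the AI reranker for `difflib.SequenceMatcher` from the Python standard library, which is a much cruder string-similarity measure and is used here only as a stand-in. The 0.7/0.3 weighting, the `PartMasterEntry` and `DrawingLine` records, and the second part-master entry are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import List, Optional

AUTO_ACCEPT = 0.92  # same threshold as the loop above


@dataclass
class PartMasterEntry:
    part_number: str
    description: str
    material: str


@dataclass
class DrawingLine:
    description: str
    material: str
    part_number: Optional[str] = None
    candidates: List[str] = field(default_factory=list)
    review_required: bool = True


def normalize(text: str) -> str:
    """Lowercase and collapse punctuation so shorthand and canonical text compare."""
    return " ".join(re.sub(r"[^a-z0-9#]+", " ", text.lower()).split())


def score(line: DrawingLine, entry: PartMasterEntry) -> float:
    # Assumed weighting: 70% description similarity, 30% exact material match.
    text_sim = SequenceMatcher(
        None, normalize(line.description), normalize(entry.description)
    ).ratio()
    material_match = 1.0 if line.material == entry.material else 0.0
    return 0.7 * text_sim + 0.3 * material_match


def match_line(line: DrawingLine, part_master: List[PartMasterEntry]) -> DrawingLine:
    ranked = sorted(part_master, key=lambda e: score(line, e), reverse=True)
    if ranked and score(line, ranked[0]) > AUTO_ACCEPT:
        line.part_number = ranked[0].part_number
        line.review_required = False
    else:
        line.candidates = [e.part_number for e in ranked[:3]]
        line.review_required = True
    return line


part_master = [
    PartMasterEntry(
        "GSK-04-150-SS316-GRF",
        "Gasket Spiral Wound 4 inch 150# Raised Face SS316 Graphite Filler",
        "SS316",
    ),
    # Hypothetical second entry, only so the ranking has something to reject.
    PartMasterEntry("BLT-EXAMPLE-001", "Stud Bolt 5/8 inch B7", "B7"),
]

shorthand = match_line(
    DrawingLine('GASKET, SPIRAL WOUND, 4" 150# RF, SS316/GRAFOIL', "SS316"),
    part_master,
)
```

With plain string similarity the drawing shorthand does not clear the 0.92 threshold, so the line is flagged with its candidates attached. Closing that gap (knowing that RF means raised face and GRAFOIL is a graphite filler) is the work the reranking model is there to do.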
A real example
We tested this workflow against a 312-line BOM extracted from a chemical-plant skid drawing pack. Manual reconstruction took a senior designer 4 hours 40 minutes. The AI-assisted workflow:
- 3.2 minutes to OCR and structure-detect across 14 sheets.
- 1.8 minutes to run part-master matching.
- 22 minutes for the engineer to review 47 flagged lines (15 percent of total).
- 4 minutes to commit to ERP via API.
Total: 31 minutes versus 4 hours 40 minutes. Net accuracy after review was 99.7 percent (one part-number disagreement caught downstream during procurement). The starting accuracy before review was 94.1 percent.
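The headline figures follow directly from the step timings and line counts reported above. A short check of the arithmetic, using only those numbers (the variable names are ours):

```python
# Step timings in minutes, as reported for the 312-line test pack.
steps_min = {
    "ocr_and_structure": 3.2,
    "part_master_matching": 1.8,
    "engineer_review": 22.0,
    "erp_commit": 4.0,
}
TOTAL_LINES = 312
FLAGGED_LINES = 47
ERRORS_AFTER_REVIEW = 1

assisted_min = sum(steps_min.values())          # AI-assisted workflow
manual_min = 4 * 60 + 40                        # 4 hours 40 minutes
saved_min = manual_min - assisted_min
flagged_share = FLAGGED_LINES / TOTAL_LINES
accuracy_after_review = 1 - ERRORS_AFTER_REVIEW / TOTAL_LINES

print(f"assisted: {assisted_min:.0f} min, manual: {manual_min} min")
print(f"saved: {saved_min:.0f} min")
print(f"flagged for review: {flagged_share:.0%}")
print(f"accuracy after review: {accuracy_after_review:.1%}")
```

This reproduces the 31-minute total, the 15 percent review share, and the 99.7 percent post-review accuracy, and puts the saving at 249 minutes, or a little over four hours.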
This is what “AI eliminates 4 hours of lookup work” actually means. It is not magic. It is a 31-minute workflow replacing a 4-hour-40-minute workflow, with a human still in the loop.
Edge cases that break naive BOM extraction
Be honest about the failure modes:
- Handwritten annotations. A revision to a drawing was added by hand at the construction site. OCR fails on cursive. Solution: flag any region with handwriting for human review; do not silently skip it.
- Quantity inferred from a pattern. “4X” beside a bolt circle means four of those bolts. The AI must understand the geometric pattern, not just read the text.
- “AS REQ’D” or “AR” quantities. Common on weld filler, sealant, lubricant. The ERP may need a specific quantity. The AI should flag these for engineering judgment.
- Substitution lists. “USE PART A OR EQUIVALENT.” The AI should not silently pick A; it should flag the substitution clause.
- Multi-language drawings. A drawing with English title block and German part descriptions is real, especially in semiconductor equipment built in Saxony.
- Stacked revisions. A drawing with REV 0, REV A, and REV B BOM blocks visible because the older blocks were not deleted. The AI must use only the latest revision block.
A tool that produces a bill of materials from drawing inputs without flagging these cases will quietly give you wrong answers, and the engineering team will lose trust within two weeks.
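Several of these cases can be caught with plain text rules before any model is involved. The sketch below flags "AS REQ'D" quantities, substitution clauses, and "4X"-style pattern multipliers, and picks the latest revision block. The regular expressions, flag names, and the revision ordering (numeric revisions before lettered ones) are assumptions; drawing conventions vary by company, and handwriting and geometric patterns still need image-level detection that this does not attempt.

```python
import re
from typing import List

# Assumed text conventions; extend per drawing standard.
AS_REQUIRED = re.compile(r"\bAS\s+REQ['’]?D\b|\bAR\b", re.IGNORECASE)
SUBSTITUTION = re.compile(r"\bOR\s+(?:EQUIVALENT|EQUAL|EQUIV)\b", re.IGNORECASE)
MULTIPLIER = re.compile(r"\b\d+\s*X\b", re.IGNORECASE)


def review_flags(description: str, quantity_text: str) -> List[str]:
    """Return the reasons, if any, that a BOM line needs engineering review."""
    flags = []
    if AS_REQUIRED.search(quantity_text):
        flags.append("as_required_quantity")
    if SUBSTITUTION.search(description):
        flags.append("substitution_clause")
    if MULTIPLIER.search(description):
        flags.append("pattern_multiplier")
    return flags


def latest_revision(revisions: List[str]) -> str:
    """Pick the newest revision label.

    Assumed ordering: numeric revisions (0, 1, ...) come before lettered ones
    (A, B, ...), and longer labels come after shorter ones (Z before AA).
    """
    return max(revisions, key=lambda rev: (rev.isalpha(), len(rev), rev))


flags = review_flags("USE PART A OR EQUIVALENT", "AS REQ'D")
```

Each rule only raises a flag; none of them resolves the case, which matches the principle that ambiguous lines go to an engineer instead of being decided silently.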
ERP integration
The extracted BOM is only useful if it lands in your ERP correctly. The integration paths in practice:
- SAP. BOM creation via BAPI BAPI_MATERIAL_BOM_GROUP_CREATE or via S/4HANA OData services. The AI tool should emit a structured payload and let SAP enforce part-master integrity.
- Oracle E-Business Suite / Fusion. Open Interface tables or REST endpoints. Same pattern: AI proposes, ERP validates.
- Teamcenter PLM. BOM upload via Teamcenter Integration Framework. Common in aerospace and semiconductor equipment shops.
- Spreadsheet handoff. The pragmatic option for smaller shops: emit a CSV that maps onto the ERP import template. Less elegant but works.
The key is that the ERP is the source of truth for part numbers, not the drawing. The AI’s job is to bridge the drawing’s description to the ERP’s identifier with explicit confidence and explicit review.
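For the spreadsheet handoff, a minimal sketch using Python's standard `csv` module is shown here. The column names in `TEMPLATE_COLUMNS`, the `SKID-001` parent item, and the input dictionary shape are hypothetical; a real import template will define its own headers. The one design point it illustrates is that lines without a confirmed ERP part number are held back instead of exported.

```python
import csv
import io
from typing import Dict, List, Tuple

# Hypothetical import-template columns; replace with your ERP's template.
TEMPLATE_COLUMNS = ["parent_item", "component", "quantity", "uom"]


def bom_to_csv(parent_item: str, lines: List[Dict]) -> Tuple[str, List[Dict]]:
    """Write confirmed lines to CSV text; return it with the held-back lines."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=TEMPLATE_COLUMNS, lineterminator="\n")
    writer.writeheader()
    held_back = []
    for line in lines:
        # Only lines with a confirmed ERP part number are exported.
        if line["review_required"] or not line["part_number"]:
            held_back.append(line)
            continue
        writer.writerow(
            {
                "parent_item": parent_item,
                "component": line["part_number"],
                "quantity": line["quantity"],
                "uom": line["uom"],
            }
        )
    return buffer.getvalue(), held_back


csv_text, held_back = bom_to_csv(
    "SKID-001",
    [
        {"part_number": "GSK-04-150-SS316-GRF", "quantity": 8, "uom": "EA",
         "review_required": False},
        {"part_number": None, "quantity": 2, "uom": "EA", "review_required": True},
    ],
)
```

The same shape works for an API payload: the function that builds the export is also the place to enforce that nothing unreviewed reaches the ERP.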
Where DrawingDiff fits
DrawingDiff treats BOM extraction as one half of a larger workflow — the other half is detecting BOM changes between revisions, which is a separate article. The principle is the same: AI extracts structure, the engineer reviews flagged items, the ERP commits the canonical record.
A note on multi-level BOMs
Most real engineering work involves multi-level bills, not flat ones. A skid has a top-level BOM with 30 line items, each pointing to a sub-assembly with its own BOM. Ten levels deep is not unusual on a complex equipment package.
Getting a multi-level bill of materials from drawing inputs requires the AI to understand parent-child relationships, not just to read individual sheets. The patterns that work:
- Each sub-assembly drawing has its own BOM block; the AI extracts each one independently and links them via the parent-level callout.
- When a sub-assembly references a part that also appears at the top level, the AI must respect the deduplication rule used by the company’s ERP.
- Phantom assemblies (logical groupings that do not have a physical sub-assembly drawing) need explicit handling.
- Customer-supplied items (CSI) and free-issue parts must be tagged separately from purchased parts.
The best AI BOM tools surface a tree view of the multi-level structure for engineering review, not just a flat line-item list. Reviewing 312 line items in a flat view is exhausting; reviewing the same data as a 6-level tree with parent-child relationships is tractable.
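A multi-level BOM maps naturally onto a tree. The sketch below shows one possible representation with an indented tree view and a quantity roll-up. The `BomNode` record and the part numbers other than the gasket are hypothetical, and the roll-up simply sums a part wherever it appears, which is only one of several deduplication rules an ERP might apply.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BomNode:
    part_number: str
    qty_per_parent: int = 1
    children: List["BomNode"] = field(default_factory=list)
    phantom: bool = False  # logical grouping with no drawing of its own


def render_tree(node: BomNode, depth: int = 0) -> List[str]:
    """Return the structure as indented lines for engineering review."""
    label = f"{'  ' * depth}{node.qty_per_parent} x {node.part_number}"
    if node.phantom:
        label += " (phantom)"
    lines = [label]
    for child in node.children:
        lines.extend(render_tree(child, depth + 1))
    return lines


def rolled_up_quantities(
    node: BomNode, multiplier: int = 1, totals: Optional[Counter] = None
) -> Counter:
    """Total quantity of each leaf part, multiplying down through the levels."""
    totals = Counter() if totals is None else totals
    qty = multiplier * node.qty_per_parent
    if not node.children:
        totals[node.part_number] += qty
    for child in node.children:
        rolled_up_quantities(child, qty, totals)
    return totals


skid = BomNode(
    "SKID-001",
    children=[
        BomNode(
            "PUMP-ASSY-EXAMPLE",
            qty_per_parent=2,
            children=[
                BomNode("GSK-04-150-SS316-GRF", qty_per_parent=4),
                BomNode("BOLT-EXAMPLE", qty_per_parent=8),
            ],
        ),
        BomNode("GSK-04-150-SS316-GRF", qty_per_parent=1),
    ],
)
tree_view = render_tree(skid)
totals = rolled_up_quantities(skid)
```

Here the gasket appears both inside the pump sub-assembly and at the top level, so the roll-up has to decide how to combine them; this version adds them together.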
Material certification chains
For pressure-retaining equipment under ASME, PED, or CRN, the bill of materials from drawing data must support the material certification chain. Every pressure-retaining component needs traceability to a heat number, mill cert, and inspection record. The BOM line item is the entry point to that chain.
A mature BOM extraction tool emits not just the part number and quantity but the certification class, traceability requirements, and the downstream documentation needed. This is the difference between a BOM useful for procurement and a BOM useful for fabrication and code compliance. The first is easy; the second is where most tools stop short.
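One way to carry that extra information is to attach traceability fields to each BOM line and check them before release. This is a sketch under assumptions: the field names, the `CertifiedBomLine` record, and the example flange part number are hypothetical, and the actual documentation required depends on the applicable code and the customer specification.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CertifiedBomLine:
    part_number: str
    quantity: int
    pressure_retaining: bool
    heat_number: Optional[str] = None
    mill_cert_ref: Optional[str] = None
    inspection_record_ref: Optional[str] = None


def missing_traceability(line: CertifiedBomLine) -> List[str]:
    """List the traceability fields still missing on a pressure-retaining line."""
    if not line.pressure_retaining:
        return []  # non-pressure parts are outside this check
    required = {
        "heat_number": line.heat_number,
        "mill_cert_ref": line.mill_cert_ref,
        "inspection_record_ref": line.inspection_record_ref,
    }
    return [name for name, value in required.items() if not value]


flange = CertifiedBomLine(
    part_number="FLG-EXAMPLE-001",  # hypothetical part number
    quantity=2,
    pressure_retaining=True,
    heat_number="H12345",           # hypothetical heat number
)
gaps = missing_traceability(flange)
```

A line with open gaps can still go to procurement, but it should not be treated as ready for fabrication sign-off.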
What this means for you
- Audit how much of your engineering team’s time is spent on BOM lookup. At most equipment OEMs the answer is 8 to 15 percent — invisible because it is spread across many people.
- Do not trust any BOM extraction tool that does not produce per-line confidence scores and review flags.
- The bottleneck is part-master quality. A clean, well-maintained ERP part-master is what makes AI BOM extraction work; a messy one will defeat any tool.
- Pilot on a single drawing pack with known-correct BOM. Compare AI output line-by-line. The miss patterns tell you whether the tool is ready for your workflow.
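The pilot comparison in the last point is easy to script. A minimal sketch, assuming both the known-correct BOM and the tool output can be reduced to a mapping from item number to a (part number, quantity) pair; the function name and data shape are ours, not any tool's format:

```python
from typing import Dict, List, Tuple

BomMap = Dict[int, Tuple[str, int]]  # item number -> (part number, quantity)


def compare_boms(expected: BomMap, actual: BomMap):
    """Compare tool output to a known-correct BOM, line by line."""
    if not expected:
        raise ValueError("expected BOM is empty")
    misses: List[Tuple[int, Tuple[str, int], object]] = []
    for item, want in expected.items():
        got = actual.get(item)  # None means the tool dropped the line
        if got != want:
            misses.append((item, want, got))
    extras = sorted(set(actual) - set(expected))  # lines the tool invented
    accuracy = 1 - len(misses) / len(expected)
    return accuracy, misses, extras


known_good = {1: ("GSK-04-150-SS316-GRF", 8), 2: ("BOLT-EXAMPLE", 16)}
tool_output = {1: ("GSK-04-150-SS316-GRF", 8), 2: ("BOLT-EXAMPLE", 12), 3: ("X", 1)}
accuracy, misses, extras = compare_boms(known_good, tool_output)
```

The accuracy number matters less than the `misses` list: whether the errors cluster on quantities, part numbers, or dropped lines tells you which part of the pipeline needs work.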