Data Extraction (OCR & Structured)
Turn unstructured documents into myDATA-ready data — automatically.
🧠 How TaxLayer Extracts Your Data
Once you upload a file, TaxLayer goes to work:
OCR reads PDFs and images (Greek + English supported).
AI interprets content into structured fields (VAT, totals, dates, parties, line items).
Confidence scoring flags uncertain extractions so you know what needs a manual check.
💡 Why it matters: No more manual typing or copy-pasting invoice data. Clean extraction means fewer myDATA rejections, faster validation, and a smoother submission process.
🪄 OCR Capabilities That Match Greek Business Reality
Greek + English support: Handles ΑΦΜ, Ημερομηνία, ΦΠΑ just as easily as “VAT” and “Invoice Date.”
PDFs, JPGs, PNGs: Both scanned and native documents work.
Smart correction: Fixes orientation, recognises multi-page docs, and optimises resolution.
💡 Why it matters: Most Greek invoices mix languages and formats. Our OCR is tuned for that complexity, so critical fields don’t get lost.
📋 Best Practices for Clean Extraction
For best results:
Scan at 300 DPI or higher (low-quality scans produce low confidence scores).
Keep documents upright (avoid sideways photos).
Avoid glare and shadows in mobile photos.
Keep files under 10 MB for faster processing.
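If you prepare uploads with a script, the file-level tips above can be checked before sending anything. The sketch below is illustrative only and is not part of TaxLayer: the function name `preflight_warnings` is hypothetical, accepting `.jpeg` alongside `.jpg` is an assumption, and 10 MB is interpreted here as 10 × 1024 × 1024 bytes. Orientation, glare, and shadows still need a visual check.

```python
from pathlib import PurePath

# Values taken from the tips above; the byte interpretation of "10 MB" is an assumption.
SUPPORTED_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png"}
MAX_RECOMMENDED_BYTES = 10 * 1024 * 1024
MIN_RECOMMENDED_DPI = 300


def preflight_warnings(filename, size_bytes, dpi=None):
    """Return a list of human-readable warnings for one file (hypothetical helper).

    `dpi` is optional because native PDFs have no scan resolution.
    """
    warnings = []
    ext = PurePath(filename).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        warnings.append(f"unsupported file type: {ext or '(none)'}")
    if size_bytes > MAX_RECOMMENDED_BYTES:
        warnings.append("file is over 10 MB; processing may be slower")
    if dpi is not None and dpi < MIN_RECOMMENDED_DPI:
        warnings.append(f"scan is {dpi} DPI; 300 DPI or higher is recommended")
    return warnings
```

An empty list means the file passes all three checks; anything else is a hint to re-scan or compress before uploading.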
🗂️ What Gets Extracted
TaxLayer pulls out everything myDATA needs:
Headers & IDs: Invoice number, type, series, dates.
Amounts: Net, VAT, gross totals, currency detection, rounding checks.
Parties: Issuer + recipient VAT, name, address, country.
Line items: Descriptions, quantities, units, prices, discounts, VAT categories.
myDATA fields: Auto-suggests document types (1.1, 1.2, etc.) and classification codes.
💡 Why it matters: Getting these right up front prevents downstream schema errors and AADE rejections.
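To make the "rounding checks" on amounts concrete, here is a minimal sketch of the kind of arithmetic involved. It is an illustration under stated assumptions, not TaxLayer's actual logic: the `LineItem` and `ExtractedTotals` shapes, the `totals_mismatches` name, per-line half-up rounding, and the two-cent tolerance are all hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


@dataclass
class LineItem:  # hypothetical shape, for illustration only
    description: str
    net: Decimal
    vat_rate: Decimal  # e.g. Decimal("0.24") for 24%


@dataclass
class ExtractedTotals:  # hypothetical shape, for illustration only
    net: Decimal
    vat: Decimal
    gross: Decimal


def totals_mismatches(lines, totals, tolerance=Decimal("0.02")):
    """Return the names of totals that disagree with the line items.

    VAT is rounded per line to the cent (an assumption); small differences
    up to `tolerance` are treated as rounding noise.
    """
    line_net = sum((item.net for item in lines), Decimal("0"))
    line_vat = sum(
        ((item.net * item.vat_rate).quantize(CENT, rounding=ROUND_HALF_UP) for item in lines),
        Decimal("0"),
    )
    issues = []
    if abs(line_net - totals.net) > tolerance:
        issues.append("net")
    if abs(line_vat - totals.vat) > tolerance:
        issues.append("vat")
    if abs(totals.net + totals.vat - totals.gross) > tolerance:
        issues.append("gross")
    return issues
```

Using `Decimal` rather than floats keeps cent-level comparisons exact, which matters when the difference between a pass and a flag is one or two cents.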
📊 Confidence Scoring Explained
Every field is graded for reliability:
✅ High (90–100%) → Safe to auto-accept.
⚠️ Medium (70–89%) → Worth a quick check.
❌ Low (<70%) → Needs manual review.
Factors that affect confidence: image clarity, text quality, document structure, language mix.
🔎 Smart workflow: High-confidence fields flow through automatically, while medium/low confidence ones are queued for review — saving time without sacrificing accuracy.
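The bands and routing above can be sketched in a few lines. This is a simplified illustration, not TaxLayer's implementation: `confidence_band` and `split_for_review` are hypothetical names, and the input shape (field name mapped to a value and a 0–100 score) is an assumption.

```python
def confidence_band(score):
    """Map a 0-100 confidence score to the band names used above."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    if score >= 90:
        return "high"
    if score >= 70:
        # Fractional scores such as 89.5 land here in this sketch.
        return "medium"
    return "low"


def split_for_review(fields):
    """Split {name: (value, score)} into auto-accepted and review-queue dicts."""
    accepted, review = {}, {}
    for name, (value, score) in fields.items():
        target = accepted if confidence_band(score) == "high" else review
        target[name] = value
    return accepted, review
```

For example, a document with an invoice number at 97% and a VAT amount at 74% would have the invoice number accepted automatically and the VAT amount queued for a quick check.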
🧩 Smarter Matching with Context
Extraction isn’t just text recognition — TaxLayer also:
Matches VAT numbers against EU databases.
Links issuers/recipients to existing client/vendor records.
Suggests myDATA classifications based on history.
Flags odd values (e.g. a €10,000 invoice from a small vendor).
💡 Why it matters: You don’t just get raw data — you get data enriched and pre-checked for compliance.
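Database matching happens inside TaxLayer, but a Greek VAT number (ΑΦΜ) can also be sanity-checked locally before upload, because its ninth digit is a check digit. The sketch below uses the publicly known mod-11 checksum; it is a separate, offline pre-check and not a description of how TaxLayer matches VAT numbers. The function name `is_valid_afm` is hypothetical, and the input is assumed to be the bare nine digits with no "EL" prefix.

```python
def is_valid_afm(afm: str) -> bool:
    """Check the format and check digit of a Greek VAT number (AFM).

    The first eight digits are weighted by 2^8 down to 2^1; the weighted sum
    modulo 11, then modulo 10, must equal the ninth digit.
    """
    if len(afm) != 9 or not afm.isascii() or not afm.isdigit():
        return False
    if afm == "000000000":
        # All zeros passes the arithmetic but is not a usable number.
        return False
    digits = [int(c) for c in afm]
    weighted = sum(d * 2 ** (8 - i) for i, d in enumerate(digits[:8]))
    return weighted % 11 % 10 == digits[8]
```

A passing checksum only shows that the number is well formed, which is enough to catch many OCR misreads of a single digit. It does not confirm that the number is registered or belongs to the named party.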
🛠️ When Extraction Needs a Human Touch
Even the best OCR has limits. Issues arise with:
Blurry scans or faxed copies.
Custom invoice layouts.
Mixed Greek/English fields with unusual terminology.
Totals that don’t add up.
Quick fixes:
Correct inline in the Document Detail view.
Use XLSX/CSV uploads for bulk structured data.
Re-scan documents at higher quality.
Let TaxLayer “learn” — every correction trains the system to get smarter next time.
🎯 Pro Tips for Maximum Accuracy
Work with vendors to standardise invoice layouts where possible.
Use structured formats (XLSX/CSV) when you have bulk data.
Correct consistently (same error, same fix) so the AI learns faster.
Update client/vendor records regularly — accurate VAT numbers and addresses improve matching.
🔗 Related Features
Validation & Quality Control – What happens after extraction.
Batch Processing – Organise and monitor uploads.
Knowledge Management – Train the system on your company rules.
AI Chat – Ask “What fields are missing?” or “How do I fix these VAT mismatches?” for instant guidance.