Supported File Types

The 4Minds platform accepts the following file formats for dataset uploads:
  • Text files (.txt) - Plain text documents
  • Markdown files (.md) - Formatted text documents with markup
  • CSV files (.csv) - Comma-separated value spreadsheets
  • JSON files (.json) - Structured data in JSON format
  • Parquet files (.parquet) - Columnar storage format (not supported for Hugging Face imports)
  • PDF files (.pdf) - Portable document format files
  • Word documents (.docx) - Microsoft Word documents
  • Excel spreadsheets (.xlsx) - Microsoft Excel workbooks
  • JPEG images (.jpg, .jpeg) - Compressed image files
  • PNG images (.png) - Portable network graphics
  • GIF images (.gif) - Graphics interchange format
  • BMP images (.bmp) - Bitmap image files
  • TIFF images (.tiff) - Tagged image file format
  • ZIP archives (.zip) - Compressed folders containing multiple files
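
Before uploading, you can screen a folder locally for files the platform will reject. A minimal Python sketch, assuming a local ./dataset folder; the extension set simply mirrors the list above:

    from pathlib import Path

    # Extensions mirror the supported-formats list above.
    SUPPORTED_EXTENSIONS = {
        ".txt", ".md", ".csv", ".json", ".parquet",
        ".pdf", ".docx", ".xlsx",
        ".jpg", ".jpeg", ".png", ".gif", ".bmp", ".tiff",
        ".zip",
    }

    def unsupported_files(folder: str) -> list[Path]:
        """Return files whose extensions the platform will not accept."""
        return [
            path for path in Path(folder).rglob("*")
            if path.is_file() and path.suffix.lower() not in SUPPORTED_EXTENSIONS
        ]

    for path in unsupported_files("./dataset"):
        print(f"Unsupported file type: {path}")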

Automatic OCR Processing

The 4Minds platform automatically extracts text from images and scanned documents using built-in Optical Character Recognition (OCR). This feature works seamlessly across all base models, with no configuration required. OCR is applied to:
  • PDF files with scanned or non-selectable text
  • Image files (JPG, PNG, TIFF, BMP, GIF) containing text
  • Documents with embedded images
How it works: When you upload files, our Reflex Router™ automatically detects content that requires OCR processing and extracts the text. The extracted content is then made available for model training and inference, just like any other text data.
Key benefits:
  • Works with any base model you select for inline tuning
  • No manual configuration needed
  • Seamlessly integrated into the data processing pipeline
OCR accuracy depends on image quality and resolution. For best results, use clear, high-resolution scans.
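
OCR detection happens automatically on the platform, but if you want a rough local sense of whether a PDF already contains selectable text or will need OCR, a simple heuristic using the pypdf library can help. This is only a sketch, not the Reflex Router’s actual detection logic, and the file name is illustrative:

    from pypdf import PdfReader  # pip install pypdf

    def has_selectable_text(pdf_path: str, min_chars: int = 20) -> bool:
        """Heuristic: True if any page yields a meaningful amount of text."""
        reader = PdfReader(pdf_path)
        for page in reader.pages:
            text = page.extract_text() or ""
            if len(text.strip()) >= min_chars:
                return True
        return False  # likely a scanned document; OCR will be applied

    print(has_selectable_text("scanned_invoice.pdf"))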

Upload Size Limit

  • You can upload up to 100 MB of data at a time. This applies to single files, multiple files, or integration datasets. A progress bar will display the total upload size.
  • To upload more data, simply reopen the dataset and upload the next 100 MB batch.
    There is no limit on the overall dataset size, only on each individual upload batch (one way to pre-batch local files is sketched below).
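
To stay under the per-upload limit, you can group local files into batches of at most 100 MB and upload each batch in turn. A minimal sketch, assuming a local ./dataset folder; the helper name is illustrative, not part of the platform:

    from pathlib import Path

    MAX_BATCH_BYTES = 100 * 1024 * 1024  # 100 MB per upload batch

    def batch_files(folder: str) -> list[list[Path]]:
        """Greedily pack files into batches that each fit one upload."""
        batches: list[list[Path]] = [[]]
        batch_size = 0
        for path in sorted(Path(folder).rglob("*")):
            if not path.is_file():
                continue
            size = path.stat().st_size
            if size > MAX_BATCH_BYTES:
                raise ValueError(f"{path} alone exceeds the 100 MB limit")
            if batch_size + size > MAX_BATCH_BYTES:
                batches.append([])
                batch_size = 0
            batches[-1].append(path)
            batch_size += size
        return [batch for batch in batches if batch]

    for i, batch in enumerate(batch_files("./dataset"), start=1):
        print(f"Batch {i}: {len(batch)} files")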

Adding Data to Existing Datasets

As your business evolves, your model’s knowledge needs to evolve with it. Adding new data to existing datasets keeps your AI current and effective without starting from scratch.

Why Continuous Data Updates Matter

  • Maintain accuracy - Product features change, policies update, and new edge cases emerge. Without fresh data, your model provides outdated information that frustrates users and erodes trust.
  • Capture new patterns - Each customer interaction reveals new ways people describe problems, ask questions, or use your product. Adding these examples helps your model understand diverse communication styles.
  • Improve coverage - Initial training datasets rarely cover every scenario. As you discover gaps in your model’s knowledge, you can fill them by adding targeted data.
  • Adapt to business changes - New products, services, pricing models, or support processes require corresponding updates to your training data.

How to Add New Data

From the Datasets tab

  1. Navigate to the Datasets tab
  2. Locate the dataset you want to update
  3. Click on the dataset to open it
  4. Click Upload Additional Data
  5. Choose your data source: Upload Files, Integrations, or URL.
  6. Upload your new data based on the selected source.
  7. The platform automatically processes and integrates the new data into your knowledge graph.

From the Model tab

  1. Navigate to the Models tab
  2. Locate the model you want to update with additional data
  3. Click the three-dot menu (⋮) in the Actions column for that model
  4. Click the Add Training Data button in the shortcuts section
  5. Click Upload Additional Files
  6. Choose your data source: Upload Files, Integrations, or URL.
  7. Upload your new data based on the selected source.
  8. The data is processed and added to your model’s knowledge graph automatically

From the Control Center

  1. Open your model in the Control Center
  2. Click the Add Training Data button in the shortcuts section
  3. Click Upload Additional Files
  4. Choose your data source: Upload Files, Integrations, or URL.
  5. Upload your new data based on the selected source.
  6. The data is processed and added to your model’s knowledge graph automatically

From the Playground tab

  1. Navigate to the Playground tab
  2. Select the model you want to interact with
  3. While querying or testing your model, click the + icon next to the message box
  4. Choose your data source: Attach File or Add URLs.
  5. Upload your new data based on the selected source.
  6. The data is automatically processed and integrated into your model’s knowledge graph
Adding data directly from the Playground is useful when you discover knowledge gaps during testing. You can immediately upload relevant information without leaving your testing workflow.
New data is automatically integrated into your existing knowledge graph. Nodes and edges update to reflect the new information without disrupting existing knowledge structures.

Best Practices for Data Updates

  • Add incrementally - Rather than waiting to upload large batches, add new data regularly as it becomes available. This keeps your model current and makes it easier to track what information was added when.
  • Document your updates - Keep notes on what data you added and why. This helps you understand model behavior changes and plan future updates (a minimal logging sketch follows this list).
  • Test after updates - Use the Inference Model feature to verify that new data is being used correctly and hasn’t introduced conflicts with existing knowledge.
  • Mix data types - When adding new information, include multiple formats when possible. For example, if you’re adding a new product feature, include documentation (PDF), example support tickets (CSV), and screenshots (images).
  • Retrain when needed - After significant data additions, retrain your model to fully integrate the new knowledge. Minor updates may not require retraining, but substantial changes benefit from it.
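
One lightweight way to document updates is an append-only log with one row per upload batch. A minimal sketch; the file name, columns, and example values are only a suggestion:

    import csv
    from datetime import date

    def log_update(dataset: str, files: list[str], reason: str,
                   log_path: str = "dataset_updates.csv") -> None:
        """Append one row describing an upload batch to a local log file."""
        with open(log_path, "a", newline="") as f:
            csv.writer(f).writerow(
                [date.today().isoformat(), dataset, ";".join(files), reason]
            )

    log_update("support-kb", ["refund_policy_v3.pdf"], "Refund policy changed in Q3")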

Building Comprehensive Datasets

Training an effective AI model requires more than uploading a single file type. Just as you wouldn’t hire a customer support agent and only give them a product manual, your model needs diverse perspectives and contexts to develop true understanding.

Example: Training on Customer Support Excellence

Let’s say you want your model to handle customer support inquiries effectively. Here’s how to structure a robust, multimodal dataset using 4Minds’ supported formats:

Visual understanding (images & screenshots)

Upload visual content showing real customer interactions:
  • Product interfaces - Screenshots of your software, dashboard views, error messages, feature locations
  • Troubleshooting visuals - Common configuration issues, installation steps, system architecture diagrams
  • Documentation - Annotated screenshots showing workflows, setup guides, integration diagrams
  • Error states - What customers see when things go wrong, loading states, failure modes
  • Customer-submitted images - Photos of hardware issues, setup problems, packaging damage
  • Competitor products - Interface comparisons, feature differences, migration guides

Conceptual knowledge (PDFs & documents)

Add comprehensive written content:
  • Product documentation - Technical specifications, API references, user guides, release notes
  • Internal knowledge bases - Troubleshooting playbooks, known issues, workaround procedures
  • Policy documents - SLA agreements, refund policies, terms of service, data privacy guidelines
  • Training materials - Onboarding docs for new support agents, escalation procedures, quality standards
  • Industry context - Regulatory compliance guides, security best practices, industry standards
  • Best practices - Customer service frameworks, communication guidelines, de-escalation techniques
  • Competitive intelligence - How competitors solve similar problems, market positioning, feature comparisons

Structured data (CSV & spreadsheet files)

Include quantitative patterns and history (a sample file layout is sketched after this list):
  • Support ticket history - Ticket IDs, timestamps, issue categories, resolution times, customer satisfaction scores
  • Customer data - Account types, subscription tiers, usage patterns, feature adoption rates
  • Product usage analytics - Most-used features, error rates, session durations, drop-off points
  • Response metrics - First response time, resolution time, reopened tickets, escalation rates
  • Customer sentiment - NPS scores, CSAT ratings, survey responses, sentiment analysis results
  • Seasonal patterns - Ticket volume by time/day/season, spike events, capacity planning data
  • Agent performance - Resolution rates, customer satisfaction per agent, specialization areas
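
For illustration, here is one possible layout for a support-ticket history file, written with Python’s csv module. Every column name and value below is invented for the example:

    import csv

    # Hypothetical ticket-history rows covering the fields listed above.
    rows = [
        {"ticket_id": "T-1042", "created_at": "2025-01-14T09:32:00Z",
         "category": "billing", "resolution_minutes": 47, "csat": 5},
        {"ticket_id": "T-1043", "created_at": "2025-01-14T10:05:00Z",
         "category": "login", "resolution_minutes": 12, "csat": 4},
    ]

    with open("ticket_history.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(rows)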

Communication history (email & chat logs)

Provide real conversation examples:
  • Resolved tickets - Successful interactions showing problem identification and resolution
  • Escalated cases - Complex issues requiring multiple touchpoints or specialist involvement
  • Edge cases - Unusual requests, policy exceptions, creative problem-solving examples
  • Tone variations - Professional responses, empathetic communications, frustrated customer de-escalation
  • Multi-channel interactions - Email threads, chat transcripts, phone call summaries, social media responses
  • Follow-ups - Post-resolution check-ins, proactive outreach, account management communications

Audio (coming soon)

Add dynamic training materials:
  • Call recordings - Customer support calls showing tone, pacing, active listening, problem resolution
  • Product demos - Video walkthroughs of features, setup processes, advanced use cases
  • Training sessions - Internal workshops, role-playing scenarios, best practice reviews
  • Customer feedback sessions - User interviews, usability testing, feature request discussions

Contextual business data (mixed formats)

Round out understanding with operational context:
  • Product roadmap - Upcoming features, deprecation schedules, beta programs
  • Billing systems - Invoice examples, pricing tiers, renewal processes, refund workflows
  • Integration documentation - Third-party connections, API partnerships, data sync processes
  • Company information - Team structure, hours of operation, regional support coverage, contact escalation paths
  • Legal & compliance - GDPR requirements, data handling procedures, audit trails, security protocols

Why This Matters

When you combine these diverse data types, your model develops:
  • Contextual problem-solving that understands not just what the issue is, but why it matters and how it impacts the customer’s business
  • Tone awareness from seeing thousands of interactions, knowing when to be technical vs empathetic, formal vs conversational
  • Pattern recognition identifying common issues before customers fully describe them, predicting follow-up questions
  • Operational intelligence understanding SLAs, escalation paths, when to involve specialists, and business constraints
  • Proactive guidance suggesting solutions based on similar past cases, usage patterns, and product knowledge
A model trained only on product documentation would fail when a frustrated customer describes a problem in non-technical terms, or when an edge case requires policy interpretation. But a model trained with this comprehensive, multimodal approach develops the nuanced intelligence to handle real customer interactions effectively.