
Supported File Types

The 4Minds platform accepts the following file formats for dataset uploads:
  • Text files (.txt) - Plain text documents
  • Markdown files (.md) - Formatted text documents with markup
  • CSV files (.csv) - Comma-separated value spreadsheets
  • JSON files (.json) - Structured data in JSON format
  • Parquet files (.parquet) - Columnar storage format (not supported for Hugging Face imports)
  • PDF files (.pdf) - Portable document format files
  • Word documents (.docx) - Microsoft Word documents
  • Excel spreadsheets (.xlsx) - Microsoft Excel workbooks
  • JPEG images (.jpg, .jpeg) - Compressed image files
  • PNG images (.png) - Portable network graphics
  • GIF images (.gif) - Graphics interchange format
  • BMP images (.bmp) - Bitmap image files
  • TIFF images (.tiff) - Tagged image file format
  • ZIP archives (.zip) - Compressed folders containing multiple files

Automatic OCR Processing

The 4Minds platform automatically extracts text from images and scanned documents using built-in Optical Character Recognition (OCR). This feature works across all base models with no configuration required. OCR is applied to:
  • PDF files with scanned or non-selectable text
  • Image files (JPG, PNG, TIFF, BMP, GIF) containing text
  • Documents with embedded images
How it works: When you upload files, our Reflex Router™ automatically detects content that requires OCR processing and extracts the text. The extracted content is then made available for model training and inference, just like any other text data.
Key benefits:
  • Works with any base model you select for inline tuning
  • No manual configuration needed
  • Seamlessly integrated into the data processing pipeline
OCR accuracy depends on image quality and resolution. For best results, use clear, high-resolution scans.

Upload Size Limit

  • You can upload up to 100 MB of data at a time. This applies to single files, multiple files, or integration datasets. A progress bar displays the total upload size.
  • To upload more data, reopen the dataset and upload the next batch of up to 100 MB. There is no limit on the overall dataset size, only on each individual upload batch (a client-side batching sketch follows this list).
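
If you script your uploads, you can partition files into groups that respect the limit before sending each batch. This is a minimal sketch under that assumption: batch_files is illustrative, and upload_batch is a hypothetical placeholder for whatever upload mechanism you use.

import os

MAX_BATCH_BYTES = 100 * 1024 * 1024  # 100 MB per-upload limit

def batch_files(paths, max_bytes=MAX_BATCH_BYTES):
    """Greedily group files into batches that each stay under the limit."""
    batches, current, current_size = [], [], 0
    for path in sorted(paths, key=os.path.getsize, reverse=True):
        size = os.path.getsize(path)
        if size > max_bytes:
            raise ValueError(f"{path} exceeds the per-upload limit on its own")
        if current_size + size > max_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if current:
        batches.append(current)
    return batches

# Hypothetical usage: upload each batch as a separate 100 MB-or-less request
# for batch in batch_files(["docs.pdf", "tickets.csv", "faq.md"]):
#     upload_batch(batch)  # placeholder for your upload call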

Adding Data to Existing Datasets

As your business evolves, your model’s knowledge needs to evolve with it. Adding new data to existing datasets keeps your AI current and effective without starting from scratch.

Why Continuous Data Updates Matter

  • Maintain accuracy - Product features change, policies update, and new edge cases emerge. Without fresh data, your model provides outdated information that frustrates users and erodes trust.
  • Capture new patterns - Each customer interaction reveals new ways people describe problems, ask questions, or use your product. Adding these examples helps your model understand diverse communication styles.
  • Improve coverage - Initial training datasets rarely cover every scenario. As you discover gaps in your model’s knowledge, you can fill them by adding targeted data.
  • Adapt to business changes - New products, services, pricing models, or support processes require corresponding updates to your training data.

How to Add New Data

From the Datasets tab

  1. Navigate to the Datasets tab
  2. Locate the dataset you want to update
  3. Click on the dataset to open it
  4. Click Upload Additional Data
  5. Choose your data source: Upload Files, Integrations, or URL.
  6. Upload your new data based on the selected source.
The platform automatically processes and integrates the new data into your knowledge graph.

From the Model tab

  1. Navigate to the Models tab
  2. Locate the model you want to update with additional data
  3. Click the three-dot menu (⋮) in the Actions column for that model
  4. Click the Add Training Data button in the shortcuts section
  5. Click Upload Additional Files
  6. Choose your data source: Upload Files, Integrations, or URL.
  7. Upload your new data based on the selected source.
  8. The data is processed and added to your model’s knowledge graph automatically

From the Control Center

  1. Open your model in the Control Center
  2. Click the Add Training Data button in the shortcuts section
  3. Click Upload Additional Files
  4. Choose your data source: Upload Files, Integrations, or URL.
  5. Upload your new data based on the selected source.
  6. The data is processed and added to your model’s knowledge graph automatically

From the Playground tab

  1. Navigate to the Playground tab
  2. Select the model you want to interact with
  3. While querying or testing your model, click the + icon next to the message box
  4. Choose your data source: Attach File or Add URLs.
  5. Upload your new data based on the selected source.
  6. The data is automatically processed and integrated into your model’s knowledge graph
Adding data directly from the Playground is useful when you discover knowledge gaps during testing. You can immediately upload relevant information without leaving your testing workflow.
New data is automatically integrated into your existing knowledge graph. Nodes and edges update to reflect the new information without disrupting existing knowledge structures.

Best Practices for Data Updates

  • Add incrementally - Rather than waiting to upload large batches, add new data regularly as it becomes available. This keeps your model current and makes it easier to track what information was added when.
  • Document your updates - Keep notes on what data you added and why. This helps you understand model behavior changes and plan future updates.
  • Test after updates - Use the Inference Model feature to verify that new data is being used correctly and hasn’t introduced conflicts with existing knowledge (a scripted check follows this list).
  • Mix data types - When adding new information, include multiple formats when possible. For example, if you’re adding a new product feature, include documentation (PDF), example support tickets (CSV), and screenshots (images).
  • Retrain when needed - After significant data additions, retrain your model to fully integrate the new knowledge. Minor updates may not require retraining, but substantial changes benefit from it.
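
If you prefer to automate the post-update check, you can script a few spot-check queries against your model. A minimal sketch, assuming the /user/model/{model_id}/inference endpoint and response shape shown in the API tutorial later on this page; the sample queries are placeholders for questions your newly added data should answer.

import requests

API_KEY = "your_api_key_here"
BASE_URL = "https://api.4minds.ai/api/v1"
headers = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}
model_id = "your_model_id"

# Placeholder spot-check queries: questions your newly added data should answer
regression_queries = [
    "What is our current refund window?",
    "Which subscription tiers include priority support?",
]

for query in regression_queries:
    response = requests.post(
        f"{BASE_URL}/user/model/{model_id}/inference",
        headers=headers,
        json={"message": query},
    )
    print(f"Q: {query}")
    print(f"A: {response.json()['response']}\n")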

Building Comprehensive Datasets

Training an effective AI model requires more than uploading a single file type. Just as you wouldn’t hire a customer support agent and only give them a product manual, your model needs diverse perspectives and contexts to develop true understanding.

Example: Training on Customer Support Excellence

Let’s say you want your model to handle customer support inquiries effectively. Here’s how to structure a robust, multimodal dataset using 4Minds’ supported formats.

Visual understanding (images & screenshots)

Upload visual content showing real customer interactions:
  • Product interfaces - Screenshots of your software, dashboard views, error messages, feature locations
  • Troubleshooting visuals - Common configuration issues, installation steps, system architecture diagrams
  • Documentation - Annotated screenshots showing workflows, setup guides, integration diagrams
  • Error states - What customers see when things go wrong, loading states, failure modes
  • Customer-submitted images - Photos of hardware issues, setup problems, packaging damage
  • Competitor products - Interface comparisons, feature differences, migration guides

Conceptual knowledge (PDFs & documents)

Add comprehensive written content:
  • Product documentation - Technical specifications, API references, user guides, release notes
  • Internal knowledge bases - Troubleshooting playbooks, known issues, workaround procedures
  • Policy documents - SLA agreements, refund policies, terms of service, data privacy guidelines
  • Training materials - Onboarding docs for new support agents, escalation procedures, quality standards
  • Industry context - Regulatory compliance guides, security best practices, industry standards
  • Best practices - Customer service frameworks, communication guidelines, de-escalation techniques
  • Competitive intelligence - How competitors solve similar problems, market positioning, feature comparisons

Structured data (CSV & spreadsheet files)

Include quantitative patterns and history:
  • Support ticket history - Ticket IDs, timestamps, issue categories, resolution times, customer satisfaction scores
  • Customer data - Account types, subscription tiers, usage patterns, feature adoption rates
  • Product usage analytics - Most-used features, error rates, session durations, drop-off points
  • Response metrics - First response time, resolution time, reopened tickets, escalation rates
  • Customer sentiment - NPS scores, CSAT ratings, survey responses, sentiment analysis results
  • Seasonal patterns - Ticket volume by time/day/season, spike events, capacity planning data
  • Agent performance - Resolution rates, customer satisfaction per agent, specialization areas

Communication history (email & chat logs)

Provide real conversation examples:
  • Resolved tickets - Successful interactions showing problem identification and resolution
  • Escalated cases - Complex issues requiring multiple touchpoints or specialist involvement
  • Edge cases - Unusual requests, policy exceptions, creative problem-solving examples
  • Tone variations - Professional responses, empathetic communications, frustrated customer de-escalation
  • Multi-channel interactions - Email threads, chat transcripts, phone call summaries, social media responses
  • Follow-ups - Post-resolution check-ins, proactive outreach, account management communications

Audio & video (coming soon)

Add dynamic training materials:
  • Call recordings - Customer support calls showing tone, pacing, active listening, problem resolution
  • Product demos - Video walkthroughs of features, setup processes, advanced use cases
  • Training sessions - Internal workshops, role-playing scenarios, best practice reviews
  • Customer feedback sessions - User interviews, usability testing, feature request discussions

Contextual business data (mixed formats)

Round out understanding with operational context:
  • Product roadmap - Upcoming features, deprecation schedules, beta programs
  • Billing systems - Invoice examples, pricing tiers, renewal processes, refund workflows
  • Integration documentation - Third-party connections, API partnerships, data sync processes
  • Company information - Team structure, hours of operation, regional support coverage, contact escalation paths
  • Legal & compliance - GDPR requirements, data handling procedures, audit trails, security protocols

Why This Matters

When you combine these diverse data types, your model develops:
  • Contextual problem-solving that understands not just what the issue is, but why it matters and how it impacts the customer’s business
  • Tone awareness from seeing thousands of interactions, knowing when to be technical vs empathetic, formal vs conversational
  • Pattern recognition identifying common issues before customers fully describe them, predicting follow-up questions
  • Operational intelligence understanding SLAs, escalation paths, when to involve specialists, and business constraints
  • Proactive guidance suggesting solutions based on similar past cases, usage patterns, and product knowledge
A model trained only on product documentation would fail when a frustrated customer describes a problem in non-technical terms, or when an edge case requires policy interpretation. But a model trained with this comprehensive, multimodal approach develops the nuanced intelligence to handle real customer interactions effectively.

Tutorial: Fine-Tune a Model with Hugging Face Datasets

This tutorial walks you through importing datasets from Hugging Face to train a custom model in 4Minds.

What you’ll build

By the end of this tutorial, you’ll have a custom model trained on Hugging Face data that can:
  • Understand domain-specific terminology and concepts
  • Extract relevant information from your training data
  • Provide accurate, contextual responses to queries in your domain

Prerequisites

  • A 4Minds account
  • A Hugging Face account, if you plan to import private or personal datasets (integration credentials can be configured during Step 2)

Fine-tuning overview

Fine-tuning allows you to customize base models for your specific use case by training them on your own data. The fine-tuning feature enables you to:
  • Create custom models tailored to your domain (e.g., financial analysis, customer support)
  • Train on proprietary datasets to improve accuracy for specific tasks
  • Deploy models via API or test them in the interactive Playground
  • Monitor performance metrics including response time, token speed, and success rate

Model status types

  • Ready - Model is trained and available for use
  • Building Graph - Model is currently being compiled (shows percentage progress)
  • Training - Model is actively learning from training data
  • Archived - Model is stored but not actively deployed

Selecting a base model

  • Phi (14b) - Lightweight tasks, faster inference
  • Gemma (27b) - Balanced performance and capability
  • Nemotron (70b) - Complex reasoning, highest accuracy

Training data best practices

  1. Provide diverse examples – Include variations of similar questions to improve generalization
  2. Maintain consistency – Use a consistent format and tone across all training samples (see the JSONL sketch after this list)
  3. Include edge cases – Add examples of boundary conditions and unusual queries
  4. Quality over quantity – 500 high-quality examples often outperform 5,000 poor ones
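
As a concrete illustration of points 1 and 2, here is a sketch that writes Q&A pairs in a consistent JSONL layout. The question/answer field names are an example layout, not a schema the platform requires.

import json

# Illustrative Q&A pairs; the field names are an example, not a required schema
samples = [
    {"question": "How do I reset my password?",
     "answer": "Go to Settings > Security and click Reset Password."},
    # A variation of the same intent improves generalization (best practice 1)
    {"question": "I forgot my password, what should I do?",
     "answer": "Go to Settings > Security and click Reset Password."},
]

# JSONL: one JSON object per line, a format the platform accepts
with open("training_samples.jsonl", "w") as f:
    for sample in samples:
        f.write(json.dumps(sample) + "\n")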

Step 1: Access the data upload screen

During the model creation process (Step 3 of 4), you’ll reach the Data Upload screen. Here you can choose how to provide training data to customize your model.
  1. Under Choose Data Source, ensure the Upload New Data tab is selected
  2. You have three options under Add Files from Sources:
    • Upload Files - Local files from your computer
    • Integrations - External data sources
    • URL - Import from a web address
  3. To import from Hugging Face, click the Integrations button

Step 2: Select Hugging Face integration

On the Select Integration screen, you’ll see a list of available data source integrations including Amazon S3, Azure Blob Storage, Google Cloud Storage, and others.
  1. Scroll down and select Hugging Face from the list
If you see “Not configured” next to an integration, you may need to set up credentials first via Configure Integrations at the top of the list.

Step 3: Search for your dataset

The Import from HuggingFace screen allows you to search the Hugging Face Hub for datasets.
  1. Enter your search query in the search bar (e.g., “finQA”)
  2. Click Search
  3. Browse the results using the available tabs:
    • Popular Datasets – Trending datasets on Hugging Face
    • My Datasets – Your personal Hugging Face datasets
    • Search Results – Results matching your query
Each dataset card displays helpful information including:
  • Dataset name and author
  • Description
  • Download count
  • Size and format
  • Task type and modality
Click on the dataset you want to import.

Step 4: Configure dataset import settings

On the Dataset Details screen, you can configure import settings for your selected dataset. Review the dataset information:
  • Name and author
  • Description
  • Download statistics
  • Task IDs, size, and format
Configure the following options:
  • Configuration – Select the dataset configuration (e.g., “Default”)
  • Split – Choose which data split to import (e.g., “Test”, “Train”, “Validation”)
When ready, click + Add Dataset to import the files.

Step 5: Review attached files

After importing, you’ll return to the Data Upload screen. Your imported files now appear under Attached Files with details including:
  • File name
  • Source (Hugging Face icon)
  • File size
  • Row count
For example, importing a dataset might result in files like:
  • relevance.jsonl – 66.68 KB, 341 rows
  • queries.jsonl – 137.71 KB, 705 rows
  • corpus.jsonl – 1.44 MB, 7549 rows
Rsync settings (optional)

Enable Rsync Settings to automatically sync new files from your Hugging Face sources when you log in. This keeps your training data up to date.

Click Next to proceed.

Step 6: Review and launch training

On the Review & Launch screen (Step 4 of 4), verify your configuration summary:
  • Use Case - Your selected use case
  • Base Model - e.g., Phi-4-14B AWQ
  • Data Files - Imported from Hugging Face
  • Rsync Configuration - HuggingFace (all folders)
  • Persona - Your selection or default
  • Deployment - e.g., Cloud API
If everything looks correct, click Confirm & Train to start the training process.

Step 7: Monitor training progress

After launching, you’ll be taken to the Models dashboard in Control Center. Your new model will appear in the list with:
  • Status – “New” badge with “Building Graph” progress indicator
  • Parameters – Model size (e.g., 14b)
  • Base – Base model used (e.g., Phi)
  • Created – Timestamp
The status will update as training progresses through the pipeline. Once complete, the status will change to Ready.

Step 8: Test in the Playground

The Playground provides an interactive environment to evaluate your fine-tuned model before deployment. Accessing the Playground:
  1. From the model dashboard, click the three-dot menu (⋮) on any model
  2. Select Run Model
Alternatively, navigate to Control Center → Playground and select your model.
Playground features:
  • Real-time responses – See model outputs as they generate
  • Conversation history – Maintain context across multiple turns
  • View Graph – Visualize model reasoning and token flow
  • Clear All Chats – Reset the conversation history
  • Add Model – Compare multiple models side-by-side
Example test queries for a financial analysis model:
  • “What is the ratio of operating income to total revenue?”
  • “What is the total of all lease obligations?”
  • “What was the percentage change in revenue from 2018 to 2019?”

Model actions

Access these options via the menu on any model:
  • Run Model - Open the model in the Playground for testing
  • API Access - View API endpoints and authentication details
  • Edit Model - Modify model configuration and settings
  • Add Training Data - Upload additional training examples
  • Full Screen - Expand the model view
  • Duplicate - Create a copy of the model with its settings
  • Archive - Move to archived storage (can be restored)
  • Delete - Permanently remove the model

API integration

Deploy your fine-tuned model via API for production use. Getting API credentials:
  1. Click on your model
  2. Select API Access
  3. Copy your API endpoint and authentication token
Example request:
curl -X POST https://api.4minds.ai/v1/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "customer-faq-expert",
    "prompt": "How do I reset my password?",
    "max_tokens": 500
  }'
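
The same request in Python, for scripted use. The endpoint, model name, and payload mirror the curl example above; the API key is read from the environment, as suggested at the end of this page.

import os
import requests

# Mirrors the curl example; "customer-faq-expert" is the example model name
api_key = os.environ.get("FOURMINDS_API_KEY", "YOUR_API_KEY")
response = requests.post(
    "https://api.4minds.ai/v1/completions",
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    },
    json={
        "model": "customer-faq-expert",
        "prompt": "How do I reset my password?",
        "max_tokens": 500,
    },
)
print(response.json())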

Performance optimization

Improving response quality:
  1. Add more training data – Expand coverage of your use case
  2. Refine existing data – Remove low-quality or contradictory examples
  3. Adjust the persona – Use “Technical Expert” for specialized domains
  4. Choose appropriate model size – Larger models (70b) handle complex reasoning better
Improving speed:
  1. Use smaller base models – Phi (14b) offers faster inference
  2. Optimize prompt length – Shorter prompts reduce processing time
  3. Enable caching – Reuse responses for common queries (a client-side sketch follows this list)
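
If you implement caching on the client side, here is a minimal sketch that memoizes responses per unique prompt in an in-memory dict. It assumes the inference endpoint shown in the API tutorial below, and the cache resets when the process exits.

import requests

API_KEY = "your_api_key_here"
BASE_URL = "https://api.4minds.ai/api/v1"
headers = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}
model_id = "your_model_id"

_cache = {}  # in-memory cache; cleared when the process exits

def cached_inference(message):
    """Call the inference endpoint once per unique message, then serve from cache."""
    if message not in _cache:
        response = requests.post(
            f"{BASE_URL}/user/model/{model_id}/inference",
            headers=headers,
            json={"message": message},
        )
        _cache[message] = response.json()["response"]
    return _cache[message]

print(cached_inference("How do I reset my password?"))
print(cached_inference("How do I reset my password?"))  # second call hits the cache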

Troubleshooting

  • Model stuck on “Building Graph” - Large models may take longer; check the progress percentage
  • Low success rate - Review training data for errors or inconsistencies
  • Slow response time - Consider a smaller base model or optimize prompts
  • Inaccurate responses - Add more diverse training examples

FAQs

Q: How long does training take?
Training time depends on model size and dataset. Expect 30 minutes to several hours for large models.

Q: Can I update a model after deployment?
Yes, use “Add Training Data” to incrementally improve your model.

Q: What’s the difference between Archive and Delete?
Archived models can be restored; deleted models are permanently removed.

Q: How many models can I have active?
Check your plan limits in the account settings.

Supported file formats

4Minds supports the following file formats for training data from Hugging Face: BMP, CSV, DOCX, GIF, HTML, JPEG, JPG, JSON, JSONL, MD, ODT, PARQUET, PDF, PNG, TIFF, TSV, TXT, XLSX. Multiple files are supported per upload.

Tips

  • Choose appropriate splits – For fine-tuning, you typically want the “Train” split. Use “Test” or “Validation” for evaluation datasets.
  • Check dataset size – Larger datasets may take longer to import and process.
  • Enable Rsync – If you’re working with frequently updated datasets, enable Rsync to stay current automatically.
For best results, combine the Hugging Face dataset with your organization’s proprietary documents. This creates a model that understands both general concepts and your specific business context.

Tutorial: Fine-Tune a Model with Hugging Face Datasets via API

This tutorial shows how to fine-tune a 4Minds model using the FinQA dataset from Hugging Face through the API. Since the API requires manual dataset uploads, you’ll download the dataset from Hugging Face and upload it to 4Minds.

What you’ll build

A custom model trained on financial Q&A data, created entirely through API calls, ideal for automation and CI/CD pipelines.

Prerequisites

  • A 4Minds account with API access
  • Your API key (found in Account Settings)
  • A Hugging Face account with a generated access token
  • Python 3.7+ with the requests and datasets libraries installed

Step 1: Download the FinQA dataset from Hugging Face

First, download the FinQA dataset locally using the Hugging Face datasets library:
from datasets import load_dataset
import json

# Load the FinQA dataset
dataset = load_dataset("ibm/finqa", split="train")

# Convert to JSON format for upload
data = [{"question": item["question"], "answer": item["answer"]} for item in dataset]

# Save to a local file
with open("finqa_training_data.json", "w") as f:
    json.dump(data, f, indent=2)

print(f"Saved {len(data)} training examples to finqa_training_data.json")

Step 2: Upload the dataset to 4Minds

Use the 4Minds API to create a dataset and upload your file:
import requests

API_KEY = "your_api_key_here"
BASE_URL = "https://api.4minds.ai/api/v1"

headers = {
    "Authorization": f"Bearer {API_KEY}",
}

# Create a new dataset with the uploaded file
with open("finqa_training_data.json", "rb") as f:
    response = requests.post(
        f"{BASE_URL}/user/dataset",
        headers=headers,
        files={"file": ("finqa_training_data.json", f, "application/json")},
        data={"name": "FinQA Training Data"}
    )

dataset_response = response.json()
dataset_id = dataset_response["id"]
print(f"Created dataset with ID: {dataset_id}")

Step 3: Create a model with the dataset attached

Now create a new model and attach your dataset for training:
# Create a new model with the dataset
model_payload = {
    "name": "Financial QA Assistant",
    "description": "Fine-tuned on FinQA dataset for financial question answering",
    "dataset_id": dataset_id
}

response = requests.post(
    f"{BASE_URL}/user/model",
    headers={**headers, "Content-Type": "application/json"},
    json=model_payload
)

model_response = response.json()
model_id = model_response["id"]
print(f"Created model with ID: {model_id}")
print(f"Training status: {model_response['status']}")

Step 4: Monitor training progress

Poll the API to check when training completes:
import time

while True:
    response = requests.get(
        f"{BASE_URL}/user/model/{model_id}",
        headers=headers
    )
    status = response.json()["status"]
    print(f"Training status: {status}")
    
    if status == "ready":
        print("Training complete!")
        break
    elif status == "failed":
        print("Training failed. Check the dashboard for details.")
        break
    
    time.sleep(30)  # Check every 30 seconds
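
The loop above polls indefinitely; for unattended scripts, consider a timeout guard. A small variation reusing requests, time, BASE_URL, headers, and model_id from the previous steps (the two-hour ceiling is an arbitrary example):

MAX_WAIT_SECONDS = 2 * 60 * 60  # arbitrary two-hour ceiling
deadline = time.time() + MAX_WAIT_SECONDS

while time.time() < deadline:
    response = requests.get(f"{BASE_URL}/user/model/{model_id}", headers=headers)
    status = response.json()["status"]
    if status == "ready":
        print("Training complete!")
        break
    elif status == "failed":
        print("Training failed. Check the dashboard for details.")
        break
    time.sleep(30)
else:
    print("Timed out waiting for training to finish.")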

Step 5: Test your model via API

Once training completes, send inference requests to your fine-tuned model:
# Send a test query
inference_payload = {
    "model_id": model_id,
    "message": "What was the revenue growth percentage year-over-year?"
}

response = requests.post(
    f"{BASE_URL}/user/model/{model_id}/inference",
    headers={**headers, "Content-Type": "application/json"},
    json=inference_payload
)

print("Model response:")
print(response.json()["response"])

Complete script

Here’s the full workflow in a single script:
from datasets import load_dataset
import requests
import json
import time

# Configuration
API_KEY = "your_api_key_here"
BASE_URL = "https://api.4minds.ai/api/v1"
headers = {"Authorization": f"Bearer {API_KEY}"}

# Step 1: Download FinQA from Hugging Face
print("Downloading FinQA dataset...")
dataset = load_dataset("ibm/finqa", split="train")
data = [{"question": item["question"], "answer": item["answer"]} for item in dataset]
with open("finqa_training_data.json", "w") as f:
    json.dump(data, f, indent=2)

# Step 2: Upload to 4Minds
print("Uploading dataset to 4Minds...")
with open("finqa_training_data.json", "rb") as f:
    response = requests.post(
        f"{BASE_URL}/user/dataset",
        headers=headers,
        files={"file": ("finqa_training_data.json", f, "application/json")},
        data={"name": "FinQA Training Data"}
    )
dataset_id = response.json()["id"]

# Step 3: Create model
print("Creating model...")
response = requests.post(
    f"{BASE_URL}/user/model",
    headers={**headers, "Content-Type": "application/json"},
    json={
        "name": "Financial QA Assistant",
        "description": "Fine-tuned on FinQA for financial Q&A",
        "dataset_id": dataset_id
    }
)
model_id = response.json()["id"]

# Step 4: Wait for training
print("Waiting for training to complete...")
while True:
    response = requests.get(f"{BASE_URL}/user/model/{model_id}", headers=headers)
    status = response.json()["status"]
    if status == "ready":
        break
    time.sleep(30)

# Step 5: Test the model
print("Testing model...")
response = requests.post(
    f"{BASE_URL}/user/model/{model_id}/inference",
    headers={**headers, "Content-Type": "application/json"},
    json={"message": "What was the revenue growth percentage?"}
)
print(f"Response: {response.json()['response']}")
Store your API key in environment variables rather than hardcoding it. Use os.environ.get("FOURMINDS_API_KEY") for production scripts.
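
A minimal example of that pattern:

import os

# Read the key from the environment instead of hardcoding it
API_KEY = os.environ.get("FOURMINDS_API_KEY")
if not API_KEY:
    raise RuntimeError("Set the FOURMINDS_API_KEY environment variable first")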