Datasets in Mira Flows support Retrieval-Augmented Generation (RAG), which lets a flow draw on a custom knowledge base. Grounding responses in that knowledge improves their accuracy and contextual awareness. 🤖📈


Dataset Attributes


| Component | Description | Required | Example |
| --- | --- | --- | --- |
| Name | Unique identifier for dataset | Yes | "author/dataset_name" |
| Description | Purpose and content overview | Yes | "Optional description" |
| Source Type | Type of data being added | Yes | "url" or "file_path" |
| Source Path | Location of source data | Yes | "example.com" or "path/to/file" |
| Author | Creator’s username | Yes | "your-username" |
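
The attributes above can be gathered into a small record and checked before any API call is made. The sketch below is only an illustration: `DatasetSpec` and its `validate` method are hypothetical names, not part of `mira_sdk`, and the rule that a name begins with the author's username is an assumption drawn from the "author/dataset_name" example.

```python
from dataclasses import dataclass
from typing import List

# Source types listed in the attribute table.
ALLOWED_SOURCE_TYPES = {"url", "file_path"}


@dataclass(frozen=True)
class DatasetSpec:
    """Hypothetical container for the dataset attributes (not an SDK class)."""

    name: str          # e.g. "author/dataset_name"
    description: str
    source_type: str   # "url" or "file_path"
    source_path: str   # e.g. "example.com" or "path/to/file"
    author: str        # e.g. "your-username"

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the spec looks usable."""
        problems = []
        # Assumption: names follow the "<author>/<dataset_name>" pattern.
        if not self.name.startswith(self.author + "/") or self.name.endswith("/"):
            problems.append("name should look like '<author>/<dataset_name>'")
        if self.source_type not in ALLOWED_SOURCE_TYPES:
            problems.append("source_type must be 'url' or 'file_path'")
        if not self.source_path:
            problems.append("source_path must not be empty")
        return problems
```

Collecting every problem in one list, rather than raising on the first, lets a caller report all issues with a spec at once.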

Supported File Formats


| Format | Description | Processing Method |
| --- | --- | --- |
| PDF (.pdf) | Document files | Text extraction from document |
| Markdown (.md) | Formatted text | Text extraction from document |
| URL | Web content | Web content scraping |
| CSV (.csv) | URL listings | URL extraction and content scraping |
| Text (.txt) | Plain text | Direct text extraction |
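
Because the processing method depends on the source format, it can help to check a source locally before adding it. This is a minimal sketch, assuming file formats are recognised by extension. `processing_method` is a hypothetical helper, not an SDK function, and how Mira itself detects formats is not stated here.

```python
from pathlib import PurePosixPath

# Extension-to-processing-method mapping, copied from the table above.
PROCESSING_BY_EXTENSION = {
    ".pdf": "Text extraction from document",
    ".md": "Text extraction from document",
    ".csv": "URL extraction and content scraping",
    ".txt": "Direct text extraction",
}


def processing_method(source_type: str, source_path: str) -> str:
    """Return the table's processing method for a source, or raise ValueError."""
    if source_type == "url":
        return "Web content scraping"
    if source_type != "file_path":
        raise ValueError(f"unknown source type: {source_type!r}")
    # Assumption: the format is identified by the case-insensitive extension.
    extension = PurePosixPath(source_path).suffix.lower()
    try:
        return PROCESSING_BY_EXTENSION[extension]
    except KeyError:
        raise ValueError(f"unsupported file format: {extension or '(none)'}") from None
```

Rejecting unknown extensions up front gives a clear local error instead of relying on whatever the service returns for an unsupported file.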

Creating and Configuring Datasets

Creating a Dataset

```python
from mira_sdk import MiraClient

client = MiraClient(config={"API_KEY": "YOUR_API_KEY"})

# Create dataset
client.dataset.create("author/dataset_name", "Optional description")
```
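
Hard-coding the key works for a quick test, but reading it from the environment keeps it out of source control. This is a minimal sketch under stated assumptions: the variable name `MIRA_API_KEY` and the helpers `load_config` and `make_client` are hypothetical. Only the `MiraClient(config={"API_KEY": ...})` call comes from the example above.

```python
import os


def load_config(env=None) -> dict:
    """Build the client config from an environment mapping.

    MIRA_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    env = os.environ if env is None else env
    api_key = env.get("MIRA_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("MIRA_API_KEY is not set")
    return {"API_KEY": api_key}


def make_client(env=None):
    """Create a MiraClient with the same constructor call as the example above."""
    # Imported here so load_config can be used without the SDK installed.
    from mira_sdk import MiraClient

    return MiraClient(config=load_config(env))
```

Failing early when the key is missing avoids a less obvious authentication error on the first dataset call.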

Adding Data Sources

```python
# Add URL to your dataset
client.dataset.add_source("author/dataset_name", url="example.com")

# Add file to your dataset
client.dataset.add_source("author/dataset_name", file_path="path/to/my/file.csv")
```
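
When a dataset has several sources, the two calls above can be driven from a list. The sketch below assumes only the `add_source` keyword arguments already shown (`url=` and `file_path=`). `add_sources` is a hypothetical helper, and the return value of `add_source` is not relied on.

```python
def add_sources(client, dataset_name, sources):
    """Add each (source_type, source_path) pair in the list to the dataset.

    source_type must be "url" or "file_path", matching the keyword
    arguments of client.dataset.add_source used above.
    """
    # Validate every entry first so a bad one does not leave a partial upload.
    for source_type, _ in sources:
        if source_type not in ("url", "file_path"):
            raise ValueError(f"unsupported source type: {source_type!r}")
    for source_type, source_path in sources:
        # Pass the path under the keyword named by source_type.
        client.dataset.add_source(dataset_name, **{source_type: source_path})
    return len(sources)
```

A call such as `add_sources(client, "author/dataset_name", [("url", "example.com"), ("file_path", "path/to/my/file.csv")])` would issue the same two requests as the example above and return the number of sources added.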

Linking Dataset with Flow

Add the following configuration to your flow.yaml file:

```yaml
dataset:
  source: "author/dataset_name"
```