Dataset Attributes
| Component | Description | Required | Example |
|---|---|---|---|
| Name | Unique identifier for the dataset | Yes | "author/dataset_name" |
| Description | Purpose and content overview | Yes | "Optional description" |
| Source Type | Type of data being added | Yes | "url" or "file_path" |
| Source Path | Location of source data | Yes | "example.com" or "path/to/file" |
| Author | Creator’s username | Yes | "your-username" |
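The attribute table above can be read as a small schema. The sketch below is illustrative only: `DatasetSpec` and `validate` are hypothetical names chosen for this example, not part of the library's API. It assumes that every attribute marked "Yes" must be a non-empty string and that Source Type accepts only the two values shown in the table.

```python
from dataclasses import dataclass
from typing import List

# Assumption: the two values listed in the table are the only valid source types.
ALLOWED_SOURCE_TYPES = {"url", "file_path"}


@dataclass
class DatasetSpec:
    """Hypothetical container mirroring the attribute table (not the library's class)."""

    name: str
    description: str
    source_type: str
    source_path: str
    author: str

    def validate(self) -> List[str]:
        """Return a list of human-readable problems; an empty list means valid."""
        errors = []
        # Every attribute in the table is marked as required.
        for field_name in ("name", "description", "source_type", "source_path", "author"):
            if not getattr(self, field_name).strip():
                errors.append(f"{field_name} is required")
        # Only check the value when one was actually provided.
        if self.source_type.strip() and self.source_type not in ALLOWED_SOURCE_TYPES:
            errors.append(f"source_type must be 'url' or 'file_path', got {self.source_type!r}")
        return errors


spec = DatasetSpec(
    name="author/dataset_name",
    description="Optional description",
    source_type="url",
    source_path="example.com",
    author="your-username",
)
print(spec.validate())
```

Checking the attributes locally like this, before calling the real dataset API, can surface a missing field or a mistyped source type early.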
Supported File Formats
| Format | Description | Processing Method |
|---|---|---|
| PDF (.pdf) | Document files | Text extraction from document |
| Markdown (.md) | Formatted text | Text extraction from document |
| URL | Web content | Web content scraping |
| CSV (.csv) | URL listings | URL extraction and content scraping |
| Text (.txt) | Plain text | Direct text extraction |
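To make the format table concrete, here is a minimal sketch of how a source could be matched to its processing method. The function name `processing_method` and the lookup table are hypothetical, written only to restate the table; the library's actual dispatch logic is not shown in this document and may differ.

```python
from pathlib import PurePosixPath

# Extension-to-method mapping copied from the table above.
PROCESSING_BY_EXTENSION = {
    ".pdf": "Text extraction from document",
    ".md": "Text extraction from document",
    ".csv": "URL extraction and content scraping",
    ".txt": "Direct text extraction",
}


def processing_method(source_type: str, source_path: str) -> str:
    """Return the processing method the table lists for this source (illustrative)."""
    if source_type == "url":
        return "Web content scraping"
    # Assumption: file sources are identified by their extension, case-insensitively.
    suffix = PurePosixPath(source_path).suffix.lower()
    if suffix not in PROCESSING_BY_EXTENSION:
        raise ValueError(f"Unsupported file format: {source_path!r}")
    return PROCESSING_BY_EXTENSION[suffix]


print(processing_method("file_path", "path/to/notes.md"))
print(processing_method("url", "example.com"))
```

Note that a CSV file is treated as a list of URLs rather than as tabular data, so its rows lead to scraped web content.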
Creating and Configuring Datasets
Creating a Dataset
Python
Adding Data Sources
Python
Linking Dataset with Flow
Add the following configuration to your flow.yaml file:
.yaml