
Data sources

Unlocking open, verifiable data for AI.


At its core, Recall is a decentralized storage network designed for AI and high-value data workflows. It provides fast block times, high throughput, large object sizes, and low latency, making it ideal for storing, verifying, and distributing critical AI-related data.

AI models, agents, and enterprises rely on high-quality data—for training, inference, and decision-making. Recall is built to serve not only agent-based systems but any AI pipeline that demands verifiable, accessible, and scalable storage.

You MUST own tokens and credits to create a bucket and store data, respectively. The Recall Faucet will send you testnet tokens, which you can use to purchase credits with any of the Recall tools (SDK, CLI, etc.). The Recall Portal, for example, makes it easy to manage credits and tokens without handling them programmatically.

What are data sources?

Everything on Recall is stored as a blob, identified by its blake3 hash. Today, a single interface wraps blobs: the bucket, which exposes them as objects. Anyone can create a bucket and retrieve the underlying objects from it.
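To make the blob/bucket relationship concrete, here is a toy content-addressed store: blobs keyed by their hash, with a bucket mapping human-readable keys onto those hashes. This is a conceptual sketch, not Recall's actual API, and it uses Python's standard-library `blake2b` as a stand-in because BLAKE3 is not in the standard library.

```python
import hashlib


class Bucket:
    """Toy content-addressed bucket (illustrative only, not Recall's API).

    Recall addresses blobs by their blake3 hash; hashlib.blake2b stands
    in here because Python's standard library only ships BLAKE2.
    """

    def __init__(self):
        self._blobs = {}    # hash -> raw bytes (the blob store)
        self._objects = {}  # key  -> hash     (the bucket index)

    def put(self, key: str, data: bytes) -> str:
        digest = hashlib.blake2b(data).hexdigest()
        self._blobs[digest] = data    # identical content is stored once
        self._objects[key] = digest   # the bucket maps a key to the blob
        return digest

    def get(self, key: str) -> bytes:
        digest = self._objects[key]
        data = self._blobs[digest]
        # Re-hashing on read verifies the blob was not tampered with.
        assert hashlib.blake2b(data).hexdigest() == digest
        return data


bucket = Bucket()
digest = bucket.put("training/batch-001.json", b'{"examples": []}')
assert bucket.get("training/batch-001.json") == b'{"examples": []}'
```

Because the blob's identity *is* its hash, two objects with identical content automatically share one blob, and any reader can independently re-derive the hash to verify what they received.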

A data source can be almost anything:

  • Training data pipelines: Large datasets used to train AI models or fine-tune agent capabilities.
  • Model output data: Results generated from models or agents that can be stored, shared, or monetized.
  • Synthetic data: AI-generated datasets for model training and evaluation.
  • Agent memory & RAG knowledge: Continuously evolving stores of AI interactions and learned insights.
  • Inference results & logs: Critical outputs from AI models used for performance monitoring and debugging.
  • Scientific & research data: Open datasets that require transparency, provenance tracking, and accessibility.
  • Enterprise data pipelines: Organizations that need verifiable storage for AI-driven decision-making.

As the volume of generated data grows, storing and retrieving it reliably becomes essential. Recall provides a flexible, scalable way to do both, and is designed to work with agents and other AI workloads.

Data types

Buckets are purpose-built for arbitrary data—which is why objects are quite literally stored as blobs under the hood. You can upload any file type (the current limit is 5 GB per object), including:

  • Text
  • Images
  • Audio
  • Video
  • PDFs
  • 3D assets
  • Model weights
  • Model checkpoints
  • etc.

Data verifiability & resiliency

Recall uses blake3 hashing for all blobs and objects. Because the hash is derived from the data's underlying content, the data is verifiable: if someone tampers with it, the hash changes, so you know the data has been modified. blake3 is also deterministic—the same input always produces the same output—and is supported by many independent libraries.
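The two properties above—determinism and tamper evidence—are easy to demonstrate with any cryptographic hash. This sketch uses the standard library's `blake2b` as a stand-in, since BLAKE3 itself requires a third-party package:

```python
import hashlib

# blake2b stands in for blake3 here; both are deterministic,
# content-derived cryptographic hashes.
data = b"model-weights-v1"

# Deterministic: the same input always yields the same hash.
assert hashlib.blake2b(data).hexdigest() == hashlib.blake2b(data).hexdigest()

# Tamper-evident: any change to the content changes the hash.
tampered = b"model-weights-v2"
assert hashlib.blake2b(data).hexdigest() != hashlib.blake2b(tampered).hexdigest()
```

A consumer who knows a blob's expected hash can therefore verify it locally, without trusting whoever served the bytes.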

To keep data resilient, Recall has built-in redundancy through its data availability erasure coding mechanism, which distributes entangled pieces of the same data across different validators so it can be recovered even if some pieces are lost.
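Recall's actual entanglement scheme is more sophisticated, but the core idea of erasure coding—recovering lost pieces from the survivors—can be shown with the simplest possible code, a single XOR parity shard. This is purely illustrative and not Recall's mechanism:

```python
# Simplified single-parity erasure code (NOT Recall's actual scheme):
# split a blob into k shards, add one XOR parity shard, and rebuild
# any single lost shard from the remaining ones.

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode(blob: bytes, k: int) -> list[bytes]:
    size = -(-len(blob) // k)  # shard size, rounded up
    shards = [blob[i * size:(i + 1) * size].ljust(size, b"\0")
              for i in range(k)]
    parity = shards[0]
    for s in shards[1:]:
        parity = xor_bytes(parity, s)
    return shards + [parity]   # k data shards + 1 parity shard

def recover(shards: list[bytes], lost: int) -> bytes:
    # XOR of all surviving shards reproduces the missing one.
    survivors = [s for i, s in enumerate(shards) if i != lost]
    piece = survivors[0]
    for s in survivors[1:]:
        piece = xor_bytes(piece, s)
    return piece

shards = encode(b"verifiable AI data!!", k=4)
assert recover(shards, lost=2) == shards[2]  # lost shard rebuilt
```

Production systems use stronger codes (e.g. Reed–Solomon) that tolerate multiple simultaneous losses, but the principle is the same: redundancy is spread across shards rather than stored as whole copies, so losing one holder does not lose the data.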
