dayli.ly

Simple parallel dataflow programming library in Rust

  • [redacted] <[redacted]@cmu.edu>
  • [redacted] <[redacted]@andrew.cmu.edu>

View this document on the web at https://dayli.ly/15418-project.

This is our 15-418 (Fall 2025) Project Proposal. We plan to implement a single-node, in-memory, parallel dataflow programming library in Rust. Moreover, we hope to incorporate optimizations like work-stealing, operator pipelining, and task parallelism, in order to bring our implementation's speedups closer to those of hand-written parallel code.

Background

Dataflow programming is a programming paradigm where programs are constructed by specifying a directed acyclic graph (DAG) of high-level operations, such as map, filter, and reduce, that data flows through.

The declarative nature of dataflow programming, as well as the inherent parallelizability of most of the primitive operations involved, makes it easy to implement massively parallel data transformations on big datasets without requiring the programmer to orchestrate the parallelism themselves.

Our goal is to develop a single-node, in-memory, parallel dataflow programming library in Rust. This involves designing the API and writing the execution engine of the dataflow graph.
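As a sketch of what such an API might look like, here is an eager, sequential version of the three core operations. All names here (Dataflow, from_vec, and so on) are illustrative, not a committed design; the real library would execute the graph in parallel rather than eagerly.

```rust
// Hypothetical sketch: an eager, sequential version of the core API.
// "Dataflow" and its methods are illustrative names, not a committed design.
struct Dataflow<T> {
    items: Vec<T>,
}

impl<T> Dataflow<T> {
    fn from_vec(items: Vec<T>) -> Self {
        Dataflow { items }
    }

    // Apply a user-defined function to every item.
    fn map<U, F: Fn(T) -> U>(self, f: F) -> Dataflow<U> {
        Dataflow { items: self.items.into_iter().map(f).collect() }
    }

    // Keep only the items matching a user-defined predicate.
    fn filter<F: Fn(&T) -> bool>(self, pred: F) -> Dataflow<T> {
        Dataflow { items: self.items.into_iter().filter(|x| pred(x)).collect() }
    }

    // Combine all items with a user-defined function.
    fn reduce<F: Fn(T, T) -> T>(self, f: F) -> Option<T> {
        self.items.into_iter().reduce(f)
    }
}

fn main() {
    // Sum of the even squares of 1..=10.
    let sum = Dataflow::from_vec((1..=10).collect())
        .map(|x| x * x)
        .filter(|x| x % 2 == 0)
        .reduce(|a, b| a + b);
    println!("{:?}", sum); // Some(220)
}
```

A chain like this builds up what is conceptually a linear DAG; the engine's job is to decide how each node's work is split across cores.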

Challenges

A naive parallel execution engine of a dataflow graph could work by traversing the DAG in topological order, statically splitting each node into parallel workloads before moving to the next. In practice, this has several problems:

  • Workload imbalance: since the functions supplied to map, etc. are user-defined, a static scheme cannot always divide the work evenly.
  • Lack of pipelining: each node must wait until the previous node has produced all of its output, potentially leaving compute idle.
  • Lack of task parallelism: independent nodes are not executed in parallel, also potentially leaving compute idle.
  • High overhead: compared to procedural programs, the graph traversal itself involves many indirections.
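For concreteness, the naive scheme for a single map node might look like the following sketch, which statically splits the input into one chunk per worker and joins all of them before the next node can run. The function and parameter names are ours, for illustration only.

```rust
use std::thread;

// Hypothetical sketch of the naive engine executing one map node: the input
// is statically split into one chunk per worker, and every chunk must finish
// before the next node in the DAG may start.
fn static_parallel_map<T, U, F>(input: Vec<T>, f: F, workers: usize) -> Vec<U>
where
    T: Sync,
    U: Send,
    F: Fn(&T) -> U + Sync,
{
    let chunk = input.len().div_ceil(workers).max(1);
    let f = &f;
    thread::scope(|s| {
        let handles: Vec<_> = input
            .chunks(chunk)
            .map(|part| s.spawn(move || part.iter().map(f).collect::<Vec<U>>()))
            .collect();
        // Barrier: join every worker before the "next node" could run.
        handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    // If one chunk happens to be much more expensive than the others, the
    // remaining workers idle: the workload-imbalance problem described above.
    let doubled = static_parallel_map(vec![1, 2, 3, 4], |x| x * 2, 2);
    println!("{:?}", doubled); // [2, 4, 6, 8]
}
```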

In our project, we will explore methods to address these problems and see how close we can bring parallel dataflow execution to the performance of procedural parallel programs.

Resources

We will be coding the project from scratch. In terms of libraries, however, rayon is the prevalent work-stealing parallel scheduler in Rust, and we will likely use it as our scheduler. (rayon also includes a dataflow-like interface, which we will not use.) Alternatively, we may write a centralized work queue ourselves with tunable task granularity.
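To illustrate the centralized work queue alternative, here is a hedged sketch (all names are hypothetical) in which the input is cut into grain-sized tasks up front and idle workers repeatedly pop tasks from a shared queue, absorbing imbalance dynamically:

```rust
use std::collections::VecDeque;
use std::sync::Mutex;
use std::thread;

// Hypothetical sketch of a centralized work queue with tunable granularity:
// each worker keeps popping grain-sized tasks until the queue is empty, so
// slow tasks no longer force the other workers to sit idle.
fn queue_parallel_map<T, U, F>(input: &[T], f: F, workers: usize, grain: usize) -> Vec<U>
where
    T: Sync,
    U: Send,
    F: Fn(&T) -> U + Sync,
{
    // Each task is the starting index of one grain-sized chunk.
    let queue: Mutex<VecDeque<usize>> =
        Mutex::new((0..input.len()).step_by(grain).collect());
    let results: Mutex<Vec<(usize, Vec<U>)>> = Mutex::new(Vec::new());
    thread::scope(|s| {
        for _ in 0..workers {
            s.spawn(|| loop {
                // The lock guard is dropped before the chunk is processed.
                let Some(start) = queue.lock().unwrap().pop_front() else { break };
                let end = (start + grain).min(input.len());
                let part: Vec<U> = input[start..end].iter().map(&f).collect();
                results.lock().unwrap().push((start, part));
            });
        }
    });
    // Reassemble the chunks in input order.
    let mut chunks = results.into_inner().unwrap();
    chunks.sort_by_key(|&(start, _)| start);
    chunks.into_iter().flat_map(|(_, part)| part).collect()
}

fn main() {
    // A smaller grain improves load balance but increases queue contention.
    let out = queue_parallel_map(&[1, 2, 3, 4], |x| x * 10, 2, 1);
    println!("{:?}", out); // [10, 20, 30, 40]
}
```

The grain parameter is exactly the "tunable task granularity" mentioned above: contention on the single queue is the cost that work-stealing schedulers like rayon avoid.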

There are plenty of dataflow programming systems in use in the industry, such as Apache Spark, Flink, and Beam, which we can use as references for designing both the API and the execution engine. However, it’s important to note that these are very complex systems that work from non-volatile storage and across many nodes, so they potentially have different design concerns from our project.

We will be working on conventional multicore CPUs. We anticipate that our own computers will suffice for speedup evaluation, but access to machines with higher core counts or more memory might provide better data.

Goals and deliverables

We consider the bare minimum of our project (75% goal) to include:

  • An API for dataflow programming consisting of at least map, filter, and reduce operations;
  • The implementation of a naive parallel execution engine as described above;
  • Speedup evaluation across workloads varying in their degrees of parallelism. This includes:
    • Comparison with sequential programs;
    • Comparison with hand-written parallel programs;
    • Showing at least 2x speedup on 4 CPUs on compute-bound, massively parallel, and balanced workloads.
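The comparisons above would be driven by a simple timing harness; a minimal sketch of its shape (with a placeholder workload, not our real benchmarks) could be:

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

// Hypothetical shape of the speedup evaluation: time a closure, then report
// speedup as baseline time divided by candidate time.
fn time_it<F: FnOnce()>(f: F) -> Duration {
    let start = Instant::now();
    f();
    start.elapsed()
}

fn main() {
    // Placeholder workload; the real evaluation would run the dataflow engine
    // against a sequential baseline and a hand-written parallel baseline.
    let work = || {
        black_box((0..1_000_000u64).map(|x| x * x).sum::<u64>());
    };
    let baseline = time_it(work);
    let candidate = time_it(work);
    println!("speedup: {:.2}x", baseline.as_secs_f64() / candidate.as_secs_f64());
}
```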

We consider a full implementation of our project (100% goal) to implement some or all of the following optimizations, along with measurements showing improved speedup over naive execution:

  • Using dynamic work distribution, such as work queues or work-stealing, to address workload imbalance;
  • Implementing operator pipelining or fusion (where possible) for simple cases, such as map/map and map/filter;
  • Adding task parallelism to the executor;
  • Exploring implementation techniques that allow the Rust compiler to optimize away more indirections during graph execution.
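As a sketch of the fusion idea for the map/map case (names here are illustrative): composing the two user closures lets the engine execute a single fused node, avoiding the intermediate buffer entirely.

```rust
// Illustrative sketch of map/map fusion: instead of materializing the
// intermediate buffer between two map nodes, compose the user closures so
// the engine runs one fused node. Names are hypothetical.
fn fuse_maps<A, B, C>(
    f: impl Fn(A) -> B,
    g: impl Fn(B) -> C,
) -> impl Fn(A) -> C {
    move |x| g(f(x))
}

fn main() {
    let fused = fuse_maps(|x: i32| x + 1, |x| x * 2);
    // One pass over the data, with no intermediate Vec between the two maps.
    let out: Vec<i32> = (1..=3).map(fused).collect();
    println!("{:?}", out); // [4, 6, 8]
}
```

map/filter fusion would follow the same pattern but produce an Option-like result per item; the interesting engineering question is doing this without boxing every composed closure.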

Stretch goals (125% goal) include:

  • A variety of other operations in the API, including sorting, joining, and sampling;
  • Speedup on par with manually tuned procedural programs.

Platform choice

We have chosen Rust as our implementation language as it is:

  1. Modern: Rust has language features that speed up development, like traits, algebraic datatypes, and closures.
  2. Safe: Rust has mechanisms built into the type system to facilitate memory and concurrency safety.
  3. Fast: Rust compiles to native code with no garbage collector and aggressive optimization, which allows us to better measure the speedup of our implementation.

Schedule

Below is a tentative schedule for our project.

  • Week 1: plan the API;
  • Week 1-2: implement a naive parallel execution engine;
  • Week 2-3: perform initial speedup measurements;
  • Week 3-4: implement additional optimizations and monitor the speedup improvement;
  • Week 4: work on any of the stretch goals, should there be time.