Dataflow programming on the JVM

CloudKeeper is a domain-specific language and runtime system for implementing and running dataflows on the Java Virtual Machine. Designed to facilitate “programming in the large”, CloudKeeper is entirely general-purpose and abstracts away concerns such as data transfer, serialization, scheduling, checkpointing, and package/dependency management.


Programming in the Large. Declarative Style.

The functional units in CloudKeeper dataflows are called modules, and they have in- and out-ports. Orchestrate domain logic by simply instantiating modules and creating connections between ports.
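To make the module and port vocabulary concrete, here is a toy model in plain Java. It is not CloudKeeper's API: `ToyDataflow`, `Port`, and `Module` are hypothetical names invented for this illustration, and evaluation is a simple synchronous pull rather than anything a real runtime would do.

```java
import java.util.function.Function;
import java.util.function.Supplier;

// Toy illustration only; none of these types exist in CloudKeeper.
class ToyDataflow {
    /** Hypothetical in-port: remembers where its value comes from. */
    static final class Port<T> {
        private Supplier<T> source;

        /** Connects this port to an upstream source. */
        void from(Supplier<T> upstream) {
            this.source = upstream;
        }

        T get() {
            if (source == null) {
                throw new IllegalStateException("port is not connected");
            }
            return source.get();
        }
    }

    /** Hypothetical module with one in-port and one out-port. */
    static final class Module<I, O> {
        final Port<I> in = new Port<>();
        private final Function<I, O> logic;

        Module(Function<I, O> logic) {
            this.logic = logic;
        }

        /** Out-port: pulls the input and applies the module's logic. */
        O out() {
            return logic.apply(in.get());
        }
    }

    /** Instantiates three modules and connects their ports. */
    static String run(String input) {
        Module<String, String> trim = new Module<>(String::trim);
        Module<String, Integer> length = new Module<>(String::length);
        Module<Integer, String> report = new Module<>(n -> "length=" + n);

        trim.in.from(() -> input);
        length.in.from(trim::out);
        report.in.from(length::out);

        return report.out();
    }

    public static void main(String[] args) {
        System.out.println(run("  ACGT  "));
    }
}
```

The point of the sketch is the shape of `run`: the domain logic lives inside the modules, while the dataflow itself is just a list of instantiations and connections.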

Scalable. Write once, run everywhere.

Debug dataflows in a single JVM on a laptop, and deploy in the cloud – without changing a single line of code. CloudKeeper abstracts away low-level details such as serialization, data movement, checkpointing, scheduling, and dependency/package management.

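@CompositeModulePlugin("Analyzes DNA")
public abstract class GenomeAnalysisModule
    extends CompositeModule<GenomeAnalysisModule> {
  // Ports of the composite module: DNA fragments in, PDF report out.
  public abstract InPort<FASTQ> dnaFragments();
  public abstract OutPort<PDF> report();

  // Child modules that make up the dataflow.
  AlignModule alignModule = child(AlignModule.class);
  StatsModule statsModule = child(StatsModule.class);
  ReportModule reportModule = child(ReportModule.class);

  // Connect the composite's out-port to the report module's PDF out-port.
  // The connections that feed the child modules are not shown here.
  { report().from(reportModule.pdf()); }
}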

For Engineers. By Engineers.

Embed CloudKeeper dataflows into other software-engineering projects. Write dataflows textually in CloudKeeper's internal domain-specific language, which inherits Java’s type system as well as its excellent IDE support.

Easy refactoring in the IDE

Lightweight. Library, not a framework.

Use CloudKeeper as an alternative to lower-level concurrency concepts such as threads, Java executor services, or actor systems. CloudKeeper is modular and versatile: keep intermediate results as in-memory Java objects, in the file system, or in a cloud-storage service. Similarly, individual tasks can be processed by anything from an existing thread pool to a distributed resource manager such as Grid Engine.
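For contrast, the sketch below shows the kind of hand-written plumbing meant by "lower-level concurrency concepts": a three-stage pipeline wired with `ExecutorService` and `CompletableFuture` from the Java standard library. The class `ManualPipeline`, the stage names (`align`, `stats`, `report`), and their string payloads are placeholders loosely modeled on the genome example; none of this is CloudKeeper code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hand-wired pipeline using only the Java standard library.
class ManualPipeline {
    // Placeholder stages; real stages would do actual work.
    static String align(String fragments) {
        return "aligned(" + fragments + ")";
    }

    static String stats(String alignment) {
        return "stats(" + alignment + ")";
    }

    static String report(String statistics) {
        return "report(" + statistics + ")";
    }

    /** Runs the three stages in order on a small thread pool. */
    static String run(String fragments) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            return CompletableFuture
                .supplyAsync(() -> align(fragments), pool)
                .thenApplyAsync(ManualPipeline::stats, pool)
                .thenApplyAsync(ManualPipeline::report, pool)
                .join(); // block until the last stage has finished
        } finally {
            pool.shutdown(); // the caller owns the pool's lifecycle
        }
    }

    public static void main(String[] args) {
        System.out.println(run("reads.fastq"));
    }
}
```

Even in this small example, the thread pool, the stage ordering, and the shutdown logic are all the caller's responsibility, and the code would have to change to run the stages on a different execution backend.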