Sam Ritchie

Notable Open Source Projects

The following is a list of open source projects I've authored or co-authored.

Om-Bootstrap

Om-Bootstrap is a ClojureScript library of Bootstrap 3 components built on top of Om. This is my first big client-side library! Definitely a change of scenery.

Summingbird

Summingbird is a library that lets you write MapReduce programs that look like native Scala or Java collection transformations. So, while a word-counting aggregation in pure Scala might look like this:

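import scala.collection.mutable.{Map => MutableMap}
// toWords (String => Seq[String]) is assumed to be defined elsewhere.
def wordCount(source: Iterable[String], store: MutableMap[String, Long]) =
  source.flatMap { sentence =>
    toWords(sentence).map(_ -> 1L)
  }.foreach { case (k, v) => store.update(k, store.getOrElse(k, 0L) + v) }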

Counting words in Summingbird looks like this:

import com.twitter.summingbird._

def wordCount[P <: Platform[P]]
  (source: Producer[P, String], store: P#Store[String, Long]) =
    source.flatMap { sentence =>
      toWords(sentence).map(_ -> 1L)
    }.sumByKey(store)

The logic is exactly the same, and the code is almost the same. The main difference is that you can execute the Summingbird program in "batch mode" (using Scalding), in "realtime mode" (using Storm), or on both Scalding and Storm in a hybrid batch/realtime mode that offers your application very attractive fault-tolerance properties.
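The hybrid mode relies on partial results being addable. Roughly: a batch job produces totals for everything up to the last completed batch, a realtime job counts events that arrived after that boundary, and a read adds the two together. Here's a plain-Scala sketch of just that merge step. It is not Summingbird's API; `mergeHybrid` and the `Map[String, Long]` stand-ins for the two stores are hypothetical.

```scala
// Hypothetical stand-ins for the batch (Scalding) and realtime (Storm) stores.
// Batch totals cover events up to the last completed batch; realtime totals
// cover only the events that arrived after that boundary.
def mergeHybrid(batch: Map[String, Long], realtime: Map[String, Long]): Map[String, Long] =
  (batch.keySet ++ realtime.keySet).iterator.map { k =>
    // A key missing from one side contributes 0, the identity for addition.
    k -> (batch.getOrElse(k, 0L) + realtime.getOrElse(k, 0L))
  }.toMap

val batchCounts    = Map("storm" -> 10L, "hadoop" -> 4L)
val realtimeCounts = Map("storm" -> 2L, "scala" -> 1L)
val hybridTotals   = mergeHybrid(batchCounts, realtimeCounts)
```

Because addition is associative, it doesn't matter how the events were split between the two sides; the sum comes out the same.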

Summingbird provides you with the primitives you need to build rock-solid production systems.

Storehaus

Storehaus is a Scala library that makes it easy to work with asynchronous key-value stores.
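To make "asynchronous key-value store" concrete, here's a minimal sketch of what such an interface can look like: every operation returns a future, and a `multiGet` can be derived from `get`. The `AsyncStore` trait, the `InMemoryStore` class, and the convention that writing `None` deletes a key are all hypothetical choices for this sketch; Storehaus's real traits differ (they use Twitter's `Future`, for one).

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical async key-value interface, for illustration only.
trait AsyncStore[K, V] {
  def get(k: K): Future[Option[V]]
  // Writing None deletes the key (a convention assumed by this sketch).
  def put(kv: (K, Option[V])): Future[Unit]
  // Default multiGet fans out one get per key; a real backend might batch them.
  def multiGet(ks: Set[K])(implicit ec: ExecutionContext): Future[Map[K, Option[V]]] =
    Future.traverse(ks.toSeq)(k => get(k).map(k -> _)).map(_.toMap)
}

// In-memory implementation backed by a concurrent map.
class InMemoryStore[K, V] extends AsyncStore[K, V] {
  private val m = new java.util.concurrent.ConcurrentHashMap[K, V]()
  def get(k: K): Future[Option[V]] = Future.successful(Option(m.get(k)))
  def put(kv: (K, Option[V])): Future[Unit] = {
    kv match {
      case (k, Some(v)) => m.put(k, v)
      case (k, None)    => m.remove(k)
    }
    Future.unit
  }
}

import scala.concurrent.ExecutionContext.Implicits.global
val store = new InMemoryStore[String, Long]
Await.result(store.put("a" -> Some(1L)), 1.second)
val got = Await.result(store.multiGet(Set("a", "b")), 1.second)
```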

Bijection

A Bijection is a function that can be inverted. Practically, in Scala, Bijections are used to tell the type system about equivalent concepts that may have been defined in different libraries (scala.Int vs. java.lang.Integer, for example). The ability to declare these equivalences is hugely valuable.
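Here's a minimal sketch of the idea using the `scala.Int` / `java.lang.Integer` pair mentioned above. The `Bijection` trait below is a hypothetical stand-in written for illustration, not the library's actual definition.

```scala
// Hypothetical minimal Bijection: a function paired with its exact inverse.
trait Bijection[A, B] { self =>
  def apply(a: A): B
  def invert(b: B): A
  // A Bijection[A, B] is also a Bijection[B, A]; just swap the directions.
  def inverse: Bijection[B, A] = new Bijection[B, A] {
    def apply(b: B): A = self.invert(b)
    def invert(a: A): B = self(a)
  }
}

// scala.Int <-> java.lang.Integer: every value round-trips in both directions.
val intToJava: Bijection[Int, java.lang.Integer] = new Bijection[Int, java.lang.Integer] {
  def apply(a: Int): java.lang.Integer = java.lang.Integer.valueOf(a)
  def invert(b: java.lang.Integer): Int = b.intValue
}
```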

Injection, a related trait included in the library, is a function that can sometimes be inverted. (Your item might be able to convert to a byte array, but not all byte arrays can come back, for example.) Injection and Bijection turn out to be wonderful at describing serializations. We use the concept heavily in Summingbird and other distributed systems at Twitter.
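The byte-array case can be sketched directly: a `Long` always encodes to 8 bytes, but an arbitrary byte array only decodes if it's exactly 8 bytes long. The `Injection` trait here is a hypothetical simplification for illustration, and having `invert` return a `Try` is this sketch's choice.

```scala
import java.nio.ByteBuffer
import scala.util.Try

// Hypothetical minimal Injection: apply always succeeds, invert may fail.
trait Injection[A, B] {
  def apply(a: A): B
  def invert(b: B): Try[A]
}

// Long -> Array[Byte] always works; only 8-byte arrays can come back.
val longToBytes: Injection[Long, Array[Byte]] = new Injection[Long, Array[Byte]] {
  def apply(a: Long): Array[Byte] = ByteBuffer.allocate(8).putLong(a).array()
  def invert(b: Array[Byte]): Try[Long] = Try {
    require(b.length == 8, s"expected 8 bytes, got ${b.length}")
    ByteBuffer.wrap(b).getLong
  }
}
```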

Algebird

Algebird is an abstract algebra library for Scala. Algebird is designed with streaming aggregations in mind, and implements a number of types and combinators that are useful in a streaming MapReduce environment. The Monoid, for example, is a core concept of Summingbird, Twitter's streaming MapReduce library.
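A monoid is an associative `plus` with an identity `zero`. Associativity is what lets partial aggregates computed on different machines be combined in any grouping. Here's a minimal sketch in plain Scala; the trait and instances below are hypothetical and much smaller than Algebird's real `Monoid`.

```scala
// Hypothetical minimal Monoid: plus must be associative, zero its identity.
trait Monoid[T] {
  def zero: T
  def plus(l: T, r: T): T
}

object Monoid {
  implicit val longMonoid: Monoid[Long] = new Monoid[Long] {
    def zero: Long = 0L
    def plus(l: Long, r: Long): Long = l + r
  }

  // Maps merge key by key, combining colliding values with the value monoid.
  implicit def mapMonoid[K, V](implicit vm: Monoid[V]): Monoid[Map[K, V]] =
    new Monoid[Map[K, V]] {
      def zero: Map[K, V] = Map.empty[K, V]
      def plus(l: Map[K, V], r: Map[K, V]): Map[K, V] =
        r.foldLeft(l) { case (acc, (k, rv)) =>
          acc.updated(k, acc.get(k).map(vm.plus(_, rv)).getOrElse(rv))
        }
    }

  def sum[T](xs: Iterable[T])(implicit m: Monoid[T]): T = xs.foldLeft(m.zero)(m.plus)
}

// Partial word counts, e.g. from different machines or time windows.
val partials = Seq(Map("a" -> 1L, "b" -> 2L), Map("a" -> 3L), Map("c" -> 5L))
val total = Monoid.sum(partials)
```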

Here are some of the more exotic data structures and algorithms in Algebird:

  • CountMinSketch
  • SketchMap
  • HyperLogLog
  • Stochastic Gradient Descent
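To give a flavor of the first item, here's a minimal Count-Min Sketch for `String` keys: `depth` rows of `width` counters, one hash function per row. The estimate for a key is the minimum of its counters across rows. Collisions can only inflate a counter, so the estimate never under-counts. Two sketches of the same shape merge by adding counters cell by cell, which makes the structure a monoid. This is a hypothetical sketch; seeding `MurmurHash3` with the row index is a shortcut chosen here, and Algebird's own CMS is more general and configured differently.

```scala
import scala.util.hashing.MurmurHash3

// Hypothetical minimal Count-Min Sketch: `depth` rows of `width` counters.
final case class CMS(depth: Int, width: Int, table: Vector[Vector[Long]]) {
  // Row i uses MurmurHash3 seeded with i as its hash function.
  private def bucket(row: Int, key: String): Int =
    java.lang.Math.floorMod(MurmurHash3.stringHash(key, row), width)

  def add(key: String, count: Long = 1L): CMS =
    copy(table = table.zipWithIndex.map { case (r, i) =>
      val b = bucket(i, key)
      r.updated(b, r(b) + count)
    })

  // Minimum over rows: the tightest upper bound on the true count.
  def estimate(key: String): Long =
    (0 until depth).map(i => table(i)(bucket(i, key))).min

  // Cell-by-cell addition; assumes both sketches have the same depth and width.
  def merge(other: CMS): CMS =
    copy(table = table.zip(other.table).map { case (a, b) =>
      a.zip(b).map { case (x, y) => x + y }
    })
}

object CMS {
  def empty(depth: Int, width: Int): CMS =
    CMS(depth, width, Vector.fill(depth)(Vector.fill(width)(0L)))
}

val sketchA  = CMS.empty(depth = 4, width = 64).add("storm").add("storm").add("hadoop")
val sketchB  = CMS.empty(depth = 4, width = 64).add("storm")
val combined = sketchA merge sketchB
```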

Chill

Chill provides a number of enhancements to the Kryo JVM serialization library; notably, serializers for all Scala primitives and collection types, and plugins that make it easy to use Kryo in Hadoop and Storm jobs. Scalding, Cascalog, Spark and many other projects use Chill to manage serialization across their various distributed system implementations.

Tormenta

Tormenta provides a type-safe Scala DSL over Storm, along with Scala-friendly implementations of Kafka, Kestrel and Twitter Streaming API spouts for Storm.

FORMA

The Forest Monitoring for Action (FORMA) project provides free and open forest-clearing alert data, derived from MODIS satellite imagery and updated every 16 days since December 2005. I was the lead developer of FORMA's Clojure codebase from January 2011 to mid-2012.

Large Contributions

Here are other people's projects I've contributed to substantially.

Cascalog

Cascalog is a Datalog implementation in Clojure that compiles queries down to Hadoop jobs. I've maintained Cascalog since late 2011 and authored many core features and modules, including midje-cascalog and cascalog-contrib. I'm currently working on Cascalog 2, which will allow Cascalog's query language to compile down to targets other than Hadoop (like Spark or Storm).

Scalding

Scalding is a Hadoop DSL written in Scala. I've contributed a number of designs and constructs to the codebase; many of these can be found in the scalding-commons project. Some examples are

ElephantDB

ElephantDB is a distributed read-only key-value store designed to be populated by Hadoop. I maintained ElephantDB during the first half of 2012 and performed a major rewrite that went into production at Twitter for a time.

Pallet

Pallet is a cloud provisioning system written in Clojure. I contributed a Hadoop cluster deploy tool called pallet-hadoop.

iOS Games

I developed the following games for iOS:
