What we do

We are the Apache Spark team that can make your applications run fast. We specialize in making Spark jobs execute with speed, efficiency, and elegance.

Speed

We understand the ins and outs of Apache Spark and can quickly profile your code to understand why things are taking so long. Confused by partitions and shuffles? Confounded by Spark’s extract, transform, and load (ETL) pipeline? Wondering why you have tasks that hang or never finish at all? Spooked by speculative execution? Ok, ok, you get the idea. Let us help!
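
To give a flavour of what that profiling looks like, here is a minimal PySpark sketch (the dataset path is a placeholder, not your data) that counts rows per partition, which is one of the quickest ways to spot the skew behind tasks that never seem to finish.

```python
# A minimal sketch, assuming a SparkSession and a hypothetical Parquet dataset.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("partition-skew-check").getOrCreate()

# Hypothetical path -- substitute your own data.
orders = spark.read.parquet("s3://your-bucket/orders")

# Rows per partition: a handful of huge partitions next to many tiny ones is
# a classic reason a stage "hangs" on its last few tasks after a shuffle.
sizes = (
    orders.rdd
    .mapPartitionsWithIndex(lambda idx, rows: [(idx, sum(1 for _ in rows))])
    .collect()
)
for idx, n in sorted(sizes, key=lambda x: -x[1])[:10]:
    print(f"partition {idx}: {n} rows")
```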


Efficiency

Whether you’re deploying on a homegrown cluster, using Amazon Web Services’ Elastic MapReduce (AWS EMR), or running on a hosted solution like Databricks, it can be extremely difficult to size your compute cluster appropriately for the task at hand. If you’re over-provisioned, you’re wasting money. If you’re under-provisioned, you’re wasting time. We can help with this too.
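
As a small illustration of what right-sizing means in practice, here is a PySpark sketch with placeholder numbers (not recommendations for your workload) that makes executor count, cores, and memory explicit instead of leaving them to cluster defaults.

```python
# A minimal sketch with placeholder values: making executor sizing explicit
# so the cost/speed trade-off is a deliberate choice, not a default.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("right-sized-job")
    # 10 executors x 4 cores x 8 GB is purely illustrative; the right numbers
    # depend on your data volume, your deadline, and your budget.
    .config("spark.executor.instances", "10")
    .config("spark.executor.cores", "4")
    .config("spark.executor.memory", "8g")
    # Match shuffle parallelism to the total cores you are actually paying for.
    .config("spark.sql.shuffle.partitions", "80")
    .getOrCreate()
)
```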


Elegance

We know how hard it can be to get new team members up to speed with how you run your big data pipeline. We also know that code is never truly self-documenting and that your big data engineers don’t like making architecture diagrams. Let us help you make sure that your Spark code is readable, your architecture rational, and your team transitions seamless.