Apache Spark and the Value of Momentum

As I wrote a few posts ago, I’m playing a fast game of catch-up in the big data technology arena.  My technical work didn’t intersect with big data until recently, and the landscape of tools is sprawling – Sqoop, Hive, Pig, HDFS, HBase, Cassandra, Flume, Impala, Spark, etc.  Somewhat comically, ZooKeeper is one tool used to keep all these “animals” under control.  But while rummaging around in the zoo, we stumbled upon one of the most compelling technological advances to hit mainstream open source software in several years – Apache Spark.  It works with big data, runs on large clusters, and includes query tools (Spark SQL), machine learning (MLlib), and stream processing (Spark Streaming), so it is rightly considered part of the big data tool set.  But it also enables new capabilities beyond just bigger/faster/more distributed.

Spark allows rapid integration and interactive analysis of big data, work that was previously confined to long-running batch jobs on top of the traditional (how many years are required to create tradition?) big data technology stack.  Now, without derailing their train of thought, the data scientist, marketer, or business analyst can iterate through a series of related questions, quickly receiving answers and changing their approach as the data starts to make sense to them.  Spark provides access to lots of data, and the processing of that data, now – not in 12-24 hour batch job cycles.

Programming is required – Java, Python, or Scala (which all the cool kids are doing).  This is great for development teams.  But it’s only a matter of time before “value added” providers wrap this and make big, fast analytics available through a more declarative interface, and inside applications where its presence may not even be obvious from the user interface.  Big data platform vendors such as Cloudera and Hortonworks are bundling Spark with their platforms.  Other companies such as Zoomdata are using it to speed up their analytics.  It seems that everyone I talk to in the industry is using Spark.  It’s solving real problems – big problems – and that’s having a snowball effect that we don’t see very often.  As more and more companies adopt Spark, they contribute developers.  The growing developer community creates more features.  More features attract more companies.  And the cycle continues.

The first time I sat down to build an iOS application, I was impressed in a way that I hadn’t experienced before – the possibilities were enormous.  We’ve since seen that play out in the mobile software world.  Similarly, Spark expands the realm of possibility for big data.  This is going to be exciting.

Keep the Disciplined Pursuit of the Hedgehog Concept the Main Thing

I’m reading Greg McKeown’s book “Essentialism: The Disciplined Pursuit of Less”.  It is a great book espousing the value of focusing on what matters and gracefully, or even not-so-gracefully, saying no to everything else.  His HBR summary of the same topic is also a good read if you aren’t motivated to read a whole book.  The book is relatively new, but the idea is not.  Stephen Covey popularized the same idea in “The 7 Habits of Highly Effective People” with Habit 3 – Put First Things First.  Dr. Covey summarized it in the catchy little phrase, “the main thing is to keep the main thing the main thing”, and a follow-on book, “First Things First”, goes into even more detail on this one topic of prioritization.  Jim Collins identified the same trait as a defining characteristic of great companies in his book “Good to Great”, where he describes the Hedgehog Concept.  The Hedgehog Concept can also be summarized as focusing on those things which matter most – the intersection of what you are passionate about, what you can be the best at, and economic opportunity.

With so much evidence that focus and clarity directly impact results, both personally and corporately, we (people and companies) have no excuse for getting bogged down in the non-essential.  Don’t let the world around you distract you from the things of most value.  Keep the disciplined pursuit of the hedgehog concept the main thing.

My Last Mile Obsession

Good Eggs is yet another company trying to make a name for itself delivering products to your home.  I just saw on CrunchBase (a great site for those of you interested in venture-funded endeavors) that they received more funding.  I wonder if anyone will survive this time around…

Predictive Analytics

It can be difficult to wrap your head around predictive analytics.  It usually involves lots of data, some math, and a good understanding of what you are looking for.  This post on Predictive Analytics by Tom Davenport at HBR.org is one of the best concise summaries of predictive analytics I’ve read.
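
To make the “data, some math, and knowing what you are looking for” idea a bit more concrete, here is a minimal sketch of my own (it is not taken from Davenport’s post).  It fits a straight line to a handful of past observations using ordinary least squares and then uses that line to forecast the next value.  The numbers and the names `ad_spend`, `orders`, `fit_line`, and `predict` are all made up for illustration; real predictive analytics involves far more data and far more care.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Covariance-style and variance-style sums (unnormalized).
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical history: weekly ad spend (in $1000s) and orders received.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
orders = [12.0, 14.0, 16.0, 18.0, 20.0]

# The data: past observations.  The math: the line fit.
model = fit_line(ad_spend, orders)

# What we are looking for: expected orders at a spend level not yet tried.
forecast = predict(model, 6.0)
print(f"slope={model[0]}, intercept={model[1]}, forecast={forecast}")
```

The three ingredients show up directly: the two lists are the data, `fit_line` is the math, and the question being asked (how many orders at a given spend) is what turns the fit into a prediction.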

Same-Day Delivery – Macy’s?

I’m still fascinated by attempts to build a successful same-day delivery business model.  Perhaps the winner will be a surprise:


The winner will simply be the one who can execute on order fulfillment and delivery utilizing its local inventory.  Go Macy’s!

