Paweł Szulc

ready to expect the unexpected

Haskell developer by day, Haskell developer by night. Explorer of uncharted territories. Father, husband, cat herder.

"Apache Spark™ is a fast and general engine for large-scale data processing." The statement above is taken from the Apache Spark welcome page. It's one of those definitions that, while describing the product in one sentence and being 100% true, still tells little to the wondering noob.

Why take an interest in Apache Spark? Apache Spark promises to be up to 100x faster than Hadoop MapReduce in certain scenarios. It provides a comprehensible programming model (familiar to everyone who is used to functional programming) and a vast ecosystem of tools.

In my talk I will try to reveal the secrets of Apache Spark to absolute beginners.

We will start with a quick introduction to the set of problems commonly known as Big Data: what it tries to solve, what obstacles and challenges it faces, and how those can be addressed. We will take a quick peek at MapReduce: theory and implementation. We will then move on to Apache Spark. We will see the main factor that drove its creators to introduce yet another large-scale processing engine. We will see how it works and what its main advantages are. The presentation will be a mix of slides and code examples.

Slides
Video