Haskell developer by day, Haskell developer by night. Explorer of uncharted territories. Father, husband, cat herder.
"Apache Spark™ is a fast and general engine for large-scale data processing." The statement above is taken from the Apache Spark welcome page. It is one of those definitions that, while describing the product in one sentence and being 100% true, still tells the wondering noob very little.
Why take an interest in Apache Spark? Apache Spark promises to be up to 100x faster than Hadoop MapReduce in certain scenarios. It provides a comprehensible programming model (familiar to anyone used to functional programming) and a vast ecosystem of tools.
In my talk I will try to reveal the secrets of Apache Spark to complete beginners.
We will start with a quick introduction to the set of problems commonly known as Big Data: what they try to solve, what their obstacles and challenges are, and how those can be addressed. We will then take a quick peek at MapReduce: its theory and implementation. Next we will move on to Apache Spark. We will see what main factor drove its creators to introduce yet another large-scale processing engine, how it works, and what its main advantages are. The presentation will be a mix of slides and code examples.
Slides