Wednesday, October 15, 2014

Book Review: Learning Spark: Lightning-Fast Big Data Analytics



Book Review: Learning Spark: Lightning-Fast Big Data Analytics by Holden Karau, Andy Konwinski, and Matei Zaharia; Publisher: O'Reilly; ISBN-13: 978-1449358624
 

Learning Spark: Lightning-Fast Big Data Analytics is still in the Early Release phase and is scheduled to be available in February 2015. I have reviewed the first seven chapters, which are still raw but are shaping up neatly.

This book is a very good introduction for newcomers to Spark, which is all the rage in the Big Data domain. The book provides almost all of its samples in three languages (Java, Scala, and Python), which makes it easier for many people to try them out and learn Spark.

The first chapter is just an introduction; the real fun starts from the second chapter onward. Chapter 2 walks you through installing Spark on your laptop. Chapters 3 to 6 cover the programming aspects of Spark. Chapter 7 is about running Spark on a cluster.

I expect the final book to be a good one.

Disclaimer: I did not get paid to review this book, and I do not stand to gain anything if you buy the book. I have no relationship with the publisher or the authors. I received an electronic copy of the book from the publisher for review.


More information about the book and related topics is available from:

  1. Amazon: http://www.amazon.com/Learning-Spark-Lightning-Fast-Data-Analytics/dp/1449358624
  2. Publisher -- O'Reilly: http://shop.oreilly.com/product/0636920028512.do



For My High Schooler - Inheritance and Packages



At the dinner table.

Yash: Today in Java class, we were discussing packages. Can you please explain?
Me: Sure. You can think of a package as a house in a neighborhood. Each house has its residents: humans, animals, and non-living things. If you scale down by one level, some members of the household have their own rooms while others share rooms. You can compare the residents of a house to Java classes, interfaces, and enums. Just as a house may have rooms, a package may have sub-packages, that is, packages within a package.
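To make the house analogy concrete, here is a minimal sketch (not part of the original conversation) that asks the JDK which package a few of its own types live in, and whether each one is a class, an interface, or an enum. ArrayList, Map, and TimeUnit are real JDK types; the class name PackageResidents and the helper describe are made up for illustration, and the code assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.Map;
import java.util.concurrent.TimeUnit;

// Hypothetical demo: reports which package (the "house") a few JDK types live in,
// and what kind of resident each one is.
public class PackageResidents {

    // Describes a type as "<kind> <SimpleName> lives in package <package>".
    static String describe(Class<?> type) {
        String kind = type.isInterface() ? "interface"
                : type.isEnum() ? "enum"
                : "class";
        return kind + " " + type.getSimpleName()
                + " lives in package " + type.getPackage().getName();
    }

    public static void main(String[] args) {
        System.out.println(describe(ArrayList.class)); // class ArrayList lives in package java.util
        System.out.println(describe(Map.class));       // interface Map lives in package java.util
        // java.util.concurrent is named as a sub-package of java.util
        System.out.println(describe(TimeUnit.class));  // enum TimeUnit lives in package java.util.concurrent
    }
}
```

Note that the "room inside a house" relationship is mostly about naming and folder layout: java.util.concurrent sits under java.util by name, but the compiler treats it as a separate package.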
Yash: I get it. But we were also talking about inheritance.
Me: It is pretty simple.
Yash: How?
Me: Let’s create a scenario. You have two parents: one biological mother and one biological father. Correct?
Yash: Correct.
Me: Let’s assume your mother and I separate, and I decide to marry again. Now you have two mothers: your biological mother and your stepmom. It is also possible that your stepmom decides to separate after a while and I marry someone else again; now you have three mothers.
Yash: This is complex and crazy and I hope it never happens.
Me: Let’s make it a bit more complex. Your biological mother may also decide to marry someone else. It is also possible that one of your stepmoms and I decide to have a baby. To make things worse, at some point in your life you may have a baby with one of my spouses’ kids.
Yash: Things may really get ugly here.
Me: Yep. This complexity exists because a baby has two parents. Now assume a baby has only one parent. Then this type of complexity cannot arise, and a single parent can still have as many babies as it likes.
Yash: True.
Me: To avoid this complexity, Java does not allow a class to have multiple parents. Only one parent is allowed.
Yash: Okay. Who is the baby and who is the parent in Java?
Me: In Java, the class is the main player. To define the relationship between a parent and a baby, we use the keyword extends.
Yash: How?
Me:  It is simple.  
class B extends A {
}
Here class A is the parent while class B is the baby.
In Java terminology, class A is the “superclass” while class B is the “subclass”.

Yash: Hmm… I suppose if I want to say that class C is a subclass of B, then I should write:
class C extends B {
}
Me: Fantastic!!!
Yash: And if class D is also a subclass of B, then I can write:
class D extends B {
}
Me: Super!!!
Yash: Does this mean we are creating a hierarchy in which class A is the superclass of class B, and classes C and D are subclasses of class B?
 
 


Me: Perfect. This whole concept is called inheritance.
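The hierarchy from the conversation can be turned into a small runnable sketch. Classes A, B, C, and D follow the dialogue; the greet() method and the InheritanceDemo class are hypothetical additions so there is something to inherit and something to run.

```java
// Classes A, B, C, D follow the dialogue: A is the superclass of B,
// and C and D are both subclasses of B. The greet() method is hypothetical.
class A {
    String greet() {
        return "Hello from A";
    }
}

class B extends A {
}

class C extends B {
}

class D extends B {
}

public class InheritanceDemo {
    public static void main(String[] args) {
        C c = new C();
        D d = new D();
        // Neither C nor D declares greet(); both inherit it from A through B.
        System.out.println(c.greet()); // Hello from A
        System.out.println(d.greet()); // Hello from A
        // Every C is also a B and an A.
        System.out.println(c instanceof A); // true
        // Walking up the hierarchy: C -> B -> A (and A implicitly extends Object).
        System.out.println(C.class.getSuperclass().getSimpleName());                 // B
        System.out.println(C.class.getSuperclass().getSuperclass().getSimpleName()); // A
    }
}
```

Because each class has exactly one superclass, getSuperclass() always returns a single answer, which is the "one parent" rule from the conversation.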
Me: Let’s mix packages and inheritance. Just as your grandparents may live in a different house while you and I live in this one, classes in the same hierarchy may belong to different packages. Let’s modify the picture a little bit.
 



Yash: So a class can live in any package, regardless of where it sits in the hierarchy.
Me: Yes, as long as the parent class is declared public so that classes in other packages can see it.
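The JDK itself has family trees that span several packages. The sketch below uses reflection to walk up the real superclass chain of java.util.concurrent.ConcurrentHashMap, which extends the public class java.util.AbstractMap, which in turn extends java.lang.Object. The class name HierarchyAcrossPackages and the helper ancestry are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo: walks up the superclass chain of a JDK class and reports
// the package each ancestor lives in (one family tree, several "houses").
public class HierarchyAcrossPackages {

    // Returns the class itself followed by each superclass, ending at Object.
    static List<Class<?>> ancestry(Class<?> start) {
        List<Class<?>> chain = new ArrayList<>();
        for (Class<?> c = start; c != null; c = c.getSuperclass()) {
            chain.add(c);
        }
        return chain;
    }

    public static void main(String[] args) {
        for (Class<?> c : ancestry(ConcurrentHashMap.class)) {
            System.out.println(c.getSimpleName() + " lives in package " + c.getPackage().getName());
        }
        // ConcurrentHashMap lives in package java.util.concurrent
        // AbstractMap lives in package java.util
        // Object lives in package java.lang
    }
}
```

The child lives in java.util.concurrent, its parent in java.util, and the common ancestor of every Java class in java.lang.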
Yash: I have a few more questions.
Me: I think this is enough for today.
Yash: Okay! Good night.


Thursday, October 2, 2014

Book Review: Using Flume: Stream Data into HDFS and HBase


Book Review: Using Flume: Stream Data into HDFS and HBase by Hari Shreedharan; Publisher: O'Reilly; ISBN-13: 978-1449368302



Using Flume: Stream Data into HDFS and HBase is for developers as well as administrators of Hadoop clusters. The first chapter discusses HBase, which is a little puzzling, but as the book progresses it takes you on a deep dive into various aspects of Flume. The book covers streaming data, sources, channels, sinks, interceptors, and other components of Flume.

The last chapter, on administering Flume, is very short. It could have gone into more depth on capacity planning, deployment options, etc.

Nevertheless, the book is a good reference for anyone playing in the Hadoop playground.


Disclaimer: I did not get paid to review this book, and I do not stand to gain anything if you buy the book. I have no relationship with the publisher or the author. I received an electronic copy of the book from the publisher for review.

Further reading: Apache Flume: Distributed Log Collection for Hadoop (http://www.amazon.com/Apache-Flume-Distributed-Collection-Hadoop/dp/1782167919)


More information about the book and related topics is available from:

  1. Amazon: http://www.amazon.com/Using-Flume-Stream-Data-HBase/dp/1449368301
  2. Publisher -- O'Reilly: http://shop.oreilly.com/product/0636920030348.do

Friday, August 29, 2014

Book Review: Hadoop Operations



Book Review: Hadoop Operations by Eric Sammer; Publisher: O'Reilly; ISBN-13: 978-1449327057




Hadoop Operations by Eric Sammer is a marvelous book that explains almost every bit of information in a very lucid manner. As the name suggests, the book is for operations folks: how data is ingested and replicated, how MapReduce "finds" the most suitable node to run parts of a job, what the cost and performance advantages are of adopting the shared-nothing, commodity-hardware model recommended for Hadoop clusters, etc.


 

This book is for operations folks and administrators, and it is good supporting material for developers.

 

 

Disclaimer: I did not get paid to review this book, and I do not stand to gain anything if you buy the book. I have no relationship with the publisher or the author. I received an electronic copy of the book from the publisher for review.


Further reading: There are several books on similar topics, such as Hadoop Operations and Cluster Management Cookbook and Hadoop Cluster Deployment.


More information about the book and related topics is available from:

 

  1. Amazon: http://www.amazon.com/Hadoop-Operations-Eric-Sammer/dp/1449327052
  2. Publisher -- O'Reilly: http://shop.oreilly.com/product/0636920025085.do

 

Friday, August 15, 2014

What Are Self-Organising Teams? - My Views



InfoQ is publishing a three-part series of articles on self-organizing teams by Sigi Kaltenecker and Peter Hundermark. The first part is titled “What Are Self-Organising Teams?”. It is one of the finest articles on the subject I have ever encountered, but I have my own doubts about some of the details it provides.
1. The article quotes Richard Hackman's Authority Matrix, published in 2001-02. Is the Authority Matrix granular enough? Consider a simple example. The third element on the horizontal axis of the Authority Matrix is “Self-designing teams”. If we scrutinize this element closely, we will notice that it can be split into two just by varying one factor: the selection of team members. This authority may rest within the team or with an outside authority (think of hiring contractors, outsourcing, offshoring, etc.). I am sure you can think of more factors. In my view, instead of limiting human/team behavior to a few boxes, it should be represented as some kind of continuum.
2. The article talks about all the advantages of self-organizing teams but fails to highlight the dangers of groupthink. Two recent examples of groupthink in self-organizing teams are the Indian judicial appointments by the Collegium and the Bush Administration. If there are pitfalls in self-governing teams, there should be proper checks and balances in place.