Author: Ahsan Nabi Khan
Why Hadoop?
Some of the best resources, with the flexibility to scale in and scale out.
There are times when we want to stick to one PC, our favourite one, for processing chunks of data we are hardly aware of. There are times when all we have is just one PC after all, and loads of emails and data crunching that would take years to sort out. Then there are times when we need to scale out, that is, to grow rapidly as a business. How do we stay on good terms with technology while growing at breakneck speed? That's where Hadoop's computational power comes in.
More and more companies around the world are moving their software to the Software-as-a-Service paradigm. Going further, some are thinking of shifting their networking burdens to Infrastructure-as-a-Service. And some of the geekier work has reached the level of scaling out the platform itself, that is, Platform-as-a-Service. Here we are talking about Hadoop and Amazon Elastic MapReduce. Now we are talking some serious business.
As we know from the many discussions on the complexity of moving your data to distributed systems, Hadoop migrations of course involve a lot of expertise. You need to assign CPUs, you need to take care of data scattered all over the network, and you need to gather the output from the reducing clusters of CPUs. There is a lot of planning involved. So why take the plunge? Why Hadoop anyway?
Hadoop processes a very large bulk of input data using CPUs that you may never have to buy and keep on premises yourself. Think of it this way: you have a task. You hire hands. You rent tables and laptops, calculate, and reduce the results to something meaningful. Then you say goodbye to the team. Hadoop is easier than that. All the work can be done on Amazon, and there you have it: some of the best resources, with the flexibility to scale in and scale out. Pay-as-you-go is the slogan, and all you need is an account on Amazon.
Theoretically speaking, a Hadoop cluster can beat a single 1000-CPU machine on cost. So yes, the age of the mainframe has given way to Hadoop clusters for everybody. Hadoop takes your input data in streams, divides the lines into chunks, feeds the chunks to mappers, performs the computation locally where the data resides, and routes each piece of mapped data to the right reducer. The reducers then condense the work into meaningful partial results that together make one big picture. That's the story; that is how it is.
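To make that flow a little more concrete, here is a minimal sketch in plain Python of the chunk, map, shuffle and reduce steps, using the classic word-count task. It runs on one machine and does not use Hadoop at all; every function name and parameter here (`split_into_chunks`, `map_chunk`, `shuffle`, `reduce_partition`, `chunk_size`, `num_reducers`) is made up for illustration and is not part of any Hadoop API.

```python
from collections import defaultdict
from itertools import islice


def split_into_chunks(lines, chunk_size):
    """Divide the input lines into fixed-size chunks (stand-ins for input splits)."""
    iterator = iter(lines)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def map_chunk(chunk):
    """Mapper: emit a (word, 1) pair for every word in the chunk."""
    for line in chunk:
        for word in line.lower().split():
            yield word, 1


def shuffle(mapped_pairs, num_reducers):
    """Route each pair to a reducer chosen by its key, so equal keys meet in one place."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in mapped_pairs:
        partitions[hash(key) % num_reducers][key].append(value)
    return partitions


def reduce_partition(partition):
    """Reducer: sum the counts collected for each word in this partition."""
    return {key: sum(values) for key, values in partition.items()}


def word_count(lines, chunk_size=2, num_reducers=3):
    """Run the whole pipeline in one process (assumed sizes, purely illustrative)."""
    mapped = (
        pair
        for chunk in split_into_chunks(lines, chunk_size)
        for pair in map_chunk(chunk)
    )
    result = {}
    for partition in shuffle(mapped, num_reducers):
        # Each word lives in exactly one partition, so the partial results never clash.
        result.update(reduce_partition(partition))
    return result


if __name__ == "__main__":
    sample = ["Hadoop maps the data", "Hadoop reduces the data", "the big picture"]
    print(word_count(sample))
```

On a real cluster the chunks, mappers and reducers would live on different machines and the shuffle would move data over the network; the sketch only shows how the pieces fit together.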
So the big picture is that, without you managing all the subtasks, Hadoop breaks down your mountain of data, maps it, reduces the computation, and saves the results, which you can consult within minutes of submitting the job. Without Hadoop, this would have taken months, so thanks to the technology you are now in the loop.