Hadoop is one of the hottest buzzwords flying around these days, although most people don't know what it is. In simple terms, Hadoop is an open-source distributed data processing platform. This underlying architecture allows for large-scale data processing tasks to be broken down into chunks and distributed across multiple nodes. These nodes can be cheap low-cost clustered servers that create a cost-effective distributed platform. It makes use of a custom parallel processing language called Pig. Hadoop's most notable feature is its ability to process unstructured data (although it can process structured data types as well). The big analytics vendors (IBM, HP, EMC, and Microsoft) have all jumped in to create commercial versions of Hadoop, so it's safe to say we can expect big things (as well as a lot of hype) from Hadoop in the future.
If you have an interest in Big Data, this article is a must read. It provides a deep dive understanding of what Hadoop is and how it can be effectively applied.
Read the article.
(submitted by Judi)