
GARTNER says
“A data lake is a concept consisting of a collection of storage instances of various data assets. These assets are stored in a near-exact, or even exact, copy of the source format and are in addition to the originating data stores.”
While much has been said and written about the data lake, in the end the concept has stayed fairly simple. The idea has been to use Hadoop as a place where data of all types can be stored in greater detail than ever before, at an affordable cost, and then used both to power the existing data warehouse ecosystem and to support new types of analytics.
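To make the definition above concrete, the following is a minimal sketch of how a source file might be landed in a Hadoop-based raw zone as a near-exact copy of its source format. The paths, the date-partitioned layout, and the use of the hdfs command-line tool are illustrative assumptions, not part of the Gartner definition or of any particular vendor's design.

# Illustrative sketch: land a source file in a lake "raw zone" unchanged.
# The /lake/raw path and ingest_date partitioning are assumptions for this example.
import subprocess
from datetime import date
from pathlib import Path

def land_raw_file(local_path: str, source_system: str) -> str:
    """Copy a source file into HDFS byte-for-byte, keeping an exact copy of the source format."""
    target_dir = f"/lake/raw/{source_system}/ingest_date={date.today().isoformat()}"
    # Create the target directory (no-op if it already exists).
    subprocess.run(["hdfs", "dfs", "-mkdir", "-p", target_dir], check=True)
    # -put copies the file as-is; no parsing or transformation happens at landing time.
    subprocess.run(["hdfs", "dfs", "-put", "-f", local_path, target_dir], check=True)
    return f"{target_dir}/{Path(local_path).name}"

if __name__ == "__main__":
    print(land_raw_file("orders_2024-06-01.csv", "erp"))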
James Dixon, the CTO of Pentaho and the creator of the term “data lake,” presents a challenge to the big data community in his blog post “Union of the State – A Data Lake Use Case.” Dixon argues that it is time to start figuring out how to make the data lake a time machine for a business.
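One way to read Dixon's “time machine” idea is that, because the lake keeps every state change rather than only the latest values, the state of the business can be reconstructed as of any past moment. The sketch below illustrates that replay-style reconstruction under assumed record fields (entity_id, field, value, ts); it is an interpretation of the idea, not Dixon's own design.

# Sketch: rebuild the state of the business as of a past point in time
# by replaying immutable state-change records kept in the lake.
from datetime import datetime
from typing import Iterable

def state_as_of(changes: Iterable[dict], as_of: datetime) -> dict:
    """Replay state changes up to a cutoff to reconstruct what things looked like then."""
    state: dict = {}
    # Sort by timestamp so later changes overwrite earlier ones, then stop at the cutoff.
    for change in sorted(changes, key=lambda c: c["ts"]):
        if change["ts"] > as_of:
            break
        state.setdefault(change["entity_id"], {})[change["field"]] = change["value"]
    return state

changes = [
    {"entity_id": "cust-1", "field": "tier", "value": "silver", "ts": datetime(2014, 1, 5)},
    {"entity_id": "cust-1", "field": "tier", "value": "gold", "ts": datetime(2014, 9, 20)},
]
# What did the customer look like in June 2014? Still "silver".
print(state_as_of(changes, datetime(2014, 6, 1)))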