Chuck Yarbrough, Pentaho | Hadoop Summit 2016 San Jose

Published: 01 July 2016
on channel: SiliconANGLE theCUBE

01. Chuck Yarbrough, Pentaho, Visits #theCUBE! (00:19)
02. Talk About The Changes Since The Merge With Hitachi. (00:40)
03. What Is The Next Chapter For Pentaho. (02:01)
04. What Does The Solution Look Like For Pipeline Analytics. (03:33)
05. Explain What Fill In The Data Lake Means. (05:17)
06. Do Some Of The New Data Types That Don't Have A Process. (09:21)
07. What About Getting Data Out. (10:47)
08. Is Data Cleanliness A Big Part Of This. (11:30)
09. Is The Lake Size An Issue. (16:22)
10. Explain What You're Working On At Pentaho. (17:24)

Don’t let your data lake turn into a data swamp | #HS16SJ
by Nelson Williams | Jun 28, 2016

Data does not move easily. This truth has plagued the world of Big Data for some time and will continue to do so. In the end, the laws of physics dictate a speed limit, no matter what else is done. However, somewhere between data at rest and the speed of light, there are many processes that must be performed to make data mobile and useful. Integrating data and managing a data pipeline are two of these necessary tasks.

To shed some light on the world of data preparation, John Furrier (@furrier) and George Gilbert (@ggilbert41), cohosts of theCUBE, from the SiliconANGLE Media team, visited the Hadoop Summit US 2016 event in San Jose, California. There, they sat down with Chuck Yarbrough, senior director of Solutions Marketing and Management at Pentaho (A Hitachi Group Company).

Managing the data pipeline

The discussion started with a look at Pentaho and what it does. Yarbrough took the hosts through a tour of the company’s history, saying that early on, the founders looked at what data analytics was all about and what it would become. Their idea was to do data integration, and do it right, to prepare data for the analytic process, with a vision of managing the entire data pipeline for an analytic purpose.

Yarbrough then explained the solution, stating that Pentaho enables high-scale, complex use cases that require the entire pipeline. That data can be highly varied, coming in from all over the place. Blending and processing that varied data on the fly is the key, and that’s where Pentaho delivers value.

Keeping the data lake clean

Throwing a bunch of data into one place creates a data lake, but if that information isn’t managed, the lake becomes a swamp. Yarbrough posed the question of how a company manages that data at scale: one load is simple, but 6,000 loads is something else entirely. He described how Pentaho manages that data by leveraging the concept of metadata injection and making processes dynamic.
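The core idea behind metadata injection can be sketched in a few lines. This is an illustrative example only, not Pentaho’s actual API: a single generic load template is driven by per-source metadata, so one process can handle thousands of differently shaped feeds instead of 6,000 hand-built jobs. All names here (`LoadMetadata`, `run_template_load`) are hypothetical.

```python
# Hypothetical sketch of metadata-driven loading -- not Pentaho's actual API.
# One generic template, parameterized by injected metadata, replaces
# thousands of hand-built, per-source load jobs.

from dataclasses import dataclass

@dataclass
class LoadMetadata:
    source_name: str
    field_map: dict   # source column -> target column
    target_table: str

def run_template_load(rows, meta: LoadMetadata):
    """Generic load step: select and rename columns per injected metadata."""
    out = []
    for row in rows:
        # Only the columns named in the metadata survive; everything else is dropped.
        out.append({target: row[src] for src, target in meta.field_map.items()})
    # In a real pipeline this would write to meta.target_table.
    return out

# The same template handles any feed once its metadata is defined:
sales_meta = LoadMetadata("sales_feed",
                          {"amt": "amount", "cust": "customer_id"},
                          "fact_sales")
rows = [{"amt": 120, "cust": "C7", "junk": None}]
print(run_template_load(rows, sales_meta))  # [{'amount': 120, 'customer_id': 'C7'}]
```

The point of the pattern is that adding load number 6,001 means adding one metadata record, not writing another transformation.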

“Manage what you’re doing,” he said.

Yarbrough then stressed that it always comes down to use cases, what the company is trying to do with its data. Customers want to take data from their lakes and format it into something different. The blueprint Pentaho produced does just that, simplifying the process and enabling data movement at scale.

#HS16SJ
#theCUBE

