This talk describes ongoing work on connectors that let Presto and Apache Spark read and write data in Phoenix tables. We will describe the new Phoenix connector, which implements Spark’s DataSource v2 API and enables customizing and optimizing reads and writes to Phoenix tables.
We will also demo the Presto-Phoenix connector, showing how it can be used to federate multiple Phoenix clusters and to join Phoenix data with data from other types of sources.
We will also describe in-progress work to integrate more tightly with the query optimizers of these frameworks, providing table statistics and pushing filters, limits and aggregates down into Phoenix whenever possible to speed up query execution.
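To make the idea of pushdown concrete, here is a minimal sketch of how a connector might split a query's filters into those sent to the data source and those left for the engine. It is not the Phoenix connector's code: the `Filter` class, the `plan_scan` function and the `PUSHABLE_OPS` set are hypothetical names, and treating `LIKE` as unsupported is an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Operators this toy source is assumed to evaluate itself (illustrative only).
PUSHABLE_OPS = {"=", "<", "<=", ">", ">="}


@dataclass(frozen=True)
class Filter:
    """A single predicate of the form: column op value (hypothetical type)."""
    column: str
    op: str
    value: object


def sql_literal(value: object) -> str:
    """Render a Python value as a SQL literal (strings quoted, quotes doubled)."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def plan_scan(table: str, columns: List[str], filters: List[Filter],
              limit: Optional[int] = None) -> Tuple[str, List[Filter]]:
    """Return (SQL sent to the source, filters the engine must still apply)."""
    pushed = [f for f in filters if f.op in PUSHABLE_OPS]
    residual = [f for f in filters if f.op not in PUSHABLE_OPS]

    sql = f"SELECT {', '.join(columns)} FROM {table}"
    if pushed:
        predicates = [f"{f.column} {f.op} {sql_literal(f.value)}" for f in pushed]
        sql += " WHERE " + " AND ".join(predicates)
    # A limit is only safe to push down when no filter remains for the engine;
    # otherwise the source could cut off rows before the residual filter runs.
    if limit is not None and not residual:
        sql += f" LIMIT {limit}"
    return sql, residual


if __name__ == "__main__":
    sql, residual = plan_scan(
        "ORDERS", ["ID", "TOTAL"],
        [Filter("STATE", "=", "CA"), Filter("NAME", "LIKE", "A%")],
        limit=10,
    )
    print(sql)
    print(residual)
```

In this sketch the limit is withheld from the source whenever a residual filter exists, which shows why an optimizer has to reason about filters and limits together rather than pushing each one down independently.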
Another area of ongoing work is support for bulk loading using HFiles.
Video: Integrating Apache Phoenix with Distributed Query Engines. Uploaded by DataWorks Summit on 14 June 2019.