Mohammad Darab takes us through Big Data Cluster data pools:
Data pools enable the creation of scale-out data marts. Whether your data is being ingested from Spark jobs or SQL, it is stored into the data pool. Data is distributed across one, or two, SQL Server instances running queries against it is more efficient.
Whether the data is being ingested from IoT device, Kafka, another relational data source (like Oracle or Teradata), it all is stored into the data pool instances and are available as “data marts” for the consumer to work with. There is no need to go back out to the original data source each time you want to query the data. It is all available inside the data pool instances.
This lets you cache data brought in via PolyBase and spread it across a number of instances. That’s pretty powerful.