Title: Parallel Query Processing with Bounded Communication for SQL-over-NoSQL systems

Speaker: Yang Cao

When: Dec 18, 2018, from 01:00 PM to 02:00 PM

Where: MF2, Level 4

Abstract:

The SQL-over-NoSQL architecture is widely used in industrial systems that process massive datasets, e.g., Google's Spanner, Facebook's MyRocks, Apache Hive and SparkSQL with Cassandra, among others. In these systems, data is organized as a key-value store in a storage cluster of commodity machines, while query processing is carried out in an elastic SQL layer. Such an architecture offers good horizontal scalability, availability, reliability, and cost-efficiency. However, these systems suffer from two major bottlenecks when answering SQL queries: (a) bulk operations such as scans are particularly slow over key-value stores, and (b) communication cost is heavy for parallel query evaluation.
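
To make bottleneck (a) concrete, here is a minimal sketch assuming the common tuple-per-key encoding of a relation. The `KVStore` class, the `users` schema, and the read counter are hypothetical stand-ins for illustration only; they are not taken from the talk or from any of the systems named above.

```python
from typing import Dict, List, Tuple

# Assumed schema for illustration: users(id, name, city), keyed by id.
Row = Tuple[int, str, str]


class KVStore:
    """In-memory stand-in for a distributed key-value store that counts reads."""

    def __init__(self) -> None:
        self._data: Dict[int, Row] = {}
        self.reads = 0

    def put(self, key: int, value: Row) -> None:
        self._data[key] = value

    def get(self, key: int) -> Row:
        # A point lookup touches exactly one key-value pair.
        self.reads += 1
        return self._data[key]

    def scan(self) -> List[Row]:
        # A scan touches every stored pair, whatever the query selects.
        rows = list(self._data.values())
        self.reads += len(rows)
        return rows


store = KVStore()
for i in range(1000):
    store.put(i, (i, f"user{i}", "Paris" if i % 100 == 0 else "Rome"))

# SELECT * FROM users WHERE id = 42: answered with a single get.
store.reads = 0
by_key = store.get(42)
point_reads = store.reads

# SELECT * FROM users WHERE city = 'Paris': city is not the key,
# so the SQL layer has to scan the whole relation and filter.
store.reads = 0
by_city = [row for row in store.scan() if row[2] == "Paris"]
scan_reads = store.reads
```

In this toy setting the key lookup costs one read, while the selection on a non-key attribute reads all 1000 pairs to return 10 rows. In a real deployment each read is also a potential network round trip between the SQL layer and the storage cluster.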

In this talk, I will discuss how to mitigate these issues by rethinking the data model used to represent relations in these systems. In particular, we will see how to avoid costly scans and even bound the communication cost with an embarrassingly simple new data model for SQL-over-NoSQL systems.
