A Deep-Dive on PolarDB Parallel Scanning Technology

1. Background

2. How Parallel Queries Work

3. Select Tables for Parallel Scanning

SELECT c.c_name, sum(o.o_totalprice) as s
FROM customer c, orders o
WHERE c.c_custkey = o.o_custkey
AND o_orderdate >= '1996-01-01'
AND o_orderdate <= '1996-12-31'
GROUP BY c.c_name
ORDER BY s DESC
LIMIT 10;

4. Join Multiple Tables Selected for Parallel Scanning

SELECT o.o_custkey, sum(l.l_extendedprice) as s
FROM orders o, lineitem l
WHERE o.o_custkey = l.l_orderkey
GROUP BY o.o_custkey
ORDER BY s
LIMIT 10;
  • In the first solution, all the data in the tables need to be read, partitioned based on the selected hash key, and distributed to different threads for data processing. This requires the use of an additional Repartition operator, which sends data to different threads based on the partition rules.
  • In the second solution, after the shared hash table named Build is created, each thread reads a shard from the Probe table and then performs a hash join. The Probe table does not need to be sharded based on a hash key, but the shards read by threads cannot overlap.

5. Parallel Execution of Complex Operators in Statistical Analysis

6. Linear Acceleration of Parallel Query Engines for TPC-H Queries

7. Summary

About Us

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Alibaba Tech

First-hand & in-depth information about Alibaba's tech innovation in Artificial Intelligence, Big Data & Computer Engineering. Follow us on Facebook!