US 12,386,838 B2
Spatial join query method and apparatus, electronic device, and storage medium
Rubin Wang, Beijing (CN); Jie Bao, Beijing (CN); Ruiyuan Li, Beijing (CN); Huajun He, Beijing (CN); and Chujing Tan, Beijing (CN)
Assigned to JINGDONG CITY (BEIJING) DIGITS TECHNOLOGY CO., LTD., Beijing (CN)
Appl. No. 18/259,407
Filed by JINGDONG CITY (BEIJING) DIGITS TECHNOLOGY CO., LTD., Beijing (CN)
PCT Filed Sep. 22, 2021, PCT No. PCT/CN2021/119520
§ 371(c)(1), (2) Date Jun. 27, 2023,
PCT Pub. No. WO2022/142503, PCT Pub. Date Jul. 7, 2022.
Claims priority of application No. 202011638237.4 (CN), filed on Dec. 31, 2020.
Prior Publication US 2024/0061842 A1, Feb. 22, 2024
Int. Cl. G06F 16/2455 (2019.01); G06F 16/22 (2019.01); G06F 16/2458 (2019.01)
CPC G06F 16/2456 (2019.01) [G06F 16/2246 (2019.01); G06F 16/2471 (2019.01)] 14 Claims
 
1. A method for spatial join query, comprising:
obtaining a first resilient distributed dataset of first spatial data and a second resilient distributed dataset of second spatial data; wherein the first resilient distributed dataset and the second resilient distributed dataset comprise a plurality of spatial partitions;
generating multi-tree spatial indexes for the spatial partitions, and collecting statistics about spatial distribution information of geometric objects according to the multi-tree spatial indexes, and obtaining global spatial distribution information of the first resilient distributed dataset and global spatial distribution information of the second resilient distributed dataset;
determining an intersecting spatial partition of the first resilient distributed dataset and the second resilient distributed dataset according to the global spatial distribution information; and
setting data corresponding to the intersecting spatial partition in the first spatial data and the second spatial data as target data, and performing spatial join calculation on the target data,
wherein said determining an intersecting spatial partition of the first resilient distributed dataset and the second resilient distributed dataset according to the global spatial distribution information comprises:
determining peer nodes level by level according to the global spatial distribution information; wherein the peer nodes are nodes that have the same path in the global spatial distribution information corresponding to the first resilient distributed dataset and the second resilient distributed dataset; and
screening the peer nodes according to a statistical value of each node in the peer nodes, and obtaining the intersecting spatial partition of the first resilient distributed dataset and the second resilient distributed dataset; wherein the statistical value is a result of collecting statistics about the spatial distribution information of the geometric objects,
wherein said screening the peer nodes according to a statistical value of each node in the peer nodes, and obtaining the intersecting spatial partition of the first resilient distributed dataset and the second resilient distributed dataset comprises:
in response to determining that the peer nodes comprise a leaf node and a non-leaf node, determining whether the statistical values of the leaf node and a parent node of the leaf node are both 0; in response to determining that any one of the statistical values of the leaf node and the parent node of the leaf node is not 0, setting the spatial partition corresponding to the peer nodes as an intersecting spatial partition of the first resilient distributed dataset and the second resilient distributed dataset;
in response to determining that the peer nodes comprise two non-leaf nodes, determining the peer nodes level by level according to the global spatial distribution information; and
in response to determining that the peer nodes comprise two leaf nodes, determining whether neither of the statistical values of the two leaf nodes is 0; in response to determining that neither of the statistical values of the two leaf nodes is 0, setting the spatial partition corresponding to the peer nodes as the intersecting spatial partition of the first resilient distributed dataset and the second resilient distributed dataset.
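
The peer-node screening recited in claim 1 can be illustrated with a minimal sketch. It is not the patentee's implementation and rests on several assumptions: the multi-tree index is modeled as a quadtree, the statistical value is taken to be a count of geometric objects, and a partition is identified by its path string. The names `Node`, `add_child`, and `intersecting_partitions` are hypothetical, and no Spark API is used. The mixed leaf and non-leaf case follows the claim wording literally (the leaf node and its parent node are checked), which is one possible reading.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)
class Node:
    """One node of a hypothetical quadtree summary of a dataset."""
    path: str    # "" for the root, "2" for child 2 of the root, and so on
    count: int   # assumed statistical value: number of geometric objects
    children: Dict[str, "Node"] = field(default_factory=dict, repr=False)
    parent: Optional["Node"] = field(default=None, repr=False)

    @property
    def is_leaf(self) -> bool:
        return not self.children

    def add_child(self, key: str, count: int) -> "Node":
        child = Node(self.path + key, count, parent=self)
        self.children[key] = child
        return child


def intersecting_partitions(a: Node, b: Node) -> List[str]:
    """Return the paths of partitions kept as spatial join candidates."""
    assert a.path == b.path, "peer nodes must share the same path"
    # Two leaf nodes: keep the partition only if neither value is 0.
    if a.is_leaf and b.is_leaf:
        return [a.path] if a.count != 0 and b.count != 0 else []
    # Two non-leaf nodes: continue level by level on children with equal paths.
    if not a.is_leaf and not b.is_leaf:
        kept: List[str] = []
        for key in sorted(a.children.keys() & b.children.keys()):
            kept.extend(intersecting_partitions(a.children[key], b.children[key]))
        return kept
    # One leaf and one non-leaf (literal reading of the claim): keep the
    # partition unless the leaf and the leaf's parent both have value 0.
    leaf = a if a.is_leaf else b
    parent_count = leaf.parent.count if leaf.parent is not None else 0
    return [a.path] if leaf.count != 0 or parent_count != 0 else []


# Dataset A: four leaf quadrants under the root.
root_a = Node("", 5)
for key, count in (("0", 3), ("1", 0), ("2", 2), ("3", 0)):
    root_a.add_child(key, count)

# Dataset B: quadrant "2" is split one level further.
root_b = Node("", 6)
root_b.add_child("0", 1)
root_b.add_child("1", 3)
b2 = root_b.add_child("2", 2)
for key, count in (("0", 2), ("1", 0), ("2", 0), ("3", 0)):
    b2.add_child(key, count)
root_b.add_child("3", 0)

print(intersecting_partitions(root_a, root_b))  # prints ['0', '2']
```

In this example, quadrants "1" and "3" are dropped because at least one of the two leaf values is 0, so only the data in partitions "0" and "2" would be passed on to the spatial join calculation.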