Abstract
As data volumes grow exponentially in the big data era, traditional CPU-based sorting algorithms become a performance bottleneck on massive datasets. This study addresses the efficient implementation of parallel sorting on GPU platforms, aiming to exploit the GPU's parallelism for large-scale sorting. It proposes a hybrid parallel sorting algorithm tailored to the GPU architecture that combines multi-level bitonic sort with an optimized merge sort. The algorithm improves sorting performance by partitioning tasks appropriately, optimizing memory access patterns, and reducing inter-thread synchronization overhead. A dynamic load balancing mechanism, designed around GPU hardware characteristics, further improves resource utilization and robustness. Experiments show that, on sequences of 10^8 random integers, the proposed algorithm runs more than 50 times faster than a traditional single-core CPU quicksort and performs consistently across different data distributions. The work offers a new approach to GPU-accelerated sorting and a useful reference for the design and optimization of other parallel computing tasks, and is of both theoretical and practical value.
Keywords: GPU Parallel Sorting; Hybrid Sorting Algorithm; Bitonic Sort; Merge Sort Optimization; Dynamic Load Balancing
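To make the two-phase structure named in the abstract concrete, the following is a minimal sequential sketch in C++, not the thesis's GPU implementation. It sorts fixed-size tiles with a bitonic network and then combines the sorted tiles by repeated pairwise merging. The function names `bitonic_sort_tile` and `hybrid_sort`, the default tile size of 1024, and the padding with the largest `int` value are illustrative assumptions; on a GPU, each iteration of the innermost compare-exchange loop would typically be carried out by one thread, with a barrier between stages.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Phase 1 (assumed structure): sort one tile of power-of-two length n
// with a bitonic sorting network. Every (k, j) stage consists of
// independent compare-exchange steps, which is what makes the network
// suitable for one-thread-per-element execution on a GPU.
void bitonic_sort_tile(int* a, std::size_t n) {
    for (std::size_t k = 2; k <= n; k <<= 1) {          // size of bitonic sequences
        for (std::size_t j = k >> 1; j > 0; j >>= 1) {  // compare distance
            for (std::size_t i = 0; i < n; ++i) {       // one GPU thread per i
                std::size_t partner = i ^ j;
                if (partner > i) {
                    bool ascending = (i & k) == 0;
                    if ((a[i] > a[partner]) == ascending) {
                        std::swap(a[i], a[partner]);
                    }
                }
            }
        }
    }
}

// Phase 2 (assumed structure): merge sorted runs pairwise, doubling the
// run width on every pass, until a single sorted run remains.
std::vector<int> hybrid_sort(std::vector<int> data, std::size_t tile = 1024) {
    assert(tile >= 2 && (tile & (tile - 1)) == 0);  // tile must be a power of two
    const std::size_t n = data.size();
    if (n < 2) return data;

    // Pad to a whole number of tiles; the padding sorts to the end.
    const std::size_t padded = (n + tile - 1) / tile * tile;
    data.resize(padded, std::numeric_limits<int>::max());

    for (std::size_t off = 0; off < padded; off += tile) {
        bitonic_sort_tile(data.data() + off, tile);
    }

    std::vector<int> buf(padded);
    for (std::size_t width = tile; width < padded; width *= 2) {
        for (std::size_t lo = 0; lo < padded; lo += 2 * width) {
            const std::size_t mid = std::min(lo + width, padded);
            const std::size_t hi = std::min(lo + 2 * width, padded);
            std::merge(data.data() + lo, data.data() + mid,
                       data.data() + mid, data.data() + hi,
                       buf.data() + lo);
        }
        data.swap(buf);  // the merged runs become the input of the next pass
    }

    data.resize(n);  // drop the padding
    return data;
}
```

Calling `hybrid_sort(values, 4)` on a short vector exercises both phases. In a GPU version the tile would more plausibly be sized to fit in one thread block's shared memory, and the merge passes would be parallelized rather than delegated to `std::merge`.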
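Contents
Abstract (Chinese)
Abstract (English)
Introduction
Chapter 1 Overview of Efficient Parallel Sorting Algorithms
1.1 Classification of Parallel Sorting Algorithms
1.2 Basis for Algorithm Selection
Chapter 2 Design of the GPU Parallel Sorting Algorithm
2.1 Data Partitioning Strategy
2.2 Memory Management Optimization
2.3 Thread Scheduling Mechanism
Chapter 3 Implementation and Optimization of Key Techniques
3.1 Shared Memory Utilization
3.2 Instruction-Level Parallelism Optimization
3.3 Memory Access Patterns
Chapter 4 Performance Evaluation and Analysis
4.1 Test Platform Setup
4.2 Comparison of Performance Metrics
4.3 Analysis of Optimization Effects
Conclusion
References
Acknowledgments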