A dynamic approach to sorting with respect to big data

University essay from Umeå universitet/Institutionen för datavetenskap

Abstract: This study introduces a dynamic approach to sorting, making use of predictions and data gathered during run-time to optimize the sorting of the current data set. This approach is used to develop a sorting algorithm called DynamicSort which partitions data and calculates a partial standard deviation for each partition to determine which of two sorting algorithms should be used to sort the partition. The algorithm is tested against Quicksort and radix sort on data sets of different sizes and standard deviation with the intent of finding advantages of the approach. In order to adapt to modern applications, the algorithm is tested in an environment utilizing parallel processing on multiple machines on data sets generated to mimic the characteristic size of big data. To accommodate this the data is divided at start and merged together after sorting using a k-way merge sort. While the tests conducted do not show any concrete gain in performance there are several factors that could be further optimized and evaluated. We find that it is not enough to simply consider the standard deviation in this approach. While no real instance of big data was used the algorithm was adapted for limited cache sizes and multiple hosts working in parallel.

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)