Influence of File Systems on Performance When Working with an Abundance of Small Files

University essay from Umeå universitet/Institutionen för datavetenskap

Author: Simon Andersson; [2017]


Abstract: High-performance computing is widely used within the scientific community to perform demanding computational work. Using the resources available at a high-performance computing center efficiently is of great importance. One potential bottleneck for high-performance computing is the file system. In this study, two different file systems, the Lustre file system and MATLAB Datastore, were evaluated in terms of performance when performing computations on an abundance of small files. The performance test consisted of classifying large numbers of small (<2 megabytes) images in MATLAB using the high-performance computer system Kebnekaise at HPC2N in Umeå. Results indicate that MATLAB Datastore gives better performance than the Lustre file system for all image sets tested in the study. This makes it possible to recommend using MATLAB Datastore over the Lustre file system in situations where large numbers of small files are to be read from the file system.
