Genomics in the Cloud

University essay from Uppsala universitet/Institutionen för informationsteknologi

Author: David Östlund; [2021]

Keywords: Cloud; IT; Genomics; GCP; AWS;

Abstract: The continued reduction in the cost of sequencing genomic data is causing exponential growth in the amount of data available. Moving both storage and computation for this data to the cloud has been a common trend, but the way to do it is not always obvious. This report compares three alternatives for running ad hoc queries in a cloud-based setting: two solutions using data lakes and one solution using a relational database hosted in the cloud. The data lake solutions proved easy to set up and fully functional for querying genomics data. The relational database was more complicated to set up, but its queries were more time efficient and more cost efficient when performing more than 1200 queries per month on at least 100 GB of data. To make cloud computing possible for genomics data, the data had to be transformed into a file format supported by the cloud providers. For this purpose the Parquet file format was chosen, tested, and proven to work well.
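
The cost comparison in the abstract can be read as a break-even question: a pay-per-query data lake grows in cost with every query, while a hosted relational database is closer to a fixed monthly bill. The sketch below is only an illustration of that reasoning, not the model used in the essay. The function names, the assumption that the database has no per-query cost, the decimal conversion of 1000 GB per TB, and every price passed in are hypothetical.

```python
import math


def data_lake_monthly_cost(queries, gb_scanned_per_query, price_per_tb):
    """Monthly cost when every query is billed by the amount of data it scans.

    Assumes 1 TB = 1000 GB and no fixed monthly fee (both are simplifications).
    """
    return queries * gb_scanned_per_query * price_per_tb / 1000


def break_even_queries(db_fixed_monthly_cost, gb_scanned_per_query, price_per_tb):
    """Smallest monthly query count at which the fixed-cost database
    is no more expensive than the pay-per-query data lake.

    Assumes the database bill does not depend on the number of queries.
    """
    cost_per_query = gb_scanned_per_query * price_per_tb / 1000
    return math.ceil(db_fixed_monthly_cost / cost_per_query)


if __name__ == "__main__":
    # Placeholder prices for illustration only, not figures from the essay.
    n = break_even_queries(db_fixed_monthly_cost=500.0,
                           gb_scanned_per_query=100,
                           price_per_tb=5.0)
    print(f"Database becomes cheaper at roughly {n} queries per month")
```

Under these assumptions the break-even point moves down as the data scanned per query grows, which is consistent with the abstract tying its threshold to both a query count and a data size.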
