Benchmarking Deep Learning Testing Techniques: A Methodology and Its Application
Abstract: With the adoption of Deep Learning (DL) systems in security- and safety-critical domains, a variety of traditional testing techniques, novel techniques, and new ideas are increasingly being adopted and implemented in DL testing tools. However, there is currently no benchmark method that helps practitioners compare the performance of different DL testing tools. The primary objective of this study is to construct a benchmarking method that helps practitioners select a DL testing tool. In this paper, we perform an exploratory study of fifteen DL testing tools to construct a benchmarking method, taking one of the first steps towards designing such a method for DL testing tools. We propose a set of seven tasks, derived using a requirement-scenario-task model, to benchmark DL testing tools, and we evaluated four DL testing tools using our benchmarking method. The results show that the current focus in the field of DL testing is on improving the robustness of DL systems; however, common performance metrics for evaluating DL testing tools are difficult to establish. Our study suggests that even though the number of DL testing research papers is increasing, the field is still in an early phase: it is not sufficiently developed to run a full benchmarking suite. Nevertheless, the benchmarking tasks defined in the benchmarking method can help DL practitioners select a DL testing tool. For future research, we recommend a collaborative effort among DL testing tool researchers to extend the benchmarking method.