Only trustworthy science can be good science. We are convinced that the most trustworthy academic research can only be achieved by rigorously applying the scientific method. In our view, a minimal condition for the correct application of the scientific method is reproducibility. While this may seem an obvious standard in academia, in practice it requires extremely well organised scientists. For anyone who wants to produce relevant, widely accessible and trustworthy scientific outputs, reproducibility is a serious commitment that costs time and effort. This document describes the framework we implement on a day-to-day basis to achieve reproducibility in our research as well as in our teaching.
We view the reproducible research approach as a comprehensive philosophy that is embedded in all activities of our group. It covers the individual research of group members as well as our internal and external collaborations. The products and activities of our group include software development, academic papers, technical reports, white papers and educational activities. In all of them we follow the concept of reproducibility.
Complex scientific challenges can most often only be tackled in collaborations, and reproducibility is of immediate importance in such a setting. Statistics is by nature an interdisciplinary effort and, like many other disciplines, it faces the reproducibility crisis. Reproducibility is therefore a core concern in our work. The workflows that we are involved in benefit greatly from scripting all data handling and analysis, not only in view of reproducibility but also in view of inferential and computational efficiency. Within a collaboration pipeline, scripted analysis allows reproducibility along with efficient communication. By combining scripting with information exchange through a version control platform such as GitLab, many sources of irreproducibility and plain errors can be eliminated. Such platforms can be combined with continuous integration steps, which as a by-product saves a lot of precious time (e.g., by reducing the amount of emailing needed).
Producing and delivering reproducible results on the basis of analysis code implies, if done well, that the code is usable independently of specific user environments. This is why we use platform-independent and open source programming languages. It also requires well documented code that follows commonly used code styles, which makes it easier for the user to read. In our opinion, the tests used to develop code are part of the code, and we therefore publish them along with the analysis code itself.
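As a minimal sketch of what publishing tests together with analysis code can look like, consider the following example. It is an illustration only and not taken from our actual code base: the language (Python), the function names `summarise` and `simulate`, and the toy data are all assumptions made for this example.

```python
import math
import random
import statistics


def summarise(values):
    """Return the sample size, mean and sample standard deviation."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
    }


def simulate(seed, n=100):
    """Draw n standard normal values from a generator with a fixed seed."""
    rng = random.Random(seed)  # local generator, so no hidden global state
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def test_summarise():
    # Known input with an answer that can be checked by hand:
    # the sample variance of 1, 2, 3, 4 is 5/3.
    result = summarise([1.0, 2.0, 3.0, 4.0])
    assert result["n"] == 4
    assert result["mean"] == 2.5
    assert abs(result["sd"] - math.sqrt(5 / 3)) < 1e-12


def test_simulate_is_reproducible():
    # The same seed must yield the same data when the script is rerun
    # (within a given Python version).
    assert simulate(seed=42) == simulate(seed=42)


if __name__ == "__main__":
    # Run the tests first, then the (toy) analysis itself.
    test_summarise()
    test_simulate_is_reproducible()
    print(summarise(simulate(seed=42)))
```

Keeping the tests in the same file or repository as the analysis lets a reader check the code in their own environment before relying on its results.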
Another essential part of reproducibility is the transparency of data. Therefore, we use publicly available and trustworthy data wherever possible and feasible. Likewise, we make our data products openly accessible together with the necessary documentation.
As a step beyond analysis code, we develop software packages that allow reproducible results to be created in many settings. In the software development process we follow the dynamic programming approach, which is a method to solve large-scale problems by atomising them into simple tasks. Using version control allows us to document changes, ensuring historical reproducibility and efficient collaboration. We pay special attention to publishing the necessary documentation together with the software.
We are convinced that raising the awareness of the next generation of scientists to reproducibility issues is of paramount importance. This is why we teach and apply our philosophy in the lectures we give. We use version control and dynamic documents for collaborative student projects. We use open source scripting languages and we publish the code used in lectures. We promote good statistical practice as a toolbox in all our teaching, including lectures and projects.
We put particular emphasis on passing the described methods on to PhD students. We discuss reproducibility issues and solutions regularly and apply them from the very beginning. We do not make the grading of final PhD manuscripts dependent on statistical significance or successful publication. More generally, we choose not to use any publication metric to assess scientists.
Reproducible research is a fast-moving research area, and we invest time in scouting new approaches and in exchanging ideas with other groups.
Declaration on a PDF.
As group leader, I, Reinhard Furrer, have signed and endorse the Commitment to Research Transparency and Open Science.