The need for better workflows in data analysis of electrophysiological signals
===============================================================================

What is a typical workflow for data analysis?

* Data is recorded (or simulated)
* Data is stored
* Data is pre-processed
* Data is analyzed
* A paper is written

Where are the complications in such a seemingly simple workflow?

New challenges in dealing with new qualities of electrophysiological data: complexity
--------------------------------------------------------------------------------------

* Stimulus sets and behavior with difficult or high-dimensional parametrization
* Multi-modal approaches
* Massively parallel data streams from 200 or more electrodes
* Research questions that aim to exploit the parallel/complex nature of such data, e.g., correlation structure

Impacts on data analysis workflows
----------------------------------

**Features of modern experiments:**

* Large size
* Various data types
* Complicated stimuli/behavior
* Involved, complex analysis
* → Collaborative effort

**Impacts:**

* Managing data/results
* Linking analyses
* Computational load
* Reproducibility
* Code verification
* Methods validation
* Experimental complexity
* Workflow and collaboration

Managing data/results: Dramatic increase in data volume
--------------------------------------------------------

Centralized data storage becomes necessary:

* Raw signals at 30 kHz on 200 channels amount to roughly 150 GB per recording day; over 6 months this grows to about 40 TB (see the back-of-the-envelope sketch at the end of this section)
* Experimenter-side post-processing (e.g., spike sorting) adds to the storage load
* Data sets must be transported between collaborators, labs, and computer systems

Needed:

* Uniform data structures
* On-demand loading on laptops as well as in HPC environments (see the memory-mapping sketch at the end of this section)
* Multi-user, persistent, versioned, database-like organization

TO BE COMPLETED

Summary
-------

We face new dimensions of data analysis workflow complexity!

Verification and validation in the analysis of electrophysiological data occur on different levels:

* dealing with special data features or artifacts
* advancement and calibration of analysis methods
* validation of a specific hypothesis by a combination of methods

The complexity of the analysis workflow is inflated by:

* dependencies between analysis parameters
* dependencies between different signal types
* advanced, non-trivial surrogate techniques (a minimal sketch appears below)
* additional calibration analyses tuned to these parameters
* combinatorial explosion via (stacked) surrogate techniques (illustrated below)
* sensitivity to parameters, leading to complex parameter spaces

MORE...
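To make the storage figures above concrete, here is a back-of-the-envelope sketch in Python. The sample width (16 bit) and the recording time per day (about 3.5 hours) are assumptions, not stated above; note that the raw total alone comes to roughly 27 TB, so the quoted 40 TB presumably also covers post-processed results such as spike-sorting output.

```python
# Back-of-the-envelope storage estimate (assumptions: 16-bit samples,
# ~3.5 hours of recording per day, ~180 recording days; the text above
# states only the rounded totals).
sampling_rate = 30_000        # samples per second per channel
n_channels = 200
bytes_per_sample = 2          # int16 ADC samples (assumption)

rate = sampling_rate * n_channels * bytes_per_sample   # 12 MB/s
per_day = rate * 3.5 * 3600                            # ~151 GB per day
raw_total = per_day * 180                              # ~27 TB raw

print(f"{rate / 1e6:.0f} MB/s, {per_day / 1e9:.0f} GB/day, "
      f"{raw_total / 1e12:.1f} TB raw over 6 months")
```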
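On-demand loading can be approximated even without a dedicated framework by memory-mapping the raw file, so that only the slices actually accessed are read from disk. A minimal sketch, assuming a flat binary file of interleaved int16 samples (the file name and layout are hypothetical):

```python
import numpy as np

n_channels = 200
fs = 30_000                       # sampling rate in Hz

# Memory-map the raw recording instead of reading ~150 GB into RAM.
# Layout assumption: flat binary, int16, samples interleaved by channel.
raw = np.memmap("session.dat", dtype=np.int16, mode="r")
raw = raw.reshape(-1, n_channels)

# Only the touched pages are read from disk: here, 1 s of channel 17.
one_second = np.asarray(raw[:fs, 17])
```

The same pattern works on a laptop and on an HPC node, since memory use is bounded by the slices requested rather than by the file size.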
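As an example of the surrogate techniques mentioned in the summary, spike-time dithering is one common approach: each spike is shifted by a small random offset, which destroys fine temporal correlations between neurons while roughly preserving each neuron's firing-rate profile. A minimal sketch (the dither width and spike times are illustrative):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def dithered_surrogates(spike_times, dither=0.015, n_surrogates=1000):
    """Generate surrogates by shifting each spike independently by up
    to +/- `dither` seconds, destroying fine temporal correlations
    while roughly preserving the firing-rate profile."""
    jitter = rng.uniform(-dither, dither,
                         size=(n_surrogates, len(spike_times)))
    return np.sort(spike_times[None, :] + jitter, axis=1)

# Example: 1000 surrogates of a short spike train (times in seconds)
train = np.array([0.012, 0.105, 0.230, 0.480, 0.733])
surrogates = dithered_surrogates(train)
print(surrogates.shape)   # (1000, 5)
```

An empirical correlation statistic is then compared against its distribution over the surrogates to assess significance.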
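The combinatorial explosion becomes obvious once parameter sets are stacked with surrogates: every parameter combination has to be analyzed against its own surrogate distribution. A small sketch with hypothetical parameter values:

```python
from itertools import product

# Hypothetical analysis parameters (values chosen for illustration)
bin_sizes = [1, 2, 5]               # ms
window_lengths = [50, 100]          # ms
significance_levels = [0.05, 0.01]
n_surrogates = 1000                 # surrogates per combination

combos = list(product(bin_sizes, window_lengths, significance_levels))
print(f"{len(combos)} parameter combinations "
      f"x {n_surrogates} surrogates = {len(combos) * n_surrogates} runs")
# -> 12 parameter combinations x 1000 surrogates = 12000 runs
```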