The need for better workflows in data analysis of electrophysiological signals
What is a typical workflow for data analysis?
- Data is recorded (or simulated)
- Data is stored
- Data is pre-processed
- Data is analyzed
- A paper is written
Where are the complications in such a seemingly simple workflow?
New challenges in dealing with new qualities of electrophysiological data: complexity
- Stimulus sets and behavior with difficult or high-dimensional parametrization
- Multi-modal approaches
- Massively parallel data streams from 200 electrodes or more
- Research questions that aim to exploit the parallel/complex nature of such data: correlations
Impacts on data analysis workflows
Features of modern experiments:
- Large size
- Various data types
- Complicated stimuli/behavior
- Involved, complex analysis
- → Collaborative effort
Impacts:
- Managing data/results
- Linking analyses
- Computational load
- Reproducibility
- Code verification
- Methods validation
- Experimental complexity
- Workflow and collaboration
Managing data/results: Dramatic increase in data volume
Centralized data storage necessary:
- Raw signals at 30 kHz, 200 channels, 150 GB per day, for 6 months → 40 TB
- Experimenter-side post-processing (e.g., spike sorting) adds to storage load
- Transport of data sets between collaborators / labs / computer systems
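The raw-signal figures above can be put into perspective with a back-of-the-envelope calculation. The sketch below takes the sampling rate, channel count, and daily volume from the list; the sample width of 2 bytes (16-bit integers) and the use of decimal gigabytes are assumptions, not values stated here.

```python
# Rough estimate of the raw data rate and the recording time it implies.
SAMPLING_RATE_HZ = 30_000     # from the list above: 30 kHz
N_CHANNELS = 200              # from the list above
DAILY_VOLUME_BYTES = 150e9    # from the list above: 150 GB per day (decimal GB assumed)
BYTES_PER_SAMPLE = 2          # assumption: 16-bit integer samples


def raw_rate_bytes_per_s(rate_hz, n_channels, bytes_per_sample):
    """Bytes written per second when all channels are sampled continuously."""
    return rate_hz * n_channels * bytes_per_sample


def implied_recording_hours(daily_bytes, rate_bytes_per_s):
    """Hours of continuous recording needed to produce the daily volume."""
    return daily_bytes / rate_bytes_per_s / 3600.0


rate = raw_rate_bytes_per_s(SAMPLING_RATE_HZ, N_CHANNELS, BYTES_PER_SAMPLE)
hours = implied_recording_hours(DAILY_VOLUME_BYTES, rate)

print(f"raw rate: {rate / 1e6:.1f} MB/s")
print(f"implied recording time: {hours:.1f} h per day")
```

Under these assumptions the raw stream is about 12 MB/s, so 150 GB corresponds to roughly 3.5 hours of recording per day.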
Needed:
- Uniform data structures
- On-demand loading on laptops and HPC environments
- Multi-user, persistent, versioned, database-like organization
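As an illustration of what on-demand loading could look like, the following minimal sketch reads only a short time window of one channel from a flat binary file instead of loading the whole recording. The file layout (sample-major, native-endian 16-bit integers) and the helper names `write_demo_file` and `load_window` are hypothetical and chosen only for this example.

```python
import array
import os
import tempfile

BYTES_PER_SAMPLE = 2  # assumption: 16-bit integer samples ("h" typecode)


def write_demo_file(path, n_channels, n_samples):
    """Write a tiny synthetic recording; value = sample_index * 10 + channel."""
    data = array.array(
        "h", (s * 10 + c for s in range(n_samples) for c in range(n_channels))
    )
    with open(path, "wb") as f:
        data.tofile(f)


def load_window(path, n_channels, channel, start, stop):
    """Return samples [start, stop) of one channel, reading only that window."""
    frame_bytes = n_channels * BYTES_PER_SAMPLE  # one time step, all channels
    with open(path, "rb") as f:
        f.seek(start * frame_bytes)              # jump to the first requested frame
        block = f.read((stop - start) * frame_bytes)
    values = array.array("h")
    values.frombytes(block)
    # Samples are interleaved by channel, so pick every n_channels-th value.
    return list(values[channel::n_channels])


with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.raw")
    write_demo_file(demo_path, n_channels=3, n_samples=5)
    window = load_window(demo_path, n_channels=3, channel=1, start=2, stop=4)

print(window)
```

The same access pattern runs unchanged on a laptop and on a cluster node, because only the requested window is ever held in memory. For large arrays, a memory-mapped array such as `numpy.memmap` is one established way to get similar behavior.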
TO BE COMPLETED
Summary
We face new dimensions of data analysis workflow complexity!
Verification and validation in data analysis of electrophysiological data occur on different levels:
- dealing with special data features or artifacts
- advancement and calibration of analysis methods
- validation of a specific hypothesis by a combination of methods
The complexity of the analysis workflow is inflated by:
- dependencies between analysis parameters
- dependencies between different signal types
- advanced, non-trivial surrogate techniques
- additional calibration analyses tuned to parameters
- combinatorial explosion via (stacked) surrogate techniques
- sensitivity of parameters leading to complex parameter spaces
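How quickly the number of analysis runs grows can be sketched with a simple count. All parameter names and values below are hypothetical and serve only to illustrate the combinatorial effect; the count also ignores stacking of surrogate techniques, which would multiply it further.

```python
import itertools

# Hypothetical parameter grid for a single analysis method.
parameter_grid = {
    "bin_size_ms": [1, 2, 5, 10],
    "window_length_ms": [100, 200, 500],
    "significance_level": [0.01, 0.05],
}
n_surrogate_methods = 3          # hypothetical number of surrogate techniques
n_surrogates_per_method = 1000   # hypothetical number of surrogate realizations


def count_analysis_runs(grid, n_methods, n_surrogates):
    """Count runs: every parameter combination on the original data and on all surrogates."""
    n_combinations = sum(1 for _ in itertools.product(*grid.values()))
    runs_per_combination = 1 + n_methods * n_surrogates  # original + surrogates
    return n_combinations * runs_per_combination


n_runs = count_analysis_runs(
    parameter_grid, n_surrogate_methods, n_surrogates_per_method
)
print(f"analysis runs required: {n_runs}")
```

Even this small grid of 24 parameter combinations leads to 72024 runs, which is why bookkeeping of parameters and results becomes part of the workflow problem.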
MORE...