The need for better workflows in data analysis of electrophysiological signals

What is a typical workflow for data analysis?

  • Data is recorded (or simulated)
  • Data is stored
  • Data is pre-processed
  • Data is analyzed
  • A paper is written

Where are the complications in such a seemingly simple workflow?

New challenges arising from new qualities of electrophysiological data: complexity

  • Stimulus sets and behavior with difficult or high-dimensional parametrization
  • Multi-modal approaches
  • Massively parallel data streams from 200 or more electrodes
  • Research questions that aim to exploit the parallel and complex nature of such data (e.g., correlations)
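Correlation analyses scale poorly with channel count. The number of distinct electrode pairs grows as n(n-1)/2, and the number of larger electrode groups grows faster still. The short Python illustration below counts them; the function names are hypothetical and only `math.comb` from the standard library is used.

```python
from math import comb


def n_pairs(n_electrodes: int) -> int:
    """Number of distinct electrode pairs, n * (n - 1) / 2."""
    return comb(n_electrodes, 2)


def n_groups(n_electrodes: int, group_size: int) -> int:
    """Number of distinct electrode groups of the given size."""
    return comb(n_electrodes, group_size)


for n in (10, 100, 200):
    print(f"{n} electrodes: {n_pairs(n)} pairs, {n_groups(n, 3)} triplets")
```

For 200 electrodes this gives 19,900 pairs and 1,313,400 triplets. Every one of these combinations then has to pass through the full analysis and its statistical testing.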

Impacts on data analysis workflows

Features of modern experiments:

  • Large size
  • Various data types
  • Complicated stimuli/behavior
  • Involved, complex analysis
  • → Collaborative effort

Impacts:

  • Managing data/results
  • Linking analyses
  • Computational load
  • Reproducibility
  • Code verification
  • Methods validation
  • Experimental complexity
  • Workflow and collaboration
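Several of these impacts (managing results, linking analyses, reproducibility) can be supported by giving every result an identifier derived from whatever produced it. The Python sketch below shows one common approach, content hashing. The helper `result_key` and the version labels are hypothetical and are not part of any specific tool. The sketch assumes the parameters are JSON-serializable.

```python
import hashlib
import json


def result_key(input_data: bytes, params: dict, code_version: str) -> str:
    """Deterministic identifier for an analysis result (hypothetical helper).

    The key changes whenever the input data, the parameters, or the code
    version change, and is identical otherwise.
    """
    payload = {
        "input_sha256": hashlib.sha256(input_data).hexdigest(),  # input fingerprint
        "params": params,
        "code_version": code_version,
    }
    # sort_keys=True makes the key independent of dict insertion order
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


raw = b"\x00\x01\x02\x03"  # stands in for the bytes of a raw data file
sorting_key = result_key(raw, {"threshold_sd": 4.0}, "sorter-v1")
# A downstream analysis takes the upstream key as its input, linking the two.
corr_key = result_key(sorting_key.encode("utf-8"), {"bin_size_ms": 5}, "corr-v1")
same_key = result_key(raw, {"threshold_sd": 4.0}, "sorter-v1")
print(sorting_key == same_key, corr_key[:12])
```

Because the downstream key is computed from the upstream key, changing a spike sorting parameter automatically changes the identifiers of all dependent results.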

Managing data/results: Dramatic increase in data volume

Centralized data storage necessary:

  • Raw signals at 30 kHz on 200 channels: 150 GB per day; over 6 months → 40 TB
  • Experimenter-side post-processing (e.g., spike sorting) adds to storage load
  • Transport of data sets between collaborators / labs / computer systems
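The storage figures follow from simple arithmetic. The Python sketch below assumes 16-bit samples and daily recordings over about 182 days. Neither assumption is stated on the slide. Under these assumptions, 200 channels at 30 kHz produce 12 MB/s, so 150 GB corresponds to roughly 3.5 hours of recording per day. Raw signals alone then come to about 27 TB over six months. The remainder of the 40 TB estimate would have to come from sources this sketch does not model, such as post-processed data or additional recording days.

```python
# Assumptions (not stated on the slide): 16-bit samples, ~182 recording days.
SAMPLING_RATE_HZ = 30_000
N_CHANNELS = 200
BYTES_PER_SAMPLE = 2       # int16 samples (assumption)
GB_PER_DAY = 150           # figure from the slide
RECORDING_DAYS = 182       # about 6 months of daily sessions (assumption)

# Data rate of the raw signals: 30,000 * 200 * 2 = 12,000,000 bytes/s
rate_bytes_per_s = SAMPLING_RATE_HZ * N_CHANNELS * BYTES_PER_SAMPLE
gb_per_hour = rate_bytes_per_s * 3600 / 1e9        # 43.2 GB per hour
hours_per_day = GB_PER_DAY / gb_per_hour           # ~3.5 h of recording per day
raw_tb_total = GB_PER_DAY * RECORDING_DAYS / 1000  # 27.3 TB of raw data

print(f"{rate_bytes_per_s / 1e6:.1f} MB/s, {gb_per_hour:.1f} GB/h")
print(f"{hours_per_day:.1f} h/day -> {raw_tb_total:.1f} TB over {RECORDING_DAYS} days")
```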

Needed:

  • Uniform data structures
  • On-demand loading on laptops and HPC environments
  • Multi-user, persistent, versioned, database-like organization
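Uniform data structures and on-demand loading can be combined. The container holds only metadata, and samples are read from disk when a time window is requested. The Python sketch below uses `numpy.memmap` for this purpose. The class `RawRecording` is hypothetical. The sketch assumes a flat binary file of int16 samples stored interleaved, with shape (samples, channels). It covers neither multi-user access nor versioning.

```python
import os
import tempfile
from dataclasses import dataclass

import numpy as np


@dataclass
class RawRecording:
    """Hypothetical uniform container for a flat binary recording.

    Samples are assumed to be stored interleaved (sample-major), i.e. with
    shape (n_samples, n_channels) in C order.
    """
    path: str
    sampling_rate_hz: float
    n_channels: int
    dtype: str = "int16"

    def _memmap(self) -> np.memmap:
        itemsize = np.dtype(self.dtype).itemsize
        n_samples = os.path.getsize(self.path) // (itemsize * self.n_channels)
        # mode="r" maps the file read-only; data is read only when indexed
        return np.memmap(self.path, dtype=self.dtype, mode="r",
                         shape=(n_samples, self.n_channels))

    def load(self, t_start_s: float, t_stop_s: float, channels=None) -> np.ndarray:
        """Load only the requested time window (and channels) into memory."""
        data = self._memmap()
        i0 = int(round(t_start_s * self.sampling_rate_hz))
        i1 = int(round(t_stop_s * self.sampling_rate_hz))
        window = data[i0:i1] if channels is None else data[i0:i1, channels]
        return np.array(window)  # copy the slice out of the memory map


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "session.bin")
    # Synthetic test file: 1 s at 1000 Hz on 4 channels, values 0..3999
    np.arange(4000, dtype="int16").reshape(1000, 4).tofile(path)
    rec = RawRecording(path, sampling_rate_hz=1000, n_channels=4)
    chunk = rec.load(0.1, 0.2, channels=[0, 2])
    print(chunk.shape)  # (100, 2)
```

The same `load` call works on a laptop or on an HPC node, because only the requested slice is read into memory, not the whole file.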

TO BE COMPLETED

Summary

We face new dimensions of data analysis workflow complexity!

Verification and validation in the analysis of electrophysiological data occur on different levels:

  • dealing with special data features or artifacts
  • advancement and calibration of analysis methods
  • validation of a specific hypothesis by a combination of methods

The complexity of the analysis workflow is inflated by:

  • dependencies between analysis parameters
  • dependencies between different signal types
  • advanced, non-trivial surrogate techniques
  • additional calibration analyses tuned to parameters
  • combinatorial explosion via (stacked) surrogate techniques
  • sensitivity of parameters leading to complex parameter spaces
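The combinatorial explosion can be made concrete by counting analysis evaluations. The Python sketch below uses a hypothetical parameter grid and assumed surrogate counts, and all values are illustrative only. It assumes two stacked surrogate levels. The original result is tested against 1000 surrogates, and each of those surrogates is in turn tested against 100 second-level surrogates, for example to calibrate the test.

```python
from itertools import product

# Hypothetical parameter grid for one correlation analysis (illustrative values)
param_grid = {
    "bin_size_ms": [1, 2, 5, 10],
    "window_length_ms": [50, 100, 200],
    "significance_level": [0.01, 0.05],
}
N_PAIRS = 19_900        # distinct pairs among 200 electrodes
N_SURR_LEVEL1 = 1000    # surrogates used to test each original result (assumed)
N_SURR_LEVEL2 = 100     # surrogates used to test each level-1 surrogate (assumed)


def evaluations_per_configuration(n1: int, n2: int) -> int:
    """Original data, plus n1 surrogates, each tested against n2 of its own."""
    return 1 + n1 * (1 + n2)


# Every combination of parameter values is a separate configuration.
configs = list(product(*param_grid.values()))
per_config = evaluations_per_configuration(N_SURR_LEVEL1, N_SURR_LEVEL2)
total = N_PAIRS * len(configs) * per_config

print(f"{len(configs)} configurations x {per_config} evaluations x {N_PAIRS} pairs")
print(f"= {total:,} evaluations of the analysis")
```

With these illustrative numbers the total is 48,238,077,600 evaluations. Each additional parameter or surrogate level multiplies this count again.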

MORE...