Best practices for managing complex workflows: data

  • formal version control systems don’t work well with large data files, or with non-text files (but see git media)
  • use open formats where possible
  • store metadata with, or close to, data
  • ensure everything has a time stamp
  • never change a data file (e.g. to remove artifacts or bad recordings) unless it is under version control; instead, copy the file and edit the copy, recording how you got from one to the other. To enforce this, make raw data read-only. Ideally, the transformation should be scripted/automated.
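
The last three points can be combined in a small script. What follows is a minimal sketch, not a prescribed tool: the function names (sha256_of, make_read_only, derive_copy), the ".provenance.json" sidecar suffix and the fields of the record are all illustrative assumptions. It marks the raw file read-only, makes a writable working copy, and stores a time-stamped metadata record next to the copy.

```python
import hashlib
import json
import os
import shutil
import stat
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_read_only(path):
    """Clear the write permission bits for user, group and others."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))


def derive_copy(raw_path, derived_path, description):
    """Protect the raw file, copy it, and write a provenance sidecar.

    The sidecar name and its fields are hypothetical; adapt them to
    whatever metadata convention your lab already uses.
    """
    raw_path, derived_path = Path(raw_path), Path(derived_path)
    make_read_only(raw_path)
    # copyfile copies the contents only, so the copy stays writable
    shutil.copyfile(raw_path, derived_path)
    record = {
        "source": str(raw_path),
        "source_sha256": sha256_of(raw_path),
        "derived": str(derived_path),
        "description": description,
        "created_utc": datetime.now(timezone.utc).isoformat(),
    }
    sidecar = derived_path.with_name(derived_path.name + ".provenance.json")
    sidecar.write_text(json.dumps(record, indent=2))
    return sidecar
```

A call such as derive_copy("rec01.dat", "rec01_clean.dat", "remove 50 Hz artifact") (file names hypothetical) would then be followed by the scripted transformation itself, applied to the copy only. The checksum of the raw file lets you verify later that it has not changed.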

The curse of file formats

In MRI, there are only a handful of scanner manufacturers (the main ones being GE, Siemens, Philips, Hitachi and Toshiba), and hospitals exert pressure for interoperability and standard file formats (NIfTI, DICOM). In electrophysiology, by contrast, there are dozens of providers of recording equipment, each of which also provides its own acquisition (and often analysis) software, and each of which uses its own proprietary file format.