Wednesday, 09 December 2009

The Olin Center was started back in 2003, with its first recorded scans in April. The archiving and retrieval method at the time was a simple web interface that had access to only the past 30 days' worth of data. The rest of the data was archived to DVD. After a couple of years it became tough to find the data you wanted, because you'd need to sift through 200 DVDs to find the right MRI scans. It became even more challenging if you needed data from 100 subjects. So in 2005 I built 4 servers to house the MRI data and created a simple web interface to allow people to search and download the data directly to our analysis servers. The whole system was dubbed "All Data Online", or the adoserver for short.

In 2007 I rebuilt the system, distributing the data between 2 servers. At the time, there was approximately 6.5TB of data stored on the servers, all instantly searchable and downloadable. Earlier this year, I needed to rebuild the system again and placed everything on a single server. The trend from 4 servers to 2 to 1 is due to the tremendous drop in the price of disk space. It's just more affordable to have 14TB of space on a single server than 2 servers with 7TB each. It's also easier to maintain. Since the amount of data transferred on and off the server is about 8GB/day, there's no bottleneck in keeping it in one place.

Now, at the end of 2009, after 7 years of collecting MRI data, our data is archived in triplicate on 1200 DVDs, and we have more than 7500 MRI studies and 170,000 series of MRI data, stored in 4 different formats. In total there are 12TB of data in approximately 68 million files, and the system has used 160 CPU-days to process and archive the data. There are also new reporting, auditing, and trend monitoring tools. Searches are faster and more comprehensive, with thumbnail image results. There have been approximately 40,000 data requests since the system was created, though the counter was reset with each new system.



The system is used to store structural and functional data, which are later analyzed in SPM2, SPM5, and even SPM8, as well as FreeSurfer, FSL, and VBM. The benefit of such a storage system is that any investigator can come in and find N subjects who have run a particular task over the past 7 years. This provides an enormous set of data from which people can test their hypotheses.
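
To give a feel for what "find N subjects who have run a particular task" might look like, here is a minimal sketch in Python using an in-memory SQLite table. This is not how the adoserver is actually implemented; the `series` table, its columns, the task names, the subject IDs, and the `find_subjects` function are all made up for illustration.

```python
import sqlite3


def find_subjects(conn, task, limit):
    """Return up to `limit` distinct subject IDs with at least one series for `task`."""
    rows = conn.execute(
        "SELECT DISTINCT subject_id FROM series WHERE task = ? "
        "ORDER BY subject_id LIMIT ?",
        (task, limit),
    ).fetchall()
    return [row[0] for row in rows]


# Hypothetical stand-in for the archive index (not the real schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE series (subject_id TEXT, task TEXT, scan_date TEXT)")
conn.executemany(
    "INSERT INTO series VALUES (?, ?, ?)",
    [
        ("S001", "oddball", "2004-03-02"),
        ("S001", "oddball", "2006-07-19"),  # same subject scanned again
        ("S002", "sternberg", "2005-11-30"),
        ("S003", "oddball", "2009-01-12"),
    ],
)

subjects = find_subjects(conn, "oddball", 10)
print(subjects)
```

The `DISTINCT` matters here because a subject who ran the same task in several sessions should only count once toward N.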

I also included some colorful reporting functions to visualize usage by day. Not terribly necessary, but neat to look at:

