MOSAIC

A scalable framework for fMRI dataset aggregation and modeling of human vision

Benjamin Lahner1,2,3,4, Mayukh Deb5,6, Apurva Ratan Murty5,6, Aude Oliva1
1 Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA, USA.
2 Department of Ophthalmology, Byers Eye Institute, Stanford University School of Medicine, Stanford University, Stanford, CA, USA.
3 Stanford Bio-X, Stanford University, Stanford, CA, USA.
4 Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA, USA.
5 Cognition and Brain Science, School of Psychology, Georgia Tech, Atlanta, GA, USA.
6 Computational Cognition, Georgia Tech, Atlanta, GA, USA.

Why MOSAIC?

Human fMRI neuroscience needs massive scale to keep up with modern deep learning and produce generalizable results. Isolated fMRI experiments cannot get us there. MOSAIC unifies existing (and future) fMRI datasets with a shared preprocessing pipeline and a cross-dataset test/train data split. Now researchers can train bigger brain models, test results across datasets, and contribute their own datasets to shape the future of MOSAIC and computational neuroscience.

Get started with MOSAIC in seconds!

Use our Python package to access eight of the largest fMRI datasets in just a few lines of code!

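# In a terminal:
pip install mosaic-dataset

# In Python:
import mosaic

# Map each dataset name to a list of subject numbers, or "all" for every subject.
dataset = mosaic.load(
    names_and_subjects={
        "NSD": [2, 3],
        "deeprecon": "all",
    },
    folder="./MOSAIC",
)

# Inspect the first item.
print(dataset[0])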

Alternatively, you can browse the S3 bucket to download data manually or use the AWS command line interface.

aws s3 ls --no-sign-request s3://mosaicfmri/
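
For scripted downloads, the same anonymous access can be wrapped in Python. The following is a minimal sketch, not part of the MOSAIC package: it assumes the AWS CLI is installed, the helper names `build_sync_command` and `download` are hypothetical, and the prefix `some/prefix/` is a placeholder to replace with a real path found via the `ls` command above.

```python
import shlex
import subprocess

BUCKET = "s3://mosaicfmri/"  # bucket URI from the listing command above


def build_sync_command(prefix: str, dest: str, dry_run: bool = True) -> list[str]:
    """Build an anonymous `aws s3 sync` command for one prefix of the bucket.

    `prefix` is a path inside the bucket (hypothetical here), `dest` is a
    local folder. With dry_run=True the CLI only reports what it would copy.
    """
    source = BUCKET + prefix.lstrip("/")
    cmd = ["aws", "s3", "sync", "--no-sign-request", source, dest]
    if dry_run:
        cmd.append("--dryrun")
    return cmd


def download(prefix: str, dest: str, dry_run: bool = True) -> None:
    """Run the sync command; raises CalledProcessError if the CLI fails."""
    subprocess.run(build_sync_command(prefix, dest, dry_run), check=True)


if __name__ == "__main__":
    # Only print the command here; call download(...) to actually run it.
    print(shlex.join(build_sync_command("some/prefix/", "./MOSAIC")))
```

Starting with `dry_run=True` is a cautious default, since it shows what would be transferred before committing to a large download.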

Abstract

Recent large-scale vision fMRI datasets have been invaluable resources to the vision neuroscience community for their deep sampling of individual subjects and diverse stimulus sets. However, practical limitations to the number of subjects, stimuli, and trials that can be collected prevent individual fMRI datasets from reaching the scale necessary for modern modeling approaches and robust conclusions. Here, we introduce MOSAIC (Meta-Organized Stimuli And fMRI Imaging data for Computational modeling), an fMRI dataset aggregation framework designed to leverage the richness of individual datasets for computationally intensive modeling and robust tests of generalization. MOSAIC is composed of eight large-scale vision fMRI datasets totaling 93 subjects, 430,007 fMRI-stimulus pairs, and 162,839 naturalistic and artificial stimuli. A shared fMRI preprocessing pipeline and a filtered test-train split minimize dataset-specific confounds and test-set leakage when aggregating the datasets. Crucially, this rigorous procedure can be applied to additional datasets post hoc, allowing MOSAIC to evolve according to the community's interests. We use MOSAIC to show that perceptually diverse stimulus sets consistently improve decoding accuracy and stability, carrying implications for future fMRI stimulus set design. We then jointly train brain-optimized encoding models across subjects and datasets to predict fMRI activity of all visual cortex and even the whole brain. In silico functional localizer experiments performed on these digital twin models are able to recover subject-specific category-selective cortical regions. Together, these results show that MOSAIC provides a scalable and community-driven solution to build robust models of human vision.
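
To illustrate why a filtered split matters when datasets are pooled, here is a minimal sketch of one possible filter. It is an assumption-laden illustration, not the MOSAIC procedure: it assumes stimuli can be matched across datasets by an exact shared ID, and the function name `filter_cross_dataset_leakage`, the tuple layout, and the dataset names "A" and "B" are all hypothetical. The criterion MOSAIC actually uses is not described on this page.

```python
def filter_cross_dataset_leakage(trials):
    """Drop training trials whose stimulus is a test stimulus in any dataset.

    trials: iterable of (dataset, stimulus_id, split) tuples, where split is
    "train" or "test". Matching by exact stimulus_id is an assumption.
    """
    trials = list(trials)  # allow two passes over any iterable

    # Pass 1: collect every stimulus used for testing, in any dataset.
    test_stimuli = {stim for _, stim, split in trials if split == "test"}

    # Pass 2: keep all test trials, and only training trials that do not
    # reuse a test stimulus.
    kept = []
    for dataset, stim, split in trials:
        if split == "train" and stim in test_stimuli:
            continue  # this trial would leak a test stimulus into training
        kept.append((dataset, stim, split))
    return kept


if __name__ == "__main__":
    pooled = [
        ("A", "img1", "train"),
        ("A", "img2", "test"),
        ("B", "img2", "train"),  # same stimulus as A's test item
        ("B", "img3", "train"),
    ]
    for row in filter_cross_dataset_leakage(pooled):
        print(row)
```

In this toy case the training trial for `img2` in dataset "B" is dropped, because the same stimulus is a test item in dataset "A". Without the filter, a model trained on the pooled data would have seen a test stimulus during training.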

Acknowledgements

We thank Amazon's AWS Open Data Sponsorship Program for hosting the data and acknowledge the Multidisciplinary University Research Initiative (MURI) award from the Army Research Office (grant No. W911NF-23-1-0277) to A.O.

Citation

Coming Soon!