A Systematic Comparison of fMRI-to-video Reconstruction Techniques
Presented at ICML 2024's Controllable Video Generation (CVG) workshop
Camilo Fosco*, Benjamin Lahner*, Bowen Pan, Alex Andonian, Emilie Josephs, Alex Lascelles, Aude Oliva
Summary
We reconstruct videos from functional magnetic resonance imaging (fMRI) data recorded while human participants viewed those videos. This work compares reconstruction
quality across methodologies. Specifically, we vary the backbone video generation model among Zeroscope V2 [1], Modelscope [2], Stable
Video Diffusion [3], and Hotshot-XL [4], and assess reconstruction quality on videos from the BOLD Moments Dataset (BMD) [5] and the
CC2017 dataset [6]. We find that Zeroscope V2 performs best on both BMD and CC2017, and that the general pipeline achieves better quantitative and qualitative
reconstructions than other leading methodologies. We describe the full reconstruction pipeline in detail in our
ECCV 2024 paper. Interactive reconstruction examples will be added to this webpage soon.
Acknowledgements
This research was funded by the Multidisciplinary University Research Initiative (MURI) award from the Army Research Office (grant No. W911NF-23-1-0277) to A.O.,
and by MIT EECS MathWorks Fellowships to B.L. and C.F.
References
- Hysts. Zeroscope v2. https://huggingface.co/spaces/hysts/zeroscope-v2, 2024. Accessed: 2024-06-05.
- Wang, J., Yuan, H., Chen, D., Zhang, Y., Wang, X., and Zhang, S. Modelscope text-to-video technical report. arXiv preprint arXiv:2308.06571, 2023.
- Blattmann, A., Dockhorn, T., Kulal, S., Mendelevitch, D., Kilian, M., Lorenz, D., Levi, Y., English, Z., Voleti, V., Letts, A., et al. Stable video diffusion: Scaling latent video diffusion models to large datasets. arXiv preprint arXiv:2311.15127, 2023.
- Mullan, J., Crawbuck, D., and Sastry, A. Hotshot-XL, October 2023. URL https://github.com/hotshotco/hotshot-xl.
- Lahner, B., Dwivedi, K., Iamshchinina, P., Graumann, M., Lascelles, A., Roig, G., ... and Cichy, R. Modeling short visual events through the BOLD moments video fMRI dataset and metadata. Nature Communications, 15(1):6241, 2024.
- Wen, H., Shi, J., Zhang, Y., Lu, K. H., Cao, J., and Liu, Z. Neural encoding and decoding with deep learning for dynamic natural vision. Cerebral Cortex, 28(12):4136-4160, 2018.