A Systematic Comparison of fMRI-to-video Reconstruction Techniques

Presented at ICML 2024's Controllable Video Generation (CVG) workshop

Camilo Fosco*, Benjamin Lahner*, Bowen Pan, Alex Andonian, Emilie Josephs, Alex Lascelles, Aude Oliva

Summary

We reconstruct videos from functional magnetic resonance imaging (fMRI) data recorded while human participants viewed those videos. This work compares reconstruction quality across methodologies. Specifically, we vary the backbone video generation model among Zeroscope V2 [1], Modelscope [2], Stable Video Diffusion [3], and Hotshot-XL [4], and assess reconstruction quality on videos from the BOLD Moments Dataset (BMD) [5] and the CC2017 dataset [6]. We find that Zeroscope V2 performs best on both BMD and CC2017, and that the general pipeline achieves better quantitative and qualitative reconstructions than other leading methodologies. We describe the full reconstruction pipeline in detail in our ECCV 2024 paper. Interactive reconstruction examples will be added to this webpage soon.
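As a rough illustration of what comparing backbones on a shared test set can look like, the sketch below scores each backbone's reconstructions against the ground-truth videos and ranks the backbones by mean score. It is not the evaluation used in the paper: the metric (frame-wise Pearson correlation), the function names `frame_correlation` and `rank_backbones`, and the array layout `(frames, height, width, channels)` are all assumptions made for this example.

```python
import numpy as np


def frame_correlation(recon, target):
    """Mean Pearson correlation between corresponding frames of two videos.

    Both arrays are assumed to have shape (frames, height, width, channels).
    This metric is a stand-in chosen for illustration only.
    """
    # Flatten each frame to a vector; astype makes a float copy we can modify.
    r = recon.reshape(recon.shape[0], -1).astype(np.float64)
    t = target.reshape(target.shape[0], -1).astype(np.float64)
    r -= r.mean(axis=1, keepdims=True)
    t -= t.mean(axis=1, keepdims=True)
    denom = np.linalg.norm(r, axis=1) * np.linalg.norm(t, axis=1)
    # Constant frames have zero variance; score them as 0 instead of dividing by 0.
    safe_denom = np.where(denom > 0, denom, 1.0)
    corr = np.where(denom > 0, (r * t).sum(axis=1) / safe_denom, 0.0)
    return float(corr.mean())


def rank_backbones(recons_by_backbone, targets):
    """Return (backbone name, mean score) pairs sorted from best to worst.

    recons_by_backbone maps a backbone name to a list of reconstructed videos,
    ordered the same way as the list of ground-truth videos in `targets`.
    """
    scores = {
        name: float(np.mean([frame_correlation(r, t) for r, t in zip(recons, targets)]))
        for name, recons in recons_by_backbone.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Because every backbone is scored on the same ground-truth videos with the same metric, differences in the ranking reflect the backbone and not the test set. A real comparison would typically report several metrics, including semantic and temporal ones, and not rely on a single pixel-level score.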

[Figure: reconstruction comparison overview]

Acknowledgements

This research was funded by the Multidisciplinary University Research Initiative (MURI) award from the Army Research Office (grant No. W911NF-23-1-0277) to A.O.; the MIT EECS MathWorks Fellowship to B.L.; and the MIT EECS MathWorks Fellowship to C.F.

References

  1. Hysts. Zeroscope v2. https://huggingface.co/spaces/hysts/zeroscope-v2, 2024. Accessed: 2024-06-05.
  2. Wang, J., Yuan, H., Chen, D., Zhang, Y., Wang, X., and Zhang, S. Modelscope text-to-video technical report. arXiv preprint arXiv:2308.06571, 2023.
  3. Blattmann, A., Dockhorn, T., Kulal, S., Mendelevitch, D., Kilian, M., Lorenz, D., Levi, Y., English, Z., Voleti, V., Letts, A., et al. Stable video diffusion: Scaling latent video diffusion models to large datasets. arXiv preprint arXiv:2311.15127, 2023.
  4. Mullan, J., Crawbuck, D., and Sastry, A. Hotshot-XL, October 2023. URL https://github.com/hotshotco/hotshot-xl.
  5. Lahner, B., Dwivedi, K., Iamshchinina, P., Graumann, M., Lascelles, A., Roig, G., ... & Cichy, R. (2024). Modeling short visual events through the BOLD moments video fMRI dataset and metadata. Nature Communications, 15(1), 6241.
  6. Wen, H., Shi, J., Zhang, Y., Lu, K. H., Cao, J., & Liu, Z. (2018). Neural encoding and decoding with deep learning for dynamic natural vision. Cerebral Cortex, 28(12), 4136-4160.