The Spatiotemporal Power Spectrum of Human Vision

Exploring the space and time content of natural vision

Vision Science PhD candidate Vasha DuTell, a member of the Olshausen Lab, and Dr. Agostino Gibaldi of the Banks Lab are exploring the space and time content of natural vision. They seek to understand the statistics of how the world changes in space and in time, and how fast and slow things move in the environment for a human observer. They are also investigating how body, head, and eye motion modulate the statistics of the signal. To collect data, DuTell and team are using a custom-built setup: a mobile computer running a head-mounted eye tracker and a high-frame-rate camera, along with a depth sensor and two motion detectors.

Data Collection

The research team designed and built a custom, wearable system that combines a head-mounted camera, a binocular eye tracker, and motion sensors, tracking world motion, eye motion, and head and body motion at 200 frames per second. Videos were collected from both a static camera mounted on a tripod near the subject and a camera mounted on the head of the subject performing natural tasks.

Body and Eye Motion

While previous work has studied the spatiotemporal statistics of natural signals, none has been able to account for the contribution of body and eye motion to the signal that reaches the retina, especially at such high spatial and temporal resolution. Using their custom-built system, the researchers were able to separate the contribution of each of these motion types, which will help determine how their statistical properties combine to produce the final signal on the retina. For each task in the study, researchers recorded data in three conditions: the system sitting on a mannequin head on a tripod (environmental motion only), the raw video from the head-mounted camera (body and head motion), and the head-mounted camera video with the subject's eye motion overlaid (retinal signal). Motion sensors were placed on both the body and the head so that the contributions of head versus body motion to the signal could be measured separately. Using a depth sensor, the researchers were also able to simulate the blur present in the peripheral regions of the retina.
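To make the third condition concrete, here is a minimal sketch of one way eye motion could be combined with head-camera video: re-center each frame on the recorded gaze position, so that the crop approximates what falls around the fovea. This is an illustration only, not the team's actual pipeline. The function name, the integer pixel gaze coordinates, and the zero padding at the frame edges are all assumptions.

```python
import numpy as np

def gaze_centered_crops(video, gaze_xy, half_size):
    """Crop a square window around the gaze point in every frame.

    video:     array of shape (frames, height, width)
    gaze_xy:   integer array of shape (frames, 2), gaze as (x, y) pixels
               (hypothetical format; real eye trackers differ)
    half_size: the crop is (2 * half_size + 1) pixels on each side
    """
    size = 2 * half_size + 1
    # Zero-pad the spatial axes so crops near the border stay in bounds.
    pad = ((0, 0), (half_size, half_size), (half_size, half_size))
    padded = np.pad(video, pad)
    crops = np.empty((video.shape[0], size, size), dtype=video.dtype)
    for i, (x, y) in enumerate(gaze_xy):
        # Original pixel (y, x) sits at (y + half_size, x + half_size)
        # in the padded frame, so this slice is centered on the gaze.
        crops[i] = padded[i, y:y + size, x:x + size]
    return crops

# Toy example: a bright dot that the "eye" tracks perfectly.
frames, height, width = 5, 20, 20
gaze = np.array([(3 + i, 10) for i in range(frames)])
video = np.zeros((frames, height, width))
for i, (x, y) in enumerate(gaze):
    video[i, y, x] = 1.0

retinal = gaze_centered_crops(video, gaze, half_size=4)
```

In the toy example the dot moves across the camera image but stays fixed at the center of every gaze-centered crop, which is the sense in which eye motion changes the motion statistics of the signal.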


Having a better understanding of this signal on the retina has important implications for many different aspects of vision science.

First, it will help scientists better understand the relationship between the human visual system and the environment it evolved in. There is a large body of evidence supporting the idea that visual systems (as well as other sensory systems) use neural coding strategies that are matched to the statistics of the environment. Early vision, for instance, has been theorized to equalize the power spectrum of the scene, allowing it to be transmitted more efficiently down the optic nerve. This implies that, through evolution, the visual system has been optimized to efficiently encode and process the incoming visual signal. A better understanding of the signal present on a person's retina, and of how head, body, and eye motion contribute to it, adds important pieces to the puzzle of how the visual system is adapted to the environment. This information allows researchers to better tune the parameters of theoretical models of human vision, which describe how an optimized system would encode and process information, and so to predict properties of human vision that have not yet been studied.
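The idea of equalizing the power spectrum can be illustrated with a small sketch. It assumes a synthetic image whose amplitude falls off as 1/f (a common idealization of natural scenes, not data from this study) and a simple whitening filter whose gain grows in proportion to spatial frequency. The function names are hypothetical, and the filter is a textbook simplification rather than a model of any specific retinal circuit.

```python
import numpy as np

def radial_frequency(n):
    """Radial spatial frequency (cycles/pixel) on an n x n FFT grid."""
    f = np.fft.fftfreq(n)
    fx, fy = np.meshgrid(f, f)
    return np.hypot(fx, fy)

def make_one_over_f_image(n, rng):
    """Synthetic stand-in for a natural image: amplitude ~ 1/f."""
    radius = radial_frequency(n)
    radius[0, 0] = 1.0  # avoid dividing by zero at the DC term
    spectrum = np.fft.fft2(rng.standard_normal((n, n))) / radius
    spectrum[0, 0] = 0.0  # zero mean
    return np.real(np.fft.ifft2(spectrum))

def whiten(image):
    """Flatten a 1/f amplitude spectrum with a gain proportional to f."""
    radius = radial_frequency(image.shape[0])  # assumes a square image
    return np.real(np.fft.ifft2(np.fft.fft2(image) * radius))

def spectral_slope(image):
    """Least-squares slope of log power against log spatial frequency."""
    radius = radial_frequency(image.shape[0])
    power = np.abs(np.fft.fft2(image)) ** 2
    mask = (radius > 0) & (radius < 0.5)  # skip DC and the grid corners
    slope, _ = np.polyfit(np.log(radius[mask]), np.log(power[mask]), 1)
    return slope

rng = np.random.default_rng(0)
image = make_one_over_f_image(64, rng)
slope_before = spectral_slope(image)         # close to -2 for 1/f amplitude
slope_after = spectral_slope(whiten(image))  # close to 0 after whitening
```

A slope near -2 means most of the power sits at low spatial frequencies. After whitening, power is spread roughly evenly across frequencies, which is the sense in which the signal can be transmitted more efficiently.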

Currently available datasets either do not include all the types of motion present in human vision, include non-human motion such as zoom or scene cuts, or do not include spatial and temporal resolution high enough to match the sensitivity of human perception. Using the system developed by the two labs allows for the collection of a large dataset of high fidelity videos that can be used as stimuli by others in the field to answer a wide variety of scientific questions about vision. Additionally, a description of the statistics present in these movies will pave the way for synthetic stimuli to be designed that can be more easily controlled, yet match the statistics -- including the power spectrum -- of the signal on the retina.
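For readers unfamiliar with the term, a spatiotemporal power spectrum can be estimated by taking a three-dimensional Fourier transform of a video (time, height, width) and squaring its magnitude. The sketch below is a generic illustration, not the labs' analysis code. The function names are hypothetical, and the test stimulus is a synthetic drifting grating; only the 200 frames per second figure comes from the article.

```python
import numpy as np

def spatiotemporal_power(video, fps):
    """Power spectrum of a video of shape (frames, height, width).

    Returns the power array plus the frequency axes: temporal
    frequency in Hz and spatial frequencies in cycles per pixel.
    """
    video = video - video.mean()  # remove the mean luminance (DC term)
    power = np.abs(np.fft.fftn(video)) ** 2
    ft = np.fft.fftfreq(video.shape[0], d=1.0 / fps)
    fy = np.fft.fftfreq(video.shape[1])
    fx = np.fft.fftfreq(video.shape[2])
    return power, ft, fy, fx

def peak_frequency(video, fps):
    """Absolute (temporal, vertical, horizontal) frequency of peak power."""
    power, ft, fy, fx = spatiotemporal_power(video, fps)
    it, iy, ix = np.unravel_index(np.argmax(power), power.shape)
    return abs(ft[it]), abs(fy[iy]), abs(fx[ix])

# Synthetic stimulus: a vertical grating drifting horizontally.
fps = 200                    # frame rate reported in the article
t = np.arange(200) / fps     # one second of video
x = np.arange(32)
temporal_hz = 10.0           # assumed drift rate
spatial_cpp = 4 / 32         # assumed spatial frequency (cycles/pixel)
phase = 2 * np.pi * (spatial_cpp * x[None, None, :]
                     - temporal_hz * t[:, None, None])
grating = np.cos(phase) + np.zeros((1, 32, 1))  # broadcast over rows

peak_t, peak_y, peak_x = peak_frequency(grating, fps)
```

For this grating all of the power lands at a single temporal and spatial frequency. Natural video instead spreads power across a broad range of both, and the shape of that distribution is what a description of the statistics would capture.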

Day in the Life of the Retina

The team is also exploring how these aspects of human vision vary in different environments. For each subject, the researchers collected data in a wide variety of situations meant to be representative of 'a day in the life' of the retina. Data was collected while the subjects engaged in passive near work, such as reading a book and watching a movie, as well as more interactive, varied-distance tasks, such as making a sandwich and ordering a coffee. Subjects were also asked to go for a walk both inside and outside, where light levels and object distances differ, and to take part in more dynamic tasks involving tracking, such as playing table tennis. This dataset will help us understand how the statistics of depth and motion change when people are in these different environments, as well as how people change their eye and body motion to adapt.

Changing Visual Environment

Nowadays, humans spend much less time outdoors looking at distant objects. Instead, most of us are inside in front of screens. This has changed the visual environment drastically, and evidence has linked these changes to eye strain, myopia, and disruption of circadian rhythms. Scientists hope to discover how the spatiotemporal power spectrum and other statistics of the visual environment vary in these different situations, and to answer the question: how do we adjust our behavior to adapt?

About the Image

Vasha DuTell with the custom-made, wearable head-mounted camera, binocular eye tracker, and motion-sensing system.

Banks Lab

Olshausen Lab