Fast Feature Field (F3): A Predictive Representation of Events

GRASP Laboratory, University of Pennsylvania

Abstract

This paper develops a mathematical argument and algorithms for building representations of data from event-based cameras, which we call the Fast Feature Field (F3). We learn this representation by predicting future events from past events and show that it preserves scene structure and motion information. F3 exploits the sparsity of event data and is robust to noise and variations in event rates. It can be computed efficiently using ideas from multi-resolution hash encoding and deep sets, achieving 120 Hz at HD and 440 Hz at VGA resolution. F3 represents events within a contiguous spatiotemporal volume as a multi-channel image, enabling a broad range of downstream tasks. We obtain state-of-the-art performance on optical flow estimation, semantic segmentation, and monocular metric depth estimation on data from three robotic platforms (a car, a quadruped robot, and a flying platform), across different lighting conditions (daytime, nighttime), environments (indoors, outdoors, urban, and off-road), and dynamic vision sensors (with different resolutions and event rates). Our implementations can predict these tasks at 25-75 Hz at HD resolution.

Video

F3 is a strong foundation for event-based perception

It is possible to use F3 in any standard computer vision algorithm or neural architecture built for RGB data. We show state-of-the-art performance on data from three platforms (driving, quadruped locomotion, and a flying platform) across four tasks (supervised semantic segmentation, unsupervised optical flow, unsupervised stereo matching, and supervised monocular metric depth estimation). The sequences below show F3 performing multiple perception tasks simultaneously, demonstrating its versatility as a foundation for robotic perception.
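Because F3 summarizes events within a spatiotemporal volume as a multi-channel image, plugging it into an architecture written for RGB data amounts to widening the first layer from 3 input channels to C. The sketch below illustrates this; it is not the paper's implementation, and the channel count, resolution, class count, and the SegmentationHead module are assumptions made for illustration.

import torch
import torch.nn as nn

# Minimal sketch (not the paper's code): F3 is consumed exactly like an RGB
# image, except the first convolution takes C input channels instead of 3.
# The channel count (32) and number of classes (19) are illustrative.
class SegmentationHead(nn.Module):
    def __init__(self, in_channels: int = 32, num_classes: int = 19):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, num_classes, 4, stride=2, padding=1),
        )

    def forward(self, f3: torch.Tensor) -> torch.Tensor:
        # f3: (B, C, H, W) multi-channel image computed from events
        return self.decoder(self.encoder(f3))

f3 = torch.randn(1, 32, 720, 1280)             # stand-in for an F3 feature image at HD
logits = SegmentationHead(in_channels=32)(f3)  # (1, 19, 720, 1280) per-pixel class scores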

Overview: Driving scenes in Philadelphia showing multi-task capabilities in urban environments using events.

F3 realizes the benefits of event cameras in low-latency and low-light scenarios, especially where standard RGB cameras struggle. Moving objects such as pedestrians and vehicles are captured reliably in F3 even under challenging lighting conditions. See the examples below:

The F3 architecture is designed specifically for events

F3 is a predictive representation of events: it is a statistic of past events that is sufficient to predict future events. We prove that such a representation retains information about the structure and motion of the scene. F3 achieves low-latency computation by exploiting the sparsity of event data using a multi-resolution hash encoder and a permutation-invariant (deep sets) architecture. Our implementation computes F3 at 120 Hz and 440 Hz at HD and VGA resolutions, respectively, and predicts different downstream tasks at 25-75 Hz at HD resolution. These HD inference rates are roughly 2-5 times faster than current state-of-the-art event-based methods. Please refer to the paper for more details.

F3 Overview Diagram

An overview of the neural architecture for Fast Feature Field (F3).
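To make the two ingredients named above concrete, the sketch below hashes each event's coordinates at several resolutions into small learnable tables, then sum-pools per-event features into pixel bins so the result is invariant to the order of events. This is a rough illustration under assumed hyperparameters (table sizes, resolutions, feature dimensions) and module names, not the paper's exact architecture.

import torch
import torch.nn as nn

# Illustrative sketch of a multi-resolution hash encoder and a deep-sets
# aggregator; all hyperparameters and names are assumptions, not the paper's.
class HashEncoder(nn.Module):
    def __init__(self, levels=4, table_size=2**14, feat_dim=4, base_res=16):
        super().__init__()
        self.tables = nn.ParameterList(
            [nn.Parameter(1e-4 * torch.randn(table_size, feat_dim)) for _ in range(levels)]
        )
        self.resolutions = [base_res * 2**i for i in range(levels)]
        self.register_buffer("primes", torch.tensor([1, 2654435761, 805459861]))

    def forward(self, xyt):
        # xyt: (N, 3) event coordinates (x, y, t) normalized to [0, 1]
        feats = []
        for table, res in zip(self.tables, self.resolutions):
            cell = (xyt * res).long()                            # quantize at this resolution
            idx = (cell * self.primes).sum(-1) % table.shape[0]  # spatial hash into the table
            feats.append(table[idx])                             # (N, feat_dim) lookup
        return torch.cat(feats, dim=-1)                          # (N, levels * feat_dim)

class DeepSetAggregator(nn.Module):
    """rho(sum_i phi(e_i)): the output does not depend on event ordering."""
    def __init__(self, in_dim=16, hidden=64, out_dim=32):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
        self.rho = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, out_dim))

    def forward(self, per_event, pixel_id, num_pixels):
        # Sum-pool per-event features into their pixel bins, then decode per pixel.
        pooled = per_event.new_zeros(num_pixels, self.phi[-1].out_features)
        pooled.index_add_(0, pixel_id, self.phi(per_event))
        return self.rho(pooled)                                  # (num_pixels, out_dim)

# Usage: encode 10k events and pool them into a 720x1280 multi-channel feature image.
events = torch.rand(10_000, 3)                                   # normalized (x, y, t)
pixel_id = (events[:, 1] * 719).long() * 1280 + (events[:, 0] * 1279).long()
feat = DeepSetAggregator()(HashEncoder()(events), pixel_id, 720 * 1280)
f3_image = feat.T.reshape(1, 32, 720, 1280)                      # (B, C, H, W) feature image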

F3 is robust

F3-based approaches work robustly, without additional training, across data from different robotic platforms, lighting and environmental conditions (daytime vs. nighttime, indoors vs. outdoors, urban vs. off-road), and dynamic vision sensors (with different resolutions and event rates). See examples below of F3, trained only on daytime urban car driving sequences, generalizing to these different conditions.


BibTeX

@misc{das2025fastfeaturefield,
  title={Fast Feature Field ($\text{F}^3$): A Predictive Representation of Events}, 
  author={Richeek Das and Kostas Daniilidis and Pratik Chaudhari},
  year={2025},
  eprint={2509.25146},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2509.25146},
}