Repeating processes ranging from natural cycles, such as phases of the moon or heartbeats and breathing, to artificial repetitive processes, like those found on manufacturing lines or in traffic patterns, are commonplace in our daily lives. Beyond just their prevalence, repeating processes are of interest to researchers for the variety of insights one can tease out of them. It may be that there is an underlying cause behind something that happens multiple times, or there may be gradual changes in a scene that may be useful for understanding. Sometimes, repeating processes provide us with unambiguous “action units”, semantically meaningful segments that make up an action. For example, if a person is chopping an onion, the action unit is the manipulation action that is repeated to produce additional slices. These units may be indicative of more complex activity and may allow us to analyze more such actions automatically at a finer time-scale without having a person annotate these units. For the above reasons, perceptual systems that aim to observe and understand our world for an extended period of time will benefit from a system that understands general repetitions.
In “Counting Out Time: Class Agnostic Video Repetition Counting in the Wild”, we presentRepNet, a single model that can understand a broad range of repeating processes, ranging from people exercising or using tools, to animals running and birds flapping their wings, pendulums swinging, and a wide variety of others. In contrast toour previous work, whichused cycle-consistency constraints across different videos of the same action to understand them at a fine-grained level, in this work we present a system that can recognize repetitions within a single video. Along with this model, we are releasing adatasetto benchmark class-agnostic counting in videos and aColab notebookto run RepNet.