Generating motion for arbitrary skeletons is a longstanding challenge in computer graphics that remains largely unexplored, due to the scarcity of diverse datasets and the irregular nature of the data. In this work, we introduce
AnyTop, a diffusion model that generates motions for diverse characters
with distinct motion dynamics, using only their skeletal structure as input. Our method features a transformer-based denoising network tailored for arbitrary skeleton learning, which integrates topology information into the traditional attention mechanism. Additionally, by incorporating textual joint
descriptions into the latent feature representation, AnyTop learns semantic
correspondences between joints across diverse skeletons. Our evaluation
demonstrates that AnyTop generalizes well, even with as few as three training examples per topology, and can also produce motions for unseen skeletons. Furthermore, our model's latent space is highly informative, enabling downstream tasks such as joint correspondence, temporal segmentation, and motion editing.
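To make the topology-aware attention idea concrete, the following is a minimal sketch under stated assumptions: the `TopologyAwareAttention` class, the learned graph-distance bias, and the random stand-ins for per-joint text embeddings are illustrative and hypothetical, not AnyTop's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopologyAwareAttention(nn.Module):
    """Single-head self-attention over a skeleton's joints, with a learned
    bias indexed by graph distance between joints -- one plausible way to
    inject topology into attention (hypothetical, not AnyTop's exact design)."""

    def __init__(self, dim: int, max_graph_dist: int = 16):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        # One learned scalar per graph-distance bucket, added to the logits.
        self.dist_bias = nn.Embedding(max_graph_dist + 1, 1)
        self.scale = dim ** -0.5
        self.max_graph_dist = max_graph_dist

    def forward(self, x: torch.Tensor, graph_dist: torch.Tensor) -> torch.Tensor:
        # x: (J, dim) per-joint features; graph_dist: (J, J) integer matrix of
        # shortest-path distances along the skeleton's kinematic tree.
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        logits = (q @ k.transpose(-2, -1)) * self.scale
        buckets = graph_dist.clamp(max=self.max_graph_dist)
        logits = logits + self.dist_bias(buckets).squeeze(-1)  # topology bias
        return F.softmax(logits, dim=-1) @ v


# Per-joint text-description embeddings (e.g. for "left knee") could be fused
# into the joint features before attention; random stand-ins are used here.
J, D = 24, 64                              # 24 joints, 64-dim features (arbitrary)
joint_feats = torch.randn(J, D)
text_emb = torch.randn(J, D)               # placeholder for a real text encoder
graph_dist = torch.randint(0, 8, (J, J))   # placeholder shortest-path matrix

attn = TopologyAwareAttention(D)
out = attn(joint_feats + text_emb, graph_dist)  # (J, D)
```

Because the bias depends only on pairwise graph distances, the same module applies unchanged to skeletons with any number of joints, which is the property that enables learning across arbitrary topologies.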