Latest AI News

arXiv cs.AI•about 13 hours ago·6 min read

ReCA: Multi-Shot Long Video Extrapolation via Recursive Context Allocation

arXiv:2605.26525v1 Announce Type: cross

Abstract: Minute-scale cinematic video generation is a central challenge for generative video models. Existing paradigms address only fragments of this challenge: single-shot extrapolation preserves an anchor but lacks cinematic structure, while multi-shot storytelling imposes structure yet remains free to invent its visual states rather than continue an observed one. We define Multi-Shot Video Extrapolation (MSVE), a task that extends an observed frame or clip into a sequence of cinematically structured shots while preserving anchor state and advancing narrative intent. This setting operates under the finite per-call generation budget of short-video models.

We identify three coupled bottlenecks: (1) global planners over-specify unsupported details from full screenplays; (2) shot-level prompts dilute task-relevant state when carrying the complete story; and (3) temporal chaining turns generated frames into a lossy memory in which identity, scene, object, and action state decay. MSVE reveals that long-video failure is not merely a limitation of context length, but a failure of context allocation.

We propose Recursive Context Allocation (ReCA), an inference-time framework that allocates context hierarchically across planning and generation. ReCA recursively decomposes MSVE into context-bounded subproblems, invokes frozen generators at leaf nodes, and propagates structured state updates across time.

To evaluate this setting, we further propose MSVE-Bench and NB-Q, a source-grounded protocol with prompts purpose-built for 3 to 5 minute long-video generation, a regime not addressed by existing short-clip benchmarks. Compared to previous methods, ReCA improves average normalized score by 8 to 16 percent over the strongest competing controller and improves multi-shot consistency metrics by 28 to 43 percent. View the project page at https://reca.vmv.re.
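The abstract describes ReCA only at a high level, so the following is a toy illustration of the general idea (recursive decomposition, a frozen generator called at the leaves, and a structured state passed forward in time), not the authors' implementation. Every name here (`Node`, `split`, `reca_sketch`, `stub_generator`, the `budget` parameter, and the state keys) is hypothetical, and the halving planner is an assumption made purely to keep the sketch runnable.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = Dict[str, str]  # hypothetical structured state (scene, identity, ...)
Generator = Callable[[str, State], Tuple[str, State]]


@dataclass
class Node:
    """A planning node: a narrative goal that must be covered in `shots` shots."""
    goal: str
    shots: int


def split(node: Node) -> List[Node]:
    """Hypothetical planner: halve a node into two smaller sub-goals."""
    left = node.shots // 2
    return [
        Node(f"{node.goal} / part 1", left),
        Node(f"{node.goal} / part 2", node.shots - left),
    ]


def reca_sketch(
    node: Node, state: State, generate: Generator, budget: int = 1
) -> Tuple[List[str], State]:
    """Recursively decompose `node` until each leaf fits the per-call budget.

    Leaves call the (frozen) generator with only the local goal and the
    current structured state; the returned update is merged into the state
    that the next leaf in time will see.
    """
    if node.shots <= budget:
        clip, update = generate(node.goal, dict(state))
        return [clip], {**state, **update}

    clips: List[str] = []
    for child in split(node):  # children are visited in temporal order
        child_clips, state = reca_sketch(child, state, generate, budget)
        clips.extend(child_clips)
    return clips, state


def stub_generator(prompt: str, state: State) -> Tuple[str, State]:
    """Stand-in for a short-video model: returns a label and a state update."""
    clip = f"clip[{prompt} | scene={state.get('scene')}]"
    return clip, {"last_action": prompt}


clips, final_state = reca_sketch(
    Node("story", 4), {"scene": "kitchen"}, stub_generator
)
```

With a budget of one shot per call, the four-shot root is split twice and the stub generator is invoked four times; the `scene` entry set at the root reaches every leaf unchanged, while `last_action` is overwritten by each leaf in turn. A real system would replace `split` with a learned or LLM-based planner and `stub_generator` with an actual video model.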

Latest News

Plans for Evaluating Structured Generative Search Summaries

Confounder Detection via Treatment Intent: A New Observational Study Design

Efficient On-policy Visual-RL via Stochastic Decoupled Policy Gradient

ReCA: Multi-Shot Long Video Extrapolation via Recursive Context Allocation

DGLD: Domain-Gated Latent Diffusion for the Discovery of Novel Energetic Materials

Few-shot Cross-country Generalization of Tabular Machine Learning and Foundation Models for Childhood Anemia Prediction under Distribution Shift

Securing Multi-Agent Systems Against Corruptions via Node Contribution Backpropagation

Diffuse to Detect: Generative Diffusion Models for Unsupervised IC Anomaly Detection

Geometrically Constrained Outlier Synthesis

Foundations of a Time-Consistent Counterfactual Actuarial Runtime for Autonomous AI Agents

StreamSplit: Continuous Audio Representation Learning via Uncertainty-Guided Adaptive Splitting

Examining the Challenges of Intellectual Property in AI-Generated Productions

Linear and Neural Dueling Bandits with Delayed Feedback

DynFrame: Adaptive Reasoning-Driven Multimodal Framework with Dynamic Frame Augmentation for Complex Video Understanding

SL-BiLEM: Structured Learnable Behavior-in-the-Loop Epidemic Modeling for Forecasting and Policy Evaluation

Measuring Prediction Uncertainty in Neural Cellular Automata

Spend Your Rollouts Where It Counts: Rollout Allocation for Group-Based RL Post-Training

JetViT: Efficient High-Resolution Vision Transformer with Post-Training Attention Search

More Expressive Feedforward Layers: Part I. Token-Adaptive Mixing of Activations

Adversarial Training for Robust Coverage Network under Worst-case Facility Losses