Latest AI News

arXiv cs.AI · about 15 hours ago · 5 min read

Persistent AI Agents in Academic Research: A Single-Investigator Implementation Case Study

arXiv:2605.26870v1 (announce type: cross)

Abstract

Background: Large language models are typically evaluated as models, benchmarks, or short conversational episodes. Less is known about what happens when an agent is embedded persistently in a real academic research environment with durable memory, local files, external tools, scheduled routines, delegated roles, and explicit safety protocols.

Methods: A structured self-observed implementation case study was conducted from January 31 to May 25, 2026. The unit of analysis was the persistent human-agent environment: researcher, agent runtime, memory layer, tools, repositories, scheduled jobs, specialized agent roles, and governance rules. Outcomes were organized using PARE-M (Persistent Agentic Research Environment Measurement), a measurement framework covering architecture, utilization, artifact production, resource use, reproducibility, and governance.

Results: Recoverable main-agent telemetry contained 75,671 de-duplicated records across 96 active days, with 8,059 user-role and 23,710 assistant-role messages. The workspace included 502 memory-related files, 17 configured agent directories, and 57 skill files. Active system time was 579.7 hours (30-minute capped-gap estimate). Memory-derived records identified 482 output-proxy events and 889 failure, verification, correction, or protocol-proxy events. A strict May 2026 trajectory subset captured 627 model-completed events and 73.95 million recorded tokens, of which 82.9% were cache reads.

Conclusions: The workflow was cache-dominant, suggesting that persistent agentic environments may shift the economic unit from cost per token to cost per completed artifact. Future evaluations should use artifact-level denominators, reproducible parsing rules, correction taxonomies, and independent coding of governance events.
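The abstract reports active system time as a "30-minute capped-gap estimate" but does not spell out the rule. One common reading, sketched below as an assumption rather than the paper's actual procedure, is to sort event timestamps and sum the gaps between consecutive events, counting each gap as at most 30 minutes so that long idle periods do not inflate the total. The function name `capped_gap_hours` and the sample timestamps are hypothetical and purely illustrative.

```python
from datetime import datetime, timedelta


def capped_gap_hours(timestamps, cap=timedelta(minutes=30)):
    """Estimate active time in hours from event timestamps.

    Assumed rule (not confirmed by the abstract): each gap between
    consecutive events contributes its own length, but never more than `cap`.
    """
    ordered = sorted(timestamps)
    total = timedelta()
    for prev, curr in zip(ordered, ordered[1:]):
        total += min(curr - prev, cap)
    return total.total_seconds() / 3600


# Illustrative timestamps only; these are not data from the study.
events = [
    datetime(2026, 5, 1, 9, 0),
    datetime(2026, 5, 1, 9, 10),  # 10-minute gap, counted in full
    datetime(2026, 5, 1, 9, 50),  # 40-minute gap, capped at 30 minutes
    datetime(2026, 5, 1, 14, 0),  # 4 h 10 min gap, capped at 30 minutes
]

print(f"{capped_gap_hours(events):.2f} hours")
```

Under this reading the estimate depends heavily on the chosen cap, which is one reason the abstract's call for reproducible parsing rules matters: a different cap, or a different treatment of the first and last event of a session, would yield a different total.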

Latest News

Reparametrizing Shampoo and SOAP for Subspace Basis Updates and BFloat16 Storage

MetaSICL: Adapting Auditory LLM via Meta Speech In-Context Learning

MedGuideX: Internalizing Decision Logic from Executable Guidelines into Large Language Models for Clinical Reasoning

Composition Collapse: Stable Factual Knowledge Does Not Imply Compositional Reasoning

Pretraining Data Exposure in Large Language Models: A Survey of Membership Inference, Data Contamination, and Security Implications

Self-Ensembling Vision-Language Models for Chart Data Extraction

ReasonOps: A Unified Operational Paradigm for Trustworthy Verified LLM Reasoning

Trust Region Q Adjoint Matching

Drive-P2D: A Progressive Perception-to-Decision Benchmark for VLMs in Autonomous Driving

Modeling Agentic Technical Debt and Stochastic Tax: A Standalone Framework for Measurement, Simulation, and Dashboarding

Augment Engineering: A Methodology for Multi-Tool AI Orchestration Across Professional Domains

Focal Reward: Balanced Reinforcement Learning under Rubric-Based Rewards

Monte Carlo Permutation Search

Persistent AI Agents in Academic Research: A Single-Investigator Implementation Case Study

The Stability of Singular Distribution: A Spectral Perspective on the Two-Phase Dynamics of Language Model Pre-training

Beyond Binary: Turning Partial Success into Dense Verifiable Rewards for Reinforcement Learning in Code Generation

Strategies for Guiding LLMs to Use Software Design Patterns: A Case of Singleton

Practical Anonymous Two-Party Gradient Boosting Decision Tree

High-Quality Synthetic Financial Time-Series using a GAN-Diffusion Framework

LLMs Are Already Good Tutors: Training-Free Prompt Optimization for Pedagogical Math Tutoring