Evaluating Strategic Reasoning in Forecasting Agents

arXiv:2604.26106v1 Announce Type: new
Abstract: Forecasting benchmarks produce accuracy leaderboards but little insight into why some forecasters are more accurate than others. We introduce Bench to the Future 2 (BTF-2), 1,417 pastcasting questions with a frozen 15M-document research corpus in which agents reproducibly research and forecast offline, producing full reasoning traces. BTF-2 detects accuracy differences of 0.004 Brier score, and can distinguish differential agent strengths in research vs. judgment. We build a forecaster 0.011 Brier more accurate than any single frontier agent, and use it to evaluate agent strategic reasoning without hindsight bias. We find the better forecaster differs primarily in its pre-mortem analysis of its blind spots and consideration of black swans. Expert human forecasters found the dominant strategic reasoning failures of frontier agents are in assessing political and business leaders’ incentives, judging their likelihood to follow through on stated plans, and modeling institutional processes.
Continue ReadingEvaluating Strategic Reasoning in Forecasting Agents

OMEGA: Optimizing Machine Learning by Evaluating Generated Algorithms

arXiv:2604.26211v1 Announce Type: new
Abstract: In order to automate AI research we introduce a full, end-to-end framework, OMEGA: Optimizing Machine learning by Evaluating Generated Algorithms, that starts at idea generation and ends with executable code. Our system combines structured meta-prompt engineering with executable code generation to create new ML classifiers. The OMEGA framework has been utilized to generate several novel algorithms that outperform scikit-learn baselines across a robust selection of 20 benchmark datasets (infinity-bench). You can access models discussed in this paper and more in the python package: pip install omega-models.
Continue ReadingOMEGA: Optimizing Machine Learning by Evaluating Generated Algorithms

Hierarchical Multi-Persona Induction from User Behavioral Logs: Learning Evidence-Grounded and Truthful Personas

arXiv:2604.26120v1 Announce Type: new
Abstract: Behavioral logs provide rich signals for user modeling, but are noisy and interleaved across diverse intents. Recent work uses LLMs to generate interpretable natural-language personas from user logs, yet evaluation often emphasizes downstream utility, providing limited assurance of persona quality itself. We propose a hierarchical framework that aggregates user actions into intent memories and induces multiple evidence-grounded personas by clustering and labeling these memories. We formulate persona induction as an optimization problem over persona quality-captured by cluster cohesion, persona-evidence alignment, and persona truthfulness-and train the persona model using a groupwise extension of Direct Preference Optimization (DPO). Experiments on a large-scale service log and two public datasets show that our method induces more coherent, evidence-grounded, and trustworthy personas, while also improving future interaction prediction.
Continue ReadingHierarchical Multi-Persona Induction from User Behavioral Logs: Learning Evidence-Grounded and Truthful Personas

A Randomized PDE Energy driven Iterative Framework for Efficient and Stable PDE Solutions

arXiv:2604.25943v1 Announce Type: new
Abstract: Efficient and stable solution of partial differential equations (PDEs) is central to scientific and engineering applications, yet existing numerical solvers rely heavily on matrix based discretizations, while learning based methods require costly training and often suffer from limited generalization. In this work, we proposes a PDE energy driven framework that solves PDEs through physically constrained diffusion iterations, without relying on classical matrix based finite element assembly or data driven neural network training. The proposed method evolves arbitrary random initial fields through PDE energy driven implicit iterations combined with Gaussian smoothing, while strictly enforcing boundary conditions at each iteration. The proposed formulation is applied to representative one dimensional Poisson, Heat, and viscous Burgers equations, covering both steady state and transient problems. Numerical results demonstrate stable convergence to the unique physical solution from random initializations, with accurate resolution of sharp gradients and controlled Mean Squared Error (MSE) across a wide range of discretization parameters. Detailed comparisons with analytical solutions indicate that the framework achieves competitive accuracy and stability. Overall, the proposed framework provides a fast, flexible, and physically consistent alternative to traditional numerical solvers, offering a potential pathway for scalable PDE solutions in both research and engineering applications.
Continue ReadingA Randomized PDE Energy driven Iterative Framework for Efficient and Stable PDE Solutions

A Multimodal and Explainable Machine Learning Approach to Diagnosing Multi-Class Ejection Fraction from Electrocardiograms

arXiv:2604.25942v1 Announce Type: new
Abstract: Left ventricular ejection fraction (LVEF) assessment depends on echocardiography, limiting access in primary care and resource-constrained settings. We developed a multimodal machine-learning framework that combines engineered 12-lead ECG timeseries features with structured EHR variables to classify LVEF into four clinically used strata: normal (>50%), mildly reduced (40-50%), moderately reduced (30-40%), and severely reduced (<30%). To support model explainability, we identified the most influential ECG and EHR features via SHAP attributions. Using retrospective data from Hartford HealthCare, we trained XGBoost models on 36,784 ECG-echocardiogram pairs from 30,952 outpatients and evaluated temporal generalizability on 19,966 ECGs from a subsequent period. The multimodal model achieved one-vs-rest AUROCs of 0.95 (severe), 0.92 (moderate), 0.82 (mild), and 0.91 (normal), outperforming ECG-only and EHR-only baselines, and maintained performance under temporal validation. This work supports ECG-based, multimodal LVEF stratification as a practical screening and triage aid to prioritize confirmatory imaging where resources are limited.
Continue ReadingA Multimodal and Explainable Machine Learning Approach to Diagnosing Multi-Class Ejection Fraction from Electrocardiograms

A Survey of Multi-Agent Deep Reinforcement Learning with Graph Neural Network-Based Communication

arXiv:2604.25972v1 Announce Type: new
Abstract: In multi-agent reinforcement learning (MARL), the integration of a communication mechanism, allowing agents to better learn to coordinate their actions and converge on their objectives by sharing information. Based on an interaction graph, a subclass of methods employs graph neural networks (GNNs) to learn the communication, enabling agents to improve their internal representations by enriching them with information exchanged. With growing research, we note a lack of explicit structure and framework to distinguish and classify MARL approaches with communication based on GNNs. Thus, this paper surveys recent works in this field. We propose a generalized GNN-based communication process with the goal of making the underlying concepts behind the methods more obvious and accessible.
Continue ReadingA Survey of Multi-Agent Deep Reinforcement Learning with Graph Neural Network-Based Communication