Published research
2026
Can I Have Your Order? Monte-Carlo Tree Search for Slot Filling Ordering in Diffusion Language Models
ICML · 2026 · doi:10.48550/arXiv.2602.12586
We introduce McDiffuSE, a framework that formulates slot selection in Masked Diffusion Models as decision making and optimises infilling orders through Monte Carlo Tree Search. Look-ahead simulations evaluate partial completions before commitment, systematically exploring the combinatorial space of generation orders. McDiffuSE achieves average gains of 3.2% over autoregressive baselines and 8.0% over plan-and-infill baselines, with notable improvements of 19.5% on MBPP and 4.9% on MATH500.
PiCSAR: Probabilistic Confidence Selection And Ranking for Reasoning Chains
ACL (Findings) · 2026 · doi:10.48550/arXiv.2508.21787
We propose PiCSAR (Probabilistic Confidence Selection And Ranking), a simple training-free method for best-of-n sampling that scores candidate reasoning chains using the joint log-likelihood of the reasoning and final answer. This naturally decomposes into reasoning confidence and answer confidence. PiCSAR achieves substantial gains across diverse benchmarks (+10.18 on MATH500, +9.81 on AIME2025), outperforming baselines with at least 2× fewer samples in 16 out of 20 comparisons.
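As a rough illustration of the scoring idea in this abstract (a minimal sketch, not the authors' implementation), the joint log-likelihood of a candidate can be written as log p(r, y | x) = log p(r | x) + log p(y | r, x), i.e. reasoning confidence plus answer confidence. The `Candidate` class, its field names, and the per-token log-probability inputs below are hypothetical; the paper may normalise or weight the two terms differently.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Candidate:
    """One sampled chain (hypothetical container, not from the paper)."""
    answer: str
    reasoning_logprobs: Sequence[float]  # assumed: log p(r_t | x, r_<t) per reasoning token
    answer_logprobs: Sequence[float]     # assumed: log p(y_t | x, r, y_<t) per answer token


def joint_score(c: Candidate) -> float:
    """Joint log-likelihood: reasoning confidence plus answer confidence."""
    reasoning_confidence = sum(c.reasoning_logprobs)  # log p(r | x)
    answer_confidence = sum(c.answer_logprobs)        # log p(y | r, x)
    return reasoning_confidence + answer_confidence


def select_best(candidates: List[Candidate]) -> Candidate:
    """Best-of-n selection: keep the candidate with the highest joint score."""
    return max(candidates, key=joint_score)


# Toy log-probabilities, made up for illustration only.
candidates = [
    Candidate("41", [-0.5, -0.7], [-1.2]),
    Candidate("42", [-0.2, -0.3], [-0.1]),
]
best = select_best(candidates)
```

Because the score is a plain sum of log-probabilities the model already produces while sampling, selection of this kind needs no additional training, which is consistent with the training-free framing above.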
A Survey on Deep Learning Approaches for Tabular Data Generation: Utility, Alignment, Fidelity, Privacy, Diversity, and Beyond
TMLR · 2026 · doi:10.48550/arXiv.2503.05954
We review deep generative modelling approaches for tabular data from the perspective of four types of requirements: utility of the synthetic data, alignment of the synthetic data with domain-specific knowledge, statistical fidelity of the synthetic data distribution compared to the real data distribution, and privacy-preserving capabilities.
2025
Right for the Right Reasons: Avoiding Reasoning Shortcuts via Prototypical Neurosymbolic AI
NeurIPS · 2025 · doi:10.48550/arXiv.2510.25497
Neurosymbolic AI models are prone to reasoning shortcuts: learning spurious correlations rather than the intended concepts. We introduce Prototypical Neurosymbolic architectures that avoid shortcuts at their root cause by training models to satisfy background knowledge while accounting for input similarity to a handful of labelled datapoints. We validate on the rsbench benchmark suite across synthetic (MNIST-EvenOdd, Kand-Logic) and real-world (BDD-OIA) tasks, showing significant improvements in learning the right concepts even in extremely low-data regimes.
Beyond the convexity assumption: Realistic tabular data generation under quantifier-free real linear constraints
ICLR · 2025 · doi:10.48550/arXiv.2502.18237
We introduce the Disjunctive Refinement Layer (DRL), a novel layer designed to enforce the alignment of generated data with the background knowledge specified in user-defined constraints. DRL is the first method able to automatically make deep learning models inherently compliant with constraints as expressive as quantifier-free linear formulas, which can define non-convex and even disconnected spaces.
A Posteriori Verification or a Priori Design? Navigating Requirements-Driven Deep Learning
ECAI · 2025 · doi:10.3233/FAIA250782
We contrast two approaches to ensuring machine learning systems satisfy formal requirements in safety-critical settings: integrating requirements directly into model architecture and training (a priori design) versus analysing trained models for desired properties (a posteriori verification).
2024
How Realistic Is Your Synthetic Data? Constraining Deep Generative Models for Tabular Data
ICLR · 2024 · doi:10.48550/arXiv.2402.04823
We show how deep generative models for tabular data can be constrained such that their generated samples are guaranteed to be compliant with given constraints. This is achieved by automatically parsing the constraints and transforming them into a Constraint Layer seamlessly integrated with the model.
Deep generative models as an adversarial attack strategy for tabular machine learning
ICMLC · 2024 · doi:10.48550/arXiv.2409.12642
We adapt popular tabular deep generative models into adversarial models and evaluate their effectiveness in generating realistic adversarial examples that conform to domain constraints.
CCN+: A neuro-symbolic framework for deep learning with requirements
International Journal of Approximate Reasoning · 2024 · doi:10.1016/j.ijar.2024.109124
We present CCN+, a neuro-symbolic framework that integrates logical requirements directly into neural network outputs using inference rules to ensure compliance, and adapts the standard binary cross-entropy loss for constraint satisfaction in deep learning.
ULLER: A Unified Language for Learning and Reasoning
NeSy (Spotlight) · 2024 · doi:10.1007/978-3-031-71167-1_12
We introduce ULLER, a unified language for learning and reasoning that standardises how background knowledge is expressed across neuro-symbolic AI frameworks. ULLER provides first-order logic syntax with multiple semantic interpretations.
PiShield: A PyTorch Package for Learning with Requirements
IJCAI · 2024 · doi:10.48550/arXiv.2402.18285
We introduce PiShield, the first package to integrate (propositional or linear) requirements into a neural network's topology. PiShield guarantees compliance with these requirements regardless of the input.