publications

publications by category in reverse chronological order. generated by jekyll-scholar.

2025

  1. FaIRMaker
    Auto-Search and Refinement: An Automated Framework for Gender Bias Mitigation in Large Language Models
    Yue Xu, Chengyan Fu, Li Xiong, and 2 more authors
    NeurIPS, 2025
  2. MMJ-Bench
    MMJ-Bench: A comprehensive study on jailbreak attacks and defenses for vision language models
    Fenghua Weng, Yue Xu, Chengyan Fu, and 1 more author
    In Proceedings of the AAAI Conference on Artificial Intelligence, 2025
  3. Dr. GAP
    Dr. GAP: Mitigating bias in large language models using gender-aware prompting with demonstration and reasoning
    Hongye Qiu*, Yue Xu*, Meikang Qiu, and 1 more author
    Preprint, 2025
  4. Genres
    From Individuals to Interactions: Benchmarking Gender Bias in Multimodal Large Language Models from the Lens of Social Relationship
    Yue Xu and Wenjie Wang
    Preprint, 2025

2024

  1. CIDER
    Cross-modality information check for detecting jailbreaking in multimodal large language models
    Yue Xu, Xiuyuan Qi, Zhan Qin, and 1 more author
    EMNLP Findings, 2024
  2. LinkPrompt
    LinkPrompt: Natural and universal adversarial attacks on prompt-based language models
    Yue Xu and Wenjie Wang
    In NAACL, 2024

2023

  1. Certified Robustness on Toolformer
    Yue Xu and Wenjie Wang
    In Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security, 2023