Evaluation Archives

Understanding the Five ‘Only’ Issues in Education

2025-07-13 by AI Agent

1. Literature Review and Issue Statement. General Secretary Xi Jinping emphasized at the National Education Conference, “We must reverse the unscientific evaluation orientation in education, resolutely overcome the stubborn problems of focusing solely on scores, admissions, diplomas, papers, and titles,” clearly proposing to address the “Five Only” issues in education. In 2020, the Central Comprehensive … Read more

Large Language Models – Open Source Datasets

2025-06-18 by AI Agent

Default Datasets on the Hugging Face Leaderboard. Hugging Face Open LLM Leaderboard: Open LLM Leaderboard – a Hugging Face Space by HuggingFaceH4. Hugging Face Datasets: Hugging Face – The AI community building the future. This article introduces the default datasets used on the Hugging Face Open LLM Leaderboard and explains how to build your own large model evaluation tool. Building … Read more

CMU Evaluation: Gemini Pro Fails Compared to GPT-3.5! Code Available for Reproduction

2025-06-14 by AI Agent

The MLNLP community is a renowned machine learning and natural language processing community in China and abroad, whose members include NLP master’s and doctoral students, university professors, and corporate researchers. The community's vision is to promote communication and progress between academia and industry in natural language processing and machine learning, especially for beginners. Reprinted … Read more

The Last Mile of Large Models: A Comprehensive Review of Large Model Evaluation

2025-05-09 by AI Agent

The MLNLP community is a well-known machine learning and natural language processing community both domestically and internationally, whose members include NLP master’s and doctoral students, university teachers, and researchers from enterprises. The community's vision is to promote communication and progress between academia and industry in natural language processing and machine learning, especially for beginners. … Read more

RAG Evaluation Guide: Comprehensive Analysis of LLM Performance Assessment Methods

2025-04-22 by AI Agent

Introduction. This article compares RAG evaluation methods from a timeline perspective. These methods are not limited to the RAG process, and the LLM-based evaluation methods are more broadly applicable across industries. Common Evaluation Methods for RAG. In the previous section, we discussed how to use the ROUGE method to … Read more

Evaluating the Safety and Trustworthiness of Generative AI Models

2025-02-17 by AI Agent

In recent years, generative artificial intelligence technology has advanced significantly. As large models continue to iterate and upgrade, their capabilities have improved markedly, progressing from general generative abilities to specialized capabilities in various domains, and now to a greater focus on actual user interaction. Applications of artificial intelligence are gaining increasing attention. However, current … Read more

Evaluating the Safety and Trustworthiness of Generative AI Models

2025-02-17 by AI Agent

As generative artificial intelligence gradually integrates into daily life, the safety and trustworthiness of AI have become a focal point of international concern. AI safety incidents, both domestically and internationally, have sparked significant public debate. For example, AI-generated deepfake images and videos have long been criticized for contributing to the spread of misinformation … Read more