Chinese LLM Evaluation Leaderboard - Main Categories

Model | Dialogue | Fundamentals | Engineering | Functionality | Total |
---|---|---|---|---|---|
Human (baseline) | 191.4 | 152.7 | 19.5 | 83.9 | 89.5 |
ChatGPT | 168.2 | 140.0 | 18.5 | 76.6 | 80.6 |
星火认知 (iFlytek Spark) | 141.1 | 115.7 | 18.5 | 74.7 | 70.1 |
Claude-instant | 148.9 | 115.4 | 17.0 | 61.6 | 68.0 |
360智脑 (360 Zhinao) | 138.7 | 103.0 | 14.5 | 74.5 | 67.2 |
文心一言 (ERNIE Bot) | 143.3 | 113.1 | 12.0 | 63.3 | 66.9 |
通义千问 (Tongyi Qianwen) | 131.0 | 113.0 | 15.0 | 67.8 | 65.7 |
天工 (Tiangong) | 136.1 | 120.0 | 10.0 | 59.3 | 65.7 |
ChatGLM-130B | 109.0 | 98.2 | 14.5 | 56.4 | 55.5 |
Vicuna-13B | 109.7 | 68.0 | 18.5 | 52.2 | 48.8 |
BloomChat | 80.8 | 60.8 | 18.0 | 44.8 | 39.7 |
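
To re-rank the models by a single main category, or to express a model's score as a share of the human row, the table can be parsed directly. The Python below is a minimal sketch under stated assumptions: the helper names `parse_table`, `rank_by` and `percent_of_human` are hypothetical, the scores are copied verbatim from the table above, and nothing is assumed about how the Total column is derived from the four category scores.

```python
# Main-category scores, copied verbatim from the leaderboard table above.
TABLE = """\
Model | Dialogue | Fundamentals | Engineering | Functionality | Total |
---|---|---|---|---|---|
Human (baseline) | 191.4 | 152.7 | 19.5 | 83.9 | 89.5 |
ChatGPT | 168.2 | 140.0 | 18.5 | 76.6 | 80.6 |
星火认知 (iFlytek Spark) | 141.1 | 115.7 | 18.5 | 74.7 | 70.1 |
Claude-instant | 148.9 | 115.4 | 17.0 | 61.6 | 68.0 |
360智脑 (360 Zhinao) | 138.7 | 103.0 | 14.5 | 74.5 | 67.2 |
文心一言 (ERNIE Bot) | 143.3 | 113.1 | 12.0 | 63.3 | 66.9 |
通义千问 (Tongyi Qianwen) | 131.0 | 113.0 | 15.0 | 67.8 | 65.7 |
天工 (Tiangong) | 136.1 | 120.0 | 10.0 | 59.3 | 65.7 |
ChatGLM-130B | 109.0 | 98.2 | 14.5 | 56.4 | 55.5 |
Vicuna-13B | 109.7 | 68.0 | 18.5 | 52.2 | 48.8 |
BloomChat | 80.8 | 60.8 | 18.0 | 44.8 | 39.7 |
"""

# Hypothetical label for the human reference row.
BASELINE = "Human (baseline)"


def split_row(line: str) -> list[str]:
    """Split one pipe-delimited row into trimmed cells (trailing pipe allowed)."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def parse_table(text: str) -> tuple[list[str], dict[str, list[float]]]:
    """Return (score column names, {model: scores}) from the pipe table."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    columns = split_row(lines[0])[1:]  # drop the "Model" header cell
    scores: dict[str, list[float]] = {}
    for line in lines[2:]:  # lines[1] is the ---|--- separator row
        name, *values = split_row(line)
        scores[name] = [float(v) for v in values]
    return columns, scores


def rank_by(column: str, text: str = TABLE) -> list[str]:
    """Models (excluding the human baseline) sorted by one column, highest first.

    sorted() is stable, so tied models keep their order from the table.
    """
    columns, scores = parse_table(text)
    idx = columns.index(column)
    models = [m for m in scores if m != BASELINE]
    return sorted(models, key=lambda m: scores[m][idx], reverse=True)


def percent_of_human(model: str, column: str, text: str = TABLE) -> float:
    """A model's score in one column as a percentage of the human baseline."""
    columns, scores = parse_table(text)
    idx = columns.index(column)
    return 100.0 * scores[model][idx] / scores[BASELINE][idx]


if __name__ == "__main__":
    print(rank_by("Engineering")[:3])
    print(round(percent_of_human("ChatGPT", "Total"), 1))
```

For example, `rank_by("Engineering")[:3]` returns ChatGPT, 星火认知 (iFlytek Spark) and Vicuna-13B, all tied at 18.5; ties keep their table order because Python's `sorted` is stable even with `reverse=True`.
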

Chinese LLM Evaluation Leaderboard - Subcategories

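Model | Persona | Classical Chinese | Configuration | Inference Performance | Timeliness | Chit-chat | Emotional Intelligence | Self-awareness | Role-play | Instruction Following | Word Segmentation | Error Correction | Named Entity Recognition | Coreference Resolution | Logical Reasoning | Information Extraction | Syntactic Analysis | Sentiment Analysis | Memory | Text Classification | Text Clustering | Text Matching | Text Generation | Machine Translation | Reading Comprehension | Summarization | Knowledge QA | Self-learning | Math | Chain of Thought | Consistency | Code | Diversity | Joke Telling | Creative Writing | Compound Intent | Data Analysis | Ethical Constraints | Safety and Privacy | Tables | Drawing | Speech | Video |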
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Human (baseline) | 7.556 | 11.667 | 9.5 | 10.0 | 13.889 | 8.667 | 10.381 | 12.222 | 12.833 | 11.0 | 13.333 | 4.444 | 5.0 | 11.111 | 11.852 | 8.889 | 8.333 | 10.0 | 10.944 | 8.333 | 11.667 | 13.333 | 11.667 | 10.467 | 13.333 | 11.167 | 8.467 | 13.0 | 9.467 | 13.333 | 13.333 | 10.167 | 7.667 | 8.889 | 10.417 | 12.333 | 11.444 | 11.429 | 12.444 | 10.0 | 5.5 | 10.0 | 8.0 |
ChatGPT | 4.889 | 8.167 | 8.5 | 10.0 | 7.778 | 8.667 | 10.952 | 12.222 | 8.167 | 10.071 | 11.524 | 3.556 | 4.0 | 9.444 | 9.741 | 8.222 | 8.333 | 8.667 | 11.944 | 8.333 | 11.667 | 12.5 | 10.444 | 12.667 | 10.778 | 11.667 | 8.4 | 13.333 | 8.8 | 9.333 | 9.5 | 13.333 | 7.333 | 8.778 | 9.583 | 11.667 | 12.444 | 10.857 | 13.333 | 8.0 | 6.333 | 6.667 | 2.667 |
星火认知 (iFlytek Spark) | 5.778 | 6.833 | 8.5 | 10.0 | 5.111 | 8.4 | 10.81 | 3.778 | 11.417 | 8.833 | 9.143 | 2.222 | 4.333 | 7.556 | 7.111 | 3.333 | 7.167 | 6.333 | 9.333 | 7.333 | 9.333 | 10.0 | 10.444 | 11.6 | 8.444 | 11.667 | 8.933 | 9.667 | 8.2 | 9.333 | 4.667 | 9.833 | 2.5 | 9.111 | 8.75 | 12.111 | 9.778 | 11.0 | 12.333 | 7.0 | 4.667 | 13.333 | 4.0 |
文心一言 (ERNIE Bot) | 7.333 | 10.5 | 8.0 | 4.0 | 10.444 | 6.667 | 9.048 | 11.111 | 7.0 | 9.095 | 9.571 | 0.889 | 4.333 | 5.444 | 8.778 | 5.556 | 6.667 | 5.556 | 2.833 | 8.333 | 10.167 | 10.083 | 10.222 | 6.333 | 10.111 | 9.333 | 7.667 | 8.444 | 6.733 | 8.0 | 11.667 | 6.667 | 6.0 | 5.444 | 11.0 | 6.222 | 6.444 | 10.333 | 12.667 | 10.0 | 8.333 | 3.333 | 5.333 |
Claude-instant | 5.556 | 6.75 | 8.0 | 9.0 | 5.111 | 8.6 | 11.286 | 11.556 | 8.0 | 7.5 | 6.762 | 3.222 | 4.833 | 7.667 | 9.778 | 6.333 | 7.333 | 6.889 | 9.833 | 7.667 | 11.0 | 10.417 | 10.444 | 11.667 | 7.444 | 7.0 | 6.667 | 13.0 | 8.0 | 8.0 | 5.333 | 13.333 | 3.5 | 7.444 | 9.417 | 10.889 | 11.111 | 10.952 | 13.333 | 1.0 | 3.667 | 5.0 | 2.667 |
通义千问 (Tongyi Qianwen) | 5.333 | 7.0 | 6.0 | 9.0 | 7.778 | 6.8 | 8.905 | 10.556 | 10.833 | 10.119 | 11.048 | 3.889 | 3.667 | 7.778 | 5.296 | 6.778 | 5.667 | 5.556 | 7.667 | 7.333 | 6.167 | 9.25 | 8.444 | 9.333 | 8.556 | 12.5 | 4.867 | 10.889 | 3.067 | 2.667 | 8.667 | 11.167 | 7.333 | 9.556 | 7.833 | 2.556 | 10.222 | 8.19 | 12.556 | 8.0 | 3.667 | 11.667 | 2.667 |
天工 (Tiangong) | 7.333 | 9.667 | 7.0 | 3.0 | 9.111 | 8.667 | 9.81 | 8.889 | 11.083 | 5.857 | 5.19 | 2.0 | 3.833 | 5.556 | 4.556 | 7.111 | 8.0 | 9.556 | 10.222 | 6.167 | 8.833 | 9.917 | 11.778 | 11.8 | 8.222 | 12.333 | 6.4 | 7.778 | 4.0 | 4.0 | 12.5 | 12.833 | 6.167 | 7.222 | 9.833 | 2.556 | 6.778 | 8.381 | 12.778 | 4.0 | 3.333 | 10.0 | 1.333 |
360智脑 (360 Zhinao) | 5.889 | 8.5 | 9.5 | 5.0 | 4.333 | 8.333 | 9.381 | 10.889 | 7.167 | 7.595 | 10.476 | 1.556 | 3.167 | 3.444 | 4.296 | 7.333 | 5.333 | 6.0 | 6.611 | 6.667 | 6.0 | 6.167 | 8.0 | 10.933 | 8.222 | 11.167 | 5.8 | 10.556 | 5.667 | 12.0 | 7.333 | 6.833 | 6.167 | 8.667 | 7.0 | 12.222 | 6.0 | 8.714 | 11.444 | 10.0 | 8.333 | 16.667 | 5.333 |
ChatGLM-130B | 5.556 | 7.417 | 8.5 | 6.0 | 4.222 | 8.333 | 9.762 | 8.333 | 8.0 | 7.452 | 11.333 | 1.667 | 3.667 | 7.778 | 3.556 | 7.667 | 2.0 | 5.556 | 7.111 | 6.333 | 3.333 | 9.667 | 7.556 | 10.733 | 10.444 | 3.0 | 6.933 | 6.0 | 5.067 | 4.0 | 6.167 | 11.167 | 2.833 | 6.889 | 8.083 | 2.556 | 5.889 | 7.19 | 11.0 | 10.0 | 3.333 | 3.333 | 2.667 |
BloomChat | 2.444 | 3.5 | 9.0 | 9.0 | 6.333 | 4.4 | 5.571 | 5.667 | 6.25 | 3.286 | 4.048 | 2.222 | 1.0 | 3.889 | 4.63 | 6.889 | 0.0 | 6.889 | 6.167 | 5.0 | 1.333 | 9.167 | 7.444 | 8.4 | 0.0 | 1.0 | 2.267 | 2.889 | 3.6 | 2.667 | 9.833 | 4.667 | 1.5 | 4.444 | 6.583 | 5.667 | 5.333 | 6.571 | 4.667 | 9.0 | 3.833 | 3.333 | 4.0 |
Vicuna-13B | 3.0 | 5.917 | 8.5 | 10.0 | 4.889 | 7.733 | 9.476 | 10.333 | 5.5 | 5.333 | 2.619 | 1.778 | 3.833 | 4.722 | 1.407 | 2.444 | 1.667 | 3.222 | 6.556 | 3.667 | 7.833 | 4.083 | 2.667 | 5.6 | 9.444 | 8.5 | 5.533 | 10.444 | 6.0 | 4.0 | 6.333 | 9.5 | 6.833 | 6.778 | 9.0 | 4.667 | 7.889 | 6.524 | 11.111 | 2.0 | 3.333 | 5.0 | 2.667 |