M4 Max, 128 GB
RAG chat: chatchat + xinference, qwen2.5-72b-mlx-8bit
2024-11-12 22:46:38,018 xinference.model.llm.mlx.core 9938 INFO Average generation speed: 0.59 tokens/s.
2024-11-12 22:51:15,523 xinference.model.llm.mlx.core 9938 INFO Average generation speed: 0.53 tokens/s.
RAG chat: chatchat + xinference, qwen2.5-32b-mlx-8bit
2024-11-12 22:59:05,647 xinference.model.llm.mlx.core 23115 INFO Average generation speed: 5.81 tokens/s.
2024-11-12 23:00:29,167 xinference.model.llm.mlx.core 23115 INFO Average generation speed: 6.27 tokens/s.
Not sure why the 72B falls apart in the RAG step while the 32B is still acceptable.
My working conclusion is that it's an Xinference problem on macOS.
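
If anyone wants to check whether the bottleneck really is Xinference rather than MLX itself, one rough sanity test is to load the same 8-bit checkpoint directly with mlx-lm and compare the reported tokens/s against the Xinference log lines above. A minimal sketch, assuming the mlx-lm package is installed and using the mlx-community Qwen2.5 8-bit repo name as a placeholder (swap in whatever checkpoint you actually loaded; the prompt is just dummy text sized like a RAG context):

# Sanity check: raw MLX generation speed, bypassing Xinference entirely.
# Repo name and prompt are placeholders, not my actual chatchat setup.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Qwen2.5-72B-Instruct-8bit")

# Build a long prompt roughly the size of a retrieved RAG context,
# since prompt length is the main difference between plain chat and RAG.
prompt = "Summarize the following passage:\n" + ("some retrieved document text. " * 200)

# verbose=True prints prompt and generation tokens-per-second, which can be
# compared against the "Average generation speed" lines in the Xinference log.
generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)

If mlx-lm alone reaches a few tokens/s on the 72B 8-bit model with a long prompt while Xinference reports 0.5-0.6, that would point at the serving layer rather than the model or the hardware.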