Running ChatGLM2-6B on a Mac

Original link: https://blog.kelu.org/tech/2023/06/30/mac-chatglm2-6b.html


I am running it on a Mac Studio (M2 Max). This article documents the process. My version info:

  • macOS 13.4
  • Unified memory: 96GB
  • conda 23.5.0
  • Python 3.11.4
  • pip 23.1.2

1. Environment preparation

If you are not familiar with Python tooling, you can refer to my previous articles about conda, and work inside a virtual environment.
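For example, a minimal conda setup (the environment name chatglm2 is my choice; any name works):

 conda create -n chatglm2 python=3.11
 conda activate chatglm2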

Download the source code from GitHub: https://github.com/THUDM/ChatGLM2-6B
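For example:

 git clone https://github.com/THUDM/ChatGLM2-6B.git
 cd ChatGLM2-6B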

Use a domestic mirror (Tsinghua University) to install the dependencies; otherwise, downloads from mainland China will be very slow.

 pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple 
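Optionally, make the mirror the default for every pip call:

 pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple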


2. Download the model

The model checkpoints can be downloaded from Tsinghua Cloud: https://cloud.tsinghua.edu.cn/d/674208019e314311ab5c/

You can also download the model from huggingface.co, though that route didn't work for me (see the problems section below).

 brew install git-lfs 


Initialize it and clone the model repository (the standard Hugging Face route; as noted above, this didn't work for me):
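 git lfs install
 git clone https://huggingface.co/THUDM/chatglm2-6b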


 pip install gradio -i https://pypi.tuna.tsinghua.edu.cn/simple

You can point the code at a local model path. In my case, I ran python web_demo.py, waited for the download to begin, and then replaced the cached files directly.

My default download path is this:

 ~/.cache/huggingface/hub/models--THUDM--chatglm2-6b/snapshots/c57e892806dfe383cd5caf09719628788fe96379 
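To skip the re-download, you can edit web_demo.py to load from this snapshot directory instead of the hub name (a sketch; web_demo.py in the repo loads "THUDM/chatglm2-6b" by name):

 import os
 from transformers import AutoModel, AutoTokenizer

 # substitute the local snapshot path for the hub name in web_demo.py
 MODEL_PATH = os.path.expanduser(
     "~/.cache/huggingface/hub/models--THUDM--chatglm2-6b/snapshots/c57e892806dfe383cd5caf09719628788fe96379")
 tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
 model = AutoModel.from_pretrained(MODEL_PATH, trust_remote_code=True)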


3. Run the demo

1. Web demo

 python web_demo.py 


You may notice a warning. It is harmless: PyTorch's MPS backend has no int64 min/max support, so it casts to int32 and generation still works.

 /modeling_chatglm.py:1173: UserWarning: MPS: no support for int64 min/max ops, casting it to int32 (Triggered internally at /Users/runner/work/_temp/anaconda/conda-bld/pytorch_1682343668887/work/aten/src/ATen/native/mps/operations/ReduceOps.mm:1271.)
   if unfinished_sequences.max() == 0 or stopping_criteria(input_ids, scores):

Run the second web demo (Streamlit-based):

 pip install streamlit streamlit-chat -i https://pypi.tuna.tsinghua.edu.cn/simple 


 streamlit run web_demo2.py 


2. Command line demo

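The command-line demo also ships with the repo; assuming the stock cli_demo.py (apply the same local-path and MPS edits to it as for web_demo.py):

 python cli_demo.py

Per the repo README, typing clear resets the conversation history and stop exits the program.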

3. APIs

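Before running the curl below, install the extra API dependencies and start the server; per the repo README, api.py needs fastapi and uvicorn and serves on port 8000 by default:

 pip install fastapi uvicorn -i https://pypi.tuna.tsinghua.edu.cn/simple
 python api.py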

 curl -X POST "http://127.0.0.1:8000" \
      -H 'Content-Type: application/json' \
      -d '{"prompt": "你和chatgpt哪个更好?", "history": []}'
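The prompt asks "Which is better, you or ChatGPT?". Judging from the repo's api.py, the reply is JSON of roughly this shape (the values here are illustrative placeholders):

 {"response": "...", "history": [["你和chatgpt哪个更好?", "..."]], "status": 200, "time": "..."}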


4. Some problems encountered

  1. As long as the system proxy is turned on, this error is reported:

     requests.exceptions.SSLError: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /THUDM/chatglm2-6b/resolve/main/tokenizer_config.json (Caused by SSLError(SSLEOFError(8, '[SSL: UNEXPECTED_EOF_WHILE_READING] EOF occurred in violation of protocol (_ssl.c:1002)')))

    After a lot of searching I could not solve it. After turning the proxy off, I got this instead:

     assert os.path.isfile(model_path), model_path
            ^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "<frozen genericpath>", line 30, in isfile
     TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType

    This error means the model files are missing.

    So with the proxy off, the files are never downloaded automatically and the model cannot be found; with the proxy on, the download fails with the SSL error above.

    The final solution was to switch the proxy to global mode; if you run it in rule-based mode, add huggingface.co to the rule list.

  2. Runtime error:

     File "/Users/kelu/Workspace/Miniforge3/envs/pytorch_env/lib/python3.11/site-packages/torch/cuda/__init__.py", line 239, in _lazy_init raise AssertionError("Torch not compiled with CUDA enabled") AssertionError: Torch not compiled with CUDA enabled

    Read the official documentation carefully: Mac deployment needs a different model-loading call, using the MPS backend instead of CUDA:

     model = AutoModel.from_pretrained("your local path", trust_remote_code=True).to('mps')
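    Putting it together, a minimal sketch of loading the model and chatting on MPS, extending the path substitution shown earlier (model.chat is the chat interface documented in the repo README; the prompt is my example):

     import os
     from transformers import AutoModel, AutoTokenizer

     model_path = os.path.expanduser(
         "~/.cache/huggingface/hub/models--THUDM--chatglm2-6b/snapshots/c57e892806dfe383cd5caf09719628788fe96379")
     tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
     # move the model to Apple's Metal (MPS) backend instead of CUDA
     model = AutoModel.from_pretrained(model_path, trust_remote_code=True).to('mps')
     model = model.eval()

     response, history = model.chat(tokenizer, "你好", history=[])  # "你好" = "hello"
     print(response)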
