Running ChatGLM2-6B on a Mac

Original link: https://blog.kelu.org/tech/2023/06/30/mac-chatglm2-6b.html


I am running it on a Mac Studio (M2 Max). This article documents the process. My version info:

  • macOS 13.4
  • Unified memory: 96GB
  • conda 23.5.0
  • Python 3.11.4
  • pip 23.1.2

1. Environment preparation

If you are not familiar with Python tooling, you can refer to my previous articles about conda, and work inside a virtual environment.
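For example, a minimal conda setup (the environment name chatglm2 is my choice; any name works):

 conda create -n chatglm2 python=3.11
 conda activate chatglm2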

Download the source code from GitHub: https://github.com/THUDM/ChatGLM2-6B
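For example:

 git clone https://github.com/THUDM/ChatGLM2-6B.git
 cd ChatGLM2-6B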

Use a domestic mirror (Tsinghua University) to install the dependencies; otherwise, downloads from mainland China will be very slow.

 pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple 
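Optionally, make the mirror the default for every pip call:

 pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple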


2. Download the model

The model checkpoints can be downloaded from Tsinghua Cloud: https://cloud.tsinghua.edu.cn/d/674208019e314311ab5c/

You can also download the model from huggingface.co, though that route didn't work for me (see the problems section below).

 brew install git-lfs 


Initialize it and clone the model repository (the standard Hugging Face route; as noted above, this didn't work for me):
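 git lfs install
 git clone https://huggingface.co/THUDM/chatglm2-6b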


 pip install gradio -i https://pypi.tuna.tsinghua.edu.cn/simple

You can point the code at a local model path. In my case, I ran python web_demo.py, waited for the download to begin, and then replaced the cached files directly.

My default download path is this:

 ~/.cache/huggingface/hub/models--THUDM--chatglm2-6b/snapshots/c57e892806dfe383cd5caf09719628788fe96379 
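To skip the re-download, you can edit web_demo.py to load from this snapshot directory instead of the hub name (a sketch; web_demo.py in the repo loads "THUDM/chatglm2-6b" by name):

 import os
 from transformers import AutoModel, AutoTokenizer

 # substitute the local snapshot path for the hub name in web_demo.py
 MODEL_PATH = os.path.expanduser(
     "~/.cache/huggingface/hub/models--THUDM--chatglm2-6b/snapshots/c57e892806dfe383cd5caf09719628788fe96379")
 tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
 model = AutoModel.from_pretrained(MODEL_PATH, trust_remote_code=True)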


3. Run the demo

1. Web demo

 python web_demo.py 


You may notice a warning. It is harmless: PyTorch's MPS backend has no int64 min/max support, so it casts to int32 and generation still works.

 /modeling_chatglm.py:1173: UserWarning: MPS: no support for int64 min/max ops, casting it to int32 (Triggered internally at /Users/runner/work/_temp/anaconda/conda-bld/pytorch_1682343668887/work/aten/src/ATen/native/mps/operations/ReduceOps.mm:1271.)
   if unfinished_sequences.max() == 0 or stopping_criteria(input_ids, scores):

Run the second web demo (Streamlit-based):

 pip install streamlit streamlit-chat -i https://pypi.tuna.tsinghua.edu.cn/simple 


 streamlit run web_demo2.py 


2. Command line demo

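The command-line demo also ships with the repo; assuming the stock cli_demo.py (apply the same local-path and MPS edits to it as for web_demo.py):

 python cli_demo.py

Per the repo README, typing clear resets the conversation history and stop exits the program.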

3. APIs

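Before running the curl below, install the extra API dependencies and start the server; per the repo README, api.py needs fastapi and uvicorn and serves on port 8000 by default:

 pip install fastapi uvicorn -i https://pypi.tuna.tsinghua.edu.cn/simple
 python api.py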

 curl -X POST "http://127.0.0.1:8000" \
      -H 'Content-Type: application/json' \
      -d '{"prompt": "你和chatgpt哪个更好?", "history": []}'
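The prompt asks "Which is better, you or ChatGPT?". Judging from the repo's api.py, the reply is JSON of roughly this shape (the values here are illustrative placeholders):

 {"response": "...", "history": [["你和chatgpt哪个更好?", "..."]], "status": 200, "time": "..."}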


4. Some problems encountered

  1. As long as the system proxy is turned on, this error is reported:

     requests.exceptions.SSLError: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /THUDM/chatglm2-6b/resolve/main/tokenizer_config.json (Caused by SSLError(SSLEOFError(8, '[SSL: UNEXPECTED_EOF_WHILE_READING] EOF occurred in violation of protocol (_ssl.c:1002)')))

    After a lot of searching I could not solve it. After turning the proxy off, I got this instead:

     assert os.path.isfile(model_path), model_path
            ^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "<frozen genericpath>", line 30, in isfile
     TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType

    This error means the model files are missing.

    So with the proxy off, the files are never downloaded automatically and the model cannot be found; with the proxy on, the download fails with the SSL error above.

    The final solution was to switch the proxy to global mode; if you run it in rule-based mode, add huggingface.co to the rule list.

  2. Runtime error:

     File "/Users/kelu/Workspace/Miniforge3/envs/pytorch_env/lib/python3.11/site-packages/torch/cuda/__init__.py", line 239, in _lazy_init raise AssertionError("Torch not compiled with CUDA enabled") AssertionError: Torch not compiled with CUDA enabled

    Read the official documentation carefully: Mac deployment needs a different model-loading call, using the MPS backend instead of CUDA:

     model = AutoModel.from_pretrained("your local path", trust_remote_code=True).to('mps')
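    Putting it together, a minimal sketch of loading the model and chatting on MPS, extending the path substitution shown earlier (model.chat is the chat interface documented in the repo README; the prompt is my example):

     import os
     from transformers import AutoModel, AutoTokenizer

     model_path = os.path.expanduser(
         "~/.cache/huggingface/hub/models--THUDM--chatglm2-6b/snapshots/c57e892806dfe383cd5caf09719628788fe96379")
     tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
     # move the model to Apple's Metal (MPS) backend instead of CUDA
     model = AutoModel.from_pretrained(model_path, trust_remote_code=True).to('mps')
     model = model.eval()

     response, history = model.chat(tokenizer, "你好", history=[])  # "你好" = "hello"
     print(response)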
