03. Setting Up a Llama Environment (Mac)

은서파 2025. 5. 26. 15:14

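This post walks through installing llama on a Mac. The rest of this series uses llama for its examples. (If you already have an API key for OpenAI or a similar service, there is of course no need to set up llama.)

The details are well covered at the following link.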

https://www.llama.com/docs/llama-everywhere/

Setting up the llama runtime environment

Installing ollama

Go to ollama.com to download the application for your OS.

Alternatively, if you use brew, run the following command.

brew install ollama

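For reference, to change where models are stored on a Mac, use the following command. It only appends the setting to ~/.zshrc, so it takes effect in new shell sessions (or after running source ~/.zshrc), and the ollama server has to be restarted to pick it up.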

echo 'export OLLAMA_MODELS="/your/custom/path"' >> ~/.zshrc
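
As a minimal sketch of applying the setting in the current shell, the snippet below exports the variable, creates the directory, and prints the result. The path $HOME/ollama-models is a hypothetical example, not a location from the original post. The commented-out launchctl line is an assumption for the case where the server is started by the macOS application rather than by ollama serve in a terminal, since an app launched from Finder does not read ~/.zshrc.

```shell
# Hypothetical location; replace with any directory that has enough free space.
export OLLAMA_MODELS="$HOME/ollama-models"

# Create the directory up front so the location is known to exist.
mkdir -p "$OLLAMA_MODELS"

# Confirm what a server started from this shell would see.
echo "models will be stored in: $OLLAMA_MODELS"

# Assumption: when the macOS app (not this shell) runs the server,
# the variable has to be set for GUI apps instead, then the app restarted:
# launchctl setenv OLLAMA_MODELS "$OLLAMA_MODELS"
```

Models that were already downloaded stay in the old location unless they are moved or pulled again.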

Basic commands

Running ollama with no arguments prints the available commands.

ollama
Usage:
  ollama [flags]
  ollama [command]

Available Commands:
  serve       Start ollama
  create      Create a model from a Modelfile
  show        Show information for a model
  run         Run a model
  stop        Stop a running model
  pull        Pull a model from a registry
  push        Push a model to a registry
  list        List models
  ps          List running models
  cp          Copy a model
  rm          Remove a model
  help        Help about any command

Flags:
  -h, --help      help for ollama
  -v, --version   Show version information

Use "ollama [command] --help" for more information about a command.

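A typical workflow goes through the following steps. Replace <model_name> with an actual model name.

ollama serve                  # start the ollama service (runs in the foreground; use another terminal for the rest)
ollama pull <model_name>      # download model_name

ollama list                   # list installed models
ollama run  <model_name>      # run model_name
ollama stop <model_name>      # stop model_name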
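
The steps above can be sketched as a small script that pulls a model only when it is not installed yet. has_model is a hypothetical helper written for this example, and it assumes that ollama list prints one header line followed by rows whose first column is the model name. The model name llama3.2:3b is just an example.

```shell
# has_model NAME
# Reads `ollama list` output on stdin and succeeds if NAME is in the first column.
# Assumption: the first line is a header (NAME ID SIZE MODIFIED) and is skipped.
has_model() {
  awk -v name="$1" 'NR > 1 && $1 == name { found = 1 } END { exit (found ? 0 : 1) }'
}

MODEL="llama3.2:3b"

if command -v ollama >/dev/null 2>&1; then
  # `ollama list` needs a running server (ollama serve or the desktop app).
  if ollama list 2>/dev/null | has_model "$MODEL"; then
    echo "$MODEL is already installed"
  else
    ollama pull "$MODEL"
  fi
else
  echo "ollama is not on PATH; install it first" >&2
fi
```

If the server is not running, the list step fails and the script falls through to ollama pull, which then reports the connection error itself.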

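Choosing a model

Models to use with llama can be found on Meta, Hugging Face, Kaggle, and elsewhere. The names accepted by ollama pull are the ones listed on the ollama site.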

meta: https://www.llama.com/llama-downloads/
hugging face: https://huggingface.co/meta-llama
kaggle: https://www.kaggle.com/organizations/metaresearch/models
ollama: https://ollama.com/search

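What a model is good at matters, but so does the parameter count you choose. Pick a model that fits the specs of your test machine; an overly ambitious choice can bring the computer to its knees. More parameters generally mean better output quality, but the required resources grow accordingly.

A rough guide follows.

Model parameters	Recommended RAM	Recommended VRAM (with GPU)	Notes
~3B	8GB or more	4GB or more	Relatively light; good for quick tests
~8B	16GB or more	6GB or more	Runs comfortably on a typical PC
~13B	32GB or more	8GB or more	Needed for somewhat better quality
~70B	64GB or more	40GB or more	Requires server-class hardware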
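
The figures in the table roughly follow the size of the model weights. As a back-of-the-envelope sketch (a rule of thumb assumed here, not a number from the original post), the weight size in GB is about the parameter count in billions times the bits per weight, divided by 8. estimate_gb is a hypothetical helper; the result is a lower bound, because the context cache and runtime overhead come on top.

```shell
# estimate_gb PARAMS_IN_BILLIONS BITS_PER_WEIGHT
# Approximate size of the weights alone, in GB (1 GB = 10^9 bytes).
estimate_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

estimate_gb 3 4     # 3B at 4 bits per weight   -> 1.5
estimate_gb 8 4     # 8B at 4 bits per weight   -> 4.0
estimate_gb 8 16    # 8B at 16 bits per weight  -> 16.0
estimate_gb 70 4    # 70B at 4 bits per weight  -> 35.0
```

Under this estimate, a 70B model needs about 35 GB for the weights even at 4 bits per weight, which is in line with the 40GB VRAM row above.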

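For a typical PC, the following models are recommended.

Model and parameters	Notes	Install
Llama 3.2 3B	Good quality and lightweight; works well with Spring AI	ollama pull llama3.2:3b
gemma3:4b	Lightweight open model from Google built on Gemini technology; handles text and image input	ollama pull gemma3:4b-it-qat
qwen3:8b	Supports thinking mode and tool calling, so it can be used with MCP	ollama pull qwen3:8b

Usage

After starting the service with ollama serve, run a model from another terminal. The prompt changes to >>> and you can chat with the model.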

ollama run llama3.2:latest
>>> hello llama? 
Hello! Nice to meet you. I'm not actually a llama, but rather a computer program designed to chat with humans. Would 
you like to talk about llamas or something else entirely?
>>> /bye

Typing every query this way gets tedious, though, so you may want to install an ollama client such as Msty.
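
Besides a GUI client, a question can also be sent without the interactive prompt through the HTTP API that the ollama server exposes. The sketch below assumes the default address localhost:11434 and that llama3.2:3b has already been pulled; the payload is built with a plain printf, so it does not escape quotes inside the prompt.

```shell
MODEL="llama3.2:3b"
PROMPT="hello llama?"

# Build the JSON request body. "stream": false asks for one JSON reply
# instead of a stream of partial responses.
# Note: no JSON escaping is done here, so keep double quotes out of PROMPT.
payload=$(printf '{"model":"%s","prompt":"%s","stream":false}' "$MODEL" "$PROMPT")
echo "$payload"

# Only send the request when the server answers on the default port.
if curl -s --max-time 2 http://localhost:11434/ >/dev/null 2>&1; then
  curl -s http://localhost:11434/api/generate -d "$payload"
else
  echo "ollama server is not reachable on localhost:11434" >&2
fi
```

The answer text is in the response field of the returned JSON. For a quick one-off question, passing the prompt as an argument, as in ollama run llama3.2:3b "hello llama?", also prints the answer and exits.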