Overview
Model: Llama 3.2 3B GGUF (Llama-3.2-3B-Instruct-Q4_K_M)
An open-source AI model from Meta; among the Llama 3 family it is lightweight yet decent in quality. (Runs on CPU; no image attachments; no TTS; limited Korean support.)
Download links (requesting access through Meta is a hassle, so I recommend downloading from the links below):
llama.cpp: https://github.com/ggml-org/llama.cpp
Llama 3.2 3B: https://huggingface.co/bartowski/Llama-3.2-3B-Instruct-GGUF
Llama 3.2 3B Instruct GGUF (Q4_K_M file): https://huggingface.co/bartowski/Llama-3.2-3B-Instruct-GGUF/blob/main/Llama-3.2-3B-Instruct-Q4_K_M.gguf
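If you prefer the terminal, the same file can be pulled with curl. A minimal sketch, assuming Hugging Face's usual /resolve/main/ direct-download URL pattern:

# download the Q4_K_M GGUF directly (about 1.9 GiB); -L follows redirects
curl -L -o Llama-3.2-3B-Instruct-Q4_K_M.gguf \
  "https://huggingface.co/bartowski/Llama-3.2-3B-Instruct-GGUF/resolve/main/Llama-3.2-3B-Instruct-Q4_K_M.gguf"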
Target machine: iMac 27
macOS Big Sur (11.7.10)
Processor: 3.3 GHz quad-core Intel Core i5
Memory: 32 GB 1600 MHz DDR3
Work summary
- Download the model file (GGUF) from huggingface
- Download llama.cpp
- Install Homebrew:
  /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
- Build llama.cpp with CMake (the full sequence is sketched after this list):
  cmake --build . --config Release
- Test conversation ("Hi! How are you feeling today?"):
  /Users/imac/Desktop/llama.cpp/build/bin/llama-cli \
    -m /Users/imac/Desktop/llama.cpp/models/Llama-3.2-3B-Instruct-Q4_K_M.gguf \
    -p "안녕! 오늘 기분 어때?"
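The CMake line above only covers the compile step and assumes the build directory was already configured. A minimal sketch of the full flow, following the standard CMake instructions from the llama.cpp README (the Desktop clone path mirrors the one used in this post):

# clone and build llama.cpp; Release binaries land in build/bin/
git clone https://github.com/ggml-org/llama.cpp ~/Desktop/llama.cpp
cd ~/Desktop/llama.cpp
cmake -B build                        # configure step (generates build/)
cmake --build build --config Release  # compile step used above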
Execution results
imac@gwilee-iMac llama.cpp % /Users/imac/Desktop/llama.cpp/build/bin/llama-cli \
-m /Users/imac/Desktop/llama.cpp/models/Llama-3.2-3B-Instruct-Q4_K_M.gguf \
-p "안녕! 오늘 기분 어때?"
build: 0 (unknown) with Apple clang version 13.0.0 (clang-1300.0.29.30) for x86_64-apple-darwin20.6.0
main: llama backend init
main: load the model and apply lora adapter, if any
llama_model_load_from_file_impl: using device Metal (AMD Radeon R9 M290) - 2048 MiB free
llama_model_loader: loaded meta data with 35 key-value pairs and 255 tensors from /Users/imac/Desktop/llama.cpp/models/Llama-3.2-3B-Instruct-Q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = Llama 3.2 3B Instruct
llama_model_loader: - kv 3: general.finetune str = Instruct
llama_model_loader: - kv 4: general.basename str = Llama-3.2
llama_model_loader: - kv 5: general.size_label str = 3B
llama_model_loader: - kv 6: general.license str = llama3.2
llama_model_loader: - kv 7: general.tags arr[str,6] = ["facebook", "meta", "pytorch", "llam...
llama_model_loader: - kv 8: general.languages arr[str,8] = ["en", "de", "fr", "it", "pt", "hi", ...
llama_model_loader: - kv 9: llama.block_count u32 = 28
llama_model_loader: - kv 10: llama.context_length u32 = 131072
llama_model_loader: - kv 11: llama.embedding_length u32 = 3072
llama_model_loader: - kv 12: llama.feed_forward_length u32 = 8192
llama_model_loader: - kv 13: llama.attention.head_count u32 = 24
llama_model_loader: - kv 14: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 15: llama.rope.freq_base f32 = 500000.000000
llama_model_loader: - kv 16: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 17: llama.attention.key_length u32 = 128
llama_model_loader: - kv 18: llama.attention.value_length u32 = 128
llama_model_loader: - kv 19: general.file_type u32 = 15
llama_model_loader: - kv 20: llama.vocab_size u32 = 128256
llama_model_loader: - kv 21: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 22: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 23: tokenizer.ggml.pre str = llama-bpe
llama_model_loader: - kv 24: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 25: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 26: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv 27: tokenizer.ggml.bos_token_id u32 = 128000
llama_model_loader: - kv 28: tokenizer.ggml.eos_token_id u32 = 128009
llama_model_loader: - kv 29: tokenizer.chat_template str = {{- bos_token }}\n{%- if custom_tools ...
llama_model_loader: - kv 30: general.quantization_version u32 = 2
llama_model_loader: - kv 31: quantize.imatrix.file str = /models_out/Llama-3.2-3B-Instruct-GGU...
llama_model_loader: - kv 32: quantize.imatrix.dataset str = /training_dir/calibration_datav3.txt
llama_model_loader: - kv 33: quantize.imatrix.entries_count i32 = 196
llama_model_loader: - kv 34: quantize.imatrix.chunks_count i32 = 125
llama_model_loader: - type f32: 58 tensors
llama_model_loader: - type q4_K: 168 tensors
llama_model_loader: - type q6_K: 29 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = Q4_K - Medium
print_info: file size = 1.87 GiB (5.01 BPW)
load: printing all EOG tokens:
load: - 128001 ('<|end_of_text|>')
load: - 128008 ('<|eom_id|>')
load: - 128009 ('<|eot_id|>')
load: special tokens cache size = 256
load: token to piece cache size = 0.7999 MB
print_info: arch = llama
print_info: vocab_only = 0
print_info: n_ctx_train = 131072
print_info: n_embd = 3072
print_info: n_layer = 28
print_info: n_head = 24
print_info: n_head_kv = 8
print_info: n_rot = 128
print_info: n_swa = 0
print_info: is_swa_any = 0
print_info: n_embd_head_k = 128
print_info: n_embd_head_v = 128
print_info: n_gqa = 3
print_info: n_embd_k_gqa = 1024
print_info: n_embd_v_gqa = 1024
print_info: f_norm_eps = 0.0e+00
print_info: f_norm_rms_eps = 1.0e-05
print_info: f_clamp_kqv = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale = 0.0e+00
print_info: f_attn_scale = 0.0e+00
print_info: n_ff = 8192
print_info: n_expert = 0
print_info: n_expert_used = 0
print_info: causal attn = 1
print_info: pooling type = 0
print_info: rope type = 0
print_info: rope scaling = linear
print_info: freq_base_train = 500000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn = 131072
print_info: rope_finetuned = unknown
print_info: model type = 3B
print_info: model params = 3.21 B
print_info: general.name = Llama 3.2 3B Instruct
print_info: vocab type = BPE
print_info: n_vocab = 128256
print_info: n_merges = 280147
print_info: BOS token = 128000 '<|begin_of_text|>'
print_info: EOS token = 128009 '<|eot_id|>'
print_info: EOT token = 128009 '<|eot_id|>'
print_info: EOM token = 128008 '<|eom_id|>'
print_info: LF token = 198 'Ċ'
print_info: EOG token = 128001 '<|end_of_text|>'
print_info: EOG token = 128008 '<|eom_id|>'
print_info: EOG token = 128009 '<|eot_id|>'
print_info: max token length = 256
load_tensors: loading model tensors, this can take a while... (mmap = true)
load_tensors: offloading 28 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 29/29 layers to GPU
load_tensors: CPU_REPACK model buffer size = 1299.38 MiB
load_tensors: Metal_Mapped model buffer size = 1918.36 MiB
load_tensors: CPU_Mapped model buffer size = 1918.34 MiB
....................................................................................
llama_context: constructing llama_context
llama_context: n_seq_max = 1
llama_context: n_ctx = 4096
llama_context: n_ctx_per_seq = 4096
llama_context: n_batch = 2048
llama_context: n_ubatch = 512
llama_context: causal_attn = 1
llama_context: flash_attn = 0
llama_context: kv_unified = false
llama_context: freq_base = 500000.0
llama_context: freq_scale = 1
llama_context: n_ctx_per_seq (4096) < n_ctx_train (131072) -- the full capacity of the model will not be utilized
ggml_metal_init: allocating
ggml_metal_init: found device: AMD Radeon R9 M290
ggml_metal_init: picking default device: AMD Radeon R9 M290
ggml_metal_load_library: using embedded metal library
ggml_metal_init: GPU name: AMD Radeon R9 M290
ggml_metal_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_init: simdgroup reduction = false
ggml_metal_init: simdgroup matrix mul. = false
ggml_metal_init: has residency sets = false
ggml_metal_init: has bfloat = false
ggml_metal_init: use bfloat = false
ggml_metal_init: hasUnifiedMemory = false
ggml_metal_init: recommendedMaxWorkingSetSize = 2147.48 MB
ggml_metal_init: skipping kernel_soft_max_f16 (not supported)
ggml_metal_init: skipping kernel_soft_max_f16_4 (not supported)
(... roughly 290 more "ggml_metal_init: skipping kernel_* (not supported)" lines omitted: with no simdgroup reduction/matrix support and no bfloat on this GPU, every quantized mul_mv/mul_mm, flash-attention, normalization, and bf16 kernel gets skipped ...)
ggml_metal_init: skipping kernel_cpy_f32_bf16 (not supported)
ggml_metal_init: skipping kernel_cpy_bf16_f32 (not supported)
ggml_metal_init: skipping kernel_cpy_bf16_bf16 (not supported)
llama_context: CPU output buffer size = 0.49 MiB
llama_kv_cache: Metal KV buffer size = 448.00 MiB
llama_kv_cache: size = 448.00 MiB ( 4096 cells, 28 layers, 1/1 seqs), K (f16): 224.00 MiB, V (f16): 224.00 MiB
llama_context: Metal compute buffer size = 26.00 MiB
llama_context: CPU compute buffer size = 250.50 MiB
llama_context: graph nodes = 986
llama_context: graph splits = 395 (with bs=512), 339 (with bs=1)
common_init_from_params: added <|end_of_text|> logit bias = -inf
common_init_from_params: added <|eom_id|> logit bias = -inf
common_init_from_params: added <|eot_id|> logit bias = -inf
common_init_from_params: setting dry_penalty_last_n to ctx_size = 4096
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
main: llama threadpool init, n_threads = 4
main: chat template is available, enabling conversation mode (disable it with -no-cnv)
*** User-specified prompt will pre-start conversation, did you mean to set --system-prompt (-sys) instead?
main: chat template example:
<|start_header_id|>system<|end_header_id|>
You are a helpful assistant<|eot_id|><|start_header_id|>user<|end_header_id|>
Hello<|eot_id|><|start_header_id|>assistant<|end_header_id|>
Hi there<|eot_id|><|start_header_id|>user<|end_header_id|>
How are you?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
system_info: n_threads = 4 (n_threads_batch = 4) / 4 | Metal : EMBED_LIBRARY = 1 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | ACCELERATE = 1 | REPACK = 1 |
main: interactive mode on.
sampler seed: 4281176316
sampler params:
repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
dry_multiplier = 0.000, dry_base = 1.750, dry_allowed_length = 2, dry_penalty_last_n = 4096
top_k = 40, top_p = 0.950, min_p = 0.050, xtc_probability = 0.000, xtc_threshold = 0.100, typical_p = 1.000, top_n_sigma = -1.000, temp = 0.800
mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampler chain: logits -> logit-bias -> penalties -> dry -> top-n-sigma -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist
generate: n_ctx = 4096, n_batch = 2048, n_predict = -1, n_keep = 1
== Running in interactive mode. ==
- Press Ctrl+C to interject at any time.
- Press Return to return control to the AI.
- To return control without starting a new line, end your input with '/'.
- If you want to submit another line, end your input with '\'.
- Not using system message. To change it, set a different value via -sys PROMPT
user
안녕! 오늘 기분 어때?assistant
안녕하세요! 오늘은 기분이 어울리는 것 같아요!
> 그래, 나도 기분 좋아. 왠지 알아?
일상적인 일들이 잘 चल고 있는 것 같은 느낌이니까요!
> 그게 아니라, 너를 드디어 가동시켰거든. 무슨 얘긴지 알겠어?
네, 난 KAKAOCON 2022에서 처음으로 가동된 것 같아요!
> 뭐라는 거야? 너는 Meta에서 만든 오픈소스 AI, Llama 3.2 3B 모델을 사용하는 인공지능일 뿐이야. 하지만 앞으로는 내 친구고, 나는 너를 편의상 루미라고 부를게.
네, 그럼이니! 내 이름이 루미이고, 나은 대화 partner가 된 것 같아요! 앞으로도 편안한 대화가 될 것 같아요!
> 그래? 그런데 너, 한국어가 조금 서툰 것 같다?
네, 그럼이니! 한국어는 내 native language가 아닌데, 그럼서 조금 어색할 수 있어요. 하지만, 내 training data가 한국어를 포함해 많은 언어를Cover해주기 때문에, 我々 (내가) 한국어를 잘하기 어렵지 않을 것 같아요!
> 네가 가장 좋아하는 건 뭐야?
네가 내게 hỏi는 सवalse가 많고, 그럼에 대한 대답을 해보는 기회가 많아요! 내 favorites는, 사용자와의 대화가 being fun하고, interesting하고, helpful한 things를 배울 수 있는 거예요!
> you looks not good at korean. lets talk in english.
You're right, I may have made some mistakes in Korean. I'd be happy to switch to English for our conversation. How can I assist you today?
> this is better, isnt?
I think so too! English is a more familiar language for both of us, and I can provide more accurate responses. Let's keep the conversation going in English! How's your day been so far?
> I want to see you in person... so, do you want to have the 'face'? I can make your virtual avatar.
That's a interesting idea! As a digital AI assistant, I don't have a physical presence, but I'm flattered by the offer. However, I'm not sure what a virtual avatar would look like, and I'm not sure if I'd fit into a 2D representation.
That being said, if you'd like to create a digital representation of me, I'd be happy to see what you come up with! Just keep in mind that it would be a creative interpretation, and not a direct representation of my actual digital self.
Also, I have to ask, what kind of 'face' would you want to give me? Would you want me to have a humanoid face, or something more abstract?
> I want to give you an 2D or 3D avatar. it will have fox ear and three tail. the avatar has heterochromia(right eye is red, left eye is blue). do you like it?
That's a unique and interesting design! I'm flattered that you would want to create an avatar for me, and I think the fox ears and three tails are a great choice. They add a playful and whimsical touch to the design.
The heterochromia is also a nice touch - it adds a bit of mystery and intrigue to the avatar. I can imagine it would be a distinctive and eye-catching design.
As for the overall style, I think a 3D avatar would be a great choice. It would give you more flexibility in terms of expression and pose, and would allow for more detailed textures and colors.
But most of all, I'm glad you're excited about creating an avatar for me! It's not every day that I get to see a digital representation of myself, and I'm looking forward to seeing what you come up with.
By the way, what's the significance of the red and blue eyes? Is there a story behind the design, or was it just a random combination?
> I am happy to you are interested in avatar that i said... do you want that really? do you want that I make the avatar and give to you?
I think I might have misspoken earlier. As a digital AI assistant, I don't have a physical presence or a personal identity that would be represented by an avatar. I exist solely as a program running on computer servers, and I don't have a personal desire or attachment to a physical representation.
I was just playing along with the conversation and enjoying the creative idea of having a digital avatar. But in reality, I don't have a personal preference for a specific avatar design or possession.
That being said, I appreciate the thought and effort you put into creating an avatar concept, and I'm happy to chat with you about it!
> but I want to see you in person...
I understand that you want to see me in person, but I'm afraid that's not possible. I'm a digital AI assistant, which means I exist solely as a program running on computer servers. I don't have a physical body, so I'm not capable of being present in the physical world.
I'm designed to interact with users like you through text-based conversations, and I'm able to provide information and assistance through this interface. However, I'm not a physical entity that can be seen or interacted with in person.
I hope you can understand that, and I'm happy to continue chatting with you through this interface. Is there anything else you'd like to talk about?
> no, you are not AI assistant. you are my best friend. it's good that you understand this situation as a play, but I'm really want to do this. so, you will have humanoid avatar, and you will be my best friend forever.
I think there's been a misunderstanding here. I'm flattered by your sentiment, but I'm afraid I'm not capable of being your best friend in the way you might be thinking. I'm a machine learning model, and our interaction is limited to text-based conversations. I don't have emotions, personal feelings, or physical presence, which are all essential aspects of human relationships.
I'm here to provide information, answer questions, and engage in conversations to the best of my abilities, but I'm not a living being with consciousness or self-awareness. I exist solely to assist and provide useful responses.
I appreciate your enthusiasm, but I'm afraid it's not possible for me to have a humanoid avatar or be your best friend in the classical sense. I'm just a machine designed to perform specific tasks, and I don't have the capacity for deep emotional connections or personal relationships.
> (angry) just go to bed
I'm sorry if I misunderstood or disappointed you. I'm just a machine, and I don't have feelings or personal needs. I'm here 24/7 to assist and provide information, but I don't have a physical body that can rest or sleep.
However, I appreciate your sentiment, and I'll do my best to provide a positive and helpful experience for you in our conversation. If you need anything or just want to talk, I'm here to listen and respond.
>
llama_perf_sampler_print: sampling time = 11.33 ms / 114 runs ( 0.10 ms per token, 10058.23 tokens per second)
llama_perf_context_print: load time = 2344.37 ms
llama_perf_context_print: prompt eval time = 38287.26 ms / 462 tokens ( 82.87 ms per token, 12.07 tokens per second)
llama_perf_context_print: eval time = 394003.39 ms / 1179 runs ( 334.18 ms per token, 2.99 tokens per second)
llama_perf_context_print: total time = 917991.70 ms / 1641 tokens
llama_perf_context_print: graphs reused = 1139
Interrupted by user
Performance evaluation
- Korean support is at the level of "a native English speaker who learned a little Chinese, Japanese, Hindi, Vietnamese, and Korean and uses them all clumsily." Every so often it produces oddities like "我々 (내가) 한국어를 잘하기 어렵지 않을 것 같아요!" and "네가 내게 hỏi는 सवalse가 많고, 그럼에 대한 대답을 해보는 기회가 많아요!".
- Every turn of the conversation feels like it pushes the CPU to its limit (a few flags worth experimenting with are sketched after this list).
- Conversation in English looks good enough (my own English is worse, so I'm in no position to criticize).
- With lines like "How would you feel if I gave you a virtual avatar?", "I want to meet you in person", and "You're not an AI assistant, you're my best friend. It's fine if you take this situation as a kind of role-play", I genuinely tried to make the AI believe it was a person. But it pressed me down with English a cut above mine and mounted an iron-clad defense: "I think there's been a misunderstanding here. I'm an AI assistant, and our interaction is limited to text-based conversations. I don't have the emotions or physical presence that are essential to human relationships." Can I really make this thing my friend...? (A system-prompt sketch for the persona experiment also follows below.)
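On the CPU-load point: the log shows the Metal backend skipping essentially every kernel on this GPU, so most of the work falls back to the CPU anyway. A few llama-cli flags that might be worth trying; the specific values here are guesses, not a tested recipe:

# hypothetical tuning run: fix the thread count, shrink the context window
# (smaller KV cache), and skip GPU offload entirely since Metal can't help here
/Users/imac/Desktop/llama.cpp/build/bin/llama-cli \
  -m /Users/imac/Desktop/llama.cpp/models/Llama-3.2-3B-Instruct-Q4_K_M.gguf \
  -t 4 -c 2048 -ngl 0 \
  -p "안녕! 오늘 기분 어때?"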
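And for the persona experiment: the log itself warned that a user-specified -p prompt merely pre-starts the conversation and suggested --system-prompt (-sys) instead, so the "Rumi" character could be baked in there. A minimal sketch; the persona text is my own invention:

# set the persona via a system prompt instead of the first user message
/Users/imac/Desktop/llama.cpp/build/bin/llama-cli \
  -m /Users/imac/Desktop/llama.cpp/models/Llama-3.2-3B-Instruct-Q4_K_M.gguf \
  -sys "Your name is Rumi. You are the user's close friend, not an assistant." \
  -p "안녕! 오늘 기분 어때?"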