🤖 KUKU Chatbot Overview

외국민 KUKU (OURS)

image

  • ์‹œ๋‚˜๋ฆฌ์˜ค 1 : ์œ ์ € ์ฟผ๋ฆฌ๊ฐ€ ์ผ์ƒ๋Œ€ํ™”์ผ ๋•Œ

์œ ์ € ์ฟผ๋ฆฌ๊ฐ€ ์งˆ๋ฌธ์ด ์•„๋‹Œ, โ€˜์•ˆ๋…•, โ€˜๋ฐฐ๊ณ ํŒŒโ€™, โ€˜์˜ค๋Š˜ ๊ธฐ๋ถ„์ด ์ข‹์•„โ€™ ๊ฐ™์€ casual ๋Œ€ํ™”์ผ ๋•Œ ํŒŒ์ธํŠœ๋‹๋œ LLM์œผ๋กœ ๋งŒ๋“ค์–ด์ง„ ๋ผ์šฐํ„ฐ๋ฅผ ํ†ตํ•ด ์ฟผ๋ฆฌ๊ฐ€ โ€˜์งˆ๋ฌธ์ธ์ง€ ์•„๋‹Œ์ง€โ€™๋กœ ๋ถ„๋ฅ˜๋œ๋‹ค. causal ๋Œ€ํ™”์ผ ๋•Œ๋Š” โ€˜๊ตญ๋ฏผ๋Œ€ํ•™๊ต ํ•™์ƒ๋“ค๊ณผ ๋Œ€ํ™”๋ฅผ ํ•˜๋Š” ์นœ์ ˆํ•œ ์–ด์‹œ์Šคํ„ดํŠธโ€™ ๋ผ๋Š” ์‹œ์Šคํ…œ ๋ฉ”์‹œ์ง€๋ฅผ ๊ฐ€์ง€๊ณ  ์žˆ๋Š” LLM ๋ชจ๋ธ๋กœ ์ „๋‹ฌ๋˜์–ด ์ผ์ƒ์ ์ธ ๋‹ต๋ณ€์„ ์ƒ์„ฑํ•˜๊ณ , ์œ ์ €ํ•œํ…Œ ์ „๋‹ฌํ•œ๋‹ค.

  • ์‹œ๋‚˜๋ฆฌ์˜ค 2 : ์œ ์ € ์ฟผ๋ฆฌ๊ฐ€ ์งˆ๋ฌธ์ผ ๋•Œ

์งˆ๋ฌธ ์ฟผ๋ฆฌ๋Š” ์šฐ์„  ์–ธ์–ด์™€ ์ƒ๊ด€ ์—†์ด ํ•œ๊ตญ์–ด๋กœ ๋ฒˆ์—ญ๋œ๋‹ค. ๊ทธ ์ด์œ ๋Š” ์ˆ˜์ง‘ํ•œ ๊ฑฐ์˜ ๋ชจ๋“  ๋ฐ์ดํ„ฐ๊ฐ€ ํ•œ๊ตญ์–ด๋กœ ๋˜์–ด ์žˆ๊ธฐ ๋•Œ๋ฌธ์—, ์ฟผ๋ฆฌ์™€ ๋ฌธ์„œ์˜ ์œ ์‚ฌ๋„ ๋น„๊ต๋ฅผ ์šฉ์ดํ•˜๊ฒŒ ํ•˜๊ธฐ ์œ„ํ•จ์ด๋‹ค. ์ˆ˜์ง‘ํ•œ ๋ฐ์ดํ„ฐ๋Š” ๋ชจ๋‘ ๋ฒกํ„ฐํ™” ๋˜์–ด ์žˆ๊ณ  โ€˜๊ณต์ง€์‚ฌํ•ญ ๊ด€๋ จโ€™, โ€˜ํ•™๊ต ์ƒํ™œ ๊ด€๋ จโ€™, โ€˜๊ทธ ์™ธ ์ˆ˜์ง‘ํ•œ ๋ฐ์ดํ„ฐโ€™ ์„ธ ๊ฐ€์ง€ ๋ถ„๋ฅ˜๋กœ ๋‚˜๋ˆ ์ ธ ์žˆ๋‹ค. ๊ฐ ๋ฐ์ดํ„ฐ ์ €์žฅ์†Œ์— ์šฐ๋ฆฌ๊ฐ€ ์„ค์ •ํ•œ ๊ฐ€์ค‘์น˜์— ๋”ฐ๋ผ langchain์˜ ์•™์ƒ๋ธ” ๊ฒ€์ƒ‰๊ธฐ๋ฅผ ์ˆ˜ํ–‰๋œ๋‹ค. ์ƒ์œ„ K๊ฐœ(k=10)๊ฐœ์˜ ๋ฌธ์„œ๋ฅผ ์ฐธ์กฐํ•˜์—ฌ LLM์€ ์งˆ๋ฌธ์— ๋Œ€ํ•œ ๋‹ต๋ณ€์„ ์ƒ์„ฑํ•œ๋‹ค. ์ดํ›„, ์ด ๋‹ต๋ณ€์€ โ€˜๋‹ต๋ณ€์ด ์ ์ ˆํ•œ์ง€ ์•„๋‹Œ์ง€โ€™ ๋ถ„๋ฅ˜๋ฅผ ์œ„ํ•ด Fine tuning๋œ LLM์— ์ „๋‹ฌ๋œ๋‹ค.
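The weighted retrieval step can be illustrated with a small, dependency-free sketch. As far as the LangChain documentation describes it, the ensemble retriever fuses the ranked lists of its member retrievers with weighted Reciprocal Rank Fusion; the function below is an illustration of that idea, not LangChain's own code. The store contents, the weights (0.5, 0.3, 0.2), and the constant c=60 are assumptions chosen for the example, not values taken from this project.

```python
from collections import defaultdict


def weighted_rrf(ranked_lists, weights, k=10, c=60):
    """Fuse ranked lists of document ids with weighted reciprocal rank fusion."""
    if len(ranked_lists) != len(weights):
        raise ValueError("need exactly one weight per ranked list")
    scores = defaultdict(float)
    for docs, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(docs, start=1):
            # A document earns more when it ranks high in a heavily weighted store.
            scores[doc_id] += weight / (rank + c)
    # Highest fused score first; ties are broken by document id for determinism.
    ordered = sorted(scores, key=lambda d: (-scores[d], d))
    return ordered[:k]


# Hypothetical results from the three stores (announcements, school life, other).
notices = ["n1", "n2", "shared"]
school_life = ["shared", "s1"]
others = ["o1", "n1"]
top_docs = weighted_rrf([notices, school_life, others], weights=[0.5, 0.3, 0.2], k=10)
print(top_docs)
```

A document that appears in several stores ("shared" here) accumulates score from each of them, which is why it can outrank a document that is first in only one store.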

  • ์‹œ๋‚˜๋ฆฌ์˜ค 2-1 : ์งˆ๋ฌธ์— ๋Œ€ํ•œ ๋‹ต๋ณ€์ด ์ ์ ˆํ•  ๋•Œ

์งˆ๋ฌธ์— ๋Œ€ํ•œ ๋‹ต๋ณ€์ด ์ ์ ˆํ•  ๋•Œ, ํ•œ๊ตญ์–ด๋กœ ์ƒ์„ฑ๋œ ๋‹ต๋ณ€์„ ์‚ฌ์šฉ์ž๊ฐ€ ๊ธฐ๋Œ€ํ•˜๋Š” ์–ธ์–ด๋กœ ๋ฒˆ์—ญ ๋˜์–ด ์‚ฌ์šฉ์ž์—๊ฒŒ ์ „๋‹ฌ ๋œ๋‹ค.

  • ์‹œ๋‚˜๋ฆฌ์˜ค 2-2 : ์งˆ๋ฌธ์— ๋Œ€ํ•œ ๋‹ต๋ณ€์ด ์ ์ ˆํ•˜์ง€ ์•Š์„ ๋•Œ

์ด์ „์— ์ƒ์„ฑ๋œ ๋‹ต๋ณ€์„ ๋ฒ„๋ฆฌ๊ณ  ๊ตฌ๊ธ€ ๊ฒ€์ƒ‰ ๊ธฐ๋ฐ˜ RAG ์‹œ์Šคํ…œ โ€˜Tavily Search APIโ€™๋ฅผ ํ†ตํ•ด ์ƒˆ๋กญ๊ฒŒ ๋‹ต๋ณ€์„ ๊ตฌ์„ฑํ•œ๋‹ค. ์ดํ›„ ์ƒ์„ฑ๋œ ๋‹ต๋ณ€์„ ์‚ฌ์šฉ์ž๊ฐ€ ๊ธฐ๋Œ€ํ•˜๋Š” ์–ธ์–ด๋กœ ๋ฒˆ์—ญ ํ•˜์—ฌ ์‚ฌ์šฉ์ž์—๊ฒŒ ์ „๋‹ฌํ•œ๋‹ค.
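The control flow of the scenarios above can be summarized in a short sketch. Every component is passed in as a plain callable, and all names (Pipeline, is_question, web_answer, and so on) are hypothetical stand-ins for the router, retriever, grader, and translators described in the text; the actual project code may be organized quite differently.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Pipeline:
    # Each field is a hypothetical stand-in for a component described above.
    is_question: Callable[[str], bool]         # fine-tuned router LLM
    chitchat: Callable[[str], str]             # LLM with the friendly-assistant system message
    translate: Callable[[str, str], str]       # (text, target language) -> translated text
    retrieve: Callable[[str], List[str]]       # ensemble retriever returning top-k documents
    generate: Callable[[str, List[str]], str]  # answer from the query and the documents
    is_adequate: Callable[[str, str], bool]    # fine-tuned answer grader
    web_answer: Callable[[str], str]           # web-search fallback (Tavily in the text)

    def answer(self, query: str, user_lang: str) -> str:
        if not self.is_question(query):
            return self.chitchat(query)              # Scenario 1
        ko_query = self.translate(query, "ko")       # Scenario 2
        draft = self.generate(ko_query, self.retrieve(ko_query))
        if not self.is_adequate(ko_query, draft):
            draft = self.web_answer(ko_query)        # Scenario 2-2
        return self.translate(draft, user_lang)      # Scenario 2-1 and 2-2


# Toy components so the flow can be run without any API key.
demo = Pipeline(
    is_question=lambda q: q.endswith("?"),
    chitchat=lambda q: "chat:" + q,
    translate=lambda text, lang: f"[{lang}]{text}",
    retrieve=lambda q: ["doc"],
    generate=lambda q, docs: "" if "unknown" in q else "answer",
    is_adequate=lambda q, a: bool(a),
    web_answer=lambda q: "web answer",
)
print(demo.answer("hello", "en"))
print(demo.answer("library hours?", "en"))
print(demo.answer("unknown thing?", "en"))
```

Injecting the components as callables keeps the branching logic testable on its own, separately from any model or network call.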

Simple RAG

image

Usage

Choose one of the following three methods.

1. 외국민 App (recommended), gpt-4o model

Download the 외국민 application from the Play Store and use the chatbot there.

2. Python

This method requires API keys!

  1. git clone https://github.com/kookmin-sw/capstone-2024-30.git
  2. cd YOUR PATH/ai/
  3. pip install -r requirements.txt
  4. Place the FAISS vector store folder in /ai (download link)
  5. Create a .env file in /ai
    OPENAI_API_KEY = 
    LANGCHAIN_API_KEY = 
    TAVILY_API_KEY = 
    CHANNEL_ID = 
    DEEPL_API_KEY = 
    PAPAGO_ID = 
    PAPAGO_API_KEY = 
    
  6. python run_chatbot.py
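Before running step 6, it can help to confirm that the .env file is filled in. The following is a minimal sketch, assuming that all seven keys listed in step 5 are needed (the project may in fact only require a subset, for example CHANNEL_ID may matter only for the Discord bot). The helper name missing_env_keys and the sample values are hypothetical.

```python
REQUIRED_KEYS = [
    "OPENAI_API_KEY", "LANGCHAIN_API_KEY", "TAVILY_API_KEY", "CHANNEL_ID",
    "DEEPL_API_KEY", "PAPAGO_ID", "PAPAGO_API_KEY",
]


def missing_env_keys(env_text, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in .env-style text."""
    values = {}
    for line in env_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY = VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return [key for key in required if not values.get(key)]


# Made-up sample content; real keys must never be committed to the repository.
sample = """
OPENAI_API_KEY = sk-example
LANGCHAIN_API_KEY =
TAVILY_API_KEY = tvly-example
"""
missing = missing_env_keys(sample)
print(missing)
```

To check a real file, read it with open("ai/.env", encoding="utf-8").read() (path assumed from step 5) and pass the text to the function.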

3. DiscordBot gpt-3.5-turbo

image

  • Invite the KUKU bot to a Discord channel, then chat through commands
  • Inviting the bot requires the corresponding permission (server administrator)
  • Bot invite link: https://discord.com/oauth2/authorize?client_id=1229021729192677488
  • Chat with the command !p <question text>
  • (gpt-3.5-turbo is used for cost reasons, so performance may differ from the 외국민 app)

Metrics

image RAGAS

Ragas is a framework that helps you evaluate your Retrieval Augmented Generation (RAG) pipelines. RAG denotes a class of LLM applications that use external data to augment the LLM's context. There are existing tools and frameworks that help you build these pipelines, but evaluating them and quantifying your pipeline performance can be hard. This is where Ragas (RAG Assessment) comes in.

๋น„๊ต ๋ชจ๋ธ (Comparison model)

  • ์™ธ๊ตญ๋ฏผ KUKU (OURS)
  • SimpleRAG
  • ChatGPT(gpt-3.5-turbo)
  • ON๊ตญ๋ฏผ ์ฟ ๋ฏผ์ด

ํ…Œ์ŠคํŠธ์— ์‚ฌ์šฉ๋œ ์•ฝ 200๊ฐœ์˜ ์งˆ๋ฌธ์€ question_list.md ์—์„œ ํ™•์ธ ๊ฐ€๋Šฅ

Faithfulness

This measures the factual consistency of the generated answer against the given context. It is calculated from the answer and the retrieved context. The score is scaled to the (0,1) range; higher is better.
The generated answer is regarded as faithful if all the claims made in the answer can be inferred from the given context. To calculate this, a set of claims is first identified from the generated answer. Each of these claims is then cross-checked against the given context to determine whether it can be inferred from that context. The faithfulness score is the number of claims in the generated answer that can be inferred from the given context, divided by the total number of claims in the generated answer.

image

  • Faithfulness is a metric that evaluates what proportion of the chatbot's answer is grounded in fact.
  • It takes a value between 0 and 1; higher is better.
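As a worked example of the ratio, the sketch below computes the score from per-claim judgments. It assumes the claims have already been extracted and judged (by an LLM or by a human reviewer); the function name and the sample judgments are illustrative only.

```python
def faithfulness(claim_supported):
    """Fraction of claims that can be inferred from the context.

    claim_supported holds one bool per claim in the generated answer,
    True when the claim is supported by the retrieved context.
    """
    if not claim_supported:
        raise ValueError("the answer must contain at least one claim")
    return sum(claim_supported) / len(claim_supported)


# Hypothetical answer with four claims, three of them supported by the context.
score = faithfulness([True, True, False, True])
print(score)
```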

image image

  • Measured through the Discord bot, with members of the '외국민' team (Kookmin University students who enrolled in 2019) checking the facts by hand image

Answer Relevancy

The evaluation metric, Answer Relevancy, focuses on assessing how pertinent the generated answer is to the given prompt. A lower score is assigned to answers that are incomplete or contain redundant information and higher scores indicate better relevancy. This metric is computed using the question, the context and the answer.

image image

The Answer Relevancy is defined as the mean cosine similarity of the original question to a number of artificial questions, which were generated (reverse engineered) based on the answer.

  • Answer Relevancy focuses on assessing how pertinent the generated answer is to the given prompt.
  • Answers that are incomplete or contain redundant information receive a lower score; a higher score indicates better relevancy.
  • It takes a value between 0 and 1; higher is better.

image

Answer Relevancy is computed through reverse engineering. The steps are as follows.

  1. Ask the chatbot a question and receive its answer.
  2. Feed the answer to an LLM and have it predict the question.
  3. Compare the predicted question with the original question using cosine similarity.
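Step 3 can be sketched as follows, assuming the original question and the LLM-predicted questions have already been turned into embedding vectors. The two-dimensional vectors are toy values for illustration; real embeddings come from an embedding model and have far more dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


def answer_relevancy(question_vec, generated_question_vecs):
    """Mean cosine similarity between the original and the predicted questions."""
    if not generated_question_vecs:
        raise ValueError("need at least one generated question")
    sims = [cosine_similarity(question_vec, g) for g in generated_question_vecs]
    return sum(sims) / len(sims)


# Toy embeddings: one predicted question matches exactly, one is orthogonal.
original = [1.0, 0.0]
generated = [[1.0, 0.0], [0.0, 1.0]]
score = answer_relevancy(original, generated)
print(score)
```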

Latency

  • The average time (sec) from sending a query to the chatbot until the answer is returned
  • Lower is better.
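A measurement of this kind can be sketched with the standard library. The sketch assumes the chatbot is reachable as a Python callable that takes a query and returns an answer; fake_chatbot and its 10 ms delay are placeholders, not the project's actual client.

```python
import time


def average_latency(chatbot, queries):
    """Mean wall-clock seconds per query for a callable chatbot(query) -> answer."""
    if not queries:
        raise ValueError("need at least one query")
    elapsed = []
    for query in queries:
        start = time.perf_counter()
        chatbot(query)  # the answer itself is not needed for timing
        elapsed.append(time.perf_counter() - start)
    return sum(elapsed) / len(elapsed)


def fake_chatbot(query):
    """Placeholder that simulates a short response delay."""
    time.sleep(0.01)
    return "answer to " + query


mean_seconds = average_latency(fake_chatbot, ["q1", "q2", "q3"])
print(f"{mean_seconds:.3f} sec")
```

time.perf_counter is used because it is a monotonic, high-resolution clock intended for measuring short durations.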

image

Test Sample

image image

Test Log

chatbot_result.xlsx

image

LangSmith Tracing

Debugging and tracing with LangSmith

  • LangSmith shows the inputs and outputs passed between the LangChain chains
  • It also shows which documents the retriever returned

Example: image
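Tracing is typically switched on through environment variables before the chains are built. The sketch below assumes the standard LangSmith variables LANGCHAIN_TRACING_V2, LANGCHAIN_PROJECT, and LANGCHAIN_API_KEY (the last one also appears in the .env list above). The helper name and the project name "kuku-debug" are hypothetical, and whether this repository configures tracing this way is not confirmed by the text.

```python
import os


def enable_langsmith_tracing(project="kuku-debug"):
    """Turn on LangSmith tracing; return False if no API key is configured."""
    os.environ["LANGCHAIN_TRACING_V2"] = "true"
    os.environ["LANGCHAIN_PROJECT"] = project  # hypothetical project name
    # Traces can only be uploaded when an API key is present.
    return bool(os.environ.get("LANGCHAIN_API_KEY"))


ready = enable_langsmith_tracing()
print("tracing ready" if ready else "LANGCHAIN_API_KEY is not set")
```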