
Financial RAG Pipeline & Product Development

Competition, strategy, research, 2024

Overview

The Financial RAG Challenge is a competition aimed at advancing Retrieval-Augmented Generation (RAG) systems that can efficiently process large volumes of financial documents. Participants must build a system that retrieves relevant context from large-scale financial datasets, and it has to cope with real-world difficulties such as financial terminology, industry-specific language, and numerical data. Retrieval accuracy is evaluated with nDCG@10 on a combined text and tabular financial dataset, and the top 10 teams from the preliminary round presented in the finals at KB Securities headquarters.
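
As a rough illustration of the metric, the sketch below computes nDCG@10 for a single query. It assumes linear gain (the relevance label itself) and a log2 rank discount; the organizers' exact evaluation script is not described here, and `dcg_at_k`, `ndcg_at_k`, and the toy document IDs are hypothetical names chosen for this example.

```python
import math
from typing import Dict, List


def dcg_at_k(gains: List[float], k: int = 10) -> float:
    """Discounted cumulative gain over the first k gains (rank 1 is divided by log2(2) = 1)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains[:k], start=1))


def ndcg_at_k(ranked_ids: List[str], relevance: Dict[str, float], k: int = 10) -> float:
    """nDCG@k for one query: DCG of the system ranking divided by the DCG of the ideal ranking."""
    gains = [relevance.get(doc_id, 0.0) for doc_id in ranked_ids]
    ideal_gains = sorted(relevance.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal_gains, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# Toy example (assumed data): two relevant documents, one of them ranked third.
ranking = ["d1", "d2", "d3"]
qrels = {"d1": 1.0, "d3": 1.0}
print(round(ndcg_at_k(ranking, qrels), 4))
```

A leaderboard figure would typically be the mean of this per-query value over all evaluation queries.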

Task 1 – Retrieval
Task 2 – Generation

Technical Implementation

System architecture: RAG pipeline and reranking configuration

The RAG pipeline was designed around the complex financial terminology and mixed text-and-table data in the corpus. Query Expansion with GPT-4o-mini resolves financial abbreviations and company-specific terms, and Hybrid Search combines BM25 sparse retrieval with dense semantic search. The corpus was refined through summarization and table extraction, and recent reranking models such as ColBERT and Voyage AI were then applied, yielding substantial performance gains, especially on the tabular datasets.
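
The hybrid step described above can be sketched as follows. The source does not say how the sparse and dense results were combined, so this example assumes reciprocal rank fusion (RRF) with the commonly used constant k = 60. The BM25 variant (non-negative IDF), the pre-tokenized toy corpus, and the 2-D "embeddings" are all illustrative stand-ins, not the pipeline's actual components.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


def bm25_scores(query: List[str], docs: Dict[str, List[str]],
                k1: float = 1.5, b: float = 0.75) -> Dict[str, float]:
    """Okapi BM25 score of every document for a tokenized query."""
    n_docs = len(docs)
    avg_len = sum(len(toks) for toks in docs.values()) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for toks in docs.values() for term in set(toks))
    scores = {}
    for doc_id, toks in docs.items():
        tf = Counter(toks)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[doc_id] = score
    return scores


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def to_ranking(scores: Dict[str, float]) -> List[str]:
    """Document IDs sorted by descending score."""
    return sorted(scores, key=scores.get, reverse=True)


def rrf_fuse(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Reciprocal rank fusion: each ranking contributes 1 / (k + rank) per document."""
    fused = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)


# Toy corpus and hypothetical embeddings (assumptions for illustration only).
docs = {
    "d1": ["net", "income", "rose", "on", "higher", "interest", "margin"],
    "d2": ["earnings", "per", "share", "by", "quarter"],
    "d3": ["board", "announces", "dividend", "policy"],
}
doc_vecs = {"d1": [0.9, 0.1], "d2": [0.2, 0.8], "d3": [0.1, 0.9]}
query_tokens = ["earnings", "per", "share"]
query_vec = [0.1, 0.9]

sparse = to_ranking(bm25_scores(query_tokens, docs))
dense = to_ranking({d: cosine(query_vec, v) for d, v in doc_vecs.items()})
hybrid = rrf_fuse([sparse, dense])
print(sparse, dense, hybrid)
```

Rank-based fusion is used here because BM25 scores and cosine similarities live on different scales; a weighted sum of normalized scores would be an equally plausible choice. In a full pipeline the fused candidates would then be passed to a reranker.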

Competition leaderboard performance (nDCG@10)

Service Proposal

Query Chain Analysis Framework

Beyond the technical implementation, the proposal introduced a Query Chain analysis framework that infers a user's latent intent from patterns in consecutive queries. Using the DIKW (Data-Information-Knowledge-Wisdom) hierarchy and Goal-Means analysis, it examines how a user's successive queries reveal deeper investment goals and decision-making processes.

Base (1) DIKW
Base (2) Goal-Mean Structure – Generation

Intelligent Investment Platform ZIRASys

Building on these insights, the proposal presented ZIRASys, an intelligent investment platform that offers personalized content generation, real-time financial data processing, and community-based insights. It outlines how existing rule-based chatbots can be turned into proactive financial assistants that understand and anticipate user needs.

Query Chain-based latent intent analysis framework
Latent Sub Goal prediction feature

Materials

Jaywoong Jeong