ISSN : 1226-4822(Print)
The Sociolinguistic Journal of Korea Vol.32 No.3 pp.41-64
DOI : http://dx.doi.org/10.14353/sjk.2024.32.3.02

Fine-tuned Korean Language Models for Sociolinguistic Studies

Kangsan Noh*, Soo-Yeon Kim**, Hye-Won Choi***, Hayeun Jang****, Sanghoun Song*****
*First Author: Kangsan Noh, Graduate Student, College of Liberal Arts, Korea University; 145 Anam-ro, Seongbuk-gu, Seoul 02841, South Korea
**Author: Soo-Yeon Kim, Professor, College of Liberal Arts, Sejong University; 209 Neungdong-ro, Gwangjin-gu, Seoul 05006, South Korea
***Author: Hye-Won Choi, Professor, College of Liberal Arts, Ewha Womans University; 52 Ewhayeodae-gil, Seodaemun-gu, Seoul 03760, South Korea
****Author: Hayeun Jang, Assistant Professor, College of Liberal Arts, Sungkyunkwan University; 25-2 Sungkyunkwan-ro, Jongno-gu, Seoul 03063, South Korea
*****Corresponding Author: Sanghoun Song, Associate Professor, College of Liberal Arts, Korea University; 145 Anam-ro, Seongbuk-gu, Seoul 02841, South Korea

Abstract

This paper tests the capacity of deep-learning-based Korean language models to learn and detect social registers embedded in speech data, specifically age, gender, and regional dialect. A comprehensive understanding of linguistic phenomena requires contextualizing speech by the speaker's age, gender, and geographic background, in addition to processing syntactic structure. To bridge the gap between human language understanding and model processing, we fine-tuned three representative Korean language models (KR-BERT, KoELECTRA-base, and KLUE-RoBERTa-base) on transcribed data from 4,000 hours of speech by middle-aged and elderly Korean speakers. The findings reveal that KoELECTRA-base outperformed the other two models across all social registers, a result likely attributable to its larger vocabulary and parameter count. Among the dialects, the Jeju dialect showed the highest inference accuracy, which we attribute to its distinctiveness: it is easier for the models to detect. In addition, we have made our fine-tuned models publicly available to support researchers interested in Korean computational sociolinguistics.
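
The dialect comparison reported above rests on per-class accuracy, that is, the share of utterances of each dialect that a classifier labels correctly. The following is a minimal sketch of that computation, not the authors' evaluation code. The function name `per_class_accuracy`, the placeholder labels `dialect_b` and `dialect_c`, and the toy predictions are all assumptions invented for illustration; they are not the paper's data or results.

```python
from collections import defaultdict


def per_class_accuracy(gold, pred):
    """Return {label: fraction of items with that gold label predicted correctly}."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    total = defaultdict(int)    # number of items per gold label
    correct = defaultdict(int)  # number of correct predictions per gold label
    for g, p in zip(gold, pred):
        total[g] += 1
        if g == p:
            correct[g] += 1
    return {label: correct[label] / total[label] for label in total}


# Hypothetical toy labels, invented for illustration only.
gold = ["jeju", "jeju", "dialect_b", "dialect_b", "dialect_c", "dialect_c"]
pred = ["jeju", "jeju", "dialect_b", "dialect_c", "dialect_b", "dialect_c"]

scores = per_class_accuracy(gold, pred)
best = max(scores, key=scores.get)  # label with the highest per-class accuracy
print(scores, best)
```

Reporting accuracy per class rather than as one overall figure is what makes a statement such as "one dialect is easier to detect than the others" checkable, since a single aggregate number would hide differences between dialects.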
