LLMs and Coding in Qualitative Research: Advancements and Opportunities for Social Verbatim as an Integral Qualitative Tool

DEBATE: Beyond Big Data: Generative AI and LLMs as New Digital Technologies for the Analysis of Social Reality

Authors

Juan Miguel Gómez Espino, Universidad Pablo de Olavide

DOI:

https://doi.org/10.54790/rccs.176

Keywords:

Large Language Models (LLMs), qualitative coding, Generative Artificial Intelligence (GAI), qualitative research, open science, AI-assisted qualitative analysis

Abstract

This article explores the use of Large Language Models (LLMs) in qualitative coding, highlighting advances and opportunities for the Social Verbatim tool. It reviews the fundamentals of LLMs, their architecture, and the impact of hardware on their development. Additionally, specific applications of LLMs in qualitative research are analyzed, including thematic coding and comparative analysis. Methodological, ethical, and epistemological challenges are addressed, and strategies to mitigate these issues are proposed. Finally, the implications of integrating LLMs into tools like Social Verbatim are discussed, emphasizing the importance of transparency and human-machine collaboration in qualitative research.

Author Biography

Juan Miguel Gómez Espino, Universidad Pablo de Olavide

Associate Professor (profesor titular) of Sociology at Universidad Pablo de Olavide (Seville). Since 2001 he has combined teaching and research in the sociology of childhood and education with university management duties (currently as Dean of the Faculty of Social Sciences). He holds a degree in Political Science and Sociology from the Universidad de Granada and has held a doctorate since 2009. He has published in high-impact journals such as Revista Española de Investigaciones Sociológicas, Empiria, Childhood, International Sociology, and Language and Education, as well as book chapters with publishers such as Springer. He has also been a visiting professor at the University of Sheffield (United Kingdom) and Gramma (Cuba).

Published

2026-01-09

How to Cite

Gómez Espino, J. M. (2026). LLMs and Coding in Qualitative Research: Advancements and Opportunities for Social Verbatim as an Integral Qualitative Tool: DEBATE: Beyond Big Data: Generative AI and LLMs as New Digital Technologies for the Analysis of Social Reality. CENTRA Journal of Social Sciences, 5(1), 195–218. https://doi.org/10.54790/rccs.176

Section

Discussion