I want to analyze a PowerPoint file with an LLM through LangChain, in an application built with Streamlit. I can't figure out how to read the uploaded file and pass it to LangChain.
One solution would be to save the uploaded file to disk and load it the usual way with LangChain, but that doesn't seem elegant to me. I'd like to turn the object returned by st.file_uploader directly into a LangChain Document instance.
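For reference, the save-to-disk workaround I mean would look roughly like the sketch below. The helper names `save_upload_to_temp` and `load_with_langchain` are made up for illustration, and I'm assuming the uploaded object behaves like a binary file (Streamlit's UploadedFile is a BytesIO subclass).

```python
import os
import tempfile
from typing import BinaryIO


def save_upload_to_temp(uploaded_file: BinaryIO, suffix: str = ".pptx") -> str:
    """Copy the uploaded bytes to a temporary file and return its path."""
    uploaded_file.seek(0)  # make sure we read from the beginning
    with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
        tmp.write(uploaded_file.read())
    return tmp.name


def load_with_langchain(uploaded_file: BinaryIO) -> list:
    """Hypothetical helper: load the upload through a temporary file."""
    # Imported lazily so the helper above works without LangChain installed.
    from langchain.document_loaders import UnstructuredPowerPointLoader

    path = save_upload_to_temp(uploaded_file)
    try:
        return UnstructuredPowerPointLoader(path).load()
    finally:
        os.remove(path)  # remove the temporary copy once loading is done
```

It works in principle, but it means writing a file to disk on every upload, which is what I'd like to avoid.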
So I gave it a try using the Unstructured library. But the file type returned by st.file_uploader doesn’t match what’s expected by Unstructured (TypeError: expected str, bytes or os.PathLike object, not UploadedFile).
The ultimate goal would be to obtain something similar to (or even be able to use) the following LangChain function:
Here’s a minimal example of what I tried:
    import streamlit as st
    import streamlit.components.v1 as components
    import numpy as np
    import pandas as pd
    import os
    import json
    from langchain.document_loaders import UnstructuredPowerPointLoader
    from unstructured.partition.pptx import partition_pptx

    uploaded_file = st.file_uploader("Upload a PowerPoint file", type=['ppt', 'pptx'])

    def get_slides(file) -> list[str]:
        # This call is where the TypeError is raised: the UploadedFile
        # is passed as the first positional argument.
        elements = partition_pptx(file)
        slides = []
        slide = ""
        for e in elements:
            slide += str(e)
            # A page break marks the end of the current slide
            if str(e) == "<PAGE BREAK>":
                slides.append(slide)
                slide = ""
        return slides

    if uploaded_file is not None:
        slides = get_slides(uploaded_file)
Do you have any idea how to do this? Thank you for your help.
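One direction that may work, as a sketch only: `partition_pptx` takes a filename as its first positional argument but also accepts a file-like object through its `file` keyword, which would explain the TypeError above. The function names `group_into_slides` and `pptx_upload_to_documents` and the metadata keys are illustrative assumptions, not part of any library, and this covers .pptx only, not legacy .ppt files.

```python
from typing import BinaryIO, Iterable

PAGE_BREAK = "<PAGE BREAK>"  # marker used above to detect a slide boundary


def group_into_slides(texts: Iterable[str], page_break: str = PAGE_BREAK) -> list[str]:
    """Join element texts into one string per slide, splitting on page breaks."""
    slides, current = [], []
    for text in texts:
        if text == page_break:
            slides.append("\n".join(current))
            current = []
        else:
            current.append(text)
    if current:  # keep any text that follows the final page break
        slides.append("\n".join(current))
    return slides


def pptx_upload_to_documents(uploaded_file: BinaryIO) -> list:
    """Hypothetical helper: one LangChain Document per slide."""
    # Imported lazily so group_into_slides can be used on its own.
    from langchain.docstore.document import Document
    from unstructured.partition.pptx import partition_pptx

    # Pass the upload as a file-like object instead of a path.
    elements = partition_pptx(file=uploaded_file)
    slides = group_into_slides(str(e) for e in elements)
    source = getattr(uploaded_file, "name", "uploaded.pptx")
    return [
        Document(page_content=text, metadata={"source": source, "slide": i})
        for i, text in enumerate(slides, start=1)
    ]
```

In the Streamlit script this would be called as `docs = pptx_upload_to_documents(uploaded_file)` inside the `if uploaded_file is not None:` branch, with no temporary file involved.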