Hello,
I want to analyze a PowerPoint file using an LLM with LangChain, in an application built with Streamlit. I can't figure out how to extract the uploaded file and pass it to LangChain.
One solution would be to save the uploaded file on my computer and load it in the usual way with LangChain, but this solution doesn't seem elegant to me. I'd like to be able to use the instance returned by file_uploader to get a LangChain document instance.
So I gave it a try using the Unstructured library. But the type returned by st.file_uploader doesn't match what Unstructured expects (TypeError: expected str, bytes or os.PathLike object, not UploadedFile).
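For reference, the save-to-disk workaround I have in mind looks roughly like the sketch below. It is only a sketch under assumptions: `save_upload_to_tempfile` is a hypothetical helper name, and the uploaded object is assumed to behave like `io.BytesIO` (so an `io.BytesIO` stands in for it here and no Streamlit import is needed).

```python
import io
import os
import tempfile


def save_upload_to_tempfile(uploaded_file, suffix: str = ".pptx") -> str:
    """Write an uploaded file's bytes to a temporary file and return its path.

    Hypothetical helper; assumes uploaded_file exposes getvalue() like io.BytesIO.
    """
    # delete=False keeps the file on disk after the handle is closed,
    # so a path-based loader can open it afterwards.
    with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
        tmp.write(uploaded_file.getvalue())
        return tmp.name


# Stand-in for the object returned by st.file_uploader (assumption: BytesIO-like).
fake_upload = io.BytesIO(b"not a real presentation")
path = save_upload_to_tempfile(fake_upload)
try:
    # In the app, this is where the path would be handed to a path-based loader.
    with open(path, "rb") as fh:
        saved = fh.read()
finally:
    os.remove(path)  # clean up so uploads do not accumulate on disk
```

This works around the type mismatch, but it adds a write to disk and a cleanup step for every upload, which is why I am looking for something more direct.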
The ultimate goal would be to obtain something similar to (or even be able to use) the following LangChain function:
Here's a minimal example of what I tried:
import streamlit as st
import streamlit.components.v1 as components
import numpy as np
import pandas as pd
import os
import json
from langchain.document_loaders import UnstructuredPowerPointLoader
from unstructured.partition.pptx import partition_pptx

uploaded_file = st.file_uploader("Upload a PowerPoint file", type=['ppt', 'pptx'])

def get_slides(file) -> list[str]:
    # This call raises the TypeError quoted above when file is an UploadedFile
    elements = partition_pptx(file)
    slides = []
    slide = ""
    for e in elements:
        slide += str(e)
        # A page-break element marks the end of the current slide
        if str(e) == "<PAGE BREAK>":
            slides.append(slide)
            slide = ""
    return slides

if uploaded_file is not None:
    slides = get_slides(uploaded_file)
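Separately from the upload problem, the grouping step can be tried on plain strings. The sketch below is an assumption-laden variant of the loop in `get_slides`, not what the code above does: `group_into_slides` is a hypothetical name, it works on the text of each element rather than on Unstructured objects, it leaves the page-break marker out of the slide text, and it keeps a trailing slide that is not followed by a page break.

```python
from typing import Iterable

PAGE_BREAK = "<PAGE BREAK>"  # marker text compared against in get_slides


def group_into_slides(element_texts: Iterable[str], page_break: str = PAGE_BREAK) -> list[str]:
    """Group element texts into one string per slide, splitting on the page-break marker."""
    slides: list[str] = []
    current: list[str] = []
    for text in element_texts:
        if text == page_break:
            # End of a slide: store what was collected and start a new one.
            slides.append("\n".join(current))
            current = []
        else:
            current.append(text)
    # Keep the last slide even when no page break follows it.
    if current:
        slides.append("\n".join(current))
    return slides


# Plain strings standing in for str(e) of each parsed element.
example = ["Title", "First bullet", PAGE_BREAK, "Second title", "Closing remark"]
slides = group_into_slides(example)
```

With real data, the input would be something like `str(e)` for each element returned by the parser, once the upload can actually be read.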
Do you have any idea how to do this? Thank you for your help.