Upload a PowerPoint file for LangChain


I want to analyze a PowerPoint file using an LLM with LangChain in an application built with Streamlit. I can’t figure out how to read the uploaded file and pass it to LangChain.

One solution would be to save the uploaded file on my computer and load it the usual way with LangChain, but this doesn’t seem elegant to me. I’d like to build a LangChain document directly from the object returned by file_uploader.

So I gave it a try using the Unstructured library. But the file type returned by st.file_uploader doesn’t match what’s expected by Unstructured (TypeError: expected str, bytes or os.PathLike object, not UploadedFile).

The ultimate goal would be to obtain something similar to (or even be able to use) the following LangChain function:

Here’s a minimal example of what I tried:

import streamlit as st
from langchain.document_loaders import UnstructuredPowerPointLoader  # not used yet
from unstructured.partition.pptx import partition_pptx

uploaded_file = st.file_uploader("Upload a PowerPoint file", type=['ppt', 'pptx'])

def get_slides(file) -> list[str]:
    # This call raises the TypeError quoted above
    elements = partition_pptx(file)
    slides = []
    slide = ""
    for e in elements:
        if str(e) == "<PAGE BREAK>":
            # A page break closes the current slide
            slides.append(slide)
            slide = ""
        else:
            slide += str(e)
    if slide:
        slides.append(slide)  # keep the last slide
    return slides

if uploaded_file is not None:
    slides = get_slides(uploaded_file)

Do you have any idea how to do this? Thank you for your help.

Hi @AxelJ,

Thank you for sharing your question with the community!

Your post is missing a code snippet and a link to your app’s GitHub repo. Please check out our guidelines on how to post an effective question here and update your post to help the community answer your question.


I think your fastest solution would be to use tempfile. As you’ve encountered, Streamlit’s file uploader returns a file-like object which is held in memory; it doesn’t exist in the app’s file system in a way that can be pointed to with a path.

If you have a library that accepts a buffer, you can work with your uploaded file in memory. However, if you have a library that needs a path instead of a file-like object or buffer, use tempfile. It temporarily saves the file to disk and provides a path. That way, you don’t have to manually name and remove files and worry about things getting overwritten.
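The tempfile approach could look roughly like the sketch below. The helper name as_temp_path and its default_suffix parameter are made up for illustration. It assumes the uploaded object exposes getvalue() and a name attribute, as a BytesIO-style upload does.

```python
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def as_temp_path(uploaded_file, default_suffix=".pptx"):
    """Write an in-memory upload to a temporary file and yield its path.

    Hypothetical helper: assumes `uploaded_file` has `getvalue()` and,
    optionally, a `name` attribute holding the original file name.
    """
    # Keep the original extension so path-based loaders can detect the type.
    suffix = Path(getattr(uploaded_file, "name", "")).suffix or default_suffix
    # delete=False lets other code reopen the file by name (needed on Windows).
    tmp = tempfile.NamedTemporaryFile(delete=False, suffix=suffix)
    try:
        tmp.write(uploaded_file.getvalue())
        tmp.close()  # flush to disk before handing out the path
        yield tmp.name
    finally:
        tmp.close()
        if os.path.exists(tmp.name):
            os.remove(tmp.name)  # clean up once the caller is done


# Possible usage inside the Streamlit app (not executed here):
#
#     with as_temp_path(uploaded_file) as path:
#         docs = UnstructuredPowerPointLoader(path).load()
```

The file only exists inside the with block, so anything that needs the path (such as a loader's load() call) has to run before the block exits.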

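partition_pptx can also take a file keyword argument, so you can pass the uploaded file to it directly as partition_pptx(file=uploaded_file) without writing anything to disk.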
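Here is a minimal sketch of the in-memory route, reusing the get_slides idea from the question. The "<PAGE BREAK>" marker is carried over from the original snippet and may differ between Unstructured versions, so treat it as an assumption. group_into_slides is a hypothetical helper name.

```python
def group_into_slides(elements, page_break="<PAGE BREAK>"):
    """Join element texts into one string per slide.

    `page_break` is the marker used in the question; it is an assumption
    and may need adjusting for the installed Unstructured version.
    """
    slides, current = [], []
    for element in elements:
        text = str(element)
        if text == page_break:
            # A page break closes the current slide.
            slides.append("\n".join(current))
            current = []
        else:
            current.append(text)
    if current:
        slides.append("\n".join(current))  # last slide has no trailing break
    return slides


def get_slides(uploaded_file):
    """Partition an in-memory upload without saving it to disk."""
    # Imported lazily so group_into_slides works without unstructured installed.
    from unstructured.partition.pptx import partition_pptx

    # Pass the file-like object by keyword instead of as a path.
    return group_into_slides(partition_pptx(file=uploaded_file))
```

In the app, this would be called as slides = get_slides(uploaded_file) once uploaded_file is not None.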
