Upload a powerpoint for langchain

AxelJ · June 28, 2023, 3:18pm

Hello,

I want to analyze a PowerPoint file with an LLM through LangChain, in an application built with Streamlit. I can’t figure out how to take the uploaded file and pass it to LangChain.

One solution would be to save the uploaded file on my computer and load it the usual way with LangChain, but that doesn’t seem elegant to me. I’d like to turn the object returned by file_uploader directly into a LangChain document instance.

So I gave it a try using the Unstructured library, but the object returned by st.file_uploader isn’t the type Unstructured expects (TypeError: expected str, bytes or os.PathLike object, not UploadedFile).

The ultimate goal would be to obtain something similar to (or even be able to use) the following LangChain function:

Here’s a minimal example of what I tried:

import streamlit as st
import streamlit.components.v1 as components
import numpy as np
import pandas as pd
import os
import json
from langchain.document_loaders import UnstructuredPowerPointLoader
from unstructured.partition.pptx import partition_pptx

uploaded_file = st.file_uploader("Upload a PowerPoint file", type=['ppt', 'pptx'])

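def get_slides(file) -> list[str]:
    # Passed positionally, the upload is treated as a file path, which raises the TypeError above.
    elements = partition_pptx(file)
    slides = []
    slide = ""
    for e in elements:
        slide += str(e)
        # A page-break element marks the end of a slide.
        if str(e) == "<PAGE BREAK>":
            slides.append(slide)
            slide = ""
    return slides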

if uploaded_file is not None:
    slides = get_slides(uploaded_file)

Do you have any idea how to do this? Thank you for your help.

tonykip · June 30, 2023, 1:28pm

Hi @AxelJ,

Thank you for sharing your question with the community!

Your post is missing a code snippet and a link to your app’s GitHub repo. Please check out our guidelines on how to post an effective question here and update your post to help the community answer your question.

mathcatsand · July 22, 2023, 4:16pm

I think your fastest solution would be to use tempfile. As you’ve encountered, Streamlit’s file uploader returns a file-like object which is held in memory; it doesn’t exist in the app’s file system in a way that can be pointed to with a path.

If you have a library that accepts a buffer, you can work with your uploaded file in memory. However, if you have a library that needs a path instead of a file-like object or buffer, use tempfile. It temporarily saves the file to disk and provides a path. That way, you don’t have to manually name and remove files and worry about things getting overwritten.
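As a rough illustration of the tempfile route, here is a minimal sketch. The helper name temp_path_for is made up for this example, and it assumes the upload exposes getvalue() like io.BytesIO does (Streamlit’s UploadedFile does). The commented usage at the bottom assumes UnstructuredPowerPointLoader accepts a file path, which is how it is normally used.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def temp_path_for(uploaded_file, suffix=".pptx"):
    """Write an in-memory upload to a temporary file and yield its path.

    `uploaded_file` is any object with a getvalue() method returning bytes,
    such as the result of st.file_uploader or an io.BytesIO.
    (temp_path_for is a hypothetical helper, not part of any library.)
    """
    # delete=False so the file can be reopened by path on every platform;
    # it is removed explicitly in the finally block instead.
    tmp = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
    try:
        tmp.write(uploaded_file.getvalue())
        tmp.close()  # flush to disk before handing out the path
        yield tmp.name
    finally:
        tmp.close()
        os.remove(tmp.name)


# Possible use inside the Streamlit app (not executed here):
#
#     with temp_path_for(uploaded_file) as path:
#         docs = UnstructuredPowerPointLoader(path).load()
```

The temporary file only exists inside the with block, so anything that needs the path has to run before the block exits.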

Goyo · July 27, 2023, 12:29pm

partition_pptx can also take a file keyword argument, so you can pass the uploaded file object itself instead of a path.
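Applied to the code from the question, that could look like the sketch below. group_into_slides is a hypothetical helper added for illustration, and the "<PAGE BREAK>" marker is taken from the original post; the text of page-break elements may differ between Unstructured versions, so treat it as an assumption to check.

```python
def group_into_slides(texts, page_break="<PAGE BREAK>"):
    """Group element texts into one string per slide, splitting on page breaks."""
    slides, current = [], []
    for text in texts:
        if text == page_break:
            # End of a slide: store what was collected and start a new one.
            slides.append("\n".join(current))
            current = []
        else:
            current.append(text)
    if current:  # the last slide may have no trailing page break
        slides.append("\n".join(current))
    return slides


def get_slides(uploaded_file):
    """Partition an in-memory .pptx and return its text slide by slide."""
    # Imported here so the grouping helper works without unstructured installed.
    from unstructured.partition.pptx import partition_pptx

    # file= takes a file-like object; a positional argument is read as a path.
    elements = partition_pptx(file=uploaded_file)
    return group_into_slides(str(e) for e in elements)
```

Unlike the loop in the question, this version also keeps the last slide when the element list does not end with a page break, and it leaves the marker text out of the slide contents.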

system · January 23, 2024, 12:29pm

This topic was automatically closed 180 days after the last reply. New replies are no longer allowed.
