Text data extractor: PDF to Text

nainiayoub · May 13, 2022, 10:47pm

Hello community,
I made a PDF to text extractor app. It takes a PDF as input, displays the document on the page, and returns, depending on the option you choose, either a single txt file containing all of the PDF’s text or a ZIP archive of txt files, one per page of the PDF.
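
For anyone curious how the two output modes could be put together, here is a minimal sketch. It is not the app's actual code: it assumes the page texts have already been extracted into a list of strings, and the function name `package_pages` and the file names are hypothetical.

```python
import io
import zipfile


def package_pages(pages, per_page=False):
    """Return (filename, payload bytes) for the chosen output mode.

    `pages` is a list of strings, one per PDF page (assumed to be
    extracted already; the extraction step is not shown here).
    """
    if not per_page:
        # Single txt mode: join all page texts in order.
        combined = "\n\n".join(pages)
        return "document.txt", combined.encode("utf-8")

    # ZIP mode: one txt file per page, numbered from 1.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for number, text in enumerate(pages, start=1):
            archive.writestr(f"page_{number}.txt", text)
    return "document_pages.zip", buffer.getvalue()
```

In a Streamlit app, the returned bytes could be passed to `st.download_button` as its `data` argument, with the returned name as `file_name`.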

I will be adding text extraction from scanned PDFs next.
Hope it helps!

nainiayoub · December 15, 2022, 10:10pm

Update

You can now enable OCR for scanned documents and extract your text data:

1. Upload your PDF
2. Enable OCR
3. Select the PDF language (English, French, Spanish or Arabic)
4. Download your output file (zip/txt)
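
The OCR steps above could be sketched roughly as follows. This is an assumption-heavy illustration, not the app's code: the post does not say which OCR engine is used, so the sketch assumes Tesseract through `pytesseract` and page rendering through `pdf2image` (which needs Poppler installed). The names `TESSERACT_LANGS`, `tesseract_lang` and `ocr_pdf` are hypothetical.

```python
# Assumed mapping from the app's language options to Tesseract language codes.
TESSERACT_LANGS = {
    "English": "eng",
    "French": "fra",
    "Spanish": "spa",
    "Arabic": "ara",
}


def tesseract_lang(language):
    """Translate a language option into a Tesseract language code."""
    try:
        return TESSERACT_LANGS[language]
    except KeyError:
        raise ValueError(f"Unsupported language: {language!r}") from None


def ocr_pdf(pdf_bytes, language="English"):
    """Return a list of OCR'd page texts for a scanned PDF (one string per page)."""
    # Imported lazily so the mapping above works without the OCR stack installed.
    from pdf2image import convert_from_bytes  # assumption: requires Poppler
    import pytesseract  # assumption: requires the Tesseract binary

    lang = tesseract_lang(language)
    # Render each PDF page to an image, then run OCR on each image.
    images = convert_from_bytes(pdf_bytes)
    return [pytesseract.image_to_string(image, lang=lang) for image in images]
```

Each selected language also needs its Tesseract trained data file installed, otherwise the OCR call fails for that language.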

App → PDF text extractor with OCR
Code → Repository

system · December 15, 2023, 10:10pm

This topic was automatically closed 365 days after the last reply. New replies are no longer allowed.
