I’m experiencing a persistent connection error when deploying my Streamlit app, which relies on an Ollama model running on my local machine.
Environment Details:
- Local Machine:
  - Operating System: macOS
  - Ollama Version: 0.1.31 (also tried downgrading to 0.0.11)
- Streamlit App:
  - Uses LangChain, the Ollama LLM, and Chroma for vector storage
- Deployment Platform:
  - Streamlit Community Cloud
Problem Description:
- When running the app locally, it functions correctly and communicates with the Ollama server without issues.
- Upon deploying the app on Streamlit Community Cloud, I encounter the following error:

  ```
  Connection error: HTTPConnectionPool(host='localhost', port=11434): Max retries exceeded with url: /api/generate (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f01295e4380>: Failed to establish a new connection: [Errno 111] Connection refused'))
  ```
What I’ve Tried So Far:
- Verified Ollama Server is Running:
  - Started the Ollama server using `ollama serve`.
  - Confirmed it’s listening on port 11434.
  - Checked running processes with `ps aux | grep ollama` and saw: `/Applications/Ollama.app/Contents/Resources/ollama serve`
- Checked API Endpoints (a scripted version of these probes is sketched just after this item):
  - Accessed `http://localhost:11434` and received `ollama is running`.
  - Tried `curl http://localhost:11434/api/ps` and got `{"models":[]}`.
  - Attempted `curl http://localhost:11434/api/models` and `curl http://localhost:11434/api/generate`, but received 404 Not Found errors.
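  For completeness, here is how I’d script those probes; a minimal sketch with `requests`, assuming the default port and a locally running server. (One thing I noticed while writing it: the documented model-listing endpoint appears to be `/api/tags`, not `/api/models`, which may explain that particular 404.)

  ```python
  import requests

  BASE = "http://localhost:11434"  # default Ollama port

  # Root endpoint: returns a plain-text banner when the server is up.
  print(requests.get(BASE, timeout=5).text)

  # /api/ps lists models currently loaded in memory (empty in my case).
  print(requests.get(f"{BASE}/api/ps", timeout=5).json())

  # /api/tags lists installed models; /api/models is not a documented
  # endpoint as far as I can tell, hence one of my 404s.
  print(requests.get(f"{BASE}/api/tags", timeout=5).json())
  ```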
- Verified Installed Models:
  - Ran `ollama list` and confirmed models like `llama2`, `llama3`, and others are installed:

    ```
    NAME            ID            SIZE    MODIFIED
    llama2:latest   78e26419b446  3.8 GB  3 hours ago
    llama3:latest   365c0bd3c000  4.7 GB  13 days ago
    ```
- Tested API Calls (a Python equivalent is sketched just after this item):
  - Tried `curl -X POST http://localhost:11434/api/generate -d '{"model": "llama2", "prompt": "Hello"}'` and received a 404 Not Found error.
  - Noted that `curl http://localhost:11434/api/ps` returns `{"models":[]}`, indicating no models are currently loaded.
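  In case shell quoting was the culprit, here is the same request from Python; a sketch, where `"stream": False` asks the server for a single JSON object instead of streamed lines:

  ```python
  import requests

  resp = requests.post(
      "http://localhost:11434/api/generate",
      json={"model": "llama2", "prompt": "Hello", "stream": False},
      timeout=120,
  )
  resp.raise_for_status()             # raises on the 404 I'm seeing
  print(resp.json().get("response"))  # generated text on success
  ```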
- Checked for Port Conflicts and Firewall Issues:
  - Used `lsof -i :11434` to confirm no other service is occupying the port.
  - Ensured firewall settings are not blocking port 11434.
- Verified Ollama Version and API Changes:
  - Found reports suggesting that Ollama 0.1.31 deprecated REST API endpoints like `/api/generate`.
  - Concluded that the 404 errors are due to these deprecated endpoints.
- Attempted to Use Correct API Endpoints:
  - Modified the base URL in the code to remove the `/api` prefix.
  - Tested endpoints like `http://localhost:11434/generate`, but still received 404 Not Found errors.
- Explored Using the Ollama CLI Instead of the REST API:
  - Created a custom LLM class in LangChain that calls the Ollama CLI through Python’s `subprocess` module.
  - Code snippet for the custom LLM class (cleaned up: the model name is declared as a Pydantic field so LangChain’s base class accepts it, and the prompt is piped to `ollama run` over stdin, since `run` is the CLI’s generation command):

    ```python
    import subprocess

    from langchain.llms.base import LLM


    class OllamaCLI(LLM):
        """Minimal LangChain wrapper that shells out to the Ollama CLI."""

        model: str = "llama2"  # declared as a field so the Pydantic base class accepts it

        @property
        def _llm_type(self) -> str:
            return "ollama_cli"

        def _call(self, prompt, stop=None):
            # `ollama run MODEL` reads the prompt from stdin when it isn't
            # attached to a TTY, so pipe the prompt in and capture stdout.
            cmd = ["ollama", "run", self.model]
            process = subprocess.Popen(
                cmd,
                stdin=subprocess.PIPE,
                stdout=subprocess.PIPE,
                stderr=subprocess.PIPE,
                text=True,
            )
            stdout, stderr = process.communicate(input=prompt)
            if process.returncode != 0:
                raise Exception(f"Ollama Error: {stderr}")
            return stdout
    ```

  - Adjusted the application to use this custom LLM class.
  - Tested locally, and the app works using the CLI (usage sketched just after this item).
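  For reference, the class drops in like any other LangChain LLM (hypothetical wiring for the `OllamaCLI` class above; note that each call spawns a fresh `ollama` process, so it is slow compared with the REST API):

  ```python
  # Hypothetical usage of the OllamaCLI class defined above.
  llm = OllamaCLI(model="llama2")
  print(llm("Say hello in one sentence."))
  ```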
- Attempted to Uninstall Ollama and Install an Older Version:
  - Tried `brew uninstall ollama`, but received `Error: Cask 'ollama' is not installed.`
  - Realized Ollama was installed as a standalone application in `/Applications/Ollama.app`.
- Manually Uninstalled Ollama:
  - Quit the Ollama application and killed running processes with `pkill -f ollama`.
  - Deleted `Ollama.app` from the Applications folder.
  - Removed associated files and directories:

    ```
    rm -rf ~/Library/Application\ Support/Ollama
    rm -rf ~/Library/Caches/com.ollama*
    rm -rf ~/Library/Preferences/com.ollama*
    rm -rf ~/Library/Logs/Ollama
    rm -rf ~/.ollama
    ```
- Installed Ollama Version 0.0.11 with REST API Support:
  - Downloaded the older version from Ollama Releases.
  - Installed the binary to `/usr/local/bin`.
  - Verified the installation with `ollama -v`, which now shows `ollama version is 0.0.11`.
  - Started the server with `ollama serve`.
  - Successfully accessed the REST API endpoints.
- Adjusted Application to Use REST API (configuration sketch just after this item):
  - Updated the base URL in the application to include the `/api` prefix.
  - Changed the model names to match those available in version 0.0.11.
  - Tested the app locally, and it works as expected.
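  Rather than hard-coding `localhost`, I made the server address configurable so the same code could point at a remote host later; a sketch using LangChain’s `Ollama` wrapper, where `OLLAMA_BASE_URL` is an environment variable name I made up for this:

  ```python
  import os

  from langchain.llms import Ollama  # wrapper that speaks Ollama's REST API

  # OLLAMA_BASE_URL is my own convention; falls back to the local default.
  base_url = os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434")

  llm = Ollama(base_url=base_url, model="llama2")
  print(llm("Hello"))
  ```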
- Deployment Challenges Remain:
  - Deployed the app on Streamlit Community Cloud.
  - Still receiving connection errors because the deployed app cannot reach the local Ollama server.
My Questions:
- Is it possible for a deployed Streamlit app to connect to a local Ollama server?
  - Given that the app runs on Streamlit Community Cloud while the Ollama server is on my local machine, I suspect network connectivity issues are preventing communication.
- If not, what are the recommended approaches to make the Ollama server accessible to the deployed app?
  - Are there secure methods to expose the Ollama server to the internet without compromising security?
  - Should I consider hosting the Ollama server on a cloud platform?
- Alternatively, is it more feasible to use a cloud-based LLM like OpenAI’s GPT-3.5-turbo for deployment scenarios?
  - Considering the complexities and potential security risks, would switching to a cloud-based LLM service be more practical? (A sketch of the switch I have in mind follows this list.)
- What are the best practices for deploying apps that require a custom LLM backend?
  - How do others handle deploying applications that rely on local models?
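To make the third question concrete, the switch I’m imagining is just a branch on configuration; a sketch, assuming an `OPENAI_API_KEY` environment variable and LangChain’s stock wrappers:

```python
import os

from langchain.chat_models import ChatOpenAI
from langchain.llms import Ollama


def make_llm():
    # Hosted model when an API key is configured (e.g., in deployment),
    # local Ollama server otherwise (e.g., in development).
    if os.environ.get("OPENAI_API_KEY"):
        return ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
    return Ollama(base_url="http://localhost:11434", model="llama2")
```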
Additional Information:
- Network Limitations:
  - My local machine is behind a NAT firewall with a dynamic IP, making direct inbound connections from the internet challenging.
  - I am hesitant to set up port forwarding due to security concerns.
- Security Concerns:
  - Exposing my local server to the internet might introduce vulnerabilities.
  - I prefer a solution that maintains security while allowing the deployed app to access the LLM. (If the answer is a tunneled or hosted server, I imagine the app would read its URL from secrets, as sketched after this list.)
- Deployment Constraints:
  - The app is deployed on Streamlit Community Cloud, which doesn’t allow running background processes like the Ollama server.
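A sketch of that consumption side, assuming some reachable HTTPS endpoint existed; `ollama_base_url` is a secrets key I invented, which would be set in the app’s secrets on Community Cloud:

```python
import streamlit as st
from langchain.llms import Ollama

# .streamlit/secrets.toml (or the Community Cloud secrets UI) would hold:
#   ollama_base_url = "https://my-tunneled-or-hosted-ollama.example.com"
llm = Ollama(base_url=st.secrets["ollama_base_url"], model="llama2")
```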
What I’m Seeking:
- Advice on how to enable communication between the deployed Streamlit app and the Ollama server.
- Recommendations for securely hosting the Ollama server in a way that the deployed app can access it.
- Best practices for deploying apps that require a custom LLM backend.
- Insights on whether switching to a cloud-based LLM service would be more practical for deployment.
Thank you for your assistance!
Additional Notes:
- I’ve read through the Ollama GitHub Repository and the LangChain Ollama Integration Documentation to understand the API changes and potential workarounds.
- My (possibly mistaken) understanding is that newer versions of Ollama have deprecated the REST API in favor of a gRPC-based API and CLI interactions, which LangChain might not support yet.
- Modifying the application to use the CLI works locally but doesn’t resolve the deployment issue due to the inability to run the Ollama server on Streamlit Community Cloud.
I appreciate any guidance or suggestions on how to proceed.