*Connection Error When Deploying a Streamlit App That Uses a Local Ollama LLM Model*

I’m experiencing a persistent connection error when deploying my Streamlit app that relies on an Ollama LLM model running on my local machine.

Environment Details:

  • Local Machine:
    • Operating System: macOS
    • Ollama Version: 0.1.31 (also tried downgrading to 0.0.11)
  • Streamlit App:
    • Uses LangChain, Ollama LLM, and Chroma for vector storage
  • Deployment Platform:
    • Streamlit Community Cloud
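  • App Wiring (simplified):
    • The snippet below is roughly how the app connects these pieces. Module paths match the older LangChain release I'm on, and the model name and persist directory are placeholders from my setup; prompt templates and retrieval logic are omitted.

      from langchain.llms import Ollama
      from langchain.embeddings import OllamaEmbeddings
      from langchain.vectorstores import Chroma

      # Both the LLM and the embeddings point at the local Ollama server.
      llm = Ollama(base_url="http://localhost:11434", model="llama2")
      embeddings = OllamaEmbeddings(base_url="http://localhost:11434", model="llama2")

      # Chroma persists the vector store on disk next to the app.
      vectorstore = Chroma(persist_directory="./chroma_db", embedding_function=embeddings)
      retriever = vectorstore.as_retriever()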

Problem Description:

  • When running the app locally, it functions correctly and communicates with the Ollama server without issues.

  • Upon deploying the app on Streamlit Community Cloud, I encounter the following error:

    Connection error: HTTPConnectionPool(host='localhost', port=11434): Max retries exceeded with url: /api/generate (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f01295e4380>: Failed to establish a new connection: [Errno 111] Connection refused'))
    

What I’ve Tried So Far:

  1. Verified Ollama Server is Running:

    • Started the Ollama server using ollama serve.

    • Confirmed it’s listening on port 11434.

    • Checked running processes with ps aux | grep ollama and saw:

      /Applications/Ollama.app/Contents/Resources/ollama serve
      
  2. Checked API Endpoints:

    • Accessed http://localhost:11434 and received "ollama is running".
    • Tried curl http://localhost:11434/api/ps and got {"models":[]}.
    • Attempted curl http://localhost:11434/api/models and curl http://localhost:11434/api/generate, but received 404 Not Found errors.
  3. Verified Installed Models:

    • Ran ollama list and confirmed models like llama2, llama3, and others are installed.

      NAME                  ID              SIZE      MODIFIED
      llama2:latest         78e26419b446    3.8 GB    3 hours ago
      llama3:latest         365c0bd3c000    4.7 GB    13 days ago
      
  4. Tested API Calls:

    • Tried curl -X POST http://localhost:11434/api/generate -d '{"model": "llama2", "prompt": "Hello"}' and received a 404 Not Found error.
    • Noted that curl http://localhost:11434/api/ps returns {"models":[]}, indicating no models are running.
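    • For reference, here is the Python equivalent of that curl call, which is roughly the request the app itself builds (the "stream": False flag follows the API docs I was working from):

      import requests

      # Same payload as the curl test above; a 404 here reproduces the problem.
      resp = requests.post(
          "http://localhost:11434/api/generate",
          json={"model": "llama2", "prompt": "Hello", "stream": False},
          timeout=60,
      )
      print(resp.status_code, resp.text)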
  5. Checked for Port Conflicts and Firewall Issues:

    • Used lsof -i :11434 to confirm no other service is occupying the port.
    • Ensured firewall settings are not blocking port 11434.
  6. Verified Ollama Version and API Changes:

    • Concluded, based on the 404 responses and on discussions I found online, that Ollama 0.1.31 had deprecated REST API endpoints like /api/generate (this is my interpretation and may well be wrong).
    • On that assumption, attributed the 404 errors to the deprecated endpoints.
  7. Attempted to Use Correct API Endpoints:

    • Modified the base URL in the code to remove /api prefix.
    • Tested endpoints like http://localhost:11434/generate, but still received 404 Not Found errors.
  8. Explored Using the Ollama CLI Instead of the REST API:

    • Created a custom LLM class in LangChain to interact with the Ollama CLI using Python’s subprocess module.

    • Code snippet for the custom LLM class:

      from langchain.llms.base import LLM
      import subprocess

      class OllamaCLI(LLM):
          # Declared as a class-level field: LangChain's LLM base class is a
          # Pydantic model, so assigning undeclared attributes in __init__ fails.
          model: str = "llama2"

          def _call(self, prompt, stop=None):
              # "ollama run <model>" reads the prompt from stdin and prints the
              # completion; `stop` is accepted for interface compatibility only.
              cmd = ["ollama", "run", self.model]
              process = subprocess.Popen(
                  cmd,
                  stdin=subprocess.PIPE,
                  stdout=subprocess.PIPE,
                  stderr=subprocess.PIPE,
                  text=True,
              )
              stdout, stderr = process.communicate(input=prompt)
              if process.returncode != 0:
                  raise RuntimeError(f"Ollama Error: {stderr}")
              return stdout

          @property
          def _llm_type(self):
              return "ollama_cli"
      
    • Adjusted the application to use this custom LLM class.

    • Tested locally, and the app works using the CLI.
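
    • A minimal usage sketch (simplified; the real app wraps this in a retrieval chain):

      # Instantiate with the default model and call it like any other LangChain LLM.
      llm = OllamaCLI()
      answer = llm("Why is the sky blue?")  # routed through _call(), which shells out to the CLI
      print(answer)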

  9. Attempted to Uninstall Ollama and Install an Older Version:

    • Tried brew uninstall ollama, but received Error: Cask 'ollama' is not installed.
    • Realized Ollama was installed as a standalone application in /Applications/Ollama.app.
  10. Manually Uninstalled Ollama:

    • Quit the Ollama application and killed running processes with pkill -f ollama.

    • Deleted Ollama.app from the Applications folder.

    • Removed associated files and directories:

      rm -rf ~/Library/Application\ Support/Ollama
      rm -rf ~/Library/Caches/com.ollama*
      rm -rf ~/Library/Preferences/com.ollama*
      rm -rf ~/Library/Logs/Ollama
      rm -rf ~/.ollama
      
  11. Installed Ollama Version 0.0.11 with REST API Support:

    • Downloaded the older version from Ollama Releases.
    • Installed the binary to /usr/local/bin.
    • Verified installation with ollama -v, which now shows ollama version is 0.0.11.
    • Started the server with ollama serve.
    • Successfully accessed the REST API endpoints.
  12. Adjusted Application to Use REST API:

    • Updated the base URL in the application to include /api prefix.
    • Changed the model names to match those available in version 0.0.11.
    • Tested the app locally, and it works as expected.
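    • The relevant change was essentially making the /api-prefixed base URL configurable rather than hard-coded, roughly like this (OLLAMA_BASE_URL is just the name I use for the setting; response parsing is omitted because the format differs between versions):

      import os
      import requests

      # Read the base URL from the environment so local and deployed runs can differ.
      OLLAMA_BASE_URL = os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434/api")

      def generate(prompt, model="llama2"):
          # POST to the generate endpoint; the caller handles the (streaming) response.
          return requests.post(
              f"{OLLAMA_BASE_URL}/generate",
              json={"model": model, "prompt": prompt},
              stream=True,
              timeout=120,
          )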
  13. Deployment Challenges Remain:

    • Deployed the app on Streamlit Community Cloud.
    • Still receiving connection errors because the deployed app cannot reach the local Ollama server.

My Questions:

  1. Is it possible for a deployed Streamlit app to connect to a local Ollama server?

    • Given that the app is running on Streamlit Community Cloud and the Ollama server is on my local machine, I suspect network connectivity issues are preventing communication.
  2. If not, what are the recommended approaches to make the Ollama server accessible to the deployed app?

    • Are there secure methods to expose the Ollama server to the internet without compromising security?
    • Should I consider hosting the Ollama server on a cloud platform?
  3. Alternatively, is it more feasible to use a cloud-based LLM like OpenAI’s GPT-3.5-turbo for deployment scenarios?

    • Considering the complexities and potential security risks, would switching to a cloud-based LLM service be more practical?
  4. What are the best practices for deploying apps that require a custom LLM backend?

    • How do others handle deploying applications that rely on local models?

Additional Information:

  • Network Limitations:

    • My local machine is behind a NAT firewall with a dynamic IP, making direct connections from the internet challenging.
    • I am hesitant to set up port forwarding due to security concerns.
  • Security Concerns:

    • Exposing my local server to the internet might introduce vulnerabilities.
    • I prefer a solution that maintains security while allowing the deployed app to access the LLM.
  • Deployment Constraints:

    • The app is deployed on Streamlit Community Cloud, which doesn’t allow running background processes like the Ollama server.

What I’m Seeking:

  • Advice on how to enable communication between the deployed Streamlit app and the Ollama server.
  • Recommendations for securely hosting the Ollama server in a way that the deployed app can access it.
  • Best practices for deploying apps that require a custom LLM backend.
  • Insights on whether switching to a cloud-based LLM service would be more practical for deployment.

Thank you for your assistance!


Additional Notes:

  • I’ve read through the Ollama GitHub Repository and the LangChain Ollama Integration Documentation to understand the API changes and potential workarounds.
  • I understand (though I may be misreading the changelogs and discussions) that newer versions of Ollama have deprecated the REST API in favor of a gRPC-based API and CLI interactions, which might not be supported by LangChain yet.
  • Modifying the application to use the CLI works locally but doesn’t resolve the deployment issue due to the inability to run the Ollama server on Streamlit Community Cloud.

I appreciate any guidance or suggestions on how to proceed.