Can anyone help me figure out what I'm doing wrong? The app runs fine on my local machine, but I'm having a hard time deploying it to Streamlit Community Cloud. The deployment fails with this error:

```
PySparkRuntimeError: [JAVA_GATEWAY_EXITED] Java gateway process exited before
sending its port number.
```
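For context, here is the relevant part of `data/pyapp.py`, reconstructed from the traceback below (the imports are my addition; the rest is what the traceback shows):

```python
import nltk
from pyspark.sql import SparkSession

# NLTK resources the app needs (the deploy log shows these downloading fine)
nltk.download('punkt')
nltk.download('stopwords')

# Create Spark session — this is the line that fails on deploy
spark = SparkSession.builder.appName("LegalSearch").getOrCreate()

# Preprocessed data are stored at relative paths in the data folder
path_to_flat_words = "./data/flat_words.parquet"
```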
The full deployment log:

```
[19:50:20] 🖥 Provisioning machine...
[19:50:29] 🎛 Preparing system...
[19:49:49] 🚀 Starting up repository: 'searchengine', branch: 'main', main module: 'data/pyapp.py'
[19:49:49] 🐙 Cloning repository...
[19:49:52] 🐙 Cloning into '/mount/src/searchengine'...
Warning: Permanently added the ED25519 host key for IP address '140.82.116.3' to the list of known hosts.
[19:49:52] 🐙 Cloned repository!
[19:49:52] 🐙 Pulling code changes from Github...
[19:49:53] 📦 Processing dependencies...
──────────────────────────────────────── uv ───────────────────────────────────────────
Using uv pip install.
Resolved 51 packages in 3.84s
Downloaded 51 packages in 18.45s
Installed 51 packages in 185ms
+ altair==5.3.0
+ attrs==23.2.0
+ blinker==1.7.0
+ cachetools==5.3.3
+ certifi==2024.2.2
+ charset-normalizer==3.3.2
+ click==8.1.7
+ gitdb==4.0.11
+ gitpython==3.1.43
+ idna==3.7
+ importlib-metadata==6.11.0
+ jinja2==3.1.3
+ joblib==1.4.0
+ jsonschema==4.21.1
+ jsonschema-specifications==2023.12.1
+ markdown-it-py==3.0.0
+ markupsafe==2.1.5
+ mdurl==0.1.2
+ nltk==3.8.1
+ numpy==1.26.4
+ packaging==23.2
+ pandas==2.2.2
+ pillow==10.3.0
+ protobuf==4.25.3
+ py4j==0.10.9.7
+ pyarrow==16.0.0
+ pydeck==0.8.0
+ pygments==2.17.2
+ pyspark==3.5.1
+ python-dateutil==2.9.0.post0
+ pytz==2024.1
+ referencing==0.34.0
+ regex==2024.4.16
+ requests==2.31.0
+ rich==13.7.1
+ rpds-py==0.18.0
+ six==1.16.0
+ smmap==5.0.1
+ streamlit==1.29.0
+ tenacity==8.2.3
+ toml==0.10.2
+ toolz==0.12.1
+ tornado==6.4
+ tqdm==4.66.2
+ typing-extensions==4.11.0
+ tzdata==2024.1
+ tzlocal==5.2
+ urllib3==2.2.1
+ validators==0.28.1
+ watchdog==4.0.0
+ zipp==3.18.1
Checking if Streamlit is installed
Found Streamlit version 1.29.0 in the environment
────────────────────────────────────────────────────────────────────────────────────────
[19:50:21] 🐍 Python dependencies were installed from /mount/src/searchengine/data/requirements.txt using uv.
[19:50:21] 📦 WARN: More than one requirements file detected in the repository. Available options: uv /mount/src/searchengine/data/requirements.txt, uv /mount/src/searchengine/requirements.txt. Used: uv with /mount/src/searchengine/data/requirements.txt
Check if streamlit is installed
Streamlit is already installed
[19:50:22] 📦 Processed dependencies!
[19:50:33] ⛓ Spinning up manager process...
[nltk_data] Downloading package punkt to /home/appuser/nltk_data...
[nltk_data] Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package stopwords to
[nltk_data] /home/appuser/nltk_data...
[nltk_data] Unzipping corpora/stopwords.zip.
/home/adminuser/venv/lib/python3.11/site-packages/pyspark/bin/load-spark-env.sh: line 68: ps: command not found
JAVA_HOME is not set
────────────────────── Traceback (most recent call last) ───────────────────────
/home/adminuser/venv/lib/python3.11/site-packages/streamlit/runtime/scriptrunner/script_runner.py:534 in _run_script
/mount/src/searchengine/data/pyapp.py:15 in <module>
12 nltk.download('stopwords')
13
14 # Create Spark session
❱ 15 spark = SparkSession.builder.appName("LegalSearch").getOrCreate()
16
17 # Assuming preprocessed data are stored in relative paths in the data
18 path_to_flat_words = "./data/flat_words.parquet"
/home/adminuser/venv/lib/python3.11/site-packages/pyspark/sql/session.py:497 in getOrCreate
494 │ │ │ │ │ for key, value in self._options.items():
495 │ │ │ │ │ │ sparkConf.set(key, value)
496 │ │ │ │ │ # This SparkContext may be an existing one.
❱ 497 │ │ │ │ │ sc = SparkContext.getOrCreate(sparkConf)
498 │ │ │ │ │ # Do not update `SparkConf` for existing `SparkCo
499 │ │ │ │ │ # by all sessions.
500 │ │ │ │ │ session = SparkSession(sc, options=self._options)
/home/adminuser/venv/lib/python3.11/site-packages/pyspark/context.py:515 in getOrCreate
512 │ │ """
513 │ │ with SparkContext._lock:
514 │ │ │ if SparkContext._active_spark_context is None:
❱ 515 │ │ │ │ SparkContext(conf=conf or SparkConf())
516 │ │ │ assert SparkContext._active_spark_context is not None
517 │ │ │ return SparkContext._active_spark_context
518
/home/adminuser/venv/lib/python3.11/site-packages/pyspark/context.py:201 in __init__
198 │ │ │ │ " is not allowed as it is a security risk."
199 │ │ │ )
200 │ │
❱ 201 │ │ SparkContext._ensure_initialized(self, gateway=gateway, conf=
202 │ │ try:
203 │ │ │ self._do_init(
204 │ │ │ │ master,
/home/adminuser/venv/lib/python3.11/site-packages/pyspark/context.py:436 in _ensure_initialized
433 │ │ """
434 │ │ with SparkContext._lock:
435 │ │ │ if not SparkContext._gateway:
❱ 436 │ │ │ │ SparkContext._gateway = gateway or launch_gateway(con
437 │ │ │ │ SparkContext._jvm = SparkContext._gateway.jvm
438 │ │ │
439 │ │ │ if instance:
/home/adminuser/venv/lib/python3.11/site-packages/pyspark/java_gateway.py:107 in launch_gateway
104 │ │ │ │ time.sleep(0.1)
105 │ │ │
106 │ │ │ if not os.path.isfile(conn_info_file):
❱ 107 │ │ │ │ raise PySparkRuntimeError(
108 │ │ │ │ │ error_class="JAVA_GATEWAY_EXITED",
109 │ │ │ │ │ message_parameters={},
110 │ │ │ │ )
────────────────────────────────────────────────────────────────────────────────
PySparkRuntimeError: [JAVA_GATEWAY_EXITED] Java gateway process exited before
sending its port number.
```

GitHub link to the project: https://github.com/abh2050/searchengine/tree/main/data
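Two lines in the log stand out to me: `ps: command not found` and `JAVA_HOME is not set`. My guess is that the deploy container has no JVM, so PySpark can't launch its Java gateway. Would adding a `packages.txt` at the repo root fix this? A minimal sketch of what I'm assuming it should contain (these apt package names are my guess; I haven't verified them on Streamlit Cloud):

```
default-jre
procps
```

(`default-jre` would provide the JVM for PySpark's Java gateway, and `procps` provides the `ps` command that `load-spark-env.sh` complains about.)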