Can anyone help me figure out what I am doing wrong? The app runs fine on my local machine, but I am having a hard time deploying it.

The error:

```
PySparkRuntimeError: [JAVA_GATEWAY_EXITED] Java gateway process exited before
sending its port number.
```

Full deployment log:

```
[19:50:20] Provisioning machine...
[19:50:29] Preparing system...
[19:49:49] Starting up repository: 'searchengine', branch: 'main', main module: 'data/pyapp.py'
[19:49:49] Cloning repository...
[19:49:52] Cloning into '/mount/src/searchengine'...
Warning: Permanently added the ED25519 host key for IP address '140.82.116.3' to the list of known hosts.
[19:49:52] Cloned repository!
[19:49:52] Pulling code changes from Github...
[19:49:53] Processing dependencies...
──────────────────────────────────────── uv ───────────────────────────────────────────
Using uv pip install.
Resolved 51 packages in 3.84s
Downloaded 51 packages in 18.45s
Installed 51 packages in 185ms
+ altair==5.3.0
+ attrs==23.2.0
+ blinker==1.7.0
+ cachetools==5.3.3
+ certifi==2024.2.2
+ charset-normalizer==3.3.2
+ click==8.1.7
+ gitdb==4.0.11
+ gitpython==3.1.43
+ idna==3.7
+ importlib-metadata==6.11.0
+ jinja2==3.1.3
+ joblib==1.4.0
+ jsonschema==4.21.1
+ jsonschema-specifications==2023.12.1
+ markdown-it-py==3.0.0
+ markupsafe==2.1.5
+ mdurl==0.1.2
+ nltk==3.8.1
+ numpy==1.26.4
+ packaging==23.2
+ pandas==2.2.2
+ pillow==10.3.0
+ protobuf==4.25.3
+ py4j==0.10.9.7
+ pyarrow==16.0.0
+ pydeck==0.8.0
+ pygments==2.17.2
+ pyspark==3.5.1
+ python-dateutil==2.9.0.post0
+ pytz==2024.1
+ referencing==0.34.0
+ regex==2024.4.16
+ requests==2.31.0
+ rich==13.7.1
+ rpds-py==0.18.0
+ six==1.16.0
+ smmap==5.0.1
+ streamlit==1.29.0
+ tenacity==8.2.3
+ toml==0.10.2
+ toolz==0.12.1
+ tornado==6.4
+ tqdm==4.66.2
+ typing-extensions==4.11.0
+ tzdata==2024.1
+ tzlocal==5.2
+ urllib3==2.2.1
+ validators==0.28.1
+ watchdog==4.0.0
+ zipp==3.18.1
Checking if Streamlit is installed
Found Streamlit version 1.29.0 in the environment
────────────────────────────────────────────────────────────────────────────────────────
[19:50:21] Python dependencies were installed from /mount/src/searchengine/data/requirements.txt using uv.
[19:50:21] WARN: More than one requirements file detected in the repository. Available options: uv /mount/src/searchengine/data/requirements.txt, uv /mount/src/searchengine/requirements.txt. Used: uv with /mount/src/searchengine/data/requirements.txt
Check if streamlit is installed
Streamlit is already installed
[19:50:22] Processed dependencies!
[19:50:33] Spinning up manager process...
[nltk_data] Downloading package punkt to /home/appuser/nltk_data...
[nltk_data] Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package stopwords to
[nltk_data] /home/appuser/nltk_data...
[nltk_data] Unzipping corpora/stopwords.zip.
/home/adminuser/venv/lib/python3.11/site-packages/pyspark/bin/load-spark-env.sh: line 68: ps: command not found
JAVA_HOME is not set
──────────────────────── Traceback (most recent call last) ─────────────────────
/home/adminuser/venv/lib/python3.11/site-packages/streamlit/runtime/scriptrunner/script_runner.py:534 in _run_script

/mount/src/searchengine/data/pyapp.py:15 in <module>

   12 nltk.download('stopwords')
   13
   14 # Create Spark session
 ❱ 15 spark = SparkSession.builder.appName("LegalSearch").getOrCreate()
   16
   17 # Assuming preprocessed data are stored in relative paths in the data
   18 path_to_flat_words = "./data/flat_words.parquet"

/home/adminuser/venv/lib/python3.11/site-packages/pyspark/sql/session.py:497 in getOrCreate

   494 │   │   │   │   │   for key, value in self._options.items():
   495 │   │   │   │   │   │   sparkConf.set(key, value)
   496 │   │   │   │   │   # This SparkContext may be an existing one.
 ❱ 497 │   │   │   │   │   sc = SparkContext.getOrCreate(sparkConf)
   498 │   │   │   │   │   # Do not update `SparkConf` for existing `SparkCo
   499 │   │   │   │   │   # by all sessions.
   500 │   │   │   │   │   session = SparkSession(sc, options=self._options)

/home/adminuser/venv/lib/python3.11/site-packages/pyspark/context.py:515 in getOrCreate

   512 │   │   """
   513 │   │   with SparkContext._lock:
   514 │   │   │   if SparkContext._active_spark_context is None:
 ❱ 515 │   │   │   │   SparkContext(conf=conf or SparkConf())
   516 │   │   │   assert SparkContext._active_spark_context is not None
   517 │   │   │   return SparkContext._active_spark_context
   518

/home/adminuser/venv/lib/python3.11/site-packages/pyspark/context.py:201 in __init__

   198 │   │   │   │   " is not allowed as it is a security risk."
   199 │   │   │   )
   200 │   │
 ❱ 201 │   │   SparkContext._ensure_initialized(self, gateway=gateway, conf=
   202 │   │   try:
   203 │   │   │   self._do_init(
   204 │   │   │   │   master,

/home/adminuser/venv/lib/python3.11/site-packages/pyspark/context.py:436 in _ensure_initialized

   433 │   │   """
   434 │   │   with SparkContext._lock:
   435 │   │   │   if not SparkContext._gateway:
 ❱ 436 │   │   │   │   SparkContext._gateway = gateway or launch_gateway(con
   437 │   │   │   │   SparkContext._jvm = SparkContext._gateway.jvm
   438 │   │   │
   439 │   │   │   if instance:

/home/adminuser/venv/lib/python3.11/site-packages/pyspark/java_gateway.py:107 in launch_gateway

   104 │   │   │   │   time.sleep(0.1)
   105 │   │   │
   106 │   │   │   if not os.path.isfile(conn_info_file):
 ❱ 107 │   │   │   │   raise PySparkRuntimeError(
   108 │   │   │   │   │   error_class="JAVA_GATEWAY_EXITED",
   109 │   │   │   │   │   message_parameters={},
   110 │   │   │   │   )
────────────────────────────────────────────────────────────────────────────────
PySparkRuntimeError: [JAVA_GATEWAY_EXITED] Java gateway process exited before
sending its port number.
```

GitHub link to the project: https://github.com/abh2050/searchengine/tree/main/data