High Level Streamlit System Design Questions

:rotating_light: Before clicking “Create Topic”, please make sure your post includes the following information (otherwise, the post will be locked). :rotating_light:

  1. Are you running your app locally or is it deployed?
    No
  2. If your app is deployed:
    a. Is it deployed on Community Cloud or another hosting platform?
    b. Share the link to the public deployed app.
    No
  3. Share the link to your app’s public GitHub repository (including a requirements file).
    No link yet
  4. Share the full text of the error message (not a screenshot).
    No error message
  5. Share the Streamlit and Python versions.
    Not relevant

I’m in the design phase of creating a financial analysis app. While there is some very good info on the forum, most of it seems to address down-in-the-weeds issues. But I can’t find anything about a Streamlit high-level design that:

  • Uses financial metrics data from a back-end python app that I created

  • Pulls data from public and private data-provider websites

  • Can be used from my Windows dev workstation, my iPad, my Windows laptop, or my Mac laptop via WiFi, OR…

  • Can be used over the Web using standard security protocols from the Community Cloud.

I think I understand the Python basics (classes, libraries, Pandas, etc.), but one big issue is understanding the best way to get my custom data from my workstation up to the cloud. It seems like the suggested method is to check in the data to GitHub and that will resolve the problem. The problem is that my backend software produces multiple custom datasets, each with thousands or tens of thousands of data rows. I’m concerned about controlling the status of the process and having the ability to upload data ad-hoc without having to use GitHub.

Any suggestions on how I can deal with these issues? Or about where (links) I can find high-level design solutions?

Thanks,

Dan.

3 Likes

Hi @DcPublished

I know that @asehmi has some amazing Streamlit apps that are well architected to scale, where the frontend (Streamlit) and backend (FastAPI) are built separately. Perhaps you can get some inspiration from his apps.

Here are some GitHub repos of his apps:

Hope this helps!

2 Likes

Dataprofessor,

First, thanks for the links.

It’s a bit confusing, but it appears that his Streamlit app is spun up from a Windows PC, and that the media server is also run from his local Windows PC. Media is coming from unsplash.com.

That said, this line indicates that some or all of this app can be run on Heroku Dynos and Heroku PostgreSQL. My backend app is written in Python and the database is PostgreSQL. This implies that:

  1. I could build and test the client app on my local Windows PC

  2. When I’m ready to deploy to the “net”, I can deploy the client app to Heroku Dyno(s) and upload the data to Heroku Postgres. At first glance, the cost of Heroku is reasonable. Of course, this needs more investigation and study, but it looks very interesting.

I’ve had 30+ years as a data engineer/scientist, database developer, blah, blah, blah… Bottom line: I love working with databases.

Here’s a link to their website: Cloud Application Platform | Heroku

I’m not sure yet how my backend app, running on my PC workstation, would securely upload data daily to a server. That said, this commercial service might be worthwhile if it provides good answers to that question.

Thanks,

Dan.

3 Likes

Hi,

Thanks @dataprofessor for the shout out. @DcPublished the media explorer app can be configured to run its backend as a monolith embedded in the Streamlit app, or in client-server mode as a standalone Streamlit frontend and FastAPI backend server.

The media explorer app only serves as an example of a flexible client-server design that you can follow. If you want to run it locally, then your media collections can point to local folders, but in the cloud you need to use URLs. You could write a small program that uploaded images to an S3 bucket and built the links list used in the media-service.toml file. Alternatively, you can change the appropriate media list generators in media_service.py. I have another version of this app which uses a sqlite database, for example.
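For illustration, here’s a minimal sketch of that S3 upload idea (this is not code from the media explorer app; the bucket name, prefix, and TOML layout are placeholders, and it assumes boto3 credentials are already configured and the bucket is publicly readable):

from pathlib import Path

import boto3

BUCKET = "my-media-bucket"   # placeholder: an existing, publicly readable bucket
PREFIX = "media"
LOCAL_DIR = Path("./images")

s3 = boto3.client("s3")
urls = []
for img in sorted(LOCAL_DIR.glob("*.jpg")):
    key = f"{PREFIX}/{img.name}"
    s3.upload_file(str(img), BUCKET, key)                      # upload one image
    urls.append(f"https://{BUCKET}.s3.amazonaws.com/{key}")    # public-style URL

# Write the links list in a TOML-ish form for the media service to read.
with open("media-service.toml", "w") as f:
    f.write("links = [\n")
    f.writelines(f'    "{u}",\n' for u in urls)
    f.write("]\n")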

Deploying on Heroku is easy. There are many posts in this forum showing what to do. Since you want to use PostgreSQL, I’d suggest you start with a pure Streamlit application and squeeze as much perf from it as you can using data caching and the new partial-run decorators. You’ll be able to upload data to Heroku PostgreSQL using the Heroku desktop client, or the tools available in the build pack (I haven’t used it, so just guessing). Why not use Snowflake? There is a data connector that makes it really easy.
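As a rough sketch of the caching and partial-run idea (the table, query, and connection names are placeholders; on older Streamlit versions the fragment decorator is st.experimental_fragment):

import pandas as pd
import streamlit as st

# Cache the expensive pull so reruns don't hit the database every time.
# (For Snowflake, st.connection("snowflake") works much the same way.)
@st.cache_data(ttl=3600)
def load_metrics(ticker: str) -> pd.DataFrame:
    conn = st.connection("postgresql", type="sql")   # reads [connections.postgresql] from secrets
    return conn.query("SELECT * FROM daily_metrics WHERE ticker = :t", params={"t": ticker})

# Partial-run decorator: only this fragment reruns when its widgets change
# (st.experimental_fragment on older Streamlit versions).
@st.fragment
def metrics_panel():
    ticker = st.text_input("Ticker", "AAPL")
    st.dataframe(load_metrics(ticker))

metrics_panel()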

Have a look at @C_Quang’s app too: Public-facing, enterprise-grade deployment of Streamlit

3 Likes

Asehmi and dataprofessor,

First, many thanks for the feedback.

Second, I’ve been thinking this through carefully. Although I love databases, I decided that a database adds a lot of unnecessary complexity. So I switched back to a file-based approach, i.e. including some files in the Streamlit Analysis App checked into GitHub and uploading other files via OneDrive.

I want to use OneDrive because I have a lot of space on it and it’s easily accessible from my workstation. And it appears to be just as easily accessible from a Streamlit app running on a cloud server or on localhost.

Attached is a data flow drawing (DFD) of my to-be application environment. I know the Load System App well because it’s 95%+ built and works well. However, I’m fuzzy about Analysis System App and Streamlit Environment. Here are a few questions:

  • Roughly speaking, do you see anything wrong with this overall design?
  • I’m assuming that Streamlit can consume files from OneDrive. But can files be dropped from Streamlit to OneDrive?
  • Suggested improvements to the design?
  • Will this work with Streamlit Community Cloud?
  • Or do I need to deploy my Streamlit app on a Heroku or Snowflake server?

Thanks in advance for any feedback.

Best,

Dan.

p.s. The necessity for a OneDrive load is volume. Once well debugged, the Analysis System App will not be checked in daily, but there likely will be multiple files dropped to OneDrive daily, and some files will be removed from OneDrive daily. Files should not be huge (maybe a few thousand bytes), but there may be a fairly large number of files - all dropped hands-free.

3 Likes

@DcPublished, thank you. I was looking for a similar solution.

2 Likes

My original plan for this application was to run everything entirely from the cloud, but I’ve been able to keep all of the various components running without any hiccups for about a month using a local environment as a hub/launching point for 23+ million rows of data. Unless I’m making changes, it only ties up 10-15 minutes of my time each day the market is open. I’m using the following:

  • Streamlit
  • Microsoft Power BI
  • Microsoft Fabric
  • PostgreSQL
  • Rest API

Fin Stream

There are endless possibilities; hopefully you find the one that works best for what you’re trying to achieve.

Hi @DcPublished

If there’s a Python library that lets your app interface with OneDrive files, then sure, that works! You can also look into Google Drive; I think there’s a Python library that supports that as well.
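For example, here’s a rough, untested sketch of talking to OneDrive through the Microsoft Graph REST API (the token acquisition via an Azure AD/MSAL app registration and the file paths are assumptions, not a specific library recommendation):

# Rough sketch: read and write OneDrive files over the Microsoft Graph REST API.
# Assumes you have registered an app and obtained an OAuth access token
# (e.g. via the msal package); ACCESS_TOKEN and the paths below are placeholders.
import requests

ACCESS_TOKEN = "..."   # obtain via MSAL / Azure AD app registration
HEADERS = {"Authorization": f"Bearer {ACCESS_TOKEN}"}
GRAPH = "https://graph.microsoft.com/v1.0"

# Download a file from OneDrive into memory (e.g. for pd.read_csv).
def download(path: str) -> bytes:
    r = requests.get(f"{GRAPH}/me/drive/root:/{path}:/content", headers=HEADERS)
    r.raise_for_status()
    return r.content

# Upload (drop) a small file from the Streamlit app back to OneDrive.
def upload(path: str, data: bytes) -> None:
    r = requests.put(f"{GRAPH}/me/drive/root:/{path}:/content", headers=HEADERS, data=data)
    r.raise_for_status()

csv_bytes = download("analysis/signals_latest.csv")       # placeholder path
upload("analysis/processed/flag.txt", b"done")             # placeholder path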

2 Likes

I’m not sure if this repo is officially backed by Microsoft or not, but it appears to have the functionality you’re looking for.



Another alternative for sharing datasets using Streamlit is embedding a Power BI dashboard into the application using Power BI’s public sharing capabilities. The individual report settings for each dashboard allow the administrator/owner/author to either allow or restrict the downloading of data from each visualization within the dashboard. All that’s needed to implement the dashboard is a small block of HTML provided by Power BI and Streamlit’s components.html() feature. There’s some more info listed below for that option.


import streamlit as st
import streamlit.components.v1 as components

# Paste the iframe embed code that Power BI provides ("Publish to web")
# in place of CUSTOM_HTML; set height to roughly match the report.
components.html("""<iframe>CUSTOM_HTML</iframe>""", height=600)

First, many thanks to all of you for the great feedback. After some analysis, I realized:

  1. The benefits of running Streamlit on localhost far outweigh the benefits of running Streamlit in the cloud.

  2. One major benefit is that data in my database can be stored and accessed from the Streamlit app ad hoc whenever necessary. Another is that managing the data is much easier with a database. And it makes it easier to do ad hoc, iterative analysis, for example pulling different types of data based on what the current point in the analysis tells us (a rough sketch of that kind of ad hoc pull is shown below).
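Here’s roughly what that ad hoc pull looks like (the table and column names are just placeholders):

import pandas as pd
import psycopg2
import streamlit as st

# Cache each ad hoc query result so reruns of the app don't re-hit PostgreSQL.
@st.cache_data(ttl=600)
def run_query(sql: str, params: tuple = ()) -> pd.DataFrame:
    with psycopg2.connect(dbname="fin", host="localhost",
                          user=st.secrets.db["user"],
                          password=st.secrets.db["details"]) as conn:
        return pd.read_sql(sql, conn, params=params)

company_id = st.number_input("Company ID", value=3396)
df = run_query(
    "SELECT date, close FROM daily_price_history WHERE company_id = %s ORDER BY date",
    (company_id,),
)
st.line_chart(df.set_index("date")["close"])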

RepNot, your list of products is very interesting and caused me to consider them carefully. From my research, it appears that Microsoft Fabric and Power BI are meant for the corporate market. Given my 30 years in that market, my experience told me that internal customers needed clean, accurate, semantically correct, current, simple data. (“Simple” does NOT imply easy. Far from it.)

Your Fin Stream website is VERY nice but seems to support my point. Many investors and traders will love your Fin Stream website. On the other hand…

My needs are a bit different. I want to combine metrics like the ones you’ve presented with other metrics, like linear regressions for multiple datasets, and then match that with sentiment analysis and other metrics for a ticker’s Industry and Sector. The results would then be loaded into AI models for further crunching. (Yes, I’m a masochist :face_with_spiral_eyes:)

And this brings me back to Microsoft Fabric and Power BI. It seems like a perfect match for your needs, but I’m not sure how it matches my needs. These tools provide a wide scope of services. My needs on the other hand require a narrower but deeper scope of services.

My new DFD is attached. It still needs work, but it reflects that data will be stored in and accessed from a local PostgreSQL database. It’s a bit rough, but it should help explain what I’m trying to do.

Many thanks,

Dan.

1 Like

I forgot to mention that I’m considering PyGWalker instead of Plotly which I’m currently using.

This is a library that integrates well with Jupyter (and can also run under Streamlit). A nice feature is that it uses Pandas data frames for data retrieval, which can be loaded from delimited files or APIs.

It is being promoted as a competitor to Tableau. I don’t know how well it compares to PowerBI. That said, PyGWalker has a strong cost/benefit ratio. (smile)
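For reference, basic usage is just a DataFrame handed to the walk() entry point from its docs (the file name here is hypothetical; the Streamlit-specific renderer API is version-dependent, so check the PyGWalker docs for that path):

# Minimal sketch: load a delimited file into a DataFrame and hand it to
# PyGWalker's drag-and-drop explorer. In a notebook this renders inline.
import pandas as pd
import pygwalker as pyg

df = pd.read_csv("daily_metrics.csv", parse_dates=["date"])  # hypothetical file
pyg.walk(df)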

Best,

Dan.

1 Like

Repnot,

I tried PyGWalker tonight. Unfortunately I was not impressed. What I was able to bring up did not match the demos I’ve seen. And the documentation was very limited.

After seeing your embedded code and screenshots, I searched further. It appears that Microsoft Fabric is very expensive. But looking closer, PowerBI looks like it’s within my budget.

Can you point me to some info about the PowerBI tools that you used to integrate with Streamlit?

Best,

Dan

1 Like

Thank you for sharing the info about PyGWalker. The visualization quality looked good on the surface.

As far as the Tableau vs. Power BI debate goes, my personal preference between the two is Tableau, but Microsoft has done a superior job of securing their position as the software provider of choice amongst companies in just about every industry. Tableau’s biggest drawback is that it’s not competitively priced in comparison to Power BI, especially from a user adoption standpoint. The developer version costs $75/month, and the viewer version goes for $15/month. Power BI Pro and Power BI Premium subscriptions go for $10 and $20 a month and include pretty much everything needed to produce reporting solutions without additional subscriptions for end-users and viewers. In fact, a large portion of Office 365’s business subscriptions include a version of Power BI Desktop at no additional cost. This gives Microsoft an edge, in my opinion, as an individual is probably more likely to purchase their product for personal use, learning, or career development purposes, since it is more affordable and offers pretty much the same functionality and capabilities as Tableau.

Tableau does offer a free community desktop version and provides users with community cloud hosting, which offers a lot of appeal that quickly evaporates: it doesn’t allow files to be saved locally on users’ devices, in addition to slashing 95% of the options available for data sources. Once the trial version expires, a user is pretty much limited to TXT, CSV, Excel, and Google Sheets for data file types and sources. The $10 and $20 versions of Power BI give developers every data source option available without restrictions, and all files can be saved to the user’s device. Without those restrictions, I see Tableau as the better of the two products in terms of its UI, UX, and report & visualization quality, but that unfortunately doesn’t equate to an additional $55-$65/month worth of value as a personal expense.

The link below will direct you to Microsoft’s pricing page for Power BI.

As of right now, my total cost for everything equates to $10 a month and about an hour of my time each week, not including my initial time investment developing the automations, databases, and the current dashboard. The only thing that’s not 100% clear is the potential future cost associated with utilizing the application/semantic data layer capabilities with my current Power BI cloud workspace. Microsoft Fabric is a new product offering that was introduced in a preview release around the same time I started this project, so I need to confirm that the storage is included in the Power BI subscription and not the trial subscription for Fabric, which I don’t believe it is. The Power BI solution is working well for all intended purposes given that this is a portfolio project with a limited budget. In a perfect world, Snowflake would be my preferred backend solution for Power BI, Tableau, and local and cloud-hosted Streamlit applications. It comes with added investment, but they also provide cost visibility that I haven’t experienced with other cloud service providers, painting a clear picture of the costs within 24 hours of setting up a new account. In addition to adding the capability to fully automate this application, ease of use & data migration, documentation, and flexible external connectivity are the primary advantages Snowflake Data Cloud has to offer.

I built the original database for this project on Snowflake Data Cloud while building the automations used for compiling and prepping the data. For budgetary reasons, I ended up seeking out and building other alternatives. The file size for the SQLite version of the database exceeded the 1GB storage limit included with the community cloud version of Streamlit by 5-7 GB with 21-22 million rows of data stored, and it wasn’t performing well. That led to some brief testing with BigQuery prior to switching to a locally hosted Postgres backend, which has worked out well from a cost and performance standpoint. I’m able to connect Power BI Desktop directly to my database and store the entire payload in an application/semantic data layer in a Power BI cloud workspace that feeds the dashboard embedded in the Streamlit application. The Streamlit application can be run locally and utilizes the full backend capabilities of Postgres without any additional software or tooling.

Not sure how far along in the journey you are, but you’re more than welcome to use this code if it would be of any help. It will need to be modified to account for the removal of the SQL, but it will generate back-to-back plots, the first being a standard regression line over the data values arranged as a scatter plot, and the second being a time series regression forecast. The coefficient, intercept, and forecast values print to the console. I added a few comments marked with the [HACK] tag for informational purposes.

For time series purposes, I don’t see linear regression as a reliable forecasting method, but it does offer value in quantifying correlations between data when the relationship is unknown and/or not easily explainable. Using multiple moving averages combined in a single visualization seems to offer more clearly identifiable patterns depicting trends over time. My interest in all of this from a machine learning standpoint is geared more towards utilizing the plots and images generated from visualization tools for visual analysis and deep learning, as it seems to be an area of research that is potentially untapped.

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
import streamlit as st
from pprint import pprint
import psycopg2
import numpy as np

conn = psycopg2.connect(
    database="fin",
    host="localhost",
    user=st.secrets.db['user'],
    password=st.secrets.db['details'],
    port=st.secrets.db['port']
)

cur = conn.cursor()

statement = """
    SELECT
        ROW_NUMBER() OVER (ORDER BY date asc) AS period,
        date,
        round(close, 2) as close,
        round(close - open, 2) as change_amount,
        round((close - open) / open, 3) as change_rate,
        round(
            avg(close)
                over(order by date desc
                    rows between 3 preceding and 1 preceding), 2) as CLOSE_MAVG_3D,
        round(
            avg(close)
                over(order by date desc
                    rows between 5 preceding and 1 preceding), 2) as CLOSE_MAVG_5D,
        round(
            avg(close)
                over(order by date desc
                    rows between 7 preceding and 1 preceding), 2) as CLOSE_MAVG_7D,
        round(
            avg(close)
                over(order by date desc
                    rows between 10 preceding and 1 preceding), 2) as CLOSE_MAVG_10D,
        round(
            avg(close)
                over(order by date desc
                    rows between 12 preceding and 1 preceding), 2) as CLOSE_MAVG_12D,
        round(
            avg(close)
                over(order by date desc
                    rows between 15 preceding and 1 preceding), 2) as CLOSE_MAVG_15D,
        round(
            avg(close)
                over(order by date desc
                    rows between 20 preceding and 1 preceding), 2) as CLOSE_MAVG_20D,
        round(
            avg(close)
                over(order by date desc
                    rows between 30 preceding and 1 preceding), 2) as CLOSE_MAVG_30D
    FROM DAILY_PRICE_HISTORY
    WHERE COMPANY_ID = 3396
        and date between '1/23/2021' and '1/23/2024'
    order by date asc
    LIMIT 1095;

"""

cur.execute(statement)

#[HACK] fetchall() RETURNS A TUPLE OF TUPLES (ONE TUPLE PER ROW)
data = cur.fetchall()
conn.close()
#[HACK] ALL CODE BELOW THIS LINE IS REUSABLE

# Unpack each result column into its own list (index matches the SELECT order)
stPeriod = [x[0] for x in data]
stDate = [x[1] for x in data]
stClose = [x[2] for x in data]
stChangeAmount = [x[3] for x in data]
stChangeRate = [x[4] for x in data]
mavg_3d = [x[5] for x in data]
mavg_5d = [x[6] for x in data]
mavg_7d = [x[7] for x in data]
mavg_10d = [x[8] for x in data]
mavg_12d = [x[9] for x in data]
mavg_15d = [x[10] for x in data]
mavg_20d = [x[11] for x in data]
mavg_30d = [x[12] for x in data]

data = {
    "PERIOD" : stPeriod,
    "DATE" : stClose,
    "CLOSE" : stClose,
    "CHANGE_AMOUNT" : stChangeAmount,
    "CHANGE_RATE" : stChangeRate,
    "MAVG_3D" : mavg_3d,
    "MAVG_5D" : mavg_5d,
    "MAVG_7D" : mavg_7d,
    "MAVG_10D" : mavg_10d,
    "MAVG_12D" : mavg_12d,
    "MAVG_15D" : mavg_15d,
    "MAVG_20D" : mavg_20d,
    "MAVG_30D" : mavg_30d,
}

data = pd.DataFrame(data)

x = data["PERIOD"].values.reshape(-1, 1)
y = data["CLOSE"].values.reshape(-1, 1)

# Fit close price against the period index (simple linear regression)
reg = LinearRegression().fit(x, y)

m, b = reg.coef_, reg.intercept_
print(f"m = {m}\nb = {b}")

plt.plot(x, y, 'o')
plt.plot(x, m * x + b)
plt.show()

# Extend the period index one year ahead (periods 1096-1461) for the forecast
fc_period_x = np.append(data["PERIOD"], np.arange(1096, 1462))
fc_period_x = fc_period_x.reshape(-1, 1)

forecast = reg.predict(fc_period_x)

print(str(forecast))

plt.scatter(data["PERIOD"], data["CLOSE"])
plt.plot(fc_period_x, forecast, "r--")
plt.title("Closing Price Projections")
plt.xlabel("Period (Days)")
plt.ylabel("Close")
plt.show()

SNOW - CURRENT OBV - 02/09/2020 to 02/09/2024



Hi,

I would create separate modules for each major piece of functionality, especially for the data processing parts. These should be written as plain old python objects (POPOs). Then I would layer a CLI on top of those so that they can be executed manually as required. Use Typer to implement the CLI. Test your core functions fully at this stage via the CLI (you can build a separate CLI test harness, use Jupyter, or even a test Streamlit app). Next you can layer an API on the modules, using FastAPI. This API can perform similar functions as the CLI, and expose endpoints which can be called from your Streamlit app (or any web app). You’ll find the API is mostly just mapping between POPOs and JSON and calling your module functions (like the CLI).
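A compressed sketch of that layering (module and function names are placeholders, squeezed into one file for brevity):

# metrics.py -- plain Python business logic, no framework imports
def compute_signal(ticker: str) -> dict:
    # ...real metric calculations go here...
    return {"ticker": ticker, "signal": "buy"}

# cli.py -- thin Typer wrapper for manual or scheduled runs
import typer
cli = typer.Typer()

@cli.command()
def signal(ticker: str):
    typer.echo(compute_signal(ticker))

# api.py -- thin FastAPI wrapper exposing the same function to Streamlit (or any client)
# run with: uvicorn app:api --reload
from fastapi import FastAPI
api = FastAPI()

@api.get("/signal/{ticker}")
def signal_endpoint(ticker: str) -> dict:
    return compute_signal(ticker)

if __name__ == "__main__":
    cli()   # e.g. `python app.py signal AAPL`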

This structure gives you a lot of deployment flexibility and you can focus on adding different levels of capability in the API, CLI or Streamlit app as you see fit. For example, you can easily schedule jobs or build a data pipeline from your modules.

The key thing is the business logic (your modules) is encapsulated and reusable in different ways (in CLI, API, Streamlit app, etc.). Over time your modules can take on their own structure to reflect their independent responsibilities.

Start with a solid initial design and you’ll be able to maintain and evolve your system more easily over time. Streamlit encourages building monoliths, which is great up to a point. Thereafter, you should introduce a few simple modular design principles into the app architecture… before your app starts “doing things to you, rather than for you!”

Solely my opinions. Your mileage may vary.

Arvindra

1 Like

First, many thanks for all the great feedback. Great direct and indirect ideas.

This is a data analysis/reporting app with a small data mart as the base layer. A key characteristic of data warehouses and data marts is that they store data from multiple sources, transform it, and attempt to make the dataset consistent. There are multiple other characteristics, but those are the critical ones. And of course, it must meet business needs.

How the data is distributed and transmitted are important, but these are lesser requirements. Which brings me to the driving force that pushes me: generating signals for trading and investing.

The question is how? To help answer this, I created a simplified Process Flow/Data Flow Drawing (attached below) for discussion. Some details:

  • All current processing is on a robust PC workstation. It will be upgraded if necessary (memory, CPU, disks, GPU). Note that upgrading a PC workstation is not super-expensive these days. For example, a high-quality two-terabyte NVMe drive costs $165 and runs more than three times faster than the same-size drive from two years ago. And adding 128GB of memory can cost less than $300. Certainly not cheap, but the bang for the buck is pretty high now.

  • A PostgreSQL database is used to store structured data - mostly financial transactions.

  • Moving forward, a MongoDB (or other NoSQL) database will be added to store unstructured data from external sources (sentiment, news, events, sector/industry data, etc.). Tools to analyze this are TBD. I need feedback on this.

  • Data is processed daily using downloaded daily and intraday data. The process iterates through approximately 1,300 tickers, looking for tickers that meet the criteria, i.e. that generate a “signal” ticker (a rough sketch of this scan loop follows the list).

  • These signal tickers are then analyzed at a second level using other methods and other data (both structured and unstructured). At this point, it’s not clear which tools to use - probably Jupyter and PowerBI. I tried several graphical/analysis packages and libraries, but all had issues, and these two fit my needs the best. HOW I use them is an open question. I appreciate any feedback.

  • The results can be presented for trading/investing using either Streamlit or PowerBI. Again, this is still TBD. Any feedback is appreciated. Note that virtually no other users will use this system. (Not 100% sure on this yet.)
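Here is the rough sketch of the daily scan loop mentioned above (the function names, criteria, and thresholds are hypothetical placeholders):

import pandas as pd

def load_daily_bars(ticker: str) -> pd.DataFrame:
    # placeholder: pull from PostgreSQL / downloaded daily files
    ...

def meets_criteria(bars: pd.DataFrame) -> bool:
    # hypothetical criteria: short MA above long MA plus a near-60-day-high breakout
    sma20 = bars["close"].rolling(20).mean()
    sma50 = bars["close"].rolling(50).mean()
    breakout = bars["close"].iloc[-1] > bars["close"].tail(60).max() * 0.99
    return bool(sma20.iloc[-1] > sma50.iloc[-1] and breakout)

def daily_scan(tickers: list[str]) -> list[str]:
    signals = []
    for t in tickers:                      # ~1,300 tickers per run
        bars = load_daily_bars(t)
        if bars is not None and len(bars) >= 60 and meets_criteria(bars):
            signals.append(t)              # ticker generated a "signal"
    return signals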

Best,

Dan.

1 Like

That makes sense, and I think your project will work well. I didn’t mean to impose; the context of this project caught my attention and seemed to have several things in common with some of the projects I’ve been working on. For example, I recently published a separate dashboard to my website presenting U.S. Power Generation Data that is stored in the same database that houses my stock market price data, under a separate schema. While the two dashboards are presented as separate projects, the data from each schema can be combined as views since they’re housed in a central location.

Just sharing an interest in your idea of combining data from separate sources; you’re on the right path. To add merit to your concept, the report shown below initially seems unrelated to stock market analysis, yet it presents information that is very much directly related. My findings from this analysis have shown that the amount of electricity being generated in the U.S. using Natural Gas as a primary energy source has quadrupled over the past 10-12 years and now accounts for approx. 30% of all electricity produced in the country and 50% of the country’s total generation capacity, indicating strong demand (or dependency) for natural gas moving forward. This supports further research into publicly traded companies with operations involved in the transportation of natural gas, natural gas liquefaction, natural gas exploration & extraction, and natural gas retailers.

Again, I think your project ideas and concepts are excellent and will be successful. I wish you luck going forward and would be interested in seeing your project once it’s completed. Feel free to message me directly anytime, as I’m very interested in this subject area.

You describe a classic data pipeline scenario (choose your flavour: ETL, ELT, Data Mesh). Your case sounds straightforward enough to hand-craft the pipeline, but I may be wrong. I imagine your system is largely unattended, so once you want decent error handling, telemetry, and observability, start using a workflow orchestration tool. In the Python world “celery” is very popular, but you can look at newer tools like “pathway.com”, “prefect.io”, and “kedro.org” (I wrote a Streamlit blog post on that one). Building modular processing units, as I mentioned earlier, is ideal for orchestration using these tools. Those newer tools will be able to use Redis, which would be super handy in your system as a data staging area and as a pub/sub to trigger your workflows.
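As one possible shape, a minimal Prefect sketch of that kind of wiring (the task names and return values are placeholders):

from prefect import flow, task

@task(retries=2, retry_delay_seconds=60)
def download_daily_data() -> str:
    return "raw/latest.parquet"        # placeholder path

@task
def load_to_postgres(path: str) -> int:
    return 1300                        # placeholder: tickers loaded

@task
def scan_for_signals(n_tickers: int) -> list[str]:
    return ["SNOW"]                    # placeholder signal list

@flow(log_prints=True)
def daily_pipeline():
    path = download_daily_data()
    n = load_to_postgres(path)
    signals = scan_for_signals(n)
    print(f"{len(signals)} signal tickers")

if __name__ == "__main__":
    daily_pipeline()   # or schedule it via a Prefect deployment / cron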

1 Like

Repnot,

Very nice looking webpage. Clean and clear. It shows the power (pardon the pun) of using a database to store and retrieve data.

I don’t know your schema, but I suspect that you have a calendar table, which can be (or is being) used for multiple applications. I have used calendar tables in multiple databases and have expanded them over the years to include multiple dimensions of data. This can save a ton of work.
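As a minimal sketch, a calendar table can be generated straight in PostgreSQL with generate_series() (the table and column names here are hypothetical, and the connection details are placeholders):

import psycopg2

DDL = """
CREATE TABLE IF NOT EXISTS dim_calendar AS
SELECT d::date                          AS cal_date,
       EXTRACT(isodow  FROM d)::int     AS iso_weekday,   -- 1=Mon .. 7=Sun
       EXTRACT(quarter FROM d)::int     AS quarter,
       EXTRACT(year    FROM d)::int     AS year,
       EXTRACT(isodow  FROM d) BETWEEN 1 AND 5 AS is_weekday
       -- extend later: holidays, fiscal periods, market sessions, etc.
FROM generate_series('2015-01-01'::date, '2035-12-31'::date, interval '1 day') AS d;
"""

with psycopg2.connect(dbname="fin", host="localhost", user="postgres") as conn:
    with conn.cursor() as cur:
        cur.execute(DDL)   # committed when the connection block exits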

One small issue… While the graphs on the right side are very understandable, I don’t understand the color coding for the two graphs on the left.

After going to your website, it’s a bit more understandable, but it’s still a little confusing to me.

Please consider adding legends and titles. When a user clicks on part of a map or table, the title (and maybe the legend) could change to match.

And if possible, keep the colors consistent. For example, clicking on Texas in the map shows the highest megawatt generation to be a red bar with a “Natural Gas” label.

On the other hand, clicking on Washington State shows the highest generation to be Hydroelectric, but it is also colored in red. (I live in Washington State, so I believe your numbers are correct.) The colors confuse me.

In general, this is an excellent website, and it demonstrates the strengths of PowerBI. The static screenshot doesn’t do it justice. I strongly encourage others to visit the website at: Power BI Report.

FYI, I just figured out how to message you directly. Check messages.

Best,

Dan

1 Like

asehmi,

Yes, this is a classic pipeline scenario. But that is completely intentional because I’ve been creating pipelines since about 1998 on multiple enterprise systems (SQL Server, Azure, Teradata, Oracle, PostgreSQL and several others) using everything from simple batch scripts to C#, Visual Basic, Perl, SQL scripts, Azure pipelines, Databricks, and Python (and probably a couple others I’ve forgotten).

Looking at my data/process flow drawing, ALL of these solid lines are dataflows written in Python and PostgreSQL stored procedures. Many people think Python is slow. That is true to some extent, until you use tools like Pandas data frames, numpy, and other libraries, and leverage PostgreSQL functions and stored procedures. Then it can be very fast. And almost all of the processing is in batches - very little is in row-by-row processing (called “RBAR” by data warehouse pros - Row-By-Agonizing-Row).
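A tiny illustration of the batch-vs-RBAR point, computing the same daily returns row-by-row and then vectorized:

import numpy as np
import pandas as pd

close = pd.Series(np.random.default_rng(0).uniform(90, 110, 100_000))

# RBAR: explicit Python loop over rows (slow)
returns_loop = [np.nan] + [
    (close[i] - close[i - 1]) / close[i - 1] for i in range(1, len(close))
]

# Batch: one vectorized expression pushed down to compiled code (fast)
returns_vec = close.pct_change()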

PostgreSQL is my “staging” area. It holds master data and control data, as well as application data. Right now, it has something like 10 million rows in the tables, down from 100 million after some judicious trimming.

Virtually all of the processing is done in the Python app and PostgreSQL database. This makes it much easier to access any data whenever needed, knowing that it’s all in one place.

After Derek turned me on to Power BI, I can see that it will be very useful for what I’m doing. So, I’m about ready to pull the trigger on a Power BI Pro subscription - the lowest level subscription.

Workflow is done in Python, with the Python app being kicked off by Windows Task Scheduler (TS). Right now, TS runs the Python app directly. That said, it would be a simple matter of writing a wrapper app for more sophisticated control.
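A bare-bones sketch of the kind of wrapper I mean (the module name, log path, and alert hook are placeholders):

import logging
import subprocess
import sys
from datetime import datetime

# Placeholder log location; point this wherever Task Scheduler's working directory is.
logging.basicConfig(filename="daily_run.log", level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")

def main() -> int:
    logging.info("Daily run started")
    result = subprocess.run([sys.executable, "-m", "load_system_app"],   # placeholder module
                            capture_output=True, text=True)
    if result.returncode != 0:
        logging.error("Run failed:\n%s", result.stderr)
        # hook for email/Teams/SMS alerting could go here
    else:
        logging.info("Run finished OK at %s", datetime.now())
    return result.returncode

if __name__ == "__main__":
    sys.exit(main())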

Note that I’ve used commercial workflow apps before. The major issue I’ve found is that they (like virtually everything else) have a brick wall you can run into. And when you do, it can be a bear to get around that brick wall. For example…

I once had a contract with a large, Seattle-based coffee company. They used a sophisticated IBM workflow system that could only be used if you licensed a “seat”. The problem was that it cost $4,000 per seat in the mid-2000s. I was creating software for them and needed an algorithm they used to spit out a specific metric. But that algorithm was buried inside this IBM software and not documented anywhere. I had to wait a MONTH before the company finally realized that they needed to fork over $4,000 to get me the needed license.

When I finally got the license, it took about an hour to figure out how to use the system to find the algorithm. Then about 10 minutes to copy it down and exit. MAJOR wasted time. I could give further examples; the bottom line is that I’ve got some concerns with these systems.

Anyway…

After downloading PowerBI Desktop (free version), it only took me a few minutes to get it running and connected to my PostgreSQL database. While I’ve had years of experience with reporting systems (including Tableau), I’ve never used PowerBI. Unlike other tools I’ve tried, it was extremely stable, easy to use, and had lots of features.

Right now, I’m trying to figure out the right balance of Streamlit, Jupyter, PowerBI, Python, and some AI libraries. It’s going to be fun.

Best,

Dan.

1 Like