Detecting Data Anomalies with Benford's Law and Streamlit

Hello, Streamlit community!

I’m excited to share a project I’ve been working on: a Streamlit application designed to validate datasets using Benford’s Law. For those who may not be familiar, Benford’s Law is a fascinating observation about the frequency distribution of leading digits in many real-life sets of numerical data: smaller digits lead more often, with 1 appearing as the first digit about 30% of the time and 9 less than 5% of the time. Data that departs sharply from this pattern may deserve a closer look, which is why the law is widely used in fields like auditing to identify potential data manipulation or fraud.
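For anyone who wants to see the numbers, the expected share of each leading digit d under Benford’s Law is log10(1 + 1/d). The snippet below is only an illustration of that formula, not code taken from the app, and the name `benford_expected` is made up for this example.

```python
import math

# Expected share of each leading digit d = 1..9 under Benford's Law:
# P(d) = log10(1 + 1/d)
benford_expected = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Print the distribution as percentages, one digit per line.
for digit, share in benford_expected.items():
    print(f"{digit}: {share:.1%}")
```

The nine shares add up to 1, so they can be compared directly with observed proportions from a dataset.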

The application allows users to:

  • Upload data from a file (CSV, Excel, or TXT).

  • Paste numerical data directly into a text area.

  • Visualize the observed frequency of leading digits against the expected distribution according to Benford’s Law.

  • Set a deviation threshold to automatically flag significant anomalies.
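To give a rough idea of the comparison and flagging steps listed above, here is a minimal sketch in plain Python. It is not the app’s actual code: the function names `leading_digit` and `benford_report` are hypothetical, and it assumes the threshold is an absolute difference between observed and expected proportions, which may not match how the app defines it.

```python
import math
from collections import Counter


def leading_digit(value):
    """Return the first non-zero digit of a number, or None if it has none."""
    value = abs(float(value))
    if value == 0 or not math.isfinite(value):
        return None
    # Scientific notation puts the leading digit first, e.g. 0.0042 -> '4.2...e-03'.
    return int(f"{value:.15e}"[0])


def benford_report(values, threshold=0.05):
    """Compare observed leading-digit shares with Benford's expected shares.

    Assumption: `threshold` is an absolute difference in proportion
    (0.05 means 5 percentage points).
    """
    digits = [d for d in map(leading_digit, values) if d is not None]
    if not digits:
        raise ValueError("no usable numbers in input")
    counts = Counter(digits)
    total = len(digits)
    report = {}
    for d in range(1, 10):
        expected = math.log10(1 + 1 / d)
        observed = counts[d] / total
        report[d] = {
            "observed": observed,
            "expected": expected,
            "flagged": abs(observed - expected) > threshold,
        }
    return report


# Example input: powers of 2, a sequence known to follow Benford's Law closely.
sample = [2 ** n for n in range(1, 101)]
for digit, row in benford_report(sample).items():
    marker = "  <-- flagged" if row["flagged"] else ""
    print(f"{digit}: observed {row['observed']:.3f}, expected {row['expected']:.3f}{marker}")
```

In a Streamlit app, the `values` list would come from the uploaded file or the text area, and `threshold` from a slider or number input.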

This tool is a great example of how Streamlit can be used to create interactive and practical applications for data analysis and validation. I’ve put together a video to demonstrate how it works:

Video

Note on Audio: The original audio for the video is in Spanish. However, YouTube’s automatic dubbing feature is enabled, so you can easily switch to the English audio track to follow along.

I hope you find this project interesting and useful. I’d love to hear your feedback and see if anyone has ideas for expanding its functionality!

Benford’s Law is such an interesting concept, thanks for sharing!