I registered my website on the Google Search Console.
Google crawled my site but does not add it to the index because it considers it a copy of https://awesome-streamlit.org/, which it sets as the canonical version.
I guess I should set a “user defined canonical” but I have no idea how to do it. My site runs on nginx.
I understand it is a part of nginx. In that case yes. I have already changed the config file to let Google access the sitemap file. Is it a similar process?
I googled around but could not find anything to fix my problem. They talk of internal canonical pages (many pages with same content in a given website).
The issue is that Google thinks that my site is a copy of the awesome streamlit site.
It is a Google algorithm that guards against copycat sites. Since Google thinks my site is a copy, Google does not add my ‘copycat’ site to the index. I do not think I can simply add a parameter somewhere saying “I am not a copycat” because in that case all the copycat sites would do it and that would defeat Google's purpose.
The other website that had the same problem (link above) did not solve it. What they did was make a separate standard site that Google crawls and put a link on that site to the Streamlit app. Not nice.
I wonder if this is a common issue or if it happens only when the Streamlit app has some particular characteristics that make Google think that it is equivalent to the awesome streamlit site.
My markdown content has nothing in common and the code has nothing in common.
So, by exclusion, is it that my script is relatively long (6000 lines)? So if you want to be indexed, keep your code under a certain number of lines?
I have a dedicated server. I do not use any CDN service like Cloudflare (unless Hetzner, the hosting company I get my server from, uses such services).
I use nginx. I understand that nginx has a CMS function. I can share my nginx configuration if it helps.
I am new to this so I am not sure what you mean by CMS, but other than nginx I have nothing installed on the server.
My problem (and these people's unsolved problem as well: https://support.google.com/webmasters/thread/45113463?hl=en ) is that Google thinks my website is a copy of an external website. The Medium thing is about three copies of the same stuff in a given website.
Google crawls my website and says:
“Oh, this looks really like a copy of awesome-streamlit.org. So I will not put it in my index because I do not want Randy to end up on a copycat site.” Basically what Streamlit “exposes” to the Google spider is the same for my site and for awesome-streamlit even though the content is completely different.
But since Google has invested in Streamlit, I have hope.
Right, but those posts show how to declare the canonical link. From your first post, it appears that because there is no user-declared canonical link (i.e. None), then Google selects it to be awesome-streamlit. If you set the canonical value using nginx, I would hope Google would respect that.
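For what it's worth, here is a minimal sketch of what declaring the canonical link from nginx could look like, using the `rel="canonical"` HTTP `Link` header that Google documents as an alternative to the HTML `<link>` tag. Everything specific in it is an assumption: `example.com` is a placeholder for your own domain, `127.0.0.1:8501` assumes Streamlit is running on its default port behind nginx, and the block is meant to sit inside your existing `server { ... }` block rather than replace it. Google treats a canonical as a hint, so this may or may not change what it selects.

```nginx
# Fragment: place inside the existing server { ... } block.
# Matches only the app's root URL, so static assets are not labelled canonical.
location = / {
    # Assumed upstream: Streamlit listening on its default port 8501.
    proxy_pass http://127.0.0.1:8501;
    proxy_set_header Host $host;

    # Declare this site's own URL as the canonical one (placeholder domain).
    add_header Link '<https://example.com/>; rel="canonical"';
}
```

After reloading nginx, `curl -I https://example.com/` should show the `Link` header in the response, and the URL Inspection tool in Search Console reports the user-declared canonical once the page is recrawled.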
I have a question. Is this something that happens only to me? I was trying to understand what Google is looking at to decide my page is a copycat. If it happens only to me, with 250000 Streamlit applications out there, it should be fairly easy to understand what pisses Google off…
Otherwise, either most of these 250000 are not interested in Google indexing, or they get indexed by putting the content on some other site that then links to the Streamlit app.