Chinese-Forums

Site Outage



Posted

Thanks so much for the update! The ongoing enshittification of the internet had me worried that I had returned to the forums just in time to see them mysteriously vanish forever. Glad to see my worries were entirely unfounded.

 

Posted

Update: it seems like it was the latter case in #1 - a couple of AI bots were making an inordinate number of requests per second, so for the moment I've turned on proactive blocking for those. (Literally half of the requests we received in the first hour after going back online were from a single IP address belonging to a large American tech company.)
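For anyone curious how a lopsided traffic pattern like that can be spotted, here is a minimal sketch, not the site's actual tooling. It assumes the client IPs have already been pulled out of the access log into a list; the function name `top_client_share` and the sample addresses are hypothetical.

```python
from collections import Counter


def top_client_share(client_ips):
    """Return (ip, share) for the busiest client, or None if there is no traffic.

    `client_ips` is assumed to be a list with one IP string per request.
    """
    if not client_ips:
        return None
    counts = Counter(client_ips)
    ip, hits = counts.most_common(1)[0]
    # Share of all requests that came from the single busiest address.
    return ip, hits / len(client_ips)


# Hypothetical sample using documentation-range addresses.
sample = ["203.0.113.7"] * 5 + ["198.51.100.2"] * 3 + ["192.0.2.9"] * 2
busiest = top_client_share(sample)
```

A share near 0.5 for one address, as in the sample, is the kind of signal that would justify a closer look at that client.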

 

I would not presume to decide for the entire site whether to allow AI training - though that's probably a discussion we ought to have - but when they degrade service for everyone else, they become a server administration problem, and at that point it's an easy block.

Posted

Human-to-human exchanges are used to train AI. Reddit recently made lucrative deals regarding this: https://www.reddit.com/r/investing/comments/1gfadkb/reddits_shares_surge_on_ai_data_licensing_deals/

 

One thing that surprised me is that the email archive from Enron (which went bankrupt more than 20 years ago) is being used to train many AI systems. The book I read didn't explain why Enron's email was used, but I'm guessing it became public during litigation around the bankruptcy.

 

The Wall Street Journal noted that the internet is not big enough to train the next version of ChatGPT, because it doesn't contain enough information. AI companies are likely now looking in every corner for more training material.

 

Hence, it's easily understandable that AI wants to use Chinese forums for training.  It's too bad the site can't benefit from this and instead is being hurt by it.  It's why so many sites now have captchas - to prove the reader is human and deter AI.  

Posted
On 3/2/2025 at 4:08 PM, mikelove said:

I would not presume to decide for the entire site whether to allow AI training - though that's probably a discussion we ought to have

Absolutely not, if you ask me. I do not give permission to have my 20 years' worth of posts used to train any AI.

Posted

It's not entirely possible to stop AI training because this is an open website anybody can scrape - indeed, we want it to be accessible to search engines like Google because they send us traffic. Basically the most we can do with AI specifically is a) ask them not to in robots.txt and b) proactively block them by monitoring behavior and banning any client that looks like an AI bot. After this incident we're now doing both of those things.
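To make the two measures concrete, here is a rough sketch of each, under stated assumptions rather than a description of what the site actually runs. The robots.txt text, the bot name `ExampleAIBot`, the URLs, the thresholds, and the `RateBlocker` class are all hypothetical; only `urllib.robotparser` is a real standard-library module. The first half shows what a well-behaved crawler would conclude from a robots.txt that asks one bot to stay out, and the second half bans any client that exceeds a request budget within a sliding time window.

```python
from collections import defaultdict, deque
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: asks one AI crawler to stay out, allows everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed_by_robots(user_agent, url):
    """What a polite crawler would decide after reading ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


class RateBlocker:
    """Ban a client once it exceeds max_requests within window_seconds."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.hits = defaultdict(deque)  # client IP -> recent request timestamps
        self.banned = set()

    def allow(self, client_ip, now):
        """Record a request at time `now` (seconds); return False if blocked."""
        if client_ip in self.banned:
            return False
        recent = self.hits[client_ip]
        recent.append(now)
        # Drop timestamps that have fallen out of the sliding window.
        while recent and recent[0] <= now - self.window_seconds:
            recent.popleft()
        if len(recent) > self.max_requests:
            self.banned.add(client_ip)
            return False
        return True


# Usage: at most 3 requests per second per client (made-up threshold).
blocker = RateBlocker(max_requests=3, window_seconds=1.0)
results = [blocker.allow("203.0.113.7", t) for t in (0.0, 0.1, 0.2, 0.3)]
```

The robots.txt half only works for crawlers that choose to honor it, which is why the behavior-based half is needed as well; a real deployment would more likely do the second part in the web server or a firewall than in application code.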

 

One other measure we could theoretically take would be to require a login to view posts, but I don't know whether that's good on balance, because it also imposes a burden on people who find their way here through a web search looking for information, a group that I believe most people here would be happy to help.

 

This site is not under a Creative Commons license or in any other legal way structured so that anybody has a *positive* right to train an AI on it - they only have the right to do it if courts decide that it's fair use (TBD), or if they obtain permission from the copyright holder. I haven't revised Roddy's site T&C, and I don't believe that in their current form they require anybody to assign copyright to anything they've posted. As a matter of principle I would be very much against that anyway. So I don't believe that as currently structured I would have the right to license everybody's content to an AI company even if I wanted to, which I very much don't.

Posted
On 3/2/2025 at 5:37 PM, mikelove said:

so I don't believe that as currently structured I would have the right to license everybody's content to an AI company even if I wanted to, which I very much don't.

Glad to hear it.

 

I understand the site is in general open to Google and other crawlers, and that is still a good thing, I think (how else would people find us?).
