Hapgood

Mike Caulfield's latest web incarnation. Networked Learning, Open Education, and Online Digital Literacy


People Are Not Talking About Machine Learning Clickbait and Misinformation Nearly Enough

The way that machine learning works is basically this: you feed in some examples, let's say of what tables look like, and then the code generates some things it thinks are tables. You click yes on the things that look like tables and the code reinforces the processes that made those and makes some more attempts. You rate again, and with each rating the elements of the process that produce table-like things are strengthened and the ones that produce non-table-like things are weakened.
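
To make that loop concrete, here is a toy sketch in Python. It is purely illustrative, not real training code: the "table" is reduced to a single number, the human clicking yes is simulated by a scoring function, and the "model" is one parameter that gets nudged toward whatever the rater liked.

```python
import random

TARGET = 0.8            # stand-in for "what a table looks like"
mean = 0.0              # the model's single learnable parameter

def rate(candidate):
    """Simulated human clicking 'yes, that looks like a table'."""
    return 1.0 - abs(candidate - TARGET)

for step in range(200):
    candidates = [random.gauss(mean, 0.3) for _ in range(10)]
    scores = [rate(c) for c in candidates]
    best = candidates[scores.index(max(scores))]
    # Reinforce whatever the rater liked best; everything else is ignored.
    mean += 0.1 * (best - mean)

print(f"learned parameter after training: {mean:.2f}")  # drifts toward 0.8
```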

It doesn’t have to be making things — it can be recognition as well. In fact, as long as you have some human feedback in the mix, you can train a machine learning process to recognize and rate the tables that another machine learning process makes, in something called a generative adversarial network.
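
Real GANs pit two neural networks against each other and train them with gradients; the toy below only mimics that generator-versus-rater structure with single numbers, but it shows the basic dynamic of each side improving against the other.

```python
import random

REAL_MEAN = 4.0        # where "real tables" live in this toy
g_mean = 0.0           # the generator's single parameter
d_threshold = 2.0      # the rater calls anything above this "real"

def looks_real(x):
    return x > d_threshold

for step in range(500):
    fake = random.gauss(g_mean, 0.5)
    real = random.gauss(REAL_MEAN, 0.5)
    # Rater update: keep the threshold between real and fake samples.
    d_threshold += 0.05 * ((real + fake) / 2 - d_threshold)
    # Generator update: if the fake was caught, move toward what passes.
    if not looks_real(fake):
        g_mean += 0.05 * (d_threshold - g_mean)

print(f"generator mean {g_mean:.2f} vs real mean {REAL_MEAN}")
```

Run it and the generator's output creeps toward the real distribution as the two sides chase each other, which is the whole point of the adversarial setup.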

People often use machine learning and AI interchangeably (and sometimes I do too). In reality machine learning is one approach to AI, and it works very well for some things and not so well for others. So far, for example, it’s been a bit of a bust in education. It’s had some good results in terms of self-driving cars. It hasn’t done great in medicine.

It will get better in these areas, but there’s a bit of a gating factor here — the feedback loops in these areas are both delayed and complex. In medicine we’re interested in survival rates that span from months to decades — not exactly a fast-paced loop — and the information that is currently out there for machines to learn from is messy and inconclusive. In learning, the ability to produce custom content is likely to have some effect, but bigger issues such as motivation, deep understanding, and long-term learning gains are not as simple as recognizing tables. In cars, machine learning has turned out to be more useful, but even there the limits show: you can use machine learning to recognize stop signs, yet it’s a good bit harder to learn the rarer and more complex instances of “you-go-no-you-go” yielding protocols.

You know what machine learning is really good at learning, though? Like, scary, Skynet-level good?

What you click on.

Think about our tables example, but replace the tables with headlines. Imagine feeding into a machine learning algorithm the 1,000 most shared headlines and stories, and then having the ML generate, over the next hour, 10,000 headlines that it publishes through 1,000 bots. The ones that are successful get shared, and those parts of the ML net are boosted (produce more like this!). The ones that don’t get shared let the ML know to produce less along those lines.
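
A hypothetical sketch of that loop might look like the following. Nothing here touches a real language model, bot network, or platform API: share counts are simulated, and the "generator" just samples canned headline fragments, boosting whichever fragments showed up in the most-shared headlines of the previous round.

```python
import random

FRAGMENTS = ["you won't believe", "experts warn", "the truth about vaccines",
             "this one trick", "what they're hiding", "in your county"]
weights = {f: 1.0 for f in FRAGMENTS}

def generate_headline():
    parts = random.choices(list(weights), weights=weights.values(), k=2)
    return " ".join(parts)

def simulated_shares(headline):
    # Stand-in for publishing through a bot and counting real shares.
    return random.random() * (2.0 if "hiding" in headline else 1.0)

for hour in range(5):
    batch = [generate_headline() for _ in range(1000)]
    top = sorted(batch, key=simulated_shares, reverse=True)[:100]
    # Reinforce fragments that showed up in the most-shared headlines.
    for headline in top:
        for frag in FRAGMENTS:
            if frag in headline:
                weights[frag] *= 1.05

print(max(weights, key=weights.get))  # the fragment the loop learned to favor
```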

That’s hour one of our disinfo Skynet. If the bots have any sizable audience, you’re running maybe 20,000 tests per piece of content — showing it to 20,000 people and seeing how they react. Hour two repeats that with better content. By the next morning you’ve run millions of tests on your various pieces of content, all slowly improving the virality of the material.

At that scale you can start checking valence, targeting, and impact. It’s easy enough for a network analysis to show whether certain material is starting fights, for example, and stuff that starts fights can be rated up. You can find what shares well and produces cynicism in rural counties, if you want. Facebook’s staff will even help you with some of that.
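
As a rough illustration of the kind of signal involved, here is a toy conflict scorer. The data is made up and the toxicity check is a keyword stub; a real system would look at the actual reply graph and use a trained classifier, but the boost-what-starts-fights logic is the same.

```python
# Illustrative only: score posts by how much conflict their reply threads show.
ANGRY_WORDS = {"idiot", "liar", "wrong", "shut up"}

def conflict_score(replies):
    """Fraction of replies that look like the start of an argument."""
    if not replies:
        return 0.0
    hits = sum(any(w in r.lower() for w in ANGRY_WORDS) for r in replies)
    return hits / len(replies)

posts = {
    "headline A": ["totally agree", "nice read"],
    "headline B": ["you're a liar", "this is wrong", "shut up already"],
}
# Content that reliably starts fights gets rated up for the next round.
ranked = sorted(posts, key=lambda p: conflict_score(posts[p]), reverse=True)
print(ranked)  # ['headline B', 'headline A']
```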

In short, the social media audience becomes one big training pool for your clickbait or disinfo machine. And since there is enough information from the human training to model what humans click on, that process can be amplified via generative adversarial networks, just like with our tables.

It doesn’t stop there. The actual articles can be written by ML, with their opening grafs adjusted for maximum impact. Videos can be automatically generated off of popular articles and flood YouTube.

Even the bots can get less distinguishable. An article in the New York Times today details the work being done on ML face generation, which can now produce believable fake faces. Right now the process is slow, partially because it relies solely on a GAN and because it’s processor intensive. But imagine generating 1,000 fake faces for your bot avatars and tracking which ones get the most shares, then regenerating a thousand more based on that and updating. Or, even easier, autogenerating and re-generating user bios.
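
A sketch of that selection loop, with the face generator stubbed out as a random "style vector" and engagement simulated, might look like this. The point is only the cycle: generate a pool, measure, keep the winners, regenerate around them.

```python
import random

def generate_face(style=None):
    """Stand-in for a GAN face generator; returns a fake style vector."""
    base = style or [random.random() for _ in range(4)]
    return [x + random.gauss(0, 0.05) for x in base]

def simulated_engagement(face):
    # Stand-in for the shares and follows a bot earns with this avatar.
    return sum(face) + random.gauss(0, 0.1)

pool = [generate_face() for _ in range(1000)]
for generation in range(5):
    ranked = sorted(pool, key=simulated_engagement, reverse=True)
    keepers = ranked[:50]
    # Regenerate the pool as small variations on the best performers.
    pool = [generate_face(style=random.choice(keepers)) for _ in range(1000)]

print([round(sum(x) / len(pool), 2) for x in zip(*pool)])  # style drifts upward
```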

You don’t even need to hand-grow the faces, as in the NYT article. You could generate 1,000 morphs, or combos of existing faces.

Just as with the last wave of disinformation, the first adopters of this stuff will be the clickbait farms, finding new and more effective ways to get us to sites selling dietary supplements or to watch weird autogenerated YouTube videos. There will be a flood of low-information ML-based content. But from there it will be weaponized, and used to suppress speech and manipulate public opinion.

These different elements of ML-based gaming of the system have different ETAs, and I’m not saying all of this is imminent. Some of it is quite far off. But I am saying it is unavoidable. You have machine learning — which loves short and simple feedback loops — and you have social media, which has a business model and interface built around those loops. The two things fit together like a lock and a key. And once they come together, the effect on online culture is likely to be profoundly detrimental, making our current mess seem quite primitive by comparison.
