
AI Bots on X (Twitter) – Source: www.schneier.com


Source: www.schneier.com – Author: Bruce Schneier


Comments

Tom


January 22, 2024 8:12 AM

Something I’ve been expecting to see for a while now is LLM chatbots asking questions on sites like Reddit and Stack Overflow so that the answers can be used to further train the model. But how would we know it was happening?

Clive Robinson


January 22, 2024 9:58 AM

@ Tom, ALL,

Re : LLM interrogators.

“But how would we know it was happening?”

Because they currently,

“Speak lightly randomized average.”

Every example I’ve seen so far has that vague feeling of “Marketing Speak” or “Ad copy” and “lack of Empathy”.

Almost like one of those dread psychopathic carefully prepared “PR spokesperson” statements. Given by someone who stands in front of the 24-hour news cameras and spouts the faux “We understand your feelings of loss at this time and our organisation offers its sympathies…” when the organisation has figuratively “crashed and burned” the bus by “driving it off the cliff” as an almost direct result of the policies laid down by senior management.

Whilst we can spot this now, things will change: faking human warmth, empathy and the like is surely high on the “to-do list”.

However if you are able to ask questions LLMs will just give average replies from a limited stock or “go way off of the reservation” (so called AI Hallucination).

Currently the only way for operators of LLMs to stop this is by “Real Life” (RL) “Human Intervention”, which cannot currently be done in “Real Time” (RT), even with ML giving corpus feedback.

These are “obvious tells” against LLMs even with ML adjustment.

However these “obvious tells” will get to be either faked out or mitigated out. One such way is to stop “unexpected questions during human interlocution”.

One way is to ensure that only questions which have already been safely answered by the LLM are allowed.

It’s a game, and the amount of money to be made from even LLM AI without human or AI feedback is immense for a chosen “self selecting” few. Therefore the likes of Micro$haft and Giigle, who have both the most to gain and the most to lose from interrogative AI, are going to invest a large chunk of change ironing the more obvious tells out.

The ability to remove overt tells is something you realise has been going on for some time now when hearing just a few “prepared” PR disaster statements from various organisations. But there are still a whole slew of less obvious tells. Working out how to “covert question” to make these less obvious tells more visible is a skill that some have already developed…

So expect this to develop into an “arms race”.

Clive Robinson


January 22, 2024 10:30 AM

@ Bruce, ALL,

“I hadn’t thought about this before: identifying bots by searching for distinctive bot phrases.”

That is a very “obvious tell” almost “first order”.

Then there are less obvious tells based around behaviours such as response times.

Then you get into comprehension tells,

Then style tells.

And so on.
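A minimal sketch of what the “first order” phrase tell might look like in practice: scan posts for stock refusal boilerplate. The phrase list and function names below are illustrative assumptions, not any real detection tool; behavioural, comprehension and style tells would need timing data and statistical models rather than a simple pattern match.

```python
import re

# Illustrative stock phrases ("first order" tells) -- a real list would
# be larger and kept up to date as operators iron the tells out.
BOT_PHRASES = [
    r"as an ai language model",
    r"i cannot fulfill this request",
    r"goes against openai use policy",
]

PATTERN = re.compile("|".join(BOT_PHRASES), re.IGNORECASE)

def first_order_tell(post: str) -> bool:
    """True if the post contains a known stock bot phrase verbatim."""
    return PATTERN.search(post) is not None
```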

The problem for the AI owners trying to fake human responses is the massive level of load it puts on the bots.

One way they will try to get around this is by “sniping tactics” or “drive by comments”.

The thing is, not only does the load increase for the AI operators, trying to close down one type of tell just opens up a different type of tell.

Consider it like Gus Simmons’ Prisoners’ Problem. As each tell gets in effect randomised, the level of redundancy must go up. As the redundancy goes up, the information bandwidth also has to go up. Thus any side-channel bandwidth also goes up…

Like Traffic Analysis reveals by meta-data analysis the same applies to finding bots.

Importantly though, it’s not just the presence of meta-data, it’s also its absence, especially by meta-meta-data analysis.

The thing is there are some real experts out there looking at meta-meta-data and the statistics of the holes it creates. It’s been a less than obvious but growing part of data forensics that first came to light with what some called “the archeology of hard drives”, where tampering could be spotted because those tampering did not fully understand the OS algorithms for using hard drive sectors, the likes of free lists, and how HD performance was enhanced by making sure “minimum head travel” techniques were employed.

I’ve occasionally talked about it when also mentioning “Paper, Paper, NEVER data”. The process of “printing out” or putting files sequentially through a converter to a “virgin drive” that is “non-journaled”. Such processes remove the basic stratification many forensic archeology techniques rely on… Further, it puts things in a known time order and mostly –but not always– removes tail-end cluster issues, where data in buffers has not been overwritten before the buffer gets written to the drive.

bisento


January 22, 2024 11:00 AM

One could spot AI-generated content even on Amazon by looking for error messages:


https://futurism.com/amazon-products-ai-generated

A sideboard description reads “I’m sorry but I cannot fulfill this request it goes against OpenAI use policy,”

These are the early stages. It’s all downhill for the web from here (some say since the Eternal September).

Clive Robinson


January 22, 2024 11:33 AM

@ bisento, ALL,

From the “futurism.com” article you link to,

“lists a variety of goods ranging from dashboard-mounted compasses for boats to corn cob strippers and pelvic floor strengtheners.”

It does not say if it was an “all in one” list. Such a list of product features in one could potentially bring tears to your eyes 😉

@ ALL,

But on the more serious side, unless it’s a large “General Store” type site, that eclectic range is a clear indicator of,

“Avoid at all costs as you will be ripped off.”

For those that have watched “Futurama”, from the same people who did the “Simpsons”: they regularly did spoof adverts that would bring a mixture of laughs, winces, and tears to your eyes just thinking about them.

They say that parody holds a mirror to life; well, sometimes life just goes the extra light year.

Clive Robinson


January 22, 2024 12:27 PM

@ Bruce, ALL,

This from Cory Doctorow is probably relevant,

https://locusmag.com/2023/12/commentary-cory-doctorow-what-kind-of-bubble-is-ai/

It mostly agrees with things I’ve been saying, but before anyone asks,

“No I’m not Cory in a wooly over the head one piece, nor vice versa.”

What Cory briefly notes but does not amplify on is that the jobs that get hit by AI will be those of the upper-middle-class well educated. Something you don’t hear a lot about. It’s going to happen this way because AI is very expensive, so can only compete with those on high wages. And that is not something that is going to change unless there is a very real breakthrough in the way we design micro-electronics / nano-machines. The human brain runs on less power than a netbook, and not much more than some pocket smart devices. In almost all more-than-basic tasks humans get to the solution faster. Richard Feynman, the noted Nobel physicist, told a story about a man using an abacus. He was quick and accurate on the basic stuff, but complexity left him in the dust.

The same is actually true for all AI systems we currently have, and it’s unlikely to change any time soon.

The other thing Cory did not mention that I almost always do is that LLM AI is the next generation of “surveillance technology” for the likes of Giigle and M$ and I’ll be honest and say as it’s predicated on the supposed value of “Personal and Private Information”(PPI), they are likely to get a very cold bath.

As can be seen with the continued downward slope in advertising income for X/Twitter, and similar issues for Meta, the PPI market is a bubble that is deflating, and in all honesty the AI bubble is to the PPI bubble what a boil is to a pig’s backside.

My view is LLM AI has missed its spot in the spotlight, thus the real question is,

“Are there other types of AI coming down the pipe, that will keep the AI bubble inflated?”

So far the signs do not look good on this. Thus I suspect LLMs will go the same way as crypto-coins, and ML to be much like the NFT market that followed.

Where Cory is spot on is that of “Cost”: it really does kill. And so far “Humans are way less expensive thus more productive for any metric where cost is included.”

And it’s why I can see only very high-waged niche jobs on the edges of academia, with respect to teaching rather than research, being affected. Though those knowledge workers, such as traditional librarians, who support researchers will get hit fairly hard unless they broaden their services beyond those where AI can compete.

lurker


January 22, 2024 12:45 PM

@bisento

The scary thing about those Amazon blurbs is that they were bots using AI for the product placements. When those bots get tuned to filter out the AI-GPT disclaimers, it’s time to go back to the village market, run by people, for people.

@Tom

Right now the chatbot GPTs have no curiosity; they are unable to ask questions by their construction, and, like witnesses under (cross-)examination in a law court, by the rules of the game. When it happens I expect, like @Clive, that we will detect it by the language they use, and by the disclaimers: “I’m sorry, OpenAI policy prevents me asking this…”

lurker


January 22, 2024 7:17 PM

@David in Toronto

The internet died seven years ago, or earlier …

‘https://en.wikipedia.org/wiki/Dead_Internet_theory

Snarki, child of Loki


January 22, 2024 7:33 PM

“parody holds a mirror to life”?

Absolutely. But all too much of modern culture is parody-invariant.

“if x is some statement, Px is the parody of x (i.e., parody transform of x). Normal statements have Px=-x: the statement gets ‘inverted’ when subjected to a parody transform.

But when Px=x, it’s parody-invariant and has the same meaning even when parodied”

(Some will note the similarity to “parity”. And so it is)

Clive Robinson


January 23, 2024 1:28 AM

@ David in Toronto, lurker, ALL,

Re : It’s not just the baby with the bath water…

“In relate[d] news, the Internet is proving to be a race to the bottom”

And down the plug hole it went…

As I’ve noted before, LLM AI has no intelligence; it’s a “matched filter” that is tuned to match with “average” plus a little random noise.

Think of it, if you like, as standing above a crowd at an outdoor event. Until something causes the crowd to “get” not just “on message” but also saying it “in sync”, what you hear is a jumbled mess of random.

You can tune your system for, say, “male Germanic” frequency ranges and phonemes (as mobile-phone CELP does), but in the process you lose other voices.

Keep tuning to what is average and all you get is more of the same average plus noise…

One of the reasons “fan chants” are simple is to increase the “on message” and “in sync” effect so,

“The chant rises above the rest”

And in the case of sports fans hopefully “drowns out the rest” or at least “the other team’s message”.

So the LLM AI does the same…

1, “tunes in”
2, gets “in sync”
3, thus “on message”
4, and “shouts it out”
5, to “drown out the rest”

I’d say it’s “working as designed”.

The only problem is “tunes in” is what the user asks for… Consider: the average of “garbage” is still garbage, but with less useful content (i.e. low-pass integration). To “fake content” the LLM adds “shaped noise”, but the average of “shaped noise” is the integration of the shaping curve… So the result is that the “shaping curve” gets reinforced, and if not correctly restrained it “howls around” just as an audio system does when the microphone picks up the speaker output.

So in essence it’s a “feedback GIGO” system at work… It even howls, but they give it fancy names, “AI Hallucination” being just one such “nonsense phrase” designed to cover up that there is no “intelligence” whatsoever in the system.
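The “feedback GIGO” loop can be illustrated with a toy simulation (my own construction, purely to sketch the idea): each generation is rebuilt from the average of the previous one plus deliberately biased “shaped noise”, so diversity collapses while the bias – the shaping curve – compounds, like the microphone feeding the speaker.

```python
import random
import statistics

random.seed(1)

def generation_step(samples, bias=0.2, spread=1.0):
    """Rebuild the population from its own average plus shaped noise."""
    centre = statistics.mean(samples)            # "tune in" to the average
    return [centre + random.gauss(bias, spread)  # add biased shaping noise
            for _ in samples]

# A diverse starting population of "content".
samples = [random.gauss(0.0, 5.0) for _ in range(1000)]
start_mean = statistics.mean(samples)
start_sd = statistics.stdev(samples)

for _ in range(50):  # fifty generations of feedback
    samples = generation_step(samples)

end_mean = statistics.mean(samples)
end_sd = statistics.stdev(samples)
# The spread collapses toward the noise's own spread, while the mean
# drifts by roughly the bias per generation: the shaping curve reinforced.
```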

The fact that Mr AI Sam Altman is easily bamboozled by a journalist’s simple question and flaps around like “a fish out of water” should be telling people something important.

“LLM AI is a con game designed to separate idiots from their money.”

It’s not even a new idea, in fact it predates computers by a century or two[1].

See Mozart’s “Musikalisches Würfelspiel”; though he was not the first, he’s the most remembered.

He was also a bit of a practical joker. He wrote a piece of music to be played on the piano. He produced it at a party as a new composition. Everyone gathered around to hear the party’s host play it… But he could not, as the music had the player with his hands far apart on the keyboard and a note from the middle was needed… The host reasonably said, after trying, that it could not be played. Mozart said it could, and after a little banter sat down and proved his point by hitting the note with his nose, much to everyone’s laughter (yup, entertainment was a little limited back then 😉).

Broadly Mozart’s musical dice game system was,

You throw two dice and add them together to get a number. Repeat this sixteen times. Use the numbers as an index into musical phrases. Play the phrases in the order you wrote down the numbers to get your very own minuet.

There are two basic things to remember,

Firstly, all the musical phrases must start and end at an average point, otherwise the resulting minuet will sound “discordant”.

Secondly, adding two dice together does not give a flat distribution, but a crude “first-order approximation” to the “Normal Distribution Curve”[2]. So you need to put your musical phrases in an appropriate order in the table.

In more recent times people have even “computerized” this…

http://www.lottemeijer.com/create/?p=286
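The dice-game procedure described above can be sketched in a few lines. The phrase table here is a placeholder of my own (the historical tables map each of the sixteen bars to alternative written-out measures, indexed by the dice total from 2 to 12):

```python
import random

# Stand-in phrase table: one placeholder phrase per dice total 2..12.
PHRASES = {total: f"phrase-{total}" for total in range(2, 13)}

def roll_two_dice():
    # Summing two dice gives a triangular, not flat, distribution:
    # a 7 is six times as likely as a 2 or a 12, so the ordering of
    # phrases in the table matters.
    return random.randint(1, 6) + random.randint(1, 6)

def compose_minuet(bars=16):
    """Throw two dice per bar and index the phrase table with the sum."""
    return [PHRASES[roll_two_dice()] for _ in range(bars)]

minuet = compose_minuet()
```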

The important point is LLMs are the language version of a “musical dice game”; the difference is the user’s query produces a much more complex distribution to modulate the table by (the table is in effect encoded by the weights in the “neural network”).

So looking behind the Wizard’s curtain reveals there is no magic, which is why Sam Altman stumbled over the journalist’s question.

There is a song from the mid-sixties by the “Mamas and Papas” that has the words,

“You’re going to trip, stumble and fall”

https://m.youtube.com/watch?v=t6EgQFXYxbg

Sam Altman has done the first two steps, thus the question is “how long before the fall?”

In fact the whole song’s lyrics could be sage advice for Sam and friends at OpenAI and other “bandwagon establishments”.

If you turn off javascript you can read the lyrics,

https://songmeanings.com/songs/view/3530822107858646547/

[1] “Musikalisches Würfelspiel” basically translates to “musical dice game”. They were popular in the latter half of the 1700s, when “home entertainment” even for the very rich was at best limited… (think of it like those commercial radio stations at the end of the 1900s which had a “half the tunes every hour must be Top Ten” policy). So variety, any variety, was very definitely “the spice of life”.

https://en.m.wikipedia.org/wiki/Musikalisches_Würfelspiel

[2] As Donald Knuth explains in his discussion of generating random numbers with a non-flat distribution, the more dice values you add up, the closer the approximation gets.
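Footnote [2] is easy to check empirically; a quick sketch of my own (sample sizes and dice count are arbitrary choices) shows the sum of n dice clustering around 3.5 × n with the spread the central limit theorem predicts:

```python
import random
import statistics

random.seed(0)

def dice_sum(n):
    """Sum of n fair six-sided dice."""
    return sum(random.randint(1, 6) for _ in range(n))

# One die is flat; the sum of many dice piles up around the middle.
# For n dice: mean = 3.5 * n, variance = n * 35/12, and the shape
# approaches the normal curve as n grows.
n = 10
samples = [dice_sum(n) for _ in range(20000)]
mean = statistics.mean(samples)  # expect ~35.0
sd = statistics.stdev(samples)   # expect ~sqrt(10 * 35/12) ~ 5.4
```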

Jos


January 23, 2024 2:15 AM

@Tom

On LLM questions:

That’s how Quora.com operates nowadays, although there is an opt-out for answer writers.


They used a “partner program” which rewarded users for asking questions for a while, before terminating it (likely they had trained the automated process sufficiently, and a fair number of humans were just spamming the site with garbage questions).

Based on what happened/happens on Quora, I expect that automated question generation would show a fair increase in questions where you would start to wonder what happened, such as: How can we prevent more people from dying and being exploited as they migrate? (https://www.quora.com/unanswered/How-can-we-prevent-more-people-from-dying-and-being-exploited-as-they-migrate)


It does not necessarily mean that all questions are this bad, and if they are not bad we might argue that answering them does add value.


It would be more interesting if one LLM started to generate questions, and another answers. Quora is developing Poe (.com), in which they aim to aggregate other automated creative content creators as well as their own tools.

As LLMs become more sophisticated I expect that the quality of questions will increase, which might even lead to questions we didn’t think of asking yet.

Since we are still far from deductive and abductive reasoning in AI – as far as I understand – I do not expect that any recent LLM question generator can generate a fair number of questions to feed back to the LLM without generating a lot of low-quality questions like the one above amongst them. On Quora it’s the reason that many experienced users muted the “Quora prompt generator”, the bot generating the questions.

I predicted this would happen years ago, prior to the LLM rise, when Quora started their partner program. I told people the input would be used to automate the “question” process. They just needed humans to train the bots.

ResearcherZero


January 23, 2024 2:49 AM

They should probably develop a model that saves young men from Hollywood, kissing (especially same-sex) and anything else that challenges their insecurities.



Original Post URL: https://www.schneier.com/blog/archives/2024/01/ai-bots-on-x-twitter.html

Category & Tags: Uncategorized, artificial intelligence, chatbots, identification, Twitter


