This article summarizes Evaluating Commercial AI Chatbots as News Intermediaries by Mirac Suzgun et al. According to the paper, generative AI chatbots can be as much as 95% accurate when describing the daily news (which is rather better than Fox), but accuracy can drop to 19% after interactions in which a user misremembers a detail. "Users who ask AI chatbots about news while misremembering details will frequently get confident answers that reinforce the error." It also found systematic inaccuracies in Hindi-language interactions. As the paper's authors say, "these results suggest that evaluating AI news intermediaries on aggregate accuracy alone is insufficient." This is important, of course, in light of plans to replace search results with chatbot interactions. I don't know whether this summary was AI-generated, but I read the original paper as well (a 53-page PDF, but only 15 pages of actual paper) to verify that it is at least accurate. Still, we'd want more than a "fourteen-day real-time evaluation of six commercial AI chatbots" before drawing full conclusions.

