{"id":363,"date":"2024-03-29T12:00:00","date_gmt":"2024-03-29T13:00:00","guid":{"rendered":"http:\/\/www.washnow.me\/?p=363"},"modified":"2024-03-29T14:24:38","modified_gmt":"2024-03-29T14:24:38","slug":"ai-agents-could-do-real-work-in-the-real-world-that-might-not-be-a-good-thing","status":"publish","type":"post","link":"http:\/\/www.washnow.me\/index.php\/2024\/03\/29\/ai-agents-could-do-real-work-in-the-real-world-that-might-not-be-a-good-thing\/","title":{"rendered":"AI \u201cagents\u201d could do real work in the real world. That might not be a good thing."},"content":{"rendered":"<br \/>\n<figure>\n      <img decoding=\"async\" alt=\"An illustration shows a screen with an anthropomorphic robotic head attached. Speech and text bubbles appear to float out of the screen and into the air.\" src=\"data:image\/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==\" class=\"lazyload\" data-src=\"http:\/\/www.washnow.me\/wp-content\/uploads\/2024\/03\/GettyImages_1533302708.0.jpg\"><figcaption>Malorny\/Getty Images<\/figcaption><\/figure>\n<p>Why AI agents that could book your vacation or pay your bills are the next frontier in artificial intelligence.<\/p>\n<p id=\"u4U2J6\">ChatGPT and its large language model (LLM) competitors that produce text on demand are very cool. So are the other fruits of the <a href=\"https:\/\/www.vox.com\/2023\/4\/28\/23702644\/artificial-intelligence-machine-learning-technology\" data-source=\"encore\">generative AI<\/a> revolution: <a href=\"https:\/\/www.midjourney.com\/home\">art generators<\/a>, <a href=\"https:\/\/www.suno.ai\/\">music generators<\/a>, <a href=\"https:\/\/www.deepl.com\/en\/translator\">better automatic subtitles and translation<\/a>. 
<\/p>\n<p id=\"EiBb8o\">They can do a lot (including <a href=\"https:\/\/www.vox.com\/future-perfect\/2024\/3\/15\/24101088\/anthropic-claude-opus-openai-chatgpt-artificial-intelligence-google-consciousness\">claim that they\u2019re conscious<\/a>, not that we should believe them), but there\u2019s one important respect in which AI models are unlike people: They are processes that are run only when a human triggers them and only to accomplish a specific result. And then they stop.<\/p>\n<p id=\"qXwmwI\">Now imagine that you took one of these programs \u2014 a really good chatbot, let\u2019s say, but still just a chatbot \u2014 and you gave it the ability to write notes to itself, store a to-do list and the status of items on the to-do list, and delegate tasks to other copies of itself or other people. And instead of running only when a human prompted it, you had it work on an ongoing basis on these tasks \u2014 just like an actual human assistant.<\/p>\n<p id=\"2tO2vq\">At that point, without any new leaps in technology whatsoever \u2014 just some basic tools glued onto a standard language model \u2014 you\u2019d have what is called an \u201cAI agent,\u201d or an AI that acts with independent agency to pursue its goals in the world. 
<\/p>\n<p id=\"J96pG5\">AI agents have been called the \u201c<a href=\"https:\/\/botpress.com\/blog\/what-is-an-ai-agent\">future of artificial intelligence\u201d<\/a> that will <a href=\"https:\/\/fetch.ai\/agents\">\u201creinvent the way we live and work<\/a>,\u201d the \u201c<a href=\"https:\/\/www.forbes.com\/sites\/alexanderpuutio\/2024\/03\/22\/what-ceos-need-to-know-about-the-next-frontier-of-ai-ai-agents\/?sh=694e39027ea1\">next frontier of AI.\u201d<\/a> OpenAI is reportedly working on developing such agents, as are many different well-funded <a href=\"https:\/\/imbue.com\/\">startups<\/a>.<\/p>\n<p id=\"GrP4hJ\">They may sound even more sci-fi than everything else you\u2019ve already heard about AI, but AI agents are not nonsense, and if effective, could fundamentally change how we work. <\/p>\n<p id=\"yPTxHu\">That said, they currently don\u2019t work very well, and they pose obvious challenges for AI safety. Here\u2019s a quick primer on where we\u2019re (maybe) headed, and why.<\/p>\n<h3 id=\"mpQNJl\">Why would you want one of these?<\/h3>\n<p id=\"4aDrMj\">Today\u2019s AI chatbots are fun to talk to and useful assistants \u2014 if you are willing to overlook a set of limitations that includes <a href=\"https:\/\/www.wired.com\/story\/fast-forward-chatbot-hallucinations-are-poisoning-web-search\/\">making things up.<\/a> Such models have already found sizable and important economic niches, from art to audio and video transcription (which have been quietly revolutionized over the last few years) to <a href=\"https:\/\/www.vox.com\/2023\/9\/23\/23886163\/google-microsoft-amazon-generative-ai-assistants\">assisting programmers with tools like Copilot<\/a>. 
But the investors pouring <a href=\"https:\/\/www.goldmansachs.com\/intelligence\/pages\/ai-investment-forecast-to-approach-200-billion-globally-by-2025.html\">hundreds of billions of dollars into AI<\/a> are hoping for something more transformative than that.<\/p>\n<p id=\"ROSrTs\">Many people I talk to who use AI in their work compare it to having a slightly scatterbrained but very fast intern. It does useful work, but you have to define each problem for it and carefully check its work, meaning that much of what you might gain in productivity is lost to oversight. <\/p>\n<p id=\"DwYi9A\">Much of the <a href=\"https:\/\/www.vox.com\/future-perfect\/24108787\/ai-economic-growth-explosive-automation\">economic case for AI<\/a> is that it could do more than that. The people at work on AI agents hope that their tools won\u2019t just help software developers, but that the tools could <em>be<\/em> software developers. In this future, you wouldn\u2019t just <a href=\"https:\/\/www.morganstanley.com\/ideas\/generative-ai-travel\">consult AI for trip planning ideas<\/a>; instead, you could simply text it \u201cplan a trip for me in Paris next summer,\u201d as you might a really good executive assistant. <\/p>\n<p id=\"0L8OWl\">Today\u2019s AI agents do not live up to that dream \u2014 yet. The problem is that you need a very high accuracy rate on each step of a multistep process, or very good error correction, to get anything valuable out of an agent that has to take lots of steps. An agent that gets each step right 95 percent of the time, for example, would finish a 20-step task without a single error only about 36 percent of the time, assuming the errors are independent. <\/p>\n<p id=\"1QCyB0\">But there\u2019s good reason to expect that future generations of AI agents will be much better at what they do. First of all, the agents are built on increasingly powerful base models, which perform much better on a wide range of tasks, and which we can expect to continue to improve. Secondly, we\u2019re also learning more about how to build agents themselves. 
<\/p>\n<p id=\"dRBsxM\">A year ago, the first publicly available AI agents \u2014 AutoGPT, for example, which was just a very simple agent based on ChatGPT \u2014 were basically useless. But a few weeks ago, the startup Cognition Labs released Devin, an AI software engineer that can build and deploy <a href=\"https:\/\/www.cognition-labs.com\/introducing-devin\">entire small web applications<\/a>. <\/p>\n<p id=\"XTNWc3\">Devin is an impressive feat of engineering, and good enough to take some small gigs on Upwork and deliver working code. It had an almost 14 percent <a href=\"https:\/\/www.business-standard.com\/technology\/tech-news\/devin-all-about-us-based-startup-cognition-s-ai-powered-software-engineer-124031400138_1.html\">success rate<\/a> on a benchmark that measures ability to resolve issues on the software developer platform <a href=\"https:\/\/www.swebench.com\/\">GitHub<\/a>. <\/p>\n<p id=\"T8fX5l\">That\u2019s a big leap forward for which there\u2019s surely an economic niche \u2014 but at best, it\u2019s a very junior software engineer who\u2019d need close supervision by a more senior one. Still, like most things AI, we can expect improvement in the future. <\/p>\n<h3 id=\"dXpHeE\">Should we make billions of AI agents?<\/h3>\n<p id=\"PSGC49\">Would it be cool for everyone in the world to have an AI personal assistant who could plan dinner, order groceries, buy a birthday present for your mom, plan a trip to the zoo for the kids, and pay your bills for you while notifying you of any unexpected ones? Yes, absolutely. Would it be incredibly economically valuable to have AI software engineers who can do the work of human software engineers? Yes, absolutely. <\/p>\n<p id=\"77tCvW\">But: Is there something potentially worrying about creating agents that can reason and act independently, earn money independently, make copies of themselves independently, and do complex things without human oversight? Oh, definitely. 
<\/p>\n<p id=\"MfFNEN\">For one, there are questions of liability. It\u2019d be just as easy to make \u201cscammer\u201d AIs that spend their time convincing the elderly to send them money as it would to make useful agents. Who would be responsible if that happens? <\/p>\n<p id=\"T4Ikcq\">For another, as AI systems get more powerful, the moral quandaries they pose become more pressing. If Devin earns a lot of money as a software engineer, is there a sense that Devin, rather than the team that created him, is entitled to that money? What if Devin\u2019s successors are created by a team that\u2019s made up of hundreds of copies of Devin? <\/p>\n<p id=\"WKYf2P\">And for those who worry about humanity losing control of our future if we build extremely powerful AI systems without thinking about the consequences (<a href=\"https:\/\/www.vox.com\/future-perfect\/2018\/12\/21\/18126576\/ai-artificial-intelligence-machine-learning-safety-alignment\">I\u2019m one of them<\/a>), it\u2019s pretty obvious why the idea of AIs with agency is nerve-racking. <\/p>\n<p id=\"yXwn7y\">The transition from systems that act only when users consult them to systems that go out and accomplish complex goals in the real world risks what leading AI scientist Yoshua Bengio calls \u201crogue AI\u201d: <a href=\"https:\/\/yoshuabengio.org\/2023\/05\/22\/how-rogue-ais-may-arise\/\">\u201can autonomous AI system that could behave in ways that would be catastrophically harmful.\u201d<\/a> <\/p>\n<p id=\"Ie8sS1\">Think of it this way: It\u2019s hard to imagine how ChatGPT could kill us, or could even be the kind of thing that would want to. 
It\u2019s easy to imagine how a hyper-competent AI executive assistant\/scam caller\/software engineer could.<\/p>\n<p id=\"qHQZSF\">For that reason, <a href=\"https:\/\/www.vox.com\/23889632\/paul-christiano-beth-barnes-alignment-research-center-evaluations-ai-future-perfect-50-2023\">some researchers are trying<\/a> to develop good tests of the capabilities of AI agents built off different language models, so that we\u2019ll know in advance before we widely release ones that can make money, make copies of themselves, and function independently without oversight. <\/p>\n<p id=\"7aBPHS\">Others are working to try to set good regulatory policy in advance, including liability rules that might discourage unleashing an army of super-competent scammer-bots.<\/p>\n<p id=\"cbQ5L5\">And while I hope that we have a few years to solve those technical and political challenges, I doubt we\u2019ll have forever. The commercial incentives to make agent AIs are overwhelming, and they can genuinely be extremely useful. We just have to iron out their extraordinary implications \u2014 preferably before, rather than after, billions of them exist.<\/p>\n<p id=\"pe2mKq\"><em>A version of this story originally appeared in the <\/em><a href=\"https:\/\/www.vox.com\/future-perfect\"><em><strong>Future Perfect<\/strong><\/em><\/a><em> newsletter. <\/em><a href=\"https:\/\/www.vox.com\/pages\/future-perfect-newsletter-signup\"><em><strong>Sign up here!<\/strong><\/em><\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Malorny\/Getty Images Why AI agents that could book your vacation or pay your bills are the next frontier in artificial intelligence. ChatGPT and its large language model (LLM) competitors that produce text on demand are very cool. 
So are the&#8230;<\/p>\n","protected":false},"author":1,"featured_media":365,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[9],"tags":[],"_links":{"self":[{"href":"http:\/\/www.washnow.me\/index.php\/wp-json\/wp\/v2\/posts\/363"}],"collection":[{"href":"http:\/\/www.washnow.me\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.washnow.me\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.washnow.me\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.washnow.me\/index.php\/wp-json\/wp\/v2\/comments?post=363"}],"version-history":[{"count":2,"href":"http:\/\/www.washnow.me\/index.php\/wp-json\/wp\/v2\/posts\/363\/revisions"}],"predecessor-version":[{"id":366,"href":"http:\/\/www.washnow.me\/index.php\/wp-json\/wp\/v2\/posts\/363\/revisions\/366"}],"wp:featuredmedia":[{"embeddable":true,"href":"http:\/\/www.washnow.me\/index.php\/wp-json\/wp\/v2\/media\/365"}],"wp:attachment":[{"href":"http:\/\/www.washnow.me\/index.php\/wp-json\/wp\/v2\/media?parent=363"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.washnow.me\/index.php\/wp-json\/wp\/v2\/categories?post=363"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.washnow.me\/index.php\/wp-json\/wp\/v2\/tags?post=363"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}