TechCrunch: “In a pitch to investors last spring, Anthropic said it intended to build AI to power virtual assistants that could perform research, answer emails, and handle other back-office jobs on their own. The company referred to this as a “next-gen algorithm for AI self-teaching” — one it believed that could, if all goes according to plan, automate large portions of the economy someday. It took a while, but that AI is starting to arrive. Anthropic on Tuesday released an upgraded version of its Claude 3.5 Sonnet model that can understand and interact with any desktop app. Via a new “Computer Use” API, now in open beta, the model can imitate keystrokes, button clicks, and mouse gestures, essentially emulating a person sitting at a PC. “We trained Claude to see what’s happening on a screen and then use the software tools available to carry out tasks,” Anthropic wrote in a blog post shared with TechCrunch. “When a developer tasks Claude with using a piece of computer software and gives it the necessary access, Claude looks at screenshots of what’s visible to the user, then counts how many pixels vertically or horizontally it needs to move a cursor in order to click in the correct place.”
Anthropic: “…We’re also introducing a groundbreaking new capability in public beta: computer use. Available today on the API, developers can direct Claude to use computers the way people do—by looking at a screen, moving a cursor, clicking buttons, and typing text. Claude 3.5 Sonnet is the first frontier AI model to offer computer use in public beta. At this stage, it is still experimental—at times cumbersome and error-prone. We’re releasing computer use early for feedback from developers, and expect the capability to improve rapidly over time. Asana, Canva, Cognition, DoorDash, Replit, and The Browser Company have already begun to explore these possibilities, carrying out tasks that require dozens, and sometimes even hundreds, of steps to complete. For example, Replit is using Claude 3.5 Sonnet’s capabilities with computer use and UI navigation to develop a key feature that evaluates apps as they’re being built for their Replit Agent product…”
Sorry, comments are closed for this post.