Playwright + LLM = The Browser Agent That Actually Ships
Every six months a startup raises $50 million to build an "AI that uses the browser." Most of the results are slow, gated, and hilariously expensive. Meanwhile, a small Python library called browser-use does the same job, runs locally, and costs only what your model API charges. Spoiler: in my experience it's better than the commercial options.
The Setup
browser-use is a Playwright wrapper with a smart DOM extractor and a tight agent loop. The model sees an accessibility-tree representation of the page and picks an action, the library executes it, and the loop repeats until the task is done. No bespoke screen-pixel models, no proprietary backend.
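To make that loop concrete, here is a minimal sketch of the observe, decide, act cycle in plain Python. It is not browser-use's actual code: `FakePage`, `Action`, `run_loop`, and `scripted_model` are hypothetical names invented for illustration, and the "model" is a hard-coded stub standing in for an LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Action:
    name: str                     # e.g. "click" or "done"
    target: Optional[int] = None  # index of an element in the snapshot
    text: str = ""                # final answer when name == "done"


@dataclass
class FakePage:
    """Hypothetical stand-in for a browser page: just a list of element labels."""
    elements: List[str]
    log: List[str] = field(default_factory=list)

    def snapshot(self) -> str:
        # A real extractor would walk the page; here we only number the labels.
        return "\n".join(f"[{i}] {label}" for i, label in enumerate(self.elements))

    def execute(self, action: Action) -> None:
        # Record the action instead of driving a real browser.
        self.log.append(f"{action.name}:{action.target}")


def run_loop(page: FakePage,
             choose_action: Callable[[str, int], Action],
             max_steps: int = 10) -> Optional[str]:
    """Observe the page, ask the model for an action, execute it, repeat."""
    for step in range(max_steps):
        observation = page.snapshot()
        action = choose_action(observation, step)
        if action.name == "done":
            return action.text
        page.execute(action)
    return None  # step budget exhausted without a final answer


def scripted_model(observation: str, step: int) -> Action:
    """Stub for the LLM: click the first 'Export' element, then stop."""
    if step == 0:
        for line in observation.splitlines():
            if "Export" in line:
                index = int(line[1:line.index("]")])
                return Action("click", target=index)
    return Action("done", text="exported")


page = FakePage(["Deals", "Filters", "Export CSV"])
result = run_loop(page, scripted_model)
```

The step cap matters more than it looks: without it, a model that never emits a "done" action would loop, and bill you, forever.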
pip install browser-use
playwright install chromium
The Money Pattern
One Agent, one task string, one run call. The agent navigates, clicks, types, scrolls, and reports back. I use it for filling out compliance forms and scraping Google Ads UI bits the API doesn't expose.
import asyncio

from browser_use import Agent, Browser, BrowserConfig
from langchain_anthropic import ChatAnthropic


async def main():
    # Run with a visible window so you can watch what the agent does.
    browser = Browser(config=BrowserConfig(headless=False))
    agent = Agent(
        task=(
            "Open Pipedrive, find deals stuck in 'Awaiting Inspection' "
            "for more than 14 days, and export the list as CSV."
        ),
        llm=ChatAnthropic(model="claude-opus-4-5"),
        browser=browser,
    )
    try:
        # The step cap is an argument to run(), not to the Agent constructor.
        result = await agent.run(max_steps=25)
        print(result.final_result())
    finally:
        # Close the browser even if the run raises.
        await browser.close()


asyncio.run(main())
The Catch
SPAs still trip it up. Anything with aggressive virtualised lists or weird custom focus management will confuse the DOM extractor. Captchas defeat it instantly, as they should — if your agent solves captchas you have bigger problems. And running it without headless=False in dev is a great way to misdiagnose a flaky selector for an hour.
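One way to keep the visible browser as the default in dev is to drive the flag from an environment variable. The sketch below is only an illustration: the helper `headless_from_env` and the variable name `BROWSER_HEADLESS` are my own assumptions, not part of browser-use.

```python
import os
from typing import Mapping, Optional


def headless_from_env(env: Optional[Mapping[str, str]] = None,
                      var: str = "BROWSER_HEADLESS") -> bool:
    """Return True only when the (hypothetical) variable is explicitly truthy.

    Defaults to a visible browser, so a dev machine with nothing set
    still shows you what the agent is clicking on.
    """
    source = os.environ if env is None else env
    return source.get(var, "0").strip().lower() in {"1", "true", "yes"}


# Explicit mappings are used here so the example does not depend on your shell.
dev_mode = headless_from_env({})
ci_mode = headless_from_env({"BROWSER_HEADLESS": "true"})
```

You would then pass the result along as BrowserConfig(headless=headless_from_env()) and set the variable only in CI or production.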
The Verdict
browser-use is the project that quietly made commercial browser-agent startups look overpriced. It's MIT licensed, the maintainers ship weekly, and the code is small enough to fork if you need to. I'm wiring it into a Rebuild Relief operations dashboard so the team can ask "open these three claims in Pipedrive and tag them" in plain English. This is what the agent future actually looks like — small, open, composable. Do not @ me when this kills another funded startup.