Playwright + LLM = The Browser Agent That Actually Ships
Every six months a startup raises 50 million dollars to make an "AI that uses the browser." Most of them are slow, gated, and hilariously expensive. Meanwhile, a small Python library called browser-use does the same job, runs locally, and costs whatever your model API costs. Spoiler: it's better than the commercial options.
The Setup
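It's a Playwright wrapper with a smart DOM extractor and a tight agent loop. The model sees a compact text representation of the page's interactive elements (closer to an accessibility tree than to raw HTML), picks an action, executes it, and repeats until the task is done or the step budget runs out. No bespoke screen-pixel models, no proprietary backend.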
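To make the observe, decide, act loop concrete, here is a toy version with no browser and no model. This is a sketch of the general pattern, not browser-use's actual code: `FakePage`, `scripted_policy`, and `run_agent` are hypothetical names I made up for illustration, and the "LLM" is just a hard-coded plan.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    name: str          # "type", "click", or "done"
    index: int = -1    # which indexed element to act on
    text: str = ""     # text to type, or the final answer for "done"


@dataclass
class FakePage:
    """Hypothetical stand-in for a browser page: a list of indexed elements."""
    elements: list = field(default_factory=lambda: ["input#search", "button#go"])
    log: list = field(default_factory=list)

    def observe(self) -> str:
        # The text snapshot the model would see: one numbered line per element.
        return "\n".join(f"[{i}] {el}" for i, el in enumerate(self.elements))

    def execute(self, action: Action) -> None:
        # Record the action instead of driving a real browser.
        self.log.append(f"{action.name}:{self.elements[action.index]}:{action.text}")


def scripted_policy(observation: str, step: int) -> Action:
    """Hypothetical stand-in for the LLM call: replays a fixed plan."""
    plan = [
        Action("type", 0, "stuck deals"),
        Action("click", 1),
        Action("done", text="searched for stuck deals"),
    ]
    return plan[min(step, len(plan) - 1)]


def run_agent(page, policy, max_steps=25):
    """Observe, pick an action, execute, repeat, until done or out of steps."""
    for step in range(max_steps):
        observation = page.observe()
        action = policy(observation, step)
        if action.name == "done":
            return action.text
        page.execute(action)
    return None  # step budget exhausted without finishing


page = FakePage()
result = run_agent(page, scripted_policy)
print(result)    # searched for stuck deals
print(page.log)  # ['type:input#search:stuck deals', 'click:button#go:']
```

Swap `scripted_policy` for a real model call and `FakePage` for a real page and you have the shape of the thing. The step cap matters: without it, a confused agent loops until your API bill notices.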
{`pip install browser-use langchain-anthropic
playwright install chromium`}
The Money Pattern
One Agent, one task string, one run call. The agent navigates, clicks, types, scrolls, and reports back. I use it for filling out compliance forms and scraping Google Ads UI bits the API doesn't expose.
{`import asyncio

from browser_use import Agent, Browser, BrowserConfig
from langchain_anthropic import ChatAnthropic


async def main():
    # Headed mode so you can watch what the agent does during development.
    browser = Browser(config=BrowserConfig(headless=False))
    agent = Agent(
        task=(
            "Open Pipedrive, find deals stuck in 'Awaiting Inspection' "
            "for more than 14 days, and export the list as CSV."
        ),
        llm=ChatAnthropic(model="claude-opus-4-5"),
        browser=browser,
    )
    try:
        # The step budget is an argument to run().
        result = await agent.run(max_steps=25)
        print(result.final_result())
    finally:
        # Close the browser even if the run raises.
        await browser.close()


asyncio.run(main())`}
The Catch
SPAs still trip it up. Anything with aggressive virtualised lists or weird custom focus management will confuse the DOM extractor. Captchas defeat it instantly, as they should — if your agent solves captchas you have bigger problems. And running it without headless=False in dev is a great way to misdiagnose a flaky selector for an hour.
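Why virtualised lists hurt: the list only keeps the rows near the viewport in the DOM, so any extractor that reads the DOM sees a small window, not the whole table. The toy model below shows the arithmetic. It is a simulation under assumed numbers (600 px viewport, 30 px rows, 2 rows of overscan), and `rendered_window` and `collect_all_rows` are hypothetical helpers, not part of browser-use or any list library.

```python
def rendered_window(total_rows, scroll_top, viewport_px=600, row_px=30, overscan=2):
    """Return the (first, last) row range a virtualised list keeps in the DOM."""
    first = max(0, scroll_top // row_px - overscan)
    last = min(total_rows, (scroll_top + viewport_px) // row_px + overscan)
    return first, last


def collect_all_rows(total_rows, viewport_px=600, row_px=30):
    """Scroll one viewport at a time, accumulating every row index seen."""
    seen = set()
    scroll_top = 0
    max_scroll = max(0, total_rows * row_px - viewport_px)
    while True:
        first, last = rendered_window(total_rows, scroll_top, viewport_px, row_px)
        seen.update(range(first, last))
        if scroll_top >= max_scroll:
            break
        scroll_top = min(max_scroll, scroll_top + viewport_px)
    return seen


first, last = rendered_window(total_rows=1000, scroll_top=0)
print(last - first)                 # 22 rows in the DOM out of 1000
print(len(collect_all_rows(1000)))  # 1000, but only after scrolling the whole list
```

So a task like "export every row" silently turns into "export the rows on screen" unless the agent scrolls and accumulates as it goes, which burns steps. If the app has a real export button or an API, point the agent at that instead.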
The Verdict
browser-use is the project that quietly made commercial browser-agent startups look overpriced. It's MIT licensed, the maintainers ship weekly, and the code is small enough to fork if you need to. I'm wiring it into a Rebuild Relief operations dashboard so the team can ask "open these three claims in Pipedrive and tag them" in plain English. This is what the agent future actually looks like — small, open, composable. Do not @ me when this kills another funded startup.