Browser-Use Is The Open Source Agent We Needed

Every six months a startup raises $50 million to make an "AI that uses the browser." Most of them are slow, gated, and hilariously expensive. Meanwhile, a small Python library called browser-use does the same job, runs locally, and costs whatever your model API costs. Spoiler: for the work I throw at it, it's better than the commercial options.

The Setup

browser-use is a Playwright wrapper with a smart DOM extractor and a tight agent loop. The model sees an accessibility-tree representation of the page, picks an action, executes it, and repeats until the task is done. No bespoke screen-pixel models, no proprietary backend.

pip install browser-use
playwright install chromium
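
To make the observe, pick, execute, repeat loop concrete, here is a toy sketch of its shape. This is not browser-use's actual code: FakePage, Action, run_agent, and scripted_policy are all hypothetical names invented for illustration, and the scripted policy stands in for the LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Action:
    name: str                     # "click", "type", or "done"
    target: Optional[int] = None  # index of the element to act on
    text: str = ""                # text to enter for "type" actions


@dataclass
class FakePage:
    """Hypothetical stand-in for a browser page: labelled elements plus an action log."""
    elements: List[str]
    log: List[str] = field(default_factory=list)

    def observe(self) -> str:
        # Text snapshot the model would see: one indexed line per element.
        return "\n".join(f"[{i}] {label}" for i, label in enumerate(self.elements))

    def execute(self, action: Action) -> None:
        # Record the action instead of driving a real browser.
        label = self.elements[action.target]
        suffix = f"={action.text}" if action.text else ""
        self.log.append(f"{action.name}:{label}{suffix}")


def run_agent(
    page: FakePage,
    choose_action: Callable[[str, List[str]], Action],
    max_steps: int = 10,
) -> bool:
    """Observe, pick an action, execute, repeat. Returns True if the policy said 'done'."""
    for _ in range(max_steps):
        snapshot = page.observe()
        action = choose_action(snapshot, page.log)
        if action.name == "done":
            return True
        page.execute(action)
    return False  # step budget exhausted before the policy finished


def scripted_policy(snapshot: str, history: List[str]) -> Action:
    # Hypothetical stand-in for the LLM: type into element 0, click element 1, stop.
    if not history:
        return Action("type", target=0, text="inspection")
    if len(history) == 1:
        return Action("click", target=1)
    return Action("done")


page = FakePage(["search box", "export button"])
finished = run_agent(page, scripted_policy)
print(finished, page.log)
```

The step budget is the important design choice: a model that never says "done" stops when max_steps runs out instead of looping forever.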

The Money Pattern

One Agent, one task string, one run call. The agent navigates, clicks, types, scrolls, and reports back. I use it for filling out compliance forms and scraping the parts of the Google Ads UI that the API doesn't expose.

import asyncio

from browser_use import Agent, Browser, BrowserConfig
from langchain_anthropic import ChatAnthropic


async def main():
    # Headed mode, so you can watch what the agent does while developing.
    browser = Browser(config=BrowserConfig(headless=False))
    agent = Agent(
        task=(
            "Open Pipedrive, find deals stuck in 'Awaiting Inspection' "
            "for more than 14 days, and export the list as CSV."
        ),
        llm=ChatAnthropic(model="claude-opus-4-5"),
        browser=browser,
    )
    try:
        # The step budget is an argument to run(), not to the Agent constructor.
        result = await agent.run(max_steps=25)
        print(result.final_result())
    finally:
        # Close the browser even if the run raises.
        await browser.close()


asyncio.run(main())

The Catch

SPAs still trip it up. Anything with aggressively virtualised lists or weird custom focus management will confuse the DOM extractor. Captchas defeat it instantly, as they should: if your agent solves captchas, you have bigger problems. And running it headless in dev (that is, without headless=False) is a great way to spend an hour misdiagnosing a flaky selector.
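
One way to stay out of the headless-in-dev trap is to make headed mode the default and opt into headless explicitly. A minimal sketch, assuming an environment variable named BROWSER_HEADLESS (a hypothetical name, not something browser-use reads itself) and a helper of my own invention:

```python
import os


def headless_from_env(var: str = "BROWSER_HEADLESS", default: bool = False) -> bool:
    """Return True only when the (hypothetical) env var is set to a truthy value."""
    raw = os.environ.get(var)
    if raw is None:
        # Unset means headed mode, so local runs show the browser window.
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


# Intended use with the config from the example above (left commented out so
# this snippet runs without browser-use installed):
# browser = Browser(config=BrowserConfig(headless=headless_from_env()))
print(headless_from_env())
```

Set BROWSER_HEADLESS=1 in CI or on a server, and leave it unset on your own machine.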

The Verdict

browser-use is the project that quietly made commercial browser-agent startups look overpriced. It's MIT licensed, the maintainers ship weekly, and the code is small enough to fork if you need to. I'm wiring it into a Rebuild Relief operations dashboard so the team can ask "open these three claims in Pipedrive and tag them" in plain English. This is what the agent future actually looks like — small, open, composable. Do not @ me when this kills another funded startup.
