The current approach uses a client-server architecture: the extension runs on the client side and the API acts as the server. The system has a single agent, which resides on the server side. On each step, the agent analyses the curated DOM structure and a screenshot of the web UI, along with the user’s query, to select its next action.
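
As a rough illustration of one agent step, the sketch below bundles the three inputs named above (DOM, screenshot, query) into a single request and returns one action. All names here (`AgentRequest`, `ToolCall`, `agent_step`, `stub_choose`, and the tool names `click` and `done`) are hypothetical and are not taken from the actual codebase; the model call is replaced by a stub so the example runs on its own.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class AgentRequest:
    """What the extension might send per step (field names are assumptions)."""
    query: str           # the user's natural-language request
    dom: str             # curated DOM structure, serialized as text
    screenshot_b64: str  # screenshot of the web UI, base64-encoded


@dataclass
class ToolCall:
    """The agent's chosen next action, expressed as a tool invocation."""
    name: str
    arguments: dict[str, Any] = field(default_factory=dict)


def agent_step(request: AgentRequest,
               choose: Callable[[AgentRequest], ToolCall]) -> ToolCall:
    """Run one decision step: pass all three inputs to the chooser, return its action."""
    return choose(request)


def stub_choose(request: AgentRequest) -> ToolCall:
    """Stand-in for the model: click if the DOM has a button, otherwise finish."""
    if "<button" in request.dom:
        return ToolCall("click", {"selector": "button"})
    return ToolCall("done")


if __name__ == "__main__":
    req = AgentRequest(
        query="Submit the form",
        dom="<form><button>Go</button></form>",
        screenshot_b64="",  # left empty in this sketch
    )
    print(agent_step(req, stub_choose))
```

Passing the chooser in as a parameter keeps the step logic separate from whichever model sits behind it, which also makes the step easy to test with a stub.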

Tools:

The agent responds only with tool calls. It has 19 tools available; refer to the document below for details.

🛠️ Browser Automation Tools: Implementation Guide
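
Because the agent replies only with tool calls, the server can reject any reply that does not name a known tool. The sketch below shows one way this check might look, assuming replies arrive as JSON with `tool` and `arguments` keys. The reply format, the function name `parse_tool_response`, and the three tool names are placeholders for illustration; the real set of 19 tools is defined in the guide above.

```python
import json

# Hypothetical placeholder names; the real system defines 19 tools.
KNOWN_TOOLS = {"click", "type_text", "navigate"}


def parse_tool_response(raw: str) -> dict:
    """Parse an agent reply and reject anything that is not a known tool call."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError("agent reply is not valid JSON") from exc

    if not isinstance(payload, dict):
        raise ValueError("agent reply must be a JSON object")

    name = payload.get("tool")
    if name not in KNOWN_TOOLS:
        raise ValueError(f"unknown tool: {name!r}")

    arguments = payload.get("arguments", {})
    if not isinstance(arguments, dict):
        raise ValueError("tool arguments must be a JSON object")

    return {"tool": name, "arguments": arguments}


if __name__ == "__main__":
    reply = '{"tool": "click", "arguments": {"selector": "#submit"}}'
    print(parse_tool_response(reply))
```

Validating at this boundary means a free-text or malformed reply fails on the server instead of reaching the extension as an unexecutable action.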

Working:

System Architecture

The browser automation system uses a client-server architecture that connects natural language understanding to browser control:

Client Side (Chrome Extension):

Server Side (FastAPI Backend):

Core Working Principles