Browser Integration

Antigravity Track

Module 39

The Agent Tests Itself: ThreadCo's developer adds a new checkout form field. Instead of opening a browser and manually testing it, she tells the agent: "Test the gift wrapping field end-to-end in the browser." The agent opens Chrome, navigates to localhost:3000, fills the form, submits it, reads the response, and reports back — with a screenshot proving it worked.

Antigravity's Chrome extension lets agents autonomously interact with web applications — navigating, clicking, filling forms, reading DOM state, and taking screenshots — without you touching the browser. This module covers installation, capabilities, automation patterns, visual testing, web scraping, and screenshot analysis in depth.

Why Browser Integration Matters

In traditional development, the write-test-verify cycle for UI changes looks like this: write code in the editor, switch to the browser, refresh the page, manually test the change, switch back to the editor, fix issues, repeat. This context switching is slow, error-prone, and tedious.

Antigravity's browser agent collapses this entire cycle into a single autonomous operation. The agent writes code, starts the dev server (if needed), opens Chrome, navigates to the correct page, interacts with the UI, reads the result, and captures a screenshot — all without your involvement. The verification loop that normally requires manual effort becomes part of the agent's automated workflow.

Not Just for Testing

While browser verification is the primary use case, the browser agent is also valuable for web scraping (reading documentation, pulling reference data), visual regression testing (comparing screenshots before and after changes), and debugging UI issues (reading console errors, inspecting network requests). It is a general-purpose browser automation tool controlled by natural language.

Setup and Installation

Install Once, Use Always

Install the Antigravity Chrome extension from the Chrome Web Store. It connects automatically to any running Antigravity session. No per-project configuration needed — it works across all your projects.

The extension requires permissions to read and interact with web pages. This is necessary for form filling, clicking, DOM reading, and screenshot capture. The extension communicates with Antigravity over a local WebSocket connection — no data leaves your machine unless the agent navigates to an external URL.

Connection Verification

After installation, verify the connection: in Antigravity, go to Settings → Browser. You should see "Chrome extension connected" with a green indicator. If the indicator is red, restart Chrome, then restart Antigravity. The extension auto-connects when both are running on the same machine.

Chrome Profile Isolation

The browser agent operates in your default Chrome profile. If you want isolation (recommended for testing), create a dedicated Chrome profile: Chrome → Profile icon → Add Profile → "Antigravity Testing." Install the extension in that profile. This prevents the agent from accessing your logged-in sessions, bookmarks, or saved passwords.

Headless vs Visible Mode

By default, the browser agent runs in visible mode, so you can watch Chrome as the agent interacts with it. To speed up execution, enable headless mode in Settings → Browser → "Run browser agent headless." In headless mode you cannot watch the agent work in real time, but screenshots are still captured.

How the Browser Sub-Agent Works

The browser agent is a specialised sub-agent within Antigravity's architecture. Here is how it operates:

Task includes browser instruction

When your task description includes browser-related instructions (e.g., "test the form in the browser," "verify the page loads correctly," "take a screenshot of the checkout page"), Antigravity automatically activates the browser sub-agent. You do not need to explicitly request it — the agent detects browser intent from natural language.

Chrome session initialises

The browser sub-agent opens a Chrome tab (or reuses an existing one) via the Antigravity extension. It navigates to the specified URL — typically localhost:3000 or whatever your dev server runs on. If your dev server is not running, the agent can start it via a terminal command first.

DOM analysis

Before interacting with the page, the agent reads the full DOM structure. It identifies interactive elements (buttons, inputs, links, checkboxes), their labels, their states (enabled/disabled, checked/unchecked), and their positions. This analysis is what allows the agent to "see" the page and understand what to click or fill.
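To make the idea of DOM analysis concrete, here is a minimal sketch using only Python's standard library. It is not Antigravity's implementation: the class name, the set of tags treated as interactive, and the sample markup are all assumptions chosen for illustration.

```python
from html.parser import HTMLParser

# Assumption: these tags are what we treat as "interactive" in this sketch.
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}


class InteractiveElementCollector(HTMLParser):
    """Collects interactive elements and a few of their state attributes."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        if tag not in INTERACTIVE_TAGS:
            return
        attr_map = dict(attrs)
        self.elements.append({
            "tag": tag,
            "type": attr_map.get("type"),
            "id": attr_map.get("id"),
            "label": attr_map.get("aria-label"),
            # Boolean attributes such as `disabled` arrive with value None,
            # so test for presence rather than truthiness.
            "disabled": "disabled" in attr_map,
            "checked": "checked" in attr_map,
        })


def find_interactive_elements(html):
    """Return one dict per interactive element found in the HTML string."""
    collector = InteractiveElementCollector()
    collector.feed(html)
    return collector.elements


# Hypothetical checkout markup, used only to exercise the sketch.
SAMPLE = """
<form>
  <input type="checkbox" id="gift-wrap" aria-label="Add gift wrapping">
  <input type="text" id="gift-message" disabled>
  <button id="submit-order">Submit order</button>
</form>
"""

for element in find_interactive_elements(SAMPLE):
    print(element)
```

A real page analysis would also need rendered positions and computed styles, which static HTML parsing cannot provide; the sketch covers only the inventory of elements and their declared states.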

Action execution

The agent executes actions sequentially: navigate, click, type, select, scroll. After each action, it re-reads the DOM to verify the expected result. If an action fails (element not found, page error), the agent logs the failure and either retries or reports it. Each action is logged in the browser agent's timeline.
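The execute, verify, retry loop described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the agent's actual code: the `Action` type, the `execute_actions` function, and the retry limit of two attempts are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    name: str
    run: Callable[[], None]     # performs the step (click, type, ...)
    verify: Callable[[], bool]  # re-reads state and checks the expected result


def execute_actions(actions, max_attempts=2):
    """Run actions in order and return a timeline of (name, status) pairs."""
    timeline = []
    for action in actions:
        status = "failed"
        for _ in range(max_attempts):
            try:
                action.run()
                if action.verify():
                    status = "ok"
                    break
            except Exception:
                # Treat an exception like a failed attempt and retry.
                pass
        timeline.append((action.name, status))
        if status == "failed":
            break  # later steps depend on this one, so stop and report
    return timeline


# A dict stands in for the page state in this sketch.
page = {"gift_wrap": False}


def tick_gift_wrap():
    page["gift_wrap"] = True


timeline = execute_actions([
    Action("tick gift wrap", tick_gift_wrap, lambda: page["gift_wrap"]),
    Action("find missing button", lambda: None, lambda: False),
    Action("submit order", lambda: None, lambda: True),
])
print(timeline)
```

Stopping at the first failed step is a design choice in this sketch: in a checkout flow, submitting an order after an earlier step failed would produce a misleading result.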

Screenshot capture

After every significant action (form submission, page navigation, error state), the agent captures a full-page screenshot. These screenshots are saved as artifacts in the Manager Surface. You can review them to verify the agent saw what you expected.

Result reporting

The browser agent reports its findings back to the parent agent (or directly to you). The report includes: actions taken, DOM state observed, any errors found, and screenshots. If the browser test was part of a larger code task, the parent agent can use this feedback to iterate — for example, fixing a CSS issue discovered during browser testing.
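As a rough picture of what such a report carries, the sketch below models the four parts listed above as a Python dataclass. The `BrowserReport` name, its fields, and the summary format are hypothetical; the real report format is not specified in this module.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BrowserReport:
    """Hypothetical container for what a browser run reports back."""
    actions: List[str] = field(default_factory=list)
    dom_observations: List[str] = field(default_factory=list)
    errors: List[str] = field(default_factory=list)
    screenshots: List[str] = field(default_factory=list)

    @property
    def passed(self):
        # In this sketch a run passes only if no errors were recorded.
        return not self.errors

    def summary(self):
        verdict = "PASS" if self.passed else "FAIL"
        return (
            f"{verdict}: {len(self.actions)} actions, "
            f"{len(self.errors)} errors, {len(self.screenshots)} screenshots"
        )


report = BrowserReport(
    actions=["navigate to /checkout", "submit order"],
    dom_observations=["order total reads £32.49"],
    screenshots=["gift_wrap_test_01.png"],
)
print(report.summary())
```

A parent agent could branch on `passed` to decide whether to iterate on the code or hand the result back to you.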

Chromium Automation Capabilities

Full DOM Access

Agents can read any element on the page — text content, input values, computed CSS styles, element attributes, ARIA labels, and data attributes. They use standard CSS selectors and XPath to locate elements. The agent can also read the full HTML source of any element for inspection.

Form Interaction

Agents fill text inputs, select dropdown options, tick/untick checkboxes, toggle radio buttons, click buttons, and handle file uploads. They can complete multi-step forms end-to-end: login forms, multi-page wizards, checkout flows. For file uploads, the agent can generate test files on the fly.

Network Monitoring

The browser agent can monitor network requests and responses. It detects 4xx and 5xx errors, reads response bodies, checks request headers, and measures response times. This is invaluable for API integration testing — the agent can fill a form, submit it, and verify the correct API request was made with the expected payload.
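The kind of triage this enables can be sketched as follows. The `NetworkEntry` record, the `find_problems` function, and the 1000 ms "slow" threshold are assumptions made for the example; they are not part of any documented Antigravity interface.

```python
from dataclasses import dataclass


@dataclass
class NetworkEntry:
    method: str
    url: str
    status: int
    duration_ms: float


def find_problems(entries, slow_ms=1000.0):
    """Return (url, reason) pairs for failed or slow requests."""
    problems = []
    for entry in entries:
        if 400 <= entry.status < 500:
            problems.append((entry.url, f"client error {entry.status}"))
        elif 500 <= entry.status < 600:
            problems.append((entry.url, f"server error {entry.status}"))
        elif entry.duration_ms > slow_ms:
            problems.append((entry.url, f"slow response {entry.duration_ms:.0f} ms"))
    return problems


# Hypothetical requests captured while submitting the checkout form.
captured = [
    NetworkEntry("GET", "/api/cart", 200, 85.0),
    NetworkEntry("POST", "/api/promo/validate", 404, 40.0),
    NetworkEntry("POST", "/api/orders", 500, 120.0),
    NetworkEntry("GET", "/api/recommendations", 200, 2300.0),
]

for url, reason in find_problems(captured):
    print(f"{url}: {reason}")
```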

Console Log Reading

The agent reads the browser console output: JavaScript errors, warnings, log statements. If your code throws a runtime error, the agent sees it. This is especially useful for detecting issues that only appear at runtime — type errors, undefined variables, failed API calls — that static analysis would miss.

Agent Testing the Checkout Form

Antigravity — Browser Agent — Chrome

Browser Agent Log
✓ Navigate to localhost:3000
✓ Click "Add to cart"
✓ Click "Checkout"
✓ Fill shipping form
→ Find gift wrap checkbox
○ Tick checkbox
○ Type gift message
○ Submit order
○ Verify £2.50 added to total
○ Take screenshot
Browser preview — localhost:3000/checkout
Order Summary
Sunset Gradient Tee × 1: £29.99
Add gift wrapping (+£2.50)
Gift message: "Happy Birthday! 🎂"
Total: £32.49

          ✓ Screenshot captured — gift_wrap_test_01.png saved as artifact
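The "Verify £2.50 added to total" step in the log above comes down to arithmetic on the displayed prices: £29.99 + £2.50 should equal £32.49. A minimal sketch of that check is shown below, using `Decimal` so that currency values are not subject to binary floating-point rounding. The helper names are hypothetical.

```python
from decimal import Decimal


def parse_price(text):
    """Turn a displayed price such as '£29.99' into a Decimal."""
    return Decimal(text.replace("£", "").replace(",", "").strip())


def gift_wrap_applied(subtotal_text, fee_text, total_text):
    """Check that subtotal plus gift wrap fee equals the displayed total."""
    return parse_price(subtotal_text) + parse_price(fee_text) == parse_price(total_text)


print(gift_wrap_applied("£29.99", "£2.50", "£32.49"))  # prints True
```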
        

What Browser Agents Can Do

Action	Example	Detail
Navigate	"Go to the product page for SKU #4821"	Supports absolute URLs, relative paths, and query parameters. Agent waits for page load before proceeding.
Click	"Click the Add to Cart button"	Locates elements by text content, CSS class, ID, ARIA label, or XPath. Waits for element to be visible and clickable.
Fill forms	"Fill the shipping form with test data"	Generates realistic test data (names, addresses, emails) or uses specific values you provide. Handles text inputs, textareas, and contenteditable elements.
Select dropdowns	"Select 'Express Shipping' from the dropdown"	Works with native HTML select elements and custom dropdown components. Searches by visible text or value attribute.
Read DOM	"What is the total shown in the order summary?"	Reads text content, input values, computed styles, and attributes from any element on the page.
Verify state	"Confirm the gift wrap checkbox is checked and £2.50 added"	Asserts element states (checked, disabled, visible, hidden) and text content. Reports pass/fail with details.
Screenshot	Automatic after every significant step	Full-page screenshots saved as PNG artifacts. Named sequentially: step_01_navigate.png, step_02_fill_form.png, etc.
Multi-step flows	"Complete a full checkout as a guest user"	Chains multiple actions into a single flow: navigate → fill → click → verify → screenshot. Agent handles page transitions and loading states.
Error detection	"Report any console errors or network 4xx/5xx responses"	Monitors browser console and network tab. Reports JavaScript errors, failed API calls, and slow responses.
Wait for element	"Wait for the loading spinner to disappear"	Agent waits for elements to appear, disappear, or change state. Configurable timeout (default 30 seconds).
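The "Wait for element" row describes a polling pattern with a timeout. The sketch below shows one common way to implement it. The `wait_for` function and its polling interval are assumptions; only the 30-second default comes from the table above.

```python
import time


def wait_for(condition, timeout=30.0, interval=0.1):
    """Poll `condition` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Simulate a loading spinner that disappears after roughly 0.2 seconds.
start = time.monotonic()


def spinner_gone():
    return time.monotonic() - start > 0.2


print(wait_for(spinner_gone, timeout=2.0))
```

Using `time.monotonic()` rather than wall-clock time keeps the timeout correct even if the system clock is adjusted during the wait.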

Visual Testing Patterns

The browser agent enables several visual testing patterns that are difficult or impossible with unit tests alone:

Before/After Screenshots

Before making a CSS or layout change, dispatch an agent to take a screenshot of the current state. Then make the change (or have another agent make it). Dispatch a second browser agent to take an "after" screenshot. Compare the two screenshots in the Artifacts panel to verify the visual change is correct.
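If you want a deterministic check to back up the visual comparison, a simple pixel diff can point to where two screenshots differ. The sketch below is not how the agent compares images. It assumes both screenshots have already been decoded into equally sized grids of RGB tuples, and the `changed_region` function is hypothetical.

```python
def changed_region(before, after):
    """Return (left, top, right, bottom) of the differing pixels, or None."""
    same_shape = len(before) == len(after) and all(
        len(row_b) == len(row_a) for row_b, row_a in zip(before, after)
    )
    if not same_shape:
        raise ValueError("screenshots must have the same dimensions")

    xs, ys = [], []
    for y, (row_b, row_a) in enumerate(zip(before, after)):
        for x, (pixel_b, pixel_a) in enumerate(zip(row_b, row_a)):
            if pixel_b != pixel_a:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None  # the two images are identical
    return (min(xs), min(ys), max(xs), max(ys))


WHITE, GREY = (255, 255, 255), (240, 240, 240)

# Two tiny 4x3 "screenshots": a 2x2 block turns grey in the second one.
before = [[WHITE] * 4 for _ in range(3)]
after = [row[:] for row in before]
for y in (1, 2):
    for x in (2, 3):
        after[y][x] = GREY

print(changed_region(before, after))  # prints (2, 1, 3, 2)
```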

Responsive Testing

Instruct the browser agent to resize the viewport and take screenshots at key breakpoints: "Take screenshots of the checkout page at 1440px, 1024px, 768px, and 375px widths." The agent resizes the Chrome window and captures each state. Review all four screenshots to catch responsive layout issues.
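When reviewing a batch of responsive screenshots, predictable filenames help. The sketch below builds one filename per breakpoint from the four widths named above. The naming scheme and function names are assumptions, not Antigravity's convention.

```python
import re

# The four widths from the example prompt above.
BREAKPOINTS = (1440, 1024, 768, 375)


def screenshot_name(page_path, width):
    """Build a filename such as 'checkout_375px.png' from a page path."""
    slug = re.sub(r"[^a-z0-9]+", "-", page_path.lower()).strip("-") or "home"
    return f"{slug}_{width}px.png"


def screenshot_plan(page_path, widths=BREAKPOINTS):
    """Return (width, filename) pairs, one per breakpoint."""
    return [(width, screenshot_name(page_path, width)) for width in widths]


for width, filename in screenshot_plan("/checkout"):
    print(width, filename)
```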

Error State Verification

Test error handling visually: "Navigate to the checkout page, leave the email field empty, and click Submit. Take a screenshot showing the validation error." The agent triggers the error state and captures it. You verify that error messages appear correctly and are user-friendly.

Full User Journey

Test an entire user flow end-to-end: "As a new user, sign up, add a product to the cart, go to checkout, fill the form, submit the order, and verify the confirmation page." The agent walks through every step, capturing screenshots at each stage. The result is a complete visual record of the user journey.

Web Scraping with Browser Agents

The browser agent can also scrape web content — useful for pulling reference data, reading documentation, or comparing your app against a design spec:

Use Case	Task Description Example	Output
Read API docs	"Navigate to docs.stripe.com/api/charges and extract the list of required fields for creating a charge"	Structured list of field names and types
Compare with design	"Navigate to our Figma embed at [URL] and compare the button styles with what we have on localhost:3000/checkout"	Comparison report with screenshots
Pull reference data	"Navigate to [competitor URL] and list all the product categories they show in their navigation"	Text list of categories
Check link health	"Navigate to every link in the footer on localhost:3000 and report any that return 404"	List of broken links with status codes
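The link health row can be illustrated with a small standard-library sketch: collect the links inside the footer, resolve them against the base URL, and report those that return 404. The function and class names are hypothetical, and the status lookup is passed in as a callable so the example runs without a live server.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FooterLinkCollector(HTMLParser):
    """Collects href values of <a> tags that appear inside <footer>."""

    def __init__(self):
        super().__init__()
        self.in_footer = False
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "footer":
            self.in_footer = True
        elif tag == "a" and self.in_footer:
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "footer":
            self.in_footer = False


def broken_footer_links(html, base_url, get_status):
    """Return (url, status) for every footer link whose status is 404."""
    collector = FooterLinkCollector()
    collector.feed(html)
    broken = []
    for href in collector.links:
        url = urljoin(base_url, href)
        status = get_status(url)
        if status == 404:
            broken.append((url, status))
    return broken


# Hypothetical page and a stubbed status lookup standing in for real requests.
PAGE = """
<nav><a href="/shop">Shop</a></nav>
<footer><a href="/about">About</a><a href="/old-page">Old page</a></footer>
"""
STATUSES = {
    "http://localhost:3000/about": 200,
    "http://localhost:3000/old-page": 404,
}

print(broken_footer_links(PAGE, "http://localhost:3000/", STATUSES.get))
```

In real use, `get_status` would issue an HTTP request for each URL; injecting it keeps the link-extraction logic testable on its own.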

Screenshot Analysis

Antigravity does not just capture screenshots — the agent can analyse them. Because the underlying AI model is multimodal (it can "see" images), the agent can reason about visual content:

Visual verification

The agent captures a screenshot and analyses it: "Does the checkout page show the gift wrap option below the shipping form?" It reads the visual layout and confirms or denies. This catches rendering issues that reading the DOM tree alone can miss: an element might exist in the DOM yet be invisible to the user because it is hidden with display: none or opacity: 0, covered by another element, or pushed off-screen.

Layout comparison

Given two screenshots (before and after a change), the agent can describe the visual differences: "The button moved from the right side to the centre. The font size of the heading increased. The background colour changed from white to light grey." This is useful for reviewing visual changes without opening the browser yourself.

Accessibility observations

The agent can flag potential accessibility issues from screenshots: "The grey text on the light background appears to have low contrast. The button text is very small." While not a substitute for proper accessibility testing tools, this catches obvious issues early.
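A low-contrast observation from a screenshot can be confirmed numerically with the WCAG contrast ratio formula. The sketch below implements that formula for two sRGB colours. The sample colours are assumptions chosen for illustration; the 4.5:1 threshold is the WCAG AA minimum for normal-size text.

```python
def _linear(channel):
    """Convert an 8-bit sRGB channel to its linear value (WCAG definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    r, g, b = (_linear(value) for value in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background):
    """Return the WCAG contrast ratio, from 1.0 (none) to 21.0 (maximum)."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


AA_NORMAL_TEXT = 4.5  # WCAG AA minimum for normal-size text

grey_text = (153, 153, 153)   # hypothetical light grey text, #999999
white_page = (255, 255, 255)

ratio = contrast_ratio(grey_text, white_page)
print(f"{ratio:.2f}", "passes AA" if ratio >= AA_NORMAL_TEXT else "fails AA")
```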

Works with Any Web Framework

The browser agent works with any web app that runs in Chrome — React, Vue, Next.js, Angular, Svelte, plain HTML, Django templates, Rails views, PHP pages. It does not care about the stack. As long as it runs on a URL, the agent can interact with it. Single-page apps, server-rendered pages, and static sites all work equally well.

Security Considerations

Never Point Browser Agents at Production

Browser agents can click buttons, submit forms, and trigger real actions. Always point them at localhost or a staging environment. Never give a browser agent a production URL — it could accidentally submit orders, delete data, or mutate state. Create a dedicated Chrome profile with no saved passwords or sessions to prevent the agent from accessing your production accounts.

Risk	Mitigation
Agent submits real orders on production	Always use localhost or staging URLs. Never provide production URLs in task descriptions.
Agent accesses saved passwords/sessions	Use a dedicated Chrome profile with no saved credentials for Antigravity.
Agent navigates to malicious URLs	Review the agent's plan before dispatching. The agent only navigates to URLs you specify or that it finds in your codebase.
Screenshots contain sensitive data	Use test data, not real customer data. Screenshots are stored locally in the Antigravity workspace.
Extension permissions too broad	The extension only activates when Antigravity dispatches a browser task. It does not monitor your browsing activity.
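The "localhost or staging only" rule can also be enforced mechanically before a task is dispatched. The sketch below is one way to write such a guard. The `is_safe_target` function and the allowlist, including the `staging.example.test` hostname, are hypothetical and would need to match your own environments.

```python
from urllib.parse import urlparse

# Hypothetical allowlist: replace with your own local and staging hosts.
ALLOWED_HOSTS = {"localhost", "127.0.0.1", "staging.example.test"}


def is_safe_target(url, allowed_hosts=ALLOWED_HOSTS):
    """Return True only for http(s) URLs whose host is on the allowlist."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False  # rejects bare 'host:port' strings and other schemes
    return parsed.hostname in allowed_hosts


for candidate in (
    "http://localhost:3000/checkout",
    "https://staging.example.test/checkout",
    "https://shop.example.com/checkout",
):
    print(candidate, is_safe_target(candidate))
```

Matching on the exact hostname, rather than a substring, avoids accepting lookalike addresses such as a production domain that merely contains the word "localhost".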

Combining Code Changes with Browser Verification

The most powerful pattern is a single agent task that writes code AND verifies it in the browser:

Combined Code + Browser Task

# Task description
Add a "Promo Code" input field to the checkout form in
components/CheckoutForm.tsx. The field should:
- Accept a text input (max 20 chars, uppercase only)
- Show an "Apply" button next to it
- On click, call POST /api/promo/validate with the code
- Show "Valid! 10% off" in green or "Invalid code" in red
 
# After writing the code:
Start the dev server, navigate to localhost:3000/checkout,
enter promo code "SUMMER10", click Apply, and take a
screenshot showing the success message. Then try an invalid
code "INVALID" and take a screenshot showing the error.

This single task produces: code changes (the new form field), unit tests (if requested), and visual evidence (two screenshots proving it works). The agent's diff includes the code, and the artifacts include the screenshots. You review everything in one place.

Browser Agent Limitations

The browser agent is powerful but has clear boundaries. Understanding these prevents frustration:

Limitation	Detail	Workaround
Single browser only	Only Chrome is supported. No Firefox, Safari, or Edge.	Use Chrome for agent testing. Run cross-browser testing separately with tools like Playwright or BrowserStack.
No file download verification	The agent cannot verify that a file download completed correctly.	Write a unit test for the download endpoint instead of relying on browser verification.
Authentication complexity	OAuth flows, multi-factor auth, and CAPTCHA cannot be automated.	Pre-authenticate in the test Chrome profile, or use test accounts with simplified auth.
Canvas and WebGL	The agent cannot interact with canvas elements or WebGL content meaningfully.	Use screenshots for visual verification. Interactive canvas testing requires specialised tools.
Iframes and popups	Cross-origin iframes are not accessible. Popup windows have limited support.	Test iframe content separately. For popups, configure your app to use inline modals instead during testing.
Speed	Browser operations are slower than unit tests (seconds vs milliseconds).	Use browser testing selectively — for UI verification, not as a replacement for unit tests.

Browser Agent vs Traditional E2E Testing

How does Antigravity's browser agent compare to dedicated end-to-end testing tools like Playwright, Cypress, or Selenium?

Aspect	Antigravity Browser Agent	Playwright / Cypress
Setup	Install Chrome extension, describe test in natural language	Write test scripts in JavaScript/TypeScript, configure test runner
Maintenance	No scripts to maintain — describe the test each time	Scripts require updates when UI changes
Repeatability	Non-deterministic — agent may take slightly different paths	Fully deterministic — same script, same execution
CI/CD integration	Not suitable for CI pipelines (requires running IDE)	Designed for CI pipelines
Best for	Exploratory testing, quick verification during development	Regression testing, CI/CD gates, production monitoring
Cross-browser	Chrome only	Chrome, Firefox, Safari, Edge

Complementary, Not Competing

The browser agent is best for quick, exploratory verification during development — "does this change look right?" Traditional E2E tools (Playwright, Cypress) are best for repeatable regression tests in CI/CD. Use both: the browser agent during development, Playwright for your test suite. You can even dispatch an agent to write Playwright test scripts based on the browser agent's exploratory test results.

Hands-On Exercises

  Exercise 1 — Basic Browser Navigation: Install the Antigravity Chrome extension if you have not already. Dispatch a browser-only task: "Navigate to localhost:[your port], take a screenshot of the homepage, and list all navigation links visible on the page." Review the screenshot artifact and the agent's DOM analysis. Verify the list matches what you see manually.

  Exercise 2 — Form Testing: Choose a form in your application (login, signup, contact, checkout). Dispatch a browser agent with: "Navigate to [form URL], fill the form with realistic test data, submit it, and report: (a) did the form submit successfully? (b) were there any console errors? (c) what was the server response?" Compare the agent's report with your own manual test.

  Exercise 3 — Responsive Screenshot Suite: Dispatch a browser agent with: "Navigate to [your app URL] and take screenshots at these viewport widths: 1440px, 1024px, 768px, 375px. Save each screenshot with the width in the filename." Review the four screenshots for responsive layout issues. Note any breakpoints where the layout breaks.

  Exercise 4 — Code + Browser Combo: Write a combined task that makes a code change and verifies it in the browser. For example: "Add a character counter below the [textarea field] that shows 'X/500 characters'. Then navigate to the page, type 100 characters into the field, and take a screenshot showing the counter displays '100/500'." Review both the code diff and the screenshot.

  Exercise 5 — Error State Testing: Dispatch a browser agent to test error handling: "Navigate to [form URL]. Submit the form with all fields empty. Take a screenshot showing the validation errors. Then fill only the email field with an invalid format (e.g., 'not-an-email') and submit again. Take a screenshot showing the email validation error." Use the screenshots to evaluate whether your error messages are clear and helpful.
