
Overview

The Google agent performs web searches to discover social media profiles, company affiliations, news mentions, and personal websites. It uses a browser-use Agent to extract structured data from Google search results.

Implementation

backend/agents/google_agent.py
class GoogleAgent(BaseBrowserAgent):
    """Searches Google for person information via browser-use."""
    
    agent_name = "google"
    
    def __init__(self, settings: Settings, *, inbox_pool=None):
        super().__init__(settings, inbox_pool=inbox_pool)

Architecture Decision

backend/agents/google_agent.py
# RESEARCH: Checked googlesearch-python (1k stars), SerpAPI (paid), Google Custom Search API
# DECISION: Browser Use for Google — avoids API key costs, can extract rich snippets
# ALT: SerpAPI if we need scale (paid, $50/mo)
Why Browser Use?
  • No API key costs (SerpAPI is $50/month minimum)
  • Can extract rich snippets and knowledge panels
  • Real browser sessions trip Google’s anti-bot measures far less often than raw HTTP scraping
  • Fast for single queries (5-10s)
Why not SerpAPI?
  • Cost prohibitive for high-volume usage
  • Limited free tier (100 searches/month)
  • Browser Use gives richer context from visual layout

Implementation Details

Search Execution

backend/agents/google_agent.py
async def _run_task(self, request: ResearchRequest) -> AgentResult:
    if not self.configured:
        return AgentResult(
            agent_name=self.agent_name,
            status=AgentStatus.FAILED,
            error="Browser Use not configured (BROWSER_USE_API_KEY or OPENAI_API_KEY missing)",
        )
    
    query = self._build_search_query(request)
    logger.info("google agent searching: {}", query)
    
    try:
        task = (
            f"Go to https://www.google.com/search?q={query.replace(' ', '+')} "
            f"and use the extract tool to pull from the FIRST page only:\n"
            f"- Social media profile links (LinkedIn, Twitter/X, Instagram, GitHub)\n"
            f"- Company affiliations and job titles\n"
            f"- News articles and notable mentions\n"
            f"- Personal website or blog\n"
            f"Do NOT scroll. Do NOT click into results. "
            f"After extracting, immediately call done with the result."
        )
        
        agent = self._create_browser_agent(task, max_steps=3)
        result = await agent.run()
        final_result = result.final_result() if result else None
        
        if final_result:
            profiles: list[SocialProfile] = []
            output_str = str(final_result)
            
            # Record which platforms are mentioned (note: this stores the
            # bare domain as the URL, not the full profile path)
            platform_indicators = {
                "linkedin.com": "linkedin",
                "twitter.com": "twitter",
                "x.com": "twitter",
                "instagram.com": "instagram",
                "github.com": "github",
                "facebook.com": "facebook",
            }
            
            for indicator, platform in platform_indicators.items():
                if indicator in output_str.lower():
                    profiles.append(
                        SocialProfile(
                            platform=platform,
                            url=f"https://{indicator}",
                            display_name=request.person_name,
                        )
                    )
            
            return AgentResult(
                agent_name=self.agent_name,
                status=AgentStatus.SUCCESS,
                profiles=profiles,
                snippets=[output_str],
                urls_found=[p.url for p in profiles],
            )
        
        return AgentResult(
            agent_name=self.agent_name,
            status=AgentStatus.SUCCESS,
            snippets=["No Google results found"],
        )
    
    except Exception as exc:
        logger.error("google agent error: {}", str(exc))
        return AgentResult(
            agent_name=self.agent_name,
            status=AgentStatus.FAILED,
            error=f"Google agent error: {exc}",
        )

Search Strategy

Query Building

backend/agents/browser_agent.py
def _build_search_query(self, request: ResearchRequest) -> str:
    """Build a search query string from the request."""
    parts = [request.person_name]
    if request.company:
        parts.append(request.company)
    return " ".join(parts)
Examples:
  • Person only: "Elon Musk"
  • Person + company: "Satya Nadella Microsoft"
  • Person + context: "Tim Cook Apple CEO"

Extraction Focus

The agent is instructed to extract specific types of information:
  1. Social Media Links: LinkedIn, Twitter/X, Instagram, GitHub, Facebook
  2. Professional Info: Company affiliations, job titles
  3. Media Mentions: News articles, press releases
  4. Personal Sites: Blogs, portfolio sites, personal domains
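Since the implementation above only records that a platform's domain was mentioned, here is a hedged sketch of a stricter post-processing step: pulling full profile URLs out of the agent's free-text output with a regex. The pattern and helper are illustrative, not part of the shipped agent:

```python
import re

# Illustrative pattern: match complete profile URLs rather than
# checking for domain substrings
PROFILE_PATTERN = re.compile(
    r"https?://(?:www\.)?"
    r"(linkedin\.com/in/[\w\-]+"
    r"|(?:twitter|x)\.com/\w+"
    r"|instagram\.com/[\w.]+"
    r"|github\.com/[\w\-]+)",
    re.IGNORECASE,
)

def extract_profile_urls(output_str: str) -> list[str]:
    """Return deduplicated profile URLs in first-seen order."""
    seen: dict[str, None] = {}
    for match in PROFILE_PATTERN.finditer(output_str):
        seen.setdefault(match.group(0), None)
    return list(seen)

text = "Found https://www.linkedin.com/in/satyanadella and https://x.com/satyanadella"
print(extract_profile_urls(text))
# → ['https://www.linkedin.com/in/satyanadella', 'https://x.com/satyanadella']
```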

Speed Optimization

The agent is optimized for speed:
backend/agents/google_agent.py
agent = self._create_browser_agent(task, max_steps=3)
# Only 3 steps: navigate, extract, done
# No scrolling, no clicking into results
# Just surface-level extraction from Google's SERP

Extracted Data

The Google agent discovers:
  • Social Profiles: Platform-specific profile URLs
  • Company Affiliations: Current and past employers
  • Job Titles: Current and notable past positions
  • News Mentions: Articles featuring the person
  • Personal Websites: Blogs, portfolios, personal domains
  • Knowledge Panel: Google’s structured data (if available)

Usage Example

import asyncio

from agents.google_agent import GoogleAgent
from agents.models import ResearchRequest, AgentStatus
from config import Settings

async def main() -> None:
    settings = Settings()
    agent = GoogleAgent(settings)

    request = ResearchRequest(
        person_name="Mark Zuckerberg",
        company="Meta",
        timeout_seconds=30.0,
    )

    result = await agent.run(request)

    if result.status == AgentStatus.SUCCESS:
        print(f"Found {len(result.profiles)} social profiles:")
        for profile in result.profiles:
            print(f"  - {profile.platform}: {profile.url}")

        print("\nSearch Results:")
        for snippet in result.snippets:
            print(snippet[:200])

asyncio.run(main())

Performance

  • Duration: 5-10s typical
  • Cost: Browser Use API usage only
  • Success Rate: ~95% (Google almost always returns a results page)
  • Data Quality: High for discovery, medium for details
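To check the 5-10s figure in your own environment, a minimal timing wrapper can help. timed_run is a hypothetical helper, demonstrated here with a stub agent so the sketch runs without Browser Use configured:

```python
import asyncio
import time

async def timed_run(agent, request):
    """Return the agent result together with wall-clock duration in seconds."""
    start = time.perf_counter()
    result = await agent.run(request)
    return result, time.perf_counter() - start

class StubAgent:
    async def run(self, request):
        await asyncio.sleep(0.05)  # stand-in for a real 5-10s search
        return "stub-result"

result, elapsed = asyncio.run(timed_run(StubAgent(), None))
print(f"{result} in {elapsed:.2f}s")
```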

Integration with Other Agents

The Google agent serves as a discovery layer:
# Orchestrator uses Google to find profile URLs
# Then specialized agents extract detailed data

# 1. Google discovers LinkedIn URL
google_result = await google_agent.run(request)
# Result: "linkedin.com/in/satyanadella"

# 2. LinkedIn agent extracts full profile
linkedin_result = await linkedin_agent.run(request)
# Result: Full profile with experience, education, etc.

Advanced Query Patterns

# Search only LinkedIn
query = f"{person_name} site:linkedin.com"

# Search only Twitter
query = f"{person_name} site:twitter.com OR site:x.com"

# Search only news sites
query = f"{person_name} site:nytimes.com OR site:wsj.com OR site:reuters.com"

Excluding Domains

# Exclude Wikipedia and social media for cleaner results
query = f"{person_name} -site:wikipedia.org -site:facebook.com"

Time Filters

# Google's time filters via URL parameters
url = f"https://www.google.com/search?q={query}&tbs=qdr:y"  # Past year
url = f"https://www.google.com/search?q={query}&tbs=qdr:m"  # Past month
url = f"https://www.google.com/search?q={query}&tbs=qdr:w"  # Past week

Troubleshooting

No Results Found

# Google almost always returns something, so if you get empty results:
if result.status == AgentStatus.SUCCESS and not result.snippets:
    # This likely means browser-use timed out or failed to extract
    print("Browser agent failed to extract from Google")
    # Try increasing max_steps or timeout

Browser Use Not Configured

# Check Browser Use API key
from config import Settings
settings = Settings()
if not settings.browser_use_api_key and not settings.openai_api_key:
    print("Error: Set BROWSER_USE_API_KEY or OPENAI_API_KEY")

Extraction Quality

# If extraction quality is poor, you can:
# 1. Increase max_steps for more thorough extraction
agent = self._create_browser_agent(task, max_steps=5)

# 2. Add more specific instructions to the task
task = (
    f"Go to https://www.google.com/search?q={query} "
    f"and extract ONLY direct profile links (no wikipedia, no news). "
    f"Focus on: LinkedIn, Twitter, Instagram, GitHub, personal websites."
)

Rate Limiting

# Google rate-limits aggressive automated querying; Browser Use Cloud
# mitigates this, but if you still hit limits:
# 1. Add delays between requests
import asyncio
await asyncio.sleep(2)  # 2 second delay

# 2. Use Browser Use Cloud which rotates IPs
# (already configured if BROWSER_USE_API_KEY is set)
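If fixed delays aren't enough, a retry loop with exponential backoff and jitter is a common pattern. run_with_backoff is a hypothetical helper, not part of the agent, demonstrated with a stub that fails once before succeeding:

```python
import asyncio
import random

async def run_with_backoff(agent, request, retries: int = 3, base_delay: float = 1.0):
    """Retry agent.run with exponentially growing, jittered delays."""
    for attempt in range(retries):
        try:
            return await agent.run(request)
        except Exception:
            if attempt == retries - 1:
                raise  # out of attempts; surface the last error
            await asyncio.sleep(base_delay * 2 ** attempt + random.random() * base_delay)

class FlakyAgent:
    """Stub: fails on the first call, succeeds afterward."""
    def __init__(self):
        self.calls = 0
    async def run(self, request):
        self.calls += 1
        if self.calls == 1:
            raise RuntimeError("rate limited")
        return "ok"

print(asyncio.run(run_with_backoff(FlakyAgent(), None, base_delay=0.01)))
# → ok
```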

Best Practices

1. Use as Discovery Layer

# Google finds URLs, specialized agents extract details
google_result = await google_agent.run(request)
for profile in google_result.profiles:
    if profile.platform == "linkedin":
        linkedin_result = await linkedin_agent.run(request)
    elif profile.platform == "twitter":
        twitter_result = await twitter_agent.run(request)

2. Combine with Exa

# Google for breadth, Exa for depth
google_result = await google_agent.run(request)
exa_result = await exa_client.enrich_person(request)

# Merge results
all_urls = google_result.urls_found + exa_result.urls_found
unique_urls = list(set(all_urls))

3. Filter Noise

# Google returns many irrelevant results, filter them
SKIP_DOMAINS = {"wikipedia.org", "facebook.com", "youtube.com"}

filtered_urls = [
    url for url in result.urls_found
    if not any(skip in url for skip in SKIP_DOMAINS)
]

Comparison: Google vs Exa

Feature     Google Agent        Exa API
Speed       5-10s               1-3s
Cost        Browser Use usage   Free/paid tiers
Results     10-20 links         10 curated hits
Quality     Noisy               Pre-filtered
Use Case    Discovery           Deep search
Recommendation: Use both in parallel for comprehensive coverage.
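The parallel recommendation can be sketched with asyncio.gather; the agent and client names follow the examples on this page, demonstrated here with stubs so the sketch runs standalone. return_exceptions is an assumption about desired behavior: it keeps a failure in one call from cancelling the other:

```python
import asyncio

async def run_parallel(google_agent, exa_client, request):
    """Run Google discovery and Exa deep search concurrently."""
    return await asyncio.gather(
        google_agent.run(request),
        exa_client.enrich_person(request),
        return_exceptions=True,  # a failure in one doesn't sink the other
    )

class StubGoogle:
    async def run(self, request):
        return ["https://linkedin.com/in/example"]

class StubExa:
    async def enrich_person(self, request):
        return ["https://example.com/profile"]

google_res, exa_res = asyncio.run(run_parallel(StubGoogle(), StubExa(), None))
print(google_res, exa_res)
```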

Next Steps

LinkedIn Agent

Extract detailed LinkedIn profiles

Twitter Agent

Scrape Twitter/X profiles and tweets

Deep Researcher

Multi-phase pipeline using all agents

Agent Overview

Full agent system architecture
