This example combines the std/io and std/sys standard library modules into one practical program: a simple web scraper simulator that analyzes page content and generates an HTML report.

Complete program

import "std/io";
import "std/sys";

// Simulate fetching web page content
fn fetch_page : url {
    println(f"Fetching: {url}");
    // In a real scraper, this would fetch actual web content
    return "Sample page content with multiple words and links.";
}

// Count words in text
fn count_words : text {
    let words = text.split(" ");
    return len(words);
}

// Extract statistics from content
fn analyze_content : content {
    let word_count = count_words(content);
    let char_count = len(content);
    
    return {
        "words": word_count,
        "chars": char_count,
        "avg_word_len": char_count / word_count  // integer division, e.g. 50 / 8 yields 6
    };
}

// Generate HTML report
fn generate_report : url, stats {
    let html = "<html>\n";
    html += "<head><title>Scraper Report</title></head>\n";
    html += "<body>\n";
    html += f"<h1>Report for {url}</h1>\n";
    html += f"<p>Word count: {stats['words']}</p>\n";
    html += f"<p>Character count: {stats['chars']}</p>\n";
    html += f"<p>Average word length: {stats['avg_word_len']}</p>\n";
    html += "</body>\n";
    html += "</html>\n";
    return html;
}

// Main program
let args = sys.args();

if len(args) < 2 {
    println("Usage: walrus web_scraper.walrus <url>");
    sys.exit(1);
}

let url = args[1];
println("Starting web scraper...");
println(f"Working directory: {sys.cwd()}");

// Fetch and analyze
let content = fetch_page(url);
let stats = analyze_content(content);

// Generate report
let report = generate_report(url, stats);
let output_file = "/tmp/scraper_report.html";

io.write_file(output_file, report);
println(f"Report saved to: {output_file}");

// Display summary
println("\nSummary:");
println(f"  URL: {url}");
println(f"  Words: {stats['words']}");
println(f"  Characters: {stats['chars']}");
println(f"  Average word length: {stats['avg_word_len']}");

println("\nDone!");

How to run

Save the code to a file called web_scraper.walrus and run it with a URL argument:
walrus web_scraper.walrus https://example.com

Expected output

Starting web scraper...
Working directory: /home/user/projects
Fetching: https://example.com
Report saved to: /tmp/scraper_report.html

Summary:
  URL: https://example.com
  Words: 8
  Characters: 50
  Average word length: 6

Done!

Generated HTML report

The program generates an HTML file at /tmp/scraper_report.html:
<html>
<head><title>Scraper Report</title></head>
<body>
<h1>Report for https://example.com</h1>
<p>Word count: 8</p>
<p>Character count: 50</p>
<p>Average word length: 6</p>
</body>
</html>

Features demonstrated

Command-line arguments

let args = sys.args();
if len(args) < 2 {
    println("Usage: walrus web_scraper.walrus <url>");
    sys.exit(1);
}
let url = args[1];

Environment information

println(f"Working directory: {sys.cwd()}");

File writing

io.write_file(output_file, report);

Data structures

return {
    "words": word_count,
    "chars": char_count,
    "avg_word_len": char_count / word_count  // integer division, e.g. 50 / 8 yields 6
};

Key concepts

  • Module imports: Combining std/io and std/sys modules
  • Command-line arguments: Using sys.args() to get user input
  • Argument validation: Checking the argument count and exiting with a non-zero status code
  • File I/O: Writing generated reports to disk
  • String manipulation: Building HTML with format strings
  • Dictionaries: Organizing statistics in structured data
  • Functions: Breaking down complex tasks into reusable components
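Because each stage is its own function, the whole pipeline can also be written as a single composition. This sketch uses only the functions defined in the program above:

let report = generate_report(url, analyze_content(fetch_page(url)));

It reads in the same order the data flows: fetch, analyze, report.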

Extension ideas

  • Add support for multiple URLs
  • Process real HTTP responses (when network support is added)
  • Parse HTML content to extract links
  • Save results to different formats (JSON, CSV)
  • Add logging to track scraping progress
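For the first idea, a loop over every URL passed on the command line might look like the following. This is only a sketch: it assumes Walrus supports while loops and += on integers, which this page does not confirm.

// Hypothetical multi-URL loop (assumes `while` and `+=` exist in Walrus)
let args = sys.args();
let i = 1;
while i < len(args) {
    let url = args[i];
    let stats = analyze_content(fetch_page(url));
    println(f"{url}: {stats['words']} words, {stats['chars']} chars");
    i += 1;
}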
