Guides
Task-oriented walkthroughs for the things people actually do with ami: feeding it a seed, tuning it for throughput, and sharding a run across machines.
Each guide is built around a job rather than a flag: taking a seed all the way through to a queryable archive, getting your URLs into ami whatever shape they are in, pushing the box as hard as it goes without being rude, and splitting a large seed across a fleet. They assume you have worked through the quick start.
An end-to-end run
Take a list of URLs from raw seed to a queryable WARC archive: prepare the seed, crawl it, and read the results back.
Seed formats
Feed ami a list of URLs as a text file, newline-delimited JSON, a Parquet column, a sitemap, or stdin.
Tuning a crawl
Push the box as hard as it goes with workers, transport shards, and per-host caps, or dial it back to be polite.
Sharding a run
Split one big seed across several machines so each fetches a disjoint partition.
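To make "disjoint partition" concrete, here is a minimal Python sketch of one generic way to split a seed so that no URL lands on two machines. It is not ami's own sharding mechanism, which the sharding guide covers; the function names `shard_for` and `partition` are hypothetical, and hashing by host (so that all URLs for one host stay on one machine) is an assumption made for illustration.

```python
import hashlib
from urllib.parse import urlsplit


def shard_for(url: str, num_shards: int) -> int:
    """Return a stable shard index for a URL (hypothetical helper).

    Assumption: we hash the host rather than the full URL, so every URL
    for a given host maps to the same shard. Falls back to the raw string
    when no host can be parsed.
    """
    host = urlsplit(url).hostname or url
    digest = hashlib.sha256(host.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer; sha256 is stable
    # across processes and machines, unlike Python's built-in hash().
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(urls, num_shards):
    """Split URLs into num_shards disjoint lists covering the whole seed."""
    shards = [[] for _ in range(num_shards)]
    for url in urls:
        shards[shard_for(url, num_shards)].append(url)
    return shards
```

Because the shard index depends only on the URL and the shard count, each machine can compute its own slice from the full seed independently, without coordinating with the others.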