Hello!<p>I wanted to share my recent project: Tadpole. It is a custom DSL built on top of KDL specifically for web scraping and browser automation. I wanted there to be a standardized way of writing scrapers and reusing existing scraper logic. This was my solution.<p>Why?<p><pre><code> Abstraction: Simulating realistic human behavior (bezier curves, easing) through high-level composed actions.
Zero Config: Import and share scraper modules directly via Git, bypassing npm/registry overhead.
Reusability: Actions and evaluators can be composed through slots to create more complex workflows.
</code></pre>
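For anyone unfamiliar with the "bezier curves, easing" idea under Abstraction, here is a minimal TypeScript sketch of how a human-like mouse path can be generated. This is not Tadpole's actual implementation: the names Point, easeInOutCubic, cubicBezier and mousePath are hypothetical, and the control-point jitter is an arbitrary choice made for illustration.

```typescript
// Hypothetical sketch of the general technique, not Tadpole's code.
type Point = { x: number; y: number };

// Ease-in-out cubic: slow start, fast middle, slow end.
function easeInOutCubic(t: number): number {
  return t < 0.5 ? 4 * t * t * t : 1 - Math.pow(-2 * t + 2, 3) / 2;
}

// Evaluate a cubic Bezier curve at parameter t in [0, 1].
function cubicBezier(p0: Point, p1: Point, p2: Point, p3: Point, t: number): Point {
  const u = 1 - t;
  const a = u * u * u;
  const b = 3 * u * u * t;
  const c = 3 * u * t * t;
  const d = t * t * t;
  return {
    x: a * p0.x + b * p1.x + c * p2.x + d * p3.x,
    y: a * p0.y + b * p1.y + c * p2.y + d * p3.y,
  };
}

// Build a curved path from `from` to `to` with steps + 1 points.
// `rand` is injectable so the path can be made deterministic in tests.
function mousePath(
  from: Point,
  to: Point,
  steps: number,
  rand: () => number = Math.random,
): Point[] {
  const n = Math.max(1, Math.floor(steps)); // avoid division by zero
  const dx = to.x - from.x;
  const dy = to.y - from.y;
  // Offset in [-0.25, 0.25], applied perpendicular to the straight line.
  const jitter = () => (rand() - 0.5) * 0.5;
  const j1 = jitter();
  const j2 = jitter();
  const c1: Point = { x: from.x + dx * 0.3 - dy * j1, y: from.y + dy * 0.3 + dx * j1 };
  const c2: Point = { x: from.x + dx * 0.7 - dy * j2, y: from.y + dy * 0.7 + dx * j2 };
  const path: Point[] = [];
  for (let i = 0; i <= n; i++) {
    // Easing controls speed along the curve; the curve controls the shape.
    path.push(cubicBezier(from, c1, c2, to, easeInOutCubic(i / n)));
  }
  return path;
}
```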
Example<p>This is a fully working example; @tadpole/cli is published on npm:<p>tadpole run redfin.kdl --input '{"text": "Seattle, WA"}' --auto --output output.json<p><pre><code> import "modules/redfin/mod.kdl" repo="github.com/tadpolehq/community"
main {
  new_page {
    redfin.search text="=text"
    wait_until
    redfin.extract_from_card extract_to="addresses" {
      address {
        redfin.extract_address_from_card
      }
    }
  }
}
</code></pre>
Roadmap?
Planned for 0.2.0<p><pre><code> Control Flow: Add maybe (effectively try/catch) and loop (while {}, do {})
DOMPick: Used to select elements by index
DOMFilter: Used to filter elements using evaluators
More Evaluators: Type casting, regex, exists
Root Slots: Support for top level dynamic placeholders
Error Reporting: More robust error reporting
Logging: More consistent logging from actions, plus a log action in the global registry
</code></pre>
0.3.0<p><pre><code> Piping: Allowing different files to chain input/output.
Outputs: Complex output sinks to databases, S3, Kafka, etc.
DAGs: Use directed acyclic graphs to create complex crawling scenarios and parallel compute.
</code></pre>
GitHub repository: <a href="https://github.com/tadpolehq/tadpole" rel="nofollow">https://github.com/tadpolehq/tadpole</a><p>I've also created a community repository for sharing scraper logic:
<a href="https://github.com/tadpolehq/community" rel="nofollow">https://github.com/tadpolehq/community</a><p>Feedback would be greatly appreciated!