
  • bobajeff 2 hours ago
    I had to look up what KDL is and what the "Functional Source License, Version 1.1, ALv2 Future License" is.

    So KDL is like another JSON or YAML. FSL-1.1-ALv2 is an almost-but-not-quite open source license under which the code, after two years, becomes available under a real open source license. It's to prevent freeloading by companies or something. Sounds fine to me, actually.
    • zachperkitny 1 hour ago
      Effectively, it's not meant to restrict people from using it, even in a commercial setting, just to protect my personal interests in what I want to do with it commercially.

      KDL is more than just JSON or YAML. It's node based. Its output in libraries is effectively an AST, and its use cases are open ended.
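      For example, here's a minimal KDL sketch (the node names are illustrative, not Tadpole syntax) showing how each node carries a name, optional arguments, key=value properties, and nested children:

          page "listing" url="https://example.com" {
              selector ".card" {
                  extract "title"
              }
          }

      A KDL parser hands that back as a tree of named nodes rather than a flat key/value map, which is what makes the output AST-like.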
  • zachperkitny 4 hours ago
    Hello!

    I wanted to share my recent project: Tadpole. It is a custom DSL built on top of KDL specifically for web scraping and browser automation. I wanted there to be a standardized way of writing scrapers and reusing existing scraper logic. This was my solution.

    Why?

        Abstraction: Simulating realistic human behavior (bezier curves, easing) through high-level composed actions.
        Zero Config: Import and share scraper modules directly via Git, bypassing NPM/registry overhead.
        Reusability: Actions and evaluators can be composed through slots to create more complex workflows.

    Example

    This is a fully running example; @tadpole/cli is published on npm:

        tadpole run redfin.kdl --input '{"text": "Seattle, WA"}' --auto --output output.json

        import "modules/redfin/mod.kdl" repo="github.com/tadpolehq/community"

        main {
            new_page {
                redfin.search text="=text"
                wait_until
                redfin.extract_from_card extract_to="addresses" {
                    address {
                        redfin.extract_address_from_card
                    }
                }
            }
        }

    Roadmap

    Planned for 0.2.0:

        Control Flow: Add maybe (effectively try/catch) and loop (while {}, do {})
        DOMPick: Used to select elements by index
        DOMFilter: Used to filter elements using evaluators
        More Evaluators: Type casting, regex, exists
        Root Slots: Support for top-level dynamic placeholders
        Error Reporting: More robust error reporting
        Logging: More consistent logging from actions, and add a log action to the global registry

    Planned for 0.3.0:

        Piping: Allow different files to chain input/output.
        Outputs: Complex output sinks to databases, S3, Kafka, etc.
        DAGs: Use directed acyclic graphs to create complex crawling scenarios and parallel compute.

    GitHub repository: https://github.com/tadpolehq/tadpole

    I've also created a community repository for sharing scraper logic: https://github.com/tadpolehq/community

    Feedback would be greatly appreciated!
    • bobajeff 2 hours ago
      I like the idea of a DSL for scraping, but my scrapers do more than extract text. I also download files (and monitor download progress) and intercept images (checking for partially loaded or failed-to-load images). So it seems my use case isn't really covered by this.
      • zachperkitny 1 hour ago
        Thanks for the idea, actually! It's difficult to cover every use case in the 0.1.0 release, but I'll take this into account. Downloading files/images could likely be abstracted into just an HTTP source, and the data sources could be merged in some way.