19 comments

  • pramodbiligiri11 minutes ago
    A repo with the English translation of each of the rules files, using Google Translate: <a href="https:&#x2F;&#x2F;github.com&#x2F;pramodbiligiri&#x2F;open-code-review-rules" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;pramodbiligiri&#x2F;open-code-review-rules</a>.<p>The original rules files (in Chinese): <a href="https:&#x2F;&#x2F;github.com&#x2F;alibaba&#x2F;open-code-review&#x2F;tree&#x2F;main&#x2F;internal&#x2F;config&#x2F;rules&#x2F;rule_docs" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;alibaba&#x2F;open-code-review&#x2F;tree&#x2F;main&#x2F;intern...</a>
  • eranation4 hours ago
    Ran it on a subset of 10 of the 50 PRs in this benchmark <a href="https:&#x2F;&#x2F;codereview.withmartian.com" rel="nofollow">https:&#x2F;&#x2F;codereview.withmartian.com</a><p>- very good recall (~74%, i.e. it found a lot of the golden issues)<p>- not so good precision (~12%, i.e. lots of false positives)<p>- the precision causes the F1 to tank (~20%; if this stays the same on the full 50-PR sample it would put it almost last, even below Kilo+Grok)
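For reference, the F1 figure follows from the two rates quoted above, since F1 is the harmonic mean of precision and recall. A minimal sketch, assuming the rounded values of 74% recall and 12% precision; the function name `f1_score` is only illustrative and is not taken from the benchmark or the tool:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded figures from the comment above (assumed, not re-measured).
precision = 0.12
recall = 0.74

f1 = f1_score(precision, recall)
print(f"F1 = {f1:.3f}")                             # F1 = 0.207
print(f"share of flags that are false = {1 - precision:.0%}")  # 88%
```

Because the harmonic mean is dominated by the smaller of the two rates, a high recall cannot compensate for a low precision, which is why the score lands near 20%.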
    • akie4 hours ago
      I would say that recall is the most important metric here though. I&#x27;d want it to catch all the issues.<p>False positives are easy to ignore.
      • witx3 hours ago
        What, no they&#x27;re not. You still need to analyze them to understand they are false positives. It&#x27;s time wasted
        • chaoz_12 minutes ago
          Agree, it&#x27;s something that will eventually teach your developers to ignore points raised as it&#x27;s mostly garbage.
        • onion2k2 hours ago
          Finding problems is optimizing for the customer. Avoiding false positives is optimizing for the developer. Which is right depends on your org&#x27;s culture.
          • evolve-maz2 hours ago
            If I flag every line in your PR as a potential security bug then I have 100% recall.<p>Obviously you need a mixture of high recall and low false positive rate. If 7&#x2F;8 flagged items are fine it&#x27;s much more likely people will ignore the warnings, much like they would any security tool with a 90% false positive rate. That is not optimized for the customer.
            • onion2k1 hour ago
              The <i>ideal</i> is finding all the problems without getting any false positives, but the reality is that you can&#x27;t often have that. An org&#x27;s engineering culture should be designed to fix problems with systems. If you&#x27;re seeing an 87.5% false positive rate that should be seen as another engineering problem to fix. However, that&#x27;s a separate issue to whether or not you accept false positives in a system designed to find problems.<p>Presenting it as either a system that misses real problems or a system that has a huge number of false positives is a false dilemma. You can have a system that&#x27;s designed to find all the problems <i>and then</i> optimize it to reduce the false positives. If you can&#x27;t reduce the number then you optimize to identify false positives as fast as possible. Just ignoring the identified problems on the assumption that they&#x27;re false is a giant red flag and a signal that the org has a very broken engineering culture (but, as you say, that&#x27;s quite common.)
            • eranation2 hours ago
              Yep. Similarly - you can predict with 99.9% accuracy if a Volcano will erupt today by using a rock that has &quot;No&quot; written on it.
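The rock example can be made concrete. A minimal sketch on made-up data (one eruption day in 1000 is an assumption chosen to match the 99.9% figure, not a real statistic):

```python
# Hypothetical data: 1 eruption day out of 1000 observed days.
actual = [True] + [False] * 999


def rock(_day: int) -> bool:
    """The rock with "No" written on it: it never predicts an eruption."""
    return False


predictions = [rock(day) for day in range(len(actual))]

correct = sum(p == a for p, a in zip(predictions, actual))
accuracy = correct / len(actual)

eruptions_caught = sum(p and a for p, a in zip(predictions, actual))
recall = eruptions_caught / sum(actual)

print(f"accuracy = {accuracy:.1%}, recall = {recall:.0%}")
# accuracy = 99.9%, recall = 0%
```

On a heavily imbalanced problem a single headline metric says little, which is the same reason recall alone or precision alone is not enough to judge a review tool.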
            • williamdclt42 minutes ago
              &gt; If I flag every line in your PR as a potential security bug then I have 100% recall.<p>No. A code review isn&#x27;t about &quot;flagging a line of code&quot;, it&#x27;s about identifying an issue or a risk. If a 10-line PR has one issue and you leave a comment on every single character, if you still miss the issue you have 0% recall.
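The disagreement here comes down to what counts as a hit. A minimal sketch, assuming each review comment is scored against a list of golden issues rather than against lines; `ReviewComment`, `score`, and the issue label are hypothetical names for illustration and do not describe how the benchmark above matches comments:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReviewComment:
    line: int
    issue_id: Optional[str]  # golden issue this comment identifies, or None


def score(comments, golden_ids):
    """Return (precision, recall) with matching done per issue, not per line."""
    hits = [c for c in comments if c.issue_id in golden_ids]
    found = {c.issue_id for c in hits}
    precision = len(hits) / len(comments) if comments else 0.0
    recall = len(found) / len(golden_ids) if golden_ids else 0.0
    return precision, recall


golden = {"unchecked-null"}  # one real issue in a 10-line PR

# A comment on every line, none of which identifies the real issue.
noise_only = [ReviewComment(line=n, issue_id=None) for n in range(1, 11)]

# A comment on every line, one of which does identify the issue.
one_hit = [
    ReviewComment(line=n, issue_id="unchecked-null" if n == 7 else None)
    for n in range(1, 11)
]

print(score(noise_only, golden))  # (0.0, 0.0)
print(score(one_hit, golden))     # (0.1, 1.0)
```

Under this scoring, commenting everywhere does not buy recall by itself, and when it does land a hit the precision cost shows up immediately.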
  • hrpnk2 hours ago
    Rule files are in <a href="https:&#x2F;&#x2F;github.com&#x2F;alibaba&#x2F;open-code-review&#x2F;tree&#x2F;main&#x2F;internal&#x2F;config&#x2F;rules&#x2F;rule_docs" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;alibaba&#x2F;open-code-review&#x2F;tree&#x2F;main&#x2F;intern...</a> (in Chinese)
    • pramodbiligiri1 hour ago
      An English rendering of the Java.md (Google Translate): <a href="https:&#x2F;&#x2F;github-com.translate.goog&#x2F;alibaba&#x2F;open-code-review&#x2F;blob&#x2F;main&#x2F;internal&#x2F;config&#x2F;rules&#x2F;rule_docs&#x2F;java.md?_x_tr_sl=auto&amp;_x_tr_tl=en&amp;_x_tr_hl=en-US&amp;_x_tr_pto=wapp" rel="nofollow">https:&#x2F;&#x2F;github-com.translate.goog&#x2F;alibaba&#x2F;open-code-review&#x2F;b...</a>
      • embedding-shape55 minutes ago
        And for comparison, here&#x27;s a GitHub gist with three versions, first the original Chinese one, then the Google Translate version you put and finally a translation done with ChatGPT Pro: <a href="https:&#x2F;&#x2F;gist.github.com&#x2F;embedding-shapes&#x2F;7a51d565214bd676890729cba4154dd8" rel="nofollow">https:&#x2F;&#x2F;gist.github.com&#x2F;embedding-shapes&#x2F;7a51d565214bd676890...</a><p>Done that way mainly to see how the Google Translate version compared with a ChatGPT translation (revision: <a href="https:&#x2F;&#x2F;gist.github.com&#x2F;embedding-shapes&#x2F;7a51d565214bd676890729cba4154dd8&#x2F;revisions?short_path=cf7680d#diff-cf7680dd4298c1c968b4f4ee68ecde49f70925564d9490c01b1c9dd7947fde87" rel="nofollow">https:&#x2F;&#x2F;gist.github.com&#x2F;embedding-shapes&#x2F;7a51d565214bd676890...</a>)
  • gbrindisi2 hours ago
    I like the pattern of making a dedicated cli&#x2F;harness and just building a skill to teach coding agents to use it.<p>At $work we built a thorough workflow to do security reviews, which is a pure skill to simplify adoption <a href="https:&#x2F;&#x2F;www.synthesia.io&#x2F;post&#x2F;automating-code-security-reviews-with-claude-mythos-level-capabilities" rel="nofollow">https:&#x2F;&#x2F;www.synthesia.io&#x2F;post&#x2F;automating-code-security-revie...</a><p>But the user experience is tricky because if we aim for very low false positives the run time for this kind of workflow is too long, and it&#x27;s then hard to justify blocking PRs.
  • faangguyindia8 hours ago
    If you&#x27;ve got codex, what does it add over codex&#x27;s default app? I am confused. Can&#x27;t you simply ask codex in another tab to just do a code review?
    • eranation6 hours ago
      Developers should definitely use whatever tool they use to review the code they (or the tool) just wrote. We have a skill that does this in a loop - spin subagents, review (based on our coding standards), triage the review in another subagent, fix what&#x27;s applicable, push back on what&#x27;s not, and we run this in a loop. This is before you even open a PR.<p>The idea of a PR is for others to find things that you have a blind spot to, and also leave some paper trail on the thought process. E.g. if something was not fixed, there is a history of a comment and a reason on WHY it wasn&#x27;t fixed. If you do all that only locally, that context is lost.<p>We noticed that even after doing this self review loop multiple times, we still find issues (either via other models &#x2F; tools or via humans that have the &quot;tribal knowledge&quot;)<p>Maybe one day AI will write perfect code and can review itself, but even if it&#x27;s 0.1% chance it has a bug, or 1 in a million it will do something a bit sinister (like open a backdoor just in case you try to shut it down) - then I really think there is always going to be a need for humans to review something.
    • pramodbiligiri55 minutes ago
      Mechanics of running their command aside, I think the main value add is all the rules: <a href="https:&#x2F;&#x2F;github.com&#x2F;alibaba&#x2F;open-code-review&#x2F;tree&#x2F;main&#x2F;internal&#x2F;config&#x2F;rules&#x2F;rule_docs" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;alibaba&#x2F;open-code-review&#x2F;tree&#x2F;main&#x2F;intern...</a><p>Like with &quot;SKILL&quot; files in general, it&#x27;s got to do with Prompt Engineering: <a href="https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Prompt_engineering#Rationale" rel="nofollow">https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Prompt_engineering#Rationale</a>
    • cheema337 hours ago
      &gt; Can&#x27;t you simply ask codex in another tab to just do a code review?<p>You are likely to get better results if you do not use the same model for review that wrote the code. I typically use Opus for code editing and GPT 5.5 for peer review using an automation with skills.<p>Training set is different between models. If there are gaps in coverage in one model, you want a different model reviewing the work. The second model will have its own gaps, but the gap list is not identical.
      • sdevonoes2 hours ago
        &gt; You are likely to get better results if you do not use the same model for review that wrote the code<p>There’s no evidence of this. I guess you are anthropomorphising models (i.e., it’s good that a different human reviews your code)
        • embedding-shape49 minutes ago
          Yeah, one model over another seems to matter less, they respond differently to the same prompts, so if anything, I&#x27;d use multiple prompts over choosing one model over another.<p>However, using two models to generate two reviews easily beats doing one model and one review, as some models seem to &quot;care&quot; more about certain things, but you&#x27;ll just miss different things if you change the model rather than add more.
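The difference between swapping reviewers and adding one can be shown with sets. A minimal sketch on entirely made-up findings (the issue labels and the `model_a` / `model_b` results are hypothetical and are not measurements of any real model):

```python
# Hypothetical golden issues and hypothetical per-reviewer findings.
golden = {"sql-injection", "off-by-one", "missing-null-check", "race-condition"}
found_by = {
    "model_a": {"sql-injection", "off-by-one"},
    "model_b": {"off-by-one", "race-condition"},
}


def recall(found: set) -> float:
    """Fraction of golden issues present in the given findings."""
    return len(found & golden) / len(golden)


# Swapping one reviewer for the other only changes which issues are missed.
print(recall(found_by["model_a"]))  # 0.5
print(recall(found_by["model_b"]))  # 0.5

# Adding a second review takes the union of what both found.
combined = found_by["model_a"] | found_by["model_b"]
print(recall(combined))             # 0.75
print(sorted(golden - combined))    # ['missing-null-check']
```

In this toy setup the gain comes from the union, not from which model is picked; whether real models behave this way is exactly what the thread is disputing.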
      • krzyk4 hours ago
        Results also depend on the prompt. You get different results if you ask it to review the PR and focus on a particular file than if you don&#x27;t make it focus.<p>Or if you make it &quot;be a security engineer&quot; with particular focus points.<p>Or make it a grammar nazi, it will find way more typos than without such focus.<p>Of course all of those &quot;focuses&quot; need to be in a separate context (agent&#x2F;subagent) to make it work.
      • Art96817 hours ago
        I would suggest that you reverse those roles. gpt-5.5 as the implementer and Opus as the reviewer.
        • hombre_fatal6 hours ago
          They find different things, and there&#x27;s no reason to use only one model for review. You want to review it until there&#x27;s nothing left to be unearthed.<p>And if you put the review effort into polishing an impl plan, then it doesn&#x27;t matter which model implements it either.
        • pluralmonad6 hours ago
          How come? I find Opus to have better taste and GPT to have more rigor.
    • eyeris7 hours ago
      Presumably nothing. Do note the publisher: Alibaba presumably would rather use their own tools and models instead of licensing.<p>They do open source a fair bit of internal tooling, so it’s always interesting to see their approach
    • krzyk4 hours ago
      It can be used outside of a local machine.<p>We built something similar, it looks for new PRs where the bot is added and does reviews. Makes the code more tuned toward similar rules. I can&#x27;t assume that a developer runs a code review tool himself (just as I don&#x27;t assume he&#x2F;she runs a build - so we run builds also).<p>It is just another perspective for code review, besides human. Unfortunately it uses a lot of tokens, and considering that Anthropic, OpenAI and Github Copilot all moved to token based pricing, it is quite a money burner.
    • esafak8 hours ago
      We&#x27;d need a benchmark to tell.
  • elpakal8 hours ago
    At a kill s@@s hackathon at work, I was able to build something that<p>uses a node image, installs claude code, runs a &#x2F;review-like command, puts inline comments on the PR, and deletes old comments when rerunning.<p>OCR seems cool, but overkill, and I&#x27;m definitely not using Code Rabbit after their CEO was on here acting snobbish a while back.<p>Point being AI code review in Git** itself isn&#x27;t hard to do and can add a lot of value quickly.
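The "deletes old comments when rerunning" step is the part that tends to need some bookkeeping. One possible approach, sketched under assumptions: the bot tags each comment it posts with a hidden marker and, on a rerun, removes anything carrying that marker before posting the fresh set. The marker string, the dict shape of a comment, and `plan_comment_sync` are all hypothetical; the commenter does not say how their tool does this, and no real API is called here:

```python
# Hypothetical hidden marker the bot embeds in every comment it posts.
BOT_MARKER = "<!-- ai-review-bot -->"


def plan_comment_sync(existing, new_bodies):
    """Decide what a rerun should delete and post.

    existing:   list of {"id": int, "body": str} already on the PR (assumed shape)
    new_bodies: review comment texts produced by the current run
    Returns (ids_to_delete, bodies_to_post).
    """
    # Only comments carrying the marker are the bot's; human comments stay.
    ids_to_delete = [c["id"] for c in existing if BOT_MARKER in c["body"]]
    # Tag the new comments so the next rerun can find them again.
    bodies_to_post = [f"{BOT_MARKER}\n{body}" for body in new_bodies]
    return ids_to_delete, bodies_to_post


existing = [
    {"id": 1, "body": f"{BOT_MARKER}\nPossible null dereference."},
    {"id": 2, "body": "LGTM once the tests pass."},  # human comment
]
to_delete, to_post = plan_comment_sync(existing, ["Unused variable `tmp`."])
print(to_delete)  # [1]
print(to_post)
```

Keeping the decision in a pure function like this makes it easy to test without touching the hosting platform; the actual delete and post calls would go through whatever client the pipeline already uses.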
    • eranation6 hours ago
      Nothing against coderabbit or SaaS specifically, but this was one of the reasons I stopped using it <a href="https:&#x2F;&#x2F;kudelskisecurity.com&#x2F;research&#x2F;how-we-exploited-coderabbit-from-a-simple-pr-to-rce-and-write-access-on-1m-repositories" rel="nofollow">https:&#x2F;&#x2F;kudelskisecurity.com&#x2F;research&#x2F;how-we-exploited-coder...</a><p>It&#x27;s very easy to build a basic code review tool. It&#x27;s hard to build one that developers won&#x27;t ask you to turn off because of false positives (or one that won&#x27;t miss your next escaped bug)<p>I think if all the tool does is run a claude code level &#x2F;review skill (which all developers should definitely run before they even open a PR) then isn&#x27;t this a bit of a review theater? Just a guardrail to those developers who don&#x27;t run a &#x2F;review-triage-fix skill in &#x2F;loop before they take the PR out of draft?<p>I wonder how many PRs in the world got to production where several developers commented on each other&#x27;s code, and none of them read anything, just used their gh cli &#x2F; MCP to post &#x2F; answer comments &#x2F; fix issues on their behalf.<p>There is going to be an exponential growth of code generated, and you can&#x27;t escape AI code review, but also there is no real difference between having Claude Code write the code and review itself locally, vs communicating with itself via a slow and downtime prone medium of &quot;PR comments&quot;<p>tl;dr - without any human in the loop reviewing the AI code review, or skimming to see what the AI code review missed, there is no real reason to use a &quot;code review&quot;; you can just run it as part of the CI&#x2F;CD and hope AI won&#x27;t miss anything (according to my linkedin feed, there are people out there who really think this way...)
      • krzyk4 hours ago
        I think that in most cases you either agree on a PR comment or you don&#x27;t. But it has to leave a mark in PR. This is how we do reviews, ignoring PR comment is one of the worst offenses one can make. I don&#x27;t let it go.
      • s900mhz5 hours ago
        Yes! Where it gets really interesting is the scenario in which every developer has their own unique review skill&#x2F;workflow, so the reviews end up being different than you running it yourself, but nobody is reading them still.
    • gardnr7 hours ago
      How snobbish was the CEO acting?
  • singingtoday8 hours ago
    I&#x27;m interested in trying this.<p>We have our own internal automated review which has shown positive results, but I would love to drop it if I find something better.<p>Code review is currently our bottleneck, so any possibility of better automating it is welcome.
    • pramodbiligiri31 minutes ago
      Thermonuclear suggested by someone below is good. Matt Pocock did a demo&#x2F;breakdown of that: <a href="https:&#x2F;&#x2F;www.youtube.com&#x2F;watch?v=mh5XZ-L5SFQ" rel="nofollow">https:&#x2F;&#x2F;www.youtube.com&#x2F;watch?v=mh5XZ-L5SFQ</a>. He has his own &quot;improve-codebase-architecture&quot; skill: <a href="https:&#x2F;&#x2F;github.com&#x2F;mattpocock&#x2F;skills&#x2F;blob&#x2F;main&#x2F;skills&#x2F;engineering&#x2F;improve-codebase-architecture&#x2F;SKILL.md" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;mattpocock&#x2F;skills&#x2F;blob&#x2F;main&#x2F;skills&#x2F;engine...</a><p>Some of them are about general coding guidelines and code quality, not necessarily vetting your current PR against specs! There&#x27;s AbsolutelySkilled with clean-code and clean-architecture. Linking to older version of repo because they seem to be no longer on trunk: <a href="https:&#x2F;&#x2F;github.com&#x2F;AbsolutelySkilled&#x2F;AbsolutelySkilled&#x2F;tree&#x2F;8f704b39e2c652ab199d3c4d6b6cb709426f692c&#x2F;skills" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;AbsolutelySkilled&#x2F;AbsolutelySkilled&#x2F;tree&#x2F;...</a><p>I&#x27;ve been creating some rules to help with my Java coding: <a href="https:&#x2F;&#x2F;github.com&#x2F;bitkentech&#x2F;shipsmooth&#x2F;tree&#x2F;main&#x2F;skills&#x2F;experimental&#x2F;refine&#x2F;rules" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;bitkentech&#x2F;shipsmooth&#x2F;tree&#x2F;main&#x2F;skills&#x2F;ex...</a>. These are assembled into a SKILL file when this skill file template is built: <a href="https:&#x2F;&#x2F;github.com&#x2F;bitkentech&#x2F;shipsmooth&#x2F;blob&#x2F;main&#x2F;skills&#x2F;experimental&#x2F;refine&#x2F;SKILL.jte.md" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;bitkentech&#x2F;shipsmooth&#x2F;blob&#x2F;main&#x2F;skills&#x2F;ex...</a>
    • sergeym7 hours ago
      I&#x27;ve been liking this code review skill lately, it has pointed out some good improvements. <a href="https:&#x2F;&#x2F;github.com&#x2F;cursor&#x2F;plugins&#x2F;blob&#x2F;main&#x2F;cursor-team-kit&#x2F;skills&#x2F;thermo-nuclear-code-quality-review&#x2F;SKILL.md" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;cursor&#x2F;plugins&#x2F;blob&#x2F;main&#x2F;cursor-team-kit&#x2F;...</a>
  • nutifafa1 hour ago
    This is a great tool, until you try reading the rule files; I had to find a translator to make heads or tails of them. Given that it is a CLI tool, it is great for devs to tinker with at no additional cost.
  • pi-victor3 hours ago
    i did something like this, but somewhat in reverse. you are the one that reviews the code and you instruct AI what to do through code review comments: <a href="https:&#x2F;&#x2F;parley.cloudflavor.io" rel="nofollow">https:&#x2F;&#x2F;parley.cloudflavor.io</a>.<p>thinking about it, it would be funny to first run alibaba&#x27;s tool and then run parley after.<p>posted it here a few days ago: <a href="https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=48369782">https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=48369782</a> i guess with AI there are too many Show HN now, and i never got any type of feedback.
    • viblo2 hours ago
      Just a small note, the font on your site is very annoying to read, the characters are not aligned horizontally (Windows w Chrome). Looks to be a scaling issue, if I zoom to 200% it shows fine.
      • pi-victor1 hour ago
        ah, sorry about that - will try to see what is going on. thanks for letting me know!
  • sfortis2 hours ago
    It is not working with gpt5.x models (Unsupported parameter: &#x27;max_tokens&#x27; is not supported with this model. Use &#x27;max_completion_tokens&#x27; instead.); the parameter is hardcoded. I don&#x27;t know why this is on the front page. My review-with-codex skill is working just fine, consuming my usage and not API tokens.
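The quoted error suggests a small workaround for callers that hardcode the parameter name. A minimal sketch, assuming only what the error text itself says (rename `max_tokens` to `max_completion_tokens` and retry); `swap_token_limit_param` is a hypothetical helper, not part of the tool or of any SDK, and nothing here makes a network call:

```python
def swap_token_limit_param(params: dict, error_message: str) -> dict:
    """Return request params with max_tokens renamed if the error asks for it.

    The input dict is not mutated; unrelated errors leave params unchanged.
    """
    wants_rename = (
        "max_completion_tokens" in error_message and "max_tokens" in params
    )
    if not wants_rename:
        return params
    fixed = {k: v for k, v in params.items() if k != "max_tokens"}
    fixed["max_completion_tokens"] = params["max_tokens"]
    return fixed


# Error text as quoted in the comment above.
error = (
    "Unsupported parameter: 'max_tokens' is not supported with this model. "
    "Use 'max_completion_tokens' instead."
)
request = {"model": "example-model", "max_tokens": 2048}  # model name is a placeholder

print(swap_token_limit_param(request, error))
# {'model': 'example-model', 'max_completion_tokens': 2048}
```

A caller would apply this once after the first failed request and resend; making the parameter name configurable in the tool would avoid the extra round trip.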
  • weird-eye-issue5 hours ago
    &gt; After installation, the ocr command is available globally.<p>Wish they chose a different acronym...
  • atestu8 hours ago
    We&#x27;ve been using Coderabbit, great deal ($30&#x2F;mo&#x2F;dev flat) and finds a lot.<p>I also built a skill I call `&#x2F;meta-review` that asks Codex, Cursor, and Gemini to review the code (I use Claude Code). It always finds little things claude &amp; I missed.<p>Coderabbit just came out with their own PR review UI that&#x27;s great for big PRs, it groups files together etc. <a href="https:&#x2F;&#x2F;www.coderabbit.ai&#x2F;blog&#x2F;introducing-atlas-the-first-ai-native-code-review-interface" rel="nofollow">https:&#x2F;&#x2F;www.coderabbit.ai&#x2F;blog&#x2F;introducing-atlas-the-first-a...</a>
    • eranation7 hours ago
      Not sure why you got downvoted, and I have nothing against CodeRabbit, but this comment feels a bit like a paid ad :)<p>How do you see CodeRabbit against other AI code review solutions? E.g. cubic.dev, Qodo, Graphite, Greptile, Baz, Augment Code...<p>An alternative UI to GitHub is well overdue. But once someone gets it right, everyone will copy them...
    • causal8 hours ago
      Is it actually flat fee? I loved Cursor bugbot which was flat fee but they moved to per-run and that killed it for me, but a lot of others are doing the same.
      • atestu7 hours ago
        Yes! They just have a rate limit but we never run into it (we’re just 3 people though).<p>Yea I liked bugbot too but it became pretty pricey.
    • lukaslalinsky5 hours ago
      I&#x27;ve tried many AI code review tools. Nothing comes close to the depth of CodeRabbit reviews. It&#x27;s the only such tool that can find real logical bugs. I&#x27;d love to be able to get Claude Code to do similar quality of review, but I can&#x27;t get it right, no matter how I try.
  • eranation7 hours ago
    I wonder how they do against this benchmark (not that I vetted this benchmark... but still interesting to know...)<p><a href="https:&#x2F;&#x2F;codereview.withmartian.com" rel="nofollow">https:&#x2F;&#x2F;codereview.withmartian.com</a>
  • causal8 hours ago
    I recently moved off Cursor&#x27;s BugBot because it&#x27;s no longer a flat $40, and I feel a little lost trying to find a viable alternative because there are so many and the pricing kind of sucks for all of them. Curious if anyone has a recommendation.
    • lukeasrodgers7 hours ago
      My team tried coderabbit and qodo and they are both trash compared to a tool we quickly built in-house that is more or less a thin wrapper around claude&#x2F;codex, along with per-repo skills. PR review is triggered by webhooks from github to the review tool&#x27;s web app. The tool shared by OP from alibaba certainly does some things ours does not and appears more sophisticated, but we have never had the problems they mention.<p>&quot;The agent can read full file contents, search the codebase, inspect other changed files for context, and produce deep reviews — not just surface-level diff feedback.&quot; our tool does all this too. It catches dumb typos as well as more complicated bugs. Not to mention it is great as a ratchet (<a href="https:&#x2F;&#x2F;qntm.org&#x2F;ratchet" rel="nofollow">https:&#x2F;&#x2F;qntm.org&#x2F;ratchet</a>). It is not a substitute for reviews from other engineers though, since obviously it does nothing to achieve one of the main goals of code review, which is to socialize knowledge of the codebase.<p>Alibaba&#x27;s work here is almost certainly more advanced than what we&#x27;ve done, but ours has been perfectly satisfactory and better than the paid offerings we&#x27;ve tried. I think <i>most</i> teams should not be paying SaaS fees for AI code review, that is the kind of business that mostly should not exist any more.
      • mrklol2 hours ago
        In which areas do you feel like the mentioned tools are bad? Do they find less and your own solution has more success?<p>If the latter, do you know why?
    • kageiit1 hour ago
      gitar.ai is flat with no limits