5 comments

  • ltbarcly3 18 days ago
    Misleading title, they didn't make the packaging library 3x faster, they made reading one attribute of a package 3x faster. The whole library is still very, very slow compared to alternatives.
  • djoldman 18 days ago
    > _canonicalize_table = str.maketrans("ABCDEFGHIJKLMNOPQRSTUVWXYZ_.", "abcdefghijklmnopqrstuvwxyz--")
    > ...
    > value = name.translate(_canonicalize_table)
    > while "--" in value:
    >     value = value.replace("--", "-")

    translate can be wildly fast compared to some commonly used regexes or replacements.
    • teaearlgraycold 18 days ago
      I would expect however that a regex replacement would be much faster than your N^2 while loop.
      • dgrunwald 17 days ago
        That loop isn't N²: if there are long sequences of dashes, every iteration will cut the lengths of those sequences in half. So the loop has at most lg(N) iterations, for an O(N*lg(N)) total runtime.
      • notpushkin 18 days ago
        It would be, if it was a common situation.

        This loop handles cases like `eggtools._spam` → `eggtools-spam`, which is probably rare (I guess it’s for packages that export namespaced modules, and you probably don’t want to export _private modules; sorry in advance for non-pythonic terminology). Having more than two separator characters in a row is even more unusual.
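
The quoted snippet and the iteration-count argument in this sub-thread can be checked with a small sketch. The names `canonicalize` and `collapse_passes` are hypothetical, chosen for illustration; only the translation table and the while loop come from the quoted code.

```python
# Translation table from the quoted snippet: ASCII uppercase -> lowercase,
# and both "_" and "." -> "-".
_canonicalize_table = str.maketrans(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ_.",
    "abcdefghijklmnopqrstuvwxyz--",
)


def canonicalize(name: str) -> str:
    """Hypothetical wrapper around the quoted translate-and-collapse steps."""
    value = name.translate(_canonicalize_table)
    while "--" in value:
        value = value.replace("--", "-")
    return value


def collapse_passes(value: str) -> int:
    """Count how many replace() passes the loop needs (illustration only)."""
    passes = 0
    while "--" in value:
        # replace() is non-overlapping, so a run of k dashes becomes ceil(k / 2).
        value = value.replace("--", "-")
        passes += 1
    return passes


print(canonicalize("eggtools._spam"))  # eggtools-spam
print(collapse_passes("-" * 1024))     # 10
```

A run of 1024 dashes needs 10 passes, which matches the halving argument: the pass count grows with the logarithm of the longest run, not with its length.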
    • est 18 days ago
      I am curious, why not .lower().translate('_.', '--')
      • fwip 18 days ago
        .lower() has to handle Unicode, right? I imagine the giant tables slow it down a bit.
        • mort96 17 days ago
          It's so annoying how so many languages lack a basic "ASCII lowercase" and "ASCII uppercase" function. All the Unicode logic is not only unnecessary, but actively unwanted, when you e.g. want to change the case of a hex encoded string or do normalization on some machine generated ASCII-only output.
          • tracker1 17 days ago
            I'll say, C#'s .ToLowerInvariant, etc. are pretty nice when you need them.
          • est 17 days ago
            > It's so annoying how so many languages lack a basic "ASCII lowercase" and "ASCII uppercase" function

            How about b''.lower()?
            • mort96 16 days ago
              What if I have a string and not a byte string?
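
For the "string, not a byte string" case raised in this sub-thread, one possible approach is a translation table that touches only A-Z. This is a minimal sketch; `ascii_lower` and `lower_then_translate` are hypothetical names, not functions from any library. Note that `str.translate` takes a single table argument, so the two-string form suggested earlier in the thread would need `str.maketrans` to build that table.

```python
import string

# Maps only A-Z to a-z; every other code point is left untouched.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)

# Table for the "_" and "." -> "-" replacement from the thread.
_SEPARATORS = str.maketrans("_.", "--")


def ascii_lower(s: str) -> str:
    """Hypothetical helper: lowercase ASCII letters only, no Unicode case rules."""
    return s.translate(_ASCII_LOWER)


def lower_then_translate(name: str) -> str:
    """The .lower().translate(...) idea from the thread, in runnable form."""
    return name.lower().translate(_SEPARATORS)


print(ascii_lower("DEADBEEF"))                 # deadbeef
print(ascii_lower("İstanbul") == "İstanbul")   # True: non-ASCII letter unchanged
print("İstanbul".lower() == "İstanbul")        # False: Unicode lowercasing applies
print(lower_then_translate("Egg_Tools.Spam"))  # egg-tools-spam
```

For data already held as `bytes`, `bytes.lower()` only affects ASCII letters, which is the behaviour the `b''.lower()` suggestion relies on.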
  • imtringued 18 days ago
    Unrelated, but I personally am not satisfied with the performance of pandas' XLSX export. As you can see here [0], the code does really strange things. It takes cell.style and throws it into json.dumps() to generate a key for a dictionary so that they can cache the XlsxStyler.convert(cell.style) result. Except, the vast majority of cells do not have any styling whatsoever, so json.dumps is producing the string "null", which is then used to look up None. The low hanging fruit are jaw dropping. You can easily speed up the code 10%+ by adding a simple check "if cell.style is not None or fmt is not None:" and switching from json.dumps(cell.style) to str(cell.style). If I wanted an easy weekend project that positively impacts many people this is what I'd work on.

    [0] https://github.com/pandas-dev/pandas/blob/main/pandas/io/excel/_xlsxwriter.py#L257-L273
    • rthz 17 days ago
      Have you tried opening an issue about it? Maybe someone would be happy to work on it. I concur that Excel parsing is rather slow.
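
The guard proposed in this sub-thread can be illustrated with a self-contained sketch. Everything here is a hypothetical stand-in: `Cell`, `convert_style`, and `styles_for` are not the pandas classes or functions, and the sketch only shows the shape of the idea (skip serialization and the cache lookup when a cell has no style), not the actual pandas code or any measured speedup.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Cell:
    """Hypothetical stand-in for a cell being written; not the pandas class."""
    val: Any
    style: Optional[dict] = None


def convert_style(style: Optional[dict]) -> Optional[dict]:
    """Hypothetical stand-in for the expensive style conversion."""
    return None if style is None else {"converted": style}


def styles_for(cells: list[Cell]) -> list[Optional[dict]]:
    """Return one converted style per cell, caching by serialized style."""
    cache: dict[str, Optional[dict]] = {}
    out: list[Optional[dict]] = []
    for cell in cells:
        if cell.style is None:
            # Fast path: unstyled cells skip json.dumps and the cache lookup.
            out.append(None)
            continue
        key = json.dumps(cell.style, sort_keys=True)
        if key not in cache:
            cache[key] = convert_style(cell.style)
        out.append(cache[key])
    return out


cells = [Cell(1), Cell(2, {"bold": True}), Cell(3, {"bold": True})]
result = styles_for(cells)
print(result[0])               # None
print(result[1] is result[2])  # True: second styled cell reuses the cached value
```

The sketch keeps `json.dumps` for styled cells; whether `str(cell.style)` is a safe cache key in the real code is not something this example checks.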
  • zahlman 21 days ago
    Previously: https://news.ycombinator.com/item?id=46557542
  • YouAreWRONGtoo 18 days ago
    [dead]