How we made Python's packaging library 3x faster

(iscinumpy.dev)

88 points by rbanffy 22 days ago

5 comments

ltbarcly3 18 days ago
Misleading title: they didn't make the packaging library 3x faster, they made reading one attribute of a package 3x faster. The whole library is still very, very slow compared to alternatives.
djoldman 18 days ago
> _canonicalize_table = str.maketrans(
>     "ABCDEFGHIJKLMNOPQRSTUVWXYZ_.",
>     "abcdefghijklmnopqrstuvwxyz--",
> )
> ...
> value = name.translate(_canonicalize_table)
> while "--" in value:
>     value = value.replace("--", "-")

translate can be wildly fast compared to some commonly used regexes or replacements.
- teaearlgraycold 18 days ago
 I would expect, however, that a regex replacement would be much faster than your N^2 while loop.
 - dgrunwald 18 days ago
 That loop isn't N²: if there are long sequences of dashes, every iteration will cut the lengths of those sequences in half. So the loop has at most lg(N) iterations, for an O(N*lg(N)) total runtime.
 - notpushkin 18 days ago
 It would be, if it was a common situation. This loop handles cases like `eggtools._spam` → `eggtools-spam`, which is probably rare (I guess it’s for packages that export namespaced modules, and you probably don’t want to export _private modules; sorry in advance for non-pythonic terminology). Having more than two separator characters in a row is even more unusual.
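The pass-count argument above can be checked directly. What follows is a minimal sketch, not the packaging library's actual code: `canonicalize_translate` wraps the quoted snippet, `canonicalize_regex` uses the normalization expression from PEP 503 for comparison, and `collapse_passes` is a hypothetical helper that only counts loop iterations. Note that the regex variant calls `str.lower()`, so the two can differ on non-ASCII uppercase input.

```python
import re

# Translation table from the quoted snippet: lowercase ASCII letters and
# map "_" and "." to "-".
_canonicalize_table = str.maketrans(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ_.",
    "abcdefghijklmnopqrstuvwxyz--",
)


def canonicalize_translate(name: str) -> str:
    """translate() followed by the dash-collapsing loop from the quote."""
    value = name.translate(_canonicalize_table)
    while "--" in value:
        value = value.replace("--", "-")
    return value


def canonicalize_regex(name: str) -> str:
    """PEP 503 style normalization, shown for comparison only."""
    return re.sub(r"[-_.]+", "-", name).lower()


def collapse_passes(value: str) -> int:
    """Hypothetical helper: count the replace() passes the loop performs."""
    passes = 0
    while "--" in value:
        value = value.replace("--", "-")
        passes += 1
    return passes
```

Each pass roughly halves every run of dashes, so a run of 1024 dashes needs 10 passes, and a typical name with a single doubled separator needs one.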
- est 18 days ago
 I am curious, why not .lower().translate('_.', '--')?
 - fwip 18 days ago
 .lower() has to handle Unicode, right? I imagine the giant tables slow it down a bit.
 - mort96 18 days ago
 It's so annoying how so many languages lack a basic "ASCII lowercase" and "ASCII uppercase" function. All the Unicode logic is not only unnecessary, but actively unwanted, when you, e.g., want to change the case of a hex-encoded string or do normalization on some machine-generated ASCII-only output.
 - tracker1 18 days ago
 I'll say, C#'s .ToLowerInvariant, etc. are pretty nice when you need them.
 - est 18 days ago
 > It's so annoying how so many languages lack a basic "ASCII lowercase" and "ASCII uppercase" function
 How about b''.lower()?
 - mort96 17 days ago
 What if I have a string and not a byte string?
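For a `str` rather than `bytes`, one option is a translation table that touches only A-Z. This is a minimal sketch; `ascii_lower` and `_ASCII_LOWER` are hypothetical names, not something Python's standard library provides.

```python
import string

# Table that maps only A-Z to a-z; every other code point is left untouched.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_lower(text: str) -> str:
    """Lowercase ASCII letters only, ignoring Unicode case rules."""
    return text.translate(_ASCII_LOWER)


# str.lower() applies Unicode case mapping, so KELVIN SIGN (U+212A)
# becomes a plain "k"; ascii_lower() leaves it alone.
kelvin = "\u212a"
unicode_result = kelvin.lower()
ascii_result = ascii_lower(kelvin)
```

If the input is known to be pure ASCII, `text.encode("ascii").lower().decode("ascii")` is another route, at the cost of two extra copies.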
imtringued 18 days ago
Unrelated, but I personally am not satisfied with the performance of pandas' XLSX export. As you can see here [0], the code does really strange things. It takes cell.style and throws it into json.dumps() to generate a key for a dictionary so that they can cache the XlsxStyler.convert(cell.style) result. Except, the vast majority of cells do not have any styling whatsoever, so json.dumps is producing the string "null", which is then used to look up None. The low-hanging fruit is jaw-dropping. You can easily speed up the code 10%+ by adding a simple check "if cell.style is not None or fmt is not None:" and switching from json.dumps(cell.style) to str(cell.style). If I wanted an easy weekend project that positively impacts many people, this is what I'd work on.

[0] https://github.com/pandas-dev/pandas/blob/main/pandas/io/excel/_xlsxwriter.py#L257-L273
- rthz 18 days ago
 Have you tried opening an issue about it? Maybe someone would be happy to work on it. I concur that Excel parsing is rather slow.
zahlman 22 days ago
Previously: https://news.ycombinator.com/item?id=46557542
YouAreWRONGtoo 18 days ago
[dead]