Banning "length" from the codebase and splitting the concept into count vs size is one of those things that sounds pedantic until you've spent an hour debugging an off-by-one in serialization code where someone mixed up "number of elements" and "number of bytes." After that you become a true believer.<p>The big-endian naming convention (source_index, target_index instead of index_source, index_target) is also interesting. It means related variables sort together lexicographically, which helps with grep and IDE autocomplete. Small thing but it adds up when you're reading unfamiliar code.<p>One thing I'd add: this convention is especially valuable during code review. When every variable that represents a byte quantity ends in _size and every item count ends in _count, a reviewer can spot dimensional mismatches almost mechanically without having to load the full algorithm into their head.
Relatedly, a survey of array nomenclature was performed for the ISO C committee when choosing the name of the new countof operator: <a href="https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3469.htm" rel="nofollow">https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3469.htm</a><p>It was originally proposed as lengthof, but the results of the public poll and the ambiguity convinced the committee to choose countof, instead.
The reason many languages prefer `length` to `count`, I think, is that the former is clearly a noun and the latter could be a verb. `length` feels like a simple property of a container whereas `count` could be an algorithm.<p>`countof` removes the verb possibility - but that means that a preference for `countof` over `lengthof` isn't necessarily a preference for `count` over `length`.
As @SkiFire correctly observes[^1], off-by-1 problems are more fundamental than 0-based or 1-based indices, but the latter still vary enough that some kind of discrimination is needed.<p>For many years (decades?) now, I've been using "index" for 0-based and "number" for 1-based, as in "column index" for a C/Python style [ix] vs. "column number" for a shell/awk/etc. style $1 $2. Not sure this is the best terminology, but it <i>is</i> nice to have something consistent. E.g., "offset" for 0-based indices suggests being "off" from a base, and even the letter "o" in some cases becomes "the zero of some range". So, "offset" might be better than "index" for 0-based.<p>[^1]: <a href="https://news.ycombinator.com/item?id=47100056">https://news.ycombinator.com/item?id=47100056</a>
Using the same length for related variable names is definitely a good thing.<p>Just lining things up neatly helps spot bugs.<p>It’s the one thing I don’t like about strict formatters: I can no longer use spaces to line things up.
I've never yet seen a linter option for assignment alignment, but would definitely use it if it were available
I know prettier can isolate a code section from changes by adding comments. And I think others can too.
Is there any reason to not just switch to 1-based indexing if we could? Seems like 0-based indexing really exacerbates off-by-one errors without much benefit
I'm not sure what that has to do with the article, but anyway: <a href="https://www.cs.utexas.edu/~EWD/transcriptions/EWD08xx/EWD831.html" rel="nofollow">https://www.cs.utexas.edu/~EWD/transcriptions/EWD08xx/EWD831...</a><p>That said, I'm not sure how 1-based indexing will solve off-by-1 errors. They naturally come from the fencepost problem, i.e. the fact that sometimes we use indexes to indicate elements and sometimes to indicate boundaries between them. Mixing between them in our reasoning ultimately results in off-by-1 issues.
Fundamentally, CPUs use 0-based addresses. That's unavoidable.<p>We can't choose to switch to 1-based indexing - either we use 0-based everywhere, or a mixture of 0-based and 1-based. Given the prevalence of off-by-one errors, I think the most important thing is to be consistent.
This is a matter of opinion.<p>My opinion is that 1-based indexing really exacerbates off-by-one errors, besides requiring a more complex and bug-prone implementation in compilers. With 1-based indexing, the compiler must create and use, transparently to the programmer, pointers that do not point at the intended object but at an invalid location just before it, which must never be accessed through the pointer. This is why 1-based indexing was easier in languages without pointers, like the original FORTRAN, but would have been harder in languages that allow pointers, like C, the difficulty being in avoiding exposing the internal representation of pointers to the programmer.<p>Off-by-one errors are caused by mixing conventions for expressing indices and ranges.<p>If you always use a consistent convention, e.g. 0-based indexing together with half-open intervals, where the count of elements equals the difference between the interval bounds, there is no chance of ever making an off-by-one error.
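A small sketch of that consistent convention, 0-based indices with half-open intervals [begin, end), where the element count is simply end - begin (the data and range here are just illustrative):

```rust
fn main() {
    let data = [10, 20, 30, 40, 50];
    let (begin, end) = (1, 4); // half-open [1, 4): elements at 1, 2, 3

    // count == end - begin, with no +1/-1 correction anywhere.
    let count = end - begin;
    assert_eq!(count, 3);
    assert_eq!(data[begin..end], [20, 30, 40]);

    // Splitting at any mid point loses nothing and duplicates nothing:
    // adjacent half-open ranges share a boundary without overlap or gap.
    let mid = 2;
    assert_eq!(data[begin..mid].len() + data[mid..end].len(), count);
}
```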
I would bet that in the opposite circumstance you'd say the same thing:<p>"Is there any reason to not just switch to 0-based indexing if we could? Seems like 1-based indexing really exacerbates off-by-one errors without much benefit"<p>The problem is that humans make off-by-one errors and not that we're using the wrong indexing system.
No indexing system is perfect, but one can be better than another. Being able to do array[array.length()] to get the last item is more concise and less error prone than having to add -1 every time.<p>Programming languages are filled with tiny design choices that don’t completely prevent mistakes (that would be impossible) but do make them less likely.
Having to use something like array[length] to get the last element demonstrates a defect of that programming language.<p>There are better programming languages, where you do not need to do what you say.<p>Some languages, like Ada, have special array attributes for accessing the first and the last elements.<p>Other languages, like Icon, allow the use of both non-negative indices and of negative indices, where non-negative indices access the array from its first element towards its last element, while negative indices access the array from its last element towards its first element.<p>I consider that your solution, i.e. using array[length] instead of array[length-1], is much worse. While it scores a point for simplifying this particular expression, it loses points by making other expressions more complex.<p>There are a lot of better programming languages than the few that due to historical accidents happen to be popular today.<p>It is sad that the designers of most of the languages that attempt today to replace C and C++ have not done due diligence by studying the history of programming languages before designing a new programming language. Had they done that, they could have avoided repeating the same mistakes of the languages with which they want to compete.
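For illustration, Rust's slice API shows both styles side by side: explicit index arithmetic versus dedicated first/last accessors in the spirit of Ada's 'First and 'Last attributes (a small example, not an endorsement of either indexing base):

```rust
fn main() {
    let arr = [1, 2, 3];

    // Explicit index arithmetic, where an off-by-one can hide:
    assert_eq!(arr[arr.len() - 1], 3);

    // Dedicated accessors, no arithmetic at all:
    assert_eq!(arr.first(), Some(&1));
    assert_eq!(arr.last(), Some(&3));
}
```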
You say "seems like", can you argue/show/prove this?
I hoped to learn some more excel lookup tactics, alas
With modern IDEs and AI there is no need to save letters in identifiers (unless they get too long). It should be "sizeInBytes" instead of "size". It should be "byteOffset" or "elementOffset" instead of "offset".
When correctness is important I much prefer having strong types for most primitives, such that the name is focused on describing semantics of the use, and the type on how it is represented:<p><pre><code> struct FileNode {
parent: NodeIndex<FileNode>,
content_header_offset: ByteOffset,
file_size: ByteCount,
}
</code></pre>
Where `parent` can then only be used to index a container of `FileNode` values via the `std::ops::Index` trait.<p>Strong typing of primitives also help prevent bugs like mixing up parameter ordering etc.
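A minimal, self-contained sketch of that pattern (the `Arena` container and the `PhantomData`-tagged `NodeIndex` here are one possible way to wire it up, not the only one):

```rust
use std::marker::PhantomData;
use std::ops::Index;

// An index that is only valid for containers of T.
struct NodeIndex<T>(usize, PhantomData<T>);

struct FileNode { name: &'static str }

// An arena of T values that can only be indexed by NodeIndex<T>.
struct Arena<T>(Vec<T>);

impl<T> Arena<T> {
    fn push(&mut self, value: T) -> NodeIndex<T> {
        self.0.push(value);
        NodeIndex(self.0.len() - 1, PhantomData)
    }
}

impl<T> Index<NodeIndex<T>> for Arena<T> {
    type Output = T;
    fn index(&self, i: NodeIndex<T>) -> &T { &self.0[i.0] }
}

fn main() {
    let mut arena = Arena(Vec::new());
    let root = arena.push(FileNode { name: "/" });
    assert_eq!(arena[root].name, "/");
    // arena[3usize] would not compile: a bare usize is not a NodeIndex<FileNode>.
}
```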
Long names become burdensome to read when they are used frequently in the same context
When the same name is used a thousand times in a codebase, shorter names start to make sense. See aviation manuals or business documentation, how abbreviation-dense they are.
Isn't that more tokens though?
Only until they develop some kind of pre-AI minifier and sourcemap tool.
Sure, you get an extra word or two worth of tokens, but you save a lot more compute and time figuring out what exactly this offset is.
Not significantly, it's one word.
The 'same length for complementary names' thing is great.
I can't read the starts of any lines, the entire page is offset about 100 pixels to the left. :) Best viewed in Lynx?
Is there any other example of "length" meaning "byte length", or is it just Rust being confusing? I've never seen this elsewhere.<p>Offset is ordinarily just a difference of two indices. In a <i>container</i> I don't recall seeing it implicitly refer to byte offset.
In general in Rust, “length” refers to “count”. If you view strings as being sequences of Unicode scalar values, then it might seem odd that `str::len` counts bytes, but if you view strings as being a subset of byte slices it makes perfect sense that it gives the number of UTF-8 code units (and it is analogous to, say, how Javascript uses `.length` to return the number of UTF-16 code units). So I think it depends on perspective.
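Concretely (a small illustration using the standard `str` API):

```rust
fn main() {
    let s = "héllo"; // 'é' is two bytes in UTF-8

    assert_eq!(s.len(), 6);           // UTF-8 code units (bytes)
    assert_eq!(s.chars().count(), 5); // Unicode scalar values
}
```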
It's the usual convention for systems programming languages and has been for decades, e.g. strlen() and std::string.length(). Byte length is also just more useful in many cases.
A length could refer to lots of different units - elements, pages, sectors, blocks, N-aligned bytes, kbytes, characters, etc.<p>Always good to qualify your identifiers with units IMO (or types that reflect units).
The invariant of index < count, of course, only works when using Dijkstra's half-open indexing convention, which seems to have a few very vocal detractors.
See <a href="https://www.cs.utexas.edu/~EWD/transcriptions/EWD08xx/EWD831.html" rel="nofollow">https://www.cs.utexas.edu/~EWD/transcriptions/EWD08xx/EWD831...</a> for Dijkstra's thoughts on indexing.
Fortunately only a few. Dijkstra's is obviously the most reasonable system.
Or learn an array language and never worry about indexing or naming ;-)<p>Everything else looks disgustingly verbose once you get used to them.