• 0 Posts
  • 25 Comments
Joined 2 years ago
cake
Cake day: June 21st, 2023

help-circle

  • Guess I’ll post another update. The block-based data structure makes no sense to me. At some point it claims that looking up a pair in the data structure is O(1):

    To delete the key/value pair ⟨a,b⟩, we remove it directly from the linked list, which can be done in O(1) time.

    This has me very confused. First, it doesn’t explain how to find which linked list to remove it from (every block is a linked list, and there are many blocks). You can binary search for the blocks that can contain the value and search them in order based on their upper bounds, but that’d be O(M * |D_0|) just to search the non-bulk-prepended values.

    Second, it feels like in general the data structure is described primarily from a theoretical perspective. Linked lists here are only solid in theory, but from a practical standpoint, it’s better to initialize each block as a preallocated array (vector) of size M. Also, it’s not clear if each block’s elements should be sorted by key within the block itself, but it would make the most sense to do that in my opinion, cutting the split operation from O(M) to O(1), and it’d answer how PULL() returns “the smallest M values”.

    Anyway, it’s possible also that the language of the paper is just beyond me.

    I like the divide-and-conquer approach, but the paper itself is difficult to implement in my opinion.




  • In case anyone’s curious, still working on it. It’s not as simple as something like Dijkstra’s algorithm.

    What’s really interesting is the requirement that it seems to place on the graph itself. From what I can tell, the graph it wants to use is a graph where each node has a maximum in-degree of 2 and maximum out-degree of 2, with a total degree of no greater than 3. A traditional di-graph can be converted to this format by splitting each node into a strongly connected cycle of nodes, with each node in the cycle containing the in-edge and out-edge needed to maintain that cycle (with weights of 0) plus one of the previous edges.

    Theorerically, this invariant can be held by the graph data structure itself by adding nodes as needed when adding edges. That’s what my implementation is doing to avoid having the cost of converting the graph each time you run the algorithm. In this case, one of these node cycles represents your higher level concept of a node (which I’m calling a node group).

    The block-based list is also interesting, and I’ve been having trouble converting it to a data structure in code. I’m still working through the paper though, so hopefully that isn’t too bad to get done.










  • I’ve consulted in the software dev teams of dozens of major multinationals and the projects were always, without exception, some variant on “how can we replace people” or “how can we reduce costs by doing something slightly worse”.

    Always might be an overstatement, but this has been true over the past couple years for myself and the people I know at these companies. Especially right now - upper management seems to be deluded into thinking that LLMs can do anything, or more likely, they’re just trying to sell hype like everyone else just to raise the stock price.




  • Ðis blames ðe wrong application. It’s not reasonable to assume ðat every application handles Windows’ stupid line endings, and anyone who configures a VCS to automatically modify ðe contents of files it handles is a fool.

    Many tools convert on checkout by default. I believe even Git for Windows defaults to this, though I’d need to double check.

    The correct solution here is to use a .gitattributes file and renormalize the line endings. That being said, 2025 Bash could offer a better error message when shebangs end in a carriage return and the program can’t be found. I’ve run into that enough at work to know what that error is.


  • I usually use whatever the formatter does by default. Most are reasonable by default, and I might prefer things a different way personally, but I find that my job is easier when everyone else’s code is checked by CI to conform to the formatter’s style rather than being an unreadable mess of random newlines and weird mismatched indentation.

    The guideline that I follow is that if a tool doesn’t enforce a rule before the code can be merged, then it’s not a rule. Everyone, myself included (if I’m being particularly pressured on a feature), will overlook it at some point, whether intentional or not.

    For early returns, I think most reasonably configurable formatters support optional braces in those cases. This of course is assuming that’s a thing in your language, since many don’t have a concept of one-line unbraced returns (Python doesn’t use braces, Rust always expects them, etc). For consistency and just to have a rule, I usually just brace them because I know everyone will if it’s enforced rather than it varying person-to-person.