I have a strong memory that AFL (american fuzzy lop, the binary fuzzer) had a feature similar to what this is doing, based on the highlighted portions and screenshots. It wasn't the AFL status screen; it was (or may have been) a third-party app, and it would color-code parts of the input files based on the results of afl's processing.<p>For example, there was a color key that explained that, say, purple meant "magic bytes", like "0x4a46494600" for JFIF0, and if any part of the input file caused errors it meant it was probably a checksum and needed to be "fixed" so afl could properly fuzz all the functions in the source code.<p>I'm not super into fuzzing or that realm anymore, so I doubt I could describe it better than I did here. I clicked through to see if someone had leveraged the AFL stuff for use in another tool, which would be cool.<p>edit: I think it was afl-analyze - I had a go at the source code for aflplusplus:<p>> A nifty utility that grabs an input file and takes a stab at explaining its structure by observing how changes to it affect the execution path.<p>> Another tool in AFL++ is the afl-analyze tool. It takes an input file, attempts to sequentially flip bytes and observes the behavior of the tested program. It then color-codes the input based on which sections appear to be critical and which are not; while not bulletproof, it can often offer quick insights into complex file formats.
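For anyone curious what the "flip bytes and watch the behavior" idea looks like, here is a rough sketch in Python. It is not afl-analyze's actual code: the real tool observes the execution path of an instrumented binary and tries several kinds of mutation, while this only does a single XOR per byte. `toy_target` and `classify_bytes` are made-up names, and the target is a hypothetical stand-in that returns a path label instead of real coverage data.

```python
from typing import Callable, Hashable, List


def classify_bytes(data: bytes, run: Callable[[bytes], Hashable]) -> List[str]:
    """Label each byte 'critical' if flipping it changes the observed path."""
    baseline = run(data)
    labels = []
    for i in range(len(data)):
        mutated = bytearray(data)
        mutated[i] ^= 0xFF  # flip every bit of this one byte
        changed = run(bytes(mutated)) != baseline
        labels.append("critical" if changed else "no-op")
    return labels


def toy_target(buf: bytes) -> str:
    """Hypothetical stand-in for an instrumented program: returns a path label."""
    if buf[:4] != b"JFIF":
        return "reject"
    return "parse"


sample = b"JFIF\x00payload"
labels = classify_bytes(sample, toy_target)
for offset, (byte, label) in enumerate(zip(sample, labels)):
    print(f"{offset:02d}  0x{byte:02x}  {label}")
```

With this toy target only the four magic bytes come out as "critical"; a real tool would then color the hex dump using those labels.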
Other tools for parsing and analyzing binary data are listed here: <a href="https://github.com/dloss/binary-parsing">https://github.com/dloss/binary-parsing</a>
Great write up!<p>I looked at ImHex a good while back and I think I had some runtime issues or maybe even compilation issues and didn't dig deeper. Even though the definition language piqued my curiosity.<p>These days I tend to just use xxd, bless, ghex, or seldom wxHexEditor, depending on what I need. But ImHex looks really powerful, like it could replace all the GUI ones.
I'm looking forward to giving it another go tomorrow.<p>Though these days I spend most of my time in wireshark, which is kind of a hex viewer in a way.<p>How does it manage with huge files? Does it try to load the entire thing into memory?
I remember wxHexEditor being good for that, and even being able to open block devices directly and process memory IIRC. Might be getting mixed up with HxD.<p>The decompression and combining compressed with decompressed sections looks very cool. Is the decompression in memory or written to disk?<p>// TagRecord Tags[while(!std::mem::eof())];<p>This loop based length stuff is very cool too, though for large files I'd imagine it could be slow as it will need to iterate through all records to determine the offset for records at the end of the file.<p>To be fair, wireshark / pcap files have this problem too.
> though for large files I'd imagine it could be slow as it will need to iterate through all records to determine the offset for records at the end of the file.<p>Yeah, it's not doing lazy evaluation, so you need to watch out. It's probably not the solution you want for (for example) looking at 500GB disk images.
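To illustrate why the offset of a late record can't be computed directly, here is a small Python sketch. The format is hypothetical (a 2-byte big-endian length followed by the payload) and has nothing to do with how ImHex evaluates patterns internally; `record_offsets` is an illustrative name. The point is only that each record's start depends on the lengths of all records before it, so finding record N means walking records 0..N-1.

```python
import struct
from typing import Iterator, Tuple


def record_offsets(buf: bytes) -> Iterator[Tuple[int, int]]:
    """Yield (offset, payload_length) for each length-prefixed record."""
    pos = 0
    while pos < len(buf):
        # Hypothetical header: 2-byte big-endian payload length.
        (length,) = struct.unpack_from(">H", buf, pos)
        yield pos, length
        pos += 2 + length  # the next record starts after header + payload


payloads = [b"a", b"bcd", b"", b"efgh"]
blob = b"".join(struct.pack(">H", len(p)) + p for p in payloads)
offsets = [off for off, _ in record_offsets(blob)]
print(offsets)
```

A tool that wants fast random access into such a file has to build an index like `offsets` up front or cache it as it goes, which is where the cost on very large files comes from.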
There's an ImHex WebAssembly build accessible online at: <a href="https://web.imhex.werwolv.net/" rel="nofollow">https://web.imhex.werwolv.net/</a>.
Kind of related: a tool that lets you hand-write ASCII-art-annotated hex dump files and can also generate the original binary file from such a text file: <a href="https://github.com/netspooky/xx/blob/main/examples/elf.xx">https://github.com/netspooky/xx/blob/main/examples/elf.xx</a>
Wow, I've never thought of it, but "syntax"-highlighting for binary files would be awesome.. e.g. "these bytes indicate the beginning of the next frame" (when talking about MP3/video files), maybe with mouseover support where it says e.g. "this value at this location indicates it's a $FOO variant of the file".<p>Anyone know of such a tool?
Kaitai Struct has an online demo which basically does this; <a href="https://ide.kaitai.io/" rel="nofollow">https://ide.kaitai.io/</a>
I deal with a lot of cryptographic documents (e.g. public keys) and <a href="https://lapo.it/asn1js/" rel="nofollow">https://lapo.it/asn1js/</a> is a godsend for making sense of them. You just paste in hex or pem, and it shows the full deconstructed format along with two-way 'syntax highlighting' where if you hover over part of the deconstruction it highlights the equivalent part of the binary data. Hit the 'load' button for a representative example.
010 Editor has something like this. Okteta too. They both use DSLs to represent formats.
I wasn’t aware that ImHex had this feature - perhaps I’ll try it!<p>I’ve been singing the praises of 010 Editor for years specifically because of its template and scripting features, the former of which is nearly identical to this DSL.
Looks slightly more expressive than Kaitai's binary format DSL.