Fucking nerd. (hexbear.net)
[-] buckykat@hexbear.net 13 points 1 week ago

You can go back but it's very difficult. Only the biggest nerds can do it, with great dedication and time. That process is called reverse engineering.

For a very simple example, suppose I wrote some code to add how many apples Jack and Jill have together. The source code might look like

jackApples = 3

jillApples = 4

numApples = jackApples + jillApples

But the computer doesn't care about Jack, or Jill, or apples for that matter. It only cares about numbers. So when the compiler puts it into ones and zeros all those useful names get dropped. And when I decompile the binary (what we call those ones and zeros) what I get back might look more like

var1 = 3

var2 = 4

var3 = var1 + var2

And if I want to change how many apples Jill has it's a whole process of trial and error to figure out which variable is Jill's number of apples.
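As a rough sketch of why the names don't matter to the machine, here are both versions written as C++ functions. The function names `withNames` and `withoutNames` are made up for this illustration, and `constexpr` is only there so the compiler itself can check that the two agree (needs C++14 or later).

```cpp
#include <cassert>

// The version a human wrote, with meaningful names.
constexpr int withNames() {
  int jackApples = 3;
  int jillApples = 4;
  int numApples = jackApples + jillApples;
  return numApples;
}

// The sort of thing a decompiler might hand back: same numbers, no meaning.
constexpr int withoutNames() {
  int var1 = 3;
  int var2 = 4;
  int var3 = var1 + var2;
  return var3;
}

// Checked at compile time: the names make no difference to the result.
static_assert(withNames() == withoutNames(), "names do not change the result");
```

Both functions are the same program as far as the computer is concerned; only the reader loses out.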

Now expand that to thousands or millions of lines of code and you begin to see why nerds want source code instead of binaries.

[-] addie@feddit.uk 4 points 1 week ago

The compiler will see that var3 is just two numbers added together and replace it with 7, which saves having to do an addition every time you run through that code, and is therefore faster. var1 and var2 may be removed from the output as well; shorter code runs faster since you can fit more in the cache. In fact, since var3 is just a number, you can replace every place that it's used with a 7 as well; if you have some functions:

// be careful!  if the number of apples is less than six then the UI will not line up properly
auto getTheNumberOfApples() -> int {
  auto jackApples = 3;
  auto jillApples = 4;
  return jackApples + jillApples;
}

auto appleWeight() -> float {
  return 0.2 * getTheNumberOfApples();
}

... then the compiler will look at all that, delete the lot, and just use 1.4f wherever the appleWeight() function was called. Comment is gone, the decision making is gone, it's impossible to go backwards any more.
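If you want to see that the compiler really can work all of this out before the program ever runs, one way is to mark the functions `constexpr` and ask for the answer at compile time. This is only a sketch: the `constexpr` keywords, the `0.2f` literal and the `kAppleWeight` name are my additions, and an ordinary optimising build is allowed to do the same folding without them, but isn't required to.

```cpp
#include <cassert>

constexpr auto getTheNumberOfApples() -> int {
  auto jackApples = 3;
  auto jillApples = 4;
  return jackApples + jillApples;
}

constexpr auto appleWeight() -> float {
  return 0.2f * getTheNumberOfApples();
}

// Evaluated by the compiler: if this builds, the addition already happened.
static_assert(getTheNumberOfApples() == 7, "folded at compile time");

// A compile-time constant; no call and no multiplication is left at run time.
constexpr float kAppleWeight = appleWeight();
```

Nothing about Jack, Jill or the warning comment needs to survive for `kAppleWeight` to exist in the output.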

[-] Halosheep@lemm.ee 1 points 1 week ago

I'm not a professional programmer, just a hobbyist, but if you also had a set function that changes jackApples to an input integer, what happens at compilation?

[-] addie@feddit.uk 2 points 6 days ago

That disables a whole pile of the potential optimisations, of course. You could define jackApples as a "static variable" (as opposed to making it eg. a field in a class or struct):

namespace {
  auto jackApples = 3;
}

auto setJackApples(int newJackApples) -> void {
  jackApples = newJackApples;
}
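Here's a small sketch of that setter sitting next to the earlier getter, so you can see why the answer can't be baked in as 7 any more. The bodies are my guess at how the pieces would fit together, not anything from a real codebase.

```cpp
#include <cassert>

namespace {
  auto jackApples = 3;  // static storage: a real location in memory
}

auto setJackApples(int newJackApples) -> void {
  jackApples = newJackApples;
}

auto getTheNumberOfApples() -> int {
  auto jillApples = 4;              // still a constant the compiler can fold
  return jackApples + jillApples;   // jackApples has to be read at run time
}
```

Calling `getTheNumberOfApples()` gives 7 at first, and 14 after `setJackApples(10)`, so the compiler has to keep the memory read around.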

The most obvious consequence of this is that jackApples now has an address in memory, which you could find out with &jackApples. Executable programs are arranged into a sequence of blocks when they're compiled, which have some historical names based on what they used to be for:

  • the header section, which identifies what kind of program it is, and where the other blocks start
  • the text section, which contains all of the executable code, and which might be made read-only by the OS.
  • the data section, which contains variables that have a known value at startup
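  • the bss section, which contains variables that we know will exist but don't have a starting value stored in the file. It takes up no space in the executable; the OS or the program's startup code zeroes it before your code runs, so in C and C++ these variables start at 0 rather than holding leftover values.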
  • after those sections, the heap starts. This is where we allocate anything that we don't know the size of at startup. Your program will ask the operating system to "move the end of the heap" if it needs some space to eg. load a picture from disk that your program will then use.
  • at the very end of memory, and counting down, the OS will allocate "the stack". This is where all of your variables that are local to each function are kept - it's the "working area"

Because it's statically allocated, jackApples will be in the data section; if you opened up the executable with a hex editor, you'd see a 3 there.
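As a rough sketch of where different kinds of variable usually end up, here's one of each. Exact placement is up to the compiler, linker and OS, so treat the comments as "typically", not "always". `jillApples`, `sumOnStack` and `makeBasketOnHeap` are names invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

namespace {
  int jackApples = 3;  // known non-zero start value: typically the data section
  int jillApples;      // no initialiser: typically bss, and C++ guarantees it starts at 0
}

auto sumOnStack() -> int {
  // A local variable: lives on the stack (or just in a register).
  int localTotal = jackApples + jillApples;
  return localTotal;
}

auto makeBasketOnHeap(std::size_t count) -> std::unique_ptr<int[]> {
  // Size only known at run time, so the storage comes from the heap.
  // std::make_unique<int[]> (C++14) value-initialises the elements to 0.
  return std::make_unique<int[]>(count);
}
```

On Linux you can usually check the first two with a tool like `nm` or `objdump` on the compiled object file, though what you see will vary with compiler and optimisation level.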

getTheNumberOfApples() will be optimised by the compiler to return the contents of the memory address plus 4. That still counts as a very simple and short function, and it's quite likely that the compiler would inline it and remove the initial function. The actual process of calling a function is to:

  • put any arguments for the function where it expects them, in registers or pushed onto the stack depending on the calling convention (which would be none, in this case)
  • if on x86 / x64, do a whole pile of stack alignment operations :-(
  • push the address of where we are in the program onto the stack, and set the address of "where we are in the program" to the address of the function (on x86, a single call instruction does both)
  • push some extra space on the stack for all the variables used by the function
  • run all the code in the function
  • put the result of the function into one of the CPU registers
  • pop all of our "working space" back off the stack again
  • pop the address of "where we came from" off of the stack, and make that the place that we'll continue running the program for

That takes a while, and worse - modern CPUs will try to "pipeline" all the instructions that they know are coming so that it all runs faster. Jumping to a function might break that pipeline, causing a "stall", which can slow things down a lot. Much better to inline short functions; once the body is inlined, the "number in memory address plus four" can often be simplified further wherever it's used, too.
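To make the inlining idea a bit more concrete, here's a hand-written sketch of what the compiler conceptually does. `appleWeightWithCall` and `appleWeightInlinedByHand` are hypothetical names; in real life you'd only write the first one and let the optimiser produce something like the second, if it decides that's worthwhile.

```cpp
#include <cassert>

namespace {
  auto jackApples = 3;
}

auto setJackApples(int newJackApples) -> void {
  jackApples = newJackApples;
}

auto getTheNumberOfApples() -> int {
  auto jillApples = 4;
  return jackApples + jillApples;
}

// What you write: the weight is computed via a function call.
auto appleWeightWithCall() -> float {
  return 0.2f * getTheNumberOfApples();
}

// Roughly what an optimiser may turn it into: no call, no stack juggling,
// just "number in memory address plus four" pasted in where it's used.
auto appleWeightInlinedByHand() -> float {
  return 0.2f * (jackApples + 4);
}
```

Both versions return the same weight before and after a call to `setJackApples()`; the second one just skips all the steps in the list above.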

this post was submitted on 08 Apr 2025
260 points (99.6% liked)