submitted 2 months ago by Yuritopiaposadism@hexbear.net to c/games@hexbear.net

[-] carpoftruth@hexbear.net 15 points 2 months ago

Hm ok. So one writes source code in a coding language, it gets turned into 1s and 0s. Why can't you go back? Source code gets compiled into a specific order of 1s and 0s, but the same set of 1s and 0s could be made from different types of source code?

[-] Le_Wokisme@hexbear.net 21 points 2 months ago

it's pretty hard to un-bake a cake

[-] HelluvaBottomCarter@hexbear.net 18 points 2 months ago

It's like trying to figure out the exact tools used to build a house by looking at the finished house. You can figure out some tools (a hammer, a paintbrush, etc.) but it's hard to know exactly. The parts of a program depend on each other so heavily that guessing isn't a good solution.

[-] bluesheep@lemm.ee 17 points 2 months ago

Like others said, you sort of can. But I also want to add that things like function names, or comments explaining how a function works, are not needed by your computer when running the program, and thus they get lost after compiling. After running a program designed to reverse engineer a compiled program, you'll be able to see a very dumbed down version: no meaningful function or variable names, and no comments explaining the code. You have to figure those out all by yourself.

On top of that, some companies/programmers make parts of the program difficult to read on purpose (obfuscation), so you have even more guesswork to do. Put it all together and you've got a giant task ahead of you reverse engineering even small games.

On a side note, the original source code can also just be interesting or funny to read. Valve's source code comments come to mind.

[-] oscardejarjayes@hexbear.net 17 points 2 months ago

> Why can't you go back

You sort of can. There are decompilers, like the one in Ghidra, that can help with this, but it usually takes a lot of manual effort to turn the output into something readable.

> the same set of 1s and 0s could be made from different types of source code

Yeah, basically. Companies will also take extra steps to make it so people can't get source code from software, since it's their proprietary IP and whatever.
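
To make the "different source, same 1s and 0s" point concrete, here's a minimal C++ sketch (the function names are made up for illustration). The two functions are written differently but compute the same thing, and an optimising compiler may well emit the same machine code for both. If it does, nothing in the binary says which one the author actually wrote.

```cpp
#include <cassert>

// Version 1: a for loop counting upwards.
int sumUpTo(int n) {
    int total = 0;
    for (int i = 1; i <= n; ++i) {
        total += i;
    }
    return total;
}

// Version 2: a while loop counting downwards.
int sumDownFrom(int n) {
    int total = 0;
    while (n > 0) {
        total += n;
        --n;
    }
    return total;
}

// Sanity check: both versions agree on a few inputs.
void checkSameBehaviour() {
    assert(sumUpTo(0) == sumDownFrom(0));
    assert(sumUpTo(10) == sumDownFrom(10));
}
```

Calling checkSameBehaviour() from main only shows that the behaviour matches; to see whether the machine code matches too, you'd have to compare the compiler's assembly output yourself.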

[-] buckykat@hexbear.net 13 points 2 months ago

You can go back but it's very difficult. Only the biggest nerds can do it, with great dedication and time. That process is called reverse engineering.

For a very simple example, suppose I wrote some code to add how many apples Jack and Jill have together. The source code might look like

jackApples = 3
jillApples = 4
numApples = jackApples + jillApples

But the computer doesn't care about Jack, or Jill, or apples for that matter. It only cares about numbers. So when the compiler puts it into ones and zeros all those useful names get dropped. And when I decompile the binary (what we call those ones and zeros) what I get back might look more like

var1 = 3
var2 = 4
var3 = var1 + var2

And if I want to change how many apples Jill has it's a whole process of trial and error to figure out which variable is Jill's number of apples.

Now expand that to thousands or millions of lines of code and you begin to see why nerds want source code instead of binaries.
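
Here's the same idea as a small C++ sketch. The second function is roughly the kind of thing a decompiler hands back. The names func_1 and var1 to var3 are invented here; real tools have their own naming schemes. Both functions behave identically, but only the first one tells you what it's for.

```cpp
#include <cassert>

// What the author wrote: the names carry the meaning.
int countApples() {
    int jackApples = 3;
    int jillApples = 4;
    int numApples = jackApples + jillApples;
    return numApples;
}

// Roughly what comes back after decompiling: same logic, names gone.
// (func_1 and var1..var3 are hypothetical placeholders.)
int func_1() {
    int var1 = 3;
    int var2 = 4;
    int var3 = var1 + var2;
    return var3;
}

// The two are interchangeable as far as the computer is concerned.
void checkEquivalent() {
    assert(countApples() == func_1());
}
```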

[-] addie@feddit.uk 4 points 2 months ago

With optimisation turned on, the compiler will see that var3 is just two known numbers added together and replace it with 7 (this is called constant folding), which saves having to do an addition every time you run through that code, and is therefore faster. var1 and var2 may be removed from the output as well; shorter code tends to run faster since more of it fits in the cache. In fact, since var3 is just a number, the compiler can replace every place that it's used with a 7 as well. If you have some functions:

// be careful!  if the number of apples is less than six then the UI will not line up properly
auto getTheNumberOfApples() -> int {
  auto jackApples = 3;
  auto jillApples = 4;
  return jackApples + jillApples;
}

auto appleWeight() -> float {
  return 0.2 * getTheNumberOfApples();
}

... then an optimising compiler will likely look at all that, delete the lot, and just use 1.4f wherever the appleWeight() function was called. The comment is gone, the decision making is gone, and there's nothing left in the binary to reconstruct the original from.
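
One way to check this yourself, offered as a sketch rather than a claim about what any particular compiler emits: mark the functions constexpr, and the language lets you verify at compile time that the whole thing collapses to a constant. The constexpr keyword, the static_assert, and the checkAtRunTime helper are additions for illustration; the functions above don't have them. This needs C++14 or later.

```cpp
#include <cassert>

// Same functions as above, marked constexpr (an addition for illustration)
// so the result can be checked while compiling.
constexpr auto getTheNumberOfApples() -> int {
  auto jackApples = 3;
  auto jillApples = 4;
  return jackApples + jillApples;
}

constexpr auto appleWeight() -> float {
  return 0.2 * getTheNumberOfApples();
}

// Evaluated entirely by the compiler: if this builds, the addition
// was done at compile time and the answer is the constant 7.
static_assert(getTheNumberOfApples() == 7, "folded to a constant");

// Hypothetical helper: the run-time result matches the folded value.
void checkAtRunTime() {
  assert(getTheNumberOfApples() == 7);
}
```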

[-] Halosheep@lemm.ee 1 points 2 months ago

I'm not a professional programmer and just a hobbyist, but if you also had a set function that changes jackApples to an input integer, what happens at compilation?

[-] addie@feddit.uk 2 points 2 months ago

That disables a whole pile of the potential optimisations, of course. You could define jackApples as a "static variable" (as opposed to making it e.g. a field in a class or struct):

namespace {
  auto jackApples = 3;
}

auto setJackApples(int newJackApples) -> void {
  jackApples = newJackApples;
}

The most obvious consequence of this is that jackApples now has an address in memory, which you could find out with &jackApples. Executable programs are arranged into a sequence of blocks when they're compiled, which have some historical names based on what they used to be for:

the header section, which identifies what kind of program it is, and where the other blocks start
the text section, which contains all of the executable code, and which might be made read-only by the OS
the data section, which contains variables that have a known value at startup
the bss section, which contains variables that we know will exist but weren't given a starting value. It takes up no room in the file; mainstream OSes zero it out when the program loads, though on bare-metal systems it may contain unknown leftover values unless the startup code clears it.
after those sections, the heap starts. This is where we allocate anything that we don't know the size of at startup. Your program will ask the operating system to "move the end of the heap" if it needs some space to e.g. load a picture from disk that your program will then use.
at the very end of memory, and counting down, the OS will allocate "the stack". This is where all of your variables that are local to each function are kept; it's the "working area".
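
As a rough C++ sketch of where things typically end up (all the names here are invented, and the exact placement is up to the compiler, linker, and OS, so treat the comments as the usual case rather than a guarantee):

```cpp
#include <cassert>
#include <memory>

int startingApples = 3;  // known value at startup: usually the data section
int applesEaten;         // no starting value given: usually the bss section
                         // (C++ guarantees it starts out as zero)

int countRemaining() {   // the compiled instructions live in the text section
    int remaining = startingApples - applesEaten;  // local: on the stack
    return remaining;
}

std::unique_ptr<int[]> makeBasket(int size) {
    // The size is only known at run time, so this comes from the heap.
    return std::make_unique<int[]>(size);
}

// Hypothetical helper: confirms the startup values described above.
void checkStartupValues() {
    assert(startingApples == 3);
    assert(applesEaten == 0);
}
```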

Because it's statically allocated, jackApples will be in the data section; if you opened up the executable with a hex editor, you'd see a 3 there.

getTheNumberOfApples() will be optimised by the compiler to return the contents of that memory address plus 4. That still counts as a very simple and short function, and it's quite likely that the compiler would inline it and possibly drop the standalone function altogether. The actual process of calling a function is to:

push any arguments passed to the function onto the stack (which would be none, in this case; on x64 the first few usually go in registers instead)
if on x86 / x64, do a whole pile of stack alignment operations :-(
push the address of where we are in the program onto the stack
set the address of "where we are in the program" to the address of the function
push some extra space on the stack for all the variables used by the function
run all the code in the function
put the result of the function into one of the CPU registers
pop all of our "working space" back off the stack again
pop the address of "where we came from" off of the stack, and make that the place that we'll continue running the program for
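
The steps above can be acted out with a toy model. To be clear, this is a simulation with a std::vector standing in for the stack and plain ints standing in for addresses. It is not how you'd inspect a real call. ToyMachine, kFunctionAddress, and the other names are all made up.

```cpp
#include <cassert>
#include <vector>

// Toy model only: plain ints stand in for addresses and values.
struct ToyMachine {
    std::vector<int> stack;  // back of the vector is the top of the stack
    int programCounter = 0;  // "where we are in the program"
    int resultRegister = 0;  // register that receives the return value
};

// Hypothetical address of getTheNumberOfApples() in the toy program.
constexpr int kFunctionAddress = 100;

void callGetTheNumberOfApples(ToyMachine& m, int jackApplesInMemory) {
    // (no arguments to push in this case)
    m.stack.push_back(m.programCounter + 1);  // push the return address
    m.programCounter = kFunctionAddress;      // jump to the function
    m.stack.push_back(0);                     // working space: one local
    m.stack.back() = jackApplesInMemory + 4;  // run the function body
    m.resultRegister = m.stack.back();        // result goes in a register
    m.stack.pop_back();                       // pop the working space
    m.programCounter = m.stack.back();        // pop the return address...
    m.stack.pop_back();                       // ...and continue from there
}

// Hypothetical helper: one call, starting from "address" 10.
void checkToyCall() {
    ToyMachine m;
    m.programCounter = 10;
    callGetTheNumberOfApples(m, 3);
    assert(m.resultRegister == 7);
    assert(m.programCounter == 11);
    assert(m.stack.empty());
}
```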

That takes a while, and worse - modern CPUs will try to "pipeline" all the instructions that they know are coming so that it all runs faster. Jumping to a function might break that pipeline, causing a "stall", which slows things down enormously. Much better to inline short functions - the fact that the value is "number in memory address plus four" might be optimised away a little wherever it's used, too.
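
Here's a minimal sketch of what inlining amounts to, written out by hand. The compiler does this on the generated code, not on your source, and whether it happens at all depends on the compiler and its flags. appleWeightInlined and checkInliningKeepsResult are made-up names for illustration.

```cpp
#include <cassert>

namespace {
  auto jackApples = 3;  // statically allocated: has a fixed address
}

auto setJackApples(int newJackApples) -> void {
  jackApples = newJackApples;
}

// "the contents of the memory address plus 4"
auto getTheNumberOfApples() -> int {
  return jackApples + 4;
}

// As written: goes through a function call.
auto appleWeight() -> float {
  return 0.2f * getTheNumberOfApples();
}

// Roughly what an inlining compiler turns it into: no call at all.
auto appleWeightInlined() -> float {
  return 0.2f * (jackApples + 4);
}

// Hypothetical helper: inlining must not change the answer.
void checkInliningKeepsResult() {
  assert(appleWeight() == appleWeightInlined());
}
```

Because setJackApples() can change the value at run time, neither version can be folded down to a constant any more. The best the compiler can do is skip the call.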

[-] sodium_nitride@hexbear.net 11 points 2 months ago

To add on to what the others have said, the compiler will also optimise your code (which is why professional coders write in common patterns as much as possible, so the compiler can recognise them and optimise).

So a lot of the time, what comes out of the compiler isn't even structured like the program you wrote.

Also, machine-readable code (assembly or 1s and 0s) is different depending on the processor used. You could give me machine code made for a RISC-V processor and I could reconstruct a C program that could have produced it. But if I had the same program compiled for an x86 processor ...

this post was submitted on 08 Apr 2025

263 points (99.6% liked)
