Why can't code be uncompiled? (lemmy.world)

submitted 2 years ago by Squizzy@lemmy.world to c/nostupidquestions@lemmy.world

I see a lot about source code being leaked, and I'm wondering how it is that you could make something like an exact replica of Super Mario Bros without the source code, or why you can't take the finished product and run it back through the compilation software?

[-] Dark_Arc@social.packetloss.gg 67 points 2 years ago

I actually work on a C++ compiler... I think I should weigh in. The general consensus here that things are lossy is correct but perhaps non-obvious if you're not familiar with the domain.

When you compile a program, you're taking the source, turning it into a graph that represents every aspect of the program, and then generating some kind of IR (intermediate representation) that then gets turned into machine code.

Right off the bat, you lose things like code comments because the machine doesn't care about comments.

Then you lose local variable and function parameter names because the machine doesn't care about those things.

Then you lose your class structure ... because the machine really just cares about the total size of the thing it's passing around. You can recover some of this information by looking at the functions, but it's not always going to be straightforward because not every constructor initializes everything, and things like unions add further complexity ... and not every memory allocation uses a constructor. You won't get any names of any data members/fields though because ... again the machine doesn't care.
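
To make the "size and offsets, not names" point concrete, here is a minimal sketch. The two struct types and their field names are hypothetical, made up purely for illustration. They share member types, so they end up with the same size and offsets, and the raw bytes of one can be read back as the other.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Two hypothetical types: different names, same member types in the same order.
struct PlayerPosition { int x; int y; float speed; };
struct InvoiceLine    { int quantity; int sku; float unit_price; };

// What the generated code works with is essentially size and member offsets.
static_assert(sizeof(PlayerPosition) == sizeof(InvoiceLine), "same total size");
static_assert(offsetof(PlayerPosition, speed) == offsetof(InvoiceLine, unit_price),
              "same offset for the third member");

// Copies the raw bytes of a PlayerPosition into an InvoiceLine and checks
// that the values come back unchanged under the other type's field names.
bool same_bytes_roundtrip() {
    PlayerPosition p{3, 4, 1.5f};
    InvoiceLine line{};
    std::memcpy(&line, &p, sizeof p);
    return line.quantity == 3 && line.sku == 4 && line.unit_price == 1.5f;
}

// Small self-check; call it from main() if you want to run the sketch.
void layout_demo() {
    assert(same_bytes_roundtrip());
}
```

Someone looking only at the machine code for either type would see "two 4-byte integers followed by a 4-byte float" and would have to guess the rest.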

So what you're left with is basically the mangled names of functions and what you can derive from how instructions access memory.

The mangled names normally tell you a lot: the namespace, the class (if any), and the argument count and types. Of course that's not guaranteed either; it's just how we come up with unique, stable names for the various things in your program. It could function with a bunch of UUIDs if you set up a table on the compiler's side to associate everything.
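
As a rough illustration of how much a mangled name carries, here is a sketch that builds names in the style of the Itanium C++ ABI (the scheme GCC and Clang use on most platforms; MSVC uses a different one). `mangle_sketch` is a hypothetical helper, not a compiler API, and it only handles plain nested names with a few built-in parameter types. Real mangling also covers templates, qualifiers, substitutions, and much more.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: builds an Itanium-style mangled name for a function
// nested in namespaces/classes, taking only a few built-in parameter types.
std::string mangle_sketch(const std::vector<std::string>& scopes,
                          const std::string& function,
                          const std::vector<std::string>& params) {
    // Each identifier is written as <length><identifier>, e.g. "5Klass".
    auto part = [](const std::string& s) { return std::to_string(s.size()) + s; };

    std::string out = "_Z";
    if (scopes.empty()) {
        out += part(function);
    } else {
        out += "N";                           // start of a nested name
        for (const auto& s : scopes) out += part(s);
        out += part(function);
        out += "E";                           // end of the nested name
    }

    if (params.empty()) return out + "v";     // empty parameter list is encoded as void
    for (const auto& p : params) {
        if (p == "int") out += "i";
        else if (p == "double") out += "d";
        else if (p == "char") out += "c";
        else if (p == "bool") out += "b";
        else out += "?";                      // anything else is outside this sketch
    }
    return out;
}

// Small self-check; call it from main() if you want to run the sketch.
void mangle_demo() {
    // ns::Klass::method(int, double)
    assert(mangle_sketch({"ns", "Klass"}, "method", {"int", "double"})
           == "_ZN2ns5Klass6methodEid");
    // foo()
    assert(mangle_sketch({}, "foo", {}) == "_Z3foov");
}
```

Reading `_ZN2ns5Klass6methodEid` back gives you the namespace, the class, the function name, and the two parameter types, but nothing about parameter names, the return type of an ordinary function, or what the function body does.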

But wait! There's more! The optimizer can do some really wild things in the name of speed... Including combining functions. Those constructors? Gone, now they're just some more operations in the function bodies. That function you wrote to help improve readability of your code? Gone. That function you wrote to deduplicate code? Gone. That eloquent recursive logic you wrote? Gone, now it's the moral equivalent of a giant mess of goto statements. That template code that makes use of dozens of instantiated functions? Those functions are gone now too; instead it's all the instantiated logic puked out into one giant function. That piece of logic computing a value? Well the compiler figured out it's always 27, so the logic to compute it? Gone.
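
A small sketch of the "it's always 27" case, with hypothetical function names. `constexpr` (C++14 or later for the loop) is used here only to prove that the value is computable at compile time. Whether an optimizer actually folds an ordinary, non-`constexpr` version of this depends on the compiler and the optimization flags, so treat it as an illustration rather than a guarantee.

```cpp
#include <cassert>

// Hypothetical "piece of logic computing a value".
constexpr int compute_bonus() {
    int total = 0;
    for (int i = 2; i <= 7; ++i) {
        total += i;                 // 2 + 3 + 4 + 5 + 6 + 7
    }
    return total;
}

// The compiler can evaluate the whole loop itself.
static_assert(compute_bonus() == 27, "the loop always produces 27");

// What the author wrote: a call to a readable helper function.
int bonus_as_written() { return compute_bonus(); }

// What an optimizing build may effectively reduce it to: no loop, no call.
int bonus_as_compiled() { return 27; }

// Small self-check; call it from main() if you want to run the sketch.
void folding_demo() {
    assert(bonus_as_written() == bonus_as_compiled());
}
```

If the binary only contains the equivalent of `bonus_as_compiled`, there is no way to tell that a loop and a helper function ever existed in the source.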

Now, not all of that happens every time, and not every one of those optimizations is always possible or worthwhile ... But you can see how incredibly difficult it is to reconstruct a program once it's been compiled and gone through optimization. Even if you do reconstruct it, there's a very low chance that it will look anything like what you started with.

[-] Treczoks@lemmy.world 13 points 2 years ago

Just wait until you see the crazy optimizers for embedded systems. They take the complete code of a system into consideration and, over a number of compile passes, reuse code snippets from the app, libraries, and OS layer to create one big tangled mess that is hard to follow even if you have the source code...

[-] noli@programming.dev 4 points 2 years ago

Isn't that still the same exact process as a normal compiler except in the case of embedded systems your OS is like a couple kilobytes large and just compiled along with the rest of your code?

As in, are those "crazy optimizations" not just standard compiler techniques, except applied to the entire OS+applications?

[-] morhp@lemmynsfw.com 4 points 2 years ago

The main difference is that when you compile a program for Windows, Linux etc., you have an operating system and kernel with their exposed functions/interfaces so even in a compiled program it's pretty easy to find the function calls for opening a file, moving a window, etc. (as long as the developer doesn't add specific steps hiding these calls). But in an embedded system, it's one large mess without any interfaces apart from those directly on the hardware level.
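
A rough sketch of that contrast, with hypothetical names throughout. The hosted function goes through a named library entry point (`std::fopen`), which typically shows up as an identifiable import in the binary. The bare-metal-style function is just a store to an address. `fake_uart_data` is a stand-in variable so the sketch runs on a normal machine; on real hardware it would be a fixed register address taken from the chip's documentation.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Hosted style: the work is done through a named, documented library/OS call.
bool hosted_write(const char* path) {
    std::FILE* f = std::fopen(path, "w");
    if (!f) return false;
    std::fputs("hello\n", f);
    return std::fclose(f) == 0;
}

// Stand-in for a memory-mapped hardware register (assumption for this sketch).
volatile std::uint32_t fake_uart_data = 0;

// Bare-metal style: no named interface, only a write to some address.
void bare_metal_write(volatile std::uint32_t* reg, char c) {
    *reg = static_cast<std::uint32_t>(c);
}

// Small self-check; call it from main() if you want to run the sketch.
void register_demo() {
    bare_metal_write(&fake_uart_data, 'A');
    assert(fake_uart_data == 65u);   // 'A' is 65 in ASCII
}
```

In a disassembly, the first function points you at "open a file", while the second is an anonymous store whose meaning you only learn from the hardware manual.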

[-] Treczoks@lemmy.world 4 points 2 years ago

In a way, yes. But it really creates a mess when the linker starts sharing code with your own code, for which you have sources, and then jumps into the middle of system code for which you don't have sources. And it's a pain in the whatever to debug.

[-] noli@programming.dev 2 points 2 years ago

Don't you have the code in most cases? Like with e.g. FreeRTOS? That's fully open source.

[-] Treczoks@lemmy.world 2 points 2 years ago

For a number of reasons people use commercial OSes in this world, too.

[-] noli@programming.dev 1 points 2 years ago

Does commercial mean closed source in this context though? It seems like a waste of resources not to provide the source code for an RTOS.

Considering how small they tend to be, plus their power/computational constraints, I can't imagine they have very effective DRM in place, so it shouldn't take that much to reverse engineer.

May as well just provide the source under some very restrictive license.

[-] Treczoks@lemmy.world 1 points 2 years ago

Yes, it is closed source, but you can buy a "source license". Which is painfully expensive.

this post was submitted on 03 Jan 2024

104 points (95.6% liked)
