How does pcode work?

Xabier on 1 Aug 2013
Hello,
We would like to protect our MATLAB algorithms and are thinking of using "pcode". To judge how well protected the code will be, I would like to know how this protection system works.
On the MathWorks website, this method is described in some places as encryption (which means a key is needed to decrypt the code) and in others as content-obscuring (which we understand as obfuscation, as in Java or .NET). So which is it really?
When we use the pcode function, no key or seed is requested, so I assume that the same key is used every time, which would allow MATLAB to decrypt the code on any computer without additional information. Is there a key? Is it a static key used by MATLAB?
I have read in your forum that MATLAB "compiles" (may I use this word?) code to a bytecode that is executed in a virtual machine. Does this process happen every time we use MATLAB (I mean the half-compilation to bytecode followed by the final compilation in the virtual machine), or are the bytecode and the VM used only for pcode? In that case, would it be possible to install just the VM on a computer, without the rest of the MATLAB tools?
I really need to justify whether MATLAB's pcode is safe enough for our needs, so I would be glad to have all this information.
Thank you for your help.

Accepted Answer

Jan on 2 Aug 2013
Edited: Jan on 2 Aug 2013
MATLAB's license conditions explicitly disallow reverse engineering, so the depth of any investigation and of the public discussion must be limited to the obvious facts.
I have experimented with P-coded files for the same reasons as you. In modern MATLAB versions, PCODE produces a different result on each run. This could mean either that a random key is stored in the P-file, or that MATLAB has a static key and adds a random salt to the P-file. For a decryption algorithm the two alternatives are almost identical, because the salt recovered from the file can be seen as a key stored in the file.
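To illustrate why the two alternatives are nearly equivalent, here is a minimal toy sketch in Python (chosen only because nothing in it is MATLAB-specific). It is not the P-code algorithm, which is undocumented, and it is not a secure cipher. `STATIC_KEY`, `SALT_LEN`, `protect` and `recover` are hypothetical names invented for this example. The point is only that a fresh salt makes every output different, while everything needed to undo the protection is either in the file or built into the program.

```python
import hashlib
import os

# Hypothetical stand-in for a key built into every installation.
# This is an assumption for illustration, not MATLAB's actual scheme.
STATIC_KEY = b"built-in-key-known-to-every-installation"
SALT_LEN = 16


def _keystream(key: bytes, salt: bytes, length: int) -> bytes:
    """Derive `length` bytes from key and salt (SHA-256 over a counter)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + salt + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def protect(source: bytes) -> bytes:
    """Return salt + XOR-masked source; a fresh random salt is drawn per call."""
    salt = os.urandom(SALT_LEN)
    stream = _keystream(STATIC_KEY, salt, len(source))
    return salt + bytes(a ^ b for a, b in zip(source, stream))


def recover(blob: bytes) -> bytes:
    """Undo protect() using only the blob itself and the static key."""
    salt, body = blob[:SALT_LEN], blob[SALT_LEN:]
    stream = _keystream(STATIC_KEY, salt, len(body))
    return bytes(a ^ b for a, b in zip(body, stream))


if __name__ == "__main__":
    source = b"function y = f(x)\ny = 2*x;\n"
    first, second = protect(source), protect(source)
    print(first != second)           # do two runs on the same input differ?
    print(recover(first) == source)  # is the source recoverable without user input?
```

In this sketch the user never supplies a secret, so the protection rests entirely on the static key staying hidden inside the program, which is the situation described above.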
The P-files are much smaller than a zipped version of the M-files. This seems to imply that the P-files are some kind of byte code. For large M-files with tens of thousands of lines, opening the P-file for the first time is faster than opening the M-file, but this may simply be an effect of the file size.
Because the P-code algorithm is currently not documented, the safest assumption is that a weak to very weak encryption method is applied, so that it is more a kind of obfuscation. At least the comments are guaranteed to be removed, so even decrypting a large file would not allow someone to understand the code directly. I have even seen comment-free M-files in the FEX that are not usable.
This is a pessimistic point of view. It is based on the fact that strong encryption methods can be documented in public without reducing their security, while increasing the trust of users. This does not prove that an undocumented encryption is weak, but sometimes an educated guess is enough in computer science.
I do not recommend trying this: insert some really criminal terms concerning the internal security of the United States into a string inside a P-coded file. Send the P-file around by email. If you get a visit from the intelligence service, P-coding is weak. Or it proves that sending encoded files already attracts attention. (While this might be a slightly off-topic joke, it is not funny, and it does concern the security level of P-files.)
Methods for strong AES encryption have been removed from the File Exchange because they conflicted with US law. But you can freely and legally copy the C code from the PDFs published on the NIST servers, because PDFs are covered by freedom of speech. You could then try to apply your own strong encryption to P-files mounted as memory-mapped files. I'm not sure whether MATLAB can import the temporarily decrypted file directly, but it is worth a try.
But even if the file is securely encrypted, you can still use the debugger to step through the code of the loaded P-file line by line and record the inputs and outputs of all calls to built-in or user-defined functions.
  2 Comments
Sean de Wolski on 2 Aug 2013
Edited: Sean de Wolski on 2 Aug 2013
Hey Jan,
Quick "English" comment: The way you have written the first sentence makes it sound like the license does not explicitly exclude reverse engineering, which it does. That is, the way I read your sentence, "exclude" modifies the "conditions", not the reverse engineering. I would say it like: "explicitly disallow ..." or similar.
:)
Jan on 2 Aug 2013
@Sean: Thanks! As I have revealed earlier, learning English is one of the reasons I participate in the forum. Even a direct translation to German was not clear. I've improved this important sentence.


More Answers (2)

James Tursa on 1 Aug 2013

Xabier on 2 Aug 2013
Thank you for your answers! Especially to Jan: now I understand why I couldn't find any precise information about pcode. I don't think I want to have any problems with the US intelligence agencies (I'm not paid that much for my job), but thank you anyway for your non-advice. I guess it won't be that easy to compare different methods of protecting the code if there is no official information about pcode. Thank you again for your help and have a nice weekend.
