The University of Auckland

Project #44: Generation of realistic-looking code


Description:


There are several situations in which we need source code that looks realistic but whose functionality is either unimportant or similar to, but not identical to, that of other code.

One example is the security domain. Source code represents a significant amount of intellectual property, so there is an interest in protecting it from illegal use (e.g. being copied). One approach is to "obfuscate" it: make it so hard to understand that no one will want to risk using it. A common obfuscation technique is to add unnecessary code that never actually gets executed (e.g. code that is only called when a complex condition is true, where that condition is never true). Because a lot of unnecessary code must be added for the obfuscation to be effective, it is best generated automatically.
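A minimal sketch of the dead-code technique, written in Python purely for illustration: the decoy branch is guarded by an "opaque predicate", a condition that is always false but not obviously so. The names opaque_false and compute_total, and the particular predicate, are illustrative assumptions and are not part of the project brief.

```python
def opaque_false(x: int) -> bool:
    # x*x + x == x*(x + 1) is a product of two consecutive integers, so it is
    # always even; this predicate is therefore False for every int x.
    return (x * x + x) % 2 == 1


def compute_total(prices: list[int]) -> int:
    total = 0
    for p in prices:
        if opaque_false(p):
            # Decoy code: never executed, but reads like plausible logic.
            total = max(total - p // 2, 0)
            continue
        total += p
    return total
```

For example, compute_total([10, 20, 30]) returns 60, exactly as if the decoy branch were absent. A reader has to work out that the predicate can never hold before dismissing the branch; a practical obfuscator would presumably use predicates and decoy code that are much harder to see through than this one.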

However, this technique does not work well if the unnecessary code is obviously fake: whatever code is added must be hard to distinguish from the necessary code. One way to get realistic code is simply to copy it from another project, but it is then likely to be obvious that the copied code does not really belong. So the challenge is to create code that is both realistic and consistent with the rest of the system.
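One possible approach (an assumption here, not something the project prescribes) is to derive decoys from the system's own code, so that they reuse its vocabulary and style. The hypothetical make_decoy below clones an existing function using Python's ast module, gives it a new name and shifts each integer literal by one, so the clone is not a verbatim copy.

```python
import ast
import copy
import random


class ConstantPerturber(ast.NodeTransformer):
    """Shift every integer literal by +/-1 so the clone is not a verbatim copy."""

    def __init__(self, rng: random.Random) -> None:
        self.rng = rng

    def visit_Constant(self, node: ast.Constant) -> ast.Constant:
        # bool is a subclass of int, so leave True/False untouched.
        if isinstance(node.value, int) and not isinstance(node.value, bool):
            shifted = ast.Constant(node.value + self.rng.choice([-1, 1]))
            return ast.copy_location(shifted, node)
        return node


def make_decoy(source: str, func_name: str, new_name: str, seed: int = 0) -> str:
    """Clone function func_name from source, rename it and perturb its constants."""
    tree = ast.parse(source)
    for node in tree.body:
        if isinstance(node, ast.FunctionDef) and node.name == func_name:
            clone = copy.deepcopy(node)
            clone.name = new_name
            clone = ConstantPerturber(random.Random(seed)).visit(clone)
            return ast.unparse(clone)  # requires Python 3.9+
    raise ValueError(f"function {func_name!r} not found")
```

Given a system function such as discount(price) that returns price - 10 when price > 100, make_decoy(src, "discount", "rebate") yields a function rebate(price) with the same shape but slightly different thresholds. Such a decoy fits the system better than code copied from an unrelated project, although it may be too close to its source to fool a careful reader.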

Another example is teaching. A common task is creating source code exercises for students, such as code tracing, where the actual functionality of the code is not relevant. Creating such code manually is very tedious, and it is difficult to produce examples that differ enough from one another to help students learn. Here the challenge is to create code of a known difficulty for the task. For example, we might want a set of exercises that starts with "easy" code and then becomes progressively more difficult.
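A minimal sketch of generating code-tracing exercises with a difficulty knob. The function generate_exercise is hypothetical, and it assumes a deliberately crude difficulty model: difficulty is the number of update statements to trace, and exercises above difficulty 2 may also contain if-statements.

```python
import random


def generate_exercise(difficulty: int, seed: int = 0) -> tuple[str, int]:
    """Return (code, final value of x) for a small code-tracing exercise.

    Assumption: difficulty is the number of update statements to trace, and
    exercises with difficulty > 2 may also contain if-statements.
    """
    rng = random.Random(seed)
    lines = [f"x = {rng.randint(1, 9)}", f"y = {rng.randint(1, 9)}"]
    for _ in range(difficulty):
        target, other = rng.choice([("x", "y"), ("y", "x")])
        op = rng.choice(["+", "-", "*"])
        stmt = f"{target} = {target} {op} {other}"
        if difficulty > 2 and rng.random() < 0.5:
            # Harder exercises wrap some updates in a condition to evaluate.
            lines.append(f"if {target} > {other}:")
            lines.append("    " + stmt)
        else:
            lines.append(stmt)
    code = "\n".join(lines)
    namespace: dict = {}
    exec(code, namespace)  # run the generated code to get the answer key
    return code, namespace["x"]
```

Calling generate_exercise(d, seed=s) for increasing d gives a graded sequence of exercises, and the same seed always reproduces the same exercise. Executing the generated code provides the answer key automatically; a more realistic difficulty model would likely need to weigh other factors as well.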

A third example is mutation testing. This is a technique for evaluating a test suite by creating faulty variants (mutants) of the code the test suite is meant to test. The quality of the test suite can then be evaluated by how many of the mutants it detects as faulty. The challenge in this case is that the mutants cannot be "too different" from the original source code; otherwise they will not provide useful information on how well the test suite works.
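A minimal sketch of mutation testing. The helpers mutants and mutation_score and the SWAPS table are hypothetical, and the table covers only a tiny assumed subset of mutation operators. Each mutant differs from the original by a single operator change, which keeps it "close" to the source, and the score is the fraction of mutants that at least one test detects.

```python
import ast
import copy

# Operator replacements: each mutant differs from the original by one small change.
SWAPS = {ast.Add: ast.Sub, ast.Sub: ast.Add, ast.Lt: ast.LtE, ast.Gt: ast.GtE}


def _sites(tree: ast.AST) -> list[ast.AST]:
    # ast.walk visits nodes in a deterministic order, so index i names the same
    # site in the original tree and in any deep copy of it.
    return [n for n in ast.walk(tree) if isinstance(n, (ast.BinOp, ast.Compare))]


def mutants(source: str):
    """Yield the source of each single-operator mutant of source."""
    tree = ast.parse(source)
    for i in range(len(_sites(tree))):
        mutant = copy.deepcopy(tree)
        node = _sites(mutant)[i]
        if isinstance(node, ast.BinOp) and type(node.op) in SWAPS:
            node.op = SWAPS[type(node.op)]()
        elif isinstance(node, ast.Compare) and type(node.ops[0]) in SWAPS:
            node.ops[0] = SWAPS[type(node.ops[0])]()
        else:
            continue  # no mutation operator applies at this site
        yield ast.unparse(mutant)


def mutation_score(source: str, func_name: str, tests) -> float:
    """Fraction of mutants 'killed', i.e. detected by at least one failing test."""
    killed = total = 0
    for code in mutants(source):
        total += 1
        namespace: dict = {}
        exec(code, namespace)
        func = namespace[func_name]
        try:
            passed = all(test(func) for test in tests)
        except Exception:
            passed = False  # a crash also counts as detecting the fault
        if not passed:
            killed += 1
    return killed / total if total else 0.0
```

For a function price_with_fee(price, fee) that returns price when price > 100 and price + fee otherwise, the two tests f(50, 5) == 55 and f(200, 5) == 200 give a score of 0.5: the "price + fee" to "price - fee" mutant is killed, but the "price > 100" to "price >= 100" mutant survives. Adding the boundary test f(100, 5) == 105 kills it and raises the score to 1.0, which is the kind of information about test-suite quality that small, realistic mutations are meant to reveal.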


Outcome:


The goal of this project is to explore what techniques are available for generating realistic-looking code and apply one or more of them in some context.

Prerequisites

Must have successfully completed SOFTENG306

Specialisations

Categories

Supervisor

Co-supervisor

Team

Lab

Lab allocations have not been finalised