Your base assignment is to implement symbolic regression.
The first portion of the assignment is to implement GP for symbolic regression. I strongly encourage you to use an existing GP system for this assignment (although you are welcome to write your own from scratch). Two popular ones are ECJ¹ (Java) and DEAP² (Python). Feel free to use any system you like, but please check with me first if you want to use one implemented in a language other than Java, Python, C, C++, or C#. You can also use mine³, but note that it only works on symbolic regression and is not well documented (although it is very good at symbolic regression).
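If you do decide to write your own system, the core data structure is an expression tree that can be generated at random and evaluated on an input row. The following is only a minimal sketch of that idea, not the way ECJ, DEAP, or my system does it. The names `FUNCTIONS`, `protected_div`, `random_tree`, and `evaluate`, the tuple encoding, and the 0.3 early-stop probability are all assumptions made for illustration.

```python
import operator
import random


def protected_div(a, b):
    """Division that returns 1.0 near a zero divisor (a common GP convention)."""
    return a / b if abs(b) > 1e-9 else 1.0


# Illustrative function set: name -> (callable, arity).
FUNCTIONS = {
    "add": (operator.add, 2),
    "sub": (operator.sub, 2),
    "mul": (operator.mul, 2),
    "div": (protected_div, 2),
}


def random_tree(rng, n_vars, depth):
    """Grow a random expression tree encoded as nested tuples."""
    if depth == 0 or rng.random() < 0.3:
        # Terminal node: either an input variable or a random constant.
        if rng.random() < 0.5:
            return ("var", rng.randrange(n_vars))
        return ("const", rng.uniform(-1.0, 1.0))
    name = rng.choice(sorted(FUNCTIONS))
    arity = FUNCTIONS[name][1]
    children = tuple(random_tree(rng, n_vars, depth - 1) for _ in range(arity))
    return (name,) + children


def evaluate(tree, inputs):
    """Evaluate a tree on one row of independent variables."""
    tag = tree[0]
    if tag == "var":
        return inputs[tree[1]]
    if tag == "const":
        return tree[1]
    func, _arity = FUNCTIONS[tag]
    return func(*(evaluate(child, inputs) for child in tree[1:]))


# Hand-built tree for x*x + 1, evaluated at x = 3.
example = ("add", ("mul", ("var", 0), ("var", 0)), ("const", 1.0))
example_value = evaluate(example, [3.0])
```

Selection, crossover, and mutation would be built on top of this representation. An existing GP system gives you all of those pieces already, which is why I recommend using one.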
Any academic misconduct will be investigated fully and I will push for the maximum allowable penalty.
I have provided you with a collection of data in CSV format. For the most part, I created this data by randomly generating data points, passing them through a function, and then adding a little bit of noise to the output. See if you can reverse engineer the functions I used to create the data. All the data is formatted such that the first n-1 columns are the independent variables and the nth (last) column is the dependent variable. For example, in ‘d1.csv’ there are two columns. The first column we will call x and the second we will call y. We need to find some function of x that will predict y. So, y ≈ f(x). If we had 3 columns, we would want z ≈ f(x,y). Note that I write "approximately equal to" because you may not find the exact functions, but you will likely still get a close approximation. Ultimately, your goal is to use symbolic regression to try to find f.
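To make the column convention concrete, here is a small sketch that splits each row into independent variables and the dependent variable, then scores a candidate function by mean squared error. The helper names `load_dataset` and `mean_squared_error` are hypothetical, and the three sample rows are made up for illustration (they are not taken from ‘d1.csv’). Skipping non-numeric rows assumes a file might have a header line. Mean squared error is only one possible fitness measure.

```python
import csv
import io


def load_dataset(file_obj):
    """Return (xs, ys): first n-1 columns as inputs, last column as target."""
    xs, ys = [], []
    for row in csv.reader(file_obj):
        if not row:
            continue  # ignore blank lines
        try:
            values = [float(cell) for cell in row]
        except ValueError:
            continue  # assumption: a non-numeric row is a header, so skip it
        xs.append(values[:-1])
        ys.append(values[-1])
    return xs, ys


def mean_squared_error(model, xs, ys):
    """Average squared difference between model(*x) and the observed y."""
    return sum((model(*x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Made-up two-column data in the same layout as the assignment files.
sample = io.StringIO("0.0,1.0\n1.0,3.1\n2.0,4.9\n")
xs, ys = load_dataset(sample)


def candidate(x):
    """One guess at f; GP would search over many such expressions."""
    return 2.0 * x + 1.0


error = mean_squared_error(candidate, xs, ys)
```

For a real file you would pass an open file handle instead, for example `with open("d1.csv", newline="") as f: xs, ys = load_dataset(f)`. Because of the added noise, even the true generating function will not reach an error of exactly zero.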
Assuming you get everything working and you don’t have any serious problems, you will automatically get +10 on your assignment.
If you choose to do a typed problem you can gain an additional +4 points. You will have to pick your own problem, so go find your own data to play with. I recommend checking out the UCI machine learning repository⁴. You must demonstrate to me that you have it working and that the problem is sufficiently challenging (and typed) to obtain the additional points.
If you implement modifications for +1 each (max +2), be sure to make them obvious to me. For this assignment you may not use elitism as a modification. I do not care whether you go out and find modifications or invent your own; just be sure to convince me that you deserve the extra points. If your improvement is not obvious to me, or if I deem it not significant enough, you will not get the marks. If you choose to do the writeup, explain these modifications within the writeup. If you do not do the writeup, at least include a text file with a description of your modifications so I know what they are.
You can obtain a maximum of +9 for doing a writeup, with +5 being from the base report. WARNING: The writeup is not trivial to do well and will take some time to write. This writeup will be marked more qualitatively by a marker. There is no precise best way to structure a writeup and it is difficult to know exactly what should be included. A portion of these marks will be dedicated to prose, understandability, continuity, spelling, grammar, content, and effectiveness. You can find an example of an article I wrote for publication this year to get a sense of what is good, but there is no one right way to do it and I would not recommend making your report look like mine (I’m simply giving it to you as an example, and you can find a lot more online). Below is a list of ideas on what to include:
· What are you doing?
· Small literature review?
· What has worked well in the past on this problem?
· Explain the problem/data
· What is GP?
· What is symbolic regression?
· How is it different from basic linear regression?
· What is typed GP?
· Explain your algorithm
· How did you implement your GP exactly? Enhancements?
· Explain your analysis methodology
· What will you compare it to?
· How will you compare?
· Means? Distributions? P-values? Interquartile ranges? Other statistics?
· Explain the results and discuss them
· What happened? How did they compare to random? Other algorithms? Comparison to known best? Summary statistics?
· You’ll want visualizations here if you choose to do them.
· Plot given data along with your models’ predictions.
· Conclusions and possible future directions
· How good was it?
· References, if you use them.
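For the analysis items in the list above, one simple approach is to repeat each configuration over several independent runs and summarize the final errors. This is only a sketch using the Python standard library. The helper name `summarize` is hypothetical and the numbers in `example_errors` are placeholders, not real results.

```python
import statistics


def summarize(final_errors):
    """Summary statistics for the final errors of several independent runs."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _q2, q3 = statistics.quantiles(final_errors, n=4)
    return {
        "mean": statistics.mean(final_errors),
        "median": statistics.median(final_errors),
        "stdev": statistics.stdev(final_errors),
        "iqr": q3 - q1,
    }


# Placeholder values standing in for the best error from each of five runs.
example_errors = [1.0, 2.0, 3.0, 4.0, 5.0]
example_summary = summarize(example_errors)
```

The standard library does not provide p-values. If you want a significance test between two configurations, you would need a statistics package, for example a rank-based test such as `scipy.stats.mannwhitneyu`. Which test is appropriate is part of what your methodology section should justify.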
Again, do note that the marks for this portion will be more qualitative and it will be difficult to know what’s good beforehand. The content is up to you to decide, and your decision making on what to include is part of the assignment and course learning objectives. There is no required length for the report, but please do NOT make it longer than 8 pages in double-column format.
You can obtain +1 if you include a sufficient literature review and have proper references/citations. Don’t worry too much about your formatting. There is no correct number of references to include, just do what makes sense in your situation. It is up to the marker to determine if you will be awarded the +1.