Arc I: Raison D’être

Nanubala Gnana Sai
Published in Geek Culture
6 min read · Jun 22, 2021


It’s the end of the second week of the Coding Phase of GSoC under mlpack. Time has been passing rather quickly. With this article, I wish to summarize the accomplishments, failures, and experience I’ve gained over the past two weeks.

Before we begin, I’d like to thank my mentors Marcus Edel, James Balamuta, and Sayan Goswami. Without their relentless support, this project couldn’t possibly have gained momentum.

Just Windows things :)

Community Bonding

It was time to meet everyone and introduce myself. Honestly, I’m not very good at these kinds of meetings. Thankfully, it went smoother than expected, and I got to know a lot more about the mentors and my peers: their hobbies, their passions, and the things they delight in. Some like skydiving, some enjoy paragliding, some play the piano, and others find pleasure in driving.

The ensmallen library has focused extensively on Single-Objective Problems (SOP) in the past. That means all the callbacks, optimizers, and test cases were written with that concept in mind. Initially, the plan was to add a few more Multi-Objective Optimizers (MOO). This soon transitioned into a “revolution” (as I like to call it :-) ) to adapt the library for MOO-based problems. Hence the title of my project: “A Framework for Multi-Objective Optimizers”.

A crucial part of crafting algorithms from scratch is thorough testing. A good test suite covers the length and breadth of the algorithm. The ZDT (Zitzler, Deb and Thiele) test suite matches this description and more: it offers a myriad of objective functions designed specifically to test an algorithm’s ability to overcome various difficult situations. The coding part wasn’t much of a hindrance, although we skipped one of the tests because that part of the paper was ambiguous to me.
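To make the suite concrete, here’s a minimal sketch of ZDT1, one of the simpler members of the family, in plain Python. The function and variable names are mine, not ensmallen’s.

```python
import math

def zdt1(x):
    """ZDT1: two objectives over x in [0, 1]^n (typically n = 30).

    f1 depends only on x[0]; the g term drags f2 away from the
    front whenever x[1:] is nonzero.  The true Pareto front is
    f2 = 1 - sqrt(f1), reached exactly when x[1:] == 0.
    """
    n = len(x)
    f1 = x[0]
    g = 1.0 + 9.0 * sum(x[1:]) / (n - 1)
    f2 = g * (1.0 - math.sqrt(f1 / g))
    return f1, f2

# On the Pareto set (x[1:] all zero), g == 1 and f2 == 1 - sqrt(f1):
f1, f2 = zdt1([0.25] + [0.0] * 29)
```

A test suite built from functions like this can check both that the optimizer converges to the known front shape and that it keeps the population spread along it.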

Fig 1. The shapes of the Pareto Fronts of the ZDT test problems. For more info, see this link.

The attention then shifted to demonstrating practical applications. Marcus suggested writing a xeus-cling notebook for this purpose. The idea was to use it to balance the returns and volatility of user-given stock data. Optimizing conflicting objectives simultaneously is the forte of Multi-Objective Evolutionary Algorithms (MOEA).
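As a rough sketch of the two conflicting objectives (the function name, numbers, and conventions are illustrative, not taken from the notebook): given per-asset mean returns and a covariance matrix, a portfolio weight vector maps to a return objective and a volatility objective, and improving one typically worsens the other.

```python
import math

def portfolio_objectives(weights, mean_returns, cov):
    """Two conflicting objectives for a weight vector:
    negated expected return (so both objectives are minimized)
    and volatility (sqrt of w^T * Sigma * w)."""
    ret = sum(w * m for w, m in zip(weights, mean_returns))
    var = sum(weights[i] * cov[i][j] * weights[j]
              for i in range(len(weights))
              for j in range(len(weights)))
    return -ret, math.sqrt(var)

# Two illustrative assets: one high-return/high-risk, one low/low.
mu = [0.12, 0.05]
sigma = [[0.10, 0.01],
         [0.01, 0.02]]
neg_ret, vol = portfolio_objectives([0.5, 0.5], mu, sigma)
```

A MOEA then evolves a whole population of weight vectors and returns the Pareto front of return/volatility trade-offs instead of a single “best” portfolio.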

To start with, we needed a way to fetch stock data on the fly. The pandas-datareader Python module was deemed fit for this task: with a few lines of code, it lets users query the Yahoo Finance API and output a CSV file. The next step was to establish a gateway between the Python and C++ interfaces. Enter: the Python C API. The Python C API is, as many would have it, where Hades hangs out when he’s bored in hell. Jokes aside, debugging with the Python C API can be grueling; I came across this hilarious reply (see the first comment) on Stack Overflow while debugging:

you forgot the step where you draw a pentagon in chicken blood and light the black candles :P …. thats what usually works for me when mixing C and Python

With that put to bed, the need for a new callback emerged. Its job would be to track the evolutionary process by caching the current Pareto front. The mlpack repository relies heavily on template metaprogramming to push most of the work to compile time. The callbacks use the same idea: they’re passed as template parameters. Since the previous callbacks were written specifically for SOP, this one had to be built from the ground up.
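The real callback lives in ensmallen’s C++ template machinery; as a rough Python analogue of the idea (all names here are hypothetical, not ensmallen’s API), a callback that caches the front at the end of each generation might look like:

```python
class ParetoFrontCache:
    """Illustrative stand-in for the new callback: record the
    Pareto front produced at the end of each generation so the
    evolution of the population can be inspected later."""

    def __init__(self):
        self.fronts = []

    def end_of_generation(self, front):
        # Store a copy so later generations can't mutate it.
        self.fronts.append(list(front))

# The optimizer would invoke the hook once per generation:
cache = ParetoFrontCache()
for generation_front in [[(1.0, 0.0)], [(0.5, 0.5), (0.0, 1.0)]]:
    cache.end_of_generation(generation_front)
```

In the C++ version the same hook is resolved at compile time via the template-parameter mechanism described above, so an unused callback costs nothing at runtime.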

The time came to finally deploy our algorithm on a real-world problem, and the result was… underwhelming.

Fig. 2: Left to right. The generated Pareto Front vs the expected Pareto Front.

Defeated, I second-guessed the credibility of all my work so far. Was all the code I’d written wrong? The mentors were confident that this wasn’t the case. A thorough post-mortem revealed that the implementation was indeed correct (yay!), but the population size was too small to observe anything significant. This marked the end of the Community Bonding period.

Week 1

Previously, an artificial set of problems (ZDT) was merged to test the robustness of algorithms against predefined setbacks. However, the shapes of their Pareto sets (the final population in variable space) are astonishingly simple; there’s no reason real-world problems would behave like this (Li et al.). This was the topic of the week: finding an algorithm which performs well in both variable and objective space.

“Multi-Objective Evolutionary Algorithm using Decomposition — Differential Variant (MOEA/D-DE)” [1]. Phew, long name. This algorithm promises to deliver exactly on the aforementioned criteria.

The salient features of this algorithm are:

a) Reference Directions: a set of vectors sampled from the unit simplex in the objective space. They guide the population towards the true Pareto front.

b) Decomposition: an objective function which “decomposes” a multi-objective problem into scalar optimization problems using the reference directions.

The research paper makes some bold claims regarding its speed and accuracy relative to the traditional NSGA-II algorithm; more on this later. The first step was to write some decomposition algorithms.

An interesting insight about decomposition algorithms: essentially, they push the population along the reference directions. For example, the Weighted Average Decomposition’s objective is the dot product between the reference direction and a given objective vector; minimizing it steers the objective vector along that direction. Realizations like these help us understand what the algorithm is doing at its core.
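A minimal sketch of that idea in plain Python (my own names, not ensmallen’s): the weighted decomposition scalarizes an objective vector by its dot product with the reference direction, and the optimizer minimizes that scalar.

```python
def weighted_decompose(weight, objectives):
    """Scalarize a multi-objective value as w . F; with minimized
    objectives, a smaller dot product means the solution has moved
    further in the region favored by this weight vector."""
    return sum(w * f for w, f in zip(weight, objectives))

# The same objective vector scores differently under different
# reference directions:
a = weighted_decompose([0.9, 0.1], [0.2, 0.8])  # favors objective 1
b = weighted_decompose([0.1, 0.9], [0.2, 0.8])  # favors objective 2
```

Each reference direction thus defines its own scalar subproblem, which is exactly how MOEA/D turns one multi-objective problem into many single-objective ones.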

On that note, I found that the Penalty-based Boundary Intersection (PBI) decomposition was specified incorrectly in the paper: the reference direction wasn’t normalized before the multiplication. Checking against other reputable code repositories confirmed that my suspicion was spot on. I ended up writing a mini mathematical proof for it ;D
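For reference, here is a sketch of PBI with the normalization in place (pure Python; `ideal` is the ideal point z and `theta` the penalty parameter, and the names are mine):

```python
import math

def pbi(weight, objectives, ideal, theta=5.0):
    """Penalty-based Boundary Intersection.

    d1: distance travelled along the *normalized* reference
        direction (the normalization step in question);
    d2: perpendicular distance from that direction, penalized
        by theta to keep solutions close to their direction.
    """
    norm = math.sqrt(sum(w * w for w in weight))
    unit = [w / norm for w in weight]
    diff = [f - z for f, z in zip(objectives, ideal)]
    d1 = sum(d * u for d, u in zip(diff, unit))
    proj = [d1 * u for u in unit]
    d2 = math.sqrt(sum((d - p) ** 2 for d, p in zip(diff, proj)))
    return d1 + theta * d2

# A point lying exactly on the direction has d2 == 0, so the
# score reduces to its distance from the ideal point:
g = pbi([1.0, 1.0], [0.5, 0.5], [0.0, 0.0])
```

Without the normalization, d1 gets scaled by the length of the weight vector, which skews the balance between d1 and the theta-weighted penalty.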

Week 2

The focus of this week was generating these “Reference Directions”. They are the guiding beacons of the evolutionary process, allowing explicit control over the diversity of the resulting population. A uniform spread of the population is the desired feature, so it’s only natural that the reference directions be uniformly distributed too [2].

The mathematical formulation of this boiled down to one simple task: “Find a way to uniformly sample the unit simplex.” A tour through Stack Overflow revealed three feasible methods:

a) Bayesian bootstrap
b) Das-Dennis sampling
c) Dirichlet sampling
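One of these is easy to sketch: drawing from Dirichlet(1, …, 1), which is also what the Bayesian bootstrap does, gives uniformly distributed points on the unit simplex. A minimal version via the standard exponential-normalization trick (my own naming):

```python
import random

def sample_simplex(dim, rng=None):
    """Uniform sample from the (dim-1)-simplex: draw dim
    Exponential(1) variates and normalize their sum to 1.
    This is exactly a Dirichlet(1, ..., 1) draw."""
    rng = rng or random.Random()
    draws = [rng.expovariate(1.0) for _ in range(dim)]
    total = sum(draws)
    return [d / total for d in draws]

point = sample_simplex(3)
```

Dirichlet sampling gives random directions; Das-Dennis, by contrast, lays the points out on a deterministic lattice, which is why it needs a separate implementation.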

Of the three, Das-Dennis was the hardest to implement: it involves multiple layers of recursion, so the stack limit could prove to be a problem. Thankfully, pymoo has a recursion-free implementation of elegant and incredible brevity.
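The recursion-free idea can be sketched with a stars-and-bars enumeration: every Das-Dennis point with H divisions in m dimensions corresponds to choosing m − 1 “bar” positions among H + m − 1 slots. This is illustrative code, not pymoo’s actual implementation:

```python
from itertools import combinations

def das_dennis(n_partitions, n_dim):
    """Enumerate all points k/H on the unit simplex whose
    coordinates are multiples of 1/H and sum to 1, without any
    recursion: each combination of bar positions encodes one
    composition of H into n_dim nonnegative parts."""
    H, m = n_partitions, n_dim
    points = []
    for bars in combinations(range(H + m - 1), m - 1):
        ext = (-1,) + bars + (H + m - 1,)
        # Gap between consecutive bars = size of that part.
        parts = [ext[i + 1] - ext[i] - 1 for i in range(m)]
        points.append([p / H for p in parts])
    return points

# C(H + m - 1, m - 1) = C(6, 2) = 15 lattice points for H=4, m=3.
pts = das_dennis(4, 3)
```

Because the loop is a flat iteration over combinations, the stack-depth concern disappears entirely.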

MOEA/D-DE has since been merged along with the decomposition policy designs, while the pull request for Das-Dennis weight initialization remains pending. Nevertheless, I took the liberty of comparing our algorithm with NSGA-II, and the following was observed.

I ran this a few more times, but the results were more or less the same. That means MOEA/D is not only more accurate but also faster than NSGA-II, by 30x on average.

A victory by a wide margin! That wrapped up this week’s work, and we decided to call it a day and take a mini-break.

End notes

Before even starting to write the algorithm, I had to revamp a lot of existing code: writing brand-new test suites, rewriting callbacks, deleting old code, and so on. I believe all of it was worth the effort; it made porting the code so much easier. Or should I say, its “raison d’être” was fulfilled :-).

Have a great week, folks!

Bibliography

  1. Li, Hui, and Qingfu Zhang. “Multiobjective optimization problems with complicated Pareto sets, MOEA/D and NSGA-II.” IEEE Transactions on Evolutionary Computation 13.2 (2008): 284–302.
  2. Li, Miqing, and Xin Yao. “What weights work for you? Adapting weights for any Pareto front shape in decomposition-based evolutionary multiobjective optimisation.” Evolutionary Computation 28.2 (2020): 227–253.
