Master thesis proposals
Geometric analysis and exploration of neural network expressivity
Last updated: 4 November 2020
An overall success of deep neural networks in the recent decade has been addressed many times and explained by a notion of their high expressivity. Indeed, having large number of layers in a network contributes to the highest complexity of a function that can be learned by the network. Nevertheless, this potential does not ever seem to be utilized on practice. Moreover, as shown in (Ba & Caruana, 2014), shallow neural networks can be trained to have similar performance as of deep networks.
One measure of neural network expressivity can be defined as the number of different activation patterns that occur in the network, where an activation pattern represents the behavior of all non-linearities that occur for a given input. Such measure intuitively represents the complexity of input space geometry learned by the network. Recently, it has been shown by (Hanin & Rolnick, 2019) that the expected deep network expressivity in this sense is equivalent to one of a shallow network with the same number of neurons, and is independent of the number of layers.
In this master project, we will focus our attention on activation patterns of neural networks, and their corresponding regions in the input space. We propose three main directions to choose from for the project.
Statistical exploration. In this direction, the main task would be to perform experiments to propose and confirm or reject hypotheses about the regional structure of a network. Questions that we may consider include an analysis of a region distribution for classification and regression tasks, correlations between the distribution and relation of such distribution to the decision boundary. Furthermore, the study could be extended by considering data-driven network initializations for classification or representation learning, and contrast them to standard random weight initialization schemes.
Boundary traversal. The core part of this direction would include an adaptation of a graph traversal methodology from our recent publication(Polianskii & Pokorny, 2020) to the domain of activation regions. Such applications would bring us to a novel algorithm to obtain the regions which might be more computationally efficient than other existing methods. This methodology would also allow us to study the decision boundary of a given classifier directly. Additionally, we might look at topology of the decision boundary by viewing the system of activation regions as a cellular (CW) complex (learning about topological data analysis will be one of the outcomes of working on this part).
Active training control. The main idea is to investigate ways to train a neural network for a given dataset while enforcing desired properties on its regional structure. An example for such property could be a “high expressivity along the decision boundary,” where boundary information would be provided by some oracle. Simultaneously, this project will investigate the validity of an assumption that the “network expressivity does not change during training” when a non-random handcrafted initialization is considered.
- Background in Machine Learning/Computer Science or Mathematics
- Good programming skills and understanding of code organization in at least one ml library, e.g. pytorch or tensorflow
- Vladislav Polianskii, vpol (at) kth.se
- Matteo Gamba, mgamba (at) kth.se
- Ba, J., & Caruana, R. (2014). Do Deep Nets Really Need to be Deep? In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, & K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems 27 (pp. 2654–2662). Curran Associates, Inc. http://papers.nips.cc/paper/5484-do-deep-nets-really-need-to-be-deep.pdf
- Hanin, B., & Rolnick, D. (2019). Deep relu networks have surprisingly few activation patterns. Advances in Neural Information Processing Systems, 361–370.
- Polianskii, V., & Pokorny, F. T. (2020). Voronoi Graph Traversal in High Dimensions with Applications to Topological Data Analysis and Piecewise Linear Interpolation. Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2154–2164.