Mixed precision for High Performance Computing, application to low energy gamma radiation measurements

Roméo Molina

Abstract

This thesis focuses on numerical precision formats, on controlling the numerical accuracy of results, and on the opportunities they offer for High Performance Computing. More specifically, we are interested in mixed precision, which combines several precision formats in the same code to benefit from both the performance gains of low precision and the validity and stability of high precision. We examine two approaches to introducing mixed precision. The first is precision tuning, which consists in developing decision-support tools that introduce precisions lower than the original ones into an existing code while checking the validity of the results thus obtained. The second is the development of linear algebra algorithms that are intrinsically mixed in terms of precision and can then be used as building blocks for various applications. In this manuscript, we focus on one application in particular: nuclear physics research, i.e. the study of particles and of the interactions that govern them at this scale. This is often achieved by studying extreme values, particularly on highly unstable nuclei, using high-resolution detectors. AGATA is a European collaboration developing a High-Purity Germanium gamma-ray detector. It relies on two new technologies: the electrical segmentation of the germanium crystals and the reconstruction of the complete path of a gamma ray in the detector. To this end, a Pulse-Shape Analysis (PSA) step compares the traces measured in each segment with those of a database, previously computed or obtained by calibrating the crystal, in order to identify the interaction points and their associated energies. The quantity of data requires on-line processing, but this step must also be carried out accurately, as a resolution of 5 mm is required. We therefore sought to speed up this step by reducing the volume of data through reduced-precision formats. 
To check that the results remained valid, we relied on stochastic arithmetic and also counted the interaction points identified identically by the different methods. In this way, we showed that the original algorithm can be run in FP16 rather than FP32 without loss of quality. We also rewrote the algorithm for GPU architectures; the positive results encourage the choice of this kind of hardware for the PSA. We also worked on the sparse matrix-vector product (SpMV), developing a mixed-precision version of this algorithm. It is based on a rigorous analysis that divides the matrix elements into buckets and computes each bucket in a precision adapted to the magnitude of its elements: the smaller the elements, the lower the precision that suffices. The resulting algorithm guarantees a target accuracy and can use several precision formats, both native and emulated. It delivers significant gains in memory and execution time, but its use of non-standard precision formats was limited by their lack of hardware implementation. We therefore integrated optimized accessors into the adaptive matrix-vector product and developed new formats with a reduced exponent, taking advantage of the small variation in magnitude within each bucket.
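
The bucketing idea behind the adaptive SpMV can be illustrated with a small sketch. This is not the thesis implementation. It assumes one plausible reading of the abstract: an element `a` of a row may be stored in a format of unit roundoff `u` when `|a| * u <= eps * row_norm`, so that smaller elements can use lower precision. The names `choose_format`, `build_buckets`, `spmv` and `value_bytes`, the use of the row 1-norm, and the target `eps` are all hypothetical choices made for illustration. Low precision is emulated with the Python standard library `struct` module (format codes `e` for FP16, `f` for FP32, `d` for FP64), and only the storage precision is lowered: the arithmetic itself stays in double precision.

```python
import struct

# (struct code, unit roundoff, smallest normal, largest finite), lowest precision first.
LOW_FORMATS = [
    ("e", 2.0**-11, 2.0**-14, 65504.0),                        # IEEE binary16 (FP16)
    ("f", 2.0**-24, 2.0**-126, (2.0 - 2.0**-23) * 2.0**127),   # IEEE binary32 (FP32)
]
BYTES = {"e": 2, "f": 4, "d": 8}  # storage size of one value in each format


def round_to(value, fmt):
    """Round a Python float to the format `fmt` by packing and unpacking it."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def choose_format(value, row_norm, eps):
    """Lowest-precision format whose storage error on `value` is <= eps * row_norm.

    Assumed rule (not taken from the thesis): |value| * u <= eps * row_norm,
    restricted to the normal range of the format so the relative bound u holds.
    """
    mag = abs(value)
    for fmt, u, smallest, largest in LOW_FORMATS:
        if smallest <= mag <= largest and mag * u <= eps * row_norm:
            return fmt
    return "d"  # fall back to double precision


def build_buckets(rows, eps):
    """Split a sparse matrix (list of rows of (col, value)) into per-format buckets."""
    buckets = {"e": [], "f": [], "d": []}
    for i, row in enumerate(rows):
        row_norm = sum(abs(v) for _, v in row)  # 1-norm of the row
        for j, v in row:
            fmt = choose_format(v, row_norm, eps)
            buckets[fmt].append((i, j, round_to(v, fmt)))
    return buckets


def spmv(buckets, x, n_rows):
    """y = A @ x, accumulated bucket by bucket (arithmetic kept in double here)."""
    y = [0.0] * n_rows
    for fmt in ("e", "f", "d"):
        for i, j, v in buckets[fmt]:
            y[i] += v * x[j]
    return y


def value_bytes(buckets):
    """Bytes needed to store the matrix values only (indices are ignored)."""
    return sum(BYTES[fmt] * len(items) for fmt, items in buckets.items())


# Tiny example: two rows with a wide range of magnitudes.
rows = [
    [(0, 4.0), (1, 0.3), (3, 1e-4)],
    [(1, 2.0), (2, 1e-4)],
]
x = [1.0, 2.0, 3.0, 4.0]
eps = 2.0**-26  # hypothetical target accuracy

buckets = build_buckets(rows, eps)
y = spmv(buckets, x, len(rows))
print({fmt: len(items) for fmt, items in buckets.items()}, value_bytes(buckets), y)
```

With this storage rule, each stored element is perturbed by at most `eps * row_norm`, so the error on `y[i]` is bounded by `eps * row_norm * sum(|x_j|)` up to double-precision rounding. Note that very small elements fall outside the normal range of FP16 and are pushed to FP32 in this sketch, which hints at why formats with a different exponent range can be useful within a bucket.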

In this thesis, we are interested in numerical precision formats, in controlling the validity of numerical results, and in the opportunities they offer for high-performance computing. More specifically, we are interested in mixed precision, which consists in combining several precision formats in the same code to benefit from both the performance gains brought by low precisions and the validity and stability of high precisions. We study two approaches to introducing mixed precision. The first is precision tuning, which consists in developing decision-support tools that make it possible to introduce precisions lower than the original ones into an existing code while checking the validity of the results thus obtained. The second is the development of linear algebra algorithms that are intrinsically mixed in terms of precision and can then be used as building blocks for various applications. In this manuscript, we focus on one application in particular: nuclear physics research, that is, the study of particles and of the interactions that govern them at this scale. This goal is often reached by studying extreme values, in particular on highly unstable nuclei, using high-resolution detectors. AGATA is a European collaboration aiming to develop a High-Purity Germanium gamma-ray detector. The detector relies on two new technologies: the electrical segmentation of the germanium crystals and the reconstruction of the complete path of a ray in the detector. To this end, a pulse-shape analysis step compares the traces measured in each segment with those of a database, previously computed or obtained by calibrating the crystal, to identify the interaction points and their associated energies. 
The quantity of measured data requires on-line processing, but this step must also be carried out accurately, as a resolution of 5 mm is required. We therefore sought to speed up this step by reducing the volume of data through reduced-precision formats. To check that the results remained valid, we relied on stochastic arithmetic and also on an evaluation of the number of points identified identically by the different methods. We thus showed that the original algorithm can be run in FP16 rather than FP32 without loss of quality. We also rewrote the algorithm to adapt it to GPU architectures, with positive results that encourage the choice of this type of hardware for this step. We also worked on the sparse matrix-vector product, developing a mixed-precision version of this algorithm. It relies on a rigorous analysis that distributes the elements into buckets and computes each bucket in a precision adapted to the magnitude of its elements. The algorithm guarantees a target accuracy and can use several precision formats, whether native or emulated. It yields very significant gains in memory and execution time, but its use of non-standard precision formats was limited by their lack of hardware implementation. We therefore integrated optimized accessors into the adaptive matrix-vector product and developed new formats using a reduced exponent, taking advantage of the small variation in magnitude within each bucket.

Précision mixte pour le calcul de haute performance, application aux mesures de rayonnements gamma de basse énergie
