A data analysis pipeline integrating ion mobility and high-resolution mass spectrometry for non-target screening in environmental studies
Abstract
Non-target analysis (NTA) based on high-resolution mass spectrometry (HRMS) advances environmental analysis by enabling the identification of unknown contaminants. Recently, ion mobility spectrometry (IMS) coupled to HRMS has attracted interest as a separation dimension in addition to gas or liquid chromatography. Nonetheless, the analysis of the high-dimensional data generated by these techniques is often hindered by challenges related to proprietary software, including misalignment, restrictive data formats, and difficulties in developing customisable analytical workflows. Efforts are under way to provide open databases, software and workflows, but these are not always compatible with data acquired with ion mobility. In addition, the most commonly used open data format (mzML) produces large files, especially when ion mobility is included (several GB), and thus requires extensive storage space and computing resources. The goal of this study is to develop a pipeline for the analysis of HRMS data including ion mobility, building on the advantages of recent open data formats and software, and to apply it to the detection of contaminants in complex environmental mixtures.
The environmental data used in this study were acquired on a UPLC-IMS-QTOF system (Waters Vion) in data-independent acquisition mode (HDMSE). On this system, data can be retrieved from the UNIFI software through an API. Asynchronous HTTP requests were implemented to speed up the retrieval of large binary data streams. The collected data were then converted to tabular form and saved in the Apache Parquet format, an open-source file format that provides efficient compression and fast retrieval of column-oriented data and is compatible with many programming languages and data analysis packages. This makes the data fast to import and use in different analytical pipelines, e.g. in DEIMoS, a recent Python-based package for efficient processing of multi-dimensional HRMS data that is compatible with IMS. DEIMoS provides efficient algorithms for feature detection, alignment and MS/MS spectral deconvolution. The data formats used in this pipeline were compared with other open formats typically obtained with the ProteoWizard (MSConvert) software. The speed of conversion, the speed of data retrieval and the storage space were evaluated for 10 environmental samples.
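The asynchronous retrieval step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation (their conversion package is written in R) and it does not call the UNIFI API: `fetch_chunk` is a hypothetical stand-in that simulates one HTTP request returning a binary stream, and the chunk identifiers and the concurrency limit are assumptions made for illustration. The point is only that several requests can be in flight at once while results are returned in request order.

```python
import asyncio
import random


async def fetch_chunk(chunk_id: int) -> bytes:
    """Hypothetical stand-in for one HTTP request returning a binary stream."""
    # Simulated network latency; a real client would await an HTTP response here.
    await asyncio.sleep(random.uniform(0.01, 0.05))
    return f"chunk-{chunk_id}".encode()


async def fetch_all(chunk_ids, max_concurrent: int = 4) -> list:
    """Fetch all chunks concurrently, with at most max_concurrent in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(chunk_id: int) -> bytes:
        # The semaphore caps the number of simultaneous requests.
        async with semaphore:
            return await fetch_chunk(chunk_id)

    # gather() returns results in the order of its arguments,
    # regardless of which request finishes first.
    return await asyncio.gather(*(bounded(c) for c in chunk_ids))


def retrieve(chunk_ids, max_concurrent: int = 4) -> list:
    """Synchronous entry point that runs the asynchronous retrieval."""
    return asyncio.run(fetch_all(chunk_ids, max_concurrent))


if __name__ == "__main__":
    chunks = retrieve(range(6))
    print(len(chunks), "chunks retrieved")
```

Bounding the number of concurrent requests is a design choice that keeps the load on the server predictable; the appropriate limit depends on the server and is not stated in the abstract.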
The developed pipeline and the Parquet format were more efficient than the other formats in terms of data collection and saving speed. Storage size was also substantially reduced with the Parquet format, with ~100-200 MB for typical environmental samples, whereas mzML files reached 5-6 GB. Data filtering (e.g., MS1 vs MS2) and visualization are also easily performed with commonly used packages in the R or Python environments. The peak detection and alignment algorithms of the DEIMoS package were significantly faster than proprietary software. The data processed with the DEIMoS package will be further used with other open-source packages such as patRoon to enable extensive interrogation of a number of spectral libraries. The conversion package (developed in R) will be released on GitHub for public access and use, along with a data visualization application (R Shiny).
The pipeline developed in this study allows substantial gains in efficiency in terms of data collection, storage and processing when dealing with large datasets of environmental samples. Current efforts focus on linking the data obtained with this pipeline to other open-source tools and workflows used for NTA of environmental samples (patRoon). Based on this pipeline, results on the detection of contaminants in surface waters impacted by urban wet weather discharges will be presented. The processed data will also be used for machine-learning modeling in order to link HRMS signals with the measured ecotoxicity of environmental mixtures.
Origin | Files produced by the author(s) |
---|---|
Licence | Copyright (All rights reserved) |