Environment Sound Classification Using a Two-Stream CNN Based on Decision-Level Fusion

Yu Su; Ke Zhang; Jingyu Wang; Kurosh Madani

doi:10.3390/s19071733

Journal article. Sensors. Year: 2019


Yu Su

Role: Author
PersonId : 1319016
ORCID : 0000-0002-1438-1989

Ke Zhang

Role: Author

Jingyu Wang

Role: Author

Kurosh Madani

Role: Author

Laboratoire Images, Signaux et Systèmes Intelligents

SYNAPSE

Abstract

Deep learning-based models are now widely used in categorization problems and have proven more robust than conventional methods, so a growing number of researchers have applied them to environment sound classification (ESC) in recent years. However, existing models that train deep neural networks on auditory features such as the log-mel spectrogram (LM) and mel frequency cepstral coefficients (MFCC), or on the raw waveform, perform unsatisfactorily on ESC. In this paper, we first propose two combined features to give a more comprehensive representation of environment sounds. Then, a four-layer convolutional neural network (CNN) is presented to improve ESC performance with the proposed aggregated features. Finally, the CNNs trained with different features are fused using the Dempster–Shafer evidence theory to compose the TSCNN-DS model. The experimental results indicate that our combined features with the four-layer CNN are appropriate for environment sound classification problems and dramatically outperform other conventional methods. The proposed TSCNN-DS model achieves a classification accuracy of 97.2%, which is the highest classification accuracy on the UrbanSound8K dataset compared to existing models.
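The decision-level fusion step mentioned in the abstract can be illustrated with Dempster's rule of combination. The sketch below is not the authors' implementation: it assumes that each CNN stream outputs a softmax vector, and that this vector is treated as a mass function that assigns belief only to single classes. The abstract does not say how the paper builds its mass functions. Under that assumption the rule reduces to a normalized element-wise product. The function name `dempster_combine` and the example scores are hypothetical.

```python
def dempster_combine(m1, m2):
    """Combine two mass vectors defined over the same singleton classes.

    Assumes each vector sums to 1 and places mass only on single classes,
    so the only non-empty intersections are the matching classes.
    """
    if len(m1) != len(m2):
        raise ValueError("mass vectors must have the same length")
    # Mass on which both sources agree, class by class.
    joint = [a * b for a, b in zip(m1, m2)]
    agreement = sum(joint)
    # The conflict is K = 1 - agreement; K = 1 means total disagreement.
    if agreement == 0.0:
        raise ValueError("total conflict: the sources cannot be combined")
    # Dempster normalization: divide by (1 - K).
    return [j / agreement for j in joint]


# Hypothetical softmax outputs of the two streams for three classes.
stream_a = [0.6, 0.3, 0.1]
stream_b = [0.5, 0.1, 0.4]

fused = dempster_combine(stream_a, stream_b)
predicted = max(range(len(fused)), key=fused.__getitem__)
print(fused, predicted)
```

In this example both streams favor the first class, so the fused mass on it is larger than either stream's individual score, while classes on which the streams disagree are suppressed.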

Domains

Computer Science [cs]


https://hal.u-pec.fr/hal-04318247

Submitted on: Friday, December 1, 2023, 15:22:13

Last modified on: Friday, December 27, 2024, 19:56:53

Dates and versions

hal-04318247 , version 1 (01-12-2023)

Identifiers

HAL Id : hal-04318247 , version 1
DOI : 10.3390/s19071733

Cite

Yu Su, Ke Zhang, Jingyu Wang, Kurosh Madani. Environment Sound Classification Using a Two-Stream CNN Based on Decision-Level Fusion. Sensors, 2019, 19 (7), pp.1733. ⟨10.3390/s19071733⟩. ⟨hal-04318247⟩


Collections

LISSI UPEC

