Multimodal Evaluation Method for Sound Event Detection
Abstract
Time is an important dimension in sound event detection (SED) systems. However, the metrics used to evaluate SED systems are taken directly from the classical machine learning domain and are not well adapted to the needs of these systems, such as assessing the timing, duration, detection, and uniformity of sound events. Despite its importance, SED evaluation is not yet well developed. Current methods are highly biased by their assumptions and may present misleadingly convincing results. This paper presents a novel multimodal method that evaluates SED systems from multiple perspectives, such as detection, total duration, relative duration, and uniformity. Furthermore, the proposed method is simple, time-efficient, visualizable, extensible, and open-source, and it overcomes the limitations of existing methods. The benefits of the proposed approach are demonstrated by re-evaluating the best systems presented in a well-known challenge on sound event detection.