Title: | Sound Classification Using Convolutional Neural Networks |
---|---|
Description: | Provides an all-in-one solution for automatic classification of sound events using convolutional neural networks (CNN). The main purpose is to provide a sound classification workflow, from annotating sound events in recordings to training and automating model usage in real-life situations. Using the package requires a pre-compiled collection of recordings with sound events of interest and it can be employed for: 1) Annotation: create a database of annotated recordings, 2) Training: prepare training data from annotated recordings and fit CNN models, 3) Classification: automate the use of the fitted model for classifying new recordings. By using automatic feature selection and a user-friendly GUI for managing data and training/deploying models, this package is intended to be used by a broad audience as it does not require specific expertise in statistics, programming or sound analysis. Please refer to the vignette for further information. Gibb, R., et al. (2019) <doi:10.1111/2041-210X.13101> Mac Aodha, O., et al. (2018) <doi:10.1371/journal.pcbi.1005995> Stowell, D., et al. (2019) <doi:10.1111/2041-210X.13103> LeCun, Y., et al. (2012) <doi:10.1007/978-3-642-35289-8_3>. |
Authors: | Bruno Silva [aut, cre] |
Maintainer: | Bruno Silva <[email protected]> |
License: | GPL-3 |
Version: | 0.0.9.3 |
Built: | 2025-02-02 03:40:45 UTC |
Source: | https://github.com/bmsasilva/soundclass |
See documentation of package magrittr for details.
lhs %>% rhs
lhs |
A value or the magrittr placeholder. |
rhs |
A function call using the magrittr semantics. |
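The re-exported pipe can be illustrated with a minimal example (any R function works as the rhs; the values used here are arbitrary):

```r
library(magrittr)

# lhs %>% rhs is equivalent to rhs(lhs)
c(1, 5, 3) %>% max()      # same as max(c(1, 5, 3))

# the magrittr placeholder "." stands for the lhs inside the call
10 %>% seq(1, .)          # same as seq(1, 10)
```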
Shiny app to label recordings. Use this app to visualize your training recordings, create annotations and store them in a sqlite database. The app has a sidebar panel with the following buttons/boxes to input required user data:
Create database – if no database exists to store the annotations, use this button to create one
Choose database – choose the database to store the annotations
Butterworth filter – check box to apply filter and indicate low and high frequencies in kHz to filter the recordings
Time expanded – only used in recorders specifically intended for bat recordings. Can take any numeric value. If the recording is not time expanded the value must be set to 1. If it is time expanded the numeric value corresponding to the time expansion should be indicated
Choose folder – choose the folder containing the training recordings
After the spectrogram is plotted:
Select events by clicking in the spectrogram on the middle of the event of interest (bat call, bird song, etc.)
Insert the correct label in the "Label" box and add any additional notes in the "Observations" box
Press 'Set labels' button to add labels to database
Repeat above steps if more than one set of events is present in the recording
Press 'Next' button to advance to next recording or pick another recording from the dropdown list
To zoom into the spectrogram, press the mouse button and drag to select an area, then double-click on it. To zoom out, simply double-click on the spectrogram without an area selected. To adjust visualization settings, the tab "Spectrogram options" in the top right can be used to set:
Threshold – minimum intensity values to show in the spectrogram. A value of 100 will typically be adequate for most recorders
Window length – moving window length in ms. Smaller windows are best suited for short calls
Overlap – overlap between consecutive windows; higher values give better visualization but lower performance
Resolution – frequency resolution of the spectrogram
app_label()
Starts the shiny app, no return value.
Bruno Silva
Shiny app to fit a model from training recordings or to run a fitted model to classify new recordings. This app consists of three GUIs, i.e. three main panels, accessible by the tabs at the top:
Create train data – create train data from recordings and their respective annotations database
Fit model – fit a model from training data
Run model – run a fitted model to classify new recordings
This panel is used to create train data from recordings and their respective annotations database. The sidebar panel has the following buttons/boxes to input required user data:
Choose folder – choose the folder containing the training recordings
Choose database – choose the database with the annotations for the training recordings
Time expanded – choose the correct time expansion factor, normally only used in recorders specifically intended for bat recordings. Can take the values "auto", 1 or 10. If the recording is in real time the value must be 1. If it is time expanded, the value 10 or "auto" can be selected. If "auto" is selected it is assumed that sampling rates < 50 kHz correspond to a value of 10 and sampling rates > 50 kHz to a value of 1
Spectrogram parameters – different typologies of sound events require different parameters for computing the spectrograms. The most relevant are: size (in ms), which should be large enough to encompass the duration of the longest sound event under analysis (not only in the training data but also in novel recordings where the classifiers are to be applied), and moving window (in ms), which should be smaller for shorter sound events (to capture the quick changes in time) and larger for longer sound events (to avoid redundant information). The other parameters are more general and the same values can be used for different sound events, as they only change the definition of the images created. Please refer to the spectro_calls documentation for further details
After entering the required information press the button "Create training data from labels" to generate the training data that will be used for fitting a model. This object is saved in the folder containing the training recordings with the name "train_data.RDATA".
This panel is used to fit a model from training data. The sidebar panel has the following buttons/boxes to input required user data:
Choose train data – the file "train_data.RDATA" created in the previous panel
Choose model – a blank model to be fitted. A custom model is provided with the package but must be copied to an external folder if it is to be used. Its path can be obtained by running the following line at the R console: system.file("model_architectures", "model_vgg_sequential.R", package="soundClass"). The file should then be manually copied to an external folder
Model parameters – the train percentage indicates the percentage of data used to fit the model, while the remainder is used for validation; batch size indicates the number of samples per gradient update; the learning rate indicates the magnitude of the gradient update; early stop indicates the maximum number of epochs without improvement allowed before training stops; and epochs indicates the maximum number of epochs to train. Further information can be found in the keras documentation: https://keras.io/api/
The model is evaluated during fitting using the validation data. After completion, either by reaching the maximum number of epochs or by triggering early stopping, the fitted model, the fitting log and the model metadata are saved to the folder containing the train data with the file names "fitted_model.hdf5", "fitted_model_log.csv" and "fitted_model_metadata.RDATA" respectively.
This panel is used to run a fitted model to classify new recordings. The sidebar panel has the following buttons/boxes to input required user data:
Choose folder – choose the folder containing the recordings to be classified
Choose model – a fitted model to be used for classification
Choose metadata – the file containing the fitted model metadata
Time expanded – choose the correct time expansion factor, normally only used in recorders specifically intended for bat recordings. Can take the values "auto", 1 or 10. If the recording is not time expanded the value must be 1. If it is time expanded, the value 10 or "auto" can be selected. If "auto" is selected it is assumed that sampling rates < 50 kHz correspond to a value of 10 and sampling rates > 50 kHz to a value of 1
Output file – the name of the files to store the results of the classification
Irrelevant – does the fitted model include an irrelevant class?
Export plots – should a spectrogram of the classified recordings be saved to disk?
The classification results are stored in a folder called "output", created inside the folder containing the recordings. They are stored in a database in sqlite3 format with all the relevant events detected and the respective probability of belonging to a given class. Additionally, a file in csv format is saved to disk, containing summary statistics per recording, i.e. the class with the most events detected in each recording and the average frequency of maximum energy of the detected events.
app_model()
Starts the shiny app, no return value.
Bruno Silva
Run automatic classification of sound events on a set of recordings using a fitted model.
auto_id(model_path, update_progress = NA, metadata, file_path, out_file, out_dir, save_png = TRUE, win_size = 50, plot2console = FALSE, remove_noise = TRUE, recursive = FALSE, tx = 1, butt_filter = FALSE)
model_path |
Character. Path to the fitted model. |
update_progress |
Progress bar only to be used inside shiny. |
metadata |
The object created with the function train_metadata() containing the parameters used to fit the model, or the path to the saved RDATA file. |
file_path |
Character. Path to the folder containing recordings to be classified by the fitted model. |
out_file |
Character. Name of the output file to save the results. Will be used to name the csv file and the sqlite database. |
out_dir |
Character. Path to the folder where the output results will be stored. Will be created if it doesn't exist already. |
save_png |
Logical. Should a spectrogram of the classified recordings with the identified event(s) and respective classification(s) be saved as png file? |
win_size |
Integer. Window size in ms to split recordings in chunks for classification. One peak per chunk is obtained and classified. |
plot2console |
Logical. Should a spectrogram of the classified recordings with the identified event(s) and respective classification(s) be plotted in the console while the analysis is running? |
remove_noise |
Logical. TRUE indicates that the model was fitted with a non-relevant class which will be deleted from the final output. |
recursive |
Logical. FALSE indicates that the recordings are in a single folder and TRUE indicates that there are recordings inside subfolders. |
tx |
Only used in recorders specifically intended for bat recordings. Can take the values "auto" or any numeric value. If the recording is not time expanded tx must be set to 1 (the default). If it is time expanded the numeric value corresponding to the time expansion should be indicated or "auto" should be selected. If tx = "auto" the function assumes that sampling rates < 50 kHz correspond to tx = 10 and sampling rates > 50 kHz to tx = 1. |
butt_filter |
Logical. Indicates whether a Butterworth filter should be applied to the recordings. |
Runs a classification task on the recordings of a specified folder and saves the results of the analysis.
Nothing.
Bruno Silva
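As a sketch of a typical call, under the `## Not run:` convention used elsewhere in this documentation (all paths are hypothetical; a fitted model and its metadata from a previous training run are assumed to exist):

```r
## Not run:
library(soundClass)

# Classify all recordings in a folder with a previously fitted model;
# results are written to out_dir as a sqlite3 database and a csv file
auto_id(
  model_path = "./fitted_model.hdf5",
  metadata = "./fitted_model_metadata.RDATA",
  file_path = "./recordings",
  out_file = "results",
  out_dir = "./recordings/output",
  save_png = FALSE,
  win_size = 50,
  remove_noise = TRUE,
  recursive = FALSE,
  tx = "auto"
)
## End(Not run)
```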
Create a sqlite3 database (if a database with the specified name doesn't exist already) with predefined tables. Two types of databases are possible, one to store recordings annotations and another to store the output of the classification.
create_db(path, db_name = NA, table_name = "labels", type = "reference")
path |
Character. Path to the folder where the database will be created. |
db_name |
Character. Name of the database to be created. |
table_name |
Character. Name of the table to be created in the database. It is mandatory to use the default table name "labels" if the database is intended to be used in conjunction with other functions of this package. |
type |
Character indicating the type of database to create. Possible options are: "reference" which creates a database to be used to store recordings annotations for training purposes, and "id" which creates a database to output the results of the automatic classification. |
Nothing.
Bruno Silva
## Not run:
dir_path <- tempdir()
create_db(dir_path, db_name = "test", table_name = "labels",
  type = "reference")
file.remove(file.path(dir_path, "test.sqlite3"))
## End(Not run)
Detects the temporal position of the desired number of energy peaks in a recording containing exclusively non-relevant events.
find_noise(recording, nmax = 1, plot = FALSE)
recording |
Object of class "rc". |
nmax |
Integer indicating the maximum number of peaks to detect in the recording. |
plot |
Logical. If TRUE a plot showing the peak(s) is returned. |
A vector with the temporal position of the identified peak(s), in samples.
Bruno Silva
# Create a sample wav file in a temporary directory
recording <- tuneR::noise(duration = 44100)
temp_dir <- tempdir()
rec_path <- file.path(temp_dir, "recording.wav")
tuneR::writeWave(recording, filename = rec_path)
# Import the sample wav file
new_rec <- import_audio(rec_path, butt = FALSE, tx = 1)
find_noise(new_rec, nmax = 1, plot = FALSE)
file.remove(rec_path)
Import a "wav" recording. If the recording is stereo it is converted to mono by keeping the channel with overall higher amplitude
import_audio(path, butt = FALSE, low, high, tx = 1)
path |
Character. Full path to the recording |
butt |
Logical. If TRUE filters the recording with a 12th order filter. The filter is applied twice for better cleaning of the recording |
low |
Minimum frequency in kHz for the butterworth filter |
high |
Maximum frequency in kHz for the butterworth filter |
tx |
Time expanded. Only used in recorders specifically intended for bat recordings. Can take the values "auto" or any numeric value. If the recording is not time expanded tx must be set to 1 (the default). If it is time expanded the numeric value corresponding to the time expansion should be indicated or "auto" should be selected. If tx = "auto" the function assumes that sampling rates < 50 kHz correspond to tx = 10 and sampling rates > 50 kHz to tx = 1. |
An object of class "rc". This object is a list with the following components:
sound_samples – sound samples of the recording
file_name – name of the recording
file_time – time of modification of the file (suitable for Pettersson Elektronik detectors; for other manufacturers creation time would be preferable, but this is not implemented yet)
fs – sample frequency
tx – expanded time factor
Bruno Silva
# Create a sample wav file in a temporary directory
recording <- tuneR::sine(440)
temp_dir <- tempdir()
rec_path <- file.path(temp_dir, "recording.wav")
tuneR::writeWave(recording, filename = rec_path)
# Import the sample wav file
new_rec <- import_audio(rec_path, low = 1, high = 20, tx = 1)
new_rec
file.remove(rec_path)
Convert time to number of samples or vice versa in sound files.
ms2samples(value, fs = 300000, tx = 1, inv = FALSE)
value |
Integer. Number of samples or time in ms. |
fs |
Integer. The sampling frequency in samples per second. |
tx |
Integer. The time expansion factor. If the recording is not time expanded tx must be set to 1 (the default). |
inv |
Logical. If TRUE converts time to number of samples, if FALSE number of samples to time. |
Integer. If inv = TRUE returns number of samples, if inv = FALSE returns time in ms.
Bruno Silva
ms2samples(150000, fs = 300000, tx = 1, inv = FALSE)
ms2samples(100, fs = 300000, tx = 1, inv = TRUE)
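With tx = 1, the two calls above reduce to plain unit arithmetic, which can be checked directly (this is illustrative arithmetic based on the parameter descriptions, not the package implementation):

```r
fs <- 300000                # sampling frequency in samples per second
# samples -> time in ms (inv = FALSE): divide by samples per ms
ms <- 150000 / (fs / 1000)  # 150000 samples at 300 kHz -> 500 ms
# time in ms -> samples (inv = TRUE): multiply by samples per ms
n <- 100 * (fs / 1000)      # 100 ms at 300 kHz -> 30000 samples
c(ms, n)
```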
Plot training spectrograms
plot_td(train_data, index)
train_data |
Train data object returned by function spectro_calls() |
index |
Vector indicating the index of the spectrograms to plot. |
A plot
Bruno Silva
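A usage sketch, under the `## Not run:` convention used elsewhere in this documentation (assumes a train_data object already created with spectro_calls()):

```r
## Not run:
# Plot the first four training spectrograms from a previously
# created train data object
plot_td(train_data, index = 1:4)
## End(Not run)
```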
Generate spectrograms from recording labels for classification purposes. The spectrogram parameters are user defined and should be selected depending on the type of sound event to classify.
spectro_calls(files_path, update_progress = NA, db_path, spec_size = NA, window_length = NA, frequency_resolution = 1, overlap = NA, dynamic_range = NA, freq_range = NA, tx = 1, seed = 1002, butt_filter = FALSE)
files_path |
Character. Path for the folder containing sound recordings. |
update_progress |
Progress bar only to be used inside shiny. |
db_path |
Character. Path for the database of recording labels created with the shiny app provided in the package. |
spec_size |
Integer. Spectrogram size in ms. |
window_length |
Numeric. Moving window length in ms. |
frequency_resolution |
Integer. Spectrogram frequency resolution, with higher values meaning better resolution. Specifically, for any integer X provided, 1/X of the analysis bandwidth (as determined by the number of samples in the analysis window) will be used. Not implemented yet; always use 1 as the input value. |
overlap |
Percentage of overlap between moving windows. Accepts values between 0.5 and 0.75. |
dynamic_range |
Threshold of minimum intensity values to show in the spectrogram. A value of 100 will typically be adequate for the majority of the recorders. If this is set to NULL, no threshold is applied. |
freq_range |
Frequency range of the spectrogram. Vector with two values, referring to the minimum and maximum frequency to show in the spectrogram. |
tx |
Time expanded. Only used in recorders specifically intended for bat recordings. Can take the values "auto" or any numeric value. If the recording is not time expanded tx must be set to 1 (the default). If it's time expanded the numeric value corresponding to the time expansion should be indicated or "auto" should be selected. If tx = "auto" the function assumes that sampling rates < 50kHz corresponds to tx = 10 and > 50kHz to tx = 1. |
seed |
Integer. Define a custom seed for randomizing data. |
butt_filter |
Logical. Should a butterworth filter be applied to the recording? |
A list with the following components:
data_x – an array with the spectrogram matrices
data_y – the labels for each matrix in one-hot-encoded format
parameters – the parameters used to create the matrices
labels_df – the labels with their respective numeric index
Bruno Silva
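A usage sketch under the `## Not run:` convention used elsewhere in this documentation. Paths are hypothetical and the parameter values are illustrative only; they should be chosen according to the sound events of interest, as described above:

```r
## Not run:
library(soundClass)

# Generate training spectrograms from annotated recordings
train_data <- spectro_calls(
  files_path = "./training_recordings",
  db_path = "./training_recordings/db.sqlite3",
  spec_size = 20,          # ms; should exceed the longest event
  window_length = 1,       # ms; shorter for short calls
  overlap = 0.5,
  dynamic_range = 100,
  freq_range = c(10, 80),  # min and max frequency to show
  tx = "auto"
)
## End(Not run)
```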
Obtain train metadata from the output of function spectro_calls. Needed to run a fitted model.
train_metadata(train_data)
train_data |
Output of function spectro_calls. |
A list with the following components:
parameters – parameters of the spectrograms
classes – class names and respective codes
Bruno Silva
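A usage sketch under the `## Not run:` convention used elsewhere in this documentation (assumes a train_data object already created with spectro_calls()):

```r
## Not run:
# Extract the metadata needed by auto_id() from a train data object
metadata <- train_metadata(train_data)
metadata$classes     # class names and respective codes
metadata$parameters  # spectrogram parameters used for training
## End(Not run)
```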