
Forensic Audio Enhancement, Transcription and Voice Correlation

Digital Audio Recordings

We strongly recommend that you create a duplicate working copy of your original digital audio file. Generate at least SHA-1 and MD5 hash checksums for both the original file and the working copy to verify that the copy is true and error-free. You can then forward both the working copy and the hash checksums to Media Forensics by a variety of means (see File and Data Transfer Options). Media Forensics can then confirm that the file was received without error by recomputing the hash checksums on receipt. The hash checksums also provide a reliable means of verifying the integrity of the data in court.
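As an illustration of the checksum step, the following is a minimal Python sketch that uses only the standard library. The function names `file_digests` and `copies_match` are hypothetical and chosen for this example; they are not tools supplied by Media Forensics. The default algorithms are the two named above, and the tuple can be extended (for example with "sha256") if a stronger hash is also wanted.

```python
import hashlib


def file_digests(path, algorithms=("md5", "sha1"), chunk_size=1 << 20):
    """Return {algorithm: hex digest} for the file at `path`.

    The file is opened read-only and read in chunks, so large recordings
    do not need to fit in memory and the file itself is never modified.
    """
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def copies_match(original, working_copy):
    """True only if every digest of the working copy equals the original's."""
    return file_digests(original) == file_digests(working_copy)
```

A typical use would be to call `file_digests` on the original, record the values, and then call `copies_match` on the original and the working copy before sending anything. The recipient can run `file_digests` on the received file and compare the result with the recorded values.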

Do not open the file in an editing application to crop or trim the start or end of the recording. Saving the result will modify the file metadata, apply additional compression to the recording and change the “Last Modified” timestamp. If this occurs, the file will probably not be considered authentic, as it is no longer a true duplicate of the original. The added compression will also limit what enhancement can achieve, as more spectral components will be altered or lost.

Digitally recorded data is less forgiving, and what can be achieved is limited where the original recording was made with inappropriate settings, such as too low a sampling rate, too few bits per sample, or an amplitude set too low or too high. Steps can be taken to mitigate some of these shortcomings. However, the level of success will vary on a case-by-case basis.
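To see why the sampling rate and the number of bits per sample matter, the sketch below reads the capture settings from an uncompressed PCM WAV file using Python's standard `wave` module. It reports two well-known theoretical limits: the Nyquist frequency (half the sampling rate, the highest frequency the recording can represent) and the ideal dynamic range of N-bit PCM (about 6.02 × N + 1.76 dB). The function name `describe_wav` is hypothetical, and the sketch assumes a plain PCM WAV file; compressed formats would need a different reader.

```python
import wave


def describe_wav(path):
    """Read basic capture settings from an uncompressed PCM WAV file."""
    with wave.open(path, "rb") as wav:  # read-only: the file is not altered
        sample_rate = wav.getframerate()
        bits = wav.getsampwidth() * 8
        channels = wav.getnchannels()
        frames = wav.getnframes()
    return {
        "sample_rate_hz": sample_rate,
        "bits_per_sample": bits,
        "channels": channels,
        "duration_s": frames / sample_rate,
        # Highest frequency the sampling rate can represent (Nyquist limit).
        "nyquist_hz": sample_rate / 2,
        # Theoretical dynamic range of ideal N-bit PCM: 6.02 * N + 1.76 dB.
        "dynamic_range_db": round(6.02 * bits + 1.76, 2),
    }
```

For example, a recording made at 8000 Hz cannot contain any content above 4000 Hz, so no later processing can recover speech components that lay above that limit when the recording was made.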

All work done will be non-destructive, fully documented and reproducible on a step-by-step basis. This level of detail is typically required for the work to be accepted as evidence in a court of law. For private work that will not be used in any legal setting, the documentation can be omitted to reduce the cost. However, such work must not later be submitted as evidence under any circumstances, as Media Forensics will not support any associated claims of authenticity or integrity.

The use of data compression algorithms (as used in most recordings) and the choice of encoder type can affect our ability to enhance the data to some degree. If you are making the recording, always select the best possible quality settings to ensure the best possible enhancement results.

Audio Enhancement

Forensic audio enhancement processes provided by Media Forensics include but are not limited to:

  • Filter out interfering noise, crackles, pops etc.
  • Enhance and isolate desired voice or background activity.
  • Read, convert and write from and to almost any audio codec and media format.
  • Produce transcripts and associated flags in the audio recording.
  • Separate, align and/or merge multiple tracks.
  • Playback speed and pitch adjustments.
  • Cell phone noise filtering.
  • Adaptive noise threshold filtering.
  • Channel blending.
  • Artifact suppression.
  • Mono to Stereo with tailored channel balancing, Stereo to Mono conversions.
  • 30 band graphic equalisation.
  • Hiss, hum, echo and reverb reduction.
  • Speech volume leveler, audio normalisation and de-clipping.
  • DC offset adjustment.
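Two of the simpler items in the list above, DC offset adjustment and audio normalisation, can be sketched in a few lines of Python, together with a rough clipping indicator. This is an illustrative sketch only and not the processing chain used by Media Forensics. It assumes the samples have already been decoded into a list of floating point values where full scale is 1.0, and the function names and the default `target_peak` of 0.9 are hypothetical choices made for the example. Each function returns a new list, so the input samples are left untouched.

```python
def remove_dc_offset(samples):
    """Subtract the mean so the waveform is centred on zero."""
    if not samples:
        return []
    mean = sum(samples) / len(samples)
    return [s - mean for s in samples]


def normalise_peak(samples, target_peak=0.9):
    """Scale so the largest absolute sample equals target_peak (full scale = 1.0)."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


def clipped_fraction(samples, full_scale=1.0, tolerance=1e-4):
    """Fraction of samples at or beyond full scale, a rough clipping indicator."""
    if not samples:
        return 0.0
    limit = full_scale - tolerance
    return sum(1 for s in samples if abs(s) >= limit) / len(samples)
```

Removing the offset before normalising matters because an offset shifts the whole waveform towards one rail, which reduces the gain that can be applied before the signal reaches full scale.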

Audio Voice Recognition

Identifying whose voice may or may not have been recorded can be very difficult, and the results are likely to be inconclusive. Even in laboratory conditions, using identical equipment and environment, matching speech is rarely 100% reliable.

Voice recognition requires breaking the spoken words down into their phonemes and extracting multiple speech characteristics such as frequency components, spectral power levels, cadence and much more.

If your recording or the sample provided for matching is very short, it is unlikely to provide any convincing correlation, as the software must be trained on the provided voice samples.

If the voice recognition and subsequent matching process involves choosing one of several samples, it is much more likely that one sample will stand out as the most likely match. However, this is rarely the case. Usually only one sample is provided, with the desired outcome being either a match or a non-match with the recorded voice.

Some voice recognition steps include:

  • Recording format and signal strength standardisation prior to processing.
  • Vocal component extraction and categorising.
  • Number of speakers identified or estimated.
  • Short and long term vocal characteristics extracted and correlated.
  • Processing of voice samples using Hidden Markov Models and/or Neural Networks.
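The correlation step can be illustrated with a deliberately simplified Python sketch. It is not the Hidden Markov Model or Neural Network processing named above. It assumes that each voice has already been reduced to a fixed-length list of numeric features (a hypothetical representation chosen for this example) and compares them using cosine similarity, a standard measure of how closely two vectors point in the same direction. The function names are also hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector carries no usable information
    return dot / (norm_a * norm_b)


def rank_candidates(questioned, candidates):
    """Return (name, score) pairs sorted from most to least similar.

    `candidates` maps a label to that speaker's feature vector.
    """
    scores = [
        (name, cosine_similarity(questioned, vector))
        for name, vector in candidates.items()
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

With several candidate samples, the ranking shows whether one stands out from the rest. With a single sample there is only one score and nothing to compare it against, so a conclusion depends on an absolute threshold, which is much harder to justify.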

If the person to whom a recording is attributed denies having uttered a particular word or phrase, an Authentication and Integrity analysis is more likely than voice recognition to determine whether the recording was maliciously edited.

Audio Transcriptions

Media Forensics can provide audio transcripts of a very high quality, far surpassing the online offerings. However, quality work takes time, with even a few minutes of audio taking a minimum of several hours.

The transcription work involves many repeated passes with a variety of enhancement algorithms and varying playback speeds to achieve the most accurate transcript possible.

Analogue Audio Recordings

Due to the age of magnetic tape reels and cassettes, special handling and processing may be required to minimise the loss of recorded signal caused by the magnetic coating flaking off the base material. The degradation of magnetic tapes is accelerated by storing the media in an environment where temperature and humidity are uncontrolled. Do not attempt to play, respool or re-tension the tape if you suspect it is in poor condition.

If you have an analogue recording, we use the best available professional digitisers to capture and digitise the audio into a high quality uncompressed digital data stream. This will become our working copy. Ideally, the original recording equipment should be used in this replay and capture process.