![Sound Signal](sound_signal.png)
![Signal Period](signal_period.png)
![Sin and Cos](sin_and_cos.png)

## Sound Pressure Level (SPL)

* Sound intensity measured in decibels
* ![SPL Equation](spl_equation.png)
* ![SPL Intensity](spl_intensity.png)

## Audible Frequency Band

* A typical adult can hear sounds with frequencies as low as 20 Hz and as high as 20,000 Hz (20 kHz), although the upper limit generally decreases with age

## Sound Wave Propagation and Falloff

* An acoustic pressure wave propagates through space and can be absorbed or reflected by surfaces, diffracted around corners and through narrow “slits,” and refracted as it passes across the boundary between different transmission media
* ![Sound Radiation](sound_radiation.png)
* The intensity of the sound pressure wave that a source produces falls off with distance, following a 1/r² law, while pressure follows a 1/r law
* ![Sound Falloff](sound_falloff.png)

## Interference/Superposition/Comb Filtering

* When multiple sound waves overlap in space, their amplitudes add together; this is called superposition
* If the waves are in phase (that is, their peaks and troughs line up), then the waves will positively reinforce each other
* Likewise, if the waves are out of phase, the peaks of one wave can tend to cancel the troughs of the other and vice versa, and the result is a wave with lower (or even zero) amplitude
* Comb filtering: waves reflect off surfaces in such a way as to either almost completely cancel or completely reinforce certain frequencies. The result is a frequency response (see Section 14.2.5.7) with lots of narrow peaks and troughs, which when plotted look a bit like a comb (hence the name)

## Dry, Wet, and Reverberations

* Direct (dry). Sound waves that arrive at the listener via a direct, unobstructed path from the source are collectively known as direct or dry sound
* Early reflections (echo).
Sound waves that arrive at the listener via an indirect path, after being reflected from and partially absorbed by surrounding surfaces, are heard as early reflections or echoes
* Late reverberations (tail). Once the sound waves have bounced around the listening space more than a few times, they superimpose and interfere with one another so much that the brain can no longer detect distinct echoes. These are known as late reverberations or the diffuse tail
* The echoes and the tail combine with the dry sound to create what is known as wet sound
* ![Wet and Dry](wet_dry.png)

## The Doppler Effect

* The sound of a train seems higher pitched when it is approaching you, and becomes lower pitched as it races off into the distance
* ![Doppler Effect](doppler_effect.png)
* where f is the original frequency, f' is the Doppler-shifted (observed) frequency at the listener, c is the speed of sound in air, and v_l and v_s are the speeds of the listener and sound source, respectively

## Perception of Position

* Falloff with distance
* Atmospheric absorption
* Having two ears (left and right)
* Ear shape
* Head-Related Transfer Function (HRTF): a mathematical model of the minute effects that the folds of our ears (the pinnae) have on sounds coming from different directions

## Continuous and Discrete Time Signals

* When the independent variable is a real number (decimals allowed, t = time), we call the signal a continuous-time signal, x(t)
* When the independent variable is an integer (n = index), we call the signal a discrete-time signal, x\[n\]

## Linear Time-Invariant Systems (LTI)

* A time-invariant system is one for which a time shift in the input signal causes an equal time shift in the output signal
* A linear system is one that possesses the property of superposition.
This means that if an input signal consists of a weighted sum of other signals, then the output is a weighted sum of the individual outputs that would have been produced had each of the other signals been fed through the system independently

## Unit Impulse

* This signal is one of a family of related functions known as singularity functions because they all contain at least one discontinuity or “singularity.”
* In discrete time, it is a signal whose value is zero everywhere except at n = 0, where its value is one
* ![Discrete Impulse](discrete_impulse.png)
* ![Discrete Impulse Graph](discrete_impulse_graph.png)
* ![Discrete Impulse Formula](discrete_impulse_formula.png)
* In continuous time:
* ![Continuous Impulse](continuous_impulse.png)
* ![Continuous Impulse Graph](continuous_impulse_graph.png)
* ![Continuous Impulse Equation](continuous_impulse_equation.png)

## Convolution/Impulse Response

* The impulse response (the system's output when the input is a unit impulse) is represented by h\[n\]
* ![Impulse Response Graph](impulse_response_graph.png)
* The output for the entire input signal is a sum of time-shifted copies of the impulse response, each weighted by the corresponding input sample; this is known as the convolution sum
* Convolution is represented by the \* operator
* ![Convolution Sum](convolution_sum.png)
* ![Convolution Sum Notation](convolution_sum_notation.png)
* Convolution is commutative, associative, and distributive over addition

## Fourier Transform and Complex Notation

* Multiplying complex numbers results in a rotation (and scaling) in the complex plane
* Repeatedly multiplying by j = sqrt(-1) cycles through j, -1, -j, 1: each multiplication is a 90° rotation, so the sequence repeats every four steps
* ![Complex Rotation](complex_rotation.png)
* ![Complex Projection](complex_projection.png)
* ![Fourier Transform](fourier_transform.png)

## Microphone Polar Patterns (input/sensitivity regions)

* ![Microphone Types](microphone_types.png)

## Low Frequency Effects (LFE)

* The LFE channel carries bass/low-frequency signals and is typically played through a subwoofer

## Amplification/Gain and Volume control

* ![Gain Formula](gain_formula.png)
* A volume control attenuates the signal, lowering the gain below the amplifier's maximum
* ![Volume Control](volume_control.png)

## Pulse Code Modulation (PCM) sound signal

* PCM consists of:
    * Voltage measurements, stored in floating point or quantized to 8, 16, 24, or 32 bits (analog-to-digital conversion, ADC)
    * Bit depth, also known as "resolution"
    * Samples taken at a fixed sample rate
    * Samples stored in an array/memory
* The sample rate and bit depth must be stored alongside the samples in order to reproduce the sound (e.g., other PCM data can have different sample rates, so they need to be converted to match)
* Audio storage formats (uncompressed PCM and compressed):
    * Raw headerless
    * Linear PCM (LPCM)
    * WAV
    * WMA
    * AIFF
    * MP3
    * ATRAC
    * OGG Vorbis
    * Dolby Digital
    * DTS
    * VAG: used on PS3
    * MPEG-4 SLS/ALS/DST
* The “PlayStation 3 Secrets” website also provides some excellent information on audio formats: https://bit.ly/2HOVtvR

## Rendering audio in 3D

* Overview in steps:
    * Sound synthesis
    * Spatialization (distance-based attenuation and pan)
    * Acoustical modeling (early reflections/late reverberations)
    * Doppler shifting
    * Mixing

## Representation in the world

* 3D sound sources: have position, velocity, radiation pattern, range
* Listener: virtual microphone
* Environmental model: geometry, acoustic properties

## Pan and Constant Power Law

* The term “pan” comes from early technology that used a “panoramic potentiometer” (variable resistor) or “pan pot” to control the relative volumes of the left and right speakers of a stereo system
* Panning uses the angle between the sound source and the listener
* ![Pan Diagram](pan_diagram.png)
* ![Pan Angle Formula](pan_angle_formula.png)
* Constant power must be preserved in order to keep the loudness the same at different angles
* ![Constant Power](constant_power.png)
* A1 and A2 are the gains for the two speakers

## Sound Propagation Modeling

* 3 types:
    * Geometric analysis
    * Perceptually based models
        * Use an LTI system model; computationally expensive (convolution), so rarely used in games
    * Ad hoc methods

## Obstruction/Occlusion/Exclusion

* Defined regions are used to figure out reverb/absorption
* ![Occlusion Diagram](occlusion_diagram.png)

## Sound Portals

* Used in
The Last of Us
* ![Sound Portals](sound_portals.png)

## Audio Engine Architecture and Pipeline

![Audio Engine Architecture](audio_architecture.png)

* A dry digital PCM signal is synthesized
* Distance-based attenuation and reverb produce the wet signal
* Wet and dry signals are panned independently
* The panned multi-channel signals of all 3D sounds are mixed
* Each sound is called a "voice"; the number of channels equals the number of voices
* ![Audio Pipeline](audio_pipeline.png)
* Individual voice pipeline:
* ![Voice Pipeline](voice_pipeline.png)

## Aux Send

* Routes the signal to an external effects pedal/device, then back into the main signal path

## Reverb Tank

* A cheap/easy way to create reverb: store a history of the sound in a buffer and mix it back into the signal after a delay

## Pre/Post send filters

* Pre-filter: used to model effects close to the sound source, such as a gas mask effect (affects both dry and wet)
* Post-filter: used to muffle the direct sound path for obstruction/occlusion (affects dry sound only)

## Master mixer and output bus

* Performs sample rate and bit depth conversion
* Mixing can be done in the digital or analog domain
* ![Master Bus](master_bus.png)

## Digital Busses and Latency

* Ring buffers are used to connect different components
* Components that share memory can use a shared ring buffer (same address space)
* Components without shared memory can transfer data via a direct memory access controller (DMAC), as done between the PPU and SPUs on PS3, or over PCIe for sound cards
* Larger buffer = more latency
* We want low latency on a music creation/recording system in order to keep all components as synchronized as possible
* TV
and playback can get away with a larger delay; for example, a delay of a few display/render frames (1/60 s each at 60 Hz) is fine during playback

## Sound Cues and Groups

* A sound cue contains both the PCM data and metadata about the sound:
    * 3D or 2D
    * Falloff curve
    * Sound ID and sound type
* Groups can be used to identify sound types, such as "frantic" or "calm" events, or enemy and friendly player groups
* These groups are used to set each group's audio level relative to other groups based on importance

## Ducking

* Quickly lower a sound's or group's volume in order to make sure the player hears important audio such as dialog

## Instance limiting

* Limit the number of sounds playing at once, because of hardware or aesthetic constraints
* Can be done via per-group limits, sound priorities, etc.

## Popular audio engines

* Windows UAA (Universal Audio Architecture)
* XAudio2: Xbox 360, Xbox One, Windows; replaces DirectSound
* Scream, BoomRangBuss: used on PS3, PS4, PS Vita; used by Naughty Dog
* Advanced Linux Sound Architecture (ALSA): Linux
* QNX Sound Architecture
* OpenAL
* AeonWave 4D
* FMOD Studio
* Miles Sound System
* Wwise
* Unreal Engine sound system

## Game Audio Features

* Split-screen support
* Physics-driven audio
* Dynamic music system (based on mood/tension)
* Character dialog system
* Sound synthesis
* Crowd modeling

## Character Dialog System

* Want to support:
    * Catalog all possible lines
    * Each character/group has a unique, recognizable voice
    * Variety/randomness in dialog
    * Audio streaming for long audio such as music and long dialog

Naughty Dog uses Scheme scripting for dialog:

    ;; Lines each character can say when out of ammo
    (define-dialog-line 'line-out-of-ammo
      (character 'drake
        (lines
          drk-out-of-ammo-01 ;; "Dammit, I'm out!"
          drk-out-of-ammo-02 ;; "Crap, need more bullets."
          drk-out-of-ammo-03 ;; "Oh, now I'm REALLY mad."
        )
      )
      (character 'elena
        (lines
          eln-out-of-ammo-01 ;; "Help, I'm out!"
          eln-out-of-ammo-02 ;; "Got any more bullets?"
        )
      )
      (character 'pirate-a
        (lines
          pira-out-of-ammo-01 ;; "I'm out!"
          pira-out-of-ammo-02 ;; "Need more ammo!"
          ;; ...
        )
      )
      (character 'pirate-h
        (lines
          pirh-out-of-ammo-01 ;; "I'm out!"
          pirh-out-of-ammo-02 ;; "Need more ammo!"
        )
      )
    )

    ;; A conversation chained together with :next-seg
    (define-conversation-segment 'conv-searching-for-stuff-01
      :rule []
      :line 'line-did-you-find-anything ;; "Hey, did you find anything?"
      :next-seg 'conv-searching-for-stuff-02
    )
    (define-conversation-segment 'conv-searching-for-stuff-02
      :rule []
      :line 'line-nope-not-yet ;; "I've been looking for an hour..."
      :next-seg 'conv-searching-for-stuff-03
    )
    (define-conversation-segment 'conv-searching-for-stuff-03
      :rule []
      :line 'line-shut-up-keep-looking ;; "Well then shut up and keep looking!"
    )

    ;; Rules select which line is spoken
    (define-conversation-segment 'conv-player-hit-by-bullet
      (
        :rule [ ('health < 25) ]
        :line 'line-i-need-a-doctor ;; "I'm bleeding bad... need a doctor!"
      )
      (
        :rule [ ('health < 75) ]
        :line 'line-im-in-trouble ;; "Now I'm in real trouble."
      )
      (
        :rule [ ] ;; no criteria acts as an "else" case
        :line 'line-that-was-close ;; "Ah! Can't let that happen again!"
      )
    )

    ;; A branching conversation that switches speakers
    (define-conversation-segment 'conv-shot-at--start
      (
        :rule [ ]
        :line 'line-are-you-ok ;; "Are you OK?"
        :next-seg 'conv-shot-at--health-check
        :next-speaker 'listener ;; *** see comments below
      )
    )
    (define-conversation-segment 'conv-shot-at--health-check
      (
        :rule [ (('speaker 'shot-recently) == false) ]
        :line 'line-yeah-im-fine ;; "Yeah, I'm fine."
        :next-seg 'conv-shot-at--not-hit
        :next-speaker 'listener ;; *** see comments below
      )
      (
        :rule [ (('speaker 'shot-recently) == true) ]
        :line 'line-not-exactly ;; "(panting) Not exactly."
        :next-seg 'conv-shot-at--hit
        :next-speaker 'listener ;; *** see comments below
      )
    )

## Context/Regional sensitive dialog

* AI characters can say different dialog based on player location to make them seem smarter
* ![Context Regions](context_regions.png)

## Music Stinger

* A short musical clip/sound that signals an event or change in the game, such as an enemy spotting the player, or death music
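
## Constant Power Pan Example

The constant power requirement from the Pan and Constant Power Law section can be illustrated with a short Python sketch. It assumes the common sine/cosine formulation, which may be parameterized differently from the formula in the figure: a hypothetical `pan` value in [-1, 1] is mapped to an angle in [0, π/2], and the two speaker gains A1 and A2 are taken as the cosine and sine of that angle, so that A1² + A2² = 1 at every pan position. The function name and the [-1, 1] range are assumptions made for illustration only.

```python
import math


def constant_power_pan(pan: float) -> tuple[float, float]:
    """Return (a1, a2), the left and right speaker gains.

    pan is assumed to run from -1.0 (hard left) through 0.0 (center)
    to +1.0 (hard right); this range is an illustrative choice.
    """
    if not -1.0 <= pan <= 1.0:
        raise ValueError("pan must be in [-1, 1]")
    # Map the pan position onto a quarter circle: [0, pi/2].
    theta = (pan + 1.0) * math.pi / 4.0
    # cos^2 + sin^2 = 1, so total power is the same at every angle.
    return math.cos(theta), math.sin(theta)


if __name__ == "__main__":
    for pan in (-1.0, -0.5, 0.0, 0.5, 1.0):
        a1, a2 = constant_power_pan(pan)
        power = a1 * a1 + a2 * a2
        print(f"pan={pan:+.1f}  A1={a1:.3f}  A2={a2:.3f}  power={power:.3f}")
```

At center, both gains come out near 0.707 (about -3 dB) rather than 0.5. A plain linear crossfade would use 0.5 per speaker and dip in total power at the center, which is the loudness change the constant power law is meant to avoid.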