Atharva Anand Joshi
Hi everyone! I am Atharva Anand Joshi, an incoming Machine Learning Engineer 2 at Hewlett-Packard - Poly. Here, I explore advanced Deep Learning-based techniques for Spatial and Personalized Speech Enhancement directly on the headset digital signal processor. Our work allows headsets to adapt to the voices of multiple users and, over time, get better at isolating only their speech in real time. I am fascinated by the application of machine learning in diverse domains like audio, finance, and marketing.
I completed my M.S. in Electrical and Computer Engineering with a concentration in AI/ML Systems at Carnegie Mellon University. My research spans speech processing, deep learning, and representation learning, and I’ve had the privilege of collaborating with WAVLab under the guidance of Professor Shinji Watanabe.
I earned my B.E. in Electrical and Electronics Engineering from BITS Pilani in July 2022. During my undergrad, I worked on research projects under the guidance of Professor Syed Mohammad Zafaruddin (SM Research Group) and Professor Ananthakrishna Chintanpalli.
Before joining CMU, I worked as an Analyst at American Express, AI Labs. Here, I explored modelling approaches that blend tabular deep learning with tree-based algorithms for Credit Default Prediction: what is the probability of a given customer failing to repay their outstanding debt in the near future? This helps in setting the credit line and making other credit-related decisions.
During my internship at AmEx, I developed a template-based framework that allows users to seamlessly create and deploy their end-to-end Self Learning pipelines for Sequence Models. I have also interned at Adobe Research, India with mentors Dr Atanu Sinha and Dr Sunav Choudhary. Here, we created concise user representations that can be projected onto edge servers, providing faster marketing services.
Feel free to reach out for research collaboration, academic guidance (if you are an undergraduate student from BITS), or even just for a chat! For GRE and TOEFL preparation guidance, especially under time constraints, do check out this repository.
Email / Resume / Google Scholar / LinkedIn / GitHub
HP Inc., Poly Advanced Technology Group
Research and Development
February 2025 - Present (Incoming Full-time)
Summer Internships, 2023 and 2024
1) Spatial and Personalized Speech Enhancement on Headsets
2) Deep Learning for Noise Suppression on Poly Headsets
Watanabe’s Audio and Voice (WAV) Lab
Research Collaboration
January 2024 - Present
1) Data Scalability Aspects for the Speech Enhancement Task
2) Query-Driven Dynamic Pruning for Large Speech Models
3) ESPnet: End-to-end speech processing toolkit
Adobe Research, India
Big Data Experience Lab
May 2021 - August 2021
Edge Computing for Marketing Technology
Publications
[1] A. A. Joshi, H. Settibhaktini and A. Chintanpalli, "Modeling Concurrent Vowel Scores Using the Time Delay Neural Network and Multitask Learning," in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, pp. 2452-2459, 2022, doi: 10.1109/TASLP.2022.3192096. (Link)
[2] A. A. Joshi, P. Bhardwaj and S. M. Zafaruddin, "Terahertz Wireless Transmissions with Maximal Ratio Combining over Fluctuating Two-Ray Fading," 2022 IEEE Wireless Communications and Networking Conference (WCNC), 2022, pp. 1575-1580, doi: 10.1109/WCNC51071.2022.9771926. (Link)
Patent
S. Chakraborty, S. Choudhary, A. Sinha, S. Nair, M. Ghuhan, Y. Gagneja, A. Joshi, A. Tyagi, S. Gupta, “Generating Concise and Common User Representations for Edge Systems from Event Sequence Data Stored on Hub Systems”, US 17/849,320, Filed Jun 24, 2022. (Link)
Data Scalability Aspects for the Speech Enhancement Task
Investigating how state-of-the-art speech enhancement models scale with large amounts of training data (>10,000 hours).
Compared several speaker-level data selection methods using diversity metrics based on speaker embeddings.
Extending the work by improving selection methods through non-intrusive speech quality prediction metrics.
Query-Driven Dynamic Pruning for Large Speech Models
Developed a novel frame-level gate prediction model that can dynamically prune speech LLMs.
Studied the impact of several context sources, including speaker characteristics and audio features, on the Automatic Speech Recognition and Speech Translation capabilities of the model.
Analyzed pruning patterns to understand how the gate predictor decides which modules to prune.
ESPnet: End-to-end speech processing toolkit
Contributed an end-to-end reproducible deep learning pipeline for the Kinect-WSJ dataset – a multichannel, reverberated and noisy version of the WSJ0-2mix dataset.
Performed benchmark analysis on this speech separation dataset using current state-of-the-art models.
Code / Model
Proactive Servicing: Guess What? (Amex ML Challenge)
Combined event sequences and demographic data to predict customer intent at the start of the Ask Amex chat session.
The approach involved joint training of a bidirectional GRU with feedforward networks.
Attained a validation top-5 accuracy of 0.768. Our solution made it into the top 10 on the leaderboard and was selected for internal presentation.
High Performance Parallel Implementations for Convolutional Neural Networks
Provided fast OpenMP and CUDA implementations for various subroutines corresponding to the convolution layer.
Achieved a maximum speedup of 4.23x on the Intel(R) Xeon(R) Silver 4208 CPU and 73.87x on the NVIDIA Tesla T4 GPU.
Report
Concurrent Vowel Identification using TDNN-MTL
Predicted the effect of fundamental frequency (F0) difference on the identification scores in a concurrent vowel identification experiment using Deep Learning.
From the neuron responses generated by the Auditory Nerve Model, a temporal network architecture was used to model short-term and long-term dependencies.
Paper
Terahertz wireless transmissions with MRC receiver over FTR fading
A framework for numerical analysis of FTR channel models obtained by combining small-scale fading and antenna misalignment effects.
The analysis can also be verified using Monte Carlo simulations.
Code / Paper
Music
I have been an avid practitioner and performer of Hindustani Classical Vocal Music for the past fourteen years.
Here's a short article on Raga that I wrote in my sophomore year (2020). Would love to hear your thoughts!
Here is my musical journey during college and beyond:
My Instagram Channel
On this channel, I post Classical music, Ghazals, and Bollywood covers in my free time.
Link
Ragamalika, the Classical Music and Dance Club of BITS Pilani
Joint Coordinator
Composed and performed music for our semester productions: Nritya Ranjani and Sangamam.
Managed external professional concerts: we've had the opportunity to host some wonderful artists in the past, including Pandit Jayateerth Mevundi, Pandit Abhishek Raghuram, and IndoSoul by Karthick Iyer.
Instagram / YouTube / Facebook