Thierno Ibrahima Diop, Developer in Dakar, Dakar Region, Senegal

Thierno Ibrahima Diop

Verified Expert in Engineering

Data Scientist and Developer

Dakar, Dakar Region, Senegal
Toptal Member Since
April 25, 2022

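Thierno is a lead data scientist with a passion for natural language processing (NLP) and machine learning (ML). He has mentored apprentice data scientists for three years and previously spent three years as a freelance web and mobile app developer. Thierno is co-founder of GalsenAI, an artificial intelligence (AI) community in Senegal, a Coursera instructor on data science, and a Google Developer Expert in ML.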






Preferred Environment

Jupyter Notebook, Visual Studio Code (VS Code), TensorFlow, PyTorch, Scikit-learn, Keras, Flask, SpaCy, Gensim, OpenAI

The most amazing...

...model I've developed is a system that detects various security issues in code. It was built using large language models (LLMs) such as GPT and LLaMA.

Work Experience

CEO | Lead Data Scientist

2022 - PRESENT
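  • Led a team of machine learning engineers applying deep learning to detect popular reciters from audio input.
  • Guided machine learning engineers in applying deep learning to compute the similarity between users and reciters.
  • Helped the team implement deep learning techniques and experiment with them on our use cases.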
Technologies: Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), GPT, Audio, TensorFlow, PyTorch, Python 3, Artificial Intelligence (AI), Jupyter Notebook, Scikit-learn, Keras, DVC, Git, Matplotlib, Amazon EC2, Python, Amazon S3 (AWS S3), Machine Learning, Amazon Web Services (AWS), Neural Networks, Team Management, Interviewing, Hiring, Code Review, Programming, PostgreSQL

Senior Interview Engineer

2021 - PRESENT
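  • Conducted more than 400 interviews and was promoted to senior in less than a year.
  • Handled quality control of other interviewers' work before results were shared with clients.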
  • Gave live reviews for the onboarding of new interviewers.

AI Developer via Toptal

2023 - 2024
Desert Moon Speech Services LLC
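  • Collected data to convert audio to phonemes, then processed it to handle noise, duration, and International Phonetic Alphabet (IPA) conversion.
  • Trained a simple phoneme-level classification model using transfer learning.
  • Reframed the problem as speech recognition to gain more context and more available data.
  • Handled label imbalance, as some phonemes are rare.
Technologies: Artificial Intelligence (AI), Natural Language Processing (NLP), Machine Learning, Speech to Text, Audio, Deep Learning, Transfer Learning, FastAPI, Docker, PyTorch, Python 3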

NLP Research Engineer

2023 - 2023
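  • Tested different prompting techniques (zero-shot learning, few-shot learning, chain-of-thought) with different LLMs on more than 20 security issues.
  • Optimized LLMs to address complex security issues and prepared the data for the models.
  • Created pipelines to process code with intermediate representations and to evaluate LLMs.
  • Performed topic modeling with GMM and LDA using embeddings from LLMs.
  • Generated code with LLMs by creating agents that fuzz-test for different security issues.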
  • Built the API and created the releases used in production.
  • Used multithreading to reduce prediction and inference time.
Technologies: Natural Language Processing (NLP), GPT, Generative Pre-trained Transformers (GPT), Python, Artificial Intelligence (AI), Machine Learning, Deep Learning, Topic Modeling, Clustering, Fuzz Testing, Language Models, Text Classification, OpenAI GPT-4 API, OpenAI GPT-3 API

Lead Data Scientist

2019 - 2021
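  • Created a text-to-speech program for the Wolof language. Built an algorithm to convert Wolof text to phonemes, coordinated data collection with two participants, and evaluated phoneme coverage.
  • Contributed to automatic speech recognition (ASR) for the Wolof language. Designed a platform to collect raw Wolof audio for self-supervised learning.
  • Built optical character recognition (OCR) and computer vision models to extract structured data from national ID cards. Deployed the models on-premises and as AWS Lambda functions for scalability. Built a rotation model to handle image rotation.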
Technologies: TensorFlow, PyTorch, Scikit-learn, Pandas, Python, DVC, Bash Script, Amazon S3 (AWS S3), Amazon Web Services (AWS), Amazon EC2, Neural Networks, DeepSpeech, Deep Learning, NumPy, OCR, Seaborn, GPT, Generative Pre-trained Transformers (GPT), Natural Language Processing (NLP), Git, Jupyter Notebook, SpaCy, Machine Learning, Artificial Intelligence (AI), Artificial Neural Networks (ANN), APIs, SQL, Team Management, Source Code Review, Interviewing, Hiring, Code Review, Programming, Chatbots, BERT, Sentiment Analysis, Language Models, AWS Lambda, Amazon Textract, Amazon SageMaker

Data Scientist

2018 - 2019
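  • Used NLP and natural language understanding (NLU) to extract useful information from legal texts. Developed a regex tester library.
  • Developed an extractive chatbot for automated FAQs for a telecom company, collecting data by scraping websites and Twitter.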
  • Performed data collection and annotation. Deployed using AWS Lambda.
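  • Developed a rule system with Spark and implemented a flexible scoring system with Apache Airflow, with job management and scheduling of the scoring system.
  • Performed customer segmentation in the telecom domain using data from multiple sources. Compared clustering models against theoretical and business metrics.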
Technologies: TensorFlow, PyTorch, Scikit-learn, Pandas, Matplotlib, Python 3, Flask, Spark, Apache Airflow, Git, DVC, Gensim, SpaCy, Kaldi, Docker, Bash Script, Audio, Artificial Intelligence (AI), Jupyter Notebook, Keras, Streamlit, Amazon EC2, Python, Amazon S3 (AWS S3), Machine Learning, Amazon Web Services (AWS), Neural Networks, Natural Language Processing (NLP), GPT, Generative Pre-trained Transformers (GPT), OCR, NumPy, SciPy, Seaborn, TensorBoard, APIs, SQL, Java, Source Code Review, Programming, Chatbots, Semantic Web, Databases, Language Models, AWS Lambda, Amazon Textract, Amazon SageMaker, Amazon DynamoDB


2015 - 2018
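  • Worked simultaneously for multiple clients as a full-stack web and mobile developer.
  • Participated in the design and implementation of the prodispo mobile and web apps.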
  • Developed a web application for the purchase of phone credit.
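  • Built a WebChat application using WebSocket.
  • Developed REST APIs for the dematerialization of Gainde 2000 meetings, the strategic platform of Senegalese customs centered on customs clearance management.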
  • Created a web app for various football competitions.
  • Built a web service and a cross-platform social mobile app.
  • Developed and orchestrated a news website using WordPress.
Technologies: PHP, Symfony, Angular, Ionic, React, Bash Script, Python 3, Jupyter Notebook, Git, Amazon EC2, Python, Amazon S3 (AWS S3), Machine Learning, Amazon Web Services (AWS), APIs, Programming, PostgreSQL, AWS Lambda, Amazon DynamoDB

Automatic Speech Recognition for the Wolof Language

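Developed a speech recognition model for the Wolof language. The project involved collecting audio data and evaluating multiple models and approaches. The data had to be validated and cleaned for diversity and correctness. I combined transfer learning and training from scratch with traditional and hybrid approaches.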

This project was challenging due to the scarcity of data, so multiple techniques and tricks were used to make it work.

Wolof Speech Recognition

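Contributed to building automatic speech recognition for the Wolof language. I designed a platform to collect raw Wolof audio for self-supervised learning, and I built and deployed the resulting models.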

Chatbot for Customer Support in Telecommunication

A chatbot application to semi-automate customer support and FAQs. The data was scraped from multiple websites and cleaned to build an extractive chatbot.


Python 3, Python, Bash Script, SQL, PHP, Java, R


Flask, Spark, Streamlit, Symfony, Angular, Ionic, Scrapy


TensorFlow, Scikit-learn, Keras, Pandas, Matplotlib, PyTorch, SpaCy, React, NumPy, SciPy, DeepSpeech


Gensim, Apache Airflow, Amazon Textract, Amazon SageMaker, Kaldi, Git, Seaborn, TensorBoard, Whisper


Jupyter Notebook, Amazon EC2, Amazon Web Services (AWS), AWS Lambda, Docker


Amazon S3 (AWS S3), PostgreSQL, Amazon DynamoDB, Databases


Natural Language Processing (NLP), Audio, Artificial Intelligence (AI), Machine Learning, Neural Networks, Hiring, Code Review, Source Code Review, Interviewing, Programming, Chatbots, BERT, Sentiment Analysis, Language Models, GPT, Generative Pre-trained Transformers (GPT), Team Management, ChatGPT, DVC, OCR, Deep Learning, Artificial Neural Networks (ANN), APIs, Speech Recognition, OpenAI, Semantic Web, Topic Modeling, Clustering, Text Classification, OpenAI GPT-4 API, OpenAI GPT-3 API, Speech to Text, Transfer Learning, FastAPI


Fuzz Testing

2015 - 2018

Master's Degree in Computer Science

Ecole Supérieure Polytechnique de Dakar - Dakar, Senegal

2013 - 2015

Bachelor's Degree in Computer Science

Ecole Supérieure Polytechnique de Dakar - Dakar, Senegal


Cloudera CCA 175 Spark and Hadoop Developer

