Machine Learning Data Scientist

Location:
Durham, NC, 27701
Posted:
April 16, 2025

Resume:

Scott Lai

*********@*****.*** +1-352-***-**** https://scottlai.me/ linkedin.com/in/scottlaiq/ Work Authorization: Green Card

EDUCATION (Education Page)

Duke University Durham, NC Aug 2022 – May 2024

Master of Science in Interdisciplinary Data Science

• Relevant Courses: Statistical Modeling, Data Engineering, NLP, Data Analysis in Cloud, Practicing Machine Learning, Blockchain Development

University of Wisconsin-Madison Madison, WI Aug 2016 – Dec 2018

Bachelor of Science in Statistics and Economics

• Relevant Courses: Calculus, Linear Regression, Data Visualization, Machine Learning in Python, Statistics and Probability.

TECHNICAL SKILLS

• Tech Skills: Python, Rust, Node.js, HTML, CSS, JavaScript, R, MySQL, LangChain, LlamaIndex, GPT-api, Claude-api, Tableau, scikit-learn, TensorFlow, PyTorch, Machine Learning (ML), Deep Learning (DL), Reinforcement Learning (RL), A/B Testing, AWS, GCP, Azure, Git, Hugging Face, Bash, Spark, Hadoop, Docker, Kubernetes, beanstalk, OpenAI, Stable Diffusion, Claude 2, Midjourney, CodeWhisperer, Solidity, Web3.js, Truffle, Remix, Flask, Django.

• Professional Certifications: ESG Certification (in progress), AWS Technical Essentials Certification, AWS Cloud Practitioner Essentials Certification, AWS Cloud Foundations Certificate (in progress)

• Language: Mandarin (Native), English (Native)

EXPERIENCE (Experience Page)

BlueberryAI Burlingame, California Dec 2024 – present

- AI Engineer

• Develop and implement AI-driven agents and retrieval-augmented generation (RAG) models for legal analysis in advertisements.

• Design and optimize data pipelines using AWS and Databricks to support efficient processing and extraction of legal insights from large datasets.

• Collaborate with cross-functional teams to identify and address key legal issues in advertising content, ensuring compliance with regulatory standards.

• Enhance AI models for accuracy in legal identification and integrate them into client-facing applications for real-time analysis.

ScholarAI Houston, Texas Oct 2023 – present

- Founding Software Engineer

• Developed and deployed a production-grade RAG system using LangChain and AWS DynamoDB, improving search accuracy by 35%

• Engineered an AI-driven document processing pipeline handling 10,000+ daily requests using LLMs and OCR (98% accuracy)

• Built and optimized custom LLM models achieving 17% performance improvement over ChatGPT and Claude 3.5

• Implemented distributed multi-agent LLM architecture serving 500+ concurrent users with FastAPI and React

• Designed scalable ETL pipelines processing 2TB+ data using Apache Spark and Airflow

Duke University Graduate School Durham, NC May 2023 – May 2024

- Teaching & Research Assistant

• Collaborated with HuggingFace on LLM development using Rust Candle

• Managed AWS/Azure cloud research projects and led AWS Cloud Club (200+ students)

• Designed technical curriculum and provided student mentorship

Spigot, Inc. Virtual / Fort Myers, Florida Nov 2022 – Aug 2023

- Data Scientist & AI Developer

• Developed AI tools to support various aspects of the company's operations.

• Conducted comprehensive research on Big Data to extract valuable insights and trends.

• Engaged in research and development activities related to Blockchain technology.

• Collaborated on strategic projects within the FinTech and Blockchain domain to help achieve business objectives.

IUNISPACE Shenzhen, China Aug 2020 – Dec 2021

- Chief Information Officer & Data Scientist

• Conducted alternative data research in an industry setting, applying satellite, GIS, human-flow, and consumption data to stock market trading decisions.

• Led a data analysis team that designed 12 stock-trading algorithms using machine learning, deep learning, NLP, and knowledge graphs for quantitative trading; the resulting strategies earned the company two million dollars in China's A-share market.

• Designed 3 separate data platforms for 3 industries (banking, real estate, and non-performing asset corporations) to support intelligent data processing. These three platforms helped 7 companies save around fifty million dollars in labor and efficiency costs in 2021.

PROJECTS (Project Page)

Ari AI Law webapp ( https://www.ari.law/#lp-pom-block-32 ) May 2024

Tech: LLMs, Python, Docker, LangChain, LlamaIndex, Claude, AutoGen, GenAI, Node.js, USPTO, TensorFlow, PyTorch, etc.

• Developed AI-powered patent analysis system using LLMs and USPTO data

ScholarAI education webapp ( https://scholarai.io/ ) Oct 2023

Tech: LLMs, Python, Docker, LangChain, LlamaIndex, Claude, AutoGen, GenAI, Node.js, etc.

• Simplifies academic, scientific, and legal research by providing instant answers, insights, and analysis from papers, patents, and more, all powered by cutting-edge AI.

• Built RAG-based research platform using LLMs, LangChain, and custom AI models

NeuroComm ( https://colab.duke.edu/project/neurocomm-bridging-eeg-and-ai-cognitive-insight/ ) Jan 2024

Tech: LLM, LLLM, EEG, Python

• Created EEG-based cognitive analysis system integrating LLMs for insight generation

Candle Cookbook ( https://nogibjj.github.io/candle-cookbook/ ) Oct 2023

Tech: LLM, Azure, AWS, GitHub Actions, Jenkins, Rust

• Used cloud services to deploy GPU LLM binaries, making large language models more accessible and open-source.


