Talent.com
Fine Tuning / Post Training Data Scientist - RL (GRPO, PPO, RLHF)

Binance • WorkFromHome, Bali, Indonesia
3 days ago
Job description

Binance is a leading global blockchain ecosystem behind the world’s largest cryptocurrency exchange by trading volume and registered users. We are trusted by over 280 million people in 100+ countries for our industry-leading security, user fund transparency, trading engine speed, deep liquidity, and an unmatched portfolio of digital-asset products. Binance offerings range from trading and finance to education, research, payments, institutional services, Web3 features, and more. We leverage the power of digital assets and blockchain to build an inclusive financial ecosystem to advance the freedom of money and improve financial access for people around the world.

About the Role

You will develop and optimize Reinforcement Learning (RL) models for enterprise-scale applications such as customer service, token reporting, compliance, and Web3 domain reasoning.

You will explore and evaluate advanced algorithms including PPO, GRPO, DPO, RLHF, RLAIF, and Agentic RL to enhance the capabilities of LLMs, VLMs, and Agentic AI at Binance. The role requires a strong theoretical foundation in RL (covering policy optimization, reward modeling, and planning), paired with the engineering skills to build scalable production systems.

You will take full ownership from research through deployment, driving experimentation with systematic evaluation and benchmarking. Collaboration across research, infrastructure, and application teams will be key to delivering impactful AI solutions.

Responsibilities

  • Research and develop state-of-the-art RL algorithms, focusing on large model optimization and alignment techniques.
  • Design and implement RL training pipelines, including environment simulation, data generation, and reward function design.
  • Apply Reinforcement Learning methods to enhance LLM / VLM / Agentic AI capabilities in reasoning, planning, and autonomous decision‑making.
  • Collaborate with engineers and researchers to integrate RL solutions into enterprise AI platforms.
  • Monitor model performance in production and continuously improve through iterative training and fine‑tuning.

Requirements

  • Master’s Degree in Computer Science, Applied Mathematics, Machine Learning, or related fields.
  • 5+ years of hands‑on experience in RL and in optimizing at least one of the following: LLMs, VLMs, or Agentic AI.
  • Strong coding skills in Python, with experience in ML frameworks and RL libraries.
  • Experience with large-scale distributed training and optimization.
  • Self‑driven, ownership mindset, and strong problem‑solving skills. Excellent communication skills for cross‑functional collaboration.

Why Binance

  • Shape the future with the world’s leading blockchain ecosystem
  • Collaborate with world-class talent in a user‑centric global organization with a flat structure
  • Tackle unique, fast‑paced projects with autonomy in an innovative environment
  • Thrive in a results‑driven workplace with opportunities for career growth and continuous learning
  • Competitive salary and company benefits
  • Work‑from‑home arrangement (the arrangement may vary depending on the work nature of the business team)

Binance is committed to being an equal opportunity employer. We believe that having a diverse workforce is fundamental to our success.

By submitting a job application, you confirm that you have read and agree to our Candidate Privacy Notice.
