Bartosz Mikulski - Principal AI Engineer & MLOps Architect

I am an AI Engineer and MLOps Architect focused on building boring, reliable infrastructure for exciting problems.

With over 10 years of experience, I bridge the gap between "it works in a notebook" and "it works for 200 million users." My career has evolved from handling massive data scale at Fandom to architecting MLOps platforms at Qwak, and now modernizing ML infrastructure for high-performance training.

What I bring to the table:

  • Production-Grade GenAI: I don't just prompt models; I engineer systems. I specialize in building deterministic RAG pipelines and agentic workflows where reliability and evaluation (MRR, faithfulness) are baked in from day one.
  • Platform Engineering & MLOps: I have architected platforms that serve millions of requests and handle petabytes of data. I advocate for "Shift Left" data quality, automated infrastructure (IaC), and cost-aware engineering (FinOps).
  • Technical Leadership: I believe a Principal Engineer’s job is to make the team faster. I have trained over 1,000 engineers in workshops on AI/ML to help others avoid the mistakes I've already made.

Core Stack: Python, Scala, Spark, AWS, Kubernetes, Terraform, LangChain, Vector Databases.

Status Update: I am currently transitioning back to full-time engineering and actively looking for a Principal / Staff AI Engineer role (remote, or hybrid in the EU). I am interested in teams where I can take ownership of complex AI and data challenges.

Message me on LinkedIn


Selected Case Studies & Architecture

Below are selected projects demonstrating architecture, scalability, and business impact. For technical deep dives, read my blog.

  • Machine Learning Infrastructure Migration

    Leading the strategic migration of the ML training infrastructure to Oracle Cloud (OCI) to overcome critical scalability bottlenecks. The legacy training pipeline, reliant on Jenkins-orchestrated scripts and limited compute resources, necessitated data downsampling, effectively capping model performance. I am currently architecting the transition to a cloud-native environment designed to enable training on full-scale datasets. This initiative involves navigating complex platform constraints to build a robust, high-capacity compute layer that decouples model training from CI/CD tooling.

    Company: Start.io
    AI Technologies: XGBoost, Dask, Oracle Cloud
  • Document Intelligence Pipeline Rescue

    Stabilized a failing document intelligence pipeline by enforcing deterministic AI outputs. I was brought in to rescue a no-code data extraction workflow that was failing due to non-deterministic LLM outputs and fragile JSON parsing logic. I re-architected the inference layer, replacing the brittle "prompt engineering" components with a dedicated, containerized microservice utilizing BAML for strict schema enforcement and automatic retry logic. This intervention transformed a crashing prototype into a production-grade system with 100% structural correctness and 95% extraction accuracy in under 24 hours.

    Company: Can't disclose
    AI Technologies: BAML, Mistral 7B, Docker
    Read case study
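The fail-fast idea behind this rewrite — validate the model's output against a strict schema and retry on any violation — can be sketched in plain Python. BAML expresses this declaratively; the code below is a hand-rolled stand-in with a stubbed model call and a made-up invoice schema, shown only to illustrate the pattern.

```python
import json

REQUIRED_FIELDS = {"invoice_id": str, "total": float}  # hypothetical schema

def validate(payload: dict) -> dict:
    """Raise if any required field is missing or has the wrong type."""
    for field, expected in REQUIRED_FIELDS.items():
        if not isinstance(payload.get(field), expected):
            raise ValueError(f"bad or missing field: {field}")
    return payload

def extract_with_retries(call_model, max_attempts=3):
    """Call the (non-deterministic) model until its JSON passes the schema."""
    for _ in range(max_attempts):
        try:
            return validate(json.loads(call_model()))
        except (json.JSONDecodeError, ValueError):
            continue  # discard malformed output and retry
    raise RuntimeError("no schema-conformant output after retries")

# Stub model: fails once with truncated JSON, then returns a valid document.
responses = iter(['{"invoice_id": "A-17"', '{"invoice_id": "A-17", "total": 99.5}'])
print(extract_with_retries(lambda: next(responses)))
# → {'invoice_id': 'A-17', 'total': 99.5}
```

The point is that downstream consumers only ever see payloads that satisfy the contract; retries absorb the model's non-determinism instead of crashing the pipeline.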
  • Retrieval Augmented Generation (RAG) for Customer Support

    Architected a novel RAG system to enhance customer support efficiency by automating solution discovery. My core innovation was engineering a data pipeline that performed targeted information extraction from unstructured emails, creating a high-signal knowledge base that dramatically improved search relevance. To validate system performance, I established a robust, metrics-driven evaluation framework. Retrieval quality was benchmarked using Mean Reciprocal Rank (MRR@3), while the generative component was assessed with a QA process I designed to measure "Faithfulness" and "Semantic Similarity" to source documents. This systematically mitigated model hallucination and ensured all AI-drafted solutions were factually grounded.

    Company: Can't disclose
    AI Technologies: LangChain, OpenAI API, Chroma
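The MRR@3 retrieval benchmark mentioned above is simple to compute. This is a generic sketch with toy data, not the project's actual evaluation harness:

```python
def mrr_at_k(results, relevant, k=3):
    """Mean Reciprocal Rank@k: average over queries of 1/rank of the
    first relevant document within the top-k retrieved results."""
    total = 0.0
    for query, ranked_docs in results.items():
        for rank, doc_id in enumerate(ranked_docs[:k], start=1):
            if doc_id in relevant[query]:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(results)

# Toy data: top-3 retrieved doc ids per query, and the known-relevant ids.
retrieved = {"q1": ["d3", "d1", "d7"], "q2": ["d9", "d2", "d4"]}
relevant = {"q1": {"d1"}, "q2": {"d5"}}

print(mrr_at_k(retrieved, relevant))  # (1/2 + 0) / 2 = 0.25
```

Tracking this number across pipeline changes is what turns "the search feels better" into a regression test.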
  • Voice of Customer Intelligence Platform

    Architected a "Voice of Customer" intelligence platform to automate quality assurance across a distributed franchise network. I identified that manual review analysis was unscalable and lacked the granularity to distinguish between location-specific operational failures and brand-wide product issues. I engineered a semantic analysis pipeline using Generative AI that mapped unstructured customer feedback onto a rigid, pre-defined taxonomy of operational KPIs. This architectural decision transformed qualitative text into quantitative data, enabling the generation of automated, single-page performance audits that pinpointed specific areas for improvement (e.g., hygiene vs. product quality) for individual franchisees.

    Company: Can't disclose
    AI Technologies: Generative AI, Python
    Read case study
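Mapping free text onto a rigid taxonomy means the model may only answer with a predefined label; anything else is rejected rather than silently accepted. A minimal sketch with a keyword stub standing in for the LLM classifier, and an illustrative (not the real) KPI taxonomy:

```python
TAXONOMY = {"hygiene", "product_quality", "wait_time", "staff"}  # illustrative KPIs

def classify(review: str, model) -> str:
    """Ask the model for a label and enforce the closed taxonomy."""
    label = model(review).strip().lower()
    if label not in TAXONOMY:
        raise ValueError(f"label outside taxonomy: {label!r}")
    return label

def audit(reviews, model):
    """Aggregate per-label counts for one location's performance audit."""
    counts = {label: 0 for label in TAXONOMY}
    for review in reviews:
        counts[classify(review, model)] += 1
    return counts

# Stub model: keyword lookup standing in for the GenAI classifier.
stub = lambda text: "hygiene" if "dirty" in text else "product_quality"
print(audit(["tables were dirty", "burger was cold"], stub))
```

Because every review lands in exactly one known bucket, the counts can be compared across locations — which is what makes the qualitative feedback quantitative.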
  • Machine Learning Platform

    Led the critical migration of the platform's service mesh from a monolithic Envoy configuration to Istio, significantly de-risking the shared compute cluster by isolating service configurations. This move eliminated the risk of single-point-of-failure outages, ensuring high availability for multiple clients in the multi-tenant environment. Additionally, I optimized the core model build pipeline by architecting a dependency caching strategy, cutting build times by over 30% and directly improving developer velocity. I also served as an engineering expert for the customer support team, diagnosing and resolving complex, mission-critical build and deployment failures during on-call duty.

    Company: Qwak
    AI Technologies: Kubernetes, Istio, BentoML, Airflow, Spark
  • MLOps Platform Architecture

    Architected and led the implementation of a new MLOps platform on AWS, replacing a legacy, monolithic system on Heroku whose size limitations were blocking the adoption of state-of-the-art models (e.g., BERT). My design eliminated a critical source of production incidents by creating atomic, self-contained deployment units with BentoML, which guaranteed the compatibility of models and their required embeddings. I further engineered a configuration-driven deployment system using AWS AppConfig that enabled advanced strategies like canary releases and shadow deployments, allowing the business to expand support from ~7 to 16 language/source combinations. To ensure production stability, I instituted a multi-stage, automated testing gate within the CI/CD pipeline which prevented faulty models from being released, and hardened the system with an automatic rollback mechanism for near-zero deployment risk.

    Company: Riskmethods
    AI Technologies: AWS SageMaker, AWS CodePipeline, AWS Kinesis, Terraform, TensorFlow, PyTorch, MLflow, BentoML, Snowflake
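The config-driven canary mechanism boils down to weighted routing: a config entry (served by AWS AppConfig in the real system) declares what fraction of traffic the candidate model receives, and routing is deterministic per request so a request never flip-flops between models. A toy sketch with hypothetical model names:

```python
import hashlib

def route(request_id: str, config: dict) -> str:
    """Deterministically route a request to the stable or canary model
    based on a traffic fraction read from configuration."""
    # Hash to a stable bucket in [0, 1) so the same request id always
    # routes the same way, even across service restarts.
    bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 1000 / 1000
    if bucket < config["canary_fraction"]:
        return config["canary_model"]
    return config["stable_model"]

config = {"stable_model": "sentiment-v3", "canary_model": "sentiment-v4",
          "canary_fraction": 0.1}  # 10% of traffic to the candidate

targets = [route(f"req-{i}", config) for i in range(1000)]
print(targets.count("sentiment-v4"))  # roughly 10% of requests hit the canary
```

Promoting the canary (or rolling it back) is then a pure configuration change — no redeploy — which is exactly what makes the strategy safe to automate.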
  • User Activity Reporting at Scale

    Scaled a mission-critical data infrastructure supporting 200 million monthly visitors, processing billions of daily events. I re-engineered high-volume Spark aggregation pipelines to eliminate severe data skew and memory-related worker failures caused by sparse datasets. By designing and implementing a custom two-stage aggregation pattern, I optimized partition distribution and stabilized the platform’s performance during peak traffic. Furthermore, I hardened the near-real-time (NRT) processing layer with an automated monitoring and fault-tolerance system, ensuring the continuous availability of global clickstream data for downstream analytics.

    Company: Fandom
    AI Technologies: Apache Spark, AWS S3, AWS Kinesis, AWS Lambda, AWS Redshift, AWS EMR, GraphQL
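The two-stage aggregation pattern splits a skewed group-by into a salted partial aggregation followed by a final merge, so no single worker receives a hot key's entire volume. A pure-Python illustration of the idea (the production version was a Spark job; here dict buckets stand in for partitions):

```python
import random
from collections import defaultdict

def two_stage_count(events, num_salts=4):
    """Stage 1: aggregate per (key, salt) so a hot key is spread over
    num_salts partial buckets. Stage 2: merge the partials per key."""
    partial = defaultdict(int)
    for key in events:
        salt = random.randrange(num_salts)  # spreads one key across partitions
        partial[(key, salt)] += 1
    final = defaultdict(int)
    for (key, _salt), count in partial.items():
        final[key] += count
    return dict(final)

# A skewed stream: one hot page dominates the traffic.
events = ["home"] * 1000 + ["about"] * 3
counts = two_stage_count(events)
print(counts)  # {'home': 1000, 'about': 3}
```

In Spark the same effect comes from appending a random salt column before the first `groupBy` and stripping it in a second one; the final counts are identical, but the shuffle load per partition drops by roughly the salt factor.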
  • FinOps & Data Performance Strategy

    Spearheaded a strategic FinOps initiative to optimize cloud data spend and instill a culture of performance-aware engineering. I identified a pattern of inefficient queries driving excessive costs and addressed it by establishing a comprehensive training program and documentation standard (partitioning, projection, join ordering). To overcome a critical observability gap in Amazon Athena, I architected a custom debugging environment using self-hosted Presto on EMR, enabling the team to visualize execution plans for the first time. This initiative successfully reduced the frequency of high-cost query alerts (>100GB scanned) by approximately 90%, transforming a daily operational issue into a rare exception.

    Company: Fandom
    AI Technologies: Amazon Athena, Presto, AWS EMR
  • Data Pipeline Quality Framework

    Architected a defensive data quality framework to enforce integrity at the point of ingestion, effectively "shifting left" on validation. By implementing declarative quality contracts using AWS Deequ, I established a fail-fast mechanism that prevented latent data corruption from polluting downstream analytics. This solution significantly optimized operational velocity, eliminating costly cascading failures and drastically reducing the turnaround time for complex data backfills and remediation.

    Company: Fandom
    AI Technologies: Apache Spark, AWS EMR, AWS Deequ, Terraform, AWS CloudWatch
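A declarative quality contract names an invariant and fails the pipeline before bad rows propagate downstream. Deequ expresses such checks over Spark DataFrames; the sketch below only mimics the shape with plain Python over a list of rows, to show the fail-fast mechanics:

```python
# Declarative contracts: each is a name plus a predicate over the dataset.
checks = [
    ("user_id is complete",
     lambda rows: all(r.get("user_id") is not None for r in rows)),
    ("country is valid",
     lambda rows: all(r.get("country") in {"PL", "DE", "US"} for r in rows)),
]

def verify(rows, checks):
    """Fail fast: raise on the first violated contract instead of
    letting corrupt rows reach downstream analytics tables."""
    for name, predicate in checks:
        if not predicate(rows):
            raise ValueError(f"data contract violated: {name}")
    return rows

good = [{"user_id": 1, "country": "PL"}, {"user_id": 2, "country": "DE"}]
bad = good + [{"user_id": None, "country": "US"}]

verify(good, checks)  # passes and returns the rows unchanged
try:
    verify(bad, checks)
except ValueError as e:
    print(e)  # data contract violated: user_id is complete
```

The "shift left" payoff is that a violation stops the job at ingestion time, so a backfill only reprocesses one bad batch instead of every table derived from it.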
  • Entity Resolution & Master Data System

    Architected an intelligent entity resolution engine to consolidate fragmented supplier data into a "Golden Record." I designed a batch processing system on Snowflake and AWS Batch that identified and merged duplicate business partner entries across disparate datasets. Implemented a Human-in-the-Loop (HITL) feedback mechanism, allowing the system to learn from manual merge decisions and continuously improve the precision of its matching algorithms.

    Company: Riskmethods
    AI Technologies: Snowflake, AWS Batch, Python
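Entity resolution comes down to scoring candidate pairs and merging matches into one canonical record. A toy sketch using stdlib `difflib` for name similarity — the production system ran batch jobs on Snowflake and AWS Batch with richer features and HITL-tuned thresholds, so this is only the shape of the idea:

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Crude string similarity in [0, 1]; a stand-in for real match features."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def resolve(records, threshold=0.85):
    """Greedy clustering: attach each record to the first golden record
    whose name is similar enough, otherwise start a new golden record."""
    golden = []
    for rec in records:
        for g in golden:
            if similarity(rec["name"], g["name"]) >= threshold:
                g["sources"].append(rec["id"])  # merge: keep provenance
                break
        else:
            golden.append({"name": rec["name"], "sources": [rec["id"]]})
    return golden

suppliers = [
    {"id": 1, "name": "Acme Steel GmbH"},
    {"id": 2, "name": "ACME Steel GMBH"},
    {"id": 3, "name": "Nordwind Logistics"},
]
print(resolve(suppliers))  # two golden records; ids 1 and 2 merged
```

Keeping the source ids on each golden record is what makes the HITL loop possible: a reviewer can undo a merge, and that decision becomes training signal for the matcher.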
  • Automated Bidding Software

    Architected a high-concurrency revenue-prediction engine to drive automated real-time bidding strategies. I designed and deployed a low-latency Flask-based prediction service that utilized gradient-boosted models (XGBoost) to forecast revenue-per-session in real-time. By integrating these predictive analytics directly into the bidding pipeline, I enabled more granular traffic acquisition decisions, optimizing ad spend based on anticipated lifetime value. Identified and communicated strategic risks regarding model scalability and integration depth, providing critical technical oversight during a pivot phase.

    Company: Pub Ocean
    AI Technologies: XGBoost, TensorFlow, Flask
  • Content Quality Data Strategy

    Designed the data strategy and observability platform for the Content Quality department, directly influencing business operations. Revealed a critical strategic flaw via spatial analysis: demonstrated that stakeholders were prioritizing low-value inland properties over high-value beach resorts, forcing a department-wide pivot in content acquisition strategy. Built a custom tracking dashboard that became the daily steering tool for business unit leadership.

    Company: HolidayCheck
    AI Technologies: Spark, Airflow, Tableau
  • AdServer Architecture Refactor

    Re-architected the core AdServing engine to eliminate technical debt and restore developer velocity. I identified a critical bottleneck caused by a "God Class" anti-pattern and a fragile testing strategy that coupled assertions to implementation details, resulting in cascading test failures for every new feature. I executed a strategic refactor, enforcing a Command-Query Separation (CQRS) pattern to decompose monolithic services into focused, cohesive units. Simultaneously, I overhauled the testing paradigm, shifting from brittle interaction mocking to robust behavioral verification, which drastically reduced maintenance overhead and accelerated deployment.

    Company: HolidayCheck
    AI Technologies: Scala, Akka.HTTP, MongoDB, RabbitMQ
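Command-Query Separation splits methods that change state (commands, which return nothing) from methods that answer questions (queries, which have no side effects) — which is also what enables behavioral tests that assert on observable state instead of mocked interactions. A minimal Python sketch of the pattern with an invented `AdInventory` example (the original refactor was in Scala):

```python
from typing import Optional

class AdInventory:
    """Commands mutate state and return None; queries are side-effect free."""

    def __init__(self):
        self._slots = {}

    # Command: changes state, returns nothing.
    def reserve(self, slot_id: str, campaign: str) -> None:
        if slot_id in self._slots:
            raise ValueError(f"slot already reserved: {slot_id}")
        self._slots[slot_id] = campaign

    # Query: answers a question without touching state.
    def campaign_for(self, slot_id: str) -> Optional[str]:
        return self._slots.get(slot_id)

inventory = AdInventory()
inventory.reserve("top-banner", "summer-sale")
# Behavioral test style: assert on what the object reports,
# not on which internal methods were called.
print(inventory.campaign_for("top-banner"))  # summer-sale
```

Because tests only depend on the query surface, the internals can be decomposed or rewritten without a cascade of broken assertions — the failure mode the original "God Class" suffered from.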

Testimonials

  • Martyna Urbanek-Trzeciak (Product Manager - Data Engineering)

    I worked with Bartosz while he was a member of Data Engineering team at Fandom. He is very professional and open to share his knowledge with his teammates and beyond. His approach was always very data-driven and he has great knowledge in Data Engineering area what made him very valuable partner in discussions.

  • Mariusz Kuriata (Senior Manager of Engineering - Head of Ops)

    It was my pleasure to work with Bartosz. Bartosz is a dedicated and experienced Data Engineer who showed a range of skills and readiness to help. I appreciated that I could count on Bartosz to lead sophisticated technical projects. Highly recommend!

  • Workshop participant

    I'm extremely impressed with Bartosz's expertise and experience. We covered all assignments, addressing various details, scenarios, and potential errors. Every question we asked was answered thoroughly. The workshop format of the sessions and small group activities were particularly enjoyable. We had opportunities to apply our new knowledge practically. The trainer remained accessible whenever questions arose. If any uncertainties emerged, the facilitator explained everything with patience.

  • Workshop participant

    One of the most content-rich lessons I've experienced.

Workshops I teach

  • Building AI-Powered Applications with LangChain

    I teach how to craft effective prompts and how to use LangChain to extend model capabilities (connecting to the internet, vector databases, and external REST endpoints). Together we build a memory-aware chatbot, implement retrieval-augmented search over document collections, design multi-step reasoning chains and autonomous agents, and wrap everything with LangSmith for monitoring and logging. By the end, attendees leave with deployable code, a solid understanding of the modern AI stack, and the confidence to embed intelligent functionality into their own products.

    Read more
  • Business Process Automation with No-Code and AI

    Participants learn to map processes visually in n8n, add branching logic and loops, and connect to any SaaS or in-house API. They learn to include AI in the workflows for text summarization, classification, vector search across files and databases, and build tool-using agents that fetch live data or trigger follow-up actions. By the end, they can develop a customer-support chatbot, automate document creation, send contextual Slack or email alerts, and monitor everything from a single, no-code dashboard.

    Read more
  • Retrieval-Augmented Generation (RAG): Building AI Search Systems

    Participants learn how to parse and index documents into vector stores, craft advanced retrieval strategies (semantic search, query expansion, keyword/metadata filters, parent-document and sub-query retrieval), and combine results with reranking for higher relevance. They practice scoring both retrieval and generation with robust metrics, integrate text-to-SQL so LLMs can mine relational databases, and apply guardrails for automatic answer verification to keep responses factual.

    Read more
  • Fine-Tuning Language Models

    Participants learn the whole fine-tuning pipeline: choosing the right approach (full fine‑tune, LoRA, QLoRA), preparing clean training data, and running experiments with Axolotl or HuggingFace Transformers. We cover hyper-parameter choices, cost controls, and automated ways to score the output using larger reference models. By the end, they can ship a custom model, prove its quality with solid metrics, and expose it through a reliable API.

    Read more
  • Prompt Engineering

    Participants learn why models hallucinate, what they can and can't do, and how to craft prompts that steer them: defining clear context, structuring questions, setting explicit output requirements, and iterating systematically. We practice advanced techniques like few-shot examples, chain-of-thought and tree-of-thought prompting, and robust system messages.

    Read more

Conference Talks and Podcasts about AI and MLOps

  • MLOps for the rest of us at Infoshare 2022 (Gdańsk, Poland)

    Shared how my team built a lean MLOps pipeline that let us deploy new machine learning models to production without overengineering or big budgets. I covered real-world lessons from supply chain risk management, including handling word embedding failures, moving to Sagemaker, testing preprocessing, and managing canary releases.

    Read more
  • Data Intensive AI at DataTalks Club hosted by Alexey Grigorev

    We discuss practical strategies for testing data workflows. I explain how data engineering underpins effective AI, from preparing training data to deploying models, and highlight real-world use cases where AI quietly powers better products behind the scenes. We dive into prompt engineering, showing how in-context examples and evaluation datasets drive reliable outputs, and touch on emerging topics like prompt compression and caching.

    Read more

Publications

Other Conference Talks

Other Podcasts

Meetups

How to contact me?

You can find me on social media (links below) or send me an email: blog (here is the "at" symbol) mikulskibartosz.name

Social Media

Subscribe to the newsletter

Privacy Policy