· AI Engineering, AI in Education · 8 min read
Using AI to Accelerate Intelligence
AI is a transformative technology, so let's use it to improve ourselves. Combined with knowledge tracing, AI can unlock the next evolution in education.
AI has started revolutionizing almost every sphere of how we interact with technology. We are already operating at frontiers that would have read like pure science fiction just a decade ago, and our reliance on these superpowers is going to affect all of us. Look at software engineering: our concept of what a junior developer is and what they are expected to deliver has drastically changed overnight, and it is likely never going back.
It might even change how we define being “smart,” shifting it from an intrinsic trait to how effectively someone can harness AI. But the most profound impact will be personal, specifically in how we learn. Human cognition depends on internalizing new information through active learning and conceptual transfer. Applying that knowledge allows the brain to form connections and chunk related information, enabling the efficient long-term storage and retrieval of mental models, the underlying framework of our intuition. Ultimately, our prior knowledge dictates how easily we can scaffold and acquire new, related concepts. When AI hands answers to us on a plate without requiring real cognitive effort, it bypasses the trials, repetition, and friction that deep human learning actually requires. What this shortcut does to our own brains is a giant experiment currently playing out in the real world.
Every student who falls behind carries the weight of every concept they were never quite given the chance to master, and these gaps compound silently. A weak foundation in fractions makes percentages a struggle, which eventually makes understanding compound interest almost impossible. Traditional assessments fail here because they treat the grade of a single paper as the only data point that matters. Over the last few decades, knowledge tracing has emerged to address this by modeling what a student knows at any given moment. Where standard tests just measure whether a student got a question right, knowledge tracing maps how their understanding of each underlying skill evolves over time and across multiple attempts. It is the difference between seeing “Naledi scored 60%” and knowing “Naledi has a 41% mastery of simplifying fractions, trending downward, with 73% confidence in that estimate.” The first just tells you she is scraping by; the second tells you exactly where to step in and fix the foundation.
The Problem With Average
Most educational software shows teachers and parents a flat average score: a class average of 72%, a student averaging 65% across recent quizzes, or a standard pass rate.
Averages are the absolute enemy of effective intervention. A student sitting at a 65% average could be solidly competent in three-quarters of the material while being completely lost in one critical, foundational area. Worse, averages fail to tell you what a student is about to forget. A concept mastered three weeks ago and never reviewed has a cognitive half-life. If you do not know where a student sits on that decay curve for each specific skill, you are planning revision completely blind.
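To make the decay idea concrete, here is a minimal sketch that assumes simple exponential forgetting with a per-skill half-life. The function name, the 0.9 starting mastery, and the 14-day half-life are hypothetical values chosen for illustration, not parameters from any production model.

```python
def decayed_mastery(mastery: float, days_since_review: float, half_life_days: float) -> float:
    """Decay a mastery estimate so that it halves every `half_life_days`.

    Assumption: forgetting is modeled as plain exponential decay.
    """
    if half_life_days <= 0:
        raise ValueError("half_life_days must be positive")
    return mastery * 0.5 ** (days_since_review / half_life_days)


# A skill mastered at 0.9 three weeks ago, with an assumed 14-day half-life.
estimate = decayed_mastery(0.9, days_since_review=21, half_life_days=14)
print(round(estimate, 3))
```

Even this crude model shows why a per-skill review date matters: two students with the same average can sit at very different points on the curve.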
What you actually need is skill-level mastery paired with trend direction and confidence intervals. Knowledge tracing gives us exactly that. Combining these deep analytical insights with AI’s ability to dynamically tailor lesson plans and adapt learning materials to an individual’s style is exactly what I am building at Idhesive. But before digging into how we orchestrate these capabilities, it helps to look at how knowledge tracing works under the hood.
The Main Approaches
Three algorithm families dominate this space: BKT, IRT, and the deep learning variants that have emerged over the last decade. Each comes with its own engineering and product trade-offs between interpretability, data requirements, and predictive power.
Bayesian Knowledge Tracing (BKT)
BKT, introduced in the mid-1990s, is the classic knowledge tracing approach and remains widely deployed in production educational systems. It is a probabilistic model driven by four core parameters per skill. First is the initial probability that the student knows the skill before any attempts. Next is the learn rate, the probability that a student who has not yet mastered the skill moves to mastery after a practice opportunity, whether or not the answer was correct. To account for noise, the model also uses a guess parameter, the probability of a correct answer without knowing the skill (highly relevant in multiple-choice setups), and a slip parameter, the probability of an incorrect answer despite the student actually knowing the skill.
Each time a student answers a question mapped to a specific skill, BKT updates the probability of mastery by conditioning on their success or failure, then applies the learn rate. A wrong answer drops the mastery estimate only slightly when the slip rate is high, because mistakes are then weak evidence, and sharply when the slip rate is low. A correct answer raises it significantly when the guess rate is low. This is how the model separates a careless mistake from genuine confusion.
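The update step can be sketched in a few lines. This is a minimal illustration of the standard BKT equations, not the Idhesive implementation; the class name, function name, and parameter values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BKTParams:
    p_init: float   # probability the skill is known before any attempt
    p_learn: float  # probability of moving to mastery after a practice opportunity
    p_guess: float  # probability of a correct answer without mastery
    p_slip: float   # probability of an incorrect answer despite mastery


def bkt_update(p_mastery: float, correct: bool, params: BKTParams) -> float:
    """Return the mastery probability after observing one attempt."""
    if correct:
        evidence = p_mastery * (1 - params.p_slip)
        total = evidence + (1 - p_mastery) * params.p_guess
    else:
        evidence = p_mastery * params.p_slip
        total = evidence + (1 - p_mastery) * (1 - params.p_guess)
    # Bayes' rule; fall back to the prior if the observation had zero probability.
    posterior = evidence / total if total > 0 else p_mastery
    # Learning transition applied after every practice opportunity.
    return posterior + (1 - posterior) * params.p_learn


# Illustrative parameters only.
params = BKTParams(p_init=0.3, p_learn=0.1, p_guess=0.2, p_slip=0.1)
after_correct = bkt_update(params.p_init, True, params)
after_wrong = bkt_update(params.p_init, False, params)
print(round(after_correct, 3), round(after_wrong, 3))
```

With these assumed values a single correct answer lifts the estimate well above the 0.3 prior, while a single wrong answer pulls it well below. Raising `p_slip` shrinks the drop caused by a wrong answer.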
The real engineering appeal here is that BKT is fast, highly explainable, and works well with relatively few data points. A teacher can look at a student’s skill probability and immediately understand what it means. The downside is that BKT treats every skill as an isolated island. It cannot inherently capture the reality that understanding multiplication makes learning division easier, or that mastering equivalent fractions is a strict prerequisite for comparing them.
Item Response Theory (IRT)
IRT models the relationship between a learner’s underlying ability and their probability of getting a specific question right, and it introduces item-level parameters that basic BKT lacks. The simplest variation, the Rasch (one-parameter) model, uses a single parameter for every question: difficulty, which defines where on the ability spectrum the question sits. The probability of a correct answer is then a logistic function of ability minus item difficulty. The two-parameter model adds discrimination, which captures how sharply the probability of a correct answer rises with ability. High-quality questions discriminate well, whereas trivial or poorly designed questions fail to separate high-ability and low-ability students.
The three-parameter model adds a guessing parameter, a floor on the probability of a correct answer that matters for multiple-choice items. Fitting any of these models yields a continuous learner ability estimate alongside item parameters such as difficulty, allowing you to select questions pitched close to a student’s current ability.
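As a rough sketch, the logistic IRT response function fits in one small function. With the default arguments it reduces to the one-parameter Rasch form. The function names and the item bank below are hypothetical, and the selection rule (pick the item whose difficulty is nearest to the ability estimate) is just one simple heuristic.

```python
import math
from typing import Dict


def irt_probability(ability: float, difficulty: float,
                    discrimination: float = 1.0, guessing: float = 0.0) -> float:
    """Probability of a correct answer under a logistic IRT model.

    Defaults give the Rasch form; `discrimination` and `guessing`
    extend it to the two- and three-parameter forms.
    """
    logistic = 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))
    return guessing + (1.0 - guessing) * logistic


def closest_item(ability: float, item_difficulties: Dict[str, float]) -> str:
    """Pick the item whose difficulty is nearest to the learner's ability."""
    return min(item_difficulties, key=lambda item_id: abs(item_difficulties[item_id] - ability))


# Hypothetical item bank mapping item id to calibrated difficulty.
item_bank = {"q1": -1.0, "q2": 0.4, "q3": 1.5}
print(irt_probability(0.5, 0.4))          # slightly above 0.5
print(closest_item(0.5, item_bank))       # q2
```

Under the Rasch form an item is most informative when its difficulty equals the learner's ability, which is why the nearest-difficulty rule is a reasonable starting point.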
IRT’s greatest strength is its consistency across different test variants. If you calibrate your item bank properly, an ability score of 1.2 on Form A means the same thing as a 1.2 on Form B, which is why it is the standard for high-stakes testing. The tradeoff is that IRT needs a substantial volume of response data to estimate item parameters reliably, and a standard IRT model estimates a single overall ability rather than mastery of individual skills. Cognitive diagnosis models such as DINA instead estimate mastery of discrete skills through a Q-matrix that maps questions to skills, and hierarchical extensions add prerequisite structure between skills, which aligns much better with actual curriculum design.
Deep Knowledge Tracing (DKT and variants)
DKT arrived in 2015, bringing recurrent neural networks, including LSTMs, into the mix. Where BKT represents skill mastery as a single scalar probability, DKT represents a learner’s knowledge as a dense hidden state vector, updated from their entire interaction history across all skills simultaneously. The model surfaces rich, latent representations. For example, it can learn that struggling with question 12 strongly predicts failing question 19 because both tap into a hidden conceptual relationship discovered within the data.
Toolkits like PyKT and EduKTM are two widely used open-source options for this family. Both implement DKT and successors such as DKVMN and AKT, and they expect ordered learner interaction sequences containing skill labels and correctness indicators.
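To show what such a sequence looks like by the time it reaches a recurrent model, here is a minimal sketch of the one-hot interaction encoding used in the original DKT formulation, where each (skill, correctness) pair becomes a vector of length 2 × the number of skills. The function names and the index convention (correct answers in the second half) are assumptions for illustration, and this is not the loader format of either toolkit.

```python
from typing import List, Sequence, Tuple


def encode_interaction(skill_id: int, correct: bool, num_skills: int) -> List[int]:
    """One-hot encode a single (skill, correctness) interaction.

    Assumed convention: index `skill_id` marks an incorrect attempt,
    index `skill_id + num_skills` marks a correct one.
    """
    if not 0 <= skill_id < num_skills:
        raise ValueError("skill_id out of range")
    vector = [0] * (2 * num_skills)
    vector[skill_id + (num_skills if correct else 0)] = 1
    return vector


def encode_sequence(interactions: Sequence[Tuple[int, bool]], num_skills: int) -> List[List[int]]:
    """Encode an ordered interaction history, one vector per attempt."""
    return [encode_interaction(skill, correct, num_skills) for skill, correct in interactions]


# Three attempts across three skills: (skill_id, correct).
history = [(0, True), (1, False), (1, True)]
for row in encode_sequence(history, num_skills=3):
    print(row)
```

The order of rows matters: the recurrent network consumes them one step at a time and predicts correctness on the next attempt.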
The performance appeal is clear: DKT often outperforms standard BKT on next-answer prediction, especially for users with extensive interaction histories, capturing patterns that a single scalar per skill misses. The downside is that it is largely a black box. You can query the model’s prediction on whether a student will get the next question right, but you cannot easily ask it why. That is a major hurdle in an educational product where teachers need actionable reasoning, not just a number to trust blindly.
Building the Idhesive Pipeline
This brings us to how we actually implement this architecture at Idhesive. Our pipeline starts with canonical events: every single question attempt is logged with its correctness, skill mapping, response timing, and exact sequence position. These events stream through Redpanda to Python workers, where each running knowledge-tracing model operates as an independent consumer group maintaining its own state.
The workers continuously update per-learner skill mastery and append time-series snapshots to a dedicated analytics state database. When our AI quiz planner spins up, it consumes a structured context object containing these exact metrics: mastery probability, confidence intervals, trend direction, and recommended question constraints per skill.
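A context object of this kind might look like the sketch below. Every field name, the `trend_direction` helper, the tolerance, and the numbers are hypothetical placeholders; the real schema may differ.

```python
from dataclasses import asdict, dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SkillContext:
    skill_id: str
    mastery: float                              # latest mastery probability
    confidence_interval: Tuple[float, float]    # (low, high) bounds on mastery
    trend: str                                  # "up", "down", or "flat"
    max_difficulty: float                       # hypothetical question constraint


def trend_direction(snapshots: List[float], tolerance: float = 0.02) -> str:
    """Classify the trend by comparing the newest snapshot with the oldest."""
    if len(snapshots) < 2:
        return "flat"
    delta = snapshots[-1] - snapshots[0]
    if delta > tolerance:
        return "up"
    if delta < -tolerance:
        return "down"
    return "flat"


# Illustrative time-series snapshots for one learner and one skill.
snapshots = [0.55, 0.50, 0.41]
context = SkillContext(
    skill_id="simplify_fractions",
    mastery=snapshots[-1],
    confidence_interval=(0.33, 0.49),  # placeholder bounds, not computed here
    trend=trend_direction(snapshots),
    max_difficulty=0.5,
)
print(asdict(context))
```

Keeping the object small and explicit means the planner receives only the per-skill signals it needs.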
What makes this system tractable is keeping the queue architecture clean. Each knowledge-tracing worker is idempotent and holds no state beyond what it persists: it simply reads canonical events and writes state updates. This makes data replay straightforward. If we deploy a new model version, it can consume the historical event archive and generate a fresh state without disrupting the live production environment. The workers do not need to know about each other, meaning adding a new model to the pipeline is as simple as spinning up a new consumer group.
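The replay property can be illustrated with a small, self-contained sketch. It uses an in-memory dictionary as a stand-in for the state database and a simple moving-average update as a stand-in for a real model; the event fields, class names, and the 0.3 starting value are hypothetical. It also assumes events arrive in order per learner, which a partitioned log provides when events are keyed by learner.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Tuple


@dataclass(frozen=True)
class AttemptEvent:
    event_id: str
    learner_id: str
    skill_id: str
    correct: bool
    sequence: int  # position in the learner's attempt stream


class MasteryStore:
    """In-memory stand-in for the analytics state database."""

    def __init__(self) -> None:
        # (learner, skill) -> (last applied sequence, mastery estimate)
        self.state: Dict[Tuple[str, str], Tuple[int, float]] = {}

    def apply(self, event: AttemptEvent, update: Callable[[float, bool], float]) -> bool:
        """Apply an event once; return False if it was already applied."""
        key = (event.learner_id, event.skill_id)
        last_sequence, mastery = self.state.get(key, (-1, 0.3))
        if event.sequence <= last_sequence:
            return False  # duplicate delivery or replayed event: no-op
        self.state[key] = (event.sequence, update(mastery, event.correct))
        return True


def ema_update(mastery: float, correct: bool) -> float:
    """Placeholder update rule, not a knowledge tracing model."""
    return mastery + 0.2 * ((1.0 if correct else 0.0) - mastery)


def replay(events: Iterable[AttemptEvent], store: MasteryStore) -> int:
    """Feed events to the store and count how many changed state."""
    return sum(store.apply(event, ema_update) for event in events)


archive = [
    AttemptEvent("e1", "naledi", "fractions", True, 0),
    AttemptEvent("e2", "naledi", "fractions", False, 1),
]
store = MasteryStore()
first_pass = replay(archive, store)
snapshot = dict(store.state)
second_pass = replay(archive, store)  # redelivery leaves state untouched
print(first_pass, second_pass, store.state == snapshot)
```

A new model version would start from an empty store and consume the same archive, which is what makes side-by-side comparison of models cheap.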
Ultimately, the quality of our predictions depends on the fidelity of our data. We capture rich metadata on every attempt, moving beyond binary right-or-wrong metrics to log the normalized response, precise timing, attempt index, and the exact question-to-skill mappings via a structured Q-matrix. The richer the sequence, the better the models perform. This is why a rigid canonical event schema and a durable outbox pattern for reliable replay matter just as much as the actual model choice; low-quality event data limits even the smartest algorithms.