Here you can find short descriptions of some of the research projects I have worked on. See also my full publication list.

A Computational Model for Formative Assessment in Computer Science Education

The concept of formative assessment, originally expressed by educational psychologists in the late sixties, formalizes the idea of testing the knowledge of students in order to personalize instructional intervention and enhance their learning opportunities. This is in contrast to the traditional practice of summative assessment, where the main purpose of testing is to assign grades and rank student performance. Although summative assessment is certainly important for practical reasons, the inclusion of more formative assessment in instructional practice could significantly improve the effectiveness of teaching.

Formative assessment, although very elegant in theory, can be difficult to apply in practice. One of the reasons is that a teacher may need to collect large quantities of data, thoroughly analyze it to discover important trends and patterns, and use the findings to design appropriate teaching responses at the level of individual students and at the scale of the entire classroom. This process may be overwhelming and unrealistically expensive in terms of teacher and class time.

Impressive advances in the fields of Data Mining and Artificial Intelligence, combined with the insights provided by Cognitive and Educational Psychology, make it possible to envision the creation of Intelligent Assessment Systems (IASs), tools that can help teachers collect and analyze data to facilitate formative assessment. Such tools would noninvasively monitor students' learning processes by creating appropriate tests and automatically evaluating them; store, filter, and classify relevant data; and automatically or semi-automatically discover important patterns that are promptly reported to teachers so they can make informed decisions about what to do next. This could be seen as an educational equivalent of the Management Information Systems that are successfully used in business.

In this new project, I am currently developing methods and tools to help instructors, in particular those teaching large undergraduate courses in Computer Science, analyze and discover influential patterns in student grades. For example, if we are able to discover that "not understanding topics X and Y will most likely lead to trouble with topic Z," and at some point during the semester a student is showing these misunderstandings, then we can take appropriate instructional actions before it's too late.
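As a rough illustration of the kind of pattern described above, the following sketch estimates how often trouble with topics X and Y is followed by trouble with topic Z. It is only a minimal sketch under stated assumptions, not the method used in the project: the data layout (one set of "struggled with" topics per student), the function names `rule_confidence` and `find_rules`, the thresholds, and the sample records are all hypothetical.

```python
from itertools import combinations


def rule_confidence(records, antecedent, consequent):
    """Estimate P(struggles with consequent | struggles with every antecedent topic).

    records: list of sets; each set holds the topics one student struggled with.
    Returns (confidence, support), or (None, 0) if no student matches the antecedent.
    """
    matching = [r for r in records if antecedent <= r]
    if not matching:
        return None, 0
    hits = sum(1 for r in matching if consequent in r)
    return hits / len(matching), len(matching)


def find_rules(records, min_support=2, min_confidence=0.7):
    """Enumerate rules {X, Y} -> Z over all topic pairs and keep the reliable ones."""
    topics = sorted(set().union(*records))
    rules = []
    for x, y in combinations(topics, 2):
        for z in topics:
            if z in (x, y):
                continue
            conf, support = rule_confidence(records, {x, y}, z)
            if conf is not None and support >= min_support and conf >= min_confidence:
                rules.append(((x, y), z, conf, support))
    return rules


# Hypothetical records: topics each of five students struggled with.
records = [
    {"pointers", "recursion", "linked lists"},
    {"pointers", "recursion", "linked lists"},
    {"pointers", "recursion"},
    {"pointers"},
    set(),
]

for (x, y), z, conf, support in find_rules(records, min_support=2, min_confidence=0.6):
    print(f"trouble with {x} and {y} -> trouble with {z} "
          f"(confidence {conf:.2f}, {support} students)")
```

A rule found this way only reports co-occurrence in past grades; deciding whether it justifies an instructional intervention is left to the teacher.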

iList: Intelligent Tutoring System for Computer Science

For a hands-on experience with the iList project, you can visit

The main goals of this project are to discover the features that make one-on-one tutoring an effective form of instruction, and to use these features to design and implement effective Intelligent Tutoring Systems (ITSs). In particular, we are interested in the feedback that ITSs can provide to students. To achieve these goals, the project involved several tasks: collection and analysis of human tutorial data; definition of computational models of tutorial strategies and feedback; design and implementation of an Intelligent Tutoring System; and deployment and evaluation of the system.

We conducted a study of human tutoring in the domain of Computer Science data structures, to understand which features and strategies of human tutoring are important for learning. We developed an Intelligent Tutoring System, iList, that helps students learn linked lists. One of the main features of iList is a Procedural Knowledge Model that is automatically extracted from previous student data. This model allows iList to provide effective reactive and proactive procedural feedback while a student is solving a problem.

We tested five different versions of iList, differing in the level of feedback they provide, in multiple classrooms, with a total of more than 200 students. The evaluation study showed that iList is as effective as human tutors in helping students learn; students liked working with the system; and the feedback generated by the most sophisticated versions of the system helps keep the students on the right path.

Research methodology

CSCoding, a tool we developed to annotate video-recorded tutoring sessions

Screenshot of iList

Example of Procedural Knowledge Model automatically generated by iList. You can also download a vector representation in SVG

Links - home of iList.

DIAG-NLP: Natural Language Generation for Intelligent Tutoring Systems

Intelligent Tutoring Systems (ITSs) are effective tools that help students learn. We believe that natural language interfaces to ITSs can play an important role in improving the effectiveness of such systems. To investigate that hypothesis, we developed natural language generators that manipulate the feedback provided by Vivids-DIAG, an ITS that helps students learn how to troubleshoot complex mechanical systems. We found that the version of the system that generates language in which the core concepts are aggregated in a principled way engenders more learning. This more effective language is based on a corpus study, in which human tutors interacted with students through the DIAG interface.

The furnace system in the DIAG home heating troubleshooting simulation

Context Sensitive Spell Checking

The spell checkers included in modern word processors do a very good job of finding misspelled words that do not exist in a dictionary, but they have a hard time catching typos that happen to produce another valid word of the selected language. For example, a traditional English spell checker would not catch the mistake in a sentence like "I saw TREE trees in the park," where "tree" was written when "three" was intended, because "tree" is also a valid word in an English dictionary. The only way to catch a mistake like that is to take context into account. Our approach uses a statistical model based on mixed trigrams to capture the context of a given word, and uses this model to detect and, when possible, correct real-word spelling mistakes.
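The text above does not spell out how the mixed trigrams are built, so the sketch below assumes one plausible reading: each trigram combines a word with the part-of-speech tags of its two neighbors. It is an illustration only, not the project's actual model. The tiny tagged corpus, the confusion sets, and the names `train` and `check_word` are hypothetical, and the neighbor tags are assumed to be supplied by some external tagger.

```python
from collections import Counter

# Hypothetical toy corpus of (word, tag) sentences. A real system would be
# trained on a large tagged corpus.
TAGGED_CORPUS = [
    [("i", "PRON"), ("saw", "VERB"), ("three", "NUM"), ("trees", "NOUN")],
    [("we", "PRON"), ("planted", "VERB"), ("three", "NUM"), ("flowers", "NOUN")],
    [("the", "DET"), ("tree", "NOUN"), ("fell", "VERB")],
    [("a", "DET"), ("tree", "NOUN"), ("grows", "VERB")],
]

# Hypothetical confusion sets: valid words that are easily typed for one another.
CONFUSION_SETS = {"tree": {"tree", "three"}, "three": {"tree", "three"}}


def train(corpus):
    """Count mixed trigrams of the form (previous tag, word, next tag)."""
    counts = Counter()
    for sentence in corpus:
        padded = [("<s>", "<s>")] + sentence + [("</s>", "</s>")]
        for prev, cur, nxt in zip(padded, padded[1:], padded[2:]):
            counts[(prev[1], cur[0], nxt[1])] += 1
    return counts


def check_word(counts, prev_tag, word, next_tag):
    """Return the confusable word that fits the tag context best.

    The typed word is kept unless an alternative was seen strictly more often
    in the same (previous tag, next tag) context.
    """
    candidates = sorted(CONFUSION_SETS.get(word, {word}))
    best = max(candidates, key=lambda w: counts[(prev_tag, w, next_tag)])
    if counts[(prev_tag, best, next_tag)] > counts[(prev_tag, word, next_tag)]:
        return best
    return word


counts = train(TAGGED_CORPUS)
# "I saw TREE trees": the typed word sits between a verb and a noun.
suggestion = check_word(counts, "VERB", "tree", "NOUN")
print(suggestion)
```

Raw counts keep the sketch short; a practical model would use smoothed probabilities so that unseen contexts do not all score zero.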

Example of misspelling detection process

Ontology Alignment on the Web

Many information systems use taxonomies and ontologies to allow them to make inferences and organize data for better retrieval performance. Since different systems usually have different ontologies, the integration of heterogeneous systems requires that such ontologies be aligned. One example of this problem is the matching of categories used by different web portals to classify web documents. We worked on an approach, based on simple Natural Language Processing techniques, that can be used to automatically align those categories by analyzing the documents associated with them. We tested the approach on a subset of the Google and LookSmart web directories and obtained promising results.
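The general idea of aligning categories through their documents can be sketched as follows. The description above says only that simple Natural Language Processing techniques were used, so this example substitutes a generic stand-in (term-frequency vectors compared by cosine similarity) and should not be read as the matcher's actual algorithm. The directory contents, category names, threshold, and function names are all hypothetical.

```python
import math
import re
from collections import Counter


def category_profile(documents):
    """Merge the documents filed under one category into a term-frequency vector."""
    profile = Counter()
    for text in documents:
        profile.update(re.findall(r"[a-z]+", text.lower()))
    return profile


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def align(source, target, threshold=0.1):
    """Map each source category to its most similar target category.

    A category maps to None when no target scores above the threshold.
    """
    target_profiles = {name: category_profile(docs) for name, docs in target.items()}
    alignment = {}
    for name, docs in source.items():
        profile = category_profile(docs)
        best, best_score = None, threshold
        for other, other_profile in target_profiles.items():
            score = cosine(profile, other_profile)
            if score > best_score:
                best, best_score = other, score
        alignment[name] = best
    return alignment


# Hypothetical miniature directories: category name -> documents filed under it.
directory_a = {
    "Computers/Programming": ["python code compiler tutorial", "java code debugging"],
    "Recreation/Gardening": ["planting roses and pruning trees"],
}
directory_b = {
    "Software Development": ["compiler design and code debugging"],
    "Home & Garden": ["pruning trees and planting vegetables"],
    "Sports": ["football scores and league tables"],
}

alignment = align(directory_a, directory_b)
print(alignment)
```

In this toy setup the common word "and" contributes to every similarity score; a more careful version would remove stop words or weight terms by how distinctive they are.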

One of the algorithms implemented in our matcher

Ontologies used in our experiments