Published on 2014-11-13


  1. What is CBMC?
    CBMC is a bounded model checker for C/C++ and Java programs.

  2. How does it work?

    • Parse the program and build a CFG (control flow graph)
    • Unwind the CFG and form a formula
    • Solve the formula with a SAT/SMT solver (SAT: the Boolean satisfiability problem, also called propositional satisfiability; SMT: satisfiability modulo theories)

    The workflow of CBMC is illustrated below.
    [Figure: workflow of CBMC]
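To make the pipeline concrete, here is a small hand-written C sketch (a hypothetical example, not taken from the CBMC distribution) of the kind of program CBMC checks: CBMC would make the input nondeterministic, unwind the loop up to a given bound, conjoin the negated assertion, and hand the resulting formula to a SAT/SMT solver. Run normally, it only exercises concrete inputs.

```c
#include <assert.h>

/* Hypothetical verification target: CBMC would unwind this loop up to
   a bound and encode it, together with the property below, as a
   SAT/SMT formula. */
int sum_to(int n) {
    int s = 0;
    for (int i = 1; i <= n; i++)
        s += i;
    return s;
}

/* The property a harness would assert; under CBMC, n would be a
   nondeterministic input checked for all values up to the bound. */
int property_holds(int n) {
    return sum_to(n) == n * (n + 1) / 2;
}
```

With CBMC one would check such an assertion for every execution within the unwinding bound, rather than for a handful of test values.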

  3. Is there something similar to it?
    There exist several tools similar to CBMC.

    • CPAchecker, a framework and tool for formal software verification and program analysis of C programs.
    • ECLAIR, a commercial static code analysis tool developed by BugSeng, for the automatic analysis, verification, testing and transformation of C and C++ programs.
    • Redlib, a model-checking/simulation-checking library for real-time systems with dense-time models, written in C and based on CRD (Clock-Restriction Diagram) technology; it also serves as a parametric-analysis library for linear hybrid systems, using HRD (Hybrid-Restriction Diagram) technology.

OCaml and Compiling


OCaml, originally known as Objective Caml, is the main implementation of the Caml language. It unifies functional, imperative, and object-oriented programming under an ML-like type system.

Languages derived from OCaml include F#, MetaOCaml, etc.

About OCaml and Compiling

Note: this section needs to be refined.

It seems that functional languages offer some advantages for writing compilers; for example, algebraic data types and pattern matching map naturally onto abstract syntax trees.


  1. What is Gentle?
    Gentle is a Compiler Construction System. It is widely used in industry and research. It covers the full spectrum of translation.

    The Gentle Compiler Construction System pioneered the notion of term types, pattern matching, and rule-based programming for compiler construction. Gentle served as model for other successful systems.

    The Gentle system is written in C; the latest version we could obtain is Gentle 97. A later version, Gentle 21, does not appear to be available.

  2. Gentle Examples

Decision Trees

About Decision Trees

A decision tree is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm.

Decision tree learning uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. It is one of the predictive modelling approaches used in statistics, data mining and machine learning. Tree models where the target variable can take a finite set of values are called classification trees. In these tree structures, leaves represent class labels and branches represent conjunctions of features that lead to those class labels. Decision trees where the target variable can take continuous values (typically real numbers) are called regression trees.

General Structure of Algorithms to build a Decision Tree

Hunt's algorithm is the basis of many decision tree algorithms, including ID3, C4.5, and CART. Its general procedure is as follows.

Let Dt be the set of training records that reach node t, and let y = { y1, y2, ... , yc } be the set of class labels.
(1) If all records in Dt belong to the same class yt, then node t is a leaf node labeled yt.
(2) If Dt is empty, then node t is a leaf node labeled with the majority class of its parent node.
(3) If Dt contains records belonging to more than one class, split the data into smaller subsets using an attribute test condition, and apply this procedure recursively to each subset.
(4) If Dt contains records belonging to more than one class, but all records have identical attribute values except for the target attribute, then node t is a leaf node labeled with the majority class.
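The four steps above can be sketched in C. The following is a minimal, purely illustrative implementation for binary attributes and binary labels; the record layout, the toy XOR-style dataset, and the strategy of trying attributes in index order (a real implementation would pick the best split by an impurity measure) are all assumptions of this sketch, not part of Hunt's original formulation.

```c
#include <assert.h>

#define N_ATTRS 2
#define MAX_REC 16

/* Hypothetical toy record: binary attributes plus a binary class label. */
typedef struct { int attr[N_ATTRS]; int label; } Record;

/* Toy training set (label = attr0 XOR attr1), purely illustrative. */
static const Record data[4] = {
    {{0,0},0}, {{0,1},1}, {{1,0},1}, {{1,1},0}
};

/* Step (1) helper: do all records share one label? */
static int same_label(const Record *r, int n) {
    for (int i = 1; i < n; i++)
        if (r[i].label != r[0].label) return 0;
    return 1;
}

/* Majority label of a node, used by steps (2) and (4). */
static int majority(const Record *r, int n) {
    int ones = 0;
    for (int i = 0; i < n; i++) ones += r[i].label;
    return 2 * ones >= n;
}

/* Hunt's procedure, specialised to classifying one query q: recursively
   split the records that reach the current node and follow q's branch.
   `used` is a bitmask of attributes already tested on this path;
   `parent_maj` is the majority label of the parent node (step 2). */
static int hunt(const Record *r, int n, const int *q,
                unsigned used, int parent_maj) {
    if (n == 0) return parent_maj;            /* step (2): empty node  */
    if (same_label(r, n)) return r[0].label;  /* step (1): pure node   */
    for (int a = 0; a < N_ATTRS; a++) {
        if (used & (1u << a)) continue;
        Record side[MAX_REC];                 /* step (3): split on a  */
        int m = 0, other = 0;
        for (int i = 0; i < n; i++) {
            if (r[i].attr[a] == q[a]) side[m++] = r[i];
            else other++;
        }
        if (m == 0 || other == 0) continue;   /* degenerate split      */
        return hunt(side, m, q, used | (1u << a), majority(r, n));
    }
    return majority(r, n);                    /* step (4): cannot split */
}
```

On the XOR-style dataset the sketch reproduces every training label, e.g. `hunt(data, 4, (int[]){0,1}, 0, 0)` follows the attr0 = 0 branch and then the attr1 = 1 branch to a pure leaf.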

Problems remaining in Hunt's algorithm

Several problems remain in the procedure above.

  • How to split the records?
    In fact this splits into two questions: how to represent the attribute test condition when splitting, and how to measure the quality of the split.

    • How to represent?
      1. Binary attributes: attributes whose value is binary, e.g. gender, whose only values are male and female. A test condition on a binary attribute generates two outputs.
      2. Nominal attributes: attributes that take nominal values, e.g. the IDs of different objects. These are handled much like binary attributes.
      3. Ordinal attributes: attributes that admit a rank order by which data can be sorted, but no meaningful degree of difference, e.g. clothing sizes. The split must preserve the rank order.
      4. Continuous attributes: the test condition can be a comparison test such as x < a, or a range query such as a <= x <= b.
    • How to measure? There exist several methods for measuring the quality of a split.
      1. Entropy
      2. Gini
      3. Classification error
  • When to stop?
    Stopping when a node can no longer be split is reasonable, but are there better stopping strategies? Pruning can reduce the computational complexity of building a decision tree.

Implementation of Hunt's algorithm