This is the README file for Gibraltar v2.0
Authors - Arati Baliga (aratib@cs.rutgers.edu)
          Jeffrey Bickford (jbickfrd@cs.rutgers.edu, jb613w@att.com)

Version 2.0 is a Gibraltar port to the Xen hypervisor architecture.

Table of Contents
-----------------
1. Overview and Component Interaction
2. Directory and File Layout
3. Installation
-----------------

1. Overview and Component Interaction
For the general architecture, refer to the paper. Some implementation-level information follows.

# The CIL module extracts data type definitions and the list of global variables from the kernel source code.
  Its output files are globvars.txt and typedefs.txt. Two more files are output, but they are currently unused.
  Place globvars.txt and typedefs.txt in the input directory.


# Type Mapper (maptypes.c) - Performs two tasks:
               (a) Flattens the definitions available in input/typedefs.txt
                   and generates a flattened definition file input/typedefs.gen.
               (b) Generates a memory map for static memory, mapping primitive types. This is 
                   stored in input/memory.map.static.
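
  The flattening in task (a) can be pictured with a minimal sketch. This is not the code in maptypes.c,
  and the real syntax of typedefs.txt is not shown in this README. The dictionary layout, the dotted
  field paths and the type names below are assumptions made only for illustration.

```python
# Hypothetical in-memory form of type definitions: type name -> list of
# (field name, field type). The real typedefs.txt format may differ.
def flatten(type_name, typedefs, prefix=""):
    """Return (dotted_field_path, primitive_type) pairs for one type."""
    flat = []
    for field_name, field_type in typedefs[type_name]:
        path = prefix + field_name
        if field_type in typedefs:
            # Nested composite type: expand its fields in place.
            flat.extend(flatten(field_type, typedefs, path + "."))
        else:
            # Primitive type: keep the field as it is.
            flat.append((path, field_type))
    return flat


# Illustrative definitions only; not taken from the kernel.
typedefs = {
    "list_head": [("next", "ptr"), ("prev", "ptr")],
    "task": [("pid", "int"), ("tasks", "list_head")],
}

for path, primitive in flatten("task", typedefs):
    print(path, primitive)
```

  Each nested struct field is replaced by its own fields, so the flattened type contains only primitive types.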

# Gibraltar requires all the input files mentioned in the above two steps to start training.

-----------------
Gibraltar 2.0 Update:
    
    CIL no longer supports Linux 2.6. To gather type definitions, we started with output from CIL.
    That output contained many duplicates and caused problems when converted to a typedefs.txt file.
    We therefore took the basic CIL output and used GDB along with Xen to produce the
    typedefs.txt and typedefs.h output that Gibraltar requires. Scripts for this are located in the
    gdbtypes folder.

    We recommend using the same kernel version (2.6.27.5) for your testing, since the input files
    for it already exist in the input folder.
-----------------

# Gibraltar runs in training mode. The training load must be started on the target system (guest domain).
    => It collects snapshots and dumps them in directory <snapshots>.
    => Snapshots are in an intermediate format, represented by files *.ir.*.
    => Two files are created for each composite data type encountered in the traversal (*.ir.decls and *.ir.dtrace).
    => These intermediate files have to be converted to Daikon format before Daikon can be run on them for invariant inference.

# Snapshot conversion is done with the code in directory <daikon-iface>.
    => This code converts the intermediate trace files generated during training to final Daikon trace files.
    => At this point, one can choose either memory addresses or pathnames to identify a single object.
    => This code also splits objects that are very large across multiple program points, because Daikon cannot
        handle too many variables at a single program point.
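
    The splitting step can be sketched as follows. This is only an illustration, not the logic of
    daikonTrace.c. The max_vars limit, the "_partN" naming of the resulting program points and the
    field names are all assumptions.

```python
def split_fields(object_name, fields, max_vars):
    """Spread a long field list over several program points.

    max_vars (hypothetical) is the largest number of variables
    placed at a single program point.
    """
    points = {}
    for start in range(0, len(fields), max_vars):
        part = start // max_vars
        # Hypothetical naming scheme for the split program points.
        points[f"{object_name}_part{part}"] = fields[start:start + max_vars]
    return points


# Illustrative field names only.
points = split_fields("task", ["pid", "state", "flags", "prio", "uid"], 2)
for name, chunk in points.items():
    print(name, chunk)
```

    Every field lands in exactly one program point, so the pieces can be joined again once Daikon has processed them.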
   
-----------------
Gibraltar 2.0 Update:

    Compile daikonTrace.c located in daikon-iface: g++ -o genDaikonTrace daikonTrace.c

    USAGE: genDaikonTrace <name of ds | all> <granularity [class|object|sequence]> <num_traces>

       <ds name> If you specify "all", it will convert all data structures from an intermediate format
                 to Daikon format. Alternatively, if you are interested in specific data structures, you
                 can convert only that particular data structure by specifying the name. You can infer
                 the data structure name from the intermediate snapshot file name by dropping the extension.

   <granularity> Determines how the intermediate files are clustered into Daikon traces. If you are interested
                 in class invariants, specify class. If you are interested in object invariants, specify
                 object. Sequence is only used for linked list data structures.

    <num_traces> The number of snapshots you have generated during the training phase. 
-----------------
 
# Daikon uses the Daikon trace files to generate an invariant list. 
    => The scripts to run Daikon are available in scripts/rundaikon.pl and scripts/rundaikononll.pl
    => Fields that were split across program points must be merged back together in the generated invariant list. To do this, run scripts/merge.pl
    => Put the invariant lists in directory <invariant-list>
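
    The merge step can be pictured with a small sketch. It does not reproduce scripts/merge.pl or the
    Daikon output format. The "_partN" suffix on program point names and the invariant strings below
    are assumptions made for illustration.

```python
import re


def merge_invariants(split_records):
    """Join invariants of split program points under one name.

    split_records maps a program point name (hypothetically ending
    in _partN) to the list of invariants reported for that point.
    """
    merged = {}
    for point_name, invariants in split_records.items():
        # Strip the hypothetical _partN suffix to recover the base name.
        base = re.sub(r"_part\d+$", "", point_name)
        merged.setdefault(base, []).extend(invariants)
    return merged


# Illustrative records only.
records = {
    "task_part0": ["pid >= 0"],
    "task_part1": ["prio one of { 100, 120 }"],
    "list_head": ["next != null"],
}
merged = merge_invariants(records)
for name, invariants in merged.items():
    print(name, invariants)
```

    After merging, each data structure has a single record again, which is the form the detection code expects according to the description of merge.pl.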

-----------------
Gibraltar 2.0 Update:

    * Note: You must install Daikon prior to the above step. The normal version will do. *

    USAGE: rundaikon.pl <input_dir> <outputfile> <number> <rand_prand_prefix> <config_file>
    
    <input_dir> The directory in which the Daikon traces are stored (after the above conversion).
                Keep these in a directory separate from the intermediate traces.
 
    <outputfile> Name of Daikon output.
    
    <number> Number of traces.
        
    <rand_prand_prefix> Any number you pick during the experiment. It will be appended to the output file
                        name and the error file name that is generated by Daikon. This was used to run multiple
                        experiments on the same machine.

    <config_file> Can use the default config.
-----------------

# Gibraltar can be run in detection mode once the invariant list is available

2. Directory and File Layout

Current directory
-------------------
gibraltar - the monitor executable.
runcmd - contains a sample command which shows the arguments that should be passed to gibraltar
monitor.h - main include file for the monitor
global.h - global definitions to be included in other .c files as well
gm.h - Header file required for the Myrinet page fetching code (taken directly from firmware code)
gm_new.h - Header file with some definitions created by me, required for the page fetching code
log2.c - Log functions required for the page fetching code
bytes.c - Functions for manipulation of bytes
defs.c - Functions to deal with kernel data definitions
daikon.c - Creates Daikon related declarations and Daikon data types corresponding to C primitive types
inv.c - All functions for loading and checking class/object/list invariants
libgm.a - The GM library file. Needed for page fetching functionality
monitor.c - The main monitor C file. Contains static memory scan and dynamic memory traversal functions.
maptypes.c - Generates the flattened typedefs.gen file from the typedefs.txt file and the static memory map file
preprocess.c - Some helper functions for the monitor
utils.c - Some general purpose functions used throughout

sha1.c, sha1.h - Contains code for calculating the secure hash. Currently unused.
 
Subdirs
--------
<daikon-iface>
    daikonTrace.c - Generates the actual Daikon traces (*.decls, *.dtrace) from the intermediate files (*.ir.decls, *.ir.dtrace)
                    generated by Gibraltar.
    newgenDaikonTrace.c - This was a slightly older version of daikonTrace.c. Kept only for reference.

<generator>
    gen_typedef_headers_uniq.c - Code to generate unique definitions from the typedefs.txt file output by CIL module
    typedefs.txt.h - Output of the above code that contains unique definitions and can be used as a .h file
    gen_calc_offsets.c - Creates the generator program, which should be run on the target machine to calculate offsets.
    print_offsets.c - The output of the above code.
    newoffsets.cpp - Used for some sort of temporary fix for the offset issue. Might not be required now. 

<input>
    globvars.txt - Consists of all global and local static variables and their types - output from the CIL module
    typedefs.txt - Consists of all type definitions used in the kernel - output from the CIL module
    typedefs.gen - Flattened type definition file. All nested structs are flattened out
    offsets.txt - Offsets of each field for all type definitions. This corresponds to definitions in the typedefs.txt file
    offsets.gen - Offsets of each field for all type definitions. This corresponds to the definitions in the typedefs.gen file
    System.map - System.map file of the kernel running on the target system
    System.map.types - Types assigned to data structures in the System.map file
    memory.map.static - Static memory map of the target kernel
    static.invariants - List of constant invariants in the static memory of the kernel. 

<loads>
    runtestload.sh - Script that runs the test load on the target
    runtrainingload.sh - Script that runs the training load on the target.

<scripts>
    genfpfile.pl - Script takes the *.alert file generated by the monitor and generates a list of unique false positives.
    merge.pl - Script merges the separate records in the Daikon output. Daikon cannot handle a large number of fields at a
                single program point. Therefore, the fields of very large structs have to be split across program points. The
                output generated by Daikon has to be merged before the detection code in the monitor can process it.
    rename.pl - Script used for renaming files
    rundaikon.pl - Script to run Daikon on memory snapshots. Outputs an invariant list.
    rundaikononll.pl - Script to run Daikon on memory snapshots of linked lists. These are processed in a slightly different fashion.   

<sensitivity>
    commonobjs.c - Code that was built to check for common invariants across reboots.

<verify>
    check_static_map.c - Code to check for holes in the static memory region.

<cil-modules>
    CIL modules to extract data type definitions and global variables from the kernel source code
    Generates the two files "input/typedefs.txt" and "input/globvars.txt" used by the monitor and maptypes.c

<snapshots>
    Stores the snapshots created during the training period.

<invariant-list> 
    Stores the invariant files to be used by the detection engine

<xen_patches>
    Patches for the Xen hypervisor to support optimizations.

--------


3. Installation

-----------------
Gibraltar 2.0 Update:

Gibraltar now supports the Xen hypervisor. The implementation was done on Xen 3.4, with modifications to optimize Gibraltar.
Patches for Xen are located in the folder xen_patches. Gibraltar now supports two modes: "normal mode" and "shadow mode".
In shadow mode, instead of monitoring all memory pages that contain kernel data, Gibraltar monitors only pages that
have been modified since the last scan. Shadow mode can use two different methods to trigger a memory scan.
Polling mode scans recently modified pages every n seconds. Notification mode scans recently modified
pages after n pages have changed.
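
The two shadow mode triggers can be summarized in a short sketch. This is not Gibraltar's or Xen's
code. The function names, the string labels for the triggers and the page numbers are assumptions
made for illustration.

```python
def pages_to_scan(all_pages, dirty_pages, shadow_mode):
    """Normal mode covers every page; shadow mode only modified ones."""
    return sorted(dirty_pages) if shadow_mode else sorted(all_pages)


def scan_due(trigger, n, seconds_since_scan, num_dirty_pages):
    """Decide whether a shadow mode scan should start now.

    trigger is "polling" (scan every n seconds) or
    "notification" (scan once n pages have changed).
    """
    if trigger == "polling":
        return seconds_since_scan >= n
    if trigger == "notification":
        return num_dirty_pages >= n
    raise ValueError("unknown trigger: " + trigger)


# Illustrative page numbers only.
all_pages = {10, 11, 12, 13}
dirty = {12, 10}
print(pages_to_scan(all_pages, dirty, shadow_mode=True))
print(scan_due("polling", 5, seconds_since_scan=6, num_dirty_pages=0))
print(scan_due("notification", 3, seconds_since_scan=60, num_dirty_pages=2))
```

Polling bounds the delay before a modified page is checked, while notification bounds the number of unchecked modified pages.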

# Target must run on the modified Xen hypervisor.
# System.map file of the target should be available to the monitor program in directory <input>
# Generator program to calculate offsets should be run on the target.

# Gibraltar runs as a standard application in dom0

usage:

    ./gibraltar <domain id> <mem start> <mem end> <msmt_rounds> <mode> <optimization> <time between | num pages> 

            <domain id> - domain to monitor
            <mem start> - memory address to start at
            <mem end>   - memory address to end at                         * Note: start and end can be found from memory.map.static *
            <msmt_rounds> - how many rounds to perform in training mode
            <mode> - 0 for training mode, 1 for detection mode
            <optimization> - 0 for normal gibraltar, 1 for polling mode, 2 for page notification mode
            <time between | num pages> - in polling mode, the time in seconds between scans; in page notification mode, the number n of changed pages after which gibraltar is notified
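
    A small helper can assemble the argument list in the order given above and catch out-of-range
    values before the monitor is started. This is a sketch only. The helper name is hypothetical, and
    the domain id, memory addresses and counts in the example are placeholders; take the real start and
    end addresses from memory.map.static.

```python
def build_gibraltar_command(domain_id, mem_start, mem_end, msmt_rounds,
                            mode, optimization, interval_or_pages):
    """Return the argv list for ./gibraltar in the documented order."""
    if mode not in (0, 1):
        raise ValueError("mode must be 0 (training) or 1 (detection)")
    if optimization not in (0, 1, 2):
        raise ValueError("optimization must be 0, 1 or 2")
    args = [domain_id, mem_start, mem_end, msmt_rounds,
            mode, optimization, interval_or_pages]
    return ["./gibraltar"] + [str(a) for a in args]


# Placeholder values: domain 1, training mode, polling every 5 seconds.
cmd = build_gibraltar_command(1, "0xc0100000", "0xc0500000", 10, 0, 1, 5)
print(" ".join(cmd))
```

    The resulting list can be passed to subprocess.run in dom0, or printed and compared with the sample command in runcmd.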

-----------------
