genomicSimulationC 0.3
Data Structures

How the simulation stores data.

Modules

 Deletor Functions
 For deleting data structures and freeing their associated memory.
 

Data Structures

struct  gsc_TableSize
 
struct  gsc_MarkerBlocks
 A struct used to store a set of blocks of markers.
 
struct  gsc_DecimalMatrix
 A row-major heap matrix that contains floating point numbers.
 
struct  gsc_PedigreeID
 A type representing a program-lifetime-unique identifier for a genotype, to be used in tracking pedigree.
 
struct  gsc_GroupNum
 A type representing the identifier of a group of genotypes.
 
struct  gsc_EffectID
 A type representing a particular loaded set of marker effects.
 
struct  gsc_LabelID
 A type representing a particular custom label.
 
struct  gsc_MapID
 A type representing a particular loaded recombination map.
 
struct  gsc_MultiIDSet
 Simple crate that stores a GroupNum, a MapID, and an EffectID.
 
struct  gsc_GenOptions
 A type that contains choices of settings for gsc_SimData functions that create a new gsc_AlleleMatrix/generation.
 
struct  gsc_SimpleLinkageGroup
 Parameters for simulating meiosis on a linkage group whose markers are stored contiguously in the simulation.
 
struct  gsc_ReorderedLinkageGroup
 Parameters for simulating meiosis on a linkage group whose markers are re-ordered compared to the first recombination map.
 
struct  gsc_LinkageGroup
 A generic store for a linkage group, used to simulate meiosis on a certain subset of markers.
 
struct  gsc_RecombinationMap
 A type that stores linkage groups and crossover probabilities for simulating meiosis.
 
struct  gsc_KnownGenome
 A type that stores the genome structure used in simulation.
 
struct  gsc_AlleleMatrix
 
struct  gsc_MarkerEffects
 A type that stores the information needed to calculate breeding values from alleles at markers.
 
struct  gsc_SimData
 Composite type that is used to run crossing simulations.
 
struct  gsc_TableFileReader
 Stream reader for files of some tabular format.
 
struct  gsc_TableFileCell
 Represent a cell read by a gsc_TableFileReader.
 
struct  gsc_MapfileUnit
 Unprocessed data for one marker (linkage group and position) loaded from a map file.
 
struct  gsc_EffectfileUnit
 Unprocessed data for one marker effect loaded from an effect file.
 
struct  gsc_GenotypeFile_MatrixFormat
 Variants in the format of a genotype matrix file.
 
struct  gsc_FileFormatSpec
 File format specifier for the genotype input file.
 

Macros

#define GSC_CREATE_BUFFER(n, type, length)
 Macro to create a stretchy buffer of any type and some length.
 
#define GSC_BUFFER_ISHEAP(n)   n##cap >= sizeof(n##stack)/sizeof(n##stack[0])
 For debugging purposes.
 
#define GSC_FINALISE_BUFFER(n, as, nentries)
 Macro to convert a stretchy buffer to a solid heap vector.
 
#define GSC_DELETE_BUFFER(n)
 Macro to delete a stretchy buffer.
 
#define GSC_STRETCH_BUFFER(n, newlen)
 Macro to expand the capacity of a stretchy buffer.
 
#define GSC_NO_PEDIGREE   (gsc_PedigreeID){.id=GSC_NA_ID}
 Empty/null value for pedigree fields.
 
#define GSC_NO_GROUP   (gsc_GroupNum){.num=GSC_NA_ID}
 Empty/null value for group allocations.
 
#define GSC_NO_EFFECTSET   (gsc_EffectID){.id=GSC_NA_ID}
 Empty/null value for effect set identifiers.
 
#define GSC_NO_LABEL   (gsc_LabelID){.id=GSC_NA_ID}
 Empty/null value for custom label identifiers.
 
#define GSC_NO_MAP   (gsc_MapID){.id=GSC_NA_ID}
 Empty/null value for recombination map identifiers.
 
#define GSC_DETECT_FILE_FORMAT   ((gsc_FileFormatSpec){.filetype=GSC_GENOTYPEFILE_UNKNOWN})
 File format specifier to instruct genomicSimulation loaders to auto-detect all details of the file format.
 

Typedefs

typedef struct gsc_AlleleMatrix gsc_AlleleMatrix
 A linked list entry that stores a matrix of alleles for a set of SNP markers and genotypes.
 

Enumerations

enum  gsc_TableFileCurrentStatus {
  GSC_TABLEFILE_NEWLINE , GSC_TABLEFILE_COLUMNGAP , GSC_TABLEFILE_CONTENTS , GSC_TABLEFILE_ERROR_EOF ,
  GSC_TABLEFILE_ERROR_EOBUF
}
 Represent possible states of the cursor of a gsc_TableFileReader.
 
enum  gsc_GenotypeFileCellStyle {
  GSC_GENOTYPECELLSTYLE_PAIR , GSC_GENOTYPECELLSTYLE_COUNT , GSC_GENOTYPECELLSTYLE_ENCODED , GSC_GENOTYPECELLSTYLE_SLASHPAIR ,
  GSC_GENOTYPECELLSTYLE_UNKNOWN
}
 Represent possible representations of alleles at a marker in a genotype file.
 
enum  gsc_GenotypeFileType {
  GSC_GENOTYPEFILE_UNKNOWN , GSC_GENOTYPEFILE_MATRIX , GSC_GENOTYPEFILE_BED , GSC_GENOTYPEFILE_PED ,
  GSC_GENOTYPEFILE_VCF
}
 Enumerate types of genotype files that the simulation knows how to load.
 

Functions

gsc_TableFileReader gsc_tablefilereader_create (const char *filename)
 Open a file for reading with gsc_TableFileReader.
 
void gsc_tablefilereader_close (gsc_TableFileReader *tbl)
 Close a gsc_TableFileReader's file pointer.
 
void gsc_helper_tablefilereader_refill_buffer (gsc_TableFileReader *tbl)
 Read another buffer's worth of characters from a gsc_TableFileReader's file.
 
enum gsc_TableFileCurrentStatus gsc_helper_tablefilereader_classify_char (gsc_TableFileReader *tbl)
 Classify the character under the cursor of a TableFileReader as cell contents or otherwise.
 
void gsc_tablefilecell_deep_copy (gsc_TableFileCell *c)
 Allocate memory to store a deep copy of a gsc_TableFileCell, if previously only a shallow copy.
 
gsc_TableFileCell gsc_tablefilereader_get_next_cell (gsc_TableFileReader *tbl)
 Read forwards in TableFileReader and return the next cell's contents, as well as how many column gaps and newlines preceded it.
 
gsc_FileFormatSpec gsc_define_matrix_format_details (const GSC_LOGICVAL has_header, const GSC_LOGICVAL markers_as_rows, const enum gsc_GenotypeFileCellStyle cell_style)
 Give genomicSimulation hints on the format of a genotype matrix file to be loaded.
 

Variables

const gsc_GenOptions GSC_BASIC_OPT
 Default parameter values for GenOptions, to help with quick scripts and prototypes.
 

Detailed Description

How the simulation stores data.

genomicSimulation is a state-based package. These data structures store the library's data and state. Many use dynamically allocated memory, so when you are finished with one, call the relevant delete_ function if one exists.

gsc_SimData is the central state/data storage struct, and so is a required parameter to most user-facing functions. It contains pointers to a gsc_KnownGenome (storing the loaded markers and any recombination maps), a gsc_MarkerEffects (storing the loaded allele effects), and a gsc_AlleleMatrix (storing metadata and genotypes of founders and simulated offspring).

Other structs in this group (gsc_TableSize, gsc_MarkerBlocks, gsc_GenOptions) represent specific types of data and are used as parameters and return values of certain functions.

Macro Definition Documentation

◆ GSC_BUFFER_ISHEAP

#define GSC_BUFFER_ISHEAP (   n)    n##cap >= sizeof(n##stack)/sizeof(n##stack[0])

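For debugging purposes. Evaluates to true when the stretchy buffer named n is currently stored on the heap rather than in its {n}stack array.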

See also
GSC_CREATE_BUFFER
GSC_STRETCH_BUFFER
GSC_DELETE_BUFFER

Definition at line 448 of file sim-operations.h.

◆ GSC_CREATE_BUFFER

#define GSC_CREATE_BUFFER (   n,
  type,
  length 
)
Value:
type n##stack[sizeof(int)*CONTIG_WIDTH/sizeof(type)]; size_t n##cap = length; \
type* n = (n##cap >= sizeof(n##stack)/sizeof(type)) ? gsc_malloc_wrap(sizeof(type)*n##cap,GSC_TRUE) : n##stack;

Macro to create a stretchy buffer of any type and some length.

After this macro is run, a buffer with the requested type and capacity will exist in the scope under the requested name.

This macro will also create two helper variables. Their names will be generated based on the name of the buffer:

  • {name}cap will contain the current capacity of the buffer.
  • {name}stack will be a stack array of CONTIG_WIDTH*sizeof(int) bytes, that is, sizeof(int)*CONTIG_WIDTH/sizeof(type) entries. If the requested length is less than that number of entries, the buffer will point to {name}stack.

Use this buffer only within one local scope. These functions won't work for buffers that have escaped their scope and so left their helper variables behind.

The buffer lives on the stack if its requested length is smaller than the number of entries that fit in CONTIG_WIDTH*sizeof(int) bytes. Otherwise, it is allocated on the heap. For safety, call GSC_DELETE_BUFFER on the buffer once you have finished using it, even if you believe it is small enough to have stayed on the stack.

See also
GSC_DELETE_BUFFER
GSC_STRETCH_BUFFER
Parameters
n       name for the buffer.
type    type of each entry in the buffer (eg int).
length  number of entries the buffer should be able to hold.

Definition at line 438 of file sim-operations.h.
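
The stack-or-heap choice described above can be sketched in isolation. This is a minimal sketch rather than the library's code: the DEMO_ names are hypothetical, DEMO_CONTIG_WIDTH is an assumed stand-in value for CONTIG_WIDTH, and plain malloc/free replace gsc_malloc_wrap and GSC_FREE.

```c
#include <stddef.h>
#include <stdlib.h>

/* Assumed stand-in for the library's CONTIG_WIDTH. */
#define DEMO_CONTIG_WIDTH 1000

/* Same shape as GSC_CREATE_BUFFER, with malloc in place of gsc_malloc_wrap. */
#define DEMO_CREATE_BUFFER(n, type, length) \
    type n##stack[sizeof(int)*DEMO_CONTIG_WIDTH/sizeof(type)]; size_t n##cap = (length); \
    type* n = (n##cap >= sizeof(n##stack)/sizeof(type)) ? malloc(sizeof(type)*n##cap) : n##stack

/* Same test as GSC_BUFFER_ISHEAP, parenthesised so it can be used in expressions. */
#define DEMO_BUFFER_ISHEAP(n) (n##cap >= sizeof(n##stack)/sizeof(n##stack[0]))

/* Same shape as GSC_DELETE_BUFFER: free only if the buffer left the stack. */
#define DEMO_DELETE_BUFFER(n) \
    do { if (DEMO_BUFFER_ISHEAP(n)) { free(n); } n = NULL; n##cap = 0; } while (0)

/* Returns 1 if a buffer of `length` doubles was placed on the heap,
 * 0 if it stayed on the stack, and -1 if heap allocation failed. */
int demo_double_buffer_is_heap(size_t length) {
    DEMO_CREATE_BUFFER(vals, double, length);
    if (vals == NULL) { return -1; }
    for (size_t i = 0; i < length; ++i) { vals[i] = (double)i; }
    int on_heap = DEMO_BUFFER_ISHEAP(vals);
    DEMO_DELETE_BUFFER(vals); /* safe to call in both cases */
    return on_heap;
}
```

Calling the delete macro unconditionally keeps the calling code identical whether or not the buffer ended up on the heap.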

◆ GSC_DELETE_BUFFER

#define GSC_DELETE_BUFFER (   n)
Value:
do { if (n##cap >= sizeof(n##stack)/sizeof(n##stack[0])) { GSC_FREE(n); } \
n = NULL; n##cap = 0; } while (0)

Macro to delete a stretchy buffer.

See also
GSC_CREATE_BUFFER
GSC_STRETCH_BUFFER
GSC_FINALISE_BUFFER

The buffer named {n}, and its assistant variables {n}cap and {n}stack, must exist in the current scope. They would be created by GSC_CREATE_BUFFER.

Parameters
n  name of the buffer.

Definition at line 477 of file sim-operations.h.

◆ GSC_DETECT_FILE_FORMAT

#define GSC_DETECT_FILE_FORMAT   ((gsc_FileFormatSpec){.filetype=GSC_GENOTYPEFILE_UNKNOWN})

File format specifier to instruct genomicSimulation loaders to auto-detect all details of the file format.

Definition at line 1093 of file sim-operations.h.

◆ GSC_FINALISE_BUFFER

#define GSC_FINALISE_BUFFER (   n,
  as,
  nentries 
)
Value:
do { if (n##cap >= sizeof(n##stack)/sizeof(n##stack[0])) { as = n; } else \
{ size_t len = nentries > n##cap ? n##cap : nentries; as = gsc_malloc_wrap(sizeof(n##stack[0])*len,GSC_TRUE); memcpy(as,n,sizeof(n##stack[0])*len); } } while (0)

Macro to convert a stretchy buffer to a solid heap vector.

See also
GSC_DELETE_BUFFER

The buffer named {n}, and its assistant variables {n}cap and {n}stack, must exist in the current scope. They would be created by GSC_CREATE_BUFFER.

This is an alternative to GSC_DELETE_BUFFER, if you want to keep the results.

Parameters
n         name of the buffer.
as        name of the finalised buffer.
nentries  number of entries to copy, if less than buffer capacity.

Definition at line 463 of file sim-operations.h.
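
As an illustration of finalising, the sketch below fills a scratch buffer and hands the result back as a heap vector that outlives the function. It is a sketch under assumptions: the DEMO_ names are hypothetical, DEMO_CONTIG_WIDTH is an assumed stand-in for CONTIG_WIDTH, and malloc replaces gsc_malloc_wrap (so a NULL check is added where the library's wrapper would exit on failure).

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_CONTIG_WIDTH 1000 /* assumed stand-in for CONTIG_WIDTH */

#define DEMO_CREATE_BUFFER(n, type, length) \
    type n##stack[sizeof(int)*DEMO_CONTIG_WIDTH/sizeof(type)]; size_t n##cap = (length); \
    type* n = (n##cap >= sizeof(n##stack)/sizeof(type)) ? malloc(sizeof(type)*n##cap) : n##stack

/* Same shape as GSC_FINALISE_BUFFER: a heap buffer is handed over as-is,
 * a stack buffer is copied into freshly allocated heap memory. */
#define DEMO_FINALISE_BUFFER(n, as, nentries) \
    do { if (n##cap >= sizeof(n##stack)/sizeof(n##stack[0])) { as = n; } else \
    { size_t len = (nentries) > n##cap ? n##cap : (nentries); \
      as = malloc(sizeof(n##stack[0])*len); \
      if (as != NULL) { memcpy(as, n, sizeof(n##stack[0])*len); } } } while (0)

/* Returns a heap array holding the squares of 0..count-1, or NULL on failure.
 * The caller owns the result and must free() it. */
int* demo_squares(size_t count) {
    DEMO_CREATE_BUFFER(sq, int, count);
    if (sq == NULL) { return NULL; }
    for (size_t i = 0; i < count; ++i) { sq[i] = (int)(i * i); }
    int* result = NULL;
    DEMO_FINALISE_BUFFER(sq, result, count);
    return result; /* no delete call: ownership moved to `result` */
}
```

After finalising, the scratch buffer is not deleted, because in the heap case the finalised pointer and the buffer are the same allocation.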

◆ GSC_NO_EFFECTSET

#define GSC_NO_EFFECTSET   (gsc_EffectID){.id=GSC_NA_ID}

Empty/null value for effect set identifiers.

Has short name: NO_EFFECTSET

Definition at line 590 of file sim-operations.h.

◆ GSC_NO_GROUP

#define GSC_NO_GROUP   (gsc_GroupNum){.num=GSC_NA_ID}

Empty/null value for group allocations.

Has short name: NO_GROUP

Definition at line 578 of file sim-operations.h.

◆ GSC_NO_LABEL

#define GSC_NO_LABEL   (gsc_LabelID){.id=GSC_NA_ID}

Empty/null value for custom label identifiers.

Has short name: NO_LABEL

Definition at line 602 of file sim-operations.h.

◆ GSC_NO_MAP

#define GSC_NO_MAP   (gsc_MapID){.id=GSC_NA_ID}

Empty/null value for recombination map identifiers.

Has short name: NO_MAP

Definition at line 614 of file sim-operations.h.

◆ GSC_NO_PEDIGREE

#define GSC_NO_PEDIGREE   (gsc_PedigreeID){.id=GSC_NA_ID}

Empty/null value for pedigree fields.

Has short name: NO_PEDIGREE

Definition at line 565 of file sim-operations.h.
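
The GSC_NO_* macros are compound literals of single-field wrapper structs, which gives each kind of identifier its own type. A minimal sketch of that pattern follows. The demo_ types are hypothetical, the field type unsigned int is an assumption, and DEMO_NA_ID is an assumed stand-in for GSC_NA_ID, whose real value is not shown on this page.

```c
#include <stdbool.h>

#define DEMO_NA_ID 0u /* assumed stand-in for GSC_NA_ID */

/* Wrapping the raw number in a struct stops a map ID being passed
 * where a group number is expected: the two types are incompatible. */
typedef struct { unsigned int num; } demo_GroupNum;
typedef struct { unsigned int id; } demo_MapID;

#define DEMO_NO_GROUP (demo_GroupNum){.num = DEMO_NA_ID}
#define DEMO_NO_MAP   (demo_MapID){.id = DEMO_NA_ID}

/* Structs cannot be compared with ==, so compare the wrapped field. */
bool demo_group_is_valid(demo_GroupNum g) {
    return g.num != DEMO_NO_GROUP.num;
}

/* Returns the "no group" sentinel to signal failure, or a real group number. */
demo_GroupNum demo_pick_group(bool ok) {
    if (!ok) { return DEMO_NO_GROUP; }
    return (demo_GroupNum){.num = 3};
}
```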

◆ GSC_STRETCH_BUFFER

#define GSC_STRETCH_BUFFER (   n,
  newlen 
)
Value:
do { \
if (newlen < n##cap) { } \
else if (n##cap >= sizeof(n##stack)/sizeof(n##stack[0])) { \
void* tmp = gsc_malloc_wrap(sizeof(n##stack[0])*newlen,GSC_FALSE); \
if (tmp != NULL) { \
memcpy(tmp,n,sizeof(n[0])*n##cap); \
GSC_FREE(n); n = tmp; n##cap = newlen; }} \
else if (newlen >= sizeof(n##stack)/sizeof(n##stack[0])) { \
n = gsc_malloc_wrap(sizeof(n##stack[0])*newlen,GSC_FALSE); \
if (n != NULL) { \
memcpy(n,n##stack,sizeof(n##stack[0])*n##cap); n##cap = newlen; }} \
else if (newlen < CONTIG_WIDTH) { n##cap = newlen; } \
} while (0)

Macro to expand the capacity of a stretchy buffer.

See also
GSC_CREATE_BUFFER
GSC_DELETE_BUFFER

The buffer named {n}, and its assistant variables {n}cap and {n}stack, must exist in the current scope. They would be created by GSC_CREATE_BUFFER.

After this macro executes, the buffer named {n} will have the capacity to hold {n}cap entries. Unless memory allocation failed, {n}cap will be greater than or equal to the requested new length. Check the value of {n}cap to check that resizing succeeded.

Parameters
n       name of the buffer.
newlen  after execution, the buffer should be able to hold this many entries, unless memory allocation failed (can be checked with {n}cap >= newlen).

Definition at line 498 of file sim-operations.h.
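
The advice to check {n}cap after stretching can be sketched as follows. This is a simplified sketch, not the library's macro: the DEMO_ names are hypothetical, DEMO_CONTIG_WIDTH is an assumed stand-in for CONTIG_WIDTH, malloc/free replace gsc_malloc_wrap and GSC_FREE, and the branches are reorganised for readability.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_CONTIG_WIDTH 1000 /* assumed stand-in for CONTIG_WIDTH */
#define DEMO_STACKLEN(n) (sizeof(n##stack)/sizeof(n##stack[0]))

#define DEMO_CREATE_BUFFER(n, type, length) \
    type n##stack[sizeof(int)*DEMO_CONTIG_WIDTH/sizeof(type)]; size_t n##cap = (length); \
    type* n = (n##cap >= DEMO_STACKLEN(n)) ? malloc(sizeof(type)*n##cap) : n##stack

/* Grow the buffer to hold `newlen` entries. On allocation failure the
 * buffer and its capacity are left unchanged. */
#define DEMO_STRETCH_BUFFER(n, newlen) \
    do { \
        size_t want_ = (newlen); \
        if (want_ < n##cap) { /* already large enough */ } \
        else if (want_ < DEMO_STACKLEN(n)) { n##cap = want_; /* still fits on the stack */ } \
        else { \
            void* tmp_ = malloc(sizeof(n##stack[0]) * want_); \
            if (tmp_ != NULL) { \
                memcpy(tmp_, n, sizeof(n##stack[0]) * n##cap); \
                if (n##cap >= DEMO_STACKLEN(n)) { free(n); } /* old copy was on the heap */ \
                n = tmp_; n##cap = want_; \
            } \
        } \
    } while (0)

/* Appends 0..count-1 to a buffer that starts with capacity 4 and doubles when
 * full. Stores the sum of the entries in *sum_out and returns the final
 * capacity, or 0 if a stretch failed. */
size_t demo_fill(size_t count, long* sum_out) {
    DEMO_CREATE_BUFFER(buf, long, 4);
    size_t used = 0;
    for (size_t i = 0; i < count; ++i) {
        if (used == bufcap) {
            DEMO_STRETCH_BUFFER(buf, bufcap * 2);
            if (bufcap <= used) { /* capacity did not grow: allocation failed */
                if (bufcap >= DEMO_STACKLEN(buf)) { free(buf); }
                return 0;
            }
        }
        buf[used++] = (long)i;
    }
    long sum = 0;
    for (size_t i = 0; i < used; ++i) { sum += buf[i]; }
    *sum_out = sum;
    size_t final_cap = bufcap;
    if (bufcap >= DEMO_STACKLEN(buf)) { free(buf); }
    return final_cap;
}
```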

Typedef Documentation

◆ gsc_AlleleMatrix

A linked list entry that stores a matrix of alleles for a set of SNP markers and genotypes.

The simulation stores its genotypes in a list of AlleleMatrix nodes. Each node can store up to CONTIG_WIDTH genotypes.

Has short name: AlleleMatrix

Definition at line 787 of file sim-operations.h.
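
To make the linked-list layout concrete, the sketch below counts genotypes across a chain of nodes and computes the fewest nodes that could hold a given number of genotypes. The node struct is a hypothetical, stripped-down stand-in (the real gsc_AlleleMatrix fields are not listed in this section), and DEMO_CONTIG_WIDTH is an assumed value for CONTIG_WIDTH.

```c
#include <stddef.h>

#define DEMO_CONTIG_WIDTH 1000 /* assumed stand-in for CONTIG_WIDTH */

/* Hypothetical node: only what is needed to count genotypes. */
typedef struct demo_Node {
    size_t n_genotypes;     /* genotypes stored here, at most DEMO_CONTIG_WIDTH */
    struct demo_Node* next; /* next node in the list, or NULL at the end */
} demo_Node;

/* Walk the list and total the genotypes stored in every node. */
size_t demo_count_genotypes(const demo_Node* head) {
    size_t total = 0;
    for (const demo_Node* m = head; m != NULL; m = m->next) {
        total += m->n_genotypes;
    }
    return total;
}

/* Fewest nodes able to hold n genotypes: ceiling of n / DEMO_CONTIG_WIDTH. */
size_t demo_min_nodes(size_t n) {
    return (n + DEMO_CONTIG_WIDTH - 1) / DEMO_CONTIG_WIDTH;
}
```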

Enumeration Type Documentation

◆ gsc_GenotypeFileCellStyle

Represent possible representations of alleles at a marker in a genotype file.

Enumerator
GSC_GENOTYPECELLSTYLE_PAIR 
GSC_GENOTYPECELLSTYLE_COUNT 
GSC_GENOTYPECELLSTYLE_ENCODED 
GSC_GENOTYPECELLSTYLE_SLASHPAIR 
GSC_GENOTYPECELLSTYLE_UNKNOWN 

Definition at line 959 of file sim-operations.h.

◆ gsc_GenotypeFileType

Enumerate types of genotype files that the simulation knows how to load.

The format of the file cannot be automatically detected in all cases. This type exists so that users can specify the format of input files.

Has short name: GenotypeFileType

The format of the file is decoded separately from the format of the alleles of each genotype at each marker. In the templates of each format, "[alleles]" can represent:

  • a pair of ASCII characters (in which case the two characters are interpreted as the two alleles, with their ordering representing their phase),
  • a single character from the standard IUPAC nucleotide encoding (in which case the character is decoded to represent the alleles observed at this marker. The phase of the alleles at this marker is randomly chosen if the genotype is heterozygous at that marker), or
  • a single digit from {0,1,2} (in which case the digit represents the number of copies of the major allele. The phase of the alleles at this marker if the digit is 1 is randomly chosen.)

IUPAC nucleotide encoding: Code => Alleles key: A => AA ; C => CC ; G => GG ; T => TT ; R => AG ; Y => CT ; S => CG ; W => AT ; K => GT ; M => AC

The format of the "[alleles]" cells in a file can be automatically determined out of these options.
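
The IUPAC key above can be expressed as a small lookup table. This is a sketch with hypothetical names, not the library's decoder. It returns the two alleles in the order listed in the key and reports whether the call is heterozygous, leaving the random choice of phase to the caller.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct { char code; char first; char second; } demo_IupacEntry;

/* One entry per code in the key: Code => Alleles. */
static const demo_IupacEntry DEMO_IUPAC_KEY[] = {
    {'A','A','A'}, {'C','C','C'}, {'G','G','G'}, {'T','T','T'},
    {'R','A','G'}, {'Y','C','T'}, {'S','C','G'}, {'W','A','T'},
    {'K','G','T'}, {'M','A','C'}
};

/* Decodes `code` into alleles[0] and alleles[1]. Sets *is_het to true when the
 * two alleles differ. Returns false, leaving the outputs untouched, if the
 * code is not in the key. */
bool demo_decode_iupac(char code, char alleles[2], bool* is_het) {
    const size_t n = sizeof(DEMO_IUPAC_KEY) / sizeof(DEMO_IUPAC_KEY[0]);
    for (size_t i = 0; i < n; ++i) {
        if (DEMO_IUPAC_KEY[i].code == code) {
            alleles[0] = DEMO_IUPAC_KEY[i].first;
            alleles[1] = DEMO_IUPAC_KEY[i].second;
            *is_het = (alleles[0] != alleles[1]);
            return true;
        }
    }
    return false;
}
```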

Enumerator
GSC_GENOTYPEFILE_UNKNOWN 
GSC_GENOTYPEFILE_MATRIX 

Either a marker-by-line matrix, where each marker is a row, or a line-by-marker matrix, where each marker is a column.

The other axis represents lines/organisms/founders.

(marker-as-row form)

[corner] [line] [line] [line] ... [line]

[marker] [alleles] [alleles] [alleles] ... [alleles]

[marker] [alleles] [alleles] [alleles] ... [alleles]

...

(marker-as-column form)

[corner] [marker] [marker] [marker] ... [marker]

[line] [alleles] [alleles] [alleles] ... [alleles]

[line] [alleles] [alleles] [alleles] ... [alleles]

...

Details

The corner cell may or may not be filled. Its value is ignored.

Any combination of spaces or tabs between non-space/non-tab characters is interpreted as a column separator. Length and order of spaces and tabs do not need to be consistent between column separators in the file.

Any one or two consecutive characters from {'\n', '\r'}, in any order, will be interpreted as a single line break.

The default, when no map is present in simulation, is to assume markers are rows in this file. However, if any of the column headers of a matrix file are names of markers being tracked by the simulation, then that file is interpreted as having markers as columns.
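
The separator and line-break rules in these details can be sketched as a small scanner that counts cells and line breaks in a string. All names are hypothetical and this is not the library's reader. One assumption is made where the rule is ambiguous: a pair of newline characters counts as a single break only when the two characters differ ("\r\n" or "\n\r"), so "\n\n" is treated as two breaks.

```c
#include <stddef.h>

enum demo_CharClass { DEMO_NEWLINE, DEMO_COLUMNGAP, DEMO_CONTENTS };

/* Classify one character as a line break, a column separator, or cell contents. */
static enum demo_CharClass demo_classify(char c) {
    if (c == '\n' || c == '\r') { return DEMO_NEWLINE; }
    if (c == ' ' || c == '\t') { return DEMO_COLUMNGAP; }
    return DEMO_CONTENTS;
}

/* Counts the cells and line breaks in a NUL-terminated string. */
void demo_scan(const char* text, size_t* n_cells, size_t* n_breaks) {
    int in_cell = 0;
    *n_cells = 0;
    *n_breaks = 0;
    for (size_t i = 0; text[i] != '\0'; ++i) {
        switch (demo_classify(text[i])) {
        case DEMO_CONTENTS:
            if (!in_cell) { ++*n_cells; in_cell = 1; } /* first character of a new cell */
            break;
        case DEMO_COLUMNGAP:
            in_cell = 0; /* any run of spaces/tabs is one separator */
            break;
        case DEMO_NEWLINE:
            in_cell = 0;
            ++*n_breaks;
            /* "\r\n" or "\n\r" is one break: skip the second character of the pair. */
            if (demo_classify(text[i + 1]) == DEMO_NEWLINE && text[i + 1] != text[i]) { ++i; }
            break;
        }
    }
}
```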

GSC_GENOTYPEFILE_BED 
GSC_GENOTYPEFILE_PED 
GSC_GENOTYPEFILE_VCF 

Definition at line 1011 of file sim-operations.h.

◆ gsc_TableFileCurrentStatus

Represent possible states of the cursor of a gsc_TableFileReader.

Enumerator
GSC_TABLEFILE_NEWLINE 
GSC_TABLEFILE_COLUMNGAP 
GSC_TABLEFILE_CONTENTS 
GSC_TABLEFILE_ERROR_EOF 
GSC_TABLEFILE_ERROR_EOBUF 

Definition at line 898 of file sim-operations.h.

Function Documentation

◆ gsc_define_matrix_format_details()

gsc_FileFormatSpec gsc_define_matrix_format_details ( const GSC_LOGICVAL  has_header,
const GSC_LOGICVAL  markers_as_rows,
const enum gsc_GenotypeFileCellStyle  cell_style 
)

Give genomicSimulation hints on the format of a genotype matrix file to be loaded.

Sometimes genomicSimulation's automatic file format detection may misinterpret the formatting of a genotype matrix (eg assuming markers are columns, when they are actually rows of the matrix; assuming there is no header row even though there is one; being unable to determine that the body of the matrix contains alternate allele counts because there are some confusingly-placed "NA"s). For particularly large files, the format detection process might also slow down file imports or require more memory.

To bypass part or all of the format detection steps when importing a genotype matrix file, use this function to create the final parameter for gsc_load_genotypefile or gsc_load_data_files.

Parameters
has_header       GSC_TRUE if the genotype matrix to be imported definitely has a header row, GSC_FALSE if the genotype matrix has no header row, or some other value (eg GSC_NA) to not bypass the header detection steps of the import process.
markers_as_rows  GSC_TRUE if each row in the genotype matrix represents a genetic marker, GSC_FALSE if each column of the genotype matrix represents a genetic marker, or some other value (eg GSC_NA) to not bypass the orientation detection steps of the import process.
cell_style       The style in which the alleles of a candidate at a marker are encoded in the body cells of the genotype matrix. Use GSC_GENOTYPECELLSTYLE_UNKNOWN to not bypass the cell style detection step of the import process.
Returns
a structure to pass as the final parameter of gsc_load_genotypefile or gsc_load_data_files.

Definition at line 7051 of file sim-operations.c.

◆ gsc_helper_tablefilereader_classify_char()

enum gsc_TableFileCurrentStatus gsc_helper_tablefilereader_classify_char ( gsc_TableFileReader *  tbl)

Classify the character under the cursor of a TableFileReader as cell contents or otherwise.

Does not update tbl->cursor, so repeated calls of this same function without updating the cursor in between will return the same result.

Definition at line 5167 of file sim-operations.c.

◆ gsc_helper_tablefilereader_refill_buffer()

void gsc_helper_tablefilereader_refill_buffer ( gsc_TableFileReader *  tbl)

Read another buffer's worth of characters from a gsc_TableFileReader's file.

Warning
This overwrites any characters previously saved in the TableFileReader buffer. The pointers of any gsc_TableFileCell read from this table may become invalid if they are shallow copies. If you need to retain any cell values, consider using gsc_tablefilecell_deep_copy() before calling this function.

Definition at line 5153 of file sim-operations.c.

◆ gsc_tablefilecell_deep_copy()

void gsc_tablefilecell_deep_copy ( gsc_TableFileCell *  c)

Allocate memory to store a deep copy of a gsc_TableFileCell, if previously only a shallow copy.

The deep copy will be a null-terminated string even if the shallow copy was not null-terminated.

After this call, the cell is stored in heap memory and will need to be freed once the cell is no longer needed. Schema for doing this: if (!mycell.isCellShallow) { GSC_FREE(mycell.cell); }

Definition at line 5196 of file sim-operations.c.
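
The difference between a shallow and a deep cell copy, and the freeing schema above, can be sketched with a stand-in struct. This is a sketch under assumptions: demo_Cell is hypothetical, the field names cell and isCellShallow come from the schema above, and cell_len is an assumed field holding the length of the cell's text.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char* cell;        /* text of the cell; a shallow copy points into a shared buffer */
    size_t cell_len;   /* assumed field: number of characters in the cell */
    int isCellShallow; /* nonzero while `cell` is borrowed rather than owned */
} demo_Cell;

/* Replace a borrowed cell with an owned, NUL-terminated heap copy.
 * Does nothing if the cell is already deep, or if allocation fails. */
void demo_cell_deep_copy(demo_Cell* c) {
    if (!c->isCellShallow) { return; }
    char* copy = malloc(c->cell_len + 1);
    if (copy == NULL) { return; }
    memcpy(copy, c->cell, c->cell_len);
    copy[c->cell_len] = '\0'; /* the borrowed text need not be terminated */
    c->cell = copy;
    c->isCellShallow = 0;
}

/* Free the text only if this cell owns it, following the schema above. */
void demo_cell_free(demo_Cell* c) {
    if (!c->isCellShallow) { free(c->cell); }
    c->cell = NULL;
}
```

Once deep-copied, the cell's text is unaffected by later overwrites of the buffer it was read from.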

◆ gsc_tablefilereader_close()

void gsc_tablefilereader_close ( gsc_TableFileReader *  tbl)

Close a gsc_TableFileReader's file pointer.

Definition at line 5141 of file sim-operations.c.

◆ gsc_tablefilereader_create()

gsc_TableFileReader gsc_tablefilereader_create ( const char *  filename)

Open a file for reading with gsc_TableFileReader.

On successfully opening file, fills the TableFileReader buffer for the first time.

Definition at line 5121 of file sim-operations.c.

◆ gsc_tablefilereader_get_next_cell()

gsc_TableFileCell gsc_tablefilereader_get_next_cell ( gsc_TableFileReader *  tbl)

Read forwards in TableFileReader and return the next cell's contents, as well as how many column gaps and newlines preceded it.

Cells can be of unlimited length, as long as they fit in memory.

Definition at line 5211 of file sim-operations.c.

Variable Documentation

◆ GSC_BASIC_OPT

const gsc_GenOptions GSC_BASIC_OPT
extern

Default parameter values for GenOptions, to help with quick scripts and prototypes.

Has short name: BASIC_OPT

Definition at line 10 of file sim-operations.c.