genomicSimulationC 0.2.6
Experimental functions for retroactively calculating the number of recombinations.
Functions
int * | gsc_calculate_min_recombinations_fw1 (gsc_SimData *d, gsc_MapID mapid, char *parent1, unsigned int p1num, char *parent2, unsigned int p2num, char *offspring, int certain) |
Identify markers in the genotype of offspring where recombination from its parents occurred.
int * | gsc_calculate_min_recombinations_fwn (gsc_SimData *d, gsc_MapID mapid, char *parent1, unsigned int p1num, char *parent2, unsigned int p2num, char *offspring, int window_size, int certain) |
Identify markers in the genotype of offspring where recombination from its parents occurred, as judged by the marker itself and a short window around it.
static int | gsc_has_same_alleles (const char *p1, const char *p2, const size_t i) |
Simple operator to determine if at marker i, two genotypes share at least one allele.
static int | gsc_has_same_alleles_window (const char *g1, const char *g2, const size_t start, const size_t w) |
Simple operator to determine if at markers with indexes i to i+w inclusive, two genotypes share at least one allele.
int | gsc_calculate_recombinations_from_file (gsc_SimData *d, const char *input_file, const char *output_file, int window_len, int certain) |
Provides guesses as to the location of recombination events that led to the creation of certain genotypes from certain other genotypes.
Experimental functions for retroactively calculating number of recombinations.
This functionality is for interest only. It is not clear, or tidy, or checked against real data.
int * gsc_calculate_min_recombinations_fw1 (gsc_SimData *d, gsc_MapID mapid, char *parent1, unsigned int p1num, char *parent2, unsigned int p2num, char *offspring, int certain)
Identify markers in the genotype of offspring where recombination from its parents occurred.
This function is fairly low-level (see the kinds of parameters required), so a wrapper like gsc_calculate_recombinations_from_file is suggested for end users.
The function reads from start to end along each chromosome. At each marker, it checks whether the offspring's alleles could only have come from one parent, that is, whether the parentage of that allele is known. If so, it saves the provided id number of the source parent to the matching position in the result vector. If not, its behaviour depends on the certain parameter.
Parents do not have to be directly identified as parents by the pedigree functionality of this library. A sample usage is performing a cross then multiple generations of selfing, then comparing the final inbred to the original two lines of the cross.
d | pointer to the gsc_SimData struct whose genetic map matches the provided genotypes. |
mapid | ID of the map from which to calculate potential historical crossovers, or NO_MAP to use the first-loaded/primary map by default |
parent1 | a character vector containing one parent's alleles at each marker in the gsc_SimData. |
p1num | an integer that will be used to identify areas of the genome that come from the first parent in the returned vector. |
parent2 | a character vector containing the other parent's alleles at each marker in the gsc_SimData. |
p2num | an integer that will be used to identify areas of the genome that come from the second parent in the returned vector. |
offspring | a character vector containing the alleles at each marker in the gsc_SimData of the genotype whose likely recombinations we want to identify. |
certain | a boolean. If TRUE, markers where the parent of origin cannot be identified will be set to 0. If FALSE, they will be set to the id of the parent that provided the most recently identified allele in that chromosome. |
Returns: a vector of length d->n_markers containing the id of the parent of origin at each marker in the offspring genotype.
Definition at line 7148 of file sim-operations.c.
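The marker-by-marker scan described above can be illustrated with a minimal standalone sketch. It is not the library's implementation: sketch_share and sketch_origin_fw1 are hypothetical names, the sketch handles a single chromosome, and it assumes that "could only have come from one parent" means the offspring shares an allele with one parent and with no allele of the other at that marker.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a genomicSimulationC function): returns 1 if the
 * two genotypes, stored as two chars per marker, share an allele at marker i. */
static int sketch_share(const char *g1, const char *g2, size_t i) {
    char a1 = g1[2 * i], a2 = g1[2 * i + 1];
    char b1 = g2[2 * i], b2 = g2[2 * i + 1];
    return a1 == b1 || a1 == b2 || a2 == b1 || a2 == b2;
}

/* Hypothetical sketch of the single-marker scan over one chromosome.
 * Writes one parent id (or 0) per marker into out. */
static void sketch_origin_fw1(const char *parent1, int p1num,
                              const char *parent2, int p2num,
                              const char *offspring, size_t n_markers,
                              int certain, int *out) {
    assert(parent1 && parent2 && offspring && out);
    int last_known = 0; /* no parent identified yet on this chromosome */
    for (size_t i = 0; i < n_markers; ++i) {
        int fits1 = sketch_share(parent1, offspring, i);
        int fits2 = sketch_share(parent2, offspring, i);
        if (fits1 != fits2) {
            /* Exactly one parent is compatible: parentage is known here. */
            last_known = fits1 ? p1num : p2num;
            out[i] = last_known;
        } else {
            /* Ambiguous marker: 0 if certain, else carry the last id forward. */
            out[i] = certain ? 0 : last_known;
        }
    }
}
```

In this sketch, a change of id between two consecutive identified markers is where a crossover would be inferred.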
int * gsc_calculate_min_recombinations_fwn (gsc_SimData *d, gsc_MapID mapid, char *parent1, unsigned int p1num, char *parent2, unsigned int p2num, char *offspring, int window_size, int certain)
Identify markers in the genotype of offspring where recombination from its parents occurred, as judged by the marker itself and a short window around it.
This function is fairly low-level (see the kinds of parameters required), so a wrapper like gsc_calculate_recombinations_from_file is suggested for end users.
The function reads from start to end along each chromosome. At each marker, it checks whether the offspring's alleles in the window centered at that marker could have come from one parent but not from the other, that is, whether the parentage of that allele is known. If so, it saves the provided id number of the source parent to the matching position in the result vector. If not, its behaviour depends on the certain parameter.
Parents do not have to be directly identified as parents by the pedigree functionality of this library. A sample usage is performing a cross then multiple generations of selfing, then comparing the final inbred to the original two lines of the cross.
Behaviour when the window size is not an odd integer has not been tested.
d | pointer to the gsc_SimData struct whose genetic map matches the provided genotypes. |
mapid | ID of the map from which to calculate potential historical crossovers, or NO_MAP to use the first-loaded/primary map by default |
parent1 | a character vector containing one parent's alleles at each marker in the gsc_SimData. |
p1num | an integer that will be used to identify areas of the genome that come from the first parent in the returned vector. |
parent2 | a character vector containing the other parent's alleles at each marker in the gsc_SimData. |
p2num | an integer that will be used to identify areas of the genome that come from the second parent in the returned vector. |
offspring | a character vector containing the alleles at each marker in the gsc_SimData of the genotype whose likely recombinations we want to identify. |
window_size | an odd integer representing the number of markers to check for known parentage around each marker |
certain | a boolean. If TRUE, markers where the parent of origin cannot be identified will be set to 0. If FALSE, they will be set to the id of the parent that provided the most recently identified allele in that chromosome. |
Returns: a vector of length d->n_markers containing the id of the parent of origin at each marker in the offspring genotype.
Definition at line 7257 of file sim-operations.c.
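The windowed variant can be sketched in the same standalone way. Again this is an illustration rather than the library's code: all names are hypothetical, only one chromosome is handled, and the choice to clip the window at the chromosome ends is an assumption, since the source does not say how edges are treated.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: 1 if the genotypes share an allele at marker i. */
static int sketch_share(const char *g1, const char *g2, size_t i) {
    char a1 = g1[2 * i], a2 = g1[2 * i + 1];
    char b1 = g2[2 * i], b2 = g2[2 * i + 1];
    return a1 == b1 || a1 == b2 || a2 == b1 || a2 == b2;
}

/* Hypothetical helper: 1 if the genotypes share an allele at every marker
 * in the inclusive range [lo, hi]. */
static int sketch_share_range(const char *g1, const char *g2,
                              size_t lo, size_t hi) {
    for (size_t i = lo; i <= hi; ++i)
        if (!sketch_share(g1, g2, i)) return 0;
    return 1;
}

/* Hypothetical sketch of the windowed scan over one chromosome. */
static void sketch_origin_fwn(const char *parent1, int p1num,
                              const char *parent2, int p2num,
                              const char *offspring, size_t n_markers,
                              size_t window_size, int certain, int *out) {
    assert(parent1 && parent2 && offspring && out);
    assert(window_size % 2 == 1); /* the sketch only supports odd windows */
    size_t half = window_size / 2;
    int last_known = 0;
    for (size_t i = 0; i < n_markers; ++i) {
        /* Assumption: the window is clipped at the chromosome ends. */
        size_t lo = i >= half ? i - half : 0;
        size_t hi = i + half < n_markers ? i + half : n_markers - 1;
        int fits1 = sketch_share_range(parent1, offspring, lo, hi);
        int fits2 = sketch_share_range(parent2, offspring, lo, hi);
        if (fits1 != fits2) {
            last_known = fits1 ? p1num : p2num;
            out[i] = last_known;
        } else {
            out[i] = certain ? 0 : last_known;
        }
    }
}
```

Compared with a single-marker check, a window lets one informative marker assign parentage to its uninformative neighbours, at the cost of blurring the exact crossover position.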
int gsc_calculate_recombinations_from_file (gsc_SimData *d, const char *input_file, const char *output_file, int window_len, int certain)
Provides guesses as to the location of recombination events that led to the creation of certain genotypes from certain other genotypes.
The input file, which lists each target alongside the two parents it should be compared against, should have the format:
[target name] [parent1name] [parent2name]
[target name] [parent1name] [parent2name]
...
The tab-separated output file produced by this function will have the format:
[marker 1 name] [marker 2 name]...
[target name] [tab-separated recombination vector, containing the index at each marker of the parent the function guesses the target's alleles came from, or 0 if this is unknown]
...
Parents do not have to be directly identified as parents by the pedigree functionality of this library. A sample usage is performing a cross then multiple generations of selfing, then comparing the final inbred to the original two lines of the cross.
d | pointer to the gsc_SimData struct containing the genotypes and map under consideration. |
input_file | string containing the name of the file listing the offspring and the pairs of parents for which to calculate recombinations |
output_file | string containing the filename to which to save the results. |
window_len | an odd integer representing the number of markers to check for known parentage around each marker |
certain | TRUE to fill locations where parentage is unknown with 0, FALSE to fill locations where parentage is unknown with the most recent known parent |
Definition at line 7374 of file sim-operations.c.
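As an illustration of the input format, the following sketch reads one line into its three names. It is not taken from the library: sketch_parse_triple and SKETCH_NAME_LEN are hypothetical, and the sketch assumes the three names are separated by whitespace, contain no spaces themselves, and are shorter than 64 characters.

```c
#include <assert.h>
#include <stdio.h>

#define SKETCH_NAME_LEN 64 /* hypothetical limit, not a library constant */

/* Hypothetical helper: splits one input line into target, parent 1 and
 * parent 2 names. Returns 1 if all three names were found, else 0. */
static int sketch_parse_triple(const char *line,
                               char target[SKETCH_NAME_LEN],
                               char parent1[SKETCH_NAME_LEN],
                               char parent2[SKETCH_NAME_LEN]) {
    assert(line && target && parent1 && parent2);
    /* %63s stops at whitespace and leaves room for the terminating NUL. */
    return sscanf(line, "%63s %63s %63s", target, parent1, parent2) == 3;
}
```

A line with fewer than three names is reported as malformed rather than partially accepted, so a caller could skip it or stop with an error.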
static inline int gsc_has_same_alleles (const char *p1, const char *p2, const size_t i)
Simple operator to determine if at marker i, two genotypes share at least one allele.
Checks only three of the four possible permutations because it assumes there cannot be more than two alleles at a given marker.
p1 | pointer to a character array genotype of the type stored in a gsc_AlleleMatrix (2*n_markers long, representing the two alleles at a marker consecutively) for the first of the genotypes to compare. |
p2 | pointer to a character array genotype for the second of the genotypes to compare. |
i | index of the marker at which to perform the check |
Definition at line 1815 of file sim-operations.h.
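Why three comparisons are enough under the two-allele assumption can be shown with a small sketch. The source does not state which three comparisons the library performs, so the choice below is an assumption, and both function names are hypothetical. If the first allele of one genotype matches neither allele of the other, the second genotype must be homozygous for the other allele, so the one remaining comparison that matters is the second allele against either of them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical three-comparison check at marker i (two chars per marker).
 * Only valid when at most two distinct alleles exist at the marker. */
static int sketch_share_allele3(const char *p1, const char *p2, size_t i) {
    assert(p1 && p2);
    char a1 = p1[2 * i], a2 = p1[2 * i + 1];
    char b1 = p2[2 * i], b2 = p2[2 * i + 1];
    return a1 == b1 || a1 == b2 || a2 == b1;
}

/* Hypothetical exhaustive check, valid for any number of alleles. */
static int sketch_share_allele4(const char *p1, const char *p2, size_t i) {
    assert(p1 && p2);
    char a1 = p1[2 * i], a2 = p1[2 * i + 1];
    char b1 = p2[2 * i], b2 = p2[2 * i + 1];
    return a1 == b1 || a1 == b2 || a2 == b1 || a2 == b2;
}
```

With a third allele the shortcut can miss a match: for genotypes AG and TG at one marker, the three-comparison version reports no shared allele although both carry G.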
static inline int gsc_has_same_alleles_window (const char *g1, const char *g2, const size_t start, const size_t w)
Simple operator to determine if at markers with indexes i to i+w inclusive, two genotypes share at least one allele.
Checks only three of the four possible permutations at each marker because it assumes there cannot be more than two alleles at a given marker. For the return value to be true, there must be at least one match at every one of the markers in the window.
g1 | pointer to a character array genotype of the type stored in a gsc_AlleleMatrix (2*n_markers long, representing the two alleles at a marker consecutively) for the first of the genotypes to compare. |
g2 | pointer to a character array genotype for the second of the genotypes to compare. |
start | index of the first marker in the window over which to perform the check |
w | length of the window over which to perform the check |
Definition at line 1832 of file sim-operations.h.
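The "every marker in the window must match" rule can be sketched as follows. This is a hypothetical standalone version, not the library's code. It assumes the window covers exactly w markers starting at start; the brief description's inclusive wording would instead cover w+1 markers, and the source does not settle which applies.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical windowed check: returns 1 only if the two genotypes (two
 * chars per marker) share an allele at every one of the w markers
 * start, start+1, ..., start+w-1. */
static int sketch_share_window(const char *g1, const char *g2,
                               size_t start, size_t w) {
    assert(g1 && g2);
    for (size_t i = start; i < start + w; ++i) {
        char a1 = g1[2 * i], a2 = g1[2 * i + 1];
        char b1 = g2[2 * i], b2 = g2[2 * i + 1];
        /* Three comparisons, relying on at most two alleles per marker. */
        if (!(a1 == b1 || a1 == b2 || a2 == b1))
            return 0; /* one mismatching marker fails the whole window */
    }
    return 1;
}
```

Returning at the first mismatch avoids scanning the rest of the window. An empty window (w of 0) is treated as a match in this sketch.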