instagraal package
Submodules
instagraal.cuda_lib_gl_single module
class instagraal.cuda_lib_gl_single.sampler(use_rippe, S_o_A_frags, collector_id_repeats, frag_dispatcher, id_frag_duplicated, id_frags_blacklisted, n_frags, n_new_frags, init_n_sub_frags, n_new_sub_frags, np_rep_sub_frags_id, sub_sampled_sparse_matrix, np_sub_frags_len_bp, np_sub_frags_id, np_sub_frags_accu, np_sub_frags_2_frags, mean_squared_frags_per_bin, norm_vect_accu, sub_candidates_dup, sub_candidates_output_data, S_o_A_sub_frags, sub_collector_id_repeats, sub_frag_dispatcher, sparse_matrix, mean_value_trans, n_iterations, is_simu, gl_window, pos_vbo, col_vbo, vel, pos, raw_im_init, pbo_im_buffer, gl_size_im)
Bases: object
estimate_parameters(max_dist_kb, size_bin_kb, display_graph)
Estimate the Rippe parameters by least-squares optimization on the experimental data.
Parameters: - max_dist_kb
- size_bin_kb
instagraal.glutil module
instagraal.gpustruct module
instagraal.init_nuisance module
instagraal.instagraal module
Large genome reassembly based on Hi-C data.
Usage:
    instagraal <hic_folder> <reference.fa> [<output_folder>]
               [--level=4] [--cycles=100] [--coverage-std=1] [--neighborhood=5] [--device=0] [--circular] [--bomb] [--save-matrix] [--pyramid-only] [--save-pickle] [--simple] [--quiet] [--debug]
Options:
    -h, --help               Display this help message.
    --version                Display the program’s current version.
    -l 4, --level 4          Level (resolution) of the contact map. Increasing the level by one means a threefold lower resolution but also a threefold faster computation time. [default: 4]
    -n 100, --cycles 100     Number of iterations to perform for each bin (row/column of the contact map). A high number of cycles has diminishing returns, but there is a necessary minimum for assembly convergence. [default: 100]
    -c 1, --coverage-std 1   Number of standard deviations below the mean coverage, below which fragments should be filtered out prior to binning. [default: 1]
    -N 5, --neighborhood 5   Number of neighbors to sample for potential mutations for each bin. [default: 5]
    --device 0               If multiple graphics cards are available, select a specific device (numbered from 0). [default: 0]
    -C, --circular           Indicates that the genome is circular. [default: False]
    -b, --bomb               Explode the genome prior to scaffolding. [default: False]
    --pyramid-only           Only build multi-resolution contact maps (pyramids) and don’t do any scaffolding. [default: False]
    --save-pickle            Dump all info from the instaGRAAL run into a pickle. Primarily for development purposes, but also for advanced post hoc introspection. [default: False]
    --save-matrix            Save a preview of the contact map after each cycle. [default: False]
    --simple                 Only perform operations at the edge of the contigs. [default: False]
    --quiet                  Only display warnings and errors as outputs. [default: False]
    --debug                  Display debug information. For development purposes only. Mutually exclusive with --quiet, and will override it. [default: False]
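To illustrate how the options above combine, here is a minimal sketch that assembles an instagraal command line from Python. The helper name build_instagraal_command and the example paths are hypothetical; only the flags are taken from the usage text. The command is built but not executed.

```python
def build_instagraal_command(hic_folder, reference, output_folder=None,
                             level=4, cycles=100, coverage_std=1,
                             neighborhood=5, device=0, circular=False,
                             quiet=False, debug=False):
    """Assemble an argument list that follows the usage text (hypothetical helper)."""
    cmd = ["instagraal", str(hic_folder), str(reference)]
    if output_folder is not None:
        cmd.append(str(output_folder))
    cmd += [
        f"--level={level}",
        f"--cycles={cycles}",
        f"--coverage-std={coverage_std}",
        f"--neighborhood={neighborhood}",
        f"--device={device}",
    ]
    if circular:
        cmd.append("--circular")
    # --debug overrides --quiet, so only one of the two is ever passed.
    if debug:
        cmd.append("--debug")
    elif quiet:
        cmd.append("--quiet")
    return cmd


# Placeholder paths, for illustration only.
cmd = build_instagraal_command("hic_folder", "reference.fa", "out",
                               level=5, circular=True, quiet=True, debug=True)
print(" ".join(cmd))
```

The resulting list could be handed to subprocess.run; keeping the arguments as a list avoids shell quoting issues with paths.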
class instagraal.instagraal.window(name, folder_path, fasta, device, level, n_iterations_em, n_iterations_mcmc, is_simu, scrambled, perform_em, use_rippe, gl_size_im, sample_param, thresh_factor, output_folder)
Bases: object
A window displaying the live movie of the calculations performed by the scaffolder.
Parameters: - name (str) – The name of the project. Will determine the window title.
- folder_path (str or pathlib.Path) – The directory containing the Hi-C contact map.
- fasta (str or pathlib.Path) – The path to the reference genome in FASTA format.
- device (int) – The identifier of the graphics card to be used, numbered from 0. If only one is available, it should be 0.
- level (int) – The level (resolution) at which to perform scaffolding.
- n_iterations_em (int) – The number of EM (expectation maximization) iterations.
- n_iterations_mcmc (int) – The number of MCMC (Markov chain Monte-Carlo) iterations.
- is_simu (bool) – Whether the parameters should be simulated. Mutually exclusive with use_rippe and will override it.
- scrambled (bool) – Whether to scramble the genome.
- perform_em (bool) – Whether to perform EM (expectation maximization).
- use_rippe (bool) – Whether to explicitly use the model from Rippe et al., 2001.
- gl_size_im (int) – The size of the window to be displayed.
- sample_param (bool) – Whether to sample the parameters.
- thresh_factor (float) – The sparsity (coverage) threshold below which fragments are discarded, as a number of standard deviations below the mean.
- output_folder (str or pathlib.Path) – The path to the output folder where the scaffolded genome and other relevant information will be saved.
instagraal.leastsqbound module
Constrained multivariate Levenberg-Marquardt optimization
An updated version of this file can be found at https://github.com/jjhelmus/leastsqbound-scipy
The version here has known bugs that have been fixed in the updated version linked above; proceed at your own risk.
- Jonathan J. Helmus (jjhelmus@gmail.com)
instagraal.leastsqbound.calc_cov_x(infodic, p)
Calculate cov_x from fjac, ipvt and p as is done in leastsq
instagraal.leastsqbound.external2internal(xe, bounds)
Convert a series of external variables to internal variables
instagraal.leastsqbound.internal2external(xi, bounds)
Convert a series of internal variables to external variables
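As a rough illustration of the external/internal conversion, the sketch below applies the parameter transformations given in the MINUIT User’s Guide to a single scalar. The function names and the scalar interface are hypothetical and chosen for this example; the module's own functions operate on a series of variables and may differ in detail.

```python
import math


def external_to_internal(xe, bound):
    """Map a bounded (external) value to an unbounded (internal) one."""
    lo, hi = bound
    if lo is None and hi is None:          # unbounded: identity
        return xe
    if hi is None:                         # lower bound only
        return math.sqrt((xe - lo + 1.0) ** 2 - 1.0)
    if lo is None:                         # upper bound only
        return math.sqrt((hi - xe + 1.0) ** 2 - 1.0)
    # both bounds: arcsine transform
    return math.asin(2.0 * (xe - lo) / (hi - lo) - 1.0)


def internal_to_external(xi, bound):
    """Map an unbounded (internal) value back into the allowed range."""
    lo, hi = bound
    if lo is None and hi is None:
        return xi
    if hi is None:
        return lo - 1.0 + math.sqrt(xi * xi + 1.0)
    if lo is None:
        return hi + 1.0 - math.sqrt(xi * xi + 1.0)
    return lo + (hi - lo) / 2.0 * (math.sin(xi) + 1.0)


# Round trip for each kind of bound (illustrative values).
cases = [(2.5, (0.0, 10.0)), (3.0, (1.0, None)), (1.0, (None, 4.0)), (-7.0, (None, None))]
for xe, bound in cases:
    xi = external_to_internal(xe, bound)
    print(bound, xe, "->", xi, "->", internal_to_external(xi, bound))
```

Because any internal value maps back inside the bounds, an unconstrained optimizer can work on the internal variables while the objective function only ever sees admissible external values.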
instagraal.leastsqbound.internal2external_grad(xi, bounds)
Calculate the internal-to-external gradient
Calculates the partial derivatives of the external variables with respect to the internal variables.
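The gradient of the transformation can be checked numerically. The sketch below uses the MINUIT-style transforms on a single scalar and compares the analytic derivative with a central finite difference; the function names are hypothetical and the module's array-based implementation may differ.

```python
import math


def internal_to_external(xi, bound):
    """MINUIT-style map from an internal value to a bounded external value."""
    lo, hi = bound
    if lo is None and hi is None:
        return xi
    if hi is None:
        return lo - 1.0 + math.sqrt(xi * xi + 1.0)
    if lo is None:
        return hi + 1.0 - math.sqrt(xi * xi + 1.0)
    return lo + (hi - lo) / 2.0 * (math.sin(xi) + 1.0)


def internal_to_external_grad(xi, bound):
    """Derivative of the external value with respect to the internal value."""
    lo, hi = bound
    if lo is None and hi is None:
        return 1.0
    if hi is None:
        return xi / math.sqrt(xi * xi + 1.0)
    if lo is None:
        return -xi / math.sqrt(xi * xi + 1.0)
    return (hi - lo) / 2.0 * math.cos(xi)


def numeric_grad(xi, bound, h=1e-6):
    """Central finite difference, used only to sanity-check the analytic form."""
    return (internal_to_external(xi + h, bound) - internal_to_external(xi - h, bound)) / (2.0 * h)


bounds_to_check = [(0.0, 10.0), (1.0, None), (None, 4.0), (None, None)]
for bound in bounds_to_check:
    print(bound, internal_to_external_grad(0.7, bound), numeric_grad(0.7, bound))
```

This derivative is what allows a covariance estimated in internal coordinates to be rescaled to the external parameters.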
instagraal.leastsqbound.leastsqbound(func, x0, bounds, args=(), **kw)
Constrained multivariate Levenberg-Marquardt optimization
Minimize the sum of squares of a given function using the Levenberg-Marquardt algorithm. Constraints on parameters are enforced using variable transformations as described in the MINUIT User’s Guide by Fred James and Matthias Winkler.
Parameters:
- func – Function to call for optimization.
- x0 – Starting estimate for the minimization.
- bounds – (min, max) pair for each element of x, defining the bounds on that parameter. Use None for one of min or max when there is no bound in that direction.
- args – Any extra arguments to func are placed in this tuple.
Returns: (x, {cov_x, infodict, mesg}, ier)
The return value is described in the scipy.optimize.leastsq function. x and cov_x are corrected to take into account the parameter transformation; infodict is not corrected.
Additional keyword arguments are passed directly to the scipy.optimize.leastsq algorithm.
instagraal.linkage module
instagraal.log module
Basic logging setup for instaGRAAL.
Logging level can be set by the user and determines the verbosity of the whole program.
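A minimal sketch of how a verbosity setting could be derived from the --quiet and --debug options, using only the standard logging module. The helper resolve_log_level and the logger name are hypothetical, and the INFO default is an assumption; this is not the module's actual setup code.

```python
import logging


def resolve_log_level(quiet=False, debug=False):
    """Pick a logging level; debug wins over quiet, as in the option help."""
    if debug:
        return logging.DEBUG
    if quiet:
        return logging.WARNING  # only warnings and errors are shown
    return logging.INFO         # assumed default verbosity


# Hypothetical logger name, used only for this example.
logger = logging.getLogger("instagraal_sketch")
logger.setLevel(resolve_log_level(quiet=True, debug=False))
logger.addHandler(logging.StreamHandler())

logger.info("hidden when quiet is set")
logger.warning("still shown when quiet is set")
```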
instagraal.optim_rippe_curve_update module
instagraal.parse_info_frags module
instagraal.pyramid_sparse module
Pyramid library
Create and handle so-called ‘pyramid’ objects, i.e. a series of decreasing-resolution contact maps in hdf5 format.
instagraal.pyramid_sparse.abs_contact_2_coo_file(abs_contact_file, coo_file)
Convert contact maps between old-style and new-style formats.
A legacy function that converts contact maps from the older GRAAL format to the simpler instaGRAAL format. This is useful with datasets generated by Hi-C box.
Parameters: - abs_contact_file (str, file or pathlib.Path) – The input old-style contact map.
- coo_file (str, file, or pathlib.Path) – The output path to the generated contact map; must be writable.
instagraal.pyramid_sparse.build(base_folder, size_pyramid, factor, min_bin_per_contig)
Build a pyramid of contact maps
Build a fragment pyramid for multi-scale analysis
Parameters: - base_folder (str or pathlib.Path) – Where to create the hdf5 files containing the matrices.
- size_pyramid (int) – How many levels (contact maps of decreasing resolution) to generate.
- factor (int) – Subsampling factor (binning) from one level to the next.
- min_bin_per_contig (int) – The minimum number of bins per contig below which binning shall not be performed.
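The idea of a pyramid of decreasing-resolution maps can be sketched with plain dictionaries. The example below assumes a sparse map stored as {(i, j): count} and merges every `factor` consecutive fragments into one bin at each level; it ignores contig boundaries, min_bin_per_contig and the hdf5 storage, so it illustrates the principle only and is not the function's implementation. The names coarsen_contacts and build_levels are hypothetical, and counting the unbinned map as the first level is an assumption.

```python
from collections import defaultdict


def coarsen_contacts(contacts, factor):
    """Sum sparse contacts {(i, j): count} into bins of `factor` fragments."""
    binned = defaultdict(int)
    for (i, j), count in contacts.items():
        binned[(i // factor, j // factor)] += count
    return dict(binned)


def build_levels(contacts, size_pyramid, factor):
    """Return a list of maps, each `factor` times coarser than the previous."""
    levels = [dict(contacts)]  # assumption: level 0 is the unbinned map
    for _ in range(size_pyramid - 1):
        levels.append(coarsen_contacts(levels[-1], factor))
    return levels


# Toy contact map over six fragments (made-up counts).
contacts = {(0, 1): 4, (1, 2): 3, (2, 3): 5, (0, 5): 1, (4, 5): 2}
levels = build_levels(contacts, size_pyramid=3, factor=3)
for depth, level in enumerate(levels):
    print(depth, level)
```

Summing rather than averaging keeps the total number of contacts identical at every level, which makes the levels easy to cross-check.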
instagraal.pyramid_sparse.build_and_filter(base_folder, size_pyramid, factor, thresh_factor=1)
Build a filtered pyramid of contact maps
Build a fragment pyramid for multi-scale analysis and remove high sparsity (i.e. low-coverage) and short fragments.
Parameters: - base_folder (str or pathlib.Path) – Where to create the hdf5 files containing the matrices.
- size_pyramid (int) – How many levels (contact maps of decreasing resolution) to generate.
- factor (int) – Subsampling factor (binning) from one level to the next.
- thresh_factor (float, optional) – Number of standard deviations below the mean coverage; fragments whose coverage falls below this threshold are discarded. Default is 1.
Returns: obj_pyramid – The pyramid object containing all the levels.
Return type: Pyramid
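The coverage criterion controlled by thresh_factor can be illustrated as follows. This sketch assumes one coverage value per fragment and uses the population standard deviation; the helper names are hypothetical, the data are made up, and the additional filtering of short fragments is not shown.

```python
import statistics


def coverage_threshold(coverages, thresh_factor=1.0):
    """Cutoff located thresh_factor standard deviations below the mean."""
    mean = statistics.fmean(coverages)
    std = statistics.pstdev(coverages)  # assumption: population std
    return mean - thresh_factor * std


def keep_fragments(coverages, thresh_factor=1.0):
    """Indices of the fragments whose coverage reaches the cutoff."""
    cutoff = coverage_threshold(coverages, thresh_factor)
    return [i for i, cov in enumerate(coverages) if cov >= cutoff]


# Made-up per-fragment coverages; fragment 4 is poorly covered.
coverages = [10, 12, 11, 9, 1, 10]
kept = keep_fragments(coverages, thresh_factor=1.0)
print(kept)
```

A larger thresh_factor lowers the cutoff and therefore keeps more fragments.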
instagraal.pyramid_sparse.fill_sparse_pyramid_level(pyramid_handle, level, contact_file, nfrags)
Fill a level with sparse contact map data
Fill values from the simple text matrix file to the hdf5-based pyramid level with contact data.
Parameters: - pyramid_handle (h5py.File) – The hdf5 file handle containing the whole dataset.
- level (int) – The level (resolution) to be filled with contact data.
- contact_file (str, file or pathlib.Path) – The binned contact map file to be converted to hdf5 data.
- nfrags (int) – The number of fragments/bins in that specific level.
instagraal.pyramid_sparse.init_frag_list(fragment_list, new_frag_list)
Adapt the original fragment list to fit the build function requirements
Parameters: - fragment_list (str, file or pathlib.Path) – The input fragment list.
- new_frag_list (str, file or pathlib.Path) – The output fragment list to be written.
Returns: i – The number of records processed this way.
Return type: int
instagraal.pyramid_sparse.new_remove_problematic_fragments(contig_info, fragments_list, abs_fragments_contacts, new_contig_list_file, new_fragments_list_file, new_abs_fragments_contacts_file, pyramid)