hax package¶
Subpackages¶
Submodules¶
hax.data_extractor module¶
Extract peak or hit info from processed root file
-
class
hax.data_extractor.
DataExtractor
[source]¶ Bases:
object
This class is meant for extracting properties that are not on the event level, such as peak or hit properties. For more information, check the docs of DataExtractor.get_data().
-
get_data
(dataset, level='peak', event_fields=['event_number'], peak_fields=['area', 'hit_time_std'], hit_fields=[], event_cuts=[], peak_cuts=[], stop_after=np.inf, added_branches=[])[source]¶ Extract peak or hit data from a dataset. Peak or hit can be toggled by specifying level = 'peak' or level = 'hit'. Example usage:
- d = DataExtractor().get_data(dataset=run_name, level='peak', event_fields=['event_number'],
- peak_fields=['area'], event_cuts=['event_number > 5', 'event_number < 10'], peak_cuts=['area > 100', 'type == "s1"'], stop_after=10000, added_branches=['peak.type'])
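The result is a numpy record array (see also inspect_peaks_array below), which converts directly to a pandas DataFrame for further selection. A minimal sketch with fabricated data standing in for the extractor output (the field names simply mirror the peak_fields in the example above):

```python
import numpy as np
import pandas as pd

# Fabricated stand-in for the record array get_data would return
d = np.zeros(3, dtype=[('area', 'f8'), ('hit_time_std', 'f8')])
d['area'] = [120.0, 250.0, 90.0]

peaks = pd.DataFrame(d)                 # record arrays convert directly
big_peaks = peaks[peaks['area'] > 100]  # further cuts in pandas
```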
-
-
hax.data_extractor.
build_cut_string
(cut_list, obj)[source]¶ Build a string of cuts that can be applied using eval() function.
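The cut strings above are ordinary Python expressions. A helper like this plausibly just parenthesizes and joins them; this is a sketch, not the hax implementation (how hax ties field references to obj is an internal detail):

```python
def build_cut_string_sketch(cut_list, obj):
    # Combine individual cut expressions into one eval()-able string;
    # 'obj' names the variable (e.g. 'peak') the cuts refer to.
    if not cut_list:
        return 'True'
    return ' and '.join('(%s)' % cut for cut in cut_list)

cut_string = build_cut_string_sketch(['area > 100', 'type == "s1"'], 'peak')
```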
-
hax.data_extractor.
make_branch_selection
(level, event_fields, peak_fields, added_branches)[source]¶ Make the list of branches that have to be selected.
hax.minitrees module¶
Make small flat root trees with one entry per event from the pax root files.
-
class
hax.minitrees.
MultipleRowExtractor
[source]¶ Bases:
hax.minitrees.TreeMaker
Base class for treemakers that return a list of dictionaries in extract_data. These treemakers can produce anywhere from zero to many rows for a single event.
If you’re seeing this as the documentation of an actual TreeMaker, somebody forgot to add documentation for their treemaker.
-
class
hax.minitrees.
TreeMaker
[source]¶ Bases:
object
Treemaker base class.
If you’re seeing this as the documentation of an actual TreeMaker, somebody forgot to add documentation for their treemaker.
A treemaker loops its extract_data function over events. This function returns a dictionary. Since dictionaries take a lot of memory, we periodically convert them into pandas dataframes (the interval at which this occurs is controlled by the cache_size attribute). At the end of data extraction, the various dataframes are concatenated.
You must instantiate a new treemaker for every extraction.
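The caching scheme described above can be sketched in a few lines. This toy class is not hax.minitrees.TreeMaker, just an illustration of the buffer-flush-concatenate strategy; the subclass and its event fields are hypothetical:

```python
import pandas as pd

class SketchTreeMaker:
    # Toy illustration of the caching strategy described above:
    # dicts are buffered, flushed to a DataFrame every cache_size
    # events, and the partial frames are concatenated at the end.
    cache_size = 5000

    def extract_data(self, event):
        raise NotImplementedError

    def get_data(self, events):
        cache, frames = [], []
        for event in events:
            cache.append(self.extract_data(event))
            if len(cache) >= self.cache_size:
                frames.append(pd.DataFrame(cache))
                cache = []
        if cache:
            frames.append(pd.DataFrame(cache))
        return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()

class AreaTreeMaker(SketchTreeMaker):
    # Hypothetical treemaker returning one row per event
    cache_size = 2

    def extract_data(self, event):
        return dict(event_number=event['n'], area=event['area'])
```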
-
branch_selection
= None¶
-
cache_size
= 5000¶
-
extra_branches
= ()¶
-
extra_metadata
= {}¶
-
mc_data
= False¶
-
never_store
= False¶
-
pax_version_independent
= False¶
-
uses_arrays
= False¶
-
-
hax.minitrees.
check
(run_id, treemaker, force_reload=False)[source]¶ Return whether the minitree exists and where it is found / where to make it.
Parameters: - treemaker – treemaker name or class
- run_id – run name or number
- force_reload – ignore available minitrees, just tell me where to write the new one.
Returns: (treemaker_class, already_made, path). - treemaker_class: class of the treemaker you named. - already_made: True if there is an up-to-date minitree we can load, False otherwise (always False if force_reload). - path: path to the minitree to load if it is available, otherwise the path where we should create the minitree.
-
hax.minitrees.
extend
(data, treemakers)[source]¶ Extends the dataframe data by loading treemakers for the remaining events. See https://github.com/XENON1T/hax/pull/52 for more information.
Parameters: - data – dataframe, assumed to be event-per-row
- treemakers – list of treemakers to load
-
hax.minitrees.
force_df_types
(df_content, df_types)[source]¶ Return dataframe with same columns and dtypes as df_types, with content from df_content.
- Extra columns are dropped.
- Missing columns are set to NaN (for floats) or INT_NAN (for integers). Columns that are neither int nor float are set to their type's zero value (e.g. '' for strings).
- Columns with different types are converted using numpy’s astype. When converting floats to ints, all nonfinite values are replaced with INT_NAN
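The three rules above can be sketched as follows; this is an illustration, not the hax implementation, and the INT_NAN sentinel value here is an assumption (the real one is defined in hax):

```python
import numpy as np
import pandas as pd

INT_NAN = -99999  # hypothetical sentinel; the real value lives in hax

def force_df_types_sketch(df_content, df_types):
    # Keep exactly the columns/dtypes of df_types, filled from df_content
    out = {}
    for col, dtype in df_types.dtypes.items():
        if col not in df_content.columns:
            # Missing column: NaN for floats, INT_NAN for ints,
            # the type's zero value otherwise
            if np.issubdtype(dtype, np.floating):
                out[col] = np.full(len(df_content), np.nan)
            elif np.issubdtype(dtype, np.integer):
                out[col] = np.full(len(df_content), INT_NAN, dtype=dtype)
            else:
                out[col] = np.zeros(len(df_content), dtype=dtype)
            continue
        values = df_content[col]
        if np.issubdtype(dtype, np.integer) and np.issubdtype(values.dtype, np.floating):
            # Nonfinite floats cannot become ints: replace them first
            values = values.where(np.isfinite(values), INT_NAN)
        out[col] = values.astype(dtype)
    # Extra columns of df_content are implicitly dropped
    return pd.DataFrame(out, columns=list(df_types.columns))
```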
-
hax.minitrees.
function_over_events
(function, dataframe, branch_selection=None, **kwargs)[source]¶ Generator which yields function(event, **kwargs) of each processed data event in dataframe
-
hax.minitrees.
get_treemaker_name_and_class
(tm)[source]¶ Return (name, class) of treemaker name or class tm
-
hax.minitrees.
load
(datasets=None, treemakers='all', preselection=None, force_reload=False, delayed=False, num_workers=1, compute_options=None, cache_file=None, remake_cache=False, event_list=None)[source]¶ Return pandas DataFrame with minitrees of several datasets and treemakers.
Parameters: - datasets – names or numbers of datasets (without .root) to load
- treemakers – treemaker class (or string with name of class) or list of these to load. If value is set to ‘all’ then the standard science run minitrees are loaded.
- preselection – string or list of strings parseable by pd.eval. Should return bool array, to be used for pre-selecting events to load for each dataset.
- force_reload – if True, will force mini-trees to be re-made whether they are outdated or not.
- delayed – Instead of computing a pandas DataFrame, return a dask DataFrame (default False)
- num_workers – Number of dask workers to use in computation (if delayed=False)
- compute_options – Dictionary of extra options passed to dask.compute
- cache_file – Save/load the result to an hdf5 file with filename specified by cache_file. Useful if you load in a large volume of data with many preselections.
- remake_cache – If True, and cache file given, reload (don’t remake) minitrees and overwrite the cache file.
- event_list – List of events to process (warning: only makes sense for single dataset)
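Since a preselection string is parseable by pd.eval, it is just an expression pandas evaluates per event. The following toy example (hypothetical column names, no hax required) shows what such a string does:

```python
import pandas as pd

# Toy stand-in for a loaded minitree dataframe
data = pd.DataFrame({'cs1': [5.0, 50.0, 500.0],
                     's1_area_fraction_top': [0.1, 0.5, 0.9]})

# A preselection string selects the events for which it returns True
mask = data.eval('cs1 > 10 and s1_area_fraction_top < 0.8')
selected = data[mask]
```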
-
hax.minitrees.
load_cache_file
(cache_file)[source]¶ Load minitree dataframe + cut history from a cache file
-
hax.minitrees.
load_single_dataset
(run_id, treemakers, preselection=None, force_reload=False, event_list=None, bypass_blinding=False)[source]¶ Run multiple treemakers on a single run
Returns: (pandas DataFrame, list of dicts describing cut histories)
Parameters: - run_id – name or number of the run to load
- treemakers – list of treemaker classes / names to load
- preselection – String or list of strings passed to pandas.eval. Should return bool array, to be used for pre-selecting events to load for each dataset. If string does not contain spaces, should be lax lichen name. If string contains a colon and no spaces, should be lichen_file:lichen_name
- force_reload – always remake the minitrees, never load any from disk.
- event_list – List of event numbers to visit. Disables load from / save to file.
- bypass_blinding – Flag to disable the blinding cut. WARNING: analysts should not use this, only for production! See #211
-
hax.minitrees.
load_single_minitree
(run_id, treemaker, force_reload=False, return_metadata=False, save_file=None, event_list=None)[source]¶ Return pandas DataFrame resulting from running treemaker on run_id (name or number)
Parameters: - run_id – name or number of the run to load
- treemaker – TreeMaker class or class name (but not TreeMaker instance!) to run
- force_reload – always remake the minitree, never load it from disk.
- return_metadata – instead return (metadata_dict, dataframe)
- save_file – save the results to a minitree file on disk.
- event_list – List of event numbers to visit. Forces save_file=False, force_reload=True.
Returns: pandas.DataFrame
hax.misc module¶
-
hax.misc.
dataframe_to_wiki
(df, float_digits=5, title='Awesome table')[source]¶ Convert a pandas dataframe to a dokuwiki table (which you can copy-paste onto the XENON wiki)
Parameters: - df – dataframe to convert
- float_digits – Round floating-point values to this number of digits.
- title – title of the table.
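A minimal sketch of such a conversion (not the hax implementation; the exact dokuwiki formatting hax produces, e.g. title handling, may differ):

```python
import pandas as pd

def dataframe_to_wiki_sketch(df, float_digits=5):
    # Dokuwiki tables mark header cells with ^ and body cells with |
    def fmt(v):
        return str(round(v, float_digits)) if isinstance(v, float) else str(v)
    lines = ['^ ' + ' ^ '.join(str(c) for c in df.columns) + ' ^']
    for row in df.itertuples(index=False):   # itertuples keeps dtypes
        lines.append('| ' + ' | '.join(fmt(v) for v in row) + ' |')
    return '\n'.join(lines)
```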
hax.paxroot module¶
Utility functions for loading and looping over a pax root file
-
hax.paxroot.
function_results_datasets
(datasets_names, event_function=<function <lambda>>, event_lists=None, branch_selection=None, kwargs=None, desc='')[source]¶ Returns a generator which yields the return values of event_function(event) over the datasets specified in datasets_names.
Parameters: - datasets_names – list of dataset names or numbers, or a string/int of a single dataset name/number
- event_function – function to run over each event
- event_lists – a list of event numbers (if you’re loading in a single dataset) to visit, or a list of lists of event numbers for each of the datasets passed in datasets_names.
- branch_selection – can be: None (all branches are read), 'basic' (hax.config['basic_branches'] are read), or a list of branches to read.
- kwargs – dictionary of extra arguments to pass to event_function. For example: kwargs={‘x’: 2, ‘y’: 3} –> function called like: event_function(event, x=2, y=3)
- desc – Description used in the tqdm progressbar
-
hax.paxroot.
get_metadata
(run_id)[source]¶ Returns the metadata dictionary stored in the pax root file for run_id.
-
hax.paxroot.
loop_over_dataset
(*args, **kwargs)¶ Execute a function over all events in the dataset(s). Does not return anything: use function_results_datasets, or pass a class method as event_function, if you want results. See function_results_datasets for possible options.
hax.pmt_plot module¶
-
hax.pmt_plot.
plot_on_pmt_arrays
(color=None, size=None, geometry='physical', title=None, scatter_kwargs=None, colorbar_kwargs=None)[source]¶ Plot a scatter plot of PMTs in a specified geometry, with a specified color and size of the markers. color and size must be per-PMT arrays indexable by another array, i.e. np.array, not list. scatter_kwargs will be passed to plt.scatter; colorbar_kwargs will be passed to plt.colorbar. geometry can be 'physical', a key from pmt_data, or a 2-tuple of keys from pmt_data.
hax.raw_data module¶
Functions for working with raw data.
-
class
hax.raw_data.
HTTPSClientAuthHandler
(key, cert)[source]¶ Bases:
urllib.request.HTTPSHandler
Used for accessing GRID data and handling authentication
-
hax.raw_data.
cleanup_temporary_data_files
()[source]¶ Removes all temporarily downloaded raw data files. This is run automatically for you when your program quits.
-
hax.raw_data.
download_from_grid
(file_path_tail)[source]¶ Downloads file_path_tail from grid, returns filename of temporary file
-
hax.raw_data.
inspect_events
(run_id, event_numbers, focus='all', save_to_dir=None, config_override=None)[source]¶ Show the pax event display for the events in run_id.
focus can be 'all' (default), which shows the entire event, or 'largest', 'first', 'main_s1', or 'main_s2'.
-
hax.raw_data.
inspect_events_from_minitree
(events, *args, **kwargs)[source]¶ Show the pax event display for events, where events is a (slice of) a dataframe loaded from a minitree Any additional arguments will be passed to inspect_events, see its docstring for details
-
hax.raw_data.
inspect_peaks
(run_id, event_numbers, peak_boundaries, save_to_dir=None, config_override=None)[source]¶ Inspect the peaks starting at peak_boundaries (in samples… sorry) in event_numbers. event_numbers and peak_boundaries must be lists/arrays of integers of the same length.
-
hax.raw_data.
inspect_peaks_array
(run_id, peak_array, save_to_dir=None, config_override=None)[source]¶ Inspect peaks from a record array returned by hax.DataExtractor
-
hax.raw_data.
process_events
(run_id, event_numbers=None, config_override=None)[source]¶ Yields processed event(s) numbered event_numbers from dataset run_id (name or number). config_override is a dictionary with extra pax options.
hax.recorrect module¶
Functions to redo late-stage pax corrections with new maps on existing minitree dataframes
These functions will be slow, since the pax interpolating map was never designed to be quick (vectorized); in normal processing, other plugins dominate the run time of pax.
-
hax.recorrect.
add_uncorrected_position
(data)[source]¶ Adds r, theta, u_r, u_x, u_y, u_z to data. If u_x already exists, does nothing. Returns no value. Modifies data in place.
-
hax.recorrect.
recorrect_rz
(data, new_map_file=None)[source]¶ Recompute the (r,z) field distortion correction. Be sure to redo the S1(x,y,z) correction after this as well, whether or not the S1(x,y,z) map changed!
Parameters: - data – input dataframe
- new_map_file – file with the (r,z) correction map to use. Defaults to the map currently in the pax config.
Returns: dataframe with altered values in x, y, z (and a few added columns for the uncorrected position)
-
hax.recorrect.
recorrect_s1xyz
(data, new_map_file=<default from pax configuration>)[source]¶ Recompute the S1(x,y,z) light yield correction. If you want to redo the (r,z) correction, do it before doing this!
Parameters: - data – Dataframe. Only Basics minitree required.
- new_map_file – Filename of the map you want to use for the correction. Defaults to the map in the current pax configuration.
Returns: Dataframe with changed values in cs1 column
-
hax.recorrect.
recorrect_s2xy
(data, old_map_file='s2_xy_XENON1T_17Feb2017.json', new_map_file=<default from pax configuration>)[source]¶ Recompute the (x,y) correction for a different map.
Parameters: - data – dataframe (Basics and Extended minitrees required)
- old_map_file – Map filename that was used to process the dataframe. Defaults to the map used for pax 6.4.2.
- new_map_file – Map filename that you want to use for the correction. Defaults to the pax config default.
Returns: dataframe with altered values in cs2 (and a few added columns for the uncorrected position)
TODO: This could be rewritten to use the extended minitrees, so the old map no longer needs to be specified.
hax.runs module¶
Runs database utilities
-
hax.runs.
count_tags
(ds)[source]¶ Return how often each tag occurs in the datasets DataFrame ds
-
hax.runs.
get_dataset_info
(run_id, projection_query=None)¶ Returns a dictionary with the runs database info for a given run_id. For XENON1T, this queries the runs db to get the complete run doc.
Parameters: - run_id – name or number, or list of such, of runs to query. If giving a list, it must be sorted!
- projection_query – can be: None (default), in which case the entire run doc is returned; a string, a runs db field name (with dots indicating subfields), in which case only that field is queried and returned; or anything else, which is passed as a projection to pymongo.collection.find.
For example ‘processor.DEFAULT.electron_lifetime_liquid’ returns the electron lifetime.
-
hax.runs.
get_run_info
(run_id, projection_query=None)[source]¶ Returns a dictionary with the runs database info for a given run_id. For XENON1T, this queries the runs db to get the complete run doc.
Parameters: - run_id – name or number, or list of such, of runs to query. If giving a list, it must be sorted!
- projection_query – can be: None (default), in which case the entire run doc is returned; a string, a runs db field name (with dots indicating subfields), in which case only that field is queried and returned; or anything else, which is passed as a projection to pymongo.collection.find.
For example ‘processor.DEFAULT.electron_lifetime_liquid’ returns the electron lifetime.
-
hax.runs.
get_run_name
(run_id)[source]¶ Return run name matching run_id. Returns run_id if run_id is string (presumably already run name)
-
hax.runs.
get_run_number
(run_id)[source]¶ Return run number matching run_id. Returns run_id if run_id is an int (presumably already a run number)
-
hax.runs.
get_rundb_collection
()[source]¶ Return the pymongo handle to the runs db collection. You can use this to do queries like .find etc.
-
hax.runs.
get_rundb_database
()[source]¶ Return the pymongo handle to the runs db database. You can use this to access other collections.
-
hax.runs.
load_corrections
()[source]¶ Load all corrections that are stored on MongoDB as defined by the corrections field in the hax config. Corrections must be named the same as their collection name in the database ‘run’.
-
hax.runs.
tags_selection
(dsets=None, include=None, exclude=None, pattern_type='fnmatch', ignore_underscore=True)[source]¶ Return runs by tag selection criteria.
Parameters: - dsets – pandas DataFrame, subset of datasets from hax.runs.datasets. If not provided, uses hax.runs.datasets itself (all datasets).
- include – String or list of strings of patterns of tags to include
- exclude – String or list of strings of patterns of tags to exclude. Exclusion criteria have higher priority than inclusion criteria.
- pattern_type – Type of pattern matching to use. Defaults to ‘fnmatch’, which means you can use unix shell-style wildcards (?, *). Alternative is ‘re’, which means you can use full python regular expressions.
- ignore_underscore – Ignore the underscore at the start of some tags (indicating some degree of officialness or automation) when matching.
- Examples:
- tags_selection(include='blinded') selects all datasets with a blinded or _blinded tag.
- tags_selection(include='*blinded') … with blinded, _blinded, unblinded, blablinded, etc.
- tags_selection(include=['blinded', 'unblinded']) … with blinded OR unblinded, but not blablinded.
- tags_selection(include='blinded', exclude=['bad', 'messy']) selects blinded datasets that aren't bad or messy.
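The matching rule for a single dataset can be sketched with fnmatch; this helper is hypothetical (the real selection logic lives inside hax.runs.tags_selection):

```python
import fnmatch

def tag_matches_sketch(tags, include, ignore_underscore=True):
    # Return True if any of the dataset's tags matches any include pattern.
    # Leading underscores (semi-official/automated tags) are optionally ignored.
    patterns = [include] if isinstance(include, str) else list(include)
    for tag in tags:
        name = tag.lstrip('_') if ignore_underscore else tag
        if any(fnmatch.fnmatch(name, p) for p in patterns):
            return True
    return False
```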
-
hax.runs.
update_datasets
(query=None)[source]¶ Update hax.runs.datasets to contain the latest datasets. Currently just loads the XENON100 run 10 runs from a csv file.
Parameters: - query – custom query, in case you only want to update partially
hax.slow_control module¶
-
hax.slow_control.
get
(names, run=None, start=None, end=None, url=None)¶ Retrieve the data from the historian database (synonym of hax.slow_control.get_sc_data)
Parameters: - names – name or list of names of slow control variables; see get_historian_name.
- run – run number/name to return data for. If passed, start/end is ignored.
- start – String indicating start of time range, in arbitrary format (thanks to parsedatetime)
- end – String indicating end of time range, in arbitrary format
Returns: pandas Series of the values, with index the time in UTC. If you requested multiple names, a pandas DataFrame is returned.
-
hax.slow_control.
get_pmt_data_last_measured
(run)[source]¶ Retrieve PMT information for a run from the historian database
Parameters: run – run number/name to return data for. Returns: pandas DataFrame of the values, with index the time in UTC.
-
hax.slow_control.
get_sc_data
(names, run=None, start=None, end=None, url=None)[source]¶ Retrieve the data from the historian database (hax.slow_control.get is just a synonym of this function)
Parameters: - names – name or list of names of slow control variables; see get_historian_name.
- run – run number/name to return data for. If passed, start/end is ignored.
- start – String indicating start of time range, in arbitrary format (thanks to parsedatetime)
- end – String indicating end of time range, in arbitrary format
Returns: pandas Series of the values, with index the time in UTC. If you requested multiple names, a pandas DataFrame is returned.
hax.trigger_data module¶
-
hax.trigger_data.
get_aqm_pulses
(run_id)[source]¶ Return a dictionary of acquisition monitor pulse times in the run run_id. Keys are channel labels (e.g. muon_veto_trigger). Under the keys 'busy' and 'hev', you'll get the sorted combination of all busy/hev _on and _off signals.
-
hax.trigger_data.
get_trigger_data
(run_id, select_data_types='all', format_version=2)[source]¶ Return a dictionary with the trigger data from run_id. select_data_types can be 'all', a trigger data type name, or a list of trigger data type names. If you want to find out which data types exist, use 'all' and look at the keys of the dictionary.
hax.utils module¶
Utilities for use INSIDE hax (and perhaps random weird use outside hax). If you have a nice function that doesn't fit anywhere, misc.py is where you want to go.
-
hax.utils.
combine_pax_configs
(config, overrides)[source]¶ Combines configuration dictionaries config and overrides. overrides has higher priority. Each config must be a dictionary containing only string->dict pairs (like the pax/ConfigParser configs)
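A sketch of such a two-level merge (an illustration of the behaviour described above, not the hax code):

```python
def combine_pax_configs_sketch(config, overrides):
    # Copy each section of config, then let overrides win per section/key
    result = {section: dict(values) for section, values in config.items()}
    for section, values in overrides.items():
        result.setdefault(section, {}).update(values)
    return result
```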
-
hax.utils.
find_file_in_folders
(filename, folders)[source]¶ Searches for filename in folders, then returns the full path or raises FileNotFoundError. Does not recurse into subdirectories.
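The described search is a simple ordered scan; a sketch under that assumption:

```python
import os

def find_file_in_folders_sketch(filename, folders):
    # Try each folder in order (no recursion), return the first hit
    for folder in folders:
        path = os.path.join(folder, filename)
        if os.path.exists(path):
            return path
    raise FileNotFoundError("Did not find %s in %s" % (filename, folders))
```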
-
hax.utils.
flatten_dict
(d, separator=':', _parent_key='')[source]¶ Flatten nested dictionaries into a single dictionary, indicating levels by separator Don’t set _parent_key argument, this is used for recursive calls. Stolen from http://stackoverflow.com/questions/6027558
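Following the description (and the linked Stack Overflow pattern), the flattening can be sketched as:

```python
def flatten_dict_sketch(d, separator=':', _parent_key=''):
    # Recursively join nested keys with the separator;
    # _parent_key carries the prefix through recursive calls
    items = {}
    for key, value in d.items():
        new_key = _parent_key + separator + key if _parent_key else key
        if isinstance(value, dict):
            items.update(flatten_dict_sketch(value, separator, new_key))
        else:
            items[new_key] = value
    return items
```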
-
hax.utils.
get_user_id
()[source]¶ Returns: string identifying the currently active system user as name@node. Note: the user can be set with the 'USER' environment variable (usually set on Windows). Note: on Unix-based systems, the password database is used to get the login name of the effective process user.
-
hax.utils.
get_xenon100_dataset_number
(dsetname)[source]¶ Converts a XENON100 dataset name to a number
-
hax.utils.
human_to_utc_datetime
(x)[source]¶ Return a python UTC-localized datetime object corresponding to the human-readable date/time x.
Parameters: - x – string with a human-readable date/time indication (e.g. "now"). If you specify something absolute, it will be taken as UTC.
-
hax.utils.
load_pickles
(filename, load_first=None)[source]¶ Returns list of pickles stored in filename.
Parameters: - load_first – number of pickles to read. Otherwise reads until the file is exhausted.
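Reading a file containing several concatenated pickles just means calling pickle.load until EOF; a sketch of the described behaviour:

```python
import pickle

def load_pickles_sketch(filename, load_first=None):
    # Keep unpickling objects until EOF, or until load_first objects read
    result = []
    with open(filename, 'rb') as f:
        while load_first is None or len(result) < load_first:
            try:
                result.append(pickle.load(f))
            except EOFError:
                break
    return result
```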