shnitsel.clean.common#

Attributes#

Functions#

true_upto(mask, dim)

Helper function to assess whether a mask has only true entries up until a certain point

_filter_mask_from_criterion_mask(mask)

Generate cutoff array from the mask, specifying for each criterion, up to which point

_filter_mask_from_filtranda(filtranda)

Calculates first a filter mask and then the cutoffs from that mask using

_filter_mask_from_dataset(ds)

Returns a DataArray containing cutoff times (the same as the good_upto data_var)

omit(frames_or_trajectory)

If all filter criteria are fulfilled throughout, keep the trajectory.

_log_omit(before, after)

truncate(frames_or_trajectory)

Perform a truncation on the trajectory or frameset, i.e. cut off the trajectory

transect(trajectory, cutoff_time)

Perform a transect, i.e. cut off the trajectory at time cutoff_time if it is valid until then

dispatch_filter(frames_or_trajectory[, filter_method])

Filter trajectories according to energy to exclude unphysical (insane) behaviour

cum_max_quantiles(filtranda_array[, quantiles, ...])

Quantiles of cumulative maxima

Module Contents#

true_upto(mask, dim)#

Helper function to assess whether a mask has only true entries up until a certain point along dimension dim. Used to check if criterion validity is maintained along the time dimension.

Returns an array holding, for each entry, the value of the dim coordinate up to which all values are true, or -np.inf if no frame is valid.

Parameters:
  • mask (xr.DataArray) – The mask holding boolean flags whether criteria are fulfilled.

  • dim (str) – The dimension along which to check continuous validity of criteria.

Returns:

The point in time up to which the criterion is fulfilled.

Return type:

xr.DataArray
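The prefix check described above can be sketched in plain NumPy. This is only an illustration, not the library's implementation (which works on xr.DataArray objects along a named dimension). The name true_upto_sketch and the example arrays are hypothetical, and the sketch assumes that a mask failing on its very first frame counts as "no frame is valid".

```python
import numpy as np

def true_upto_sketch(mask, times):
    """Return the last time value up to which `mask` is continuously True.

    Hypothetical 1-D stand-in for true_upto; returns -np.inf when the
    first frame is already invalid (an assumption of this sketch).
    """
    mask = np.asarray(mask, dtype=bool)
    times = np.asarray(times, dtype=float)
    # The cumulative product stays 1 only while every entry so far is True.
    still_valid = np.cumprod(mask).astype(bool)
    if not still_valid.any():
        return -np.inf
    return times[still_valid].max()

times = np.array([0.0, 0.5, 1.0, 1.5])
print(true_upto_sketch([True, True, False, True], times))  # 0.5
print(true_upto_sketch([False, True, True, True], times))  # -inf
```

Note that the late True entry in the first example does not extend the valid range: only the unbroken run of True values from the start counts.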

_true_upto#
_filter_mask_from_criterion_mask(mask)#

Generate cutoff array from the mask, specifying for each criterion, up to which point the criterion is fulfilled.

The result holds either a boolean filter_mask or a time-valued variable good_upto, depending on whether a time dimension/coordinate is present.

Parameters:

mask (xr.DataArray) – The xarray holding the boolean flags whether a frame contains valid data for various criteria

Returns:

A DataArray named filter_mask holding boolean flags that indicate whether a frame should be kept according to the respective criterion. If a time dimension is present, it also has a good_upto coordinate, which maps each criterion to the time value at which the criterion is last fulfilled. If the time dimension is missing, it only has the boolean flags. It also has a coordinate good_throughout, which indicates whether the entire trajectory/frameset satisfies the criterion.

Return type:

xr.DataArray
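To make the relationship between the mask, good_upto and good_throughout concrete, here is a small NumPy sketch for a mask with a time axis and a criterion axis. It is not the library's code: the array names are illustrative, and it assumes that good_upto refers to the end of the unbroken valid run from the start, in line with true_upto.

```python
import numpy as np

times = np.array([0.0, 0.5, 1.0])
# Rows are frames (time), columns are criteria.
mask = np.array([
    [True, True],
    [True, False],
    [True, True],
])

# True while every earlier frame (and this one) satisfies the criterion.
prefix_ok = np.cumprod(mask, axis=0).astype(bool)

# Per criterion: last time of the unbroken valid run, -inf if there is none.
good_upto = np.where(prefix_ok, times[:, None], -np.inf).max(axis=0)

# Per criterion: is the whole trajectory valid?
good_throughout = mask.all(axis=0)

print(good_upto)        # [1. 0.]
print(good_throughout)  # [ True False]
```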

_filter_mask_from_filtranda(filtranda)#

Calculates first a filter mask and then the cutoffs from that mask using _cutoffs_from_mask

Parameters:

filtranda (xarray.DataArray)

Return type:

xarray.DataArray

_filter_mask_from_dataset(ds)#

Returns a DataArray containing cutoff times (the same as the good_upto data_var) and with a coord called good_throughout

The returned object has dimension {‘criterion’}.

Parameters:

ds (xr.Dataset) –

A Dataset containing either:
  • a ‘good_upto’ data_var and a ‘good_throughout’ coordinate

  • a ‘filtranda’ data_var with a ‘threshold’ coordinate

Returns:

A DataArray containing cutoff times (the same as the good_upto data_var) and with a coord called good_throughout. The returned object has dimension {‘criterion’}.

Raises:

ValueError – If there is no filtration information in the Dataset

Return type:

xarray.DataArray

TrajectoryOrFrames#
omit(frames_or_trajectory)#

If all filter criteria are fulfilled throughout, keep the trajectory. Otherwise return None to omit it.

Parameters:

frames_or_trajectory (Frames | Trajectory) – Either the Frameset or the trajectory to filter

Returns:

The Frameset or Trajectory if all filter conditions are fulfilled or None if it should be omitted.

Return type:

Frames | Trajectory | None
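The keep-or-drop decision can be sketched in a few lines of plain Python. The function omit_sketch and the dictionary of per-criterion flags are hypothetical stand-ins; the real function reads this information from the Frames or Trajectory object itself.

```python
def omit_sketch(trajectory, good_throughout):
    """Return `trajectory` if every criterion holds throughout, else None.

    `good_throughout` is an iterable of per-criterion booleans
    (a hypothetical stand-in for the good_throughout coordinate).
    """
    return trajectory if all(good_throughout) else None

# Hypothetical ensemble: trajectory name -> per-criterion flags.
ensemble = {
    "traj_a": [True, True],
    "traj_b": [True, False],
}

kept = [
    name
    for name, flags in ensemble.items()
    if omit_sketch(name, flags) is not None
]
print(kept)  # ['traj_a']
```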

_omit#
_log_omit(before, after)#
truncate(frames_or_trajectory)#

Perform a truncation on the trajectory or frameset, i.e. cut off the trajectory after the last frame that fulfils all filtration conditions.

Parameters:

frames_or_trajectory (TrajectoryOrFrames | xr.Dataset) – The dataset to truncate

Returns:

The truncated dataset.

Return type:

TrajectoryOrFrames | Trajectory | Frames
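As a rough illustration of truncation, the following NumPy sketch keeps only the leading frames for which all criteria hold. It is an assumption-laden stand-in, not the library's implementation: truncate_sketch, the energies array and the mask layout (frames by criteria) are all hypothetical.

```python
import numpy as np

def truncate_sketch(values, criterion_mask):
    """Keep the leading frames for which every criterion is fulfilled.

    `criterion_mask` has shape (n_frames, n_criteria); `values` has
    n_frames entries along its first axis.
    """
    frame_ok = np.asarray(criterion_mask, dtype=bool).all(axis=1)
    # Length of the unbroken run of valid frames at the start.
    n_keep = int(np.cumprod(frame_ok).sum())
    return np.asarray(values)[:n_keep]

energies = np.array([1.0, 1.1, 9.0, 1.2])
mask = np.array([
    [True, True],
    [True, True],
    [True, False],  # second criterion violated here
    [True, True],
])
print(truncate_sketch(energies, mask))  # [1.  1.1]
```

Frames after the first violation are dropped even if they satisfy all criteria again.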

_truncate#
transect(trajectory, cutoff_time)#

Perform a transect, i.e. cut off the trajectory at time cutoff_time if it is valid until then, or omit it if it is not valid for long enough.

Trajectory must be a trajectory with time dimension.

Parameters:
  • trajectory (Trajectory | xr.Dataset) – The trajectory to transect

  • cutoff_time (float) – Time at which the trajectory should be cut off or discarded entirely if conditions are not satisfied until this time.

Returns:

Either the filtered trajectory with all frames being valid up until cutoff_time or None if the trajectory is not valid for long enough.

Return type:

Trajectory | None
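The cut-or-discard logic can be sketched with NumPy as follows. This is a simplified illustration under stated assumptions: transect_sketch is a hypothetical name, good_upto is passed in as a single number, and the frame at exactly cutoff_time is assumed to be kept (the source does not state whether the boundary is inclusive).

```python
import numpy as np

def transect_sketch(times, values, good_upto, cutoff_time):
    """Cut at `cutoff_time`, or return None if validity ends earlier.

    `good_upto` is the last time up to which all criteria hold
    (a hypothetical scalar stand-in for the library's good_upto data).
    """
    times = np.asarray(times, dtype=float)
    values = np.asarray(values)
    if good_upto < cutoff_time:
        return None  # not valid for long enough: discard entirely
    keep = times <= cutoff_time  # assumption: inclusive boundary
    return times[keep], values[keep]

times = np.array([0.0, 0.5, 1.0, 1.5])
energies = np.array([1.0, 1.1, 1.2, 9.0])

kept = transect_sketch(times, energies, good_upto=1.0, cutoff_time=1.0)
dropped = transect_sketch(times, energies, good_upto=0.5, cutoff_time=1.0)
print(kept[0])   # [0.  0.5 1. ]
print(dropped)   # None
```

Unlike truncation, every surviving trajectory ends at the same time, which is convenient when ensemble statistics need equal-length trajectories.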

_transect#
dispatch_filter(frames_or_trajectory, filter_method='truncate')#

Filter trajectories according to energy to exclude unphysical (insane) behaviour

Parameters:
  • frames_or_trajectory (TrajectoryOrFrames) – A Frames or Trajectory object with a filtranda variable set and a thresholds coordinate both along a criterion dimension.

  • filter_method (Literal['truncate', 'omit', 'annotate'] | float) –

    Specifies the manner in which to remove data;

    • if ‘omit’, drop trajectories unless all frames meet criteria (shnitsel.clean.omit())

    • if ‘truncate’, cut each trajectory off just before the first frame that doesn’t meet criteria

      (shnitsel.clean.truncate())

    • if ‘annotate’, merely annotate the data;

    • if a float number, interpret this number as a time, and cut all trajectories off at this time,

      discarding those which violate criteria before reaching the given limit, (shnitsel.clean.transect())

    see shnitsel.clean.dispatch_filter().

Return type:

The modified dataset with either data violating the

Raises:

ValueError – If an unsupported value for the filter_method parameter was provided.
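The branching on filter_method described above can be sketched in plain Python. This is not the library's code: dispatch_sketch is a hypothetical name, and instead of calling the real omit, truncate and transect functions it only returns a label for the branch that would be taken.

```python
from numbers import Real

def dispatch_sketch(data, filter_method="truncate"):
    """Select a filtering strategy from `filter_method` (illustrative only)."""
    if filter_method == "omit":
        return ("omit", data)
    if filter_method == "truncate":
        return ("truncate", data)
    if filter_method == "annotate":
        return ("annotate", data)
    # A number is interpreted as a cutoff time for a transect.
    # bool is excluded explicitly because it is a subclass of int.
    if isinstance(filter_method, Real) and not isinstance(filter_method, bool):
        return ("transect", data, float(filter_method))
    raise ValueError(f"Unsupported filter_method: {filter_method!r}")

print(dispatch_sketch("traj"))             # ('truncate', 'traj')
print(dispatch_sketch("traj", "omit"))     # ('omit', 'traj')
print(dispatch_sketch("traj", 100.0))      # ('transect', 'traj', 100.0)
```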

cum_max_quantiles(filtranda_array, quantiles=None, cum_dim='time', group_dim='trajectory')#

Quantiles of cumulative maxima

Parameters:
  • filtranda_array (xr.DataArray) – A DataArray, or a Dataset with a data_var ‘filtranda’; either way, the Variable should have dimensions and coordinates corresponding to a (stacked or unstacked) ensemble of trajectories.

  • quantiles (Sequence[float], optional) – Which quantiles to calculate, by default [0.25, 0.5, 0.75, 0.9, 0.95, 0.99, 1]

  • cum_dim (DimName, optional) – The dimension along which to accumulate the maxima, by default time

  • group_dim (DimName, optional) – The key/dimension along which to calculate the quantiles of the maxima, by default trajectory.

Returns:

A DataArray with ‘quantile’ and ‘cum_dim’ dimensions; ‘group_dim’ dimension will have been removed to calculate quantiles; other dimensions remain unaffected.

Return type:

xr.DataArray
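The two steps, cumulative maximum along cum_dim followed by quantiles across group_dim, can be illustrated with a NumPy sketch. The array below is made-up example data with trajectories as rows and time steps as columns; the real function operates on labelled xarray objects instead of positional axes.

```python
import numpy as np

# Hypothetical filtranda values: rows = trajectories, columns = time steps.
filtranda = np.array([
    [0.1, 0.3, 0.2],
    [0.2, 0.1, 0.5],
    [0.0, 0.4, 0.1],
])

# Step 1: running maximum along the time axis (the role of cum_dim).
cum_max = np.maximum.accumulate(filtranda, axis=1)

# Step 2: quantiles across trajectories (the role of group_dim).
quantiles = [0.5, 1.0]
result = np.quantile(cum_max, quantiles, axis=0)

print(result.shape)  # (2, 3): one row per quantile, one column per time step
print(result)
```

Because the cumulative maximum never decreases in time, each quantile curve is also non-decreasing, which makes it easier to read off when a given fraction of trajectories has first exceeded a threshold.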