xyt package

Submodules

xyt.fake_data_generator module

class xyt.fake_data_generator.FakeDataGenerator(projection_epsg=4326, location_name=None, home_radius_km=20, num_users=15)

Bases: object

A class for generating fake location-based data for testing and demonstration purposes.

Attributes:
  • projection_epsg (int): The EPSG code for the desired projection (default is 4326).

  • location_name (str): The name of a location for which to generate data. If provided, the generated data will be constrained within the bounding box of this location.

  • home_radius_km (int): The radius in kilometers within which home locations for different groups should be generated (default is 20).

  • num_users (int): The number of users for which to generate data (default is 15).

Example usage:
from xyt.fake_data_generator import FakeDataGenerator

generator = FakeDataGenerator(location_name="Switzerland", num_users=15)
df_legs = generator.generate_legs(num_rows=10)
df_staypoints = generator.generate_staypoints(num_rows=10)
df_waypoints = generator.generate_waypoints(num_rows=10)

generate_dataframe(num_rows, trace_type)

Generate a GeoDataFrame with synthetic location-based data for multiple groups.

Args:
  • num_rows (int): The number of data rows to generate for each group.

  • trace_type (str): The type of data to generate (‘Staypoint’ or ‘Leg’).

Returns:
  • GeoDataFrame: A GeoDataFrame containing the generated location-based data.

generate_home_location()

Generate a random home location point.

Returns:
  • Point: A Shapely Point object representing the home location.

generate_intermediate_points(point_start, point_end, num_points, noise_range=(0.0001, 0.005))

Generate intermediate points between two given points.

Args:
  • point_start (Point): The starting point.

  • point_end (Point): The ending point.

  • num_points (int): The number of intermediate points to generate.

  • noise_range (tuple, optional): Range for generating random noise (default is (0.0001, 0.005)).

Returns:
  • list: A list of Shapely Point objects representing intermediate points.
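
A minimal sketch of the idea, assuming linear interpolation between the two endpoints plus random noise whose magnitude is drawn from noise_range. The real method works with Shapely Point objects; the helper name interpolate_with_noise, the seed parameter, and the use of plain (x, y) tuples are assumptions made for illustration only.

```python
import random

def interpolate_with_noise(start, end, num_points, noise_range=(0.0001, 0.005), seed=None):
    """Return num_points jittered points strictly between start and end.

    Hypothetical helper: points are (x, y) tuples, not Shapely Points.
    """
    rng = random.Random(seed)
    points = []
    for i in range(1, num_points + 1):
        t = i / (num_points + 1)  # fraction of the way from start to end
        x = start[0] + t * (end[0] - start[0])
        y = start[1] + t * (end[1] - start[1])
        # Noise magnitude comes from noise_range, with a random sign per axis.
        x += rng.choice((-1, 1)) * rng.uniform(*noise_range)
        y += rng.choice((-1, 1)) * rng.uniform(*noise_range)
        points.append((x, y))
    return points

pts = interpolate_with_noise((8.50, 47.30), (8.60, 47.40), num_points=4, seed=0)
```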

generate_legs(num_rows=15)

Generate synthetic location-based data representing movement between locations (legs) for multiple users.

Args:
  • num_rows (int, optional): The number of data rows to generate for each user (default is 15).

Returns:
  • GeoDataFrame: A GeoDataFrame containing the generated legs data.

generate_random_point_in_location(home_location=None)

Generate a random point within the specified location.

Args:
  • home_location (Point, optional): The home location within which to generate the random point.

Returns:
  • Point: A Shapely Point object representing the random point within the location.

generate_staypoints(num_rows=15)

Generate synthetic location-based data representing user staypoints at various locations.

Args:
  • num_rows (int, optional): The number of data rows to generate for each user (default is 15).

Returns:
  • GeoDataFrame: A GeoDataFrame containing the generated staypoints data.

generate_waypoints(num_rows=15, num_extra_od_points=10, max_displacement_meters=20)

Generate synthetic location-based data representing waypoints with timestamps and accuracy for multiple users.

Args:
  • num_rows (int, optional): The number of data rows to generate for each user (default is 15).

  • num_extra_od_points (int, optional): The number of extra points to add at the beginning and end of each linestring (default is 10).

  • max_displacement_meters (float, optional): The maximum displacement in meters for the extra points (default is 20 meters).

Returns:
  • GeoDataFrame: A GeoDataFrame containing the generated waypoints data.

get_bounding_box(location_name)

Retrieve the bounding box coordinates for a specified location.

Args:
  • location_name (str): The name of the location for which to retrieve the bounding box.

Returns:
  • list: A list of bounding box coordinates [south, north, west, east].
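
The coordinate order [south, north, west, east] is easy to mix up, so here is a small sketch of how such a box could be used to sample a random point. The helper random_point_in_bbox and the numeric values are illustrative assumptions; they are not part of xyt and were not fetched from any geocoding service.

```python
import random

def random_point_in_bbox(bbox, rng=random):
    """Sample a (lat, lon) pair uniformly inside [south, north, west, east]."""
    south, north, west, east = bbox
    lat = rng.uniform(south, north)  # latitude lies between south and north
    lon = rng.uniform(west, east)    # longitude lies between west and east
    return lat, lon

bbox = [45.8, 47.8, 5.9, 10.5]  # illustrative values only
lat, lon = random_point_in_bbox(bbox, random.Random(42))
```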

xyt.gps_analytics module

class xyt.gps_analytics.GPSAnalytics

Bases: object

Performs analytics on GPS data, including splitting activities over midnight, spatial clustering, deriving metrics, and computing daily statistics.

This class offers methods to manipulate GPS data stored in DataFrames. It provides functionalities to split activities that span midnight into two, perform spatial clustering using DBSCAN, derive various metrics related to trips and activities, and compute daily descriptive statistics.

get_daily_metrics(ext_staypoint: DataFrame) -> DataFrame

Constructs a matrix of daily descriptive statistics.

Args:
  • ext_staypoint: DataFrame with relevant columns. Columns: ‘started_at’, ‘finished_at’, ‘user_id’, ‘duration’, ‘cluster’, ‘cluster_size’, ‘cluster_info’, ‘location_id’, ‘peak’, ‘first_dep’, ‘last_arr’, ‘home_loop’, ‘daily_trip_dist’, ‘num_trip’, ‘max_dist’, ‘min_dist’, ‘max_dist_from_home’, ‘dist_from_home’, ‘home_location_id’, ‘weekday’

Returns:
  • DataFrame with computed daily profiles.

get_metrics(staypoint: DataFrame, leg: DataFrame) -> DataFrame

Computes additional variables and metrics.

Args:
  • staypoint: DataFrame with activity data. Columns: ‘activity_id’, ‘started_at’, ‘finished_at’, ‘purpose’, ‘user_id’, ‘lon’, ‘lat’

  • leg: DataFrame with leg data. Columns: ‘leg_id’, ‘started_at’, ‘finished_at’, ‘detected_mode’, ‘mode’, ‘user_id’, ‘geometry’, ‘next_activity_id’, ‘length’, ‘duration’

Returns:
  • DataFrame with computed metrics.

spatial_clustering(staypoint: DataFrame) -> DataFrame

Aggregates locations of most visited places.

Args:
  • staypoint: DataFrame containing activity locations. Columns: ‘activity_id’, ‘started_at’, ‘finished_at’, ‘purpose’, ‘user_id’, ‘lon’, ‘lat’

Returns:
  • DataFrame with clustered locations and labels.

split_overnight(staypoint: DataFrame, time_columns=['started_at', 'finished_at']) -> DataFrame

Splits activities going over midnight into two activities.

Description:
  1. Split activities that go over midnight into two activities.

  2. Assign the same geolocation and activity purpose to both resulting activities.

  3. Compute the duration of each resulting activity.

Args:
  • staypoint: DataFrame containing activities to split. Columns: ‘activity_id’, ‘started_at’, ‘finished_at’, ‘purpose’, ‘user_id’, ‘lon’, ‘lat’

  • time_columns: Columns for activity start and end times.

Returns:
  • DataFrame with split activities and new ‘duration’ column.
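
The splitting step can be sketched with the standard library alone. This is an illustration under stated assumptions, not the package's implementation: split_at_midnight is a hypothetical helper that works on a single (started_at, finished_at) pair instead of a DataFrame, and it also handles activities spanning more than one midnight.

```python
from datetime import datetime, timedelta

def split_at_midnight(started_at, finished_at):
    """Split one activity into per-day pieces.

    Hypothetical helper; returns a list of (start, end, duration) tuples.
    """
    pieces = []
    start = started_at
    while start.date() < finished_at.date():
        # Midnight that begins the day after `start`.
        midnight = datetime.combine(start.date() + timedelta(days=1), datetime.min.time())
        pieces.append((start, midnight, midnight - start))
        start = midnight
    if start < finished_at:
        pieces.append((start, finished_at, finished_at - start))
    return pieces

parts = split_at_midnight(datetime(2023, 5, 1, 22, 0), datetime(2023, 5, 2, 7, 30))
```

In a DataFrame setting, the other columns (location, purpose, user) would simply be copied onto each piece.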

static verify_columns(df: DataFrame, expected_columns: list[str]) -> None

Verifies if expected columns are present in the DataFrame.

Args:
  • df: DataFrame to check columns in.

  • expected_columns: List of expected column names.

xyt.gps_data_privacy module

class xyt.gps_data_privacy.GPSDataPrivacy

Bases: object

A class for obfuscating and aggregating GPS data to enhance privacy.

This class provides methods to obfuscate GPS data by adding noise, shifting coordinates, and aggregating data within defined cell centers. It also calculates utility metrics for evaluating the impact of obfuscation on the data.

aggregate(waypoints_df, cell_size=0.2, delta=datetime.timedelta(seconds=900)) -> DataFrame

Aggregates users into time windows and grid cells on the map. Returns a DataFrame with the count of users in each time window and cell.

Args:
  • waypoints_df (pandas.DataFrame): DataFrame with ‘latitude’, ‘longitude’, ‘user_id’ and ‘tracked_at’ columns.

  • cell_size (float): Size of the square cells on the map, in kilometers.

  • delta (datetime.timedelta): Frequency for the time aggregation, e.g. 15 minutes.

Returns:
  • pandas.DataFrame: DataFrame with ‘tracked_at’, ‘cell_latitude’, ‘cell_longitude’ and ‘count’ columns.

    The ‘cell_latitude’ and ‘cell_longitude’ columns give coordinates of the centers of cells on the map.
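
A rough sketch of this kind of space-time binning, using only the standard library. Several things here are assumptions rather than documented behavior: the helper name aggregate_counts, the tuple-based input, counting distinct users per bin, and the fixed conversion of kilometers to degrees (which ignores the shrinking of longitude degrees away from the equator).

```python
import math
from collections import defaultdict
from datetime import datetime, timedelta

KM_PER_DEGREE = 111.32  # assumed rough conversion factor

def aggregate_counts(rows, cell_size=0.2, delta=timedelta(seconds=900)):
    """Count distinct users per (time window, cell center).

    rows: iterable of (user_id, tracked_at, latitude, longitude).
    """
    step = cell_size / KM_PER_DEGREE  # cell edge length in degrees
    epoch = datetime(1970, 1, 1)
    users = defaultdict(set)
    for user_id, tracked_at, lat, lon in rows:
        # Floor the timestamp to the start of its window.
        window = epoch + ((tracked_at - epoch) // delta) * delta
        # Snap the coordinates to the center of their cell.
        cell_lat = (math.floor(lat / step) + 0.5) * step
        cell_lon = (math.floor(lon / step) + 0.5) * step
        users[(window, cell_lat, cell_lon)].add(user_id)
    return {key: len(ids) for key, ids in users.items()}

rows = [
    ("a", datetime(2023, 5, 1, 8, 1), 46.5200, 6.6300),
    ("b", datetime(2023, 5, 1, 8, 9), 46.5201, 6.6301),
    ("a", datetime(2023, 5, 1, 8, 20), 46.5200, 6.6300),
]
counts = aggregate_counts(rows)
```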

get_obfuscation_utility(w_prepared, w_obfuscated, legs) -> float

Calculates the ratio of legs affected by obfuscation to total legs.

Args:
  • w_prepared (pandas.DataFrame): Smoothed and cleaned waypoints DataFrame.

  • w_obfuscated (pandas.DataFrame): Smoothed, cleaned, and obfuscated waypoints DataFrame.

  • legs (pandas.DataFrame): DataFrame that contains assembled legs.

Returns:
  • float: Ratio of legs affected by obfuscation to total legs.

obfuscate(df, locations, radius=100, offset=30, mode='remove') -> (DataFrame, list[tuple[float, float]])

Obfuscates the regions of points given in ‘locations’ parameter by either removing all the points in their proximity or changing the location of these points to one noisy location in the proximity circle.

Args:
  • df (pandas.DataFrame): DataFrame with ‘latitude’ and ‘longitude’ columns.

  • locations (list): List of locations given as (latitude, longitude) tuples.

  • radius (int, optional): Radius of the obfuscation circle. Defaults to 100.

  • offset (int, optional): Smallest distance from the perimeter of the obfuscation circle at which the location of interest must be located. Defaults to 30.

  • mode (str, optional): Obfuscation mode, can be either ‘remove’ or ‘assign’. Defaults to ‘remove’.

Returns:
  • pandas.DataFrame: DataFrame with obfuscated regions.
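
To illustrate the 'remove' mode only, here is a simplified sketch that drops every point lying within a radius of a sensitive location. It assumes the radius is expressed in meters (the text above does not state the unit), ignores the offset parameter entirely, and uses hypothetical helper names (distance_m, remove_near) with plain tuples instead of a DataFrame.

```python
import math

def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in meters."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * 6371000.0 * math.asin(math.sqrt(a))

def remove_near(points, locations, radius=100):
    """Drop every (lat, lon) point within `radius` meters of any location."""
    return [
        (lat, lon)
        for lat, lon in points
        if all(distance_m(lat, lon, s_lat, s_lon) > radius for s_lat, s_lon in locations)
    ]

home = (46.5200, 6.6300)
trace = [(46.5201, 6.6301), (46.5300, 6.6300)]  # first point is a few meters from home
kept = remove_near(trace, [home])
```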

xyt.gps_data_processor module

class xyt.gps_data_processor.GPSDataProcessor(radius=0.5, min_samples=5, time_gap=850, speed_th=2.78, acceleration_th=0.5, minimal_walking_duration=100, minimal_trip_duration=120, use_multiprocessing=False)

Bases: object

GPSDataProcessor is a class designed for processing GPS data, including mode detection and leg unification.

This class provides methods for detecting user activities and modes of transportation in GPS data, as well as unifying individual legs and activities into meaningful segments.

Attributes:
  • speed_th (float): A threshold for detecting walks based on speed.

  • acceleration_th (float): A threshold for detecting walks based on acceleration.

  • minimal_walking_duration (float): Minimum duration required to classify a segment as a walk.

  • minimal_trip_duration (float): Minimum duration required for a trip.

  • use_multiprocessing (bool): Flag indicating whether to use multiple processes for parallel processing.

Example usage:
from xyt.gps_data_processor import GPSDataProcessor

# Initialize GPSDataProcessor with custom parameters
data_processor = GPSDataProcessor(speed_th=1.5, acceleration_th=0.5, minimal_walking_duration=300, minimal_trip_duration=600, use_multiprocessing=True)

# `waypoints` is a DataFrame of raw GPS waypoints
poi_waypoints = data_processor.guess_home_work(waypoints, cell_size=0.3)
smoothed_df = data_processor.smooth(poi_waypoints, sigma=100)
segmented_df = data_processor.segment(smoothed_df)
mode_df = data_processor.mode_detection(segmented_df)
legs = data_processor.get_legs(df=mode_df)

activities_density(args)

Detect activities by density.

Args:
  • args (tuple): A tuple containing the waypoints DataFrame and a clusterer object.

Returns:
  • pandas.DataFrame: DataFrame with updated ‘detection’ column.

assign_cell(latitude, longitude, cell_size=0.2)

Assign a cell number to a GPS point based on latitude and longitude.

Args:
  • latitude (float): Latitude of the GPS point.

  • longitude (float): Longitude of the GPS point.

  • cell_size (float): Size of the grid cell.

Returns:
  • int: Cell number for the GPS point.

correct_clusters(df)

Correct detected clusters by merging them based on time and speed.

Args:
  • df (pandas.DataFrame): DataFrame with ‘detection’, ‘speed’, and ‘time_delta’ columns.

create_route(df) -> DataFrame

Calculate additional statistics such as distance, time delta, speed, and acceleration for GPS waypoints.

Args:
  • df (pandas.DataFrame): The waypoints DataFrame to be processed.

Returns:
  • pandas.DataFrame: Waypoints DataFrame with additional statistics.

get_legs(df)

Extract legs and activities from a DataFrame and aggregate the data for each leg or activity.

Args:
  • df (pandas.DataFrame): DataFrame with GPS data, including ‘detection’, ‘user_id’, and ‘type’ columns.

Returns:
  • pandas.DataFrame: DataFrame with aggregated legs and activities.

guess_home_work(waypoints, cell_size=0.2)

Guess and assign home and work locations to waypoints based on processed GPS data.

Args:
  • waypoints (pandas.DataFrame): DataFrame with GPS waypoints.

  • cell_size (float): Size of the grid cell.

Returns:
  • pandas.DataFrame: DataFrame with ‘home_loc’ and ‘work_loc’ columns added.

static haversine_distance(lat1, lon1, lat2, lon2)

Calculate the Haversine distance between two GPS coordinates.

Args:
  • lat1 (float): Latitude of the first point.

  • lon1 (float): Longitude of the first point.

  • lat2 (float): Latitude of the second point.

  • lon2 (float): Longitude of the second point.

Returns:
  • float: Haversine distance in meters.
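
For reference, the haversine formula itself can be written in a few lines. This is a generic sketch of the formula, not the package's source; the function name haversine_m and the mean Earth radius of 6,371,000 m are assumptions.

```python
import math

EARTH_RADIUS_M = 6371000.0  # assumed mean Earth radius in meters

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) pairs in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

# One degree of latitude along a meridian.
one_degree = haversine_m(0.0, 0.0, 1.0, 0.0)
```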

mode_detection(df)

Perform mode detection on a DataFrame, including walk detection and other modes using fuzzy logic.

Args:
  • df (pandas.DataFrame): DataFrame with GPS data.

Returns:
  • pandas.DataFrame: DataFrame with modes and walk tags in the ‘estimated_mode’ and ‘detection’ columns.

prepare_for_detection(df) -> DataFrame

Flag detections as trips by default.

Args:
  • df (pandas.DataFrame): Waypoints DataFrame to be flagged.

Returns:
  • pandas.DataFrame: DataFrame with ‘detection’ column set to ‘trip’.

process_gps_data(waypoints, cell_size=0.2)

Process GPS data to detect home and work locations for users.

Args:
  • waypoints (pandas.DataFrame): DataFrame with GPS waypoints.

  • cell_size (float): Size of the grid cell.

Returns:
  • dict: A dictionary containing user IDs as keys and their detected home and work locations as values.

segment(df) -> DataFrame

Segment waypoints for all unique user_ids in the DataFrame.

Args:
  • df (pandas.DataFrame): Waypoints DataFrame with user_ids.

Returns:
  • pandas.DataFrame: DataFrame with the segment starts and ends for all users.

segment_per_user(df, user_id) -> DataFrame

Find clusters of waypoints for a specific user.

Args:
  • df (pandas.DataFrame): Waypoints DataFrame to be processed.

  • user_id (int or str): The user_id for which to perform segmentation.

Returns:
  • pandas.DataFrame: DataFrame with the segment starts and ends for the specified user.

static smooth(df, accuracy_threshold=100, sigma=10, smoothing_method='time') -> DataFrame

Clean and preprocess a raw GPS points DataFrame using smoothing techniques.

Args:
  • df (pandas.DataFrame): DataFrame with GPS points to be preprocessed.

  • accuracy_threshold (int, optional): Accuracy threshold for filtering. Defaults to 100.

  • sigma (int, optional): Sigma for Gaussian smoothing, defines the size of the smoothing window. Defaults to 10.

  • smoothing_method (str, optional): Smoothing method (‘time’ or ‘space’). Defaults to ‘time’.

Returns:
  • pandas.DataFrame: Cleaned and preprocessed DataFrame.
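
A simplified sketch of what accuracy filtering followed by Gaussian smoothing over time might look like. The assumptions are significant: the helper gaussian_smooth is hypothetical, accuracy is treated as an error radius where larger means worse, sigma is interpreted in seconds, and every retained point is replaced by a Gaussian-weighted average of all retained points. The package may implement any of these differently.

```python
import math

def gaussian_smooth(points, accuracy_threshold=100, sigma=10):
    """Filter and smooth a track.

    points: list of (t_seconds, lat, lon, accuracy) tuples.
    Returns a list of (t_seconds, lat, lon) tuples.
    """
    # Keep only points whose reported accuracy is within the threshold.
    kept = [p for p in points if p[3] <= accuracy_threshold]
    smoothed = []
    for t_i, _, _, _ in kept:
        # Gaussian weight of every retained point relative to time t_i.
        weights = [math.exp(-((t_j - t_i) ** 2) / (2 * sigma ** 2)) for t_j, _, _, _ in kept]
        total = sum(weights)
        lat = sum(w * p[1] for w, p in zip(weights, kept)) / total
        lon = sum(w * p[2] for w, p in zip(weights, kept)) / total
        smoothed.append((t_i, lat, lon))
    return smoothed

raw = [
    (0, 46.5200, 6.6300, 10),
    (5, 46.5210, 6.6300, 12),
    (10, 46.5200, 6.6300, 500),  # dropped: accuracy value exceeds the threshold
    (15, 46.5200, 6.6300, 8),
]
track = gaussian_smooth(raw)
```

A larger sigma widens the window, so the outlying latitude at t=5 is pulled more strongly toward its neighbors.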

xyt.gps_to_actionspace module

class xyt.gps_to_actionspace.GPStoActionspace

Bases: object

A class for analyzing GPS-based user activity and generating action spaces.

This class provides methods to process GPS data, compute action spaces for users, visualize action spaces using Matplotlib and Folium, calculate covariance matrices, and compute innovation rates based on user activity motifs.

compute_action_space(act: DataFrame, aggregation_method: str) -> DataFrame

Compute action space parameters for each user.

Args:
  • act (pandas.DataFrame): DataFrame with user activity data.

  • aggregation_method (str): Aggregation method (‘user_id’ or ‘user_id_day’).

Returns:
  • pandas.DataFrame: DataFrame with action space parameters for each user.

covariance_matrix(action_space: DataFrame, title: str = '', annot: bool = False, cmap=<matplotlib.colors.LinearSegmentedColormap object>)

Map the correlation, covariance, or p-values of a set of observed variables.

Args:
  • action_space (pandas.DataFrame): DataFrame with action space data.

  • title (str): Title for the heatmap.

  • annot (bool): If True, display the values in the heatmap cells.

  • cmap: Color map for the heatmap.

Returns:
  • Heatmap of correlation, covariance, or p-values of observed variables.

static modified_std_distance(pp, center) -> float

Calculate the standard distance of a point array. This works like std_distance() in PySAL, which measures dispersion around the mean center, except that the caller specifies the center.

Args:
  • pp: point pattern

  • center: array([ x, y])

Returns:
  • float: standard distance
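
As a sketch of the quantity involved: the standard distance is the square root of the mean squared distance of the points from a center. The helper below, std_distance_about, is a hypothetical stand-in that takes a plain list of (x, y) tuples rather than a PySAL point pattern.

```python
import math

def std_distance_about(points, center):
    """Root mean squared distance of (x, y) points from an arbitrary center."""
    cx, cy = center
    n = len(points)
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n)

pts = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
about_mean = std_distance_about(pts, (1.0, 1.0))  # dispersion around the mean center
about_home = std_distance_about(pts, (0.0, 0.0))  # dispersion around another center
```

Measuring around a center other than the mean (for example a home location) can only increase the value, since the mean center minimizes the sum of squared distances.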

plot_action_space(act: DataFrame, action_space: DataFrame, user: str, how: str = 'vignette', save: bool = False)

Plot the action space for a specific user.

Args:
  • act (pandas.DataFrame): DataFrame with user activity data.

  • action_space (pandas.DataFrame): DataFrame with action space data.

  • user (str): User ID for whom the action space will be plotted.

  • how (str): Plotting method (‘vignette’ or ‘folium’).

  • save (bool): If True, save the plot.

Returns:
  • Visualization of the action space.

plot_ellipses(action_space: DataFrame, aggregation_method: str) -> Map

Plot ellipses on a Folium map based on action space data.

Args:
  • action_space (pandas.DataFrame): DataFrame with action space data.

  • aggregation_method (str): Aggregation method (‘user_id’ or ‘user_id_day’).

Returns:
  • folium.Map: Folium map with ellipses.

static verify_columns(df: DataFrame, expected_columns: list[str]) -> None

Verify if expected columns are present in the DataFrame.

Args:
  • df (pandas.DataFrame): DataFrame to check.

  • expected_columns (list[str]): List of column names expected in the DataFrame.

Raises:
  • ValueError: If columns are missing in the DataFrame.
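
The check described here amounts to a set difference followed by an exception. A minimal sketch, assuming a hypothetical helper check_columns that accepts any collection of column names (with pandas, df.columns would be passed in); the error message wording is also an assumption.

```python
def check_columns(columns, expected_columns):
    """Raise ValueError if any expected column name is absent from `columns`."""
    missing = [name for name in expected_columns if name not in columns]
    if missing:
        raise ValueError(f"Missing columns: {missing}")

# Passes silently: both expected names are present.
check_columns(["user_id", "lon", "lat"], ["user_id", "lat"])
```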

xyt.gps_to_graph module

class xyt.gps_to_graph.GPStoGraph

Bases: object

get_graphs(df: DataFrame, verbose: bool = True) -> DataFrame

Extracts motifs and graphs from GPS data.

Args:
  • df (pd.DataFrame): DataFrame containing GPS data.

  • verbose (bool): Verbosity for multiprocessing (default is True).

Returns:
  • pd.DataFrame: DataFrame with extracted motifs and graphs.

motif_sequence(mtf: DataFrame, n_cols: int = 60) -> DataFrame

Generates motif sequences for users based on motif data.

Args:
  • mtf (pd.DataFrame): DataFrame containing motif data.

  • n_cols (int): Number of columns for the motif sequence (default is 60).

Returns:
  • pd.DataFrame: DataFrame with generated motif sequences for users.

plot_graph(mtf: DataFrame, path='') -> None

Creates GIFs displaying the graph motif for multiple users.

Args:
  • mtf (pd.DataFrame): DataFrame containing motif data.

  • path (str): Path to save the GIFs (default is ‘’).

plot_motif(mtf: DataFrame) -> None

Plots motifs and their frequency distribution.

Args:
  • mtf (pd.DataFrame): DataFrame containing motif data.

static verify_columns(df: DataFrame, expected_columns: list[str]) -> None

Checks for missing columns in a DataFrame.

Args:
  • df (pd.DataFrame): DataFrame to check columns.

  • expected_columns (list[str]): List of expected column names.

Raises:
  • ValueError: If columns in expected_columns are missing in the DataFrame.

xyt.xyt_plot module

xyt.xyt_plot.plot_gps_on_map(df, trace_type=None, home_col=None, work_col=None, geo_columns=None)

Plot location-based data on a Folium map with different colors for each group.

Args:
  • df (pd.DataFrame): A Pandas DataFrame containing the location-based data to be plotted.

  • trace_type (str, optional): The type of data to be plotted (‘Stay’, ‘Track’, or ‘Waypoint’). If provided, only data of the specified trace type will be plotted.

  • home_col (str, optional): Name of the column containing the home coordinates. Default is None.

  • work_col (str, optional): Name of the column containing the work coordinates. Default is None.

  • geo_columns (str or list, optional): Name of the column(s) containing the latitude and longitude coordinates. Can be a string (e.g., ‘geometry’) or a list (e.g., [‘latitude’, ‘longitude’]). Default is None.

Returns:
  • Interactive Folium map.

Module contents