How to divide the span between two timestamps into equal chunks

TLDR:

pd.date_range(start=s, end=e, periods=period)

Task

Given two timestamps, how can we create equally spaced times between them?

My first thought when it comes to creating evenly spaced numbers is linspace. That would require converting the datetimes to Unix timestamps, generating the intermediate timestamps, and converting everything back to datetime objects. However, there is a more elegant way: the pandas.date_range function (see the pandas docs). Both alternatives are also discussed at GeeksforGeeks.
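
For comparison, here is a rough sketch of the linspace route next to the pandas one. The helper name linspace_datetimes is made up for this illustration, and the example uses timezone-aware UTC datetimes so that the round trip through Unix timestamps does not depend on the local timezone.

```python
import datetime as dt

import numpy as np
import pandas as pd


def linspace_datetimes(start, end, periods):
    """Hypothetical helper: evenly spaced datetimes via Unix timestamps."""
    # datetime -> seconds since the epoch, evenly spaced floats in between
    stamps = np.linspace(start.timestamp(), end.timestamp(), periods)
    # seconds since the epoch -> timezone-aware datetime objects again
    return [dt.datetime.fromtimestamp(float(t), tz=dt.timezone.utc) for t in stamps]


start = dt.datetime(2023, 1, 1, tzinfo=dt.timezone.utc)
end = dt.datetime(2023, 1, 2, tzinfo=dt.timezone.utc)

via_linspace = linspace_datetimes(start, end, 5)
via_pandas = pd.date_range(start=start, end=end, periods=5)

print(via_linspace)
print(via_pandas)
```

Both variants should yield the same five points in time, six hours apart; the pandas call simply skips the manual conversion steps.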

Solution

To process multiple date pairs at once, and to control whether the start and end dates are included in the final sequence, I expanded on that core functionality and added a few more options.

from typing import Union, List
import datetime as dt
import pandas as pd

def divide_date_equally(start:Union[List[dt.datetime],dt.datetime], 
                        end:Union[List[dt.datetime],dt.datetime], 
                        chunks:Union[List[int],int],
                        include_start:bool=True,
                        include_end:bool=True):
  """
  This function creates equally sized chunks between a start and an end date. Some cases might want to exclude
  either of them, which can be controlled through the include_X flags. Start and end dates can also be passed as lists
  to calculate multiple pairs.

  :param start: The start date.
  :param end: The end date.
  :param chunks: How many elements the result should contain. If lists of start and end dates are passed, this parameter can be an
          integer, which is applied to every pair, or a list of integers, which sets the number of elements per pair.
  :param include_start: Whether the start date should be included in the resulting list
  :param include_end: Whether the end date should be included in the resulting list

  Example:
    start = 1, end = 5, chunks = 5 => [1,2,3,4,5]
    start = 1, end = 5, chunks = 4, include_start=False => [2,3,4,5]
  """

  # Validate the inputs: start and end must both be lists or both be single values
  if isinstance(start, list) != isinstance(end, list):
    raise TypeError("Both start and end parameters have to be either a list or not a list")
  if not isinstance(start, list):
    # Wrap single values so the loop below can treat both cases the same way
    start = [start]
    end = [end]
  if len(start) != len(end):
    # zip() would otherwise silently drop the surplus dates
    raise ValueError("Mismatch of length between the provided start and end dates")
  
  # chunks may be a single integer; if so, repeat it once per start/end pair
  if not isinstance(chunks, list):
    chunks = [chunks]*len(start)
  elif len(chunks) != len(start):
    raise ValueError("Mismatch of length between the provided chunks and dates")
  
  # offset if we do not want to include the start or end time point 
  offset_start = int(not include_start)
  offset_end = int(not include_end)
  total_offset = offset_start + offset_end

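  result = []
  for s, e, n in zip(start, end, chunks):
    # Request one extra point per excluded endpoint, then slice those endpoints off
    points = pd.date_range(start=s, end=e, periods=n + total_offset)
    result.append(points[offset_start:n + total_offset - offset_end])
  return result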
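
The offset trick in the loop can be checked in isolation. The following is a small sketch using only pd.date_range, with made-up example dates that mirror the docstring example (1 to 5): to exclude an endpoint, request one more point than needed and slice that endpoint off.

```python
import datetime as dt

import pandas as pd

s = dt.datetime(2023, 1, 1)
e = dt.datetime(2023, 1, 5)

# 4 elements without the start date: request 5 points, drop the first one
without_start = pd.date_range(start=s, end=e, periods=5)[1:]

# 3 elements without start and end date: request 5 points, drop both ends
inner = pd.date_range(start=s, end=e, periods=5)[1:4]

print(without_start)
print(inner)
```

The spacing stays the same as in the full five-point range; only the excluded endpoints are missing from the result.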

hungsblog | Nguyen Hung Manh | Dresden