How to divide the span between two timestamps into equal parts

TLDR:

pd.date_range(start=s, end=e, periods=period)

Problem

How can you generate evenly spaced points in time between two timestamps?

My first thought when it comes to creating evenly spaced numbers is linspace. That would mean converting the dates to Unix timestamps, generating the evenly spaced values, and converting everything back to the original data type. However, there is by now a more elegant way that uses pandas.date_range (pandas doc)! The alternatives described above can also be found at geeksforgeeks.
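To make the comparison concrete, here is a minimal sketch of both routes. The helper name linspace_timestamps and the sample dates are assumptions made for illustration; they are not part of the original post. Note that float64 cannot represent every nanosecond value exactly, so the manual route may deviate by a tiny amount for arbitrary inputs.

```python
import numpy as np
import pandas as pd


def linspace_timestamps(start, end, periods):
    """Hypothetical helper: evenly spaced timestamps via Unix nanoseconds."""
    start_ns = pd.Timestamp(start).value  # nanoseconds since the Unix epoch
    end_ns = pd.Timestamp(end).value
    # linspace works on floats, so round back to integer nanoseconds afterwards
    ticks = np.linspace(start_ns, end_ns, periods)
    return pd.to_datetime(ticks.round().astype("int64"))


start = "2023-01-01 00:00"
end = "2023-01-01 04:00"

manual = linspace_timestamps(start, end, 5)              # the detour via linspace
direct = pd.date_range(start=start, end=end, periods=5)  # the one-liner

print(list(manual))
print(list(direct))
```

Both calls should yield five timestamps one hour apart, with the start and end included.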

Solution

To process several timestamp pairs at once and to control whether the start and end dates are included in the final result, I extended this core functionality and added a few more parameters.

from typing import Union, List
import datetime as dt
import pandas as pd

def divide_date_equally(start:Union[List[dt.datetime],dt.datetime], 
                        end:Union[List[dt.datetime],dt.datetime], 
                        chunks:Union[List[int],int],
                        include_start:bool=True,
                        include_end:bool=True):
  """
  This function creates equally spaced chunks between a start and an end date. Some cases might want to exclude
  either of them, which can be controlled through the include_X flags. Start and end dates can also be passed as lists
  to calculate multiple pairs.

  :param start: The start date.
  :param end: The end date.
  :param chunks: How many elements the result should contain. If lists of start and end dates are submitted, this parameter can be an
          integer, which results in all pairs being treated the same, or a list of integers, which allows a separate chunk count per pair.
  :param include_start: Whether the start date should be included in the resulting list
  :param include_end: Whether the end date should be included in the resulting list

  Example:
    start = 1, end = 5, chunks = 5 => [1,2,3,4,5]
    start = 1, end = 5, chunks = 4, include_start=False => [2,3,4,5]
  """

  # Validate the parameters first: start and end must both be lists or both be single values
  if isinstance(start, list) != isinstance(end, list):
    raise TypeError("Both start and end parameters have to be either a list or not a list")
  if not isinstance(start, list):
    # Wrap single values so the loop below can treat both cases alike
    start = [start]
    end = [end]
  
  # chunks may be a single integer; in that case repeat it for every start/end pair
  if not isinstance(chunks, list):
    chunks = [chunks]*len(start)
  elif len(chunks) != len(start):
    raise ValueError("Mismatch of length between the provided chunks and dates")
  
  # offset if we do not want to include the start or end time point 
  offset_start = int(not include_start)
  offset_end = int(not include_end)
  total_offset = offset_start + offset_end

  result = []
  for s,e,n in zip(start, end, chunks):
    result.append(pd.date_range(start=s, end=e, periods=n+total_offset)[offset_start:n+total_offset-offset_end])
  return result
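The slicing at the end of the function can look a bit dense, so here is a minimal sketch of the same idea for a single start/end pair. The sample dates and the variable names inner and no_start are assumptions made for illustration: to get n timestamps without an endpoint, request one extra period per excluded endpoint and slice it off again.

```python
import pandas as pd

start = pd.Timestamp("2023-01-01 00:00")
end = pd.Timestamp("2023-01-01 04:00")
n = 3  # desired number of timestamps in the result

# Both endpoints excluded: request n + 2 points, then drop the first and last.
inner = pd.date_range(start=start, end=end, periods=n + 2)[1:-1]

# Only the start excluded: request n + 1 points, then drop the first.
no_start = pd.date_range(start=start, end=end, periods=n + 1)[1:]

print(list(inner))
print(list(no_start))
```

In both cases the result has exactly n elements. Note that the spacing differs between the two, because the excluded endpoints still count as grid points when the interval is divided.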
