ntv-pandas.ntv_pandas

NTV-pandas Package

Created on Sept 2023

@author: philippe@loco-labs.io

This package contains the following classes and functions:

  • ntv-pandas.ntv_pandas.pandas_ntv_connector :
    • ntv-pandas.ntv_pandas.pandas_ntv_connector.DataFrameConnec
    • ntv-pandas.ntv_pandas.pandas_ntv_connector.SeriesConnec
    • ntv-pandas.ntv_pandas.pandas_ntv_connector.to_json
    • ntv-pandas.ntv_pandas.pandas_ntv_connector.read_json
    • ntv-pandas.ntv_pandas.pandas_ntv_connector.as_def_type

NTV-pandas : A semantic, compact and reversible JSON-pandas converter

Why an NTV-pandas converter?

pandas provides a JSON converter, but it has three limitations:

  • the JSON-pandas converter handles only a few data types,
  • the JSON-pandas converter is not always reversible (round trip),
  • external dtypes (e.g. TableSchema types) are not supported.
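A minimal sketch of the round-trip limitation, using only the standard pandas converter (`DataFrame.to_json` and `pd.read_json`). The column names and values are borrowed from the example below. The exact dtypes that `read_json` infers depend on the pandas version, so the sketch only checks that the original `int32` and `string` dtypes are not restored.

```python
import io

import pandas as pd

# Two dtypes that the built-in JSON converter does not carry through JSON:
# a 32-bit integer column and a 'string' (StringDtype) column.
df = pd.DataFrame({
    'value32': pd.Series([12, 22, 32], dtype='int32'),
    'names': pd.Series(['john', 'eric', 'judith'], dtype='string'),
})

# Round trip through the standard pandas converter.
text = df.to_json()
back = pd.read_json(io.StringIO(text))

# The values survive, but read_json re-infers the dtypes from the JSON text,
# so 'int32' and 'string' are lost and DataFrame.equals (which also compares
# dtypes) reports a difference.
print(df.dtypes.to_dict())
print(back.dtypes.to_dict())
print('identical after round trip?', back.equals(df))
```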

main features

The NTV-pandas converter uses the semantic NTV format (https://loco-philippe.github.io/ES/JSON%20semantic%20format%20(JSON-NTV).htm) to include a large set of data types in a JSON representation. The converter integrates:

  • all pandas dtypes and the data types associated with a JSON representation,
  • an always reversible conversion,
  • full compatibility with the TableSchema specification.
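In the example output, each JSON key is a column name optionally followed by `::` and an NTV type (`'value32::int32'`, `'names::string'`); keys without a suffix use the default JSON type. The helper below, `split_ntv_name`, is hypothetical: it only illustrates this naming convention as observed in the example and is not part of the ntv_pandas API. It assumes the type is whatever follows the last `::`.

```python
from __future__ import annotations


def split_ntv_name(key: str) -> tuple[str, str | None]:
    """Split a 'name::type' key into (name, type).

    Hypothetical helper for illustration only (not part of ntv_pandas).
    Keys without '::' carry no explicit type, returned here as None.
    """
    name, sep, ntv_type = key.rpartition('::')
    if not sep:
        # No '::' in the key: the whole key is the name.
        return key, None
    return name, ntv_type


for key in ['dates::date', 'value32::int32', 'names::string', 'value']:
    print(key, '->', split_ntv_name(key))
# dates::date -> ('dates', 'date')
# value32::int32 -> ('value32', 'int32')
# names::string -> ('names', 'string')
# value -> ('value', None)
```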

example

In the example below, a DataFrame with several data types is converted to JSON.

The DataFrame resulting from this JSON is identical to the initial DataFrame (reversibility).

This conversion is not possible with the existing pandas JSON interface.

data example

In [1]: from shapely.geometry import Point
        from datetime import date
        from pprint import pprint
        import pandas as pd
        import ntv_pandas as npd

In [2]: data = {'index':           [100, 200, 300, 400, 500, 600],
                'dates::date':     pd.Series([date(1964,1,1), date(1985,2,5), date(2022,1,21), date(1964,1,1), date(1985,2,5), date(2022,1,21)]),
                'value':           [10, 10, 20, 20, 30, 30],
                'value32':         pd.Series([12, 12, 22, 22, 32, 32], dtype='int32'),
                'res':             [10, 20, 30, 10, 20, 30],
                'coord::point':    pd.Series([Point(1,2), Point(3,4), Point(5,6), Point(7,8), Point(3,4), Point(5,6)]),
                'names':           pd.Series(['john', 'eric', 'judith', 'mila', 'hector', 'maria'], dtype='string'),
                'unique':          True }

In [3]: df = pd.DataFrame(data).set_index('index')

In [4]: df
Out[4]:
              dates::date  value  value32  res coord::point   names  unique
        index
        100    1964-01-01     10       12   10  POINT (1 2)    john    True
        200    1985-02-05     10       12   20  POINT (3 4)    eric    True
        300    2022-01-21     20       22   30  POINT (5 6)  judith    True
        400    1964-01-01     20       22   10  POINT (7 8)    mila    True
        500    1985-02-05     30       32   20  POINT (3 4)  hector    True
        600    2022-01-21     30       32   30  POINT (5 6)   maria    True

JSON representation

In [5]: df_to_json = npd.to_json(df)
        pprint(df_to_json, compact=True, width=120)
Out[5]:
        {':tab': {'coord::point': [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [3.0, 4.0], [5.0, 6.0]],
                  'dates::date': ['1964-01-01', '1985-02-05', '2022-01-21', '1964-01-01', '1985-02-05', '2022-01-21'],
                  'index': [100, 200, 300, 400, 500, 600],
                  'names::string': ['john', 'eric', 'judith', 'mila', 'hector', 'maria'],
                  'res': [10, 20, 30, 10, 20, 30],
                  'unique': [True, True, True, True, True, True],
                  'value': [10, 10, 20, 20, 30, 30],
                  'value32::int32': [12, 12, 22, 22, 32, 32]}}
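To make the layout of the `':tab'` value concrete, here is a deliberately simplified encoder. `to_tab_json` and `EXPLICIT_TYPES` are hypothetical names, and the dtype mapping (only `int32` and `string` get an explicit suffix) is an assumption inferred from the output above, not the package's implementation; `npd.to_json` itself also handles dates, points and other types.

```python
import pandas as pd

# Assumed mapping from pandas dtype names to explicit NTV types. dtypes that
# map naturally to a JSON type (int64, bool, ...) get no suffix, matching the
# keys seen in the example output. This is not the package's real table.
EXPLICIT_TYPES = {'int32': 'int32', 'string': 'string'}


def to_tab_json(df: pd.DataFrame) -> dict:
    """Hypothetical, simplified encoder producing {':tab': {...}}."""
    columns = {}
    # The index is written as an ordinary column (named 'index' here).
    frame = df.reset_index()
    for name in frame.columns:
        series = frame[name]
        ntv_type = EXPLICIT_TYPES.get(str(series.dtype))
        key = f'{name}::{ntv_type}' if ntv_type else str(name)
        # tolist() converts the values to plain Python objects for JSON.
        columns[key] = series.tolist()
    return {':tab': columns}


df = pd.DataFrame({'index': [100, 200],
                   'value': [10, 20],
                   'value32': pd.Series([12, 22], dtype='int32'),
                   'names': pd.Series(['john', 'eric'], dtype='string'),
                   'unique': True}).set_index('index')
print(to_tab_json(df))
```

Writing the index as an ordinary column, as in the example output, is what allows a decoder to rebuild it later.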

Reversibility

In [6]: df_from_json = npd.read_json(df_to_json)
        print('df created from JSON is equal to initial df ? ', df_from_json.equals(df))

Out[6]: df created from JSON is equal to initial df ?  True
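The reverse direction can be sketched the same way: read each key, strip a known `::type` suffix, and rebuild the column with the matching pandas dtype so that `DataFrame.equals` (which also compares dtypes) succeeds. `from_tab_json` and `PANDAS_DTYPES` are hypothetical names; this is a sketch under those assumptions, not how `npd.read_json` is implemented.

```python
import pandas as pd

# Assumed inverse mapping from explicit NTV types to pandas dtypes. It covers
# only the two suffixes used here, not the package's full set of types.
PANDAS_DTYPES = {'int32': 'int32', 'string': 'string'}


def from_tab_json(value: dict, index: str = 'index') -> pd.DataFrame:
    """Hypothetical, simplified decoder for a {':tab': {...}} value."""
    columns = {}
    for key, values in value[':tab'].items():
        name, sep, ntv_type = key.rpartition('::')
        if sep and ntv_type in PANDAS_DTYPES:
            # Known suffix: drop it from the name and restore the dtype.
            columns[name] = pd.Series(values, dtype=PANDAS_DTYPES[ntv_type])
        else:
            # No (or unknown) suffix: keep the key, let pandas infer the dtype.
            columns[key] = pd.Series(values)
    return pd.DataFrame(columns).set_index(index)


original = pd.DataFrame({'index': [100, 200],
                         'value': [10, 20],
                         'value32': pd.Series([12, 22], dtype='int32'),
                         'names': pd.Series(['john', 'eric'], dtype='string')}
                        ).set_index('index')
json_value = {':tab': {'index': [100, 200],
                       'value': [10, 20],
                       'value32::int32': [12, 22],
                       'names::string': ['john', 'eric']}}

restored = from_tab_json(json_value)
print('equal to original?', restored.equals(original))
```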
# -*- coding: utf-8 -*-
"""
***NTV-pandas Package***

Created on Sept 2023

@author: philippe@loco-labs.io

This package contains the following classes and functions:

- `ntv-pandas.ntv_pandas.pandas_ntv_connector` :
    - `ntv-pandas.ntv_pandas.pandas_ntv_connector.DataFrameConnec`
    - `ntv-pandas.ntv_pandas.pandas_ntv_connector.SeriesConnec`
    - `ntv-pandas.ntv_pandas.pandas_ntv_connector.to_json`
    - `ntv-pandas.ntv_pandas.pandas_ntv_connector.read_json`
    - `ntv-pandas.ntv_pandas.pandas_ntv_connector.as_def_type`


# NTV-pandas : A semantic, compact and reversible JSON-pandas converter

# Why an NTV-pandas converter?

pandas provides a JSON converter, but it has three limitations:
- the JSON-pandas converter handles only a few data types,
- the JSON-pandas converter is not always reversible (round trip),
- external dtypes (e.g. TableSchema types) are not supported.

# main features

The NTV-pandas converter uses the
[semantic NTV format](https://loco-philippe.github.io/ES/JSON%20semantic%20format%20(JSON-NTV).htm)
to include a large set of data types in a JSON representation.
The converter integrates:
- all pandas `dtype`s and the data types associated with a JSON representation,
- an always reversible conversion,
- full compatibility with the TableSchema specification.

# example

In the example below, a DataFrame with several data types is converted to JSON.

The DataFrame resulting from this JSON is identical to the initial DataFrame (reversibility).

This conversion is not possible with the existing pandas JSON interface.

*data example*
```python
In [1]: from shapely.geometry import Point
        from datetime import date
        from pprint import pprint
        import pandas as pd
        import ntv_pandas as npd

In [2]: data = {'index':           [100, 200, 300, 400, 500, 600],
                'dates::date':     pd.Series([date(1964,1,1), date(1985,2,5), date(2022,1,21), date(1964,1,1), date(1985,2,5), date(2022,1,21)]),
                'value':           [10, 10, 20, 20, 30, 30],
                'value32':         pd.Series([12, 12, 22, 22, 32, 32], dtype='int32'),
                'res':             [10, 20, 30, 10, 20, 30],
                'coord::point':    pd.Series([Point(1,2), Point(3,4), Point(5,6), Point(7,8), Point(3,4), Point(5,6)]),
                'names':           pd.Series(['john', 'eric', 'judith', 'mila', 'hector', 'maria'], dtype='string'),
                'unique':          True }

In [3]: df = pd.DataFrame(data).set_index('index')

In [4]: df
Out[4]:
              dates::date  value  value32  res coord::point   names  unique
        index
        100    1964-01-01     10       12   10  POINT (1 2)    john    True
        200    1985-02-05     10       12   20  POINT (3 4)    eric    True
        300    2022-01-21     20       22   30  POINT (5 6)  judith    True
        400    1964-01-01     20       22   10  POINT (7 8)    mila    True
        500    1985-02-05     30       32   20  POINT (3 4)  hector    True
        600    2022-01-21     30       32   30  POINT (5 6)   maria    True
```

*JSON representation*

```python
In [5]: df_to_json = npd.to_json(df)
        pprint(df_to_json, compact=True, width=120)
Out[5]:
        {':tab': {'coord::point': [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [3.0, 4.0], [5.0, 6.0]],
                  'dates::date': ['1964-01-01', '1985-02-05', '2022-01-21', '1964-01-01', '1985-02-05', '2022-01-21'],
                  'index': [100, 200, 300, 400, 500, 600],
                  'names::string': ['john', 'eric', 'judith', 'mila', 'hector', 'maria'],
                  'res': [10, 20, 30, 10, 20, 30],
                  'unique': [True, True, True, True, True, True],
                  'value': [10, 10, 20, 20, 30, 30],
                  'value32::int32': [12, 12, 22, 22, 32, 32]}}
```

*Reversibility*

```python
In [6]: df_from_json = npd.read_json(df_to_json)
        print('df created from JSON is equal to initial df ? ', df_from_json.equals(df))

Out[6]: df created from JSON is equal to initial df ?  True
```


"""
from ntv_pandas.pandas_ntv_connector import DataFrameConnec, SeriesConnec, read_json, to_json, as_def_type


#print('package :', __package__)