ntv-pandas.ntv_pandas

NTV-pandas Package

Created on Sept 2023

@author: philippe@loco-labs.io

This package contains the following classes and functions:

  • ntv-pandas.ntv_pandas.pandas_ntv_connector :
    • ntv-pandas.ntv_pandas.pandas_ntv_connector.DataFrameConnec
    • ntv-pandas.ntv_pandas.pandas_ntv_connector.SeriesConnec
    • ntv-pandas.ntv_pandas.pandas_ntv_connector.PdUtil
    • ntv-pandas.ntv_pandas.pandas_ntv_connector.to_json
    • ntv-pandas.ntv_pandas.pandas_ntv_connector.read_json
    • ntv-pandas.ntv_pandas.pandas_ntv_connector.as_def_type

Why an NTV-pandas converter?

pandas provides a JSON converter, but it has three limitations:

  • the JSON-pandas converter takes only a few data types into account,
  • the JSON-pandas converter is not always reversible (conversion round trip)
  • external data types (e.g. TableSchema types) are not included
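
The type loss behind the second limitation can be illustrated without pandas. The sketch below uses only the standard library (it is not the pandas converter itself): a `date` has no native JSON form, so it has to be written as a string, and nothing in the resulting JSON says it was ever a date.

```python
import json
from datetime import date

# Plain JSON has no date type, so dates are written as ISO strings (default=str).
original = {"dates": [date(1964, 1, 1), date(1985, 2, 5)]}
encoded = json.dumps(original, default=str)

# Decoding gives back plain strings: the type information is gone.
decoded = json.loads(encoded)

print(encoded)              # {"dates": ["1964-01-01", "1985-02-05"]}
print(decoded == original)  # False
```

A reversible converter therefore needs to carry the type somewhere alongside the value.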

main features

The NTV-pandas converter uses the semantic NTV format (https://loco-philippe.github.io/ES/JSON%20semantic%20format%20(JSON-NTV).htm) to include a large set of data types in a JSON representation.

The converter integrates:

  • all pandas dtypes and the data types associated with a JSON representation,
  • an always reversible conversion,
  • a full compatibility with TableSchema specification
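
One way to picture how a type can travel inside JSON is to attach it to the field name, as the `name::type` keys in the example further down do. The sketch below is only an illustration with the standard library: `encode_column` is a hypothetical helper, not part of the ntv_pandas API, and it handles a single type where the real converter covers many.

```python
import json
from datetime import date


def encode_column(name, values):
    """Hypothetical helper: return a JSON-ready (key, values) pair.

    Dates are written as ISO strings and the key gets a '::date' suffix,
    so a reader can tell them apart from ordinary strings.
    """
    if values and all(isinstance(v, date) for v in values):
        return f"{name}::date", [v.isoformat() for v in values]
    return name, list(values)


columns = {"dates": [date(1964, 1, 1), date(1985, 2, 5)], "value": [10, 20]}
payload = dict(encode_column(name, values) for name, values in columns.items())
print(json.dumps(payload))
# {"dates::date": ["1964-01-01", "1985-02-05"], "value": [10, 20]}
```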

NTV-pandas was originally developed in the json-NTV project (https://github.com/loco-philippe/NTV).

example

In the example below, a DataFrame with multiple data types is converted to JSON (first to NTV format and then to Table Schema format).

The DataFrames resulting from these JSON conversions are identical to the initial DataFrame (reversibility).

With the existing pandas JSON interface, these conversions are not possible.

data example

In [1]: from pprint import pprint
        from shapely.geometry import Point
        from datetime import date
        import pandas as pd
        import ntv_pandas as npd

In [2]: data = {'index':        [100, 200, 300, 400, 500],
                'dates::date':  [date(1964,1,1), date(1985,2,5), date(2022,1,21), date(1964,1,1), date(1985,2,5)],
                'value':        [10, 10, 20, 20, 30],
                'value32':      pd.Series([12, 12, 22, 22, 32], dtype='int32'),
                'res':          [10, 20, 30, 10, 20],
                'coord::point': [Point(1,2), Point(3,4), Point(5,6), Point(7,8), Point(3,4)],
                'names':        pd.Series(['john', 'eric', 'judith', 'mila', 'hector'], dtype='string'),
                'unique':       True }

In [3]: df = pd.DataFrame(data).set_index('index')
        df.index.name = None

In [4]: df
Out[4]:       dates::date  value  value32  res coord::point   names  unique
        100    1964-01-01     10       12   10  POINT (1 2)    john    True
        200    1985-02-05     10       12   20  POINT (3 4)    eric    True
        300    2022-01-21     20       22   30  POINT (5 6)  judith    True
        400    1964-01-01     20       22   10  POINT (7 8)    mila    True
        500    1985-02-05     30       32   20  POINT (3 4)  hector    True

JSON-NTV representation

In [5]: df_to_json = npd.to_json(df)
        pprint(df_to_json, compact=True, width=120, sort_dicts=False)
Out[5]: {':tab': {'index': [100, 200, 300, 400, 500],
                  'dates::date': ['1964-01-01', '1985-02-05', '2022-01-21', '1964-01-01', '1985-02-05'],
                  'value': [10, 10, 20, 20, 30],
                  'value32::int32': [12, 12, 22, 22, 32],
                  'res': [10, 20, 30, 10, 20],
                  'coord::point': [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [3.0, 4.0]],
                  'names::string': ['john', 'eric', 'judith', 'mila', 'hector'],
                  'unique': True}}
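
Reading such a payload back into typed values comes down to looking at the suffix after `::` in each key. The following is a rough standard-library sketch that assumes only the key layout visible in `Out[5]`; `decode_tab` and its small converter table are hypothetical and do not reflect how ntv_pandas is implemented.

```python
from datetime import date

# Hypothetical converters for two of the type suffixes seen in Out[5].
CONVERTERS = {
    "date": date.fromisoformat,   # '1964-01-01' -> date(1964, 1, 1)
    "point": tuple,               # [1.0, 2.0]   -> (1.0, 2.0)
}


def decode_tab(payload):
    """Return {column name: typed values} from a {':tab': {...}} payload."""
    columns = {}
    for key, values in payload[":tab"].items():
        # 'dates::date' -> ('dates', 'date'); 'index' -> ('index', '')
        name, _, type_name = key.partition("::")
        convert = CONVERTERS.get(type_name)
        if convert is not None and isinstance(values, list):
            values = [convert(v) for v in values]
        columns[name] = values
    return columns


payload = {":tab": {"index": [100, 200],
                    "dates::date": ["1964-01-01", "1985-02-05"],
                    "coord::point": [[1.0, 2.0], [3.0, 4.0]],
                    "unique": True}}
decoded = decode_tab(payload)
print(decoded["dates"][0])   # 1964-01-01
print(decoded["coord"][1])   # (3.0, 4.0)
```

Keys without a suffix, such as `index` or `unique`, are passed through unchanged in this sketch.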

Reversibility

In [6]: print(npd.read_json(df_to_json).equals(df))
Out[6]: True

Table Schema representation

In [7]: df_to_table = npd.to_json(df, table=True)
        pprint(df_to_table['data'][0], sort_dicts=False)
Out[7]: {'index': 100,
         'dates': '1964-01-01',
         'value': 10,
         'value32': 12,
         'res': 10,
         'coord': [1.0, 2.0],
         'names': 'john',
         'unique': True}

In [8]: pprint(df_to_table['schema'], sort_dicts=False)
Out[8]: {'fields': [{'name': 'index', 'type': 'integer'},
                    {'name': 'dates', 'type': 'date'},
                    {'name': 'value', 'type': 'integer'},
                    {'name': 'value32', 'type': 'integer', 'format': 'int32'},
                    {'name': 'res', 'type': 'integer'},
                    {'name': 'coord', 'type': 'geopoint', 'format': 'array'},
                    {'name': 'names', 'type': 'string'},
                    {'name': 'unique', 'type': 'boolean'}],
         'primaryKey': ['index'],
         'pandas_version': '1.4.0'}
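
Because the schema lists a `type` for every field, a consumer can check a data row against it with plain Python. The sketch below copies a few fields from `Out[7]` and `Out[8]`; the `PYTHON_TYPES` table and `check_row` are hypothetical helpers written for this illustration, not part of ntv_pandas or of the Table Schema specification.

```python
# Subset of the schema and of the first data row shown above.
schema = {"fields": [{"name": "index", "type": "integer"},
                     {"name": "dates", "type": "date"},
                     {"name": "value32", "type": "integer", "format": "int32"},
                     {"name": "names", "type": "string"},
                     {"name": "unique", "type": "boolean"}],
          "primaryKey": ["index"]}
row = {"index": 100, "dates": "1964-01-01", "value32": 12,
       "names": "john", "unique": True}

# Hypothetical mapping from schema types to the Python types found in the JSON data.
PYTHON_TYPES = {"integer": int, "string": str, "date": str, "boolean": bool}


def check_row(row, schema):
    """Return the names of fields whose value does not match the declared type."""
    mismatches = []
    for field in schema["fields"]:
        expected = PYTHON_TYPES[field["type"]]
        value = row[field["name"]]
        # bool is a subclass of int in Python, so reject it for 'integer' fields.
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            mismatches.append(field["name"])
    return mismatches


print(check_row(row, schema))                       # []
print(check_row({**row, "value32": "12"}, schema))  # ['value32']
```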

Reversibility

In [9]: print(npd.read_json(df_to_table).equals(df))
Out[9]: True
# -*- coding: utf-8 -*-
from ntv_pandas.pandas_ntv_connector import DataFrameConnec, SeriesConnec, read_json, to_json, as_def_type


#print('package :', __package__)