ntv-pandas.ntv_pandas
NTV-pandas Package
Created on Sept 2023
@author: philippe@loco-labs.io
This package contains the following classes and functions:
- ntv-pandas.ntv_pandas.pandas_ntv_connector:
  - ntv-pandas.ntv_pandas.pandas_ntv_connector.DataFrameConnec
  - ntv-pandas.ntv_pandas.pandas_ntv_connector.SeriesConnec
  - ntv-pandas.ntv_pandas.pandas_ntv_connector.PdUtil
  - ntv-pandas.ntv_pandas.pandas_ntv_connector.to_json
  - ntv-pandas.ntv_pandas.pandas_ntv_connector.read_json
  - ntv-pandas.ntv_pandas.pandas_ntv_connector.as_def_type
Why an NTV-pandas converter?
pandas provides a JSON converter, but it has three limitations:
- the JSON-pandas converter takes only a few data types into account,
- the JSON-pandas converter is not always reversible (conversion round trip),
- external data types (e.g. TableSchema types) are not included.
main features
The NTV-pandas converter uses the semantic NTV format to include a large set of data types in a JSON representation.
The converter integrates:
- all the pandas dtype and the data types associated with a JSON representation,
- an always reversible conversion,
- full compatibility with the TableSchema specification.
NTV-pandas was originally developed in the json-NTV project.
example
In the example below, a DataFrame with multiple data types is converted to JSON (first to NTV format and then to Table Schema format).
The DataFrames resulting from these JSON conversions are identical to the initial DataFrame (reversibility).
With the existing JSON interface, these conversions are not possible.
data example
In [1]: from pprint import pprint
        from datetime import date
        from shapely.geometry import Point
        import pandas as pd
        import ntv_pandas as npd
In [2]: data = {'index': [100, 200, 300, 400, 500],
                'dates::date': [date(1964,1,1), date(1985,2,5), date(2022,1,21), date(1964,1,1), date(1985,2,5)],
                'value': [10, 10, 20, 20, 30],
                'value32': pd.Series([12, 12, 22, 22, 32], dtype='int32'),
                'res': [10, 20, 30, 10, 20],
                'coord::point': [Point(1,2), Point(3,4), Point(5,6), Point(7,8), Point(3,4)],
                'names': pd.Series(['john', 'eric', 'judith', 'mila', 'hector'], dtype='string'),
                'unique': True }
In [3]: df = pd.DataFrame(data).set_index('index')
        df.index.name = None
In [4]: df
Out[4]:      dates::date  value  value32  res  coord::point   names  unique
        100   1964-01-01     10       12   10   POINT (1 2)    john    True
        200   1985-02-05     10       12   20   POINT (3 4)    eric    True
        300   2022-01-21     20       22   30   POINT (5 6)  judith    True
        400   1964-01-01     20       22   10   POINT (7 8)    mila    True
        500   1985-02-05     30       32   20   POINT (3 4)  hector    True
JSON-NTV representation
In [5]: df_to_json = npd.to_json(df)
        pprint(df_to_json, compact=True, width=120, sort_dicts=False)
Out[5]: {':tab': {'index': [100, 200, 300, 400, 500],
                  'dates::date': ['1964-01-01', '1985-02-05', '2022-01-21', '1964-01-01', '1985-02-05'],
                  'value': [10, 10, 20, 20, 30],
                  'value32::int32': [12, 12, 22, 22, 32],
                  'res': [10, 20, 30, 10, 20],
                  'coord::point': [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [3.0, 4.0]],
                  'names::string': ['john', 'eric', 'judith', 'mila', 'hector'],
                  'unique': True}}
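In the output above, typed columns appear with keys of the form `name::type` (for example `value32::int32`), while other keys carry no explicit type. The sketch below shows one way to separate the two parts. It is a reading of the example output only, not a statement of the full NTV specification, and `split_ntv_name` is a hypothetical helper, not part of the ntv_pandas API.

```python
# Column keys copied from the JSON-NTV output above.
ntv_keys = ['index', 'dates::date', 'value', 'value32::int32', 'res',
            'coord::point', 'names::string', 'unique']


def split_ntv_name(key):
    """Split a 'name::type' key into (name, type); type is None if absent.

    Hypothetical helper for illustration, not part of the ntv_pandas API.
    """
    name, sep, ntv_type = key.partition('::')
    return name, (ntv_type if sep else None)


fields = [split_ntv_name(key) for key in ntv_keys]
for name, ntv_type in fields:
    print(f'{name}: {ntv_type}')
```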
Reversibility
In [6]: print(npd.read_json(df_to_json).equals(df))
Out[6]: True
Table Schema representation
In [7]: df_to_table = npd.to_json(df, table=True)
        pprint(df_to_table['data'][0], sort_dicts=False)
Out[7]: {'index': 100,
         'dates': '1964-01-01',
         'value': 10,
         'value32': 12,
         'res': 10,
         'coord': [1.0, 2.0],
         'names': 'john',
         'unique': True}
In [8]: pprint(df_to_table['schema'], sort_dicts=False)
Out[8]: {'fields': [{'name': 'index', 'type': 'integer'},
                    {'name': 'dates', 'type': 'date'},
                    {'name': 'value', 'type': 'integer'},
                    {'name': 'value32', 'type': 'integer', 'format': 'int32'},
                    {'name': 'res', 'type': 'integer'},
                    {'name': 'coord', 'type': 'geopoint', 'format': 'array'},
                    {'name': 'names', 'type': 'string'},
                    {'name': 'unique', 'type': 'boolean'}],
         'primaryKey': ['index'],
         'pandas_version': '1.4.0'}
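The schema above is a plain dictionary, so it can be inspected with standard Python. As a minimal sketch (the variable names are chosen for this example and the schema literal is copied from the output above), the code below maps each non-key field to its Table Schema `type` and optional `format`, which is where the `int32` and `array` details are kept.

```python
# Schema copied from the Table Schema output above.
schema = {'fields': [{'name': 'index', 'type': 'integer'},
                     {'name': 'dates', 'type': 'date'},
                     {'name': 'value', 'type': 'integer'},
                     {'name': 'value32', 'type': 'integer', 'format': 'int32'},
                     {'name': 'res', 'type': 'integer'},
                     {'name': 'coord', 'type': 'geopoint', 'format': 'array'},
                     {'name': 'names', 'type': 'string'},
                     {'name': 'unique', 'type': 'boolean'}],
          'primaryKey': ['index'],
          'pandas_version': '1.4.0'}

primary_key = set(schema['primaryKey'])

# Map each non-key field name to (type, format); format is None when absent.
column_types = {field['name']: (field['type'], field.get('format'))
                for field in schema['fields']
                if field['name'] not in primary_key}

for name, (field_type, field_format) in column_types.items():
    print(name, field_type, field_format)
```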
Reversibility
In [9]: print(npd.read_json(df_to_table).equals(df))
Out[9]: True