# -*- coding: utf-8 -*-
"""
Created on Thu May 26 20:30:00 2022

@author: philippe@loco-labs.io

The `python.observation.dataset` module contains the `Dataset` class.

Documentation is available in other pages:

- The JSON standard for Dataset is defined
[here](https://github.com/loco-philippe/Environmental-Sensing/tree/main/documentation/DatasetJSON-Standard.pdf)
- The concept of 'indexed list' is described in
[this page](https://github.com/loco-philippe/Environmental-Sensing/wiki/Indexed-list).
- The non-regression tests are in
[this page](https://github.com/loco-philippe/Environmental-Sensing/blob/main/python/Tests/test_dataset.py)
- The [examples](https://github.com/loco-philippe/Environmental-Sensing/tree/main/python/Examples/Dataset)
are:
    - [creation](https://github.com/loco-philippe/Environmental-Sensing/blob/main/python/Examples/Dataset/Dataset_creation.ipynb)
    - [variable](https://github.com/loco-philippe/Environmental-Sensing/blob/main/python/Examples/Dataset/Dataset_variable.ipynb)
    - [update](https://github.com/loco-philippe/Environmental-Sensing/blob/main/python/Examples/Dataset/Dataset_update.ipynb)
    - [structure](https://github.com/loco-philippe/Environmental-Sensing/blob/main/python/Examples/Dataset/Dataset_structure.ipynb)
    - [structure-analysis](https://github.com/loco-philippe/Environmental-Sensing/blob/main/python/Examples/Dataset/Dataset_structure-analysis.ipynb)

---
"""
# %% declarations
from collections import Counter
from copy import copy
from abc import ABC
import math
import json
import csv

from observation.fields import Nfield
from observation.util import util
from observation.dataset_interface import DatasetInterface, DatasetError
from observation.dataset_structure import DatasetStructure
from observation.dataset_analysis import Analysis
from json_ntv.ntv import Ntv, NtvConnector


class Dataset(DatasetStructure, DatasetInterface, ABC):
    # %% intro
    '''
    A `Dataset` is a representation of an indexed list.

    *Attributes (for @property see methods)* :

    - **lindex** : list of Field
    - **analysis** : Analysis object (data structure)

    The methods defined in this class are :

    *constructor (@classmethod)*

    - `Dataset.ntv`
    - `Dataset.from_csv`
    - `Dataset.from_ntv`
    - `Dataset.from_file`
    - `Dataset.merge`

    *abstract static methods (@abstractmethod, @staticmethod)*

    - `Dataset.field_class`

    *dynamic value - module analysis (getters @property)*

    - `Dataset.extidx`
    - `Dataset.extidxext`
    - `Dataset.groups`
    - `Dataset.idxname`
    - `Dataset.idxlen`
    - `Dataset.iidx`
    - `Dataset.lenidx`
    - `Dataset.lidx`
    - `Dataset.lidxrow`
    - `Dataset.lisvar`
    - `Dataset.lvar`
    - `Dataset.lvarname`
    - `Dataset.lvarrow`
    - `Dataset.lunicname`
    - `Dataset.lunicrow`
    - `Dataset.primaryname`
    - `Dataset.setidx`
    - `Dataset.zip`

    *dynamic value (getters @property)*

    - `Dataset.keys`
    - `Dataset.iindex`
    - `Dataset.indexlen`
    - `Dataset.lenindex`
    - `Dataset.lname`
    - `Dataset.tiindex`

    *global value (getters @property)*

    - `Dataset.category`
    - `Dataset.complete`
    - `Dataset.consistent`
    - `Dataset.dimension`
    - `Dataset.lencomplete`
    - `Dataset.primary`
    - `Dataset.secondary`

    *selecting - infos methods (`observation.dataset_structure.DatasetStructure`)*

    - `Dataset.couplingmatrix`
    - `Dataset.idxrecord`
    - `Dataset.indexinfos`
    - `Dataset.indicator`
    - `Dataset.iscanonorder`
    - `Dataset.isinrecord`
    - `Dataset.keytoval`
    - `Dataset.loc`
    - `Dataset.nindex`
    - `Dataset.record`
    - `Dataset.recidx`
    - `Dataset.recvar`
    - `Dataset.tree`
    - `Dataset.valtokey`

    *add - update methods (`observation.dataset_structure.DatasetStructure`)*

    - `Dataset.add`
    - `Dataset.addindex`
    - `Dataset.append`
    - `Dataset.delindex`
    - `Dataset.delrecord`
    - `Dataset.orindex`
    - `Dataset.renameindex`
    - `Dataset.setvar`
    - `Dataset.setname`
    - `Dataset.updateindex`

    *structure management - methods (`observation.dataset_structure.DatasetStructure`)*

    - `Dataset.applyfilter`
    - `Dataset.coupling`
    - `Dataset.full`
    - `Dataset.getduplicates`
    - `Dataset.mix`
    - `Dataset.merging`
    - `Dataset.reindex`
    - `Dataset.reorder`
    - `Dataset.setfilter`
    - `Dataset.sort`
    - `Dataset.swapindex`
    - `Dataset.setcanonorder`
    - `Dataset.tostdcodec`

    *exports methods (`observation.dataset_interface.DatasetInterface`)*

    - `Dataset.json`
    - `Dataset.plot`
    - `Dataset.to_obj`
    - `Dataset.to_csv`
    - `Dataset.to_dataframe`
    - `Dataset.to_file`
    - `Dataset.to_ntv`
    - `Dataset.to_xarray`
    - `Dataset.view`
    - `Dataset.vlist`
    - `Dataset.voxel`
    '''

    field_class = None

    def __init__(self, listidx=None, reindex=True):
        '''
        Dataset constructor.

        *Parameters*

        - **listidx** : list (default None) - list of Field data
        - **reindex** : boolean (default True) - if True, set the default codec for each Field'''
        self.name = self.__class__.__name__
        self.field = self.field_class
        self.analysis = Analysis(self)
        self.lindex = []
        if listidx.__class__.__name__ in ['Dataset', 'Observation', 'Ndataset', 'Sdataset']:
            self.lindex = [copy(idx) for idx in listidx.lindex]
            return
        if not listidx:
            return
        self.lindex = listidx
        if reindex:
            self.reindex()
        self.analysis.actualize()
        return

    """@classmethod
    def dic(cls, idxdic=None, reindex=True):
        '''
        Dataset constructor (external dictionary).

        *Parameters*

        - **idxdic** : {name : values} (see data model)
        if not idxdic:
            return cls.ext(idxval=None, idxname=None, reindex=reindex)
        if isinstance(idxdic, Dataset):
            return idxdic
        if not isinstance(idxdic, dict):
            raise DatasetError("idxdic not dict")
        return cls.ext(idxval=list(idxdic.values()), idxname=list(idxdic.keys()),
                       reindex=reindex)"""

    """@classmethod
    def ext(cls, idxval=None, idxname=None, reindex=True):
        '''
        Dataset constructor (external index).

        *Parameters*

        - **idxval** : list of Field or list of values (see data model)
        - **idxname** : list of string (default None) - list of Field name (see data model)
        if idxval is None:
            idxval = []
        if not isinstance(idxval, list):
            return None
        val = [[idx] if not isinstance(idx, list) else idx for idx in idxval]
        lenval = [len(idx) for idx in val]
        if lenval and max(lenval) != min(lenval):
            raise DatasetError('the lengths of the Fields are different')
        length = lenval[0] if lenval else 0
        if idxname is None:
            idxname = [None] * len(val)
        for ind, name in enumerate(idxname):
            if name is None or name == ES.defaultindex:
                idxname[ind] = 'i' + str(ind)
        lidx = [list(FieldInterface.decodeobj(
            idx, typevalue, context=False)) for idx in val]
        lindex = [Field(idx[2], name, list(range(length)), idx[1],
                        lendefault=length, reindex=reindex)
                  for idx, name in zip(lidx, idxname)]
        return cls(lindex, reindex=False)"""

    @classmethod
    def from_csv(cls, filename='dataset.csv', header=True, nrow=None, decode_str=True,
                 decode_json=True, optcsv={'quoting': csv.QUOTE_NONNUMERIC}):
        '''
        Dataset constructor (from a csv file). Each column represents index values.

        *Parameters*

        - **filename** : string (default 'dataset.csv'), name of the file to read
        - **header** : boolean (default True).
          If True, the first row is dedicated to names
        - **nrow** : integer (default None). Number of rows to read. If None, all rows are read
        - **decode_str** : boolean (default True) - forwarded to Field.from_ntv
        - **decode_json** : boolean (default True) - if True, try to decode each cell as JSON
        - **optcsv** : dict (default : quoting) - see csv.reader options'''
        if not optcsv:
            optcsv = {}
        if not nrow:
            nrow = -1
        with open(filename, newline='', encoding="utf-8") as file:
            reader = csv.reader(file, **optcsv)
            irow = 0
            for row in reader:
                if irow == nrow:
                    break
                if irow == 0:
                    idxval = [[] for i in range(len(row))]
                    idxname = [''] * len(row)
                if irow == 0 and header:
                    idxname = row
                else:
                    for i in range(len(row)):
                        if decode_json:
                            try:
                                idxval[i].append(json.loads(row[i]))
                            except (json.JSONDecodeError, TypeError):
                                idxval[i].append(row[i])
                        else:
                            idxval[i].append(row[i])
                irow += 1
        lindex = [cls.field_class.from_ntv({name: idx}, decode_str=decode_str)
                  for idx, name in zip(idxval, idxname)]
        return cls(listidx=lindex, reindex=True)

    @classmethod
    def from_file(cls, filename, forcestring=False, reindex=True, decode_str=False):
        '''
        Generate Object from file storage.

        *Parameters*

        - **filename** : string - file name (with path)
        - **forcestring** : boolean (default False) - if True,
          forces the UTF-8 data format, else the format is detected
        - **reindex** : boolean (default True) - if True, set the default codec for each Field
        - **decode_str** : boolean (default False) - if True, strings are loaded as json data

        *Returns* : new Object'''
        with open(filename, 'rb') as file:
            btype = file.read(1)
        if btype == bytes('[', 'UTF-8') or btype == bytes('{', 'UTF-8') or forcestring:
            with open(filename, 'r', newline='', encoding="utf-8") as file:
                bjson = file.read()
        else:
            with open(filename, 'rb') as file:
                bjson = file.read()
        return cls.from_ntv(bjson, reindex=reindex, decode_str=decode_str)

    """@classmethod
    def obj(cls, bsd=None, reindex=True, context=True):
        '''
        Generate a new Object from a bytes, string or list value

        *Parameters*

        - **bsd** : bytes, string or list data to convert
        - **reindex** : boolean (default True) - if True, default codec for each Field
        - **context** : boolean (default True) - if False, only codec and keys are included'''
        return cls.from_obj(bsd, reindex=reindex, context=context)"""

    @classmethod
    def ntv(cls, ntv_value, reindex=True):
        '''Generate a Dataset Object from a ntv_value

        *Parameters*

        - **ntv_value** : bytes, string, Ntv object to convert
        - **reindex** : boolean (default True) - if True, set the default codec for each Field'''
        return cls.from_ntv(ntv_value, reindex=reindex)

    @classmethod
    def from_ntv(cls, ntv_value, reindex=True, decode_str=False):
        '''Generate a Dataset Object from a ntv_value

        *Parameters*

        - **ntv_value** : bytes, string, Ntv object to convert
        - **reindex** : boolean (default True) - if True, set the default codec for each Field
        - **decode_str** : boolean (default False) - if True, strings are loaded as json data'''
        ntv = Ntv.obj(ntv_value, decode_str=decode_str)
        if len(ntv) == 0:
            return cls()
        lidx = [list(cls.field_class.decode_ntv(ntvf)) for ntvf in ntv]
        leng = max([idx[6] for idx in lidx])
        for ind in range(len(lidx)):
            if lidx[ind][0] == '':
                lidx[ind][0] = 'i' + str(ind)
            NtvConnector.init_ntv_keys(ind, lidx, leng)
            # Dataset._init_ntv_keys(ind, lidx, leng)
        lindex = [cls.field_class(idx[2], idx[0], idx[4], None,  # idx[1] for the type
                                  reindex=reindex) for idx in lidx]
        return cls(lindex, reindex=reindex)

    """@classmethod
    def from_obj(cls, bsd=None, reindex=True, context=True):
        '''
        Generate a Dataset Object from a bytes, string or list value

        *Parameters*

        - **bsd** : bytes, string, DataFrame or list data to convert
        - **reindex** : boolean (default True) - if True, default codec for each Field
        - **context** : boolean (default True) - if False, only codec and keys are included'''
        if isinstance(bsd, cls):
            return bsd
        if bsd is None:
            bsd = []
        if isinstance(bsd, bytes):
            lis = cbor2.loads(bsd)
        elif isinstance(bsd, str):
            lis = json.loads(bsd, object_hook=CborDecoder().codecbor)
        elif isinstance(bsd, (list, dict)) or bsd.__class__.__name__ == 'DataFrame':
            lis = bsd
        else:
            raise DatasetError("the type of parameter is not available")
        return cls._init_obj(lis, reindex=reindex, context=context)"""

    def merge(self, fillvalue=math.nan, reindex=False, simplename=False):
        '''
        The merge method replaces nested Dataset objects with their constituent Fields.

        *Parameters*

        - **fillvalue** : object (default nan) - value used for the additional data
        - **reindex** : boolean (default False) - if True, set the default codec after transformation
        - **simplename** : boolean (default False) - if True, new Field names are
          the same as the merged Field names, else a composed name is used.

        *Returns*: merged Dataset'''
        ilc = copy(self)
        delname = []
        row = ilc[0]
        if not isinstance(row, list):
            row = [row]
        merged, oldname, newname = Dataset._mergerecord(self.ext(row, ilc.lname),
                                                        simplename=simplename)
        if oldname and oldname not in merged.lname:
            delname.append(oldname)
        for ind in range(1, len(ilc)):
            oldidx = ilc.nindex(oldname)
            for name in newname:
                ilc.addindex(self.field(oldidx.codec, name, oldidx.keys))
            row = ilc[ind]
            if not isinstance(row, list):
                row = [row]
            rec, oldname, newname = Dataset._mergerecord(self.ext(row, ilc.lname),
                                                         simplename=simplename)
            if oldname and newname != [oldname]:
                delname.append(oldname)
                for name in newname:
                    oldidx = merged.nindex(oldname)
                    fillval = self.field.s_to_i(fillvalue)
                    merged.addindex(
                        self.field([fillval] * len(merged), name, oldidx.keys))
            merged += rec
        for name in set(delname):
            if name:
                merged.delindex(name)
        if reindex:
            merged.reindex()
        ilc.lindex = merged.lindex
        return ilc

    @classmethod
    def ext(cls, idxval=None, idxname=None, reindex=True, fast=False):
        '''
        Dataset constructor (external index).

        *Parameters*

        - **idxval** : list of Field or list of values (see data model)
        - **idxname** : list of string (default None) - list of Field name (see data model)'''
        if idxval is None:
            idxval = []
        if not isinstance(idxval, list):
            return None
        val = []
        for idx in idxval:
            if not isinstance(idx, list):
                val.append([idx])
            else:
                val.append(idx)
        lenval = [len(idx) for idx in val]
        if lenval and max(lenval) != min(lenval):
            raise DatasetError('the lengths of the Fields are different')
        length = lenval[0] if lenval else 0
        idxname = [None] * len(val) if idxname is None else idxname
        for ind, name in enumerate(idxname):
            if name is None or name == '$default':
                idxname[ind] = 'i' + str(ind)
        lindex = [cls.field_class(codec, name, lendefault=length, reindex=reindex,
                                  fast=fast) for codec, name in zip(val, idxname)]
        return cls(lindex, reindex=False)

# %% internal

    """@staticmethod
    def _init_ntv_keys(ind, lidx, leng):
        ''' initialization of explicit keys data in lidx object'''
        # name: 0, type: 1, codec: 2, parent: 3, keys: 4, coef: 5, leng: 6
        name, typ, codec, parent, keys, coef, length = lidx[ind]
        if (keys, parent, coef) == (None, None, None):  # full or unique
            if len(codec) == 1:  # unique
                lidx[ind][4] = [0] * leng
            elif len(codec) == leng:  # full
                lidx[ind][4] = list(range(leng))
            else:
                raise DatasetError('impossible to generate keys')
            return
        if keys and len(keys) > 1 and parent is None:  # complete
            return
        if coef:  # primary
            lidx[ind][4] = [(ikey % (coef * len(codec))) // coef for ikey in range(leng)]
            lidx[ind][3] = None
            return
        if parent is None:
            raise DatasetError('keys not referenced')
        if not lidx[parent][4] or len(lidx[parent][4]) != leng:
            Dataset._init_ntv_keys(parent, lidx, leng)
        if not keys and len(codec) == len(lidx[parent][2]):  # implicit
            lidx[ind][4] = lidx[parent][4]
            lidx[ind][3] = None
            return
        lidx[ind][4] = Nfield.keysfromderkeys(lidx[parent][4], keys)  # relative
        lidx[ind][3] = None
        return"""

    @staticmethod
    def _mergerecord(rec, mergeidx=True, updateidx=True, simplename=False):
        # row = rec[0] if isinstance(rec, list) else rec
        row = rec[0]
        if not isinstance(row, list):
            row = [row]
        var = -1
        for ind, val in enumerate(row):
            if val.__class__.__name__ in ['Sdataset', 'Ndataset', 'Observation']:
                var = ind
                break
        if var < 0:
            return (rec, None, [])
        ilis = row[var]
        oldname = rec.lname[var]
        if ilis.lname == ['i0']:
            newname = [oldname]
            ilis.setname(newname)
        elif not simplename:
            newname = [oldname + '_' + name for name in ilis.lname]
            ilis.setname(newname)
        else:
            newname = copy(ilis.lname)
        for name in rec.lname:
            if name in newname:
                newname.remove(name)
            else:
                updidx = name in ilis.lname and not updateidx
                ilis.addindex({name: [rec.nindex(name)[0]] * len(ilis)},
                              merge=mergeidx, update=updidx)
                # ilis.addindex([name, [rec.nindex(name)[0]] * len(ilis)],
                #               merge=mergeidx, update=updidx)
        return (ilis, oldname, newname)

# %% special
    def __str__(self):
        '''return string format for var and lidx'''
        stri = ''
        if self.lvar:
            stri += 'variables :\n'
            for idx in self.lvar:
                stri += ' ' + str(idx) + '\n'
        if self.lidx:
            stri += 'index :\n'
            for idx in self.lidx:
                stri += ' ' + str(idx) + '\n'
        return stri

    def __repr__(self):
        '''return classname, number of values and number of indexes'''
        return self.__class__.__name__ + '[' + str(len(self)) + ', ' + str(self.lenindex) + ']'

    def __len__(self):
        ''' len of values'''
        if not self.lindex:
            return 0
        return len(self.lindex[0])

    def __contains__(self, item):
        ''' list of lindex values'''
        return item in self.lindex

    def __getitem__(self, ind):
        ''' return value record (value conversion)'''
        res = [idx[ind] for idx in self.lindex]
        if len(res) == 1:
            return res[0]
        return res

    def __setitem__(self, ind, item):
        ''' modify the Field values for each Field at the row ind'''
        if not isinstance(item, list):
            item = [item]
        for val, idx in zip(item, self.lindex):
            idx[ind] = val

    def __delitem__(self, ind):
        ''' remove all Field items at the row ind'''
        for idx in self.lindex:
            del idx[ind]

    def __hash__(self):
        '''return the sum of all hash(Field)'''
        return sum([hash(idx) for idx in self.lindex])

    def _hashi(self):
        '''return the sum of all hashi(Field)'''
        return sum([idx._hashi() for idx in self.lindex])

    def __eq__(self, other):
        ''' equal if hash values are equal'''
        return hash(self) == hash(other)

    def __add__(self, other):
        ''' Add other's values to self's values in a new Dataset'''
        newil = copy(self)
        newil.__iadd__(other)
        return newil

    def __iadd__(self, other):
        ''' Add other's values to self's values'''
        return self.add(other, name=True, solve=False)

    def __or__(self, other):
        ''' Add other's index to self's index in a new Dataset'''
        newil = copy(self)
        newil.__ior__(other)
        return newil

    def __ior__(self, other):
        ''' Add other's index to self's index'''
        return self.orindex(other, first=False, merge=True, update=False)

    def __copy__(self):
        ''' Copy all the data '''
        return self.__class__(self)

# %% property
    @property
    def complete(self):
        '''return a boolean (True if Dataset is complete and consistent)'''
        return self.lencomplete == len(self) and self.consistent

    @property
    def consistent(self):
        ''' True if all the records are different'''
        if not self.iidx:
            return True
        return max(Counter(zip(*self.iidx)).values()) == 1

    @property
    def category(self):
        ''' dict with the category of each Field'''
        return {field['name']: field['cat'] for field in self.indexinfos()}

    @property
    def dimension(self):
        ''' integer : number of primary Fields'''
        return len(self.primary)

    @property
    def extidx(self):
        '''idx values (see data model)'''
        return [idx.values for idx in self.lidx]

    @property
    def extidxext(self):
        '''idx val (see data model)'''
        return [idx.val for idx in self.lidx]

    @property
    def groups(self):
        ''' list of crossed Field groups'''
        return self.analysis.getgroups()

    @property
    def idxname(self):
        ''' list of idx names'''
        return [idx.name for idx in self.lidx]

    @property
    def idxlen(self):
        ''' list of idx codec lengths'''
        return [len(idx.codec) for idx in self.lidx]

    @property
    def indexlen(self):
        ''' list of index codec lengths'''
        return [len(idx.codec) for idx in self.lindex]

    @property
    def iidx(self):
        ''' list of keys for each idx'''
        return [idx.keys for idx in self.lidx]

    @property
    def iindex(self):
        ''' list of keys for each index'''
        return [idx.keys for idx in self.lindex]

    @property
    def keys(self):
        ''' list of keys for each index'''
        return [idx.keys for idx in self.lindex]

    @property
    def lencomplete(self):
        '''number of values if complete (product of the primary idxlen)'''
        primary = self.primary
        return util.mul([self.idxlen[i] for i in primary])

    @property
    def lenindex(self):
        ''' number of indexes'''
        return len(self.lindex)

    @property
    def lenidx(self):
        ''' number of idx'''
        return len(self.lidx)

    @property
    def lidx(self):
        '''list of idx'''
        return [self.lindex[i] for i in self.lidxrow]

    @property
    def lisvar(self):
        '''list of boolean : True if the Field is a variable'''
        return [name in self.lvarname for name in self.lname]

    @property
    def lvar(self):
        '''list of variables'''
        return [self.lindex[i] for i in self.lvarrow]

    @property
    def lvarname(self):
        ''' list of variable Field names'''
        return self.analysis.getvarname()

    @property
    def lunicrow(self):
        '''list of unique idx rows'''
        return [self.lname.index(name) for name in self.lunicname]

    @property
    def lvarrow(self):
        '''list of variable rows'''
        return [self.lname.index(name) for name in self.lvarname]

    @property
    def lidxrow(self):
        '''list of idx rows'''
        return [i for i in range(self.lenindex) if i not in self.lvarrow]

    @property
    def lunicname(self):
        ''' list of unique index names'''
        return [idx.name for idx in self.lindex if len(idx.codec) == 1]

    @property
    def lname(self):
        ''' list of index names'''
        return [idx.name for idx in self.lindex]

    @property
    def primary(self):
        ''' list of primary idx'''
        return self.analysis.getprimary()

    @property
    def primaryname(self):
        ''' list of primary names'''
        return [self.lidx[idx].name for idx in self.primary]

    @property
    def secondary(self):
        ''' list of secondary idx'''
        return self.analysis.getsecondary()

    @property
    def secondaryname(self):
        ''' list of secondary names'''
        return [self.lindex[idx].name for idx in self.secondary]

    @property
    def setidx(self):
        '''list of codecs for each idx'''
        return [idx.codec for idx in self.lidx]

    @property
    def tiindex(self):
        ''' list of keys for each record'''
        return util.list(list(zip(*self.iindex)))

    @property
    def zip(self):
        '''return a zip format for transpose(extidx) : tuple(tuple(rec))'''
        textidx = util.transpose(self.extidx)
        if not textidx:
            return None
        return tuple(tuple(idx) for idx in textidx)
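The `consistent` property above counts duplicate key tuples across records with a `Counter` over `zip(*self.iidx)`. A minimal self-contained sketch of the same check, written outside the class on plain key lists (hypothetical data, not tied to the Field data model):

```python
from collections import Counter

def consistent(iidx):
    # iidx: one list of integer keys per index; a record is the tuple of
    # keys taken at the same position in every list.
    if not iidx:
        return True
    # the dataset is consistent when no record (key tuple) appears twice
    return max(Counter(zip(*iidx)).values()) == 1

# two indexes, four records: all (key0, key1) pairs are distinct
print(consistent([[0, 0, 1, 1], [0, 1, 0, 1]]))   # True
# the first and last records share the same key pair (0, 0)
print(consistent([[0, 1, 0, 0], [0, 1, 1, 0]]))   # False
```

This is the invariant that `complete` combines with `lencomplete == len(self)` to decide whether the Dataset is a full crossed product with no duplicated record.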
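In the commented-out `_init_ntv_keys`, a primary field's keys are regenerated from a coefficient: key `i` is `(i % (coef * len(codec))) // coef`. A standalone illustration of that formula (the codec sizes and coefficients here are invented for the example), showing how two primary fields with matching coefficients enumerate a full crossed product:

```python
def primary_keys(coef, codec_len, leng):
    # rebuild the key list of a primary field from its coefficient:
    # the key cycles through codec_len values, each repeated coef times
    return [(i % (coef * codec_len)) // coef for i in range(leng)]

# a 2 x 3 crossed product (leng = 6): the first field varies slowly
# (coef = 3), the second varies quickly (coef = 1)
slow = primary_keys(3, 2, 6)   # [0, 0, 0, 1, 1, 1]
fast = primary_keys(1, 3, 6)   # [0, 1, 2, 0, 1, 2]
print(list(zip(slow, fast)))   # the 6 distinct (slow, fast) pairs
```

This is why a primary field only needs its coefficient and codec stored: the explicit key list is fully reconstructible.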
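`from_csv` accumulates values column by column and, when `decode_json` is set, tries to interpret each cell as JSON before falling back to the raw string. The decode-or-fallback step in isolation (a sketch with a hypothetical `decode_cell` helper, not the class method itself):

```python
import json

def decode_cell(cell):
    # interpret the cell as JSON when possible, otherwise keep the raw value
    try:
        return json.loads(cell)
    except (json.JSONDecodeError, TypeError):
        return cell

row = ['21', '{"unit": "kg"}', 'paris', 'true']
# numbers, objects and booleans are decoded; plain text stays a string
print([decode_cell(cell) for cell in row])
```

`TypeError` is caught as well because, with the default `csv.QUOTE_NONNUMERIC` quoting, unquoted cells arrive as floats rather than strings.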
425 426 *Parameters* 427 428 - **idxval** : list of Field or list of values (see data model) 429 - **idxname** : list of string (default None) - list of Field name (see data model)''' 430 if idxval is None: 431 idxval = [] 432 if not isinstance(idxval, list): 433 return None 434 val = [] 435 for idx in idxval: 436 if not isinstance(idx, list): 437 val.append([idx]) 438 else: 439 val.append(idx) 440 lenval = [len(idx) for idx in val] 441 if lenval and max(lenval) != min(lenval): 442 raise DatasetError('the length of Iindex are different') 443 length = lenval[0] if lenval else 0 444 idxname = [None] * len(val) if idxname is None else idxname 445 for ind, name in enumerate(idxname): 446 if name is None or name == '$default': 447 idxname[ind] = 'i'+str(ind) 448 lindex = [cls.field_class(codec, name, lendefault=length, reindex=reindex, 449 fast=fast) for codec, name in zip(val, idxname)] 450 return cls(lindex, reindex=False) 451 452# %% internal 453 454 """@staticmethod 455 def _init_ntv_keys(ind, lidx, leng): 456 ''' initialization of explicit keys data in lidx object''' 457 # name: 0, type: 1, codec: 2, parent: 3, keys: 4, coef: 5, leng: 6 458 name, typ, codec, parent, keys, coef, length = lidx[ind] 459 if (keys, parent, coef) == (None, None, None): # full or unique 460 if len(codec) == 1: # unique 461 lidx[ind][4] = [0] * leng 462 elif len(codec) == leng: # full 463 lidx[ind][4] = list(range(leng)) 464 else: 465 raise DatasetError('impossible to generate keys') 466 return 467 if keys and len(keys) > 1 and parent is None: #complete 468 return 469 if coef: #primary 470 lidx[ind][4] = [(ikey % (coef * len(codec))) // coef for ikey in range(leng)] 471 lidx[ind][3] = None 472 return 473 if parent is None: 474 raise DatasetError('keys not referenced') 475 if not lidx[parent][4] or len(lidx[parent][4]) != leng: 476 Dataset._init_ntv_keys(parent, lidx, leng) 477 if not keys and len(codec) == len(lidx[parent][2]): # implicit 478 lidx[ind][4] = lidx[parent][4] 479 lidx[ind][3] 
= None 480 return 481 lidx[ind][4] = Nfield.keysfromderkeys(lidx[parent][4], keys) # relative 482 lidx[ind][3] = None 483 return""" 484 485 @staticmethod 486 def _mergerecord(rec, mergeidx=True, updateidx=True, simplename=False): 487 #row = rec[0] if isinstance(rec, list) else rec 488 row = rec[0] 489 if not isinstance(row, list): 490 row = [row] 491 var = -1 492 for ind, val in enumerate(row): 493 if val.__class__.__name__ in ['Sdataset', 'Ndataset', 'Observation']: 494 var = ind 495 break 496 if var < 0: 497 return (rec, None, []) 498 ilis = row[var] 499 oldname = rec.lname[var] 500 if ilis.lname == ['i0']: 501 newname = [oldname] 502 ilis.setname(newname) 503 elif not simplename: 504 newname = [oldname + '_' + name for name in ilis.lname] 505 ilis.setname(newname) 506 else: 507 newname = copy(ilis.lname) 508 for name in rec.lname: 509 if name in newname: 510 newname.remove(name) 511 else: 512 updidx = name in ilis.lname and not updateidx 513 ilis.addindex({name: [rec.nindex(name)[0]] * len(ilis)}, 514 merge=mergeidx, update=updidx) 515 #ilis.addindex([name, [rec.nindex(name)[0]] * len(ilis)], 516 # merge=mergeidx, update=updidx) 517 return (ilis, oldname, newname) 518 519# %% special 520 def __str__(self): 521 '''return string format for var and lidx''' 522 stri = '' 523 if self.lvar: 524 stri += 'variables :\n' 525 for idx in self.lvar: 526 stri += ' ' + str(idx) + '\n' 527 if self.lidx: 528 stri += 'index :\n' 529 for idx in self.lidx: 530 stri += ' ' + str(idx) + '\n' 531 return stri 532 533 def __repr__(self): 534 '''return classname, number of value and number of indexes''' 535 return self.__class__.__name__ + '[' + str(len(self)) + ', ' + str(self.lenindex) + ']' 536 537 def __len__(self): 538 ''' len of values''' 539 if not self.lindex: 540 return 0 541 return len(self.lindex[0]) 542 543 def __contains__(self, item): 544 ''' list of lindex values''' 545 return item in self.lindex 546 547 def __getitem__(self, ind): 548 ''' return value record (value 
conversion)''' 549 res = [idx[ind] for idx in self.lindex] 550 if len(res) == 1: 551 return res[0] 552 return res 553 554 def __setitem__(self, ind, item): 555 ''' modify the Field values for each Field at the row ind''' 556 if not isinstance(item, list): 557 item = [item] 558 for val, idx in zip(item, self.lindex): 559 idx[ind] = val 560 561 def __delitem__(self, ind): 562 ''' remove all Field item at the row ind''' 563 for idx in self.lindex: 564 del idx[ind] 565 566 def __hash__(self): 567 '''return sum of all hash(Field)''' 568 return sum([hash(idx) for idx in self.lindex]) 569 570 def _hashi(self): 571 '''return sum of all hashi(Field)''' 572 return sum([idx._hashi() for idx in self.lindex]) 573 574 def __eq__(self, other): 575 ''' equal if hash values are equal''' 576 return hash(self) == hash(other) 577 578 def __add__(self, other): 579 ''' Add other's values to self's values in a new Dataset''' 580 newil = copy(self) 581 newil.__iadd__(other) 582 return newil 583 584 def __iadd__(self, other): 585 ''' Add other's values to self's values''' 586 return self.add(other, name=True, solve=False) 587 588 def __or__(self, other): 589 ''' Add other's index to self's index in a new Dataset''' 590 newil = copy(self) 591 newil.__ior__(other) 592 return newil 593 594 def __ior__(self, other): 595 ''' Add other's index to self's index''' 596 return self.orindex(other, first=False, merge=True, update=False) 597 598 def __copy__(self): 599 ''' Copy all the data ''' 600 return self.__class__(self) 601 602# %% property 603 @property 604 def complete(self): 605 '''return a boolean (True if Dataset is complete and consistent)''' 606 return self.lencomplete == len(self) and self.consistent 607 608 @property 609 def consistent(self): 610 ''' True if all the record are different''' 611 if not self.iidx: 612 return True 613 return max(Counter(zip(*self.iidx)).values()) == 1 614 615 @property 616 def category(self): 617 ''' dict with category for each Field''' 618 return 
{field['name']: field['cat'] for field in self.indexinfos()} 619 620 @property 621 def dimension(self): 622 ''' integer : number of primary Field''' 623 return len(self.primary) 624 625 @property 626 def extidx(self): 627 '''idx values (see data model)''' 628 return [idx.values for idx in self.lidx] 629 630 @property 631 def extidxext(self): 632 '''idx val (see data model)''' 633 return [idx.val for idx in self.lidx] 634 635 @property 636 def groups(self): 637 ''' list with crossed Field groups''' 638 return self.analysis.getgroups() 639 640 @property 641 def idxname(self): 642 ''' list of idx name''' 643 return [idx.name for idx in self.lidx] 644 645 @property 646 def idxlen(self): 647 ''' list of idx codec length''' 648 return [len(idx.codec) for idx in self.lidx] 649 650 @property 651 def indexlen(self): 652 ''' list of index codec length''' 653 return [len(idx.codec) for idx in self.lindex] 654 655 @property 656 def iidx(self): 657 ''' list of keys for each idx''' 658 return [idx.keys for idx in self.lidx] 659 660 @property 661 def iindex(self): 662 ''' list of keys for each index''' 663 return [idx.keys for idx in self.lindex] 664 665 @property 666 def keys(self): 667 ''' list of keys for each index''' 668 return [idx.keys for idx in self.lindex] 669 670 @property 671 def lencomplete(self): 672 '''number of values if complete (prod(idxlen primary))''' 673 primary = self.primary 674 return util.mul([self.idxlen[i] for i in primary]) 675 676 @property 677 def lenindex(self): 678 ''' number of indexes''' 679 return len(self.lindex) 680 681 @property 682 def lenidx(self): 683 ''' number of idx''' 684 return len(self.lidx) 685 686 @property 687 def lidx(self): 688 '''list of idx''' 689 return [self.lindex[i] for i in self.lidxrow] 690 691 @property 692 def lisvar(self): 693 '''list of boolean : True if Field is var''' 694 return [name in self.lvarname for name in self.lname] 695 696 @property 697 def lvar(self): 698 '''list of var''' 699 return [self.lindex[i] for 
i in self.lvarrow] 700 701 @property 702 def lvarname(self): 703 ''' list of variable Field name''' 704 return self.analysis.getvarname() 705 706 @property 707 def lunicrow(self): 708 '''list of unic idx row''' 709 return [self.lname.index(name) for name in self.lunicname] 710 711 @property 712 def lvarrow(self): 713 '''list of var row''' 714 return [self.lname.index(name) for name in self.lvarname] 715 716 @property 717 def lidxrow(self): 718 '''list of idx row''' 719 return [i for i in range(self.lenindex) if i not in self.lvarrow] 720 721 @property 722 def lunicname(self): 723 ''' list of unique index name''' 724 return [idx.name for idx in self.lindex if len(idx.codec) == 1] 725 726 @property 727 def lname(self): 728 ''' list of index name''' 729 return [idx.name for idx in self.lindex] 730 731 @property 732 def primary(self): 733 ''' list of primary idx''' 734 return self.analysis.getprimary() 735 736 @property 737 def primaryname(self): 738 ''' list of primary name''' 739 return [self.lidx[idx].name for idx in self.primary] 740 741 @property 742 def secondary(self): 743 ''' list of secondary idx''' 744 return self.analysis.getsecondary() 745 746 @property 747 def secondaryname(self): 748 ''' list of secondary name''' 749 return [self.lindex[idx].name for idx in self.secondary] 750 751 @property 752 def setidx(self): 753 '''list of codec for each idx''' 754 return [idx.codec for idx in self.lidx] 755 756 @property 757 def tiindex(self): 758 ''' list of keys for each record''' 759 return util.list(list(zip(*self.iindex))) 760 761 @property 762 def zip(self): 763 '''return a zip format for transpose(extidx) : tuple(tuple(rec))''' 764 textidx = util.transpose(self.extidx) 765 if not textidx: 766 return None 767 return tuple(tuple(idx) for idx in textidx)
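In the commented-out `_init_ntv_keys` above, a primary Field's keys are generated purely from its `coef` and codec length. The formula can be checked in isolation (a standalone sketch; `primary_keys` is a hypothetical name, not a library function):

```python
def primary_keys(coef, codec_len, leng):
    """Keys of a primary field in canonical order: each codec value is held
    for `coef` consecutive records, and the cycle repeats up to `leng`."""
    return [(ikey % (coef * codec_len)) // coef for ikey in range(leng)]

# 2 codec values, each repeated 3 times, cycling over 12 records
print(primary_keys(3, 2, 12))  # [0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1]
```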
A Dataset is a representation of an indexed list.
Attributes (for @property see methods) :
- lindex : list of Field
- analysis : Analysis object (data structure)
The methods defined in this class are :
constructor (@classmethod)
abstract static methods (@abstractmethod, @staticmethod)
dynamic value - module analysis (getters @property)
Dataset.extidx
Dataset.extidxext
Dataset.groups
Dataset.idxname
Dataset.idxlen
Dataset.iidx
Dataset.lenidx
Dataset.lidx
Dataset.lidxrow
Dataset.lisvar
Dataset.lvar
Dataset.lvarname
Dataset.lvarrow
Dataset.lunicname
Dataset.lunicrow
Dataset.primaryname
Dataset.setidx
Dataset.zip
dynamic value (getters @property)
global value (getters @property)
Dataset.category
Dataset.complete
Dataset.consistent
Dataset.dimension
Dataset.lencomplete
Dataset.primary
Dataset.secondary
selecting - infos methods (observation.dataset_structure.DatasetStructure)
Dataset.couplingmatrix
Dataset.idxrecord
Dataset.indexinfos
Dataset.indicator
Dataset.iscanonorder
Dataset.isinrecord
Dataset.keytoval
Dataset.loc
Dataset.nindex
Dataset.record
Dataset.recidx
Dataset.recvar
Dataset.tree
Dataset.valtokey
add - update methods (observation.dataset_structure.DatasetStructure)
Dataset.add
Dataset.addindex
Dataset.append
Dataset.delindex
Dataset.delrecord
Dataset.orindex
Dataset.renameindex
Dataset.setvar
Dataset.setname
Dataset.updateindex
structure management - methods (observation.dataset_structure.DatasetStructure)
Dataset.applyfilter
Dataset.coupling
Dataset.full
Dataset.getduplicates
Dataset.mix
Dataset.merging
Dataset.reindex
Dataset.reorder
Dataset.setfilter
Dataset.sort
Dataset.swapindex
Dataset.setcanonorder
Dataset.tostdcodec
exports methods (observation.dataset_interface.DatasetInterface)
Dataset.json
Dataset.plot
Dataset.to_obj
Dataset.to_csv
Dataset.to_dataframe
Dataset.to_file
Dataset.to_ntv
Dataset.to_xarray
Dataset.view
Dataset.vlist
Dataset.voxel
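Several of the properties above (`keys`, `setidx`, `extidx`) expose the two layers of the indexed-list model: each Field stores a `codec` of distinct values plus integer `keys` pointing into it. A toy factorization in plain Python (not the library's Field class) illustrates the idea:

```python
def encode(values):
    """Factor values into (codec, keys) with values[i] == codec[keys[i]]."""
    codec = list(dict.fromkeys(values))        # distinct values, first-seen order
    keys = [codec.index(val) for val in values]
    return codec, keys

def decode(codec, keys):
    """Rebuild the full value list from the factorized form."""
    return [codec[key] for key in keys]

codec, keys = encode(['paris', 'lyon', 'paris', 'paris'])
print(codec, keys)  # ['paris', 'lyon'] [0, 1, 0, 0]
```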
```python
    def __init__(self, listidx=None, reindex=True):
        '''
        Dataset constructor.

        *Parameters*

        - **listidx** : list (default None) - list of Field data
        - **reindex** : boolean (default True) - if True, default codec for each Field'''

        self.name = self.__class__.__name__
        self.field = self.field_class
        self.analysis = Analysis(self)
        self.lindex = []
        if listidx.__class__.__name__ in ['Dataset', 'Observation', 'Ndataset', 'Sdataset']:
            self.lindex = [copy(idx) for idx in listidx.lindex]
            return
        if not listidx:
            return
        self.lindex = listidx
        if reindex:
            self.reindex()
        self.analysis.actualize()
        return
```
Dataset constructor.
Parameters
- listidx : list (default None) - list of Field data
- reindex : boolean (default True) - if True, default codec for each Field
```python
    @classmethod
    def from_csv(cls, filename='dataset.csv', header=True, nrow=None, decode_str=True,
                 decode_json=True, optcsv={'quoting': csv.QUOTE_NONNUMERIC}):
        '''
        Dataset constructor (from a csv file). Each column represents index values.

        *Parameters*

        - **filename** : string (default 'dataset.csv'), name of the file to read
        - **header** : boolean (default True). If True, the first row is dedicated to names
        - **nrow** : integer (default None). Number of rows to read. If None, all rows are read
        - **optcsv** : dict (default : quoting) - see csv.reader options'''
        if not optcsv:
            optcsv = {}
        if not nrow:
            nrow = -1
        with open(filename, newline='', encoding="utf-8") as file:
            reader = csv.reader(file, **optcsv)
            irow = 0
            for row in reader:
                if irow == nrow:
                    break
                if irow == 0:
                    idxval = [[] for i in range(len(row))]
                    idxname = [''] * len(row)
                if irow == 0 and header:
                    idxname = row
                else:
                    for i in range(len(row)):
                        if decode_json:
                            try:
                                idxval[i].append(json.loads(row[i]))
                            except (json.JSONDecodeError, TypeError):
                                idxval[i].append(row[i])
                        else:
                            idxval[i].append(row[i])
                irow += 1
        lindex = [cls.field_class.from_ntv({name: idx}, decode_str=decode_str)
                  for idx, name in zip(idxval, idxname)]
        return cls(listidx=lindex, reindex=True)
```
Dataset constructor (from a csv file). Each column represents index values.
Parameters
- filename : string (default 'dataset.csv'), name of the file to read
- header : boolean (default True). If True, the first row is dedicated to names
- nrow : integer (default None). Number of rows to read. If None, all rows are read
- optcsv : dict (default : quoting) - see csv.reader options
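The column-wise reading with per-cell JSON decoding that `from_csv` performs can be sketched with the stdlib only (`columns_from_csv` is a hypothetical helper, not part of the library API):

```python
import csv
import io
import json

def columns_from_csv(text, header=True):
    """Read csv text into (names, columns), json-decoding each cell when possible."""
    rows = list(csv.reader(io.StringIO(text)))
    names = rows[0] if header else ['i' + str(i) for i in range(len(rows[0]))]
    data = rows[1:] if header else rows
    cols = [[] for _ in names]
    for row in data:
        for i, cell in enumerate(row):
            try:
                cols[i].append(json.loads(cell))   # numbers, true/false, null, ...
            except json.JSONDecodeError:
                cols[i].append(cell)               # otherwise keep the raw string
    return names, cols

names, cols = columns_from_csv('city,pop\nparis,2.2\nlyon,0.5\n')
print(names, cols)  # ['city', 'pop'] [['paris', 'lyon'], [2.2, 0.5]]
```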
```python
    @classmethod
    def from_file(cls, filename, forcestring=False, reindex=True, decode_str=False):
        '''
        Generate Object from file storage.

        *Parameters*

        - **filename** : string - file name (with path)
        - **forcestring** : boolean (default False) - if True,
        forces the UTF-8 data format, else the format is calculated
        - **reindex** : boolean (default True) - if True, default codec for each Field
        - **decode_str**: boolean (default False) - if True, strings are loaded as json data

        *Returns* : new Object'''
        with open(filename, 'rb') as file:
            btype = file.read(1)
        if btype == bytes('[', 'UTF-8') or btype == bytes('{', 'UTF-8') or forcestring:
            with open(filename, 'r', newline='', encoding="utf-8") as file:
                bjson = file.read()
        else:
            with open(filename, 'rb') as file:
                bjson = file.read()
        return cls.from_ntv(bjson, reindex=reindex, decode_str=decode_str)
```
Generate Object from file storage.
Parameters
- filename : string - file name (with path)
- forcestring : boolean (default False) - if True, forces the UTF-8 data format, else the format is calculated
- reindex : boolean (default True) - if True, default codec for each Field
- decode_str: boolean (default False) - if True, strings are loaded as json data
Returns : new Object
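`from_file` chooses between a text read and a binary read by peeking at the first byte (JSON data starts with '[' or '{'). The sniffing logic in isolation (a standalone sketch; `read_payload` is a hypothetical name):

```python
import os
import tempfile

def read_payload(filename, forcestring=False):
    """Return the file content as str if it looks like JSON text, else as bytes."""
    with open(filename, 'rb') as file:
        btype = file.read(1)
    if btype in (b'[', b'{') or forcestring:
        with open(filename, 'r', newline='', encoding='utf-8') as file:
            return file.read()
    with open(filename, 'rb') as file:
        return file.read()

# usage: a JSON file comes back as str, any other content as bytes
with tempfile.NamedTemporaryFile('w', suffix='.json', delete=False) as tmp:
    tmp.write('{"i0": [1, 2]}')
payload = read_payload(tmp.name)
os.remove(tmp.name)
print(type(payload).__name__)  # str
```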
```python
    @classmethod
    def ntv(cls, ntv_value, reindex=True):
        '''Generate a Dataset Object from an ntv_value

        *Parameters*

        - **ntv_value** : bytes, string, Ntv object to convert
        - **reindex** : boolean (default True) - if True, default codec for each Field'''
        return cls.from_ntv(ntv_value, reindex=reindex)
```
Generate a Dataset Object from an ntv_value
Parameters
- ntv_value : bytes, string, Ntv object to convert
- reindex : boolean (default True) - if True, default codec for each Field
```python
    @classmethod
    def from_ntv(cls, ntv_value, reindex=True, decode_str=False):
        '''Generate a Dataset Object from an ntv_value

        *Parameters*

        - **ntv_value** : bytes, string, Ntv object to convert
        - **reindex** : boolean (default True) - if True, default codec for each Field
        - **decode_str**: boolean (default False) - if True, strings are loaded as json data'''
        ntv = Ntv.obj(ntv_value, decode_str=decode_str)
        if len(ntv) == 0:
            return cls()
        lidx = [list(cls.field_class.decode_ntv(ntvf)) for ntvf in ntv]
        leng = max([idx[6] for idx in lidx])
        for ind in range(len(lidx)):
            if lidx[ind][0] == '':
                lidx[ind][0] = 'i' + str(ind)
            NtvConnector.init_ntv_keys(ind, lidx, leng)
            # Dataset._init_ntv_keys(ind, lidx, leng)
        lindex = [cls.field_class(idx[2], idx[0], idx[4], None,  # idx[1] for the type
                                  reindex=reindex) for idx in lidx]
        return cls(lindex, reindex=reindex)
```
Generate a Dataset Object from an ntv_value
Parameters
- ntv_value : bytes, string, Ntv object to convert
- reindex : boolean (default True) - if True, default codec for each Field
- decode_str: boolean (default False) - if True, strings are loaded as json data
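For a Field stored relative to a parent, `from_ntv` (via `Nfield.keysfromderkeys`) expands the derived keys through the parent's keys. Assuming the derived-index model of the Dataset JSON standard, the expansion amounts to a simple lookup (a sketch; the real method may differ in details):

```python
def keys_from_derkeys(parentkeys, derkeys):
    """Full-length keys of a derived field: each parent key selects a derived key."""
    return [derkeys[pkey] for pkey in parentkeys]

# parent field with 3 codec values over 5 records; the derived field
# groups those 3 values into 2 categories
print(keys_from_derkeys([0, 1, 2, 1, 0], [0, 0, 1]))  # [0, 0, 1, 0, 0]
```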
```python
    def merge(self, fillvalue=math.nan, reindex=False, simplename=False):
        '''
        Merge method replaces Dataset objects included in the Dataset with their
        constituent Fields.

        *Parameters*

        - **fillvalue** : object (default nan) - value used for the additional data
        - **reindex** : boolean (default False) - if True, set default codec after transformation
        - **simplename** : boolean (default False) - if True, new Field names are
        the same as the merged Field names, else a composed name is used.

        *Returns*: merged Dataset'''
        ilc = copy(self)
        delname = []
        row = ilc[0]
        if not isinstance(row, list):
            row = [row]
        merged, oldname, newname = Dataset._mergerecord(self.ext(row, ilc.lname),
                                                        simplename=simplename)
        if oldname and oldname not in merged.lname:
            delname.append(oldname)
        for ind in range(1, len(ilc)):
            oldidx = ilc.nindex(oldname)
            for name in newname:
                ilc.addindex(self.field(oldidx.codec, name, oldidx.keys))
            row = ilc[ind]
            if not isinstance(row, list):
                row = [row]
            rec, oldname, newname = Dataset._mergerecord(self.ext(row, ilc.lname),
                                                         simplename=simplename)
            if oldname and newname != [oldname]:
                delname.append(oldname)
                for name in newname:
                    oldidx = merged.nindex(oldname)
                    fillval = self.field.s_to_i(fillvalue)
                    merged.addindex(
                        self.field([fillval] * len(merged), name, oldidx.keys))
            merged += rec
        for name in set(delname):
            if name:
                merged.delindex(name)
        if reindex:
            merged.reindex()
        ilc.lindex = merged.lindex
        return ilc
```
Merge method replaces Dataset objects included in the Dataset with their constituent Fields.
Parameters
- fillvalue : object (default nan) - value used for the additional data
- reindex : boolean (default False) - if True, set default codec after transformation
- simplename : boolean (default False) - if True, new Field names are the same as the merged Field names, else a composed name is used.
Returns: merged Dataset
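When merge flattens a nested Dataset, `_mergerecord` renames the inner Fields: an anonymous single Field ('i0') takes the outer name, and with `simplename=False` the other names are composed with the outer name. The naming rule alone (a hypothetical helper mirroring that logic, not a library call):

```python
def compose_names(oldname, inner_names, simplename=False):
    """New Field names for a nested dataset merged under Field `oldname`."""
    if inner_names == ['i0']:      # anonymous single field takes the outer name
        return [oldname]
    if not simplename:             # composed names: outer name + '_' + inner name
        return [oldname + '_' + name for name in inner_names]
    return list(inner_names)       # simplename: keep inner names as-is

print(compose_names('measure', ['value', 'unit']))  # ['measure_value', 'measure_unit']
```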
```python
    @classmethod
    def ext(cls, idxval=None, idxname=None, reindex=True, fast=False):
        '''
        Dataset constructor (external index).

        *Parameters*

        - **idxval** : list of Field or list of values (see data model)
        - **idxname** : list of string (default None) - list of Field names (see data model)'''
        if idxval is None:
            idxval = []
        if not isinstance(idxval, list):
            return None
        val = []
        for idx in idxval:
            if not isinstance(idx, list):
                val.append([idx])
            else:
                val.append(idx)
        lenval = [len(idx) for idx in val]
        if lenval and max(lenval) != min(lenval):
            raise DatasetError('the length of Iindex are different')
        length = lenval[0] if lenval else 0
        idxname = [None] * len(val) if idxname is None else idxname
        for ind, name in enumerate(idxname):
            if name is None or name == '$default':
                idxname[ind] = 'i' + str(ind)
        lindex = [cls.field_class(codec, name, lendefault=length, reindex=reindex,
                                  fast=fast) for codec, name in zip(val, idxname)]
        return cls(lindex, reindex=False)
```
Dataset constructor (external index).
Parameters
- idxval : list of Field or list of values (see data model)
- idxname : list of string (default None) - list of Field names (see data model)
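Before building the Fields, `ext` normalizes its inputs: scalar columns are wrapped into one-element lists, column lengths are checked, and missing or '$default' names become 'i0', 'i1', ... The normalization step alone (a standalone sketch; `normalize_ext` is a hypothetical name):

```python
def normalize_ext(idxval, idxname=None):
    """Wrap scalars, check equal column lengths, fill default names."""
    val = [idx if isinstance(idx, list) else [idx] for idx in idxval]
    lenval = [len(idx) for idx in val]
    if lenval and max(lenval) != min(lenval):
        raise ValueError('the lengths of the Fields are different')
    idxname = [None] * len(val) if idxname is None else list(idxname)
    names = ['i' + str(ind) if name in (None, '$default') else name
             for ind, name in enumerate(idxname)]
    return names, val

print(normalize_ext([[10, 20], ['a', 'b']], [None, 'tag']))
# (['i0', 'tag'], [[10, 20], ['a', 'b']])
```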