tab-dataset.tab_dataset.dataset

The dataset module is part of the tab-dataset package.

It contains the classes DatasetAnalysis, Cdataset for Dataset entities.

For more information, see the user guide or the github repository.

   1# -*- coding: utf-8 -*-
   2"""
   3The `dataset` module is part of the `tab-dataset` package.
   4
   5It contains the classes `DatasetAnalysis`, `Cdataset` for Dataset entities.
   6
   7For more information, see the 
   8[user guide](https://loco-philippe.github.io/tab-dataset/docs/user_guide.html) 
   9or the [github repository](https://github.com/loco-philippe/tab-dataset).
  10"""
  11from collections import Counter
  12from copy import copy
  13import math
  14import json
  15import csv
  16
  17
  18from tab_dataset.cfield import Cutil
  19from tab_dataset.dataset_interface import DatasetInterface
  20from tab_dataset.field import Nfield, Sfield
  21from tab_dataset.cdataset import Cdataset, DatasetError
  22
  23FILTER = '$filter'
  24
  25class Sdataset(DatasetInterface, Cdataset):
  26    # %% intro
  27    '''
   28    `Sdataset` is a child class of `Cdataset` in which internal values can differ
   29    from external values (lists are converted to tuples and dicts to JSON objects).
   30
   31    One attribute is added: 'field', which defines the Field class to use.
  32
  33    The methods defined in this class are :
  34
  35    *constructor (@classmethod)*
  36
  37    - `Sdataset.from_csv`
  38    - `Sdataset.from_file`
  39    - `Sdataset.merge`
  40    - `Sdataset.ext`
  41    - `Cdataset.ntv`
  42    - `Cdataset.from_ntv`
  43
  44    *dynamic value - module analysis (getters @property)*
  45
  46    - `DatasetAnalysis.analysis`
  47    - `DatasetAnalysis.anafields`
  48    - `Sdataset.extidx`
  49    - `Sdataset.extidxext`
  50    - `DatasetAnalysis.field_partition`
  51    - `Sdataset.idxname`
  52    - `Sdataset.idxlen`
  53    - `Sdataset.iidx`
  54    - `Sdataset.lenidx`
  55    - `Sdataset.lidx`
  56    - `Sdataset.lidxrow`
  57    - `Sdataset.lisvar`
  58    - `Sdataset.lvar`
  59    - `DatasetAnalysis.lvarname`
  60    - `Sdataset.lvarrow`
  61    - `Cdataset.lunicname`
  62    - `Cdataset.lunicrow`
  63    - `DatasetAnalysis.partitions`
  64    - `DatasetAnalysis.primaryname`
  65    - `DatasetAnalysis.relation`
  66    - `DatasetAnalysis.secondaryname`
  67    - `Sdataset.setidx`
  68    - `Sdataset.zip`
  69
  70    *dynamic value (getters @property)*
  71
  72    - `Cdataset.keys`
  73    - `Cdataset.iindex`
  74    - `Cdataset.indexlen`
  75    - `Cdataset.lenindex`
  76    - `Cdataset.lname`
  77    - `Cdataset.tiindex`
  78
  79    *global value (getters @property)*
  80
  81    - `DatasetAnalysis.complete`
  82    - `Sdataset.consistent`
  83    - `DatasetAnalysis.dimension`
  84    - `Sdataset.primary`
  85    - `Sdataset.secondary`
  86
  87    *selecting - infos methods*
  88
  89    - `Sdataset.idxrecord`
  90    - `DatasetAnalysis.indexinfos`
  91    - `DatasetAnalysis.indicator`
  92    - `Sdataset.iscanonorder`
  93    - `Sdataset.isinrecord`
  94    - `Sdataset.keytoval`
  95    - `Sdataset.loc`
  96    - `Cdataset.nindex`
  97    - `Sdataset.record`
  98    - `Sdataset.recidx`
  99    - `Sdataset.recvar`
 100    - `Cdataset.to_analysis`
 101    - `DatasetAnalysis.tree`
 102    - `Sdataset.valtokey`
 103
 104    *add - update methods*
 105
 106    - `Cdataset.add`
 107    - `Sdataset.addindex`
 108    - `Sdataset.append`
 109    - `Cdataset.delindex`
 110    - `Sdataset.delrecord`
 111    - `Sdataset.orindex`
 112    - `Cdataset.renameindex`
 113    - `Cdataset.setname`
 114    - `Sdataset.updateindex`
 115
 116    *structure management - methods*
 117
 118    - `Sdataset.applyfilter`
 119    - `Cdataset.check_relation`
 120    - `Cdataset.check_relationship`
 121    - `Sdataset.coupling`
 122    - `Sdataset.full`
 123    - `Sdataset.getduplicates`
 124    - `Sdataset.mix`
 125    - `Sdataset.merging`
 126    - `Cdataset.reindex`
 127    - `Cdataset.reorder`
 128    - `Sdataset.setfilter`
 129    - `Sdataset.sort`
 130    - `Cdataset.swapindex`
 131    - `Sdataset.setcanonorder`
 132    - `Sdataset.tostdcodec`
 133
 134    *exports methods (`observation.dataset_interface.DatasetInterface`)*
 135
 136    - `Dataset.json`
 137    - `Dataset.plot`
 138    - `Dataset.to_obj`
 139    - `Dataset.to_csv`
 140    - `Dataset.to_dataframe`
 141    - `Dataset.to_file`
 142    - `Dataset.to_ntv`
 144    - `Dataset.to_xarray`
 145    - `Dataset.view`
 146    - `Dataset.vlist`
 147    - `Dataset.voxel`
 148    '''
 149
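The internal/external distinction mentioned above can be illustrated with a small sketch — a hypothetical `to_internal` helper, not the library's actual `s_to_i` implementation: lists become hashable tuples and dicts become canonical JSON strings.

```python
import json

def to_internal(value):
    # hypothetical sketch: lists -> tuples (hashable),
    # dicts -> canonical JSON strings; other values unchanged
    if isinstance(value, list):
        return tuple(to_internal(v) for v in value)
    if isinstance(value, dict):
        return json.dumps(value, sort_keys=True)
    return value
```

Tuples and strings are hashable, which is what allows internal values to serve as codec entries.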
 150    field_class = Sfield
 151
 152    def __init__(self, listidx=None, name=None, reindex=True):
 153        '''
 154        Dataset constructor.
 155
 156        *Parameters*
 157
 158        - **listidx** :  list (default None) - list of Field data
 159        - **name** :  string (default None) - name of the dataset
 160        - **reindex** : boolean (default True) - if True, default codec for each Field'''
 161
 162        self.field = self.field_class
 163        Cdataset.__init__(self, listidx, name, reindex=reindex)
 164
 165    @classmethod
 166    def from_csv(cls, filename='dataset.csv', header=True, nrow=None, decode_str=True,
 167                 decode_json=True, optcsv={'quoting': csv.QUOTE_NONNUMERIC}):
  168        '''
  169        Dataset constructor (from a csv file). Each column contains the values of a Field.
  170
  171        *Parameters*
  172        - **filename** : string (default 'dataset.csv') - name of the file to read
  173        - **header** : boolean (default True) - if True, the first row contains the Field names
  174        - **nrow** : integer (default None) - number of rows to read (all rows if None)
  175        - **decode_str**, **decode_json** : boolean (default True) - if True, cells are decoded as JSON when possible
  176        - **optcsv** : dict (default: non-numeric quoting) - csv.reader options'''
 177        if not optcsv:
 178            optcsv = {}
 179        if not nrow:
 180            nrow = -1
 181        with open(filename, newline='', encoding="utf-8") as file:
 182            reader = csv.reader(file, **optcsv)
 183            irow = 0
 184            for row in reader:
 185                if irow == nrow:
 186                    break
 187                if irow == 0:
 188                    idxval = [[] for i in range(len(row))]
 189                    idxname = [''] * len(row)
 190                if irow == 0 and header:
 191                    idxname = row
 192                else:
 193                    for i in range(len(row)):
 194                        if decode_json:
 195                            try:
 196                                idxval[i].append(json.loads(row[i]))
  196                            except (ValueError, TypeError):
 198                                idxval[i].append(row[i])
 199                        else:
 200                            idxval[i].append(row[i])
 201                irow += 1
 202        lindex = [cls.field_class.from_ntv(
 203            {name: idx}, decode_str=decode_str) for idx, name in zip(idxval, idxname)]
 204        return cls(listidx=lindex, reindex=True)
 205
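The column-building loop above can be sketched standalone — a hypothetical `columns_from_csv` helper mirroring the `json.loads` fallback, without the Field construction step:

```python
import csv
import io
import json

def columns_from_csv(text, header=True, decode_json=True):
    """Read CSV text into named columns, decoding each cell as JSON when possible."""
    rows = list(csv.reader(io.StringIO(text)))
    names = rows[0] if header else ['i' + str(i) for i in range(len(rows[0]))]
    cols = [[] for _ in names]
    for row in (rows[1:] if header else rows):
        for i, cell in enumerate(row):
            if decode_json:
                try:
                    cols[i].append(json.loads(cell))   # '20' -> 20, 'true' -> True
                except (ValueError, TypeError):
                    cols[i].append(cell)               # plain string kept as-is
            else:
                cols[i].append(cell)
    return dict(zip(names, cols))
```

For example, `columns_from_csv('name,score\nana,20\nbob,x\n')` yields one list per column, with `'20'` decoded to the integer `20`.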
 206    @classmethod
 207    def from_file(cls, filename, forcestring=False, reindex=True, decode_str=False):
  208        '''
  209        Generate an object from file storage.
  210
  211        *Parameters*
  212
  213        - **filename** : string - file name (with path)
  214        - **forcestring** : boolean (default False) - if True, UTF-8 text
  215        format is assumed; else the format is detected from the first byte
  216        - **reindex** : boolean (default True) - if True, default codec for each Field
  217        - **decode_str** : boolean (default False) - if True, strings are loaded as JSON data
  218
  219        *Returns* : new Object'''
 220        with open(filename, 'rb') as file:
 221            btype = file.read(1)
  222        if forcestring or btype in (b'[', b'{'):
 223            with open(filename, 'r', newline='', encoding="utf-8") as file:
 224                bjson = file.read()
 225        else:
 226            with open(filename, 'rb') as file:
 227                bjson = file.read()
 228        return cls.from_ntv(bjson, reindex=reindex, decode_str=decode_str)
 229
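The format detection reads the first byte: `[` or `{` means JSON text, anything else is treated as a binary payload. A minimal standalone sketch (hypothetical `read_payload` helper):

```python
def read_payload(filename):
    """Return str for JSON text files, bytes for binary payloads."""
    with open(filename, 'rb') as file:
        first = file.read(1)
    if first in (b'[', b'{'):                       # JSON text: decode as UTF-8
        with open(filename, 'r', encoding='utf-8') as file:
            return file.read()
    with open(filename, 'rb') as file:              # anything else: raw bytes
        return file.read()
```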
 230    def merge(self, fillvalue=math.nan, reindex=False, simplename=False):
  231        '''
  232        Merge method: replace Dataset objects included in the data with their component Fields.
  233
  234        *Parameters*
  235
  236        - **fillvalue** : object (default nan) - value used for the additional data
  237        - **reindex** : boolean (default False) - if True, set default codec after transformation
  238        - **simplename** : boolean (default False) - if True, the new Field names are
  239        the merged Field names; else composed names are used.
  240
  241        *Returns*: merged Dataset '''
 242        ilc = copy(self)
 243        delname = []
 244        row = ilc[0]
 245        if not isinstance(row, list):
 246            row = [row]
 247        merged, oldname, newname = self.__class__._mergerecord(
 248            self.ext(row, ilc.lname), simplename=simplename, fillvalue=fillvalue,
 249            reindex=reindex)
 250        delname.append(oldname)
 251        for ind in range(1, len(ilc)):
 252            oldidx = ilc.nindex(oldname)
 253            for name in newname:
 254                ilc.addindex(self.field(oldidx.codec, name, oldidx.keys))
 255            row = ilc[ind]
 256            if not isinstance(row, list):
 257                row = [row]
 258            rec, oldname, newname = self.__class__._mergerecord(
 259                self.ext(row, ilc.lname), simplename=simplename)
 260            if oldname and newname != [oldname]:
 261                delname.append(oldname)
 262            for name in newname:
 263                oldidx = merged.nindex(oldname)
 264                fillval = self.field.s_to_i(fillvalue)
 265                merged.addindex(
 266                    self.field([fillval] * len(merged), name, oldidx.keys))
 267            merged += rec
 268        for name in set(delname):
 269            if name:
 270                merged.delindex(name)
 271        if reindex:
 272            merged.reindex()
 273        ilc.lindex = merged.lindex
 274        return ilc
 275
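The naming convention used by merge (composed names unless simplename is True) can be illustrated with a toy flattener over plain dict columns; the real method additionally reconciles keys, codecs and fill values:

```python
def flatten_columns(cols, simplename=False):
    """Flatten nested dict columns, composing names unless simplename is True."""
    flat = {}
    for name, values in cols.items():
        if isinstance(values, dict):               # nested Dataset-like value
            for sub, subvals in values.items():
                key = sub if simplename else name + '_' + sub
                flat[key] = subvals
        else:                                      # plain Field: kept as-is
            flat[name] = values
    return flat
```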
 276    @classmethod
 277    def ext(cls, idxval=None, idxname=None, reindex=True, fast=False):
  278        '''
  279        Dataset constructor (external index).
  280
  281        *Parameters*
  282        - **idxval** : list of Field or list of values (see data model)
  283        - **idxname** : list of string (default None) - list of Field names (see data model)
  284        - **reindex**, **fast** : booleans - passed to the Field constructor'''
 285        if idxval is None:
 286            idxval = []
 287        if not isinstance(idxval, list):
 288            return None
 289        val = []
 290        for idx in idxval:
 291            if not isinstance(idx, list):
 292                val.append([idx])
 293            else:
 294                val.append(idx)
 295        lenval = [len(idx) for idx in val]
 296        if lenval and max(lenval) != min(lenval):
  297            raise DatasetError('the lengths of the Fields are different')
 298        length = lenval[0] if lenval else 0
 299        idxname = [None] * len(val) if idxname is None else idxname
 300        for ind, name in enumerate(idxname):
 301            if name is None or name == '$default':
 302                idxname[ind] = 'i'+str(ind)
 303        lindex = [cls.field_class(codec, name, lendefault=length, reindex=reindex,
 304                                  fast=fast) for codec, name in zip(val, idxname)]
 305        return cls(lindex, reindex=False)
 306
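`ext` builds each Field from a list of external values. Internally a Field stores a `codec` (the distinct values) plus integer `keys`; a minimal sketch of that factorization (hashable values assumed):

```python
def to_codec_keys(values):
    """Factor a value list into (codec, keys) with values[i] == codec[keys[i]]."""
    codec = list(dict.fromkeys(values))       # distinct values, first-seen order
    keys = [codec.index(val) for val in values]
    return codec, keys
```

The original list is always recoverable as `[codec[k] for k in keys]`, which is the round-trip the library's `reindex` relies on.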
 307# %% internal
 308    @staticmethod
 309    def _mergerecord(rec, mergeidx=True, updateidx=True, simplename=False, 
 310                     fillvalue=math.nan, reindex=False):
 311        row = rec[0]
 312        if not isinstance(row, list):
 313            row = [row]
 314        var = -1
 315        for ind, val in enumerate(row):
 316            if val.__class__.__name__ in ['Sdataset', 'Ndataset']:
 317                var = ind
 318                break
 319        if var < 0:
 320            return (rec, None, [])
 321        #ilis = row[var]
 322        ilis = row[var].merge(simplename=simplename, fillvalue=fillvalue, reindex=reindex)
 323        oldname = rec.lname[var]
 324        if ilis.lname == ['i0']:
 325            newname = [oldname]
 326            ilis.setname(newname)
 327        elif not simplename:
 328            newname = [oldname + '_' + name for name in ilis.lname]
 329            ilis.setname(newname)
 330        else:
 331            newname = copy(ilis.lname)
 332        for name in rec.lname:
 333            if name in newname:
 334                newname.remove(name)
 335            else:
 336                updidx = name in ilis.lname and not updateidx
 337                #ilis.addindex({name: [rec.nindex(name)[0]] * len(ilis)},
 338                ilis.addindex(ilis.field([rec.nindex(name)[0]] * len(ilis), name),
 339                              merge=mergeidx, update=updidx)
 340        return (ilis, oldname, newname)
 341
 342# %% special
 343    def __str__(self):
 344        '''return string format for var and lidx'''
 345        stri = ''
 346        if self.lvar:
 347            stri += 'variables :\n'
 348            for idx in self.lvar:
 349                stri += '    ' + str(idx) + '\n'
 350        if self.lunic:
 351            stri += 'uniques :\n'
 352            for idx in self.lunic:
 353                stri += '    ' + str({idx.name: idx.s_to_e(idx.codec[0])}) + '\n' 
 354        if self.lidx and self.lidx != self.lunic:
 355            stri += 'index :\n'
 356            for idx in list(set(self.lidx) - set(self.lunic)):
 357                stri += '    ' + str(idx) + '\n'
 358        return stri
 359
 360    def __add__(self, other):
 361        ''' Add other's values to self's values in a new Dataset'''
 362        newil = copy(self)
 363        newil.__iadd__(other)
 364        return newil
 365
 366    def __iadd__(self, other):
 367        ''' Add other's values to self's values'''
 368        return self.add(other, name=True, solve=False)
 369
 370    def __or__(self, other):
 371        ''' Add other's index to self's index in a new Dataset'''
 372        newil = copy(self)
 373        newil.__ior__(other)
 374        return newil
 375
 376    def __ior__(self, other):
 377        ''' Add other's index to self's index'''
 378        return self.orindex(other, first=False, merge=True, update=False)
 379
 380# %% property
 381    @property
 382    def consistent(self):
 383        ''' True if all the record are different'''
 384        selfiidx = self.iidx
 385        if not selfiidx:
 386            return True
 387        return max(Counter(zip(*selfiidx)).values()) == 1
 388
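The `consistent` test above can be reproduced standalone: transpose the per-Field key lists into records and check that every record occurs exactly once.

```python
from collections import Counter

def is_consistent(iidx):
    """True when no two records share the same tuple of keys."""
    if not iidx:
        return True
    # zip(*iidx) turns column-wise key lists into row-wise records
    return max(Counter(zip(*iidx)).values()) == 1
```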
 389    @property
 390    def extidx(self):
 391        '''idx values (see data model)'''
 392        return [idx.values for idx in self.lidx]
 393
 394    @property
 395    def extidxext(self):
 396        '''idx val (see data model)'''
 397        return [idx.val for idx in self.lidx]
 398
 399    @property
 400    def idxname(self):
 401        ''' list of idx name'''
 402        return [idx.name for idx in self.lidx]
 403
 404    @property
 405    def idxlen(self):
 406        ''' list of idx codec length'''
 407        return [len(idx.codec) for idx in self.lidx]
 408
 409    @property
 410    def iidx(self):
 411        ''' list of keys for each idx'''
 412        return [idx.keys for idx in self.lidx]
 413
 414    @property
 415    def lenidx(self):
 416        ''' number of idx'''
 417        return len(self.lidx)
 418
 419    @property
 420    def lidx(self):
 421        '''list of idx'''
 422        return [self.lindex[i] for i in self.lidxrow]
 423
 424    @property
 425    def lisvar(self):
 426        '''list of boolean : True if Field is var'''
 427        return [name in self.lvarname for name in self.lname]
 428
 429    @property
 430    def lvar(self):
 431        '''list of var'''
 432        return [self.lindex[i] for i in self.lvarrow]
 433
 434    @property
 435    def lunic(self):
 436        '''list of unic index'''
 437        return [self.lindex[i] for i in self.lunicrow]
 438
 439    @property
 440    def lvarrow(self):
 441        '''list of var row'''
 442        return [self.lname.index(name) for name in self.lvarname]
 443
 444    @property
 445    def lidxrow(self):
 446        '''list of idx row'''
 447        return [i for i in range(self.lenindex) if i not in self.lvarrow]
 448
 449    @property
 450    def primary(self):
 451        ''' list of primary idx'''
 452        return [self.lidxrow.index(self.lname.index(name)) for name in self.primaryname]
 453
 454    @property
 455    def secondary(self):
 456        ''' list of secondary idx'''
 457        return [self.lidxrow.index(self.lname.index(name)) for name in self.secondaryname]
 458
 459    @property
 460    def setidx(self):
 461        '''list of codec for each idx'''
 462        return [idx.codec for idx in self.lidx]
 463
 464    @property
 465    def zip(self):
 466        '''return a zip format for transpose(extidx) : tuple(tuple(rec))'''
 467        textidx = Cutil.transpose(self.extidx)
 468        if not textidx:
 469            return None
 470        return tuple(tuple(idx) for idx in textidx)
 471
 472    # %% structure
 473    def addindex(self, index, first=False, merge=False, update=False):
 474        '''add a new index.
 475
 476        *Parameters*
 477
  478        - **index** : Field - index to add (may be an index Ntv representation)
  479        - **first** : boolean (default False) - if True, insert the index in the first row, else at the end
  480        - **merge** : boolean (default False) - if False, create a new index when the name already exists
  481        - **update** : boolean (default False) - if True, update existing values when the index name is present (and merge is True)
 482
 483        *Returns* : none '''
 484        idx = self.field.ntv(index)
 485        idxname = self.lname
 486        if len(idx) != len(self) and len(self) > 0:
 487            raise DatasetError('sizes are different')
 488        if not idx.name in idxname:
 489            if first:
 490                self.lindex.insert(0, idx)
 491            else:
 492                self.lindex.append(idx)
 493        elif not merge:  # si idx.name in idxname
 494            while idx.name in idxname:
 495                idx.name += '(2)'
 496            if first:
 497                self.lindex.insert(0, idx)
 498            else:
 499                self.lindex.append(idx)
 500        elif update:  # si merge et si idx.name in idxname
 501            self.lindex[idxname.index(idx.name)].setlistvalue(idx.values)
 502
 503    def append(self, record, unique=False):
 504        '''add a new record.
 505
 506        *Parameters*
 507
 508        - **record** :  list of new index values to add to Dataset
  509        - **unique** :  boolean (default False) - if True, the record is not
  510        appended when it is already present
 511
 512        *Returns* : list - key record'''
 513        if self.lenindex != len(record):
 514            raise DatasetError('len(record) not consistent')
 515        record = self.field.l_to_i(record)
 516        if self.isinrecord(self.idxrecord(record), False) and unique:
 517            return None
 518        return [self.lindex[i].append(record[i]) for i in range(self.lenindex)]
 519
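`append` delegates to each Field, which extends its codec when the value is new and records the corresponding key. A sketch over plain lists (hypothetical helper, hashable values assumed):

```python
def append_record(codecs, keys, record):
    """Append one record; returns the key record (one key per field)."""
    reckeys = []
    for codec, field_keys, val in zip(codecs, keys, record):
        if val not in codec:
            codec.append(val)          # new codec entry for an unseen value
        key = codec.index(val)
        field_keys.append(key)         # extend this field's keys
        reckeys.append(key)
    return reckeys
```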
 520    def applyfilter(self, reverse=False, filtname=FILTER, delfilter=True, inplace=True):
  521        '''delete the records identified by the filter Field.
  522        The filter Field is deleted after the records are filtered.
  523
  524        *Parameters*
  525
  526        - **reverse** :  boolean (default False) - delete the records whose
  527        filter value equals reverse (by default, records with a False filter value)
  528        - **filtname** : string (default FILTER) - name of the filter Field
  529        - **delfilter** :  boolean (default True) - if True, delete the filter Field from the original
  530        - **inplace** : boolean (default True) - if True, the filter is applied to self
  531
  532        *Returns* : self or new Dataset'''
 533        if not filtname in self.lname:
 534            return None
 535        if inplace:
 536            ilis = self
 537        else:
 538            ilis = copy(self)
 539        ifilt = ilis.lname.index(filtname)
 540        ilis.sort([ifilt], reverse=not reverse, func=None)
 541        lisind = ilis.lindex[ifilt].recordfromvalue(reverse)
 542        if lisind:
 543            minind = min(lisind)
 544            for idx in ilis.lindex:
 545                del idx.keys[minind:]
 546        if inplace:
 547            self.delindex(filtname)
 548        else:
 549            ilis.delindex(filtname)
 550            if delfilter:
 551                self.delindex(filtname)
 552        ilis.reindex()
 553        return ilis
 554
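Conceptually, applyfilter keeps the rows whose filter value is True (False when reverse) and then drops the filter Field. Row filtering over plain columns can be sketched as:

```python
def filter_rows(columns, flt, reverse=False):
    """Keep rows whose filter value is truthy (falsy when reverse is True)."""
    keep = [i for i, val in enumerate(flt) if bool(val) != reverse]
    return [[col[i] for i in keep] for col in columns]
```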
 555    def coupling(self, derived=True, level=0.1):
  556        '''Transform indexes with low distance into coupled or derived indexes (codec extension).
  557
  558        *Parameters*
  559
  560        - **level** : float (default 0.1) - threshold (as a fraction of the Dataset length) below which coupling is applied
  561        - **derived** : boolean (default : True). If True, indexes are derived,
  562        else coupled.
  563
  564        *Returns* : None'''
 565        ana = self.analysis
 566        child = [[]] * len(ana)
 567        childroot = []
 568        level = level * len(self)
 569        for idx in range(self.lenindex):
 570            if derived:
 571                iparent = ana.fields[idx].p_distomin.index
 572            else:
 573                iparent = ana.fields[idx].p_distance.index
 574            if iparent == -1:
 575                childroot.append(idx)
 576            else:
 577                child[iparent].append(idx)
 578        for idx in childroot:
 579            self._couplingidx(idx, child, derived, level, ana)
 580
 581    def _couplingidx(self, idx, child, derived, level, ana):
 582        ''' Field coupling (included childrens of the Field)'''
 583        fields = ana.fields
 584        if derived:
 585            iparent = fields[idx].p_distomin.index
 586            dparent = ana.get_relation(*sorted([idx, iparent])).distomin
 587        else:
 588            iparent = fields[idx].p_distance.index
 589            dparent = ana.get_relation(*sorted([idx, iparent])).distance
 590        # if fields[idx].category in ('coupled', 'unique') or iparent == -1\
 591        if fields[idx].category in ('coupled', 'unique') \
 592                or dparent >= level or dparent == 0:
 593            return
 594        if child[idx]:
 595            for childidx in child[idx]:
 596                self._couplingidx(childidx, child, derived, level, ana)
 597        self.lindex[iparent].coupling(self.lindex[idx], derived=derived,
 598                                      duplicate=False)
 599        return
 600
 601    def delrecord(self, record, extern=True):
 602        '''remove a record.
 603
 604        *Parameters*
 605
  606        - **record** :  list - record values to remove from the Dataset
 607        - **extern** : if True, compare record values to external representation
 608        of self.value, else, internal
 609
 610        *Returns* : row deleted'''
 611        self.reindex()
 612        reckeys = self.valtokey(record, extern=extern)
 613        if None in reckeys:
 614            return None
 615        row = self.tiindex.index(reckeys)
 616        for idx in self:
 617            del idx[row]
 618        return row
 619
 620    def _fullindex(self, ind, keysadd, indexname, varname, leng, fillvalue, fillextern):
 621        if not varname:
 622            varname = []
 623        idx = self.lindex[ind]
 624        lenadd = len(keysadd[0])
 625        if len(idx) == leng:
 626            return
 627        #inf = self.indexinfos()
 628        ana = self.anafields
 629        parent = ana[ind].p_derived.view('index')
 630        # if inf[ind]['cat'] == 'unique':
 631        if ana[ind].category == 'unique':
 632            idx.set_keys(idx.keys + [0] * lenadd)
 633        elif self.lname[ind] in indexname:
 634            idx.set_keys(idx.keys + keysadd[indexname.index(self.lname[ind])])
 635        # elif inf[ind]['parent'] == -1 or self.lname[ind] in varname:
 636        elif parent == -1 or self.lname[ind] in varname:
 637            fillval = fillvalue
 638            if fillextern:
 639                fillval = self.field.s_to_i(fillvalue)
 640            idx.set_keys(idx.keys + [len(idx.codec)] * len(keysadd[0]))
 641            idx.set_codec(idx.codec + [fillval])
 642        else:
 643            #parent = inf[ind]['parent']
 644            if len(self.lindex[parent]) != leng:
 645                self._fullindex(parent, keysadd, indexname, varname, leng,
 646                                fillvalue, fillextern)
 647            # if inf[ind]['cat'] == 'coupled':
 648            if ana[ind].category == 'coupled':
 649                idx.tocoupled(self.lindex[parent], coupling=True)
 650            else:
 651                idx.tocoupled(self.lindex[parent], coupling=False)
 652
 653    def full(self, reindex=False, idxname=None, varname=None, fillvalue='-',
 654             fillextern=True, inplace=True, canonical=True):
  655        '''transform a list of indexes into crossed indexes (value extension).
  656
  657        *Parameters*
  658
  659        - **idxname** : list of string - names of the indexes to transform (default: primary indexes)
  660        - **varname** : string - names of the indexes to use as variables
  661        - **reindex** : boolean (default False) - if True, set default codec
  662        before transformation
  663        - **fillvalue** : object (default '-') - value used for the Field extension
  664        - **fillextern** : boolean (default True) - if True, fillvalue is converted
  665        to an internal value
  666        - **inplace** : boolean (default True) - if True, the transformation is applied to self
  667        - **canonical** : boolean (default True) - if True, the Fields are ordered
  668        in canonical order
  669
  670        *Returns* : self or new Dataset'''
 671        ilis = self if inplace else copy(self)
 672        if not idxname:
 673            idxname = ilis.primaryname
 674        if reindex:
 675            ilis.reindex()
 676        keysadd = Cutil.idxfull([ilis.nindex(name) for name in idxname])
 677        if keysadd and len(keysadd) != 0:
 678            newlen = len(keysadd[0]) + len(ilis)
 679            for ind in range(ilis.lenindex):
 680                ilis._fullindex(ind, keysadd, idxname, varname, newlen,
 681                                fillvalue, fillextern)
 682        if canonical:
 683            ilis.setcanonorder()
 684        return ilis
 685
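`full` extends the records so that every combination of primary codec values is present. The missing key combinations can be computed with a Cartesian product over the codec key ranges:

```python
from itertools import product

def missing_combinations(present, codec_lengths):
    """Key tuples to add so every combination of codec keys appears."""
    all_combis = set(product(*[range(length) for length in codec_lengths]))
    return sorted(all_combis - set(present))
```

With two primary Fields of codec lengths 2 and 3, a Dataset holding 2 of the 6 possible key pairs needs 4 additional records.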
 686    def getduplicates(self, indexname=None, resindex=None, indexview=None):
  687        '''check for duplicate values in a list of indexes. The result is added
  688        as a new index or returned.
  689
  690        *Parameters*
  691
  692        - **indexname** : list of string (default None) - names of the indexes to check
  693        (if None, all Fields)
  694        - **resindex** : string (default None) - if set, add a new index named resindex
  695        with the check result (False for duplicate rows)
  696        - **indexview** : list of str (default None) - list of fields to return
  697
  698        *Returns* : list of int - rows with duplicate values'''
 699        if not indexname:
 700            indexname = self.lname
 701        duplicates = []
 702        for name in indexname:
 703            duplicates += self.nindex(name).getduplicates()
 704        if resindex and isinstance(resindex, str):
 705            newidx = self.field([True] * len(self), name=resindex)
 706            for item in duplicates:
 707                newidx[item] = False
 708            self.addindex(newidx)
 709        dupl = tuple(set(duplicates))
 710        if not indexview:
 711            return dupl
 712        return [tuple(self.record(ind, indexview)) for ind in dupl]
 713
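Per-Field duplicate detection amounts to flagging the rows whose value occurs more than once, as this standalone sketch shows:

```python
from collections import Counter

def duplicate_rows(values):
    """Rows whose value occurs more than once in the Field."""
    counts = Counter(values)
    return [row for row, val in enumerate(values) if counts[val] > 1]
```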
 714    def iscanonorder(self):
 715        '''return True if primary indexes have canonical ordered keys'''
 716        primary = self.primary
 717        canonorder = Cutil.canonorder(
 718            [len(self.lidx[idx].codec) for idx in primary])
 719        return canonorder == [self.lidx[idx].keys for idx in primary]
 720
 721    def isinrecord(self, record, extern=True):
 722        '''Check if record is present in self.
 723
 724        *Parameters*
 725
 726        - **record** : list - value for each Field
 727        - **extern** : if True, compare record values to external representation
 728        of self.value, else, internal
 729
 730        *Returns boolean* : True if found'''
 731        if extern:
 732            return record in Cutil.transpose(self.extidxext)
 733        return record in Cutil.transpose(self.extidx)
 734
 735    def idxrecord(self, record):
 736        '''return rec array (without variable) from complete record (with variable)'''
  737        return [record[row] for row in self.lidxrow]
 738
 739    def keytoval(self, listkey, extern=True):
 740        '''
 741        convert a keys list (key for each index) to a values list (value for each index).
 742
 743        *Parameters*
 744
 745        - **listkey** : key for each index
 746        - **extern** : boolean (default True) - if True, compare rec to val else to values
 747
 748        *Returns*
 749
 750        - **list** : value for each index'''
 751        return [idx.keytoval(key, extern=extern) for idx, key in zip(self.lindex, listkey)]
 752
 753    def loc(self, rec, extern=True, row=False):
 754        '''
 755        Return record or row corresponding to a list of idx values.
 756
 757        *Parameters*
 758
 759        - **rec** : list - value for each idx
 760        - **extern** : boolean (default True) - if True, compare rec to val,
 761        else to values
 762        - **row** : Boolean (default False) - if True, return list of row,
 763        else list of records
 764
  765        *Returns*
  766
  767        - **list** : matching records (or rows if row is True), or None if not found'''
 768        locrow = None
 769        try:
 770            if len(rec) == self.lenindex:
 771                locrow = list(set.intersection(*[set(self.lindex[i].loc(rec[i], extern))
 772                                               for i in range(self.lenindex)]))
 773            elif len(rec) == self.lenidx:
 774                locrow = list(set.intersection(*[set(self.lidx[i].loc(rec[i], extern))
 775                                               for i in range(self.lenidx)]))
  776        except Exception:
  777            pass
 778        if locrow is None:
 779            return None
 780        if row:
 781            return locrow
 782        return [self.record(locr, extern=extern) for locr in locrow]
 783
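`loc` intersects, per Field, the sets of rows matching each requested value. The same selection over plain columns:

```python
def loc_rows(columns, rec):
    """Rows where every column matches the corresponding rec value."""
    matches = [{row for row, val in enumerate(col) if val == target}
               for col, target in zip(columns, rec)]
    return sorted(set.intersection(*matches))
```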
 784    def mix(self, other, fillvalue=None):
 785        '''add other Field not included in self and add other's values'''
 786        sname = set(self.lname)
 787        oname = set(other.lname)
 788        newself = copy(self)
 789        copother = copy(other)
 790        for nam in oname - sname:
 791            newself.addindex({nam: [fillvalue] * len(newself)})
 792        for nam in sname - oname:
 793            copother.addindex({nam: [fillvalue] * len(copother)})
 794        return newself.add(copother, name=True, solve=False)
 795
 796    def merging(self, listname=None):
  797        ''' add a new Field built from the Fields defined in listname.
  798        The values of the new Field are sets of the values of the listname Fields.'''
 799        #self.addindex(Field.merging([self.nindex(name) for name in listname]))
 800        self.addindex(Sfield.merging([self.nindex(name) for name in listname]))
 801
 802    def orindex(self, other, first=False, merge=False, update=False):
 803        ''' Add other's index to self's index (with same length)
 804
 805        *Parameters*
 806
 807        - **other** : self class - object to add
 808        - **first** : Boolean (default False) - If True insert indexes
 809        at the first row, else at the end
 810        - **merge** : Boolean (default False) - create a new index
 811        if merge is False
 812        - **update** : Boolean (default False) - if True, update actual
 813        values if index name is present (and merge is True)
 814
 815        *Returns* : none '''
 816        if len(self) != 0 and len(self) != len(other) and len(other) != 0:
 817            raise DatasetError("the sizes are not equal")
 818        otherc = copy(other)
 819        for idx in otherc.lindex:
 820            self.addindex(idx, first=first, merge=merge, update=update)
 821        return self
 822
 823    def record(self, row, indexname=None, extern=True):
 824        '''return the record at the row
 825
 826        *Parameters*
 827
 828        - **row** : int - row of the record
 829        - **extern** : boolean (default True) - if True, return val record else
 830        value record
 831        - **indexname** : list of str (default None) - list of fields to return
 832        *Returns*
 833
 834        - **list** : val record or value record'''
 835        if indexname is None:
 836            indexname = self.lname
 837        if extern:
 838            record = [idx.val[row] for idx in self.lindex]
 839            #record = [idx.values[row].to_obj() for idx in self.lindex]
 840            #record = [idx.valrow(row) for idx in self.lindex]
 841        else:
 842            record = [idx.values[row] for idx in self.lindex]
 843        return [record[self.lname.index(name)] for name in indexname]
 844
 845    def recidx(self, row, extern=True):
 846        '''return the list of idx val or values at the row
 847
 848        *Parameters*
 849
 850        - **row** : int - row of the record
 851        - **extern** : boolean (default True) - if True, return val rec else value rec
 852
 853        *Returns*
 854
 855        - **list** : val or value for idx'''
 856        if extern:
 857            return [idx.values[row].to_obj() for idx in self.lidx]
 858            # return [idx.valrow(row) for idx in self.lidx]
 859        return [idx.values[row] for idx in self.lidx]
 860
 861    def recvar(self, row, extern=True):
 862        '''return the list of var val or values at the row
 863
 864        *Parameters*
 865
 866        - **row** : int - row of the record
 867        - **extern** : boolean (default True) - if True, return val rec else value rec
 868
 869        *Returns*
 870
 871        - **list** : val or value for var'''
 872        if extern:
 873            return [idx.values[row].to_obj() for idx in self.lvar]
 874            # return [idx.valrow(row) for idx in self.lvar]
 875        return [idx.values[row] for idx in self.lvar]
 876
 877    def setcanonorder(self, reindex=False):
 878        '''Set the canonical index order : primary - secondary/unique - variable.
 879        Set the canonical keys order : ordered keys in the first columns.
 880
 881        *Parameters*
 882        - **reindex** : boolean (default False) - if True, set default codec after
 883        transformation
 884
 885        *Return* : self'''
 886        order = self.primaryname
 887        order += self.secondaryname
 888        order += self.lvarname
 889        order += self.lunicname
 890        self.swapindex(order)
 891        self.sort(reindex=reindex)
 892        # self.analysis.actualize()
 893        return self
 894
 895    def setfilter(self, filt=None, first=False, filtname=FILTER, unique=False):
 896        '''Add a filter index with boolean values
 897
 898        - **filt** : list of boolean - values of the filter idx to add
 899        - **first** : boolean (default False) - If True insert index at the first row,
 900        else at the end
 901        - **filtname** : string (default FILTER) - name of the filter Field added; **unique** : boolean (default False) - if True, delete existing filter Fields first
 902
 903        *Returns* : self'''
 904        if not filt:
 905            filt = [True] * len(self)
 906        idx = self.field(filt, name=filtname)
 907        idx.reindex()
 908        if idx.cod not in ([True, False], [False, True], [True], [False]):
 909            raise DatasetError('filt is not consistent')
 910        if unique:
 911            for name in copy(self.lname):
 912                if name[:len(FILTER)] == FILTER:
 913                    self.delindex(name)
 914        self.addindex(idx, first=first)
 915        return self
 916
 917    def sort(self, order=None, reverse=False, func=str, reindex=True):
 918        '''Sort data following the index order and apply the ascending or descending
 919        sort function to values.
 920
 921        *Parameters*
 922
 923        - **order** : list (default None) - new order of index to apply. If None or [],
 924        the sort function is applied to the existing order of indexes.
 925        - **reverse** : boolean (default False) - descending if True, ascending if False
 926        - **func**    : function (default str) - parameter key used in the sorted function
 927        - **reindex** : boolean (default True) - if True, apply a new codec order (key = func)
 928
 929        *Returns* : self'''
 930        if not order:
 931            order = list(range(self.lenindex))
 932        orderfull = order + list(set(range(self.lenindex)) - set(order))
 933        if reindex:
 934            for i in order:
 935                self.lindex[i].reindex(codec=sorted(
 936                    self.lindex[i].codec, key=func))
 937        newidx = Cutil.transpose(sorted(Cutil.transpose(
 938            [self.lindex[orderfull[i]].keys for i in range(self.lenindex)]),
 939            reverse=reverse))
 940        for i in range(self.lenindex):
 941            self.lindex[orderfull[i]].set_keys(newidx[i])
 942        return self
 943
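The transpose / sort / transpose step at the heart of `sort` can be reproduced with plain lists. An illustrative sketch, not the library code:

```python
def sort_keys(columns, reverse=False):
    # turn parallel key columns into records, sort the records
    # lexicographically, then split them back into columns
    records = sorted(zip(*columns), reverse=reverse)
    return [list(col) for col in zip(*records)]
```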
 944    """
 945    def swapindex(self, order):
 946        '''
 947        Change the order of the index .
 948
 949        *Parameters*
 950
 951        - **order** : list of int or list of name - new order of index to apply.
 952
 953        *Returns* : self '''
 954        if self.lenindex != len(order):
 955            raise DatasetError('length of order and Dataset different')
 956        if not order or isinstance(order[0], int):
 957            self.lindex = [self.lindex[ind] for ind in order]
 958        elif isinstance(order[0], str):
 959            self.lindex = [self.nindex(name) for name in order]
 960        return self
 961    """
 962
 963    def tostdcodec(self, inplace=False, full=True):
 964        '''Transform all codec in full or default codec.
 965
 966        *Parameters*
 967
 968        - **inplace** : boolean  (default False) - if True apply transformation
 969        to self, else to a new Dataset
 970        - **full** : boolean (default True) - full codec if True, default if False
 971
 973        *Return Dataset* : self or new Dataset'''
 974        lindex = [idx.tostdcodec(inplace=False, full=full)
 975                  for idx in self.lindex]
 976        if inplace:
 977            self.lindex = lindex
 978            return self
 979        return self.__class__(lindex, self.lvarname)
 980
 981    def updateindex(self, listvalue, index, extern=True):
 982        '''update values of an index.
 983
 984        *Parameters*
 985
 986        - **listvalue** : list - index values to replace
 987        - **index** : integer - position of the index to update (in lindex)
 988        - **extern** : if True, the listvalue has external representation, else internal
 989
 990        *Returns* : none '''
 991        self.lindex[index].setlistvalue(listvalue, extern=extern)
 992
 993    def valtokey(self, rec, extern=True):
 994        '''convert a record list (value or val for each idx) to a key list
 995        (key for each index).
 996
 997        *Parameters*
 998
 999        - **rec** : list of value or val for each index
1000        - **extern** : if True, the rec value has external representation, else internal
1001
1002        *Returns*
1003
1004        - **list of int** : record key for each index'''
1005        return [idx.valtokey(val, extern=extern) for idx, val in zip(self.lindex, rec)]
1006
1007class Ndataset(Sdataset):
1008    # %% Ndataset
1009    '''
1010    `Ndataset` is a child class of `Sdataset` where internal values are NTV entities.
1011
1012    All the methods are the same as in `Sdataset`.
1013    '''
1014    field_class = Nfield
  26class Sdataset(DatasetInterface, Cdataset):
  27    # %% intro
  28    '''
  29    `Sdataset` is a child class of Cdataset where internal values can differ
  30    from external values (lists are converted to tuples and dicts to json objects).
  31
  32    One attribute is added: 'field', which defines the Field class used.
  33
  34    The methods defined in this class are :
  35
  36    *constructor (@classmethod)*
  37
  38    - `Sdataset.from_csv`
  39    - `Sdataset.from_file`
  40    - `Sdataset.merge`
  41    - `Sdataset.ext`
  42    - `Cdataset.ntv`
  43    - `Cdataset.from_ntv`
  44
  45    *dynamic value - module analysis (getters @property)*
  46
  47    - `DatasetAnalysis.analysis`
  48    - `DatasetAnalysis.anafields`
  49    - `Sdataset.extidx`
  50    - `Sdataset.extidxext`
  51    - `DatasetAnalysis.field_partition`
  52    - `Sdataset.idxname`
  53    - `Sdataset.idxlen`
  54    - `Sdataset.iidx`
  55    - `Sdataset.lenidx`
  56    - `Sdataset.lidx`
  57    - `Sdataset.lidxrow`
  58    - `Sdataset.lisvar`
  59    - `Sdataset.lvar`
  60    - `DatasetAnalysis.lvarname`
  61    - `Sdataset.lvarrow`
  62    - `Cdataset.lunicname`
  63    - `Cdataset.lunicrow`
  64    - `DatasetAnalysis.partitions`
  65    - `DatasetAnalysis.primaryname`
  66    - `DatasetAnalysis.relation`
  67    - `DatasetAnalysis.secondaryname`
  68    - `Sdataset.setidx`
  69    - `Sdataset.zip`
  70
  71    *dynamic value (getters @property)*
  72
  73    - `Cdataset.keys`
  74    - `Cdataset.iindex`
  75    - `Cdataset.indexlen`
  76    - `Cdataset.lenindex`
  77    - `Cdataset.lname`
  78    - `Cdataset.tiindex`
  79
  80    *global value (getters @property)*
  81
  82    - `DatasetAnalysis.complete`
  83    - `Sdataset.consistent`
  84    - `DatasetAnalysis.dimension`
  85    - `Sdataset.primary`
  86    - `Sdataset.secondary`
  87
  88    *selecting - infos methods*
  89
  90    - `Sdataset.idxrecord`
  91    - `DatasetAnalysis.indexinfos`
  92    - `DatasetAnalysis.indicator`
  93    - `Sdataset.iscanonorder`
  94    - `Sdataset.isinrecord`
  95    - `Sdataset.keytoval`
  96    - `Sdataset.loc`
  97    - `Cdataset.nindex`
  98    - `Sdataset.record`
  99    - `Sdataset.recidx`
 100    - `Sdataset.recvar`
 101    - `Cdataset.to_analysis`
 102    - `DatasetAnalysis.tree`
 103    - `Sdataset.valtokey`
 104
 105    *add - update methods*
 106
 107    - `Cdataset.add`
 108    - `Sdataset.addindex`
 109    - `Sdataset.append`
 110    - `Cdataset.delindex`
 111    - `Sdataset.delrecord`
 112    - `Sdataset.orindex`
 113    - `Cdataset.renameindex`
 114    - `Cdataset.setname`
 115    - `Sdataset.updateindex`
 116
 117    *structure management - methods*
 118
 119    - `Sdataset.applyfilter`
 120    - `Cdataset.check_relation`
 121    - `Cdataset.check_relationship`
 122    - `Sdataset.coupling`
 123    - `Sdataset.full`
 124    - `Sdataset.getduplicates`
 125    - `Sdataset.mix`
 126    - `Sdataset.merging`
 127    - `Cdataset.reindex`
 128    - `Cdataset.reorder`
 129    - `Sdataset.setfilter`
 130    - `Sdataset.sort`
 131    - `Cdataset.swapindex`
 132    - `Sdataset.setcanonorder`
 133    - `Sdataset.tostdcodec`
 134
 135    *exports methods (`tab_dataset.dataset_interface.DatasetInterface`)*
 136
 137    - `Dataset.json`
 138    - `Dataset.plot`
 139    - `Dataset.to_obj`
 140    - `Dataset.to_csv`
 141    - `Dataset.to_dataframe`
 142    - `Dataset.to_file`
 143    - `Dataset.to_ntv`
 145    - `Dataset.to_xarray`
 146    - `Dataset.view`
 147    - `Dataset.vlist`
 148    - `Dataset.voxel`
 149    '''
 150
 151    field_class = Sfield
 152
 153    def __init__(self, listidx=None, name=None, reindex=True):
 154        '''
 155        Dataset constructor.
 156
 157        *Parameters*
 158
 159        - **listidx** :  list (default None) - list of Field data
 160        - **name** :  string (default None) - name of the dataset
 161        - **reindex** : boolean (default True) - if True, default codec for each Field'''
 162
 163        self.field = self.field_class
 164        Cdataset.__init__(self, listidx, name, reindex=reindex)
 165
 166    @classmethod
 167    def from_csv(cls, filename='dataset.csv', header=True, nrow=None, decode_str=True,
 168                 decode_json=True, optcsv={'quoting': csv.QUOTE_NONNUMERIC}):
 169        '''
 170        Dataset constructor (from a csv file). Each column represents index values.
 171
 172        *Parameters*
 173
 174        - **filename** : string (default 'dataset.csv'), name of the file to read
 175        - **header** : boolean (default True). If True, the first row contains the Field names
 176        - **nrow** : integer (default None). Number of rows to read. If None, read all the rows
 177        - **optcsv** : dict (default : quoting) - see csv.reader options'''
 178        if not optcsv:
 179            optcsv = {}
 180        if not nrow:
 181            nrow = -1
 182        with open(filename, newline='', encoding="utf-8") as file:
 183            reader = csv.reader(file, **optcsv)
 184            irow = 0
 185            for row in reader:
 186                if irow == nrow:
 187                    break
 188                if irow == 0:
 189                    idxval = [[] for i in range(len(row))]
 190                    idxname = [''] * len(row)
 191                if irow == 0 and header:
 192                    idxname = row
 193                else:
 194                    for i in range(len(row)):
 195                        if decode_json:
 196                            try:
 197                                idxval[i].append(json.loads(row[i]))
 198                            except ValueError:
 199                                idxval[i].append(row[i])
 200                        else:
 201                            idxval[i].append(row[i])
 202                irow += 1
 203        lindex = [cls.field_class.from_ntv(
 204            {name: idx}, decode_str=decode_str) for idx, name in zip(idxval, idxname)]
 205        return cls(listidx=lindex, reindex=True)
 206
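The per-cell decoding used by `from_csv` (try `json.loads`, fall back to the raw string) can be sketched with the standard library alone; `read_columns` is a hypothetical helper, not part of the package:

```python
import csv
import io
import json

def read_columns(text, header=True):
    # read a csv into named columns, decoding each cell as json when
    # possible and keeping the raw string otherwise
    rows = list(csv.reader(io.StringIO(text)))
    names = rows[0] if header else ['i' + str(i) for i in range(len(rows[0]))]
    data = rows[1:] if header else rows

    def decode(cell):
        try:
            return json.loads(cell)
        except ValueError:   # json.JSONDecodeError subclasses ValueError
            return cell

    return {name: [decode(row[i]) for row in data]
            for i, name in enumerate(names)}
```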
 207    @classmethod
 208    def from_file(cls, filename, forcestring=False, reindex=True, decode_str=False):
 209        '''
 210        Generate Object from file storage.
 211
 212         *Parameters*
 213
 214        - **filename** : string - file name (with path)
 215        - **forcestring** : boolean (default False) - if True,
 216        forces the UTF-8 data format, else the format is calculated
 217        - **reindex** : boolean (default True) - if True, default codec for each Field
 218        - **decode_str** : boolean (default False) - if True, strings are loaded as json data
 219
 220        *Returns* : new Object'''
 221        with open(filename, 'rb') as file:
 222            btype = file.read(1)
 223        if btype in (b'[', b'{') or forcestring:
 224            with open(filename, 'r', newline='', encoding="utf-8") as file:
 225                bjson = file.read()
 226        else:
 227            with open(filename, 'rb') as file:
 228                bjson = file.read()
 229        return cls.from_ntv(bjson, reindex=reindex, decode_str=decode_str)
 230
 231    def merge(self, fillvalue=math.nan, reindex=False, simplename=False):
 232        '''
 233        Merge method replaces the Dataset objects nested in self with their constituent Fields.
 234
 235        *Parameters*
 236
 237        - **fillvalue** : object (default nan) - value used for the additional data
 238        - **reindex** : boolean (default False) - if True, set default codec after transformation
 239        - **simplename** : boolean (default False) - if True, new Field names are
 240        the same as the merged Field names, else composed names.
 241
 242        *Returns*: merged Dataset '''
 243        ilc = copy(self)
 244        delname = []
 245        row = ilc[0]
 246        if not isinstance(row, list):
 247            row = [row]
 248        merged, oldname, newname = self.__class__._mergerecord(
 249            self.ext(row, ilc.lname), simplename=simplename, fillvalue=fillvalue,
 250            reindex=reindex)
 251        delname.append(oldname)
 252        for ind in range(1, len(ilc)):
 253            oldidx = ilc.nindex(oldname)
 254            for name in newname:
 255                ilc.addindex(self.field(oldidx.codec, name, oldidx.keys))
 256            row = ilc[ind]
 257            if not isinstance(row, list):
 258                row = [row]
 259            rec, oldname, newname = self.__class__._mergerecord(
 260                self.ext(row, ilc.lname), simplename=simplename)
 261            if oldname and newname != [oldname]:
 262                delname.append(oldname)
 263            for name in newname:
 264                oldidx = merged.nindex(oldname)
 265                fillval = self.field.s_to_i(fillvalue)
 266                merged.addindex(
 267                    self.field([fillval] * len(merged), name, oldidx.keys))
 268            merged += rec
 269        for name in set(delname):
 270            if name:
 271                merged.delindex(name)
 272        if reindex:
 273            merged.reindex()
 274        ilc.lindex = merged.lindex
 275        return ilc
 276
 277    @classmethod
 278    def ext(cls, idxval=None, idxname=None, reindex=True, fast=False):
 279        '''
 280        Dataset constructor (external index).
 281
 282        *Parameters*
 283
 284        - **idxval** : list of Field or list of values (see data model)
 285        - **idxname** : list of string (default None) - list of Field name (see data model)'''
 286        if idxval is None:
 287            idxval = []
 288        if not isinstance(idxval, list):
 289            return None
 290        val = []
 291        for idx in idxval:
 292            if not isinstance(idx, list):
 293                val.append([idx])
 294            else:
 295                val.append(idx)
 296        lenval = [len(idx) for idx in val]
 297        if lenval and max(lenval) != min(lenval):
 298            raise DatasetError('the lengths of the Fields are different')
 299        length = lenval[0] if lenval else 0
 300        idxname = [None] * len(val) if idxname is None else idxname
 301        for ind, name in enumerate(idxname):
 302            if name is None or name == '$default':
 303                idxname[ind] = 'i'+str(ind)
 304        lindex = [cls.field_class(codec, name, lendefault=length, reindex=reindex,
 305                                  fast=fast) for codec, name in zip(val, idxname)]
 306        return cls(lindex, reindex=False)
 307
 308# %% internal
 309    @staticmethod
 310    def _mergerecord(rec, mergeidx=True, updateidx=True, simplename=False, 
 311                     fillvalue=math.nan, reindex=False):
 312        row = rec[0]
 313        if not isinstance(row, list):
 314            row = [row]
 315        var = -1
 316        for ind, val in enumerate(row):
 317            if val.__class__.__name__ in ['Sdataset', 'Ndataset']:
 318                var = ind
 319                break
 320        if var < 0:
 321            return (rec, None, [])
 322        #ilis = row[var]
 323        ilis = row[var].merge(simplename=simplename, fillvalue=fillvalue, reindex=reindex)
 324        oldname = rec.lname[var]
 325        if ilis.lname == ['i0']:
 326            newname = [oldname]
 327            ilis.setname(newname)
 328        elif not simplename:
 329            newname = [oldname + '_' + name for name in ilis.lname]
 330            ilis.setname(newname)
 331        else:
 332            newname = copy(ilis.lname)
 333        for name in rec.lname:
 334            if name in newname:
 335                newname.remove(name)
 336            else:
 337                updidx = name in ilis.lname and not updateidx
 338                #ilis.addindex({name: [rec.nindex(name)[0]] * len(ilis)},
 339                ilis.addindex(ilis.field([rec.nindex(name)[0]] * len(ilis), name),
 340                              merge=mergeidx, update=updidx)
 341        return (ilis, oldname, newname)
 342
 343# %% special
 344    def __str__(self):
 345        '''return string format for var and lidx'''
 346        stri = ''
 347        if self.lvar:
 348            stri += 'variables :\n'
 349            for idx in self.lvar:
 350                stri += '    ' + str(idx) + '\n'
 351        if self.lunic:
 352            stri += 'uniques :\n'
 353            for idx in self.lunic:
 354                stri += '    ' + str({idx.name: idx.s_to_e(idx.codec[0])}) + '\n' 
 355        if self.lidx and self.lidx != self.lunic:
 356            stri += 'index :\n'
 357            for idx in list(set(self.lidx) - set(self.lunic)):
 358                stri += '    ' + str(idx) + '\n'
 359        return stri
 360
 361    def __add__(self, other):
 362        ''' Add other's values to self's values in a new Dataset'''
 363        newil = copy(self)
 364        newil.__iadd__(other)
 365        return newil
 366
 367    def __iadd__(self, other):
 368        ''' Add other's values to self's values'''
 369        return self.add(other, name=True, solve=False)
 370
 371    def __or__(self, other):
 372        ''' Add other's index to self's index in a new Dataset'''
 373        newil = copy(self)
 374        newil.__ior__(other)
 375        return newil
 376
 377    def __ior__(self, other):
 378        ''' Add other's index to self's index'''
 379        return self.orindex(other, first=False, merge=True, update=False)
 380
 381# %% property
 382    @property
 383    def consistent(self):
 384        ''' True if all the records are different'''
 385        selfiidx = self.iidx
 386        if not selfiidx:
 387            return True
 388        return max(Counter(zip(*selfiidx)).values()) == 1
 389
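The duplicate-record test behind `consistent` reduces to counting key tuples. A standalone sketch of the same check:

```python
from collections import Counter

def consistent(key_columns):
    # a dataset is consistent when no two rows share the same
    # tuple of keys across all idx columns
    if not key_columns:
        return True
    return max(Counter(zip(*key_columns)).values()) == 1
```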
 390    @property
 391    def extidx(self):
 392        '''idx values (see data model)'''
 393        return [idx.values for idx in self.lidx]
 394
 395    @property
 396    def extidxext(self):
 397        '''idx val (see data model)'''
 398        return [idx.val for idx in self.lidx]
 399
 400    @property
 401    def idxname(self):
 402        ''' list of idx name'''
 403        return [idx.name for idx in self.lidx]
 404
 405    @property
 406    def idxlen(self):
 407        ''' list of idx codec length'''
 408        return [len(idx.codec) for idx in self.lidx]
 409
 410    @property
 411    def iidx(self):
 412        ''' list of keys for each idx'''
 413        return [idx.keys for idx in self.lidx]
 414
 415    @property
 416    def lenidx(self):
 417        ''' number of idx'''
 418        return len(self.lidx)
 419
 420    @property
 421    def lidx(self):
 422        '''list of idx'''
 423        return [self.lindex[i] for i in self.lidxrow]
 424
 425    @property
 426    def lisvar(self):
 427        '''list of boolean : True if Field is var'''
 428        return [name in self.lvarname for name in self.lname]
 429
 430    @property
 431    def lvar(self):
 432        '''list of var'''
 433        return [self.lindex[i] for i in self.lvarrow]
 434
 435    @property
 436    def lunic(self):
 437        '''list of unic index'''
 438        return [self.lindex[i] for i in self.lunicrow]
 439
 440    @property
 441    def lvarrow(self):
 442        '''list of var row'''
 443        return [self.lname.index(name) for name in self.lvarname]
 444
 445    @property
 446    def lidxrow(self):
 447        '''list of idx row'''
 448        return [i for i in range(self.lenindex) if i not in self.lvarrow]
 449
 450    @property
 451    def primary(self):
 452        ''' list of primary idx'''
 453        return [self.lidxrow.index(self.lname.index(name)) for name in self.primaryname]
 454
 455    @property
 456    def secondary(self):
 457        ''' list of secondary idx'''
 458        return [self.lidxrow.index(self.lname.index(name)) for name in self.secondaryname]
 459
 460    @property
 461    def setidx(self):
 462        '''list of codec for each idx'''
 463        return [idx.codec for idx in self.lidx]
 464
 465    @property
 466    def zip(self):
 467        '''return a zip format for transpose(extidx) : tuple(tuple(rec))'''
 468        textidx = Cutil.transpose(self.extidx)
 469        if not textidx:
 470            return None
 471        return tuple(tuple(idx) for idx in textidx)
 472
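The `zip` property is a plain transpose into hashable tuples. A minimal equivalent on raw columns (illustrative only):

```python
def to_zip(ext_columns):
    # transpose the external idx columns into a tuple of record
    # tuples, the hashable format returned by the zip property
    transposed = list(zip(*ext_columns))
    return tuple(transposed) if transposed else None
```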
 473    # %% structure
 474    def addindex(self, index, first=False, merge=False, update=False):
 475        '''add a new index.
 476
 477        *Parameters*
 478
 479        - **index** : Field - index to add (can be index Ntv representation)
 480        - **first** : If True insert index at the first row, else at the end
 481        - **merge** : create a new index if merge is False
 482        - **update** : if True, update actual values if index name is present (and merge is True)
 483
 484        *Returns* : none '''
 485        idx = self.field.ntv(index)
 486        idxname = self.lname
 487        if len(idx) != len(self) and len(self) > 0:
 488            raise DatasetError('sizes are different')
 489        if not idx.name in idxname:
 490            if first:
 491                self.lindex.insert(0, idx)
 492            else:
 493                self.lindex.append(idx)
 494        elif not merge:  # if idx.name in idxname
 495            while idx.name in idxname:
 496                idx.name += '(2)'
 497            if first:
 498                self.lindex.insert(0, idx)
 499            else:
 500                self.lindex.append(idx)
 501        elif update:  # if merge and idx.name in idxname
 502            self.lindex[idxname.index(idx.name)].setlistvalue(idx.values)
 503
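When merge is False and the name is already taken, `addindex` keeps suffixing '(2)' until the name is free. The renaming loop in isolation (hypothetical helper):

```python
def unique_name(name, existing):
    # append '(2)' until the candidate name no longer
    # clashes with an existing Field name
    while name in existing:
        name += '(2)'
    return name
```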
 504    def append(self, record, unique=False):
 505        '''add a new record.
 506
 507        *Parameters*
 508
 509        - **record** :  list of new index values to add to Dataset
 510        - **unique** :  boolean (default False) - the append is skipped if unique
 511        is True and the record is already present
 512
 513        *Returns* : list - key record'''
 514        if self.lenindex != len(record):
 515            raise DatasetError('len(record) not consistent')
 516        record = self.field.l_to_i(record)
 517        if self.isinrecord(self.idxrecord(record), False) and unique:
 518            return None
 519        return [self.lindex[i].append(record[i]) for i in range(self.lenindex)]
 520
 521    def applyfilter(self, reverse=False, filtname=FILTER, delfilter=True, inplace=True):
 522        '''delete records with defined filter value.
 523        Filter is deleted after record filtering.
 524
 525        *Parameters*
 526
 527        - **reverse** :  boolean (default False) - delete the records whose filter
 528        value equals reverse
 529        - **filtname** : string (default FILTER) - name of the filter Field added
 530        - **delfilter** :  boolean (default True) - if True, also delete the filter Field from self
 531        - **inplace** : boolean (default True) - if True, the filter is applied to self, else to a new Dataset
 532
 533        *Returns* : self or new Dataset'''
 534        if not filtname in self.lname:
 535            return None
 536        if inplace:
 537            ilis = self
 538        else:
 539            ilis = copy(self)
 540        ifilt = ilis.lname.index(filtname)
 541        ilis.sort([ifilt], reverse=not reverse, func=None)
 542        lisind = ilis.lindex[ifilt].recordfromvalue(reverse)
 543        if lisind:
 544            minind = min(lisind)
 545            for idx in ilis.lindex:
 546                del idx.keys[minind:]
 547        if inplace:
 548            self.delindex(filtname)
 549        else:
 550            ilis.delindex(filtname)
 551            if delfilter:
 552                self.delindex(filtname)
 553        ilis.reindex()
 554        return ilis
 555
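The effect of `applyfilter` on the data (keep the rows whose filter value is not `reverse`, drop the boolean column) can be sketched on a dict-of-lists table. Hypothetical helper, not the library code:

```python
def applyfilter(table, filt, reverse=False):
    # keep only the rows whose filter value differs from reverse,
    # rebuilding every column from the retained row positions
    keep = [i for i, f in enumerate(filt) if f != reverse]
    return {name: [col[i] for i in keep] for name, col in table.items()}
```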
 556    def coupling(self, derived=True, level=0.1):
 557        '''Transform idx with low distance into coupled or derived indexes (codec extension).
 558
 559        *Parameters*
 560
 561        - **level** : float (default 0.1) - param threshold to apply coupling.
 562        - **derived** : boolean (default : True). If True, indexes are derived,
 563        else coupled.
 564
 565        *Returns* : None'''
 566        ana = self.analysis
 567        child = [[] for _ in range(len(ana))]  # independent lists ([[]] * n would share one list)
 568        childroot = []
 569        level = level * len(self)
 570        for idx in range(self.lenindex):
 571            if derived:
 572                iparent = ana.fields[idx].p_distomin.index
 573            else:
 574                iparent = ana.fields[idx].p_distance.index
 575            if iparent == -1:
 576                childroot.append(idx)
 577            else:
 578                child[iparent].append(idx)
 579        for idx in childroot:
 580            self._couplingidx(idx, child, derived, level, ana)
 581
 582    def _couplingidx(self, idx, child, derived, level, ana):
 583        ''' Field coupling (included childrens of the Field)'''
 584        fields = ana.fields
 585        if derived:
 586            iparent = fields[idx].p_distomin.index
 587            dparent = ana.get_relation(*sorted([idx, iparent])).distomin
 588        else:
 589            iparent = fields[idx].p_distance.index
 590            dparent = ana.get_relation(*sorted([idx, iparent])).distance
 591        # if fields[idx].category in ('coupled', 'unique') or iparent == -1\
 592        if fields[idx].category in ('coupled', 'unique') \
 593                or dparent >= level or dparent == 0:
 594            return
 595        if child[idx]:
 596            for childidx in child[idx]:
 597                self._couplingidx(childidx, child, derived, level, ana)
 598        self.lindex[iparent].coupling(self.lindex[idx], derived=derived,
 599                                      duplicate=False)
 600        return
 601
 602    def delrecord(self, record, extern=True):
 603        '''remove a record.
 604
 605        *Parameters*
 606
 607        - **record** :  list - index values of the record to remove from the Dataset
 608        - **extern** : if True, compare record values to external representation
 609        of self.value, else, internal
 610
 611        *Returns* : row deleted'''
 612        self.reindex()
 613        reckeys = self.valtokey(record, extern=extern)
 614        if None in reckeys:
 615            return None
 616        row = self.tiindex.index(reckeys)
 617        for idx in self:
 618            del idx[row]
 619        return row
 620
 621    def _fullindex(self, ind, keysadd, indexname, varname, leng, fillvalue, fillextern):
 622        if not varname:
 623            varname = []
 624        idx = self.lindex[ind]
 625        lenadd = len(keysadd[0])
 626        if len(idx) == leng:
 627            return
 628        #inf = self.indexinfos()
 629        ana = self.anafields
 630        parent = ana[ind].p_derived.view('index')
 631        # if inf[ind]['cat'] == 'unique':
 632        if ana[ind].category == 'unique':
 633            idx.set_keys(idx.keys + [0] * lenadd)
 634        elif self.lname[ind] in indexname:
 635            idx.set_keys(idx.keys + keysadd[indexname.index(self.lname[ind])])
 636        # elif inf[ind]['parent'] == -1 or self.lname[ind] in varname:
 637        elif parent == -1 or self.lname[ind] in varname:
 638            fillval = fillvalue
 639            if fillextern:
 640                fillval = self.field.s_to_i(fillvalue)
 641            idx.set_keys(idx.keys + [len(idx.codec)] * len(keysadd[0]))
 642            idx.set_codec(idx.codec + [fillval])
 643        else:
 644            #parent = inf[ind]['parent']
 645            if len(self.lindex[parent]) != leng:
 646                self._fullindex(parent, keysadd, indexname, varname, leng,
 647                                fillvalue, fillextern)
 648            # if inf[ind]['cat'] == 'coupled':
 649            if ana[ind].category == 'coupled':
 650                idx.tocoupled(self.lindex[parent], coupling=True)
 651            else:
 652                idx.tocoupled(self.lindex[parent], coupling=False)
 653
 654    def full(self, reindex=False, idxname=None, varname=None, fillvalue='-',
 655             fillextern=True, inplace=True, canonical=True):
 656        '''transform a list of indexes into crossed indexes (value extension).
 657
 658        *Parameters*
 659
 660        - **idxname** : list of string - name of indexes to transform
 661        - **varname** : string - name of indexes to use
 662        - **reindex** : boolean (default False) - if True, set default codec
 663        before transformation
 664        - **fillvalue** : object value used for var extension
 665        - **fillextern** : boolean(default True) - if True, fillvalue is converted
 666        to internal value
 667        - **inplace** : boolean (default True) - if True, the transformation is applied to self,
 668        - **canonical** : boolean (default True) - if True, Field are ordered
 669        in canonical order
 670
 671        *Returns* : self or new Dataset'''
 672        ilis = self if inplace else copy(self)
 673        if not idxname:
 674            idxname = ilis.primaryname
 675        if reindex:
 676            ilis.reindex()
 677        keysadd = Cutil.idxfull([ilis.nindex(name) for name in idxname])
 678        if keysadd and len(keysadd) != 0:
 679            newlen = len(keysadd[0]) + len(ilis)
 680            for ind in range(ilis.lenindex):
 681                ilis._fullindex(ind, keysadd, idxname, varname, newlen,
 682                                fillvalue, fillextern)
 683        if canonical:
 684            ilis.setcanonorder()
 685        return ilis
 686
 687    def getduplicates(self, indexname=None, resindex=None, indexview=None):
 688        '''check duplicate values in a list of indexes. The result is added
 689        in a new index or returned.
 690
 691        *Parameters*
 692
 693        - **indexname** : list of string (default none) - name of indexes to check
 694        (if None, all Field)
 695        - **resindex** : string (default None) - Add a new index named resindex
 696        with check result (False if duplicate)
 697        - **indexview** : list of str (default None) - list of fields to return
 698
 699        *Returns* : list of int - list of rows with duplicate values '''
 700        if not indexname:
 701            indexname = self.lname
 702        duplicates = []
 703        for name in indexname:
 704            duplicates += self.nindex(name).getduplicates()
 705        if resindex and isinstance(resindex, str):
 706            newidx = self.field([True] * len(self), name=resindex)
 707            for item in duplicates:
 708                newidx[item] = False
 709            self.addindex(newidx)
 710        dupl = tuple(set(duplicates))
 711        if not indexview:
 712            return dupl
 713        return [tuple(self.record(ind, indexview)) for ind in dupl]
 714
 715    def iscanonorder(self):
 716        '''return True if primary indexes have canonical ordered keys'''
 717        primary = self.primary
 718        canonorder = Cutil.canonorder(
 719            [len(self.lidx[idx].codec) for idx in primary])
 720        return canonorder == [self.lidx[idx].keys for idx in primary]
 721
 722    def isinrecord(self, record, extern=True):
 723        '''Check if record is present in self.
 724
 725        *Parameters*
 726
 727        - **record** : list - value for each Field
 728        - **extern** : if True, compare record values to external representation
 729        of self.value, else, internal
 730
 731        *Returns boolean* : True if found'''
 732        if extern:
 733            return record in Cutil.transpose(self.extidxext)
 734        return record in Cutil.transpose(self.extidx)
 735
 736    def idxrecord(self, record):
 737        '''return rec array (without variable) from complete record (with variable)'''
 738        return [record[self.lidxrow[i]] for i in range(len(self.lidxrow))]
 739
 740    def keytoval(self, listkey, extern=True):
 741        '''
 742        convert a keys list (key for each index) to a values list (value for each index).
 743
 744        *Parameters*
 745
 746        - **listkey** : key for each index
 747        - **extern** : boolean (default True) - if True, compare rec to val else to values
 748
 749        *Returns*
 750
 751        - **list** : value for each index'''
 752        return [idx.keytoval(key, extern=extern) for idx, key in zip(self.lindex, listkey)]
 753
 754    def loc(self, rec, extern=True, row=False):
 755        '''
 756        Return record or row corresponding to a list of idx values.
 757
 758        *Parameters*
 759
 760        - **rec** : list - value for each idx
 761        - **extern** : boolean (default True) - if True, compare rec to val,
 762        else to values
 763        - **row** : Boolean (default False) - if True, return list of row,
 764        else list of records
 765
 766        *Returns*
 767
 768        - **object** : variable value or None if not found'''
 769        locrow = None
 770        try:
 771            if len(rec) == self.lenindex:
 772                locrow = list(set.intersection(*[set(self.lindex[i].loc(rec[i], extern))
 773                                               for i in range(self.lenindex)]))
 774            elif len(rec) == self.lenidx:
 775                locrow = list(set.intersection(*[set(self.lidx[i].loc(rec[i], extern))
 776                                               for i in range(self.lenidx)]))
 777        except:
 778            pass
 779        if locrow is None:
 780            return None
 781        if row:
 782            return locrow
 783        return [self.record(locr, extern=extern) for locr in locrow]
 784
 785    def mix(self, other, fillvalue=None):
 786        '''add other's Fields not included in self and append other's values'''
 787        sname = set(self.lname)
 788        oname = set(other.lname)
 789        newself = copy(self)
 790        copother = copy(other)
 791        for nam in oname - sname:
 792            newself.addindex({nam: [fillvalue] * len(newself)})
 793        for nam in sname - oname:
 794            copother.addindex({nam: [fillvalue] * len(copother)})
 795        return newself.add(copother, name=True, solve=False)
 796
 797    def merging(self, listname=None):
 798        ''' add a new Field built from the Fields defined in listname.
 799        Values of the new Field are sets of values from the listname Fields'''
 800        #self.addindex(Field.merging([self.nindex(name) for name in listname]))
 801        self.addindex(Sfield.merging([self.nindex(name) for name in listname]))
 802
 803    def orindex(self, other, first=False, merge=False, update=False):
 804        ''' Add other's index to self's index (with same length)
 805
 806        *Parameters*
 807
 808        - **other** : self class - object to add
 809        - **first** : Boolean (default False) - If True insert indexes
 810        at the first row, else at the end
 811        - **merge** : Boolean (default False) - create a new index
 812        if merge is False
 813        - **update** : Boolean (default False) - if True, update actual
 814        values if index name is present (and merge is True)
 815
 816        *Returns* : none '''
 817        if len(self) != 0 and len(self) != len(other) and len(other) != 0:
 818            raise DatasetError("the sizes are not equal")
 819        otherc = copy(other)
 820        for idx in otherc.lindex:
 821            self.addindex(idx, first=first, merge=merge, update=update)
 822        return self
 823
 824    def record(self, row, indexname=None, extern=True):
 825        '''return the record at the row
 826
 827        *Parameters*
 828
 829        - **row** : int - row of the record
 830        - **extern** : boolean (default True) - if True, return val record else
 831        value record
 832        - **indexname** : list of str (default None) - list of fields to return
 833        *Returns*
 834
 835        - **list** : val record or value record'''
 836        if indexname is None:
 837            indexname = self.lname
 838        if extern:
 839            record = [idx.val[row] for idx in self.lindex]
 840            #record = [idx.values[row].to_obj() for idx in self.lindex]
 841            #record = [idx.valrow(row) for idx in self.lindex]
 842        else:
 843            record = [idx.values[row] for idx in self.lindex]
 844        return [record[self.lname.index(name)] for name in indexname]
 845
 846    def recidx(self, row, extern=True):
 847        '''return the list of idx val or values at the row
 848
 849        *Parameters*
 850
 851        - **row** : int - row of the record
 852        - **extern** : boolean (default True) - if True, return val rec else value rec
 853
 854        *Returns*
 855
 856        - **list** : val or value for idx'''
 857        if extern:
 858            return [idx.values[row].to_obj() for idx in self.lidx]
 859            # return [idx.valrow(row) for idx in self.lidx]
 860        return [idx.values[row] for idx in self.lidx]
 861
 862    def recvar(self, row, extern=True):
 863        '''return the list of var val or values at the row
 864
 865        *Parameters*
 866
 867        - **row** : int - row of the record
 868        - **extern** : boolean (default True) - if True, return val rec else value rec
 869
 870        *Returns*
 871
 872        - **list** : val or value for var'''
 873        if extern:
 874            return [idx.values[row].to_obj() for idx in self.lvar]
 875            # return [idx.valrow(row) for idx in self.lvar]
 876        return [idx.values[row] for idx in self.lvar]
 877
 878    def setcanonorder(self, reindex=False):
 879        '''Set the canonical index order : primary - secondary/unique - variable.
 880        Set the canonical keys order : ordered keys in the first columns.
 881
 882        *Parameters*
 883        - **reindex** : boolean (default False) - if True, set default codec after
 884        transformation
 885
 886        *Return* : self'''
 887        order = self.primaryname
 888        order += self.secondaryname
 889        order += self.lvarname
 890        order += self.lunicname
 891        self.swapindex(order)
 892        self.sort(reindex=reindex)
 893        # self.analysis.actualize()
 894        return self
 895
 896    def setfilter(self, filt=None, first=False, filtname=FILTER, unique=False):
 897        '''Add a filter index with boolean values
 898
 899        - **filt** : list of boolean - values of the filter idx to add
 900        - **first** : boolean (default False) - If True insert index at the first row,
 901        else at the end
 902        - **filtname** : string (default FILTER) - Name of the filter Field added
 903        - **unique** : boolean (default False) - if True, existing filter Fields are removed first
 904        *Returns* : self'''
 905        if not filt:
 906            filt = [True] * len(self)
 907        idx = self.field(filt, name=filtname)
 908        idx.reindex()
 909        if not idx.cod in ([True, False], [False, True], [True], [False]):
 910            raise DatasetError('filt is not consistent')
 911        if unique:
 912            for name in self.lname:
 913                if name[:len(FILTER)] == FILTER:
 914                    self.delindex(FILTER)
 915        self.addindex(idx, first=first)
 916        return self
 917
 918    def sort(self, order=None, reverse=False, func=str, reindex=True):
 919        '''Sort data following the index order and apply the ascending or descending
 920        sort function to values.
 921
 922        *Parameters*
 923
 924        - **order** : list (default None)- new order of index to apply. If None or [],
 925        the sort function is applied to the existing order of indexes.
 926        - **reverse** : boolean (default False) - descending if True, ascending if False
 927        - **func**    : function (default str) - parameter key used in the sorted function
 928        - **reindex** : boolean (default True) - if True, apply a new codec order (key = func)
 929
 930        *Returns* : self'''
 931        if not order:
 932            order = list(range(self.lenindex))
 933        orderfull = order + list(set(range(self.lenindex)) - set(order))
 934        if reindex:
 935            for i in order:
 936                self.lindex[i].reindex(codec=sorted(
 937                    self.lindex[i].codec, key=func))
 938        newidx = Cutil.transpose(sorted(Cutil.transpose(
 939            [self.lindex[orderfull[i]].keys for i in range(self.lenindex)]),
 940            reverse=reverse))
 941        for i in range(self.lenindex):
 942            self.lindex[orderfull[i]].set_keys(newidx[i])
 943        return self
 944
 945    """
 946    def swapindex(self, order):
 947        '''
 948        Change the order of the index .
 949
 950        *Parameters*
 951
 952        - **order** : list of int or list of name - new order of index to apply.
 953
 954        *Returns* : self '''
 955        if self.lenindex != len(order):
 956            raise DatasetError('length of order and Dataset different')
 957        if not order or isinstance(order[0], int):
 958            self.lindex = [self.lindex[ind] for ind in order]
 959        elif isinstance(order[0], str):
 960            self.lindex = [self.nindex(name) for name in order]
 961        return self
 962    """
 963
 964    def tostdcodec(self, inplace=False, full=True):
 965        '''Transform all codecs into full or default codecs.
 966
 967        *Parameters*
 968
 969        - **inplace** : boolean  (default False) - if True apply transformation
 970        to self, else to a new Dataset
 971        - **full** : boolean (default True)- full codec if True, default if False
 972
 973
 974        *Return Dataset* : self or new Dataset'''
 975        lindex = [idx.tostdcodec(inplace=False, full=full)
 976                  for idx in self.lindex]
 977        if inplace:
 978            self.lindex = lindex
 979            return self
 980        return self.__class__(lindex, self.lvarname)
 981
 982    def updateindex(self, listvalue, index, extern=True):
 983        '''update values of an index.
 984
 985        *Parameters*
 986
 987        - **listvalue** : list - index values to replace
 988        - **index** : integer - index row to update
 989        - **extern** : if True, the listvalue has external representation, else internal
 990
 991        *Returns* : none '''
 992        self.lindex[index].setlistvalue(listvalue, extern=extern)
 993
 994    def valtokey(self, rec, extern=True):
 995        '''convert a record list (value or val for each idx) to a key list
 996        (key for each index).
 997
 998        *Parameters*
 999
1000        - **rec** : list of value or val for each index
1001        - **extern** : if True, the rec value has external representation, else internal
1002
1003        *Returns*
1004
1005        - **list of int** : record key for each index'''
1006        return [idx.valtokey(val, extern=extern) for idx, val in zip(self.lindex, rec)]
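The value extension performed by `full` and `_fullindex` above boils down to completing the Cartesian product of the chosen indexes' distinct values. A minimal stand-alone sketch of that idea (not the actual `Cutil.idxfull` implementation):

```python
from itertools import product

def missing_crossed_rows(fields):
    """Return the rows of the full Cartesian crossing of the fields'
    distinct values (codecs) that are not yet present in the data.
    Illustrative sketch of the value-extension idea, not Cutil.idxfull."""
    codecs = [sorted(set(field)) for field in fields]   # distinct values per field
    existing = set(zip(*fields))                        # rows already present
    return [combo for combo in product(*codecs) if combo not in existing]

# two fields, three records -> the 2x2 crossing lacks one row
print(missing_crossed_rows([['a', 'a', 'b'], [1, 2, 1]]))  # [('b', 2)]
```

`full` then appends such rows to every field, filling non-crossed fields with `fillvalue`.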

Sdataset is a child class of Cdataset where the internal value can differ from the external value (a list is converted into a tuple and a dict into a json-object).

One attribute is added: 'field' to define the 'field' class.

The methods defined in this class are :

constructor (@classmethod)

dynamic value - module analysis (getters @property)

dynamic value (getters @property)

  • Cdataset.keys
  • Cdataset.iindex
  • Cdataset.indexlen
  • Cdataset.lenindex
  • Cdataset.lname
  • Cdataset.tiindex

global value (getters @property)

selecting - infos methods

add - update methods

structure management - methods

exports methods (observation.dataset_interface.DatasetInterface)

  • Dataset.json
  • Dataset.plot
  • Dataset.to_obj
  • Dataset.to_csv
  • Dataset.to_dataframe
  • Dataset.to_file
  • Dataset.to_ntv
  • Dataset.to_obj
  • Dataset.to_xarray
  • Dataset.view
  • Dataset.vlist
  • Dataset.voxel
Sdataset(listidx=None, name=None, reindex=True)
153    def __init__(self, listidx=None, name=None, reindex=True):
154        '''
155        Dataset constructor.
156
157        *Parameters*
158
159        - **listidx** :  list (default None) - list of Field data
160        - **name** :  string (default None) - name of the dataset
161        - **reindex** : boolean (default True) - if True, default codec for each Field'''
162
163        self.field = self.field_class
164        Cdataset.__init__(self, listidx, name, reindex=reindex)

Dataset constructor.

Parameters

  • listidx : list (default None) - list of Field data
  • name : string (default None) - name of the dataset
  • reindex : boolean (default True) - if True, default codec for each Field
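The internal/external distinction mentioned in the class description (lists stored as tuples, dicts as json-objects, so every value is hashable) can be sketched with a small converter. This is illustrative only, not the library's actual conversion code:

```python
import json

def to_internal(value):
    """Turn an external value into a hashable internal form: lists become
    tuples (recursively) and dicts become canonical JSON strings.
    Sketch of the convention only, not the tab_dataset implementation."""
    if isinstance(value, list):
        return tuple(to_internal(item) for item in value)
    if isinstance(value, dict):
        return json.dumps(value, sort_keys=True)
    return value

print(to_internal([1, [2, 3], {'b': 2, 'a': 1}]))
# (1, (2, 3), '{"a": 1, "b": 2}')
```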
@classmethod
def from_csv( cls, filename='dataset.csv', header=True, nrow=None, decode_str=True, decode_json=True, optcsv={'quoting': 2}):
166    @classmethod
167    def from_csv(cls, filename='dataset.csv', header=True, nrow=None, decode_str=True,
168                 decode_json=True, optcsv={'quoting': csv.QUOTE_NONNUMERIC}):
169        '''
170        Dataset constructor (from a csv file). Each column represents index values.
171
172        *Parameters*
173
174        - **filename** : string (default 'dataset.csv'), name of the file to read
 175        - **header** : boolean (default True). If True, the first row is dedicated to names
 176        - **nrow** : integer (default None). Number of rows to read. If None, all the rows are read, else nrow
177        - **optcsv** : dict (default : quoting) - see csv.reader options'''
178        if not optcsv:
179            optcsv = {}
180        if not nrow:
181            nrow = -1
182        with open(filename, newline='', encoding="utf-8") as file:
183            reader = csv.reader(file, **optcsv)
184            irow = 0
185            for row in reader:
186                if irow == nrow:
187                    break
188                if irow == 0:
189                    idxval = [[] for i in range(len(row))]
190                    idxname = [''] * len(row)
191                if irow == 0 and header:
192                    idxname = row
193                else:
194                    for i in range(len(row)):
195                        if decode_json:
196                            try:
197                                idxval[i].append(json.loads(row[i]))
198                            except:
199                                idxval[i].append(row[i])
200                        else:
201                            idxval[i].append(row[i])
202                irow += 1
203        lindex = [cls.field_class.from_ntv(
204            {name: idx}, decode_str=decode_str) for idx, name in zip(idxval, idxname)]
205        return cls(listidx=lindex, reindex=True)

Dataset constructor (from a csv file). Each column represents index values.

Parameters

  • filename : string (default 'dataset.csv'), name of the file to read
  • header : boolean (default True). If True, the first row is dedicated to names
  • nrow : integer (default None). Number of rows to read. If None, all the rows are read, else nrow
  • optcsv : dict (default : quoting) - see csv.reader options
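The column-building loop of `from_csv` can be reproduced with the standard `csv` and `json` modules alone. The sketch below mirrors the documented behaviour (header row as field names, per-cell JSON decoding with a string fallback) without depending on `tab_dataset`:

```python
import csv
import io
import json

def columns_from_csv(text, header=True, decode_json=True):
    """Read CSV text into {name: values} columns, mimicking the from_csv
    loop above. Illustrative stand-alone code, not the library itself."""
    rows = list(csv.reader(io.StringIO(text)))
    names = rows[0] if header else ['i' + str(i) for i in range(len(rows[0]))]
    data = rows[1:] if header else rows
    cols = {name: [] for name in names}
    for row in data:
        for name, cell in zip(names, row):
            if decode_json:
                try:
                    cell = json.loads(cell)   # numbers, true/false, quoted json
                except ValueError:
                    pass                      # keep the raw string
            cols[name].append(cell)
    return cols

print(columns_from_csv('name,score\nana,10\nbob,12\n'))
# {'name': ['ana', 'bob'], 'score': [10, 12]}
```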
@classmethod
def from_file(cls, filename, forcestring=False, reindex=True, decode_str=False):
207    @classmethod
208    def from_file(cls, filename, forcestring=False, reindex=True, decode_str=False):
209        '''
210        Generate Object from file storage.
211
212         *Parameters*
213
214        - **filename** : string - file name (with path)
215        - **forcestring** : boolean (default False) - if True,
216        forces the UTF-8 data format, else the format is calculated
217        - **reindex** : boolean (default True) - if True, default codec for each Field
 218        - **decode_str**: boolean (default False) - if True, strings are loaded as json data
219
220        *Returns* : new Object'''
221        with open(filename, 'rb') as file:
222            btype = file.read(1)
223        if btype == bytes('[', 'UTF-8') or btype == bytes('{', 'UTF-8') or forcestring:
224            with open(filename, 'r', newline='', encoding="utf-8") as file:
225                bjson = file.read()
226        else:
227            with open(filename, 'rb') as file:
228                bjson = file.read()
229        return cls.from_ntv(bjson, reindex=reindex, decode_str=decode_str)

Generate Object from file storage.

Parameters

  • filename : string - file name (with path)
  • forcestring : boolean (default False) - if True, forces the UTF-8 data format, else the format is calculated
  • reindex : boolean (default True) - if True, default codec for each Field
  • decode_str: boolean (default False) - if True, strings are loaded as json data

Returns : new Object
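The loading strategy of `from_file` (peek at the first byte to decide between UTF-8 text and raw bytes) can be sketched as follows; `read_json_or_binary` is a hypothetical stand-in, not the library function:

```python
import os
import tempfile

def read_json_or_binary(filename, forcestring=False):
    """Read a file as UTF-8 text when it starts with a JSON array or object
    marker (or when forcestring is set), else as raw bytes -- mirroring
    the first-byte test documented for from_file above. Sketch only."""
    with open(filename, 'rb') as file:
        first = file.read(1)
    if first in (b'[', b'{') or forcestring:
        with open(filename, 'r', encoding='utf-8') as file:
            return file.read()
    with open(filename, 'rb') as file:
        return file.read()

# demo on a temporary JSON file
with tempfile.NamedTemporaryFile('w', suffix='.json', delete=False) as tmp:
    tmp.write('{"a": [1, 2]}')
print(type(read_json_or_binary(tmp.name)).__name__)  # str
os.remove(tmp.name)
```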

def merge(self, fillvalue=nan, reindex=False, simplename=False):
231    def merge(self, fillvalue=math.nan, reindex=False, simplename=False):
232        '''
 233        Merge method replaces Dataset objects included in self with their constituents.
234
235        *Parameters*
236
237        - **fillvalue** : object (default nan) - value used for the additional data
238        - **reindex** : boolean (default False) - if True, set default codec after transformation
 239        - **simplename** : boolean (default False) - if True, new Field names are
 240        the same as the merged Field names, else composed names are used.
241
242        *Returns*: merged Dataset '''
243        ilc = copy(self)
244        delname = []
245        row = ilc[0]
246        if not isinstance(row, list):
247            row = [row]
248        merged, oldname, newname = self.__class__._mergerecord(
249            self.ext(row, ilc.lname), simplename=simplename, fillvalue=fillvalue,
250            reindex=reindex)
251        delname.append(oldname)
252        for ind in range(1, len(ilc)):
253            oldidx = ilc.nindex(oldname)
254            for name in newname:
255                ilc.addindex(self.field(oldidx.codec, name, oldidx.keys))
256            row = ilc[ind]
257            if not isinstance(row, list):
258                row = [row]
259            rec, oldname, newname = self.__class__._mergerecord(
260                self.ext(row, ilc.lname), simplename=simplename)
261            if oldname and newname != [oldname]:
262                delname.append(oldname)
263            for name in newname:
264                oldidx = merged.nindex(oldname)
265                fillval = self.field.s_to_i(fillvalue)
266                merged.addindex(
267                    self.field([fillval] * len(merged), name, oldidx.keys))
268            merged += rec
269        for name in set(delname):
270            if name:
271                merged.delindex(name)
272        if reindex:
273            merged.reindex()
274        ilc.lindex = merged.lindex
275        return ilc

Merge method replaces Dataset objects included in self with their constituents.

Parameters

  • fillvalue : object (default nan) - value used for the additional data
  • reindex : boolean (default False) - if True, set default codec after transformation
  • simplename : boolean (default False) - if True, new Field names are the same as the merged Field names, else composed names are used.

Returns: merged Dataset

@classmethod
def ext(cls, idxval=None, idxname=None, reindex=True, fast=False):
277    @classmethod
278    def ext(cls, idxval=None, idxname=None, reindex=True, fast=False):
279        '''
280        Dataset constructor (external index).
281
282        *Parameters*
283
284        - **idxval** : list of Field or list of values (see data model)
285        - **idxname** : list of string (default None) - list of Field name (see data model)'''
286        if idxval is None:
287            idxval = []
288        if not isinstance(idxval, list):
289            return None
290        val = []
291        for idx in idxval:
292            if not isinstance(idx, list):
293                val.append([idx])
294            else:
295                val.append(idx)
296        lenval = [len(idx) for idx in val]
297        if lenval and max(lenval) != min(lenval):
298            raise DatasetError('the length of Iindex are different')
299        length = lenval[0] if lenval else 0
300        idxname = [None] * len(val) if idxname is None else idxname
301        for ind, name in enumerate(idxname):
302            if name is None or name == '$default':
303                idxname[ind] = 'i'+str(ind)
304        lindex = [cls.field_class(codec, name, lendefault=length, reindex=reindex,
305                                  fast=fast) for codec, name in zip(val, idxname)]
306        return cls(lindex, reindex=False)

Dataset constructor (external index).

Parameters

  • idxval : list of Field or list of values (see data model)
  • idxname : list of string (default None) - list of Field name (see data model)
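The normalisation performed at the start of `ext` (scalars wrapped into one-element lists, equal-length check, default `'i0'`, `'i1'`, ... names for missing or `'$default'` entries) is easy to reproduce on its own. An illustrative sketch, not the library code:

```python
def normalize_ext(idxval, idxname=None):
    """Mirror the input normalisation done by Sdataset.ext above: wrap
    scalar entries in lists, verify equal lengths, and replace missing or
    '$default' names with 'i0', 'i1', ... Sketch only."""
    val = [idx if isinstance(idx, list) else [idx] for idx in idxval]
    lengths = {len(idx) for idx in val}
    if len(lengths) > 1:
        raise ValueError('the length of Iindex are different')
    names = [None] * len(val) if idxname is None else list(idxname)
    names = [name if name not in (None, '$default') else 'i' + str(ind)
             for ind, name in enumerate(names)]
    return val, names

print(normalize_ext([[10, 20], ['x', 'y']], [None, 'tag']))
# ([[10, 20], ['x', 'y']], ['i0', 'tag'])
```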
consistent

True if all the records are different

extidx

idx values (see data model)

extidxext

idx val (see data model)

idxname

list of idx name

idxlen

list of idx codec length

iidx

list of keys for each idx

lenidx

number of idx

lidx

list of idx

lisvar

list of boolean : True if Field is var

lvar

list of var

lunic

list of unic index

lvarrow

list of var row

lidxrow

list of idx row

primary

list of primary idx

secondary

list of secondary idx

setidx

list of codec for each idx

zip

return a zip format for transpose(extidx) : tuple(tuple(rec))
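Several of the getters above (`setidx`, `iidx`, `idxlen`) expose the codec/keys data model, in which a field stores each distinct value once (the codec) plus one key per record. A minimal factorisation sketch of that model, not the tab_dataset implementation:

```python
def factorize(values):
    """Split a list of values into (codec, keys): the codec holds each
    distinct value once, the keys give the codec position of every record."""
    codec = []
    keys = []
    positions = {}              # value -> position in codec
    for value in values:
        if value not in positions:
            positions[value] = len(codec)
            codec.append(value)
        keys.append(positions[value])
    return codec, keys

codec, keys = factorize(['paris', 'lyon', 'paris', 'lyon'])
print(codec, keys)  # ['paris', 'lyon'] [0, 1, 0, 1]
```

Reading the field back is just `[codec[k] for k in keys]`.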

def addindex(self, index, first=False, merge=False, update=False):
474    def addindex(self, index, first=False, merge=False, update=False):
475        '''add a new index.
476
477        *Parameters*
478
479        - **index** : Field - index to add (can be index Ntv representation)
480        - **first** : If True insert index at the first row, else at the end
481        - **merge** : create a new index if merge is False
482        - **update** : if True, update actual values if index name is present (and merge is True)
483
484        *Returns* : none '''
485        idx = self.field.ntv(index)
486        idxname = self.lname
487        if len(idx) != len(self) and len(self) > 0:
488            raise DatasetError('sizes are different')
489        if not idx.name in idxname:
490            if first:
491                self.lindex.insert(0, idx)
492            else:
493                self.lindex.append(idx)
494        elif not merge:  # si idx.name in idxname
495            while idx.name in idxname:
496                idx.name += '(2)'
497            if first:
498                self.lindex.insert(0, idx)
499            else:
500                self.lindex.append(idx)
501        elif update:  # si merge et si idx.name in idxname
502            self.lindex[idxname.index(idx.name)].setlistvalue(idx.values)

add a new index.

Parameters

  • index : Field - index to add (can be index Ntv representation)
  • first : If True insert index at the first row, else at the end
  • merge : create a new index if merge is False
  • update : if True, update actual values if index name is present (and merge is True)

Returns : none
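When `merge` is False and the name already exists, `addindex` derives a fresh name by appending `'(2)'` until the collision disappears. The rule in isolation:

```python
def unique_name(name, existing):
    """Append '(2)' until the name no longer collides with an existing
    Field name -- the renaming loop used by addindex above."""
    while name in existing:
        name += '(2)'
    return name

print(unique_name('score', ['score', 'date']))      # score(2)
print(unique_name('score', ['score', 'score(2)']))  # score(2)(2)
```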

def append(self, record, unique=False):
504    def append(self, record, unique=False):
505        '''add a new record.
506
507        *Parameters*
508
509        - **record** :  list of new index values to add to Dataset
 510        - **unique** :  boolean (default False) - Append isn't done if unique
 511        is True and the record is already present
512
513        *Returns* : list - key record'''
514        if self.lenindex != len(record):
515            raise DatasetError('len(record) not consistent')
516        record = self.field.l_to_i(record)
517        if self.isinrecord(self.idxrecord(record), False) and unique:
518            return None
519        return [self.lindex[i].append(record[i]) for i in range(self.lenindex)]

add a new record.

Parameters

  • record : list of new index values to add to Dataset
  • unique : boolean (default False) - Append isn't done if unique is True and the record is already present

Returns : list - key record
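On plain column lists, the behaviour documented for `append` (optional uniqueness check against the existing records, then one value appended per field) can be sketched as follows; this is an illustration of the mechanics, not `Sdataset.append` itself:

```python
def append_record(columns, record, unique=False):
    """Append one value per field, skipping the record when unique is True
    and it is already present; return the new row index or None."""
    if len(columns) != len(record):
        raise ValueError('len(record) not consistent')
    rows = set(zip(*columns.values()))
    if unique and tuple(record) in rows:
        return None
    for col, value in zip(columns.values(), record):
        col.append(value)
    return len(next(iter(columns.values()))) - 1

cols = {'name': ['ana'], 'score': [10]}
print(append_record(cols, ['bob', 12]))               # 1
print(append_record(cols, ['bob', 12], unique=True))  # None
```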

def applyfilter( self, reverse=False, filtname='$filter', delfilter=True, inplace=True):
521    def applyfilter(self, reverse=False, filtname=FILTER, delfilter=True, inplace=True):
522        '''delete records with defined filter value.
523        Filter is deleted after record filtering.
524
525        *Parameters*
526
 527        - **reverse** :  boolean (default False) - delete records whose filter
 528        value equals reverse
529        - **filtname** : string (default FILTER) - Name of the filter Field added
530        - **delfilter** :  boolean (default True) - If True, delete filter's Field
 531        - **inplace** : boolean (default True) - if True, the filter is applied to self, else to a new Dataset
532
533        *Returns* : self or new Dataset'''
534        if not filtname in self.lname:
535            return None
536        if inplace:
537            ilis = self
538        else:
539            ilis = copy(self)
540        ifilt = ilis.lname.index(filtname)
541        ilis.sort([ifilt], reverse=not reverse, func=None)
542        lisind = ilis.lindex[ifilt].recordfromvalue(reverse)
543        if lisind:
544            minind = min(lisind)
545            for idx in ilis.lindex:
546                del idx.keys[minind:]
547        if inplace:
548            self.delindex(filtname)
549        else:
550            ilis.delindex(filtname)
551            if delfilter:
552                self.delindex(filtname)
553        ilis.reindex()
554        return ilis

delete records with defined filter value. Filter is deleted after record filtering.

Parameters

  • reverse : boolean (default False) - delete records whose filter value equals reverse
  • filtname : string (default FILTER) - Name of the filter Field added
  • delfilter : boolean (default True) - If True, delete filter's Field
  • inplace : boolean (default True) - if True, the filter is applied to self, else to a new Dataset

Returns : self or new Dataset
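Stripped of the codec machinery, `applyfilter` keeps the rows whose filter value differs from `reverse` and drops the rest. On plain `{name: values}` columns (sketch only):

```python
def apply_filter(columns, filt, reverse=False):
    """Keep only the rows whose boolean filter value is not `reverse`,
    mirroring the record filtering described for applyfilter above."""
    keep = [row for row, flag in enumerate(filt) if flag != reverse]
    return {name: [col[row] for row in keep] for name, col in columns.items()}

data = {'city': ['paris', 'lyon', 'nice'], 'pop': [2.1, 0.5, 0.3]}
print(apply_filter(data, [True, False, True]))
# {'city': ['paris', 'nice'], 'pop': [2.1, 0.3]}
```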

def coupling(self, derived=True, level=0.1):
556    def coupling(self, derived=True, level=0.1):
557        '''Transform idx with low dist in coupled or derived indexes (codec extension).
558
559        *Parameters*
560
561        - **level** : float (default 0.1) - param threshold to apply coupling.
562        - **derived** : boolean (default : True). If True, indexes are derived,
563        else coupled.
564
565        *Returns* : None'''
566        ana = self.analysis
567        child = [[]] * len(ana)
568        childroot = []
569        level = level * len(self)
570        for idx in range(self.lenindex):
571            if derived:
572                iparent = ana.fields[idx].p_distomin.index
573            else:
574                iparent = ana.fields[idx].p_distance.index
575            if iparent == -1:
576                childroot.append(idx)
577            else:
578                child[iparent].append(idx)
579        for idx in childroot:
580            self._couplingidx(idx, child, derived, level, ana)

Transform idx with low dist in coupled or derived indexes (codec extension).

Parameters

  • level : float (default 0.1) - param threshold to apply coupling.
  • derived : boolean (default : True). If True, indexes are derived, else coupled.

Returns : None

def delrecord(self, record, extern=True):
602    def delrecord(self, record, extern=True):
603        '''remove a record.
604
605        *Parameters*
606
 607        - **record** :  list - index values to remove from the Dataset
608        - **extern** : if True, compare record values to external representation
609        of self.value, else, internal
610
611        *Returns* : row deleted'''
612        self.reindex()
613        reckeys = self.valtokey(record, extern=extern)
614        if None in reckeys:
615            return None
616        row = self.tiindex.index(reckeys)
617        for idx in self:
618            del idx[row]
619        return row

remove a record.

Parameters

  • record : list - index values of the record to remove from the Dataset
  • extern : if True, compare record values to external representation of self.value, else, internal

Returns : row deleted
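
The deletion step can be sketched in plain Python (illustrative code, not the library internals): the dataset is a set of parallel columns, the record's row is located by transposition, then that row is removed from every column.

```python
# Minimal sketch of delrecord's idea: find the row matching the record,
# then delete that row from every parallel column.
def del_record(columns, record):
    rows = list(zip(*columns))          # transpose to compare full records
    if tuple(record) not in rows:
        return None                     # record absent, nothing deleted
    row = rows.index(tuple(record))
    for col in columns:
        del col[row]
    return row

cols = [['a', 'b', 'c'], [1, 2, 3]]
print(del_record(cols, ['b', 2]))   # -> 1
print(cols)                         # -> [['a', 'c'], [1, 3]]
```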

def full(self, reindex=False, idxname=None, varname=None, fillvalue='-', fillextern=True, inplace=True, canonical=True):
654    def full(self, reindex=False, idxname=None, varname=None, fillvalue='-',
655             fillextern=True, inplace=True, canonical=True):
656        '''transform a list of indexes into crossed indexes (value extension).
657
658        *Parameters*
659
660        - **idxname** : list of string - name of indexes to transform
661        - **varname** : string - name of indexes to use
662        - **reindex** : boolean (default False) - if True, set default codec
663        before transformation
664        - **fillvalue** : object value used for var extension
665        - **fillextern** : boolean(default True) - if True, fillvalue is converted
666        to internal value
667        - **inplace** : boolean (default True) - if True, transformation is applied to self,
668        - **canonical** : boolean (default True) - if True, Field are ordered
669        in canonical order
670
671        *Returns* : self or new Dataset'''
672        ilis = self if inplace else copy(self)
673        if not idxname:
674            idxname = ilis.primaryname
675        if reindex:
676            ilis.reindex()
677        keysadd = Cutil.idxfull([ilis.nindex(name) for name in idxname])
678        if keysadd and len(keysadd) != 0:
679            newlen = len(keysadd[0]) + len(ilis)
680            for ind in range(ilis.lenindex):
681                ilis._fullindex(ind, keysadd, idxname, varname, newlen,
682                                fillvalue, fillextern)
683        if canonical:
684            ilis.setcanonorder()
685        return ilis

transform a list of indexes into crossed indexes (value extension).

Parameters

  • idxname : list of string - name of indexes to transform
  • varname : string - name of indexes to use
  • reindex : boolean (default False) - if True, set default codec before transformation
  • fillvalue : object value used for var extension
  • fillextern : boolean(default True) - if True, fillvalue is converted to internal value
  • inplace : boolean (default True) - if True, the transformation is applied to self, else to a new Dataset
  • canonical : boolean (default True) - if True, Field are ordered in canonical order

Returns : self or new Dataset
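
The "value extension" can be sketched in plain Python (illustrative code, not the library internals; `full_cross` is my name): every missing combination of the chosen indexes is appended as a new row, while the other columns are padded with the fill value ('-' by default in `full`).

```python
from itertools import product

# Sketch of the crossed-index extension: append every missing combination of
# the index columns, padding the remaining columns with `fillvalue`.
def full_cross(idx_cols, other_cols, fillvalue='-'):
    existing = set(zip(*idx_cols))
    for combo in product(*(sorted(set(c), key=str) for c in idx_cols)):
        if combo in existing:
            continue
        for col, val in zip(idx_cols, combo):
            col.append(val)
        for col in other_cols:
            col.append(fillvalue)

year = [2024, 2024, 2025]
city = ['paris', 'lyon', 'paris']
temp = [10, 12, 11]
full_cross([year, city], [temp])
print(list(zip(year, city, temp)))   # the missing (2025, 'lyon') row is added
```

After the extension every (year, city) combination occurs exactly once, which is what makes the indexes "crossed".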

def getduplicates(self, indexname=None, resindex=None, indexview=None):
687    def getduplicates(self, indexname=None, resindex=None, indexview=None):
688        '''check duplicate values in a list of indexes. Result is added in a new
689        index or returned.
690
691        *Parameters*
692
693        - **indexname** : list of string (default none) - name of indexes to check
694        (if None, all Field)
695        - **resindex** : string (default None) - Add a new index named resindex
696        with check result (False if duplicate)
697        - **indexview** : list of str (default None) - list of fields to return
698
699        *Returns* : list of int - list of rows with duplicate values '''
700        if not indexname:
701            indexname = self.lname
702        duplicates = []
703        for name in indexname:
704            duplicates += self.nindex(name).getduplicates()
705        if resindex and isinstance(resindex, str):
706            newidx = self.field([True] * len(self), name=resindex)
707            for item in duplicates:
708                newidx[item] = False
709            self.addindex(newidx)
710        dupl = tuple(set(duplicates))
711        if not indexview:
712            return dupl
713        return [tuple(self.record(ind, indexview)) for ind in dupl]

check duplicate values in a list of indexes. Result is added in a new index or returned.

Parameters

  • indexname : list of string (default none) - name of indexes to check (if None, all Field)
  • resindex : string (default None) - Add a new index named resindex with check result (False if duplicate)
  • indexview : list of str (default None) - list of fields to return

Returns : list of int - list of rows with duplicate values
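
The per-field check can be sketched in plain Python (illustrative, not the library internals): a row is a duplicate when its value occurs more than once in the column.

```python
from collections import Counter

# Sketch of duplicate detection on one column: report every row whose value
# appears more than once.
def duplicate_rows(values):
    counts = Counter(values)
    return [row for row, val in enumerate(values) if counts[val] > 1]

print(duplicate_rows(['a', 'b', 'a', 'c', 'b']))   # -> [0, 1, 2, 4]
```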

def iscanonorder(self):
715    def iscanonorder(self):
716        '''return True if primary indexes have canonical ordered keys'''
717        primary = self.primary
718        canonorder = Cutil.canonorder(
719            [len(self.lidx[idx].codec) for idx in primary])
720        return canonorder == [self.lidx[idx].keys for idx in primary]

return True if primary indexes have canonical ordered keys
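>
The canonical key pattern can be sketched in plain Python (`canon_keys` is my illustrative helper; the exact output of `Cutil.canonorder` is assumed, not shown here): with codec lengths [2, 3], the first primary index repeats each key in blocks while the last one cycles fastest.

```python
# Sketch of canonical key order for primary indexes with given codec lengths:
# keys enumerate the Cartesian product row by row.
def canon_keys(lengths):
    total = 1
    for n in lengths:
        total *= n
    keys, block = [], total
    for n in lengths:
        block //= n                       # block size shrinks per index
        keys.append([(i // block) % n for i in range(total)])
    return keys

print(canon_keys([2, 3]))
# -> [[0, 0, 0, 1, 1, 1], [0, 1, 2, 0, 1, 2]]
```

`iscanonorder` returns True when the actual keys of the primary indexes match this pattern.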

def isinrecord(self, record, extern=True):
722    def isinrecord(self, record, extern=True):
723        '''Check if record is present in self.
724
725        *Parameters*
726
727        - **record** : list - value for each Field
728        - **extern** : if True, compare record values to external representation
729        of self.value, else, internal
730
731        *Returns boolean* : True if found'''
732        if extern:
733            return record in Cutil.transpose(self.extidxext)
734        return record in Cutil.transpose(self.extidx)

Check if record is present in self.

Parameters

  • record : list - value for each Field
  • extern : if True, compare record values to external representation of self.value, else, internal

Returns boolean : True if found
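
The membership test can be sketched in plain Python (illustrative, not the library internals): the columns are transposed to rows, as `Cutil.transpose` does, and the record is checked with plain `in`.

```python
# Sketch of isinrecord: transpose columns to rows, then test membership.
columns = [['a', 'b', 'c'], [1, 2, 3]]
rows = [list(rec) for rec in zip(*columns)]

print(['b', 2] in rows)   # -> True
print(['b', 3] in rows)   # -> False
```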

def idxrecord(self, record):
736    def idxrecord(self, record):
737        '''return rec array (without variable) from complete record (with variable)'''
738        return [record[self.lidxrow[i]] for i in range(len(self.lidxrow))]

return rec array (without variable) from complete record (with variable)

def keytoval(self, listkey, extern=True):
740    def keytoval(self, listkey, extern=True):
741        '''
742        convert a keys list (key for each index) to a values list (value for each index).
743
744        *Parameters*
745
746        - **listkey** : key for each index
747        - **extern** : boolean (default True) - if True, compare rec to val else to values
748
749        *Returns*
750
751        - **list** : value for each index'''
752        return [idx.keytoval(key, extern=extern) for idx, key in zip(self.lindex, listkey)]

convert a keys list (key for each index) to a values list (value for each index).

Parameters

  • listkey : key for each index
  • extern : boolean (default True) - if True, compare rec to val else to values

Returns

  • list : value for each index

def loc(self, rec, extern=True, row=False):
754    def loc(self, rec, extern=True, row=False):
755        '''
756        Return record or row corresponding to a list of idx values.
757
758        *Parameters*
759
760        - **rec** : list - value for each idx
761        - **extern** : boolean (default True) - if True, compare rec to val,
762        else to values
763        - **row** : Boolean (default False) - if True, return list of row,
764        else list of records
765
766        *Returns*
767
768        - **object** : list of records (list of rows if row is True) or None if not found'''
769        locrow = None
770        try:
771            if len(rec) == self.lenindex:
772                locrow = list(set.intersection(*[set(self.lindex[i].loc(rec[i], extern))
773                                               for i in range(self.lenindex)]))
774            elif len(rec) == self.lenidx:
775                locrow = list(set.intersection(*[set(self.lidx[i].loc(rec[i], extern))
776                                               for i in range(self.lenidx)]))
777        except:
778            pass
779        if locrow is None:
780            return None
781        if row:
782            return locrow
783        return [self.record(locr, extern=extern) for locr in locrow]

Return record or row corresponding to a list of idx values.

Parameters

  • rec : list - value for each idx
  • extern : boolean (default True) - if True, compare rec to val, else to values
  • row : Boolean (default False) - if True, return list of row, else list of records

Returns

  • object : list of records (list of rows if row is True), or None if not found
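
The lookup can be sketched in plain Python (`loc_rows` is my illustrative name, not the library internals): each field contributes the set of rows holding the wanted value, and the matching rows are the intersection of those sets, mirroring the `set.intersection` call in the source above.

```python
# Sketch of loc(): intersect, per field, the sets of rows holding the
# requested value.
def loc_rows(columns, rec):
    per_field = [{row for row, val in enumerate(col) if val == want}
                 for col, want in zip(columns, rec)]
    return sorted(set.intersection(*per_field))

cols = [['a', 'b', 'a'], [1, 2, 1], ['x', 'y', 'z']]
print(loc_rows(cols, ['a', 1, 'z']))   # -> [2]
print(loc_rows(cols, ['a', 1, 'q']))   # -> []
```
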

def mix(self, other, fillvalue=None):
785    def mix(self, other, fillvalue=None):
786        '''add other's Fields not included in self and append other's values'''
787        sname = set(self.lname)
788        oname = set(other.lname)
789        newself = copy(self)
790        copother = copy(other)
791        for nam in oname - sname:
792            newself.addindex({nam: [fillvalue] * len(newself)})
793        for nam in sname - oname:
794            copother.addindex({nam: [fillvalue] * len(copother)})
795        return newself.add(copother, name=True, solve=False)

add other's Fields not included in self and append other's values

def merging(self, listname=None):
797    def merging(self, listname=None):
798        ''' add a new Field built from the Fields defined in listname.
799        Values of the new Field are sets of values from the listname Fields'''
800        #self.addindex(Field.merging([self.nindex(name) for name in listname]))
801        self.addindex(Sfield.merging([self.nindex(name) for name in listname]))

add a new Field built from the Fields defined in listname. Values of the new Field are sets of values from the listname Fields

def orindex(self, other, first=False, merge=False, update=False):
803    def orindex(self, other, first=False, merge=False, update=False):
804        ''' Add other's index to self's index (with same length)
805
806        *Parameters*
807
808        - **other** : self class - object to add
809        - **first** : Boolean (default False) - If True insert indexes
810        at the first row, else at the end
811        - **merge** : Boolean (default False) - create a new index
812        if merge is False
813        - **update** : Boolean (default False) - if True, update actual
814        values if index name is present (and merge is True)
815
816        *Returns* : none '''
817        if len(self) != 0 and len(self) != len(other) and len(other) != 0:
818            raise DatasetError("the sizes are not equal")
819        otherc = copy(other)
820        for idx in otherc.lindex:
821            self.addindex(idx, first=first, merge=merge, update=update)
822        return self

Add other's index to self's index (with same length)

Parameters

  • other : self class - object to add
  • first : Boolean (default False) - If True insert indexes at the first row, else at the end
  • merge : Boolean (default False) - create a new index if merge is False
  • update : Boolean (default False) - if True, update actual values if index name is present (and merge is True)

Returns : none

def record(self, row, indexname=None, extern=True):
824    def record(self, row, indexname=None, extern=True):
825        '''return the record at the row
826
827        *Parameters*
828
829        - **row** : int - row of the record
830        - **extern** : boolean (default True) - if True, return val record else
831        value record
832        - **indexname** : list of str (default None) - list of fields to return
833        *Returns*
834
835        - **list** : val record or value record'''
836        if indexname is None:
837            indexname = self.lname
838        if extern:
839            record = [idx.val[row] for idx in self.lindex]
840            #record = [idx.values[row].to_obj() for idx in self.lindex]
841            #record = [idx.valrow(row) for idx in self.lindex]
842        else:
843            record = [idx.values[row] for idx in self.lindex]
844        return [record[self.lname.index(name)] for name in indexname]

return the record at the row

Parameters

  • row : int - row of the record
  • extern : boolean (default True) - if True, return val record else value record
  • indexname : list of str (default None) - list of fields to return

Returns

  • list : val record or value record

def recidx(self, row, extern=True):
846    def recidx(self, row, extern=True):
847        '''return the list of idx val or values at the row
848
849        *Parameters*
850
851        - **row** : int - row of the record
852        - **extern** : boolean (default True) - if True, return val rec else value rec
853
854        *Returns*
855
856        - **list** : val or value for idx'''
857        if extern:
858            return [idx.values[row].to_obj() for idx in self.lidx]
859            # return [idx.valrow(row) for idx in self.lidx]
860        return [idx.values[row] for idx in self.lidx]

return the list of idx val or values at the row

Parameters

  • row : int - row of the record
  • extern : boolean (default True) - if True, return val rec else value rec

Returns

  • list : val or value for idx

def recvar(self, row, extern=True):
862    def recvar(self, row, extern=True):
863        '''return the list of var val or values at the row
864
865        *Parameters*
866
867        - **row** : int - row of the record
868        - **extern** : boolean (default True) - if True, return val rec else value rec
869
870        *Returns*
871
872        - **list** : val or value for var'''
873        if extern:
874            return [idx.values[row].to_obj() for idx in self.lvar]
875            # return [idx.valrow(row) for idx in self.lvar]
876        return [idx.values[row] for idx in self.lvar]

return the list of var val or values at the row

Parameters

  • row : int - row of the record
  • extern : boolean (default True) - if True, return val rec else value rec

Returns

  • list : val or value for var

def setcanonorder(self, reindex=False):
878    def setcanonorder(self, reindex=False):
879        '''Set the canonical index order : primary - secondary/unique - variable.
880        Set the canonical keys order : ordered keys in the first columns.
881
882        *Parameters*
883        - **reindex** : boolean (default False) - if True, set default codec after
884        transformation
885
886        *Return* : self'''
887        order = self.primaryname
888        order += self.secondaryname
889        order += self.lvarname
890        order += self.lunicname
891        self.swapindex(order)
892        self.sort(reindex=reindex)
893        # self.analysis.actualize()
894        return self

Set the canonical index order : primary - secondary/unique - variable. Set the canonical keys order : ordered keys in the first columns.

Parameters

  • reindex : boolean (default False) - if True, set default codec after transformation

Return : self

def setfilter(self, filt=None, first=False, filtname='$filter', unique=False):
896    def setfilter(self, filt=None, first=False, filtname=FILTER, unique=False):
897        '''Add a filter index with boolean values
898
899        - **filt** : list of boolean - values of the filter idx to add
900        - **first** : boolean (default False) - If True insert index at the first row,
901        else at the end
902        - **filtname** : string (default FILTER) - Name of the filter Field added
903
904        *Returns* : self'''
905        if not filt:
906            filt = [True] * len(self)
907        idx = self.field(filt, name=filtname)
908        idx.reindex()
909        if not idx.cod in ([True, False], [False, True], [True], [False]):
910            raise DatasetError('filt is not consistent')
911        if unique:
912            for name in self.lname:
913                if name[:len(FILTER)] == FILTER:
914                    self.delindex(FILTER)
915        self.addindex(idx, first=first)
916        return self

Add a filter index with boolean values

  • filt : list of boolean - values of the filter idx to add
  • first : boolean (default False) - If True insert index at the first row, else at the end
  • filtname : string (default FILTER) - Name of the filter Field added
  • unique : boolean (default False) - if True, the existing filter Fields are removed before the new one is added

Returns : self

def sort(self, order=None, reverse=False, func=str, reindex=True):
918    def sort(self, order=None, reverse=False, func=str, reindex=True):
919        '''Sort data following the index order and apply the ascending or descending
920        sort function to values.
921
922        *Parameters*
923
924        - **order** : list (default None)- new order of index to apply. If None or [],
925        the sort function is applied to the existing order of indexes.
926        - **reverse** : boolean (default False) - descending if True, ascending if False
927        - **func**    : function (default str) - parameter key used in the sorted function
928        - **reindex** : boolean (default True) - if True, apply a new codec order (key = func)
929
930        *Returns* : self'''
931        if not order:
932            order = list(range(self.lenindex))
933        orderfull = order + list(set(range(self.lenindex)) - set(order))
934        if reindex:
935            for i in order:
936                self.lindex[i].reindex(codec=sorted(
937                    self.lindex[i].codec, key=func))
938        newidx = Cutil.transpose(sorted(Cutil.transpose(
939            [self.lindex[orderfull[i]].keys for i in range(self.lenindex)]),
940            reverse=reverse))
941        for i in range(self.lenindex):
942            self.lindex[orderfull[i]].set_keys(newidx[i])
943        return self

Sort data following the index order and apply the ascending or descending sort function to values.

Parameters

  • order : list (default None) - new order of indexes to apply. If None or [], the sort function is applied to the existing order of indexes.
  • reverse : boolean (default False) - descending if True, ascending if False
  • func : function (default str) - parameter key used in the sorted function
  • reindex : boolean (default True) - if True, apply a new codec order (key = func)

Returns : self
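
The core sort step can be sketched in plain Python (illustrative, not the library internals): the key columns are transposed to rows, the rows are sorted lexicographically, then transposed back, mirroring the `Cutil.transpose` / `sorted` / `Cutil.transpose` sequence in the source above.

```python
# Sketch of the sort step: transpose columns to rows, sort rows
# lexicographically, transpose back to columns.
def sort_columns(columns, reverse=False):
    rows = sorted(zip(*columns), reverse=reverse)
    return [list(col) for col in zip(*rows)]

cols = [['b', 'a', 'b'], [2, 9, 1]]
print(sort_columns(cols))                 # -> [['a', 'b', 'b'], [9, 1, 2]]
print(sort_columns(cols, reverse=True))   # -> [['b', 'b', 'a'], [2, 1, 9]]
```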

def tostdcodec(self, inplace=False, full=True):
964    def tostdcodec(self, inplace=False, full=True):
965        '''Transform all codec in full or default codec.
966
967        *Parameters*
968
969        - **inplace** : boolean  (default False) - if True apply transformation
970        to self, else to a new Dataset
971        - **full** : boolean (default True)- full codec if True, default if False
972
973
974        *Return Dataset* : self or new Dataset'''
975        lindex = [idx.tostdcodec(inplace=False, full=full)
976                  for idx in self.lindex]
977        if inplace:
978            self.lindex = lindex
979            return self
980        return self.__class__(lindex, self.lvarname)

Transform all codec in full or default codec.

Parameters

  • inplace : boolean (default False) - if True apply transformation to self, else to a new Dataset
  • full : boolean (default True) - full codec if True, default codec if False

Return Dataset : self or new Dataset
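
The two standard codecs can be sketched in plain Python (illustrative; the value order chosen by the library for the default codec is an assumption here): the "full" codec keeps one entry per row so the keys are the identity, while the "default" codec keeps only unique values.

```python
# Sketch of a field's two standard codecs.
values = ['x', 'y', 'x', 'x']

# full codec: one codec entry per row, identity keys
full_codec, full_keys = list(values), list(range(len(values)))

# default codec: unique values (first-seen order assumed), compact keys
default_codec = list(dict.fromkeys(values))
default_keys = [default_codec.index(v) for v in values]

print(full_codec, full_keys)         # -> ['x', 'y', 'x', 'x'] [0, 1, 2, 3]
print(default_codec, default_keys)   # -> ['x', 'y'] [0, 1, 0, 0]
```

Both codecs represent the same column; `tostdcodec` converts every field of the dataset to one form or the other.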

def updateindex(self, listvalue, index, extern=True):
982    def updateindex(self, listvalue, index, extern=True):
983        '''update values of an index.
984
985        *Parameters*
986
987        - **listvalue** : list - index values to replace
988        - **index** : integer - index row to update
989        - **extern** : if True, the listvalue has external representation, else internal
990
991        *Returns* : none '''
992        self.lindex[index].setlistvalue(listvalue, extern=extern)

update values of an index.

Parameters

  • listvalue : list - index values to replace
  • index : integer - index row to update
  • extern : if True, the listvalue has external representation, else internal

Returns : none

def valtokey(self, rec, extern=True):
 994    def valtokey(self, rec, extern=True):
 995        '''convert a record list (value or val for each idx) to a key list
 996        (key for each index).
 997
 998        *Parameters*
 999
1000        - **rec** : list of value or val for each index
1001        - **extern** : if True, the rec value has external representation, else internal
1002
1003        *Returns*
1004
1005        - **list of int** : record key for each index'''
1006        return [idx.valtokey(val, extern=extern) for idx, val in zip(self.lindex, rec)]

convert a record list (value or val for each idx) to a key list (key for each index).

Parameters

  • rec : list of value or val for each index
  • extern : if True, the rec value has external representation, else internal

Returns

  • list of int : record key for each index
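
The value-to-key conversion (and its inverse, `keytoval`) can be sketched in plain Python (illustrative, not the library internals): each field owns a codec of distinct values; `valtokey` looks positions up, `keytoval` reads them back.

```python
# Sketch of the value <-> key conversions over parallel fields, each with
# its own codec of distinct values.
codecs = [['paris', 'lyon'], [2024, 2025]]

def valtokey(rec):
    return [codec.index(val) for codec, val in zip(codecs, rec)]

def keytoval(keys):
    return [codec[key] for codec, key in zip(codecs, keys)]

keys = valtokey(['lyon', 2024])
print(keys)             # -> [1, 0]
print(keytoval(keys))   # -> ['lyon', 2024]
```
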
Inherited Members
tab_dataset.dataset_interface.DatasetInterface
json
plot
to_csv
to_dataframe
to_file
to_ntv
to_xarray
voxel
view
vlist
tab_dataset.cdataset.Cdataset
indexlen
iindex
keys
lenindex
lunicname
lunicrow
lname
tiindex
ntv
from_ntv
add
to_analysis
reindex
delindex
nindex
renameindex
reorder
setname
swapindex
check_relation
check_relationship
tab_dataset.cdataset.DatasetAnalysis
analysis
anafields
partitions
complete
dimension
lvarname
primaryname
secondaryname
indexinfos
field_partition
relation
tree
indicator
class Ndataset(Sdataset):
1008class Ndataset(Sdataset):
1009    # %% Ndataset
1010    '''    
1011    `Ndataset` is a child class of Cdataset where internal value are NTV entities.
1012    
1013    All the methods are the same as `Sdataset`.
1014    '''
1015    field_class = Nfield

Ndataset is a child class of Cdataset where internal value are NTV entities.

All the methods are the same as Sdataset.

Inherited Members
Sdataset
Sdataset
from_csv
from_file
merge
ext
consistent
extidx
extidxext
idxname
idxlen
iidx
lenidx
lidx
lisvar
lvar
lunic
lvarrow
lidxrow
primary
secondary
setidx
zip
addindex
append
applyfilter
coupling
delrecord
full
getduplicates
iscanonorder
isinrecord
idxrecord
keytoval
loc
mix
merging
orindex
record
recidx
recvar
setcanonorder
setfilter
sort
tostdcodec
updateindex
valtokey
tab_dataset.dataset_interface.DatasetInterface
json
plot
to_csv
to_dataframe
to_file
to_ntv
to_xarray
voxel
view
vlist
tab_dataset.cdataset.Cdataset
indexlen
iindex
keys
lenindex
lunicname
lunicrow
lname
tiindex
ntv
from_ntv
add
to_analysis
reindex
delindex
nindex
renameindex
reorder
setname
swapindex
check_relation
check_relationship
tab_dataset.cdataset.DatasetAnalysis
analysis
anafields
partitions
complete
dimension
lvarname
primaryname
secondaryname
indexinfos
field_partition
relation
tree
indicator