python.observation.essearch

How to use observation.essearch ?

Create the MongoDB database: You can easily create an account on MongoDB. Once you have a database, follow the guidelines here to connect to it using pymongo. All you need in order to be able to use this module is to be able to connect to the Collection with pymongo.
Fill the database with your data: Construct an observation or a list of observations containing your data using dedicated functions from python.observation.Observation. You can then use insert_mongo(collection, observation) to insert it in the database.
Write a request using observation.ESSearch: An ESSearch instance must be created with either a MongoDB Collection (passed as argument collection) or a list of observations (passed as argument data). Criteria for the query are then added one by one using ESSearch.addCondition or ESSearch.orCondition, or all together with ESSearch.addConditions or passed as argument parameters of ESSearch.

A condition is composed of:

a path giving which element is concerned by the condition;
an operand which is the item of the comparison (if omitted, the existence of the path is tested);
a comparator which can be applied on the operand, for example '>=' or 'within' (defaults to equality in most cases);
optional parameters detailed in ESSearch.addCondition documentation, like inverted to add a not.

Execute the research with ESSearch.execute(). Put the parameter single to True if you want the return to be a single observation instead of a list of observations.

Example of python code using observation.essearch module:

from pymongo import MongoClient
from observation.essearch import ESSearch
import datetime

client = Mongoclient(<Mongo-auth>)
collection = client[<base>][<collection>]

# In this example, we search for measures of property PM25 taken between 2022/01/01 
# and 2022/31/12 and we ensure the measure is an Observation.
# We execute with argument single = True to merge the result in one single
# Observation.

# Option 1
srch = ESSearch(collection)
srch.addCondition('datation', datetime.datetime(2022, 1, 1), '>=')
srch.addCondition('datation', datetime.datetime(2022, 12, 31), '<=')
srch.addCondition('property', 'PM25')
srch.addCondition(path = 'type', comparator = '==', operand = 'observation')
result = srch.execute(single = True)

# Option 2 (equivalent to option 1 but on one line)
result = ESSearch(collection,
                [['datation', datetime.datetime(2022, 1, 1), '>='], 
                ['datation', datetime.datetime(2022, 12, 31), '<='], 
                ['property', 'PM25'], 
                {'path': 'type', 'comparator': '==', 'operand': 'observation'}] 
                ).execute(single = True)

View Source

   1'''
   2# How to use observation.essearch ?
   3
   41. **Create the MongoDB database:**
   5You can easily create an account on MongoDB. Once you have a database, follow the guidelines [here](https://www.mongodb.com/docs/atlas/tutorial/connect-to-your-cluster/#connect-to-your-atlas-cluster) to connect to it using pymongo.
   6All you need in order to be able to use this module is to be able to connect to the Collection with pymongo.
   7
   82. **Fill the database with your data:**
   9Construct an observation or a list of observations containing your data using dedicated functions from `python.observation.Observation`. 
  10You can then use `insert_mongo(collection, observation)` to insert it in the database.
  11
  123. **Write a request using observation.ESSearch:**
  13An `ESSearch` instance must be created with either a MongoDB Collection (passed as argument **collection**) or a list of observations (passed as argument **data**).
  14Criteria for the query are then added one by one using `ESSearch.addCondition` or `ESSearch.orCondition`, or all together with `ESSearch.addConditions` or passed as argument **parameters** of ESSearch.
  15
  16A condition is composed of:
  17- a **path** giving which element is concerned by the condition;
  18- an **operand** which is the item of the comparison (if omitted, the existence of the path is tested);
  19- a **comparator** which can be applied on the operand, for example '>=' or 'within' (defaults to equality in most cases);
  20- optional parameters detailed in `ESSearch.addCondition` documentation, like **inverted** to add a *not*.
  21
  22Execute the research with `ESSearch.execute()`. Put the parameter **single** to True if you want the return to be a single observation
  23instead of a list of observations.
  24
  25Example of python code using observation.essearch module:
  26```python
  27from pymongo import MongoClient
  28from observation.essearch import ESSearch
  29import datetime
  30
  31client = Mongoclient(<Mongo-auth>)
  32collection = client[<base>][<collection>]
  33
  34# In this example, we search for measures of property PM25 taken between 2022/01/01 
  35# and 2022/31/12 and we ensure the measure is an Observation.
  36# We execute with argument single = True to merge the result in one single
  37# Observation.
  38
  39# Option 1
  40srch = ESSearch(collection)
  41srch.addCondition('datation', datetime.datetime(2022, 1, 1), '>=')
  42srch.addCondition('datation', datetime.datetime(2022, 12, 31), '<=')
  43srch.addCondition('property', 'PM25')
  44srch.addCondition(path = 'type', comparator = '==', operand = 'observation')
  45result = srch.execute(single = True)
  46
  47# Option 2 (equivalent to option 1 but on one line)
  48result = ESSearch(collection,
  49                [['datation', datetime.datetime(2022, 1, 1), '>='], 
  50                ['datation', datetime.datetime(2022, 12, 31), '<='], 
  51                ['property', 'PM25'], 
  52                {'path': 'type', 'comparator': '==', 'operand': 'observation'}] 
  53                ).execute(single = True)
  54```
  55'''
  56import datetime
  57import shapely.geometry
  58from pymongo.collection import Collection
  59from pymongo.cursor import Cursor
  60from pymongo.command_cursor import CommandCursor
  61import bson
  62
  63from observation.esobservation import Observation
  64from observation.field import Field
  65from observation.util import util
  66from observation.timeslot import TimeSlot
  67
  68dico_alias_mongo = { # dictionnary of the different names accepted for each comparator and each given type. <key>:<value> -> <accepted name>:<name in MongoDB>
  69    # any type other than those used as keys is considered non valid
  70    str : {
  71        None:"$eq",
  72        "eq":"$eq", "=":"$eq", "==":"$eq", "$eq":"$eq",
  73        "in":"$in", "$in":"$in",
  74        "regex":"$regex", "$regex":"$regex",
  75        "oid":"$oid","$oid":"$oid"
  76    },
  77    int : {
  78        None:"$eq",
  79        "eq":"$eq", "=":"$eq", "==":"$eq", "$eq":"$eq",
  80        "gte":"$gte", ">=":"$gte", "=>":"$gte", "$gte":"$gte",
  81        "gt":"$gt", ">":"$gt", "$gt":"$gt",
  82        "lte":"$lte", "<=":"$lte", "=<":"$lte", "$lte":"$lte",
  83        "lt":"$lt", "<":"$lt", "$lt":"$lt",
  84        "in":"$in", "$in":"$in"
  85    },
  86    datetime.datetime : {
  87        None:"$eq",
  88        "eq":"$eq", "=":"$eq", "==":"$eq", "$eq":"$eq",
  89        "gte":"$gte", ">=":"$gte", "=>":"$gte", "$gte":"$gte",
  90        "gt":"$gt", ">":"$gt", "$gt":"$gt",
  91        "lte":"$lte", "<=":"$lte", "=<":"$lte", "$lte":"$lte",
  92        "lt":"$lt", "<":"$lt", "$lt":"$lt",
  93        "in":"$in", "$in":"$in"
  94    },
  95    TimeSlot : {
  96        None:"within",
  97        "eq":"within", "=":"within", "==":"within", "$eq":"within", "within":"within", "within":"within",
  98        "contains":"intersects", "$contains":"intersects",
  99        "in":"within", "$in":"within", "within":"within", "$within":"within",
 100        "disjoint":"disjoint", "$disjoint":"disjoint",
 101        "intersects":"intersects", "$intersects":"intersects"
 102    },
 103    list : { # lists are interpreted as geometries
 104        None:"$geoIntersects",
 105        "eq":"equals", "=":"equals", "==":"equals", "$eq":"equals", "equals":"equals", "$equals":"equals",
 106        "$geowithin":"$geoWithin", "geowithin":"$geoWithin", "$geoWithin":"$geoWithin", "geoWithin":"$geoWithin", "within":"$geoWithin", "$within":"$geoWithin",
 107        "disjoint":"disjoint", "$disjoint":"disjoint",
 108        "intersects":"$geoIntersects", "$intersects":"$geoIntersects", "geoIntersects":"$geoIntersects", "$geointersects":"$geoIntersects", "geoIntersects":"$geoIntersects", "$geoIntersects":"$geoIntersects",
 109        "touches":"touches", "$touches":"touches",
 110        "overlaps":"overlaps", "$overlaps":"overlaps",
 111        "contains":"contains", "$contains":"contains",
 112        "$geoNear":"$geoNear", "$geonear":"$geoNear", "geonear":"$geoNear", "geoNear":"$geoNear",
 113        
 114        "in":"$in", "$in":"$in" # only in case where a list is not a geometry
 115    },
 116    #bson.objectid.ObjectId : {
 117    bson.ObjectId : {
 118        None:"$eq",
 119        "eq":"$eq", "=":"$eq", "==":"$eq", "$eq":"$eq",
 120        "in":"$in", "$in":"$in"
 121    }
 122}
 123dico_alias_mongo[float] = dico_alias_mongo[int]
 124
 125_geoeq    = lambda x, y: x.equals(y)
 126_geowith  = lambda x, y: x.within(y)
 127_geodis   = lambda x, y: x.disjoint(y)
 128_geointer = lambda x, y: x.intersects(y)
 129_geotou   = lambda x, y: x.touches(y)
 130_geoover  = lambda x, y: x.overlaps(y)
 131_geocont  = lambda x, y: x.contains(y)
 132_geonear  = lambda x, y: True
 133
 134_defeq    = lambda x, y: x == y
 135_defsupeq = lambda x, y: x >= y
 136_defsup   = lambda x, y: x > y
 137_definfeq = lambda x, y: x <= y
 138_definf   = lambda x, y: x < y
 139_defin    = lambda x, y: x in y
 140
 141_timinfeq = lambda x, y: x.bounds[0] <= y # at least one element of the TimeSlot is lte y
 142_timinf   = lambda x, y: x.bounds[0] < y  # at least one element of the TimeSlot is lt y
 143_timsupeq = lambda x, y: x.bounds[1] >= y # at least one element of the TimeSlot is gte y
 144_timsup   = lambda x, y: x.bounds[1] > y  # at least one element of the TimeSlot is gt y
 145_timeq    = lambda x, y: x == TimeSlot(y)
 146# To have all elements verify a comparison instead of just one, combine with parameter inverted.
 147# For example : not (at least one element of the TimeSlot is lte y) <=> all elements of the TimeSlot are gt y
 148
 149dico_alias_python = {
 150    TimeSlot : { # only used in python filtering part
 151        TimeSlot : { # comparison of a TimeSlot with a timeSlot
 152            None:"equals",
 153            "eq":"equals", "=":"equals", "==":"equals", "$eq":"equals", "equals":"equals", "$equals":"equals",
 154            "contains":"contains", "$contains":"contains",
 155            "in":"within", "$in":"within", "within":"within", "$within":"within",
 156            "disjoint":"disjoint", "$disjoint":"disjoint",
 157            "intersects":"intersects", "$intersects":"intersects"
 158        },
 159            
 160        datetime.datetime : { # comparison of a datetime and a TimeSlot
 161            None:_timeq,
 162            "eq":_timeq, "=":_timeq, "==":_timeq, "$eq":_timeq, "equals":_timeq, "$equals":_timeq,
 163            "$gte":_timsupeq, "gte":_timsupeq, ">=":_timsupeq, "=>":_timsupeq,
 164            "$gt":_timsup, "gt":_timsup, ">":_timsup,
 165            "$lte":_timinfeq, "lte":_timinfeq, "<=":_timinfeq, "=<":_timinfeq,
 166            "$lt":_timinf, "lt":_timinf, "<":_timinf
 167        }
 168    },
 169    'geometry' : { # lists are interpreted as geometries
 170        None:_geointer,
 171        "eq":_geoeq, "=":_geoeq, "==":_geoeq, "$eq":_geoeq, "equals":_geoeq, "$equals":_geoeq,
 172        "$geowithin":_geowith, "geowithin":_geowith, "$geoWithin":_geowith, "geoWithin":_geowith, "within":_geowith, "$within":_geowith,
 173        "disjoint":_geodis, "$disjoint":_geodis,
 174        "intersects":_geointer, "$intersects":_geointer, "geoIntersects":_geointer, "$geointersects":_geointer, "geoIntersects":_geointer, "$geoIntersects":_geointer,
 175        "touches":_geotou, "$touches":_geotou,
 176        "overlaps":_geoover, "$overlaps":_geoover,
 177        "contains":_geocont, "$contains":_geocont,
 178        "$geoNear":_geonear, "$geonear":_geonear, "geonear":_geonear, "geoNear":_geonear
 179    },
 180    'default' : {
 181        None:_defeq,
 182        "eq":_defeq, "=":_defeq, "==":_defeq, "$eq":_defeq,
 183        "gte":_defsupeq, ">=":_defsupeq, "=>":_defsupeq, "$gte":_defsupeq,
 184        "gt":_defsup, ">":_defsup, "$gt":_defsup,
 185        "lte":_definfeq, "<=":"$lte", "=<":_definfeq,
 186        "lt":_definf, "<":_definf, "$lt":_definf,
 187        "in":_defin, "$in":_defin
 188    }
 189}
 190
 191def insert_from_doc(collection, document , info=True):
 192    '''Inserts all observations from a document into a collection, where each line of the document corresponds to an observation.'''
 193    with open(document, 'r') as doc:
 194        for line in doc:
 195            try: insert_to_mongo(collection, line, info)
 196            except: pass
 197
 198def insert_to_mongo(collection, obj, info=False): # Mieux avec panda ?
 199    '''Takes an observation or a list of observations and inserts them into a MongoDB collection, with info by default.'''
 200    # Faire une fonction pour permettre l'ajout direct de fichiers csv.
 201    inserted_list = []
 202    if isinstance(obj, list):
 203        pile = obj
 204    elif isinstance(obj, Observation):
 205        pile = [obj]
 206    else:
 207        pile = [Observation.from_obj(obj)]
 208    for obs in pile:
 209        if obs.id: obs_hash = obs.id
 210        else: obs_hash = hash(obs)
 211        metadata = {'id': obs_hash}
 212        if obs.name : metadata['name']  = obs.name
 213        if obs.param: metadata['param'] = obs.param
 214        if info: metadata['information'] = Observation._info(True, True)
 215        if len(obs.lname) == 1: # a special case is needed because lists with one element are replaced by the element itself so iteration doesn't work
 216            for line in obs:
 217                inserted_list.append({obs.lname[0]: util.json(line, encoded=False, typevalue=None, simpleval=False, geojson=True),
 218                                        '_metadata': metadata})
 219        elif len(obs.lname) > 1:
 220            for line in obs:
 221                inserted_list.append({obs.lname[i]: util.json(line[i], encoded=False, typevalue=None, simpleval=False, geojson=True) 
 222                                                                for i in range(len(line))} | {'_metadata': metadata})
 223    if inserted_list != []: collection.insert_many(inserted_list)
 224
 225def empty_request(collection):
 226    """
 227    Empty request to get an idea of what the database contains.
 228    Currently returns the count of elements in the collection and the name of each column.
 229    """
 230    count = 0
 231    column_names = []
 232    cursor = collection.find()
 233    for doc in cursor:
 234        count += 1
 235        for column_name in doc:
 236            if column_name not in column_names:
 237                column_names.append(column_name)
 238    return {'count': count, 'column_names': column_names}
 239
 240class ESSearch:
 241    """
 242    An `ESSearch` is defined as an ensemble of conditions to be used to execute a MongoDB request or any iterable containing only observations.
 243
 244    *Attributes (for @property, see methods)* :
 245
 246    - **input** : input on which the query is done. One of or a list of these : 
 247        - pymongo.collection.Collection
 248        - pymongo.cursor.Cursor
 249        - pymongo.command_cursor.CommandCursor
 250        - Observation (can be defined from a str or a dict)
 251    - **parameters** : list of list of conditions for queries, to be interpreted as : parameters = [[cond_1 AND cond_2 AND cond_3] OR [cond_4 AND cond_5 AND cond_6]] where conds are criteria for queries
 252    - **hide** : list of paths to hide from the output
 253    - **heavy** : boolean indicating whether the request should search for nested values or not. Does not work with geoJSON.
 254    - **sources** : attribute used to indicate the sources of the data in param
 255
 256    The methods defined in this class are (documentations in methods definitions):
 257    
 258    *setter*
 259
 260    - `ESSearch.addInput`
 261    - `ESSearch.removeInputs`
 262    - `ESSearch.setHide`
 263    - `ESSearch.setHeavy`
 264    - `ESSearch.clear`
 265    
 266    *dynamic value (getter @property)*
 267
 268    - `ESSearch.request`
 269    - `ESSearch.cursor`
 270    
 271    *parameters for query - update methods*
 272
 273    - `ESSearch.addConditions`
 274    - `ESSearch.addCondition`
 275    - `ESSearch.orCondition`
 276    - `ESSearch.removeCondition`
 277    - `ESSearch.clearConditions`
 278
 279    *query method*
 280
 281    - `ESSearch.execute`
 282    """
 283    def __init__(self,
 284                    input = None,
 285                    parameters = None,
 286                    hide = [],
 287                    heavy = False,
 288                    sources = None, 
 289                    **kwargs
 290                    ):
 291        '''
 292        ESSearch constructor. Parameters can also be defined and updated using class methods.
 293
 294        *Arguments*
 295
 296        - **input** : input on which the query is done. Must be one of or a list of these (can be nested):
 297            - pymongo.collection.Collection
 298            - pymongo.cursor.Cursor
 299            - pymongo.command_cursor.CommandCursor
 300            - Observation
 301            - str corresponding to a json Observation
 302            - dict corresponding to a json Observation
 303        - **parameters** :  dict, list (default None) - list of list or list of dictionnaries whose keys are arguments of ESSearch.addCondition method
 304        ex: parameters = [
 305            {'name' : 'datation', 'operand' : datetime.datetime(2022, 9, 19, 1), 'comparator' : '>='},
 306            {'name' : 'property', 'operand' : 'PM2'}
 307        ]
 308        - **hide** : list (default []) - List of strings where strings correspond to paths to remove from the output
 309        - **heavy** :  bool (default False) - Must be True when values are defined directly and inside dictionnaries simultaneously.
 310        - **sources** : (default None) - Optional parameter indicating the sources of the data in case when a query is executed with parameter single = True.
 311        - **kwargs** :  other parameters are used as arguments for ESSearch.addCondition method.
 312        '''
 313        self.parameters = [[]]                                          # self.parameters
 314        if isinstance(hide, list): self.hide = hide                     # self.hide
 315        else: raise TypeError("hide must be a list.")
 316
 317        if isinstance(heavy, bool): self.heavy = heavy                  # self.heavy
 318        else: raise TypeError("heavy must be a bool.")
 319        self.sources = sources                                          # self.sources
 320
 321        self.input = [[], []]                                           # self.input : formatted as [[Mongo Objects], [Observations]] (list of two lists)
 322        if isinstance(input, list): pile = input
 323        else: pile = [input]
 324        while not len(pile) == 0:
 325            obj = pile.pop()
 326            if isinstance(obj, list):
 327                pile += obj
 328            elif isinstance(obj, (Collection, Cursor, CommandCursor)):
 329                self.input[0].append(obj)
 330            elif isinstance(obj,  Observation):
 331                self.input[1].append(obj)
 332            elif isinstance(obj, (str, dict)):
 333                try:
 334                    self.input[1].append(Observation.from_obj(obj))
 335                except:
 336                    raise ValueError("Cannot convert " + str(obj) + " to an Observation.")
 337            elif obj is not None:
 338                raise TypeError("Unsupported type for input " + str(obj))
 339
 340        if parameters: self.addConditions(parameters)
 341        if kwargs: self.addCondition(**kwargs)
 342
 343    def __repr__(self):
 344        return "ESSearch(input = " + str(self.input) + ", parameters = " + str(self.parameters) + ")"
 345
 346    def __str__(self):
 347        return str(self.parameters)
 348
 349    def __iter__(self):
 350        self.n = -1
 351        return self
 352
 353    def __next__(self):
 354        if self.n < len(self.parameters)-1:
 355            self.n += 1
 356            return self.parameters[self.n]
 357        else:
 358            raise StopIteration
 359
 360    def __getitem__(self, key):
 361        return self.parameters[key]
 362
 363    def addInput(self, input):
 364        """
 365        Adds one or many inputs on which the query is to be executed given by argument input.
 366        Inputs can be:
 367            - pymongo.collection.Collection
 368            - pymongo.cursor.Cursor
 369            - pymongo.command_cursor.CommandCursor
 370            - Observation
 371            - str corresponding to a json Observation
 372            - dict corresponding to a json Observation
 373        or a list of any of these.
 374        """
 375        added_input = [[], []]
 376        if isinstance(input, list): pile = input
 377        else: pile = [input]
 378        while not len(pile) == 0: # using a stack (LIFO) to allow easy treatment of nested data.
 379            obj = pile.pop()
 380            if isinstance(obj, list):
 381                pile += obj
 382            elif isinstance(obj, (Collection, Cursor, CommandCursor)):
 383                added_input[0].append(obj)
 384            elif isinstance(obj,  Observation):
 385                added_input[1].append(obj)
 386            elif isinstance(obj, (str, dict)):
 387                try:
 388                    added_input[1].append(Observation.from_obj(obj))
 389                except:
 390                    raise ValueError("Cannot convert " + str(obj) + " to an Observation.")
 391            elif obj is not None:
 392                raise TypeError("Unsupported type for input " + str(obj))
 393        self.input[0] += added_input[0] # self.input is updated with the new inputs
 394        self.input[1] += added_input[1]
 395
 396    def removeInputs(self):
 397        """
 398        Removes all inputs from self.
 399        """
 400        self.input = [[], []]
 401
 402    def setHide(self, hide):
 403        '''
 404        Sets self.hide to a value given by argument hide.
 405        '''
 406        if isinstance(hide, list): self.hide = hide
 407        else: raise TypeError("hide must be a list.")
 408        
 409    def setHeavy(self, heavy):
 410        '''
 411        Sets self.heavy to a value given by argument heavy.
 412        '''
 413        if isinstance(heavy, list): self.heavy = heavy
 414        else: raise TypeError("heavy must be a bool.")
 415
 416    def setSources(self, sources):
 417        '''
 418        Sets self.sources to a value given by argument sources.
 419        '''
 420        self.sources = sources
 421
 422    def addConditions(self, parameters):
 423        '''
 424        Takes multiple parameters and applies self.addCondition() on each of them.
 425        '''
 426        if isinstance(parameters, dict): # case when one single condition is added
 427            self.addCondition(**parameters)
 428        elif isinstance(parameters, (list, tuple)): # case when several conditions are added
 429            for parameter in parameters:
 430                if isinstance(parameter, dict): self.addCondition(**parameter)
 431                elif isinstance(parameters, (list, tuple)): self.addCondition(*parameter)
 432                else: self.addCondition(parameter)
 433        else: raise TypeError("parameters must be either a dict or a list of dict.")
 434            
 435    def addCondition(self, path, operand = None, comparator = None, or_position = -1, **kwargs):
 436        '''
 437        Takes parameters and inserts corresponding query condition in self.parameters.
 438
 439        *Parameters*
 440
 441        - **path** :  str (required argument) - name of an IIndex, which corresponds to an Dataset column name, or name of a metadata element.
 442                    (ex: 'datation', 'location', 'property')
 443
 444        - **operand** :  - (default None) - Object used for the comparison.
 445                    (ex: if we search for observations made in Paris, operand is 'Paris')
 446
 447        - **comparator**:  str (default None) - str giving the comparator to use. (ex: '>=', 'in')
 448        
 449        - **or_position** :  int (default -1) - position in self.parameters in which the condition is to be inserted.
 450
 451        - **formatstring** :  str (default None) - str to use to automatically change str to datetime before applying condition.
 452                    Does not update the data base. If value is set to 'default', format is assumed to be Isoformat.
 453        
 454        - **inverted** :  bool (default None) - to add a "not" in the condition.
 455                    To use in case where every element of a MongoDB array (equivalent to python list) must verify the condition (by default, condition is verified when at least one element of the array verifies it).
 456        
 457        - **unwind** :  int (default None) - int corresponding to the number of additional {"$unwind" : "$" + path} to be added in the beginning of the query.
 458        
 459        - **regex_options** :  str (default None) - str associated to regex options (i, m, x and s). See [this link](https://www.mongodb.com/docs/manual/reference/operator/query/regex/) for more details.
 460
 461        no comparator => default comparator associated with operand type in dico_alias_mongo is used (mainly equality)
 462        no operand => only the existence of something located at path is tested
 463        '''
 464
 465        ## 1. Check if arguments given are valid.
 466
 467        if not isinstance(path, str): raise TypeError("name must be a str.")
 468        if comparator is not None and not isinstance(comparator, str): raise TypeError("comparator must be a str.")
 469        if or_position is not None and not isinstance(or_position, int): raise TypeError("or_position must be an int.")
 470
 471        for item in kwargs: # checking if parameters in kwarg do exist
 472            if item not in {'formatstring', 'inverted', 'unwind', 'regex_options', 'distanceField', 'distanceMultiplier', 'includeLocs', 'key', 'maxDistance', 'minDistance', 'near', 'query', 'spherical'}:
 473                raise ValueError("Unknown parameter : ", item)
 474
 475        if isinstance(operand, datetime.datetime) and (operand.tzinfo is None or operand.tzinfo.utcoffset(operand) is None):
 476            operand = operand.replace(tzinfo=datetime.timezone.utc)
 477
 478        if operand: # checking if comparator can be applied on the operand
 479            try: comparator = dico_alias_mongo[type(operand)][comparator]
 480            except: raise ValueError("Incompatible values for comparator and operand. Ensure parameters are in the correct order.")
 481        elif comparator:
 482            raise ValueError("operand must be defined when comparator is used.")
 483        
 484        ## 2. Add the condition to self.parameters
 485
 486        condition = {"comparator" : comparator, "operand" : operand, "path" : path} | kwargs
 487
 488        if or_position >= len(self.parameters):
 489            self.parameters.append([condition])
 490        else:
 491            self.parameters[or_position].append(condition)
 492
 493    def orCondition(self, *args, **kwargs):
 494        '''
 495        Adds a condition in a new sublist in self.parameters. Separations in sublists correspond to "or" in the query.
 496        '''
 497        self.addCondition(or_position = len(self.parameters), *args, **kwargs)
 498
 499    def removeCondition(self, or_position = None, condnum = None):
 500        '''
 501        Removes a condition from self.parameters. By default, last element added is removed.
 502        Otherwise, the removed condition is the one at self.parameters[or_position][condnum].
 503
 504        To remove all conditions, use ESSearch.clearConditions() method.
 505        '''
 506        if self.parameters == [[]]: return
 507        if or_position is None:
 508            if condnum is None: # by default : remove the very last added condition.
 509                if len(self.parameters[-1]) > 1: self.parameters[-1].pop(-1)
 510                else: self.parameters.pop(-1)
 511            else:
 512                if len(self.parameters[-1]) > 1 or condnum > 1: self.parameters[-1].pop(condnum)
 513                else: self.parameters.pop(-1)
 514        else:
 515            if condnum is None or (len(self.parameters[or_position]) == 1 and condnum == 0): self.parameters.pop(or_position) # if or_position is not None and condnum is, the whole sublist at or_position is removed.
 516            else: self.parameters[or_position].pop(condnum)
 517        if self.parameters == []: # ensure self.parameters returns to its default value after being emptied
 518            self.parameters = [[]]
 519
 520    def clearConditions(self):
 521        '''
 522        Removes all conditions from self.parameters.
 523        To remove all attributes, use ESSearch.clear() method.
 524        '''
 525        self.parameters = [[]]
 526
 527    def clear(self):
 528        '''
 529        Resets self.
 530        (Creating a new Observation would be smarter than using this function.)
 531        '''
 532        self = ESSearch()
 533
 534    def _cond(self, or_pos, operand, comparator, path, inverted = False, formatstring = None, unwind = None, regex_options = None, **kwargs):
 535        '''
 536        Takes parameters and adds corresponding MongoDB expression to self._match.
 537        self._unwind and self._set are updated when necessary.
 538        '''
 539        if unwind:
 540            if isinstance(unwind, str):
 541                self._unwind.append(unwind)
 542            elif isinstance(unwind, int):
 543                for _ in range(unwind): self._unwind.append(path)
 544            elif isinstance(unwind, tuple): # format : (<path>, <unwind quantity>)
 545                for _ in range(unwind[1]): self._unwind.append(unwind[0])
 546            else: raise TypeError("unwind must be a tuple, a str or an int.")
 547
 548        if self.heavy and operand is not None:
 549            if path not in self._heavystages: self._heavystages.add(path) # peut-être mieux de laisser l'utilisateur choisir manuellement
 550            path = "_" + path + ".v"
 551
 552        if operand is None: # no operand => we only test if there is something located at path
 553            comparator = "$exists"
 554            operand = 1
 555        else:
 556            try: comparator = dico_alias_mongo[type(operand)][comparator] #global variable
 557            except:
 558                if formatstring:
 559                    try: comparator = dico_alias_mongo[datetime.datetime][comparator]
 560                    except: raise ValueError("Comparator not allowed.")
 561                elif isinstance(operand, shapely.geometry.base.BaseGeometry):
 562                    operand = {"type" : operand.geom_type, "coordinates" : list(operand.exterior.coords)}
 563                else: raise ValueError("Comparator not allowed.")
 564
 565        ##if path in {"$year", "$month", "$dayOfMonth", "$hour", "$minute", "$second", "$millisecond", "$dayOfYear", "$dayOfWeek"}:
 566        ##    self._set |= {path[1:]: {'datation' : path}} #à tester
 567        ##    path = datation
 568        ##    self._project |= {name[1:]:0}
 569
 570        if isinstance(operand, TimeSlot): #equals->within, contains->intersects, within, disjoint, intersects
 571            self._filtered = True
 572            if comparator == "within":
 573                self._cond(or_pos, operand[0].start, "$gte", path, False)
 574                self._cond(or_pos, operand[-1].end, "$lte", path, False)
 575            elif comparator == "intersects":
 576                self._cond(or_pos, operand[0].start, "$lte", path, False) # pourquoi False et pas inverted ici ??
 577                self._cond(or_pos, operand[-1].end, "$gte", path, False)
 578            return
 579
 580        if formatstring:
 581            if formatstring == "default":
 582                if isinstance(operand, str):
 583                    operand = datetime.datetime.fromisoformat(operand)
 584                self._set |= {path : {"$convert": {"input" : "$" + path, "to" : "date", "onError" : "$" + path}}}
 585            else:
 586                if isinstance(operand, str):
 587                    datetime.datetime.strptime(operand, formatstring)
 588                self._set |= {path : {"$dateFromString" : {"dateString" : "$" + path, "format": formatstring, "onError": "$" + path}}}
 589
 590        if comparator in {"$geoIntersects", "$geoWithin"}:  # operand :
 591                                                            # [x, y] or [[x, y]] -> Point ;
 592                                                            # [[x1, y1], [x2, y2]] -> LineString ;
 593                                                            # [[x1, y1], [x2, y2], [x3, y3], ...] or [[x1, y1], [x2, y2], [x3, y3], ..., [x1, y1]] or [[[x1, y1], [x2, y2], [x3, y3], ..., [x1, y1]]] -> Polygon.
 594            if isinstance(operand, list):
 595                if not isinstance(operand[0], list):
 596                    geom_type = "Point"
 597                    coordinates = operand
 598                elif not isinstance(operand[0][0], list):
 599                    if len(operand) == 1:
 600                        geom_type = "Point"
 601                        coordinates = operand[0]
 602                    elif len(operand) == 2:
 603                        geom_type = "LineString"
 604                        coordinates = operand
 605                    elif len(operand) > 2:
 606                        if not operand[-1] == operand[0]:
 607                            operand.append(operand[0])
 608                        geom_type = "Polygon"
 609                        coordinates = [operand]
 610                    else: raise ValueError("Unable to define a geometry from " + str(operand))
 611                else:
 612                    geom_type = "Polygon"
 613                    coordinates = operand
 614                operand = {"$geometry" : {"type" : geom_type, "coordinates" : coordinates}}
 615            elif isinstance(operand, dict) and '$geometry' not in operand:
 616                operand = {"$geometry" : operand}
 617        elif comparator == "$geoNear": # $geoNear is a MongoDB stage
 618            self._geonear = self._geonear | kwargs
 619            if 'distanceField' not in self._geonear: raise ValueError("distanceField missing in MongoDB stage $geoNear.")
 620            return
 621        elif isinstance(operand, list): # lists are interpreted as geometries. An additional filtering is necessary for geometry-specific functions
 622            self._filtered = True
 623            return
 624        
 625        if comparator == "$regex" and regex_options:
 626            cond_0 = {"$regex" : operand, "$options" : regex_options}
 627        else:
 628            cond_0 = {comparator : operand}
 629        
 630        if inverted:
 631            if path in self._match[or_pos]:
 632                if "$nor" in self._match[or_pos][path]:
 633                    self._match[or_pos][path]["$nor"].append(cond_0)
 634                elif "not" in self._match[or_pos][path]:
 635                    self._match[or_pos][path]["$nor"] = [self._match[or_pos][path]["$not"], cond_0]
 636                    del self._match[or_pos][path]["$not"]
 637                else:
 638                    self._match[or_pos][path]["$not"] = cond_0
 639            else:
 640                self._match[or_pos][path] = {"$not" : cond_0}
 641        else:
 642            if path not in self._match[or_pos]:
 643                self._match[or_pos][path] = cond_0
 644            else:
 645                self._match[or_pos][path] |= cond_0
 646
 647    def _fullSearchMongo(self):
 648        """
 649        Takes self.parameters and returns a MongoDB Aggregation query.
 650        """
 651        ## ESSearch._fullSearchMongo() 1: Declare private variables
 652
 653        request = []
 654        self._match = []
 655        self._unwind = []
 656        self._heavystages = set() # two additional set stages to treat nested objects
 657        self._set = {}
 658        self._geonear = {}
 659        self._match = []
 660        self._project = {"_id" : 0}
 661        for el in self.hide: self._project |= {el : 0}
 662        
 663        ## ESSearch._fullSearchMongo() 2: Update private variables for each condition
 664
 665        for i in range(len(self.parameters)): # rewriting conditions in MongoDB format
 666            self._match.append({})
 667            for cond in self.parameters[i]:
 668                self._cond(or_pos = i, **cond)
 669
 670        ## ESSearch._fullSearchMongo() 3: Case 1 : find request
 671
 672        if not self._unwind and not self.heavy and not self._set and not self._geonear: # collection.find() request
 673            if self._match:
 674                j = 0
 675                for i in range(len(self._match)):
 676                    if self._match[i] and j != i: # removing empty elements in place
 677                        self._match[j] = self._match[i]
 678                        j += 1
 679                if j == 0: # when there is no $or
 680                    if self._match[0]: match = self._match[0]
 681                else: # when there is a $or
 682                    match = {"$or": self._match[:j]}
 683            return 'find', match
 684
 685        ## ESSearch._fullSearchMongo() 4: Case 2 : aggregate request
 686
 687        else:
 688            if self._unwind:                                                    # Mongo $unwind stage
 689                for unwind in self._unwind:
 690                    request.append({"$unwind" : "$" + unwind})
 691            if self._heavystages:                                               # Additional Mongo $set stage if self.heavy is True
 692                heavy = {}
 693                for path in self._heavystages:
 694                    heavy |= {"_"+path:{"$cond":{"if":{"$eq":[{"$type":"$"+path},"object"]},"then":{"$objectToArray":"$"+path},"else": {"v":"$"+path}}}}
 695                    self._project |= {'_' + path: 0}
 696                request.append({"$set" : heavy})
 697            if self._set: request.append({"$set" : self._set})                  # Mongo $set stage
 698            if self._geonear: request.append({"$geoNear" : self._geonear})      # Mongo $geoNear stage
 699            if self._match:                                                     # Mongo $match stage
 700                j = 0
 701                for i in range(len(self._match)):
 702                    if self._match[i] and j != i:
 703                        self._match[j] = self._match[i]
 704                        j += 1
 705                if j == 0: # when there is no $or
 706                    if self._match[0]: request.append({"$match" : self._match[0]})
 707                else: # when there is a $or
 708                    request.append({"$match" : {"$or": self._match[:j]}})
 709            if self._unwind:                                                    # Second Mongo $set stage when unwind not empty
 710                dico = {}
 711                for unwind in self._unwind:
 712                    if not unwind in dico: dico[unwind] = ["$" + unwind]
 713                    else: dico[unwind] = [dico[unwind]]
 714                request.append({"$set" : dico})
 715            if self._project: request.append({"$project" : self._project})      # Mongo $project stage
 716            return 'aggregation', request
 717
 718    @property
 719    def request(self):
 720        '''
 721        Getter returning the content of the query or aggregation query to be executed with ESSearch.execute().
 722        '''
 723        request_type, request_content = self._fullSearchMongo()
 724
 725        if request_type == 'find':
 726            return 'collection.find(' + str(request_content) + ', ' + str(self._project) + ')'
 727        else:
 728            return 'collection.aggregate(' + str(request_content) + ')'
 729
 730    @property
 731    def cursor(self):
 732        '''
 733        Getter returning the cursors of the query or aggregation query result on all collections and cursors contained in self.input.
 734        '''
 735        request_type, request_content = self._fullSearchMongo()
 736        
 737        cursor_list = []
 738        for item in self.input[0]: # Determine the result cursor for each element of the input on which a Mongo query makes sense
 739            if isinstance(item, (Collection, Cursor, CommandCursor)):
 740                if request_type == 'find':
 741                    cursor_list.append(item.find(request_content, self._project))
 742                else:
 743                    cursor_list.append(item.aggregate(request_content))
 744        if len(cursor_list) == 1:
 745            return cursor_list[0]
 746        else:
 747            return cursor_list
 748
 749
 750    def execute(self, returnmode = 'observation', fillvalue = None, name = None, param = None):
 751        '''
 752        Executes the request and returns its result, either in one or many Observations.
 753
 754        *Parameter*
 755
 756        - **returnmode** : str (default None) - Parameter giving the format of the output:
 757            - 'unchanged' : output is returned as it is in the database, some operations like operations sepcific to TimeSlot object are not performed;
 758            - 'observation': Each element is returned as an observation, but original observations aren't recreated;
 759            - 'idfused': observations whose ids are the same are merged together; 
 760            - 'single': return a single observation merging all observations together.
 761        - **fillvalue** :  (default None) - Value to use to fill gaps when observations are merged together.
 762        - **name** : str (default None) - name of the output observation when returnmode is 'single'.
 763        - **param** : dict (default None) - param of the output observation when returnmode is 'single'.
 764        '''
 765        if returnmode not in {'unchanged', 'observation', 'idfused', 'single'}: raise ValueError("returnmode must have one of these values: 'unchanged', 'observation', 'idfused', 'single'.")
 766        if returnmode == 'single':
 767            if name  is not None and not isinstance(name, str)  : raise TypeError("name should be a string.")
 768            if param is not None and not isinstance(param, dict): raise TypeError("param should be a dictionnary.")
 769        self._filtered = False # Boolean put to True inside of self._cond() if an additional filtering specific to TimeSlot and shapely geometries is necessary.
 770        
 771        ## Construction of a result list where data are in the format given by returnmode
 772        
 773        ## ESSearch.execute() 1: Query is executed on each Mongo Collection or Cursor of self.input
 774
 775        result = []
 776        for data in self.input[0]:
 777            request_type, request_content = self._fullSearchMongo()
 778            if request_type == 'find':
 779                cursor = data.find(request_content, self._project)
 780            else:
 781                cursor = data.aggregate(request_content)
 782
 783            if returnmode == 'observation': # Only in this case is an observation created directly.
 784                for item in cursor:
 785                    if self._filtered: # Additional filtering for objects like TimeSlot who need it
 786                        for conds in self.parameters:
 787                            checks_parameters, checks_conds = True, True
 788                            for cond in conds:
 789                                if cond['path'] in item:
 790                                    try: checks_conds = checks_conds and self._condcheck(item[cond['path']], cond) # checking for each condition if it is satisfied
 791                                    except: checks_conds = False
 792                                else:
 793                                    checks_conds = False
 794                            checks_parameters = checks_parameters or checks_conds
 795                    dic = {}
 796                    if '_metadata' in item:
 797                        if 'name'  in item['_metadata']: dic['name']  = item['_metadata']['name']
 798                        if 'param' in item['_metadata']: dic['param'] = item['_metadata']['param']
 799                        del item['_metadata']
 800                    dic['idxdic'] = {key: [item[key]] for key in item}
 801                    if not self._filtered or (self._filtered and checks_parameters): result.append(Observation.dic(**dic))
 802
 803            elif returnmode == 'single':
 804                for item in cursor:
 805                    if '_metadata' in item : del item['_metadata']
 806                    result.append(item)
 807
 808            else: # returnmode == 'unchanged' or returnmode == 'idfused'
 809                for item in cursor:
 810                    if item:
 811                        result.append(item)
 812
 813        ## ESSearch.execute() 2: Operations for cases 'idfused' and 'single' are performed on output objects.
 814        # (more efficient to do it like this than after a conversion to Observation)
 815
 816        if returnmode == 'single':
 817            arg = {} # argument to be given to Observation.dic() merging all observations together
 818            for i in range(len(result)):
 819                for column_name in arg: # for columns already in the new Observation
 820                    if column_name not in result[i]:
 821                        arg[column_name].append(fillvalue)
 822                for column_name in result[i]: # for columns missing in the new Observation
 823                    if column_name not in arg:
 824                        arg[column_name] = [fillvalue] * i + [result[i][column_name]] # an empty column filled with fillvalue is added if a new column name is encountered
 825                    else:
 826                        arg[column_name].append(result[i][column_name])
 827            if self._filtered: result = [self._filtered_observation(Observation.dic(arg))]
 828            else: result = [Observation.dic(arg)]
 829
 830        elif returnmode == 'idfused':
 831            hashs_dic = {}
 832            for item in result:
 833                id = str(item['_metadata']['id']) # will throw an error if item has no id. Should items with no id be let as is or merged together?
 834                if id in hashs_dic: # one line is added to hashs_dic[id] for each element of result having this id
 835                    del item['_metadata'] # Two items with the same id should have the same metadata.
 836                    for column_name in hashs_dic[id]['idxdic']:
 837                        if column_name not in item:
 838                            hashs_dic[id]['idxdic'][column_name].append(fillvalue)
 839                    for column_name in item:
 840                        if column_name not in hashs_dic[id]['idxdic']:
 841                            hashs_dic[id]['idxdic'][column_name] = [fillvalue] * i + [item[column_name]] # a filled column is added if a new column name is encountered
 842                        else:
 843                            hashs_dic[id]['idxdic'][column_name].append(item[column_name])
 844                else:
 845                    dic = {}
 846                    if 'name'  in item['_metadata']: dic['name']  = item['_metadata']['name']
 847                    if 'param' in item['_metadata']: dic['param'] = item['_metadata']['param']
 848                    del item['_metadata']
 849                    dic['idxdic'] = {key: [item[key]] for key in item}
 850                    hashs_dic[id] = dic
 851            result = []
 852            for id in hashs_dic: # an Observation is added to result for each id
 853                obs_out = Observation.dic(**hashs_dic[id])
 854                if obs_out:
 855                    if self._filtered: result.append(self._filtered_observation(obs_out)) # finalement, semble plus pertinent de faire ce filtrage directemt sur la sortie Mongo, car même si un à un les tests de condition sont faits à de nombreuses reprises, au global ce ne sont jamais les mêmes combinaisons de test
 856                    else: result.append(obs_out)
 857        # At this point, result is a list of observations.
 858
 859        ## ESSearch.execute() 3: Other inputs (pure observations) are treated purely in python with the self._filtered_observation() method
 860
 861        for data in self.input[1]: # data which are not taken from a Mongo database and already are observations are treated here.
 862            result.append(self._filtered_observation(data, False))
 863
 864        ## ESSearch.execute() 4: Operations for cases 'idfused' and 'single' are performed again with the added observations
 865
 866        if len(result) >= 1:
 867            if returnmode == 'single':
 868                result = self._fusion(result, fillvalue)
 869                if name is None: name = "ESSearch query result on " + str(datetime.datetime.now()) # default value for name
 870                if param is None: # default value for param
 871                    if self.sources is not None:
 872                        sources = self.sources
 873                    else:
 874                        sources = []
 875                        for item in self.input: # informations about the inputs are added to param
 876                            if isinstance(item, Observation):
 877                                if item.name is not None:
 878                                    sources.append('Observation: ' + item.name)
 879                                else:
 880                                    sources.append('data')
 881                            elif isinstance(item, Collection):
 882                                sources.append('MongoDB collection: ' + item.name + ' from database: ' + item.database.name)
 883                            elif isinstance(item, Cursor):
 884                                sources.append('Pymongo cursor: ' + item.cursor_id + ' from collection ' + item.collection.name + 
 885                                                ' from database: ' + item.collection.database.name)
 886                            elif isinstance(item, CommandCursor):
 887                                sources.append('Pymongo commandcursor: ' + item.cursor_id + ' from collection ' + item.collection.name + 
 888                                                ' from database: ' + item.collection.database.name)
 889                            else: # should not happen
 890                                sources.append('data')
 891                    param = {'date': str(datetime.datetime.now()), 'project': 'essearch', 'type': 'dim3', 
 892                            'context': {'origin': 'ESSearch query', 'sources ': sources, 
 893                            'ESSearch_parameters': str(self.parameters)}}
 894                result.param = param
 895
 896            elif returnmode == 'idfused' and len(result) != 1:
 897                hashs_dic = {} # This dictionnary is in the format {id: [observations]}
 898                for item in result:
 899                    if item.id in hashs_dic:
 900                        hashs_dic[item.id].append(item)
 901                    else:
 902                        hashs_dic[item.id] = [item]
 903                result = []
 904                for id in hashs_dic:
 905                    obs_out = self._fusion(hashs_dic[id], fillvalue)
 906                    if hashs_dic[id][0].name : obs_out.name = hashs_dic[id][0].name # name and param associated to the id are put back
 907                    if hashs_dic[id][0].param: obs_out.name = hashs_dic[id][0].param
 908                    if obs_out: result.append(obs_out)
 909
 910        ## ESSearch.execute() 5: Return result
 911
 912        return result
 913
 914    def _filtered_observation(self, obs, is_from_mongo = True): # Vérifier que cette fonction n'est pas moins efficace que de tout déplier / filtrer / tout replier
 915        # Regarder si on ne peut pas faire la même chose en mieux avec des fonctions de numpy ou panda.
 916        '''
 917        Takes an Observation and returns a filtered Observation with self.parameters as a filter.
 918        '''
 919        # self.parameters = [[cond1 AND cond 2 AND cond 3] OR [cond4 AND cond5 AND cond6]]
 920        # dico = {"data": [["datation", [date1, date2, date3], [0,1,0,2,2,1]], ["location", [loc1, loc2, loc3], [0,1,2,0,1,1]]]} # dico n'est plus utilisé
 921        if len(obs) == 0 or self.parameters == [[]]: return obs
 922
 923        ## ESSearch._filtered_observation() 0: This function is done with an iteration over self.parameters 
 924
 925        final_filter = [False] * obs.lencomplete
 926        for i in range(len(self.parameters)): # for each group of conditions separated by a or
 927            if self.parameters[i] != []:
 928
 929        ## ESSearch._filtered_observation() 1: Checking if no column is missing and if conditions on metadata are verified
 930
 931                conds = {} # conds is a dict which associates columns to the condition which concern them (inside of [cond1 AND cond 2 AND cond 3] : only AND)
 932                relevant = True # no column given by cond["path"] is missing from obs
 933                for cond in self.parameters[i]:
 934                    next_relevant = False
 935                    for j in range(len(obs.lindex)):
 936                        if cond["path"] == obs.lindex[j].name:
 937                            if j in conds: conds[j].append(cond)
 938                            else: conds[j] = [cond]
 939                            next_relevant = True
 940                        elif cond["path"][:9] == "_metadata": # metadata are id, name and param
 941                            if not is_from_mongo: # This step is considered to be done already for data taken out of Mongo
 942                                if cond["path"][10:] == 'name' : next_relevant = next_relevant and self._condcheck(obs.name, cond)
 943                                if cond["path"][10:] == 'id'   : next_relevant = next_relevant and self._condcheck(obs.id, cond)
 944                                if cond["path"][10:] == 'param': next_relevant = next_relevant and self._condcheck(obs.param, cond)
 945                            else:
 946                                next_relevant = True
 947                    relevant = relevant and next_relevant
 948                if not relevant: continue # if a column on which a condition is applied is missing, the set of conditions given by the element of self.parameters is considered not verified.
 949
 950        ## ESSearch._filtered_observation() 2: Condition is applied on all elements of the Observation to create a boolean filter
 951
 952                full_filter = [True] * obs.lencomplete
 953                for j in range(obs.lenidx): # iteration over the columns
 954                    filter = []
 955                    for item in obs.lindex[j].cod:
 956                        boolean = True
 957                        for cond in conds[j]:
 958                            try: boolean = boolean and self._condcheck(item, cond)
 959                            except: pass #boolean = False # pose problème pour les regex et autres opérations non implémentées...
 960                        filter.append(boolean) # condition is rewritten as a boolean filter for the incoming data
 961                    next_full_filter = util.tovalues(obs.lindex[j].keys, filter) # filter changed from filter on optimize codec to filter on full codec. (filter on optimize codec was the just how we calculated it, this isn't an actual method of the module)
 962                    full_filter = [full_filter[k] and next_full_filter[k] for k in range(len(full_filter))] # full filter is updated each time
 963
 964        ## ESSearch._filtered_observation() 3: or_position is taken into account
 965
 966                final_filter = [final_filter[j] or full_filter[j] for j in range(len(full_filter))]
 967
 968        ## ESSearch._filtered_observation() 4: Application of the final filter on the incoming Observation
 969
 970        obs.setfilter(final_filter)
 971        obs.applyfilter()
 972        return obs
 973
 974    def _condcheck(self, item, cond = None): # ajouter la gestion des regex
 975        '''
 976        Takes an item and returns a Boolean if it verifies criteria given by parameter.
 977        '''
 978        #cond = {"comparator" : comparator, "operand" : operand, "path" : path} and sometimes can contain other things
 979
 980        # 0. Basic cases
 981
 982        if cond is None: return True
 983        if 'inverted' in cond and cond['inverted']: return not self._condcheck(item, cond | {'inverted' : False})
 984        if cond["comparator"] is None and cond["operand"] is None: return True
 985
 986        # 1. formatstring applied if formatstring there is
 987
 988        if "formatstring" in cond:
 989            if not isinstance(item, (datetime.datetime, TimeSlot)):
 990                item = datetime.datetime.strptime(item, cond["formatstring"])
 991            if not isinstance(cond["operand"], (datetime.datetime, TimeSlot)):
 992                cond["operand"] = datetime.datetime.strptime(cond["operand"], cond["formatstring"])
 993
 994        # 2. Cases TimeSlot, geometry and nested property need specific treatment
 995
 996        if isinstance(item, TimeSlot):
 997            if isinstance(cond["operand"], TimeSlot): # if comparator is one of the specific operators for TimeSlot
 998                cond["comparator"] = dico_alias_python[TimeSlot][TimeSlot][cond["comparator"]]
 999                return item.link(cond["operand"])[0] == cond["comparator"]
1000            else: # if operand is a datetime
1001                try: return dico_alias_python[TimeSlot][datetime.datetime][cond["comparator"]](item, cond["operand"])
1002                except: raise ValueError("Comparator not supported for TimeSlot.")
1003
1004        elif isinstance(item, list) or isinstance(item, shapely.geometry.base.BaseGeometry):
1005            if isinstance(item, list): # lists are interpreted as geometries
1006                if len(item) == 1: item = shapely.geometry.Point(item[0])
1007                elif (len(item) > 1 and not isinstance(item[0], list)): item = shapely.geometry.Point(item)
1008                elif len(item) == 2: item = shapely.geometry.LineString(item)
1009                elif len(item) > 2:
1010                    if not item[-1] == item[0]:
1011                        item.append(item[0])
1012                    item = shapely.geometry.Polygon([item])
1013            if isinstance(cond["operand"], list): # lists are interpreted as geometries
1014                if len(cond["operand"]) == 1: cond["operand"] = shapely.geometry.Point(cond["operand"][0])
1015                elif (len(cond["operand"]) > 1 and not isinstance(cond["operand"][0], list)):
1016                    cond["operand"] = shapely.geometry.Point(cond["operand"])
1017                elif len(cond["operand"]) == 2: cond["operand"] = shapely.geometry.LineString(cond["operand"])
1018                elif len(cond["operand"]) > 2:
1019                    if not item[-1] == item[0]:
1020                        item.append(item[0])
1021                    item = shapely.geometry.Polygon([item])
1022            return dico_alias_mongo['geometry'][cond["comparator"]](item, cond["operand"])
1023
1024        elif cond["path"] == "property" and isinstance(item, dict): # assuming that property contains dicts and that the query targets one of its values
1025            for val in item.values():
1026                if self._condcheck(val, cond | {"path" : None}):
1027                    return True
1028            return False
1029
1030        # 3. General comparison for remaining cases
1031
1032        try: return dico_alias_mongo['default'][cond["comparator"]](item, cond["operand"])
1033        except:
1034            #raise ValueError("Comparator not supported.")
1035            return True
1036
1037    def _fusion(self, obsList, fillvalue = None): # Idéalement, utiliser une méthode fusion de Observation.
1038        '''
1039        Takes a list of observations and returns one observation merging them together in one single observation.
1040        '''
1041        ## ESSearch._fusion() 0: Basic cases
1042
1043        if   len(obsList) == 0: return Observation()
1044        elif len(obsList) == 1: return Observation(obsList[0])
1045        else: # Fusion of a list with more than one element
1046            lidx = []
1047
1048        ## ESSearch._fusion() 1: Determination of the names of the columns
1049
1050            new_lname = set()
1051            for obs in obsList:
1052                new_lname |= set(obs.lname)
1053            new_lname = list(new_lname)
1054            
1055        ## ESSearch._fusion() 2: Fill the columns with the values of the observations to merge
1056
1057            for i in range(len(new_lname)): # for each column of the new Observation
1058                values = []
1059                for obs in obsList: # for each Observation in the list
1060                    if new_lname[i] in obs.lname: values += obs.lindex[obs.lname.index(new_lname[i])].values # values of the column are added to the new column
1061                    else: values += [fillvalue] * len(obs) # when there is no value, filled with fillvalue
1062                codec = util.tocodec(values)
1063                lidx.append(Field(codec, new_lname[i], util.tokeys(values, codec)))
1064
1065        ## ESSearch._fusion() 3: Build the actual Observation
1066
1067            return Observation(lidx)
1068            # Il y aurait peut-être moyen d'optimiser un peu en remplaçant Field, util.tokeys et l'appel final d'Observation par des manipulations directes sur les objets.
1069            # faire la fusion directement sur les keys au lieu de la faire sur les codec ?
1070            # Avec la méthode actuelle, certaines opérations sont faites plusieurs fois.

dico_alias_mongo = {<class 'str'>: {None: '$eq', 'eq': '$eq', '=': '$eq', '==': '$eq', '$eq': '$eq', 'in': '$in', '$in': '$in', 'regex': '$regex', '$regex': '$regex', 'oid': '$oid', '$oid': '$oid'}, <class 'int'>: {None: '$eq', 'eq': '$eq', '=': '$eq', '==': '$eq', '$eq': '$eq', 'gte': '$gte', '>=': '$gte', '=>': '$gte', '$gte': '$gte', 'gt': '$gt', '>': '$gt', '$gt': '$gt', 'lte': '$lte', '<=': '$lte', '=<': '$lte', '$lte': '$lte', 'lt': '$lt', '<': '$lt', '$lt': '$lt', 'in': '$in', '$in': '$in'}, <class 'datetime.datetime'>: {None: '$eq', 'eq': '$eq', '=': '$eq', '==': '$eq', '$eq': '$eq', 'gte': '$gte', '>=': '$gte', '=>': '$gte', '$gte': '$gte', 'gt': '$gt', '>': '$gt', '$gt': '$gt', 'lte': '$lte', '<=': '$lte', '=<': '$lte', '$lte': '$lte', 'lt': '$lt', '<': '$lt', '$lt': '$lt', 'in': '$in', '$in': '$in'}, <class 'observation.timeslot.TimeSlot'>: {None: 'within', 'eq': 'within', '=': 'within', '==': 'within', '$eq': 'within', 'within': 'within', 'contains': 'intersects', '$contains': 'intersects', 'in': 'within', '$in': 'within', '$within': 'within', 'disjoint': 'disjoint', '$disjoint': 'disjoint', 'intersects': 'intersects', '$intersects': 'intersects'}, <class 'list'>: {None: '$geoIntersects', 'eq': 'equals', '=': 'equals', '==': 'equals', '$eq': 'equals', 'equals': 'equals', '$equals': 'equals', '$geowithin': '$geoWithin', 'geowithin': '$geoWithin', '$geoWithin': '$geoWithin', 'geoWithin': '$geoWithin', 'within': '$geoWithin', '$within': '$geoWithin', 'disjoint': 'disjoint', '$disjoint': 'disjoint', 'intersects': '$geoIntersects', '$intersects': '$geoIntersects', 'geoIntersects': '$geoIntersects', '$geointersects': '$geoIntersects', '$geoIntersects': '$geoIntersects', 'touches': 'touches', '$touches': 'touches', 'overlaps': 'overlaps', '$overlaps': 'overlaps', 'contains': 'contains', '$contains': 'contains', '$geoNear': '$geoNear', '$geonear': '$geoNear', 'geonear': '$geoNear', 'geoNear': '$geoNear', 'in': '$in', '$in': '$in'}, <class 'bson.objectid.ObjectId'>: {None: '$eq', 'eq': '$eq', '=': '$eq', '==': '$eq', '$eq': '$eq', 'in': '$in', '$in': '$in'}, <class 'float'>: {None: '$eq', 'eq': '$eq', '=': '$eq', '==': '$eq', '$eq': '$eq', 'gte': '$gte', '>=': '$gte', '=>': '$gte', '$gte': '$gte', 'gt': '$gt', '>': '$gt', '$gt': '$gt', 'lte': '$lte', '<=': '$lte', '=<': '$lte', '$lte': '$lte', 'lt': '$lt', '<': '$lt', '$lt': '$lt', 'in': '$in', '$in': '$in'}}

dico_alias_python = {<class 'observation.timeslot.TimeSlot'>: {<class 'observation.timeslot.TimeSlot'>: {None: 'equals', 'eq': 'equals', '=': 'equals', '==': 'equals', '$eq': 'equals', 'equals': 'equals', '$equals': 'equals', 'contains': 'contains', '$contains': 'contains', 'in': 'within', '$in': 'within', 'within': 'within', '$within': 'within', 'disjoint': 'disjoint', '$disjoint': 'disjoint', 'intersects': 'intersects', '$intersects': 'intersects'}, <class 'datetime.datetime'>: {None: <function <lambda>>, 'eq': <function <lambda>>, '=': <function <lambda>>, '==': <function <lambda>>, '$eq': <function <lambda>>, 'equals': <function <lambda>>, '$equals': <function <lambda>>, '$gte': <function <lambda>>, 'gte': <function <lambda>>, '>=': <function <lambda>>, '=>': <function <lambda>>, '$gt': <function <lambda>>, 'gt': <function <lambda>>, '>': <function <lambda>>, '$lte': <function <lambda>>, 'lte': <function <lambda>>, '<=': <function <lambda>>, '=<': <function <lambda>>, '$lt': <function <lambda>>, 'lt': <function <lambda>>, '<': <function <lambda>>}}, 'geometry': {None: <function <lambda>>, 'eq': <function <lambda>>, '=': <function <lambda>>, '==': <function <lambda>>, '$eq': <function <lambda>>, 'equals': <function <lambda>>, '$equals': <function <lambda>>, '$geowithin': <function <lambda>>, 'geowithin': <function <lambda>>, '$geoWithin': <function <lambda>>, 'geoWithin': <function <lambda>>, 'within': <function <lambda>>, '$within': <function <lambda>>, 'disjoint': <function <lambda>>, '$disjoint': <function <lambda>>, 'intersects': <function <lambda>>, '$intersects': <function <lambda>>, 'geoIntersects': <function <lambda>>, '$geointersects': <function <lambda>>, '$geoIntersects': <function <lambda>>, 'touches': <function <lambda>>, '$touches': <function <lambda>>, 'overlaps': <function <lambda>>, '$overlaps': <function <lambda>>, 'contains': <function <lambda>>, '$contains': <function <lambda>>, '$geoNear': <function <lambda>>, '$geonear': <function <lambda>>, 'geonear': <function <lambda>>, 'geoNear': <function <lambda>>}, 'default': {None: <function <lambda>>, 'eq': <function <lambda>>, '=': <function <lambda>>, '==': <function <lambda>>, '$eq': <function <lambda>>, 'gte': <function <lambda>>, '>=': <function <lambda>>, '=>': <function <lambda>>, '$gte': <function <lambda>>, 'gt': <function <lambda>>, '>': <function <lambda>>, '$gt': <function <lambda>>, 'lte': <function <lambda>>, '<=': '$lte', '=<': <function <lambda>>, 'lt': <function <lambda>>, '<': <function <lambda>>, '$lt': <function <lambda>>, 'in': <function <lambda>>, '$in': <function <lambda>>}}

def insert_from_doc(collection, document, info=True): View Source

192def insert_from_doc(collection, document , info=True):
193    '''Inserts all observations from a document into a collection, where each line of the document corresponds to an observation.'''
194    with open(document, 'r') as doc:
195        for line in doc:
196            try: insert_to_mongo(collection, line, info)
197            except: pass

Inserts all observations from a document into a collection, where each line of the document corresponds to an observation.

def insert_to_mongo(collection, obj, info=False): View Source

199def insert_to_mongo(collection, obj, info=False): # Mieux avec panda ?
200    '''Takes an observation or a list of observations and inserts them into a MongoDB collection, with info by default.'''
201    # Faire une fonction pour permettre l'ajout direct de fichiers csv.
202    inserted_list = []
203    if isinstance(obj, list):
204        pile = obj
205    elif isinstance(obj, Observation):
206        pile = [obj]
207    else:
208        pile = [Observation.from_obj(obj)]
209    for obs in pile:
210        if obs.id: obs_hash = obs.id
211        else: obs_hash = hash(obs)
212        metadata = {'id': obs_hash}
213        if obs.name : metadata['name']  = obs.name
214        if obs.param: metadata['param'] = obs.param
215        if info: metadata['information'] = Observation._info(True, True)
216        if len(obs.lname) == 1: # a special case is needed because lists with one element are replaced by the element itself so iteration doesn't work
217            for line in obs:
218                inserted_list.append({obs.lname[0]: util.json(line, encoded=False, typevalue=None, simpleval=False, geojson=True),
219                                        '_metadata': metadata})
220        elif len(obs.lname) > 1:
221            for line in obs:
222                inserted_list.append({obs.lname[i]: util.json(line[i], encoded=False, typevalue=None, simpleval=False, geojson=True) 
223                                                                for i in range(len(line))} | {'_metadata': metadata})
224    if inserted_list != []: collection.insert_many(inserted_list)

Takes an observation or a list of observations and inserts them into a MongoDB collection, with info by default.

def empty_request(collection): View Source

226def empty_request(collection):
227    """
228    Empty request to get an idea of what the database contains.
229    Currently returns the count of elements in the collection and the name of each column.
230    """
231    count = 0
232    column_names = []
233    cursor = collection.find()
234    for doc in cursor:
235        count += 1
236        for column_name in doc:
237            if column_name not in column_names:
238                column_names.append(column_name)
239    return {'count': count, 'column_names': column_names}

Empty request to get an idea of what the database contains. Currently returns the count of elements in the collection and the name of each column.