[Architecture] Outcome for Event Publisher API (BAM/Siddhi) Review

Buddhika Chamith buddhikac at wso2.com
Mon Mar 26 01:52:13 PDT 2012


On Mon, Mar 26, 2012 at 2:45 PM, Suhothayan Sriskandarajah <suho at wso2.com>wrote:

>
>
> On Mon, Mar 26, 2012 at 2:10 PM, Srinath Perera <srinath at wso2.com> wrote:
>
>> How about following?
>>
>> 1. Stream definitions are stored in AgentServer. Registered through a
>> service call to Agent Server
>> 2. Users can define stream definitions either in AgentServer via a
>> config file OR the Agent can define them before send the events
>>
>
> I think in this case providing a UI to the server will be much user
> friendly.
> because this will help the users to add, delete and edit steam definitions.
> But as the first step we can start with config file.
>

+1.


>
> 3. If an agent sends the definition for type "Foo" and the server
>> already has the exact definition (need to do structure comparison)
>> AgentServer will accept the definition
>> 4. If the structure of "Foo" the AgentServer sends and "Foo"
>> AgentServer has are different, Agent server send an error when agent
>> try to register. (We can do the versioning in the future)
>>
>> --Srinath
>>
>> Regards
> Suho
>
>>
>>
>> On Mon, Mar 26, 2012 at 1:38 PM, Buddhika Chamith <buddhikac at wso2.com>
>> wrote:
>> > Hi,
>> >
>> > As I understand there are two separate concerns here.
>> >
>> > 1. Where to store the stream definitions?
>> > 2. Where to define the stream definitions?
>> >
>> > For the first one we are pretty much agreed that it should on the agent
>> > server as it is the solution that makes sense. The second one translates
>> > some thing like as follows from the point of view of BAM as I
>> understand.
>> >
>> > For case of client defined stream definitions :
>> >      1. The publisher should be configured to publish to BAM with stream
>> > definitions.
>> >      2. Publisher registers the stream definition and publish some
>> events.
>> >      3. Now that the definitions are present at the BAM side analytics
>> > depending on this stream can be defined and deployed.
>> >
>> > For case of server side defined stream definitions:
>> >      1. Configure a stream definition at BAM side.
>> >      2. Now it's possible to define some analytics on it even no data
>> for
>> > this stream is not present at the moment.
>> >      3. Later some publisher sends some events conforming to the stream
>> > definition which then would be processed.
>> >
>> > So from the user point of view the second case would be more intuitive
>> since
>> > the user now doesn't need to setup publishers and publish some events
>> before
>> > defining his analytics. Now comes the problem what happens if a client
>> sends
>> > a different format for a stream defined at the server. I think these
>> should
>> > be considered as errors and proper error should be returned to client.
>> > However if there may be some cases where the client defined stream
>> > definitions would be useful we could provide for both scenarios. Same
>> logic
>> > can be used to register and persist stream definitions whether it's
>> being
>> > registered at server or from a client. Now from a BAM user perspective
>> he
>> > may choose to define his analytics from a stream already created by an
>> > incoming data set or a stream defined by him.
>> >
>> >
>> > Regards
>> > Buddhika
>> >
>> >
>> > On Tue, Mar 20, 2012 at 1:44 PM, Suhothayan Sriskandarajah <
>> suho at wso2.com>
>> > wrote:
>> >>
>> >>
>> >>
>> >> On Tue, Mar 20, 2012 at 12:59 PM, Amila Suriarachchi <amila at wso2.com>
>> >> wrote:
>> >>>
>> >>>
>> >>>
>> >>> On Tue, Mar 20, 2012 at 12:15 PM, Suhothayan Sriskandarajah
>> >>> <suho at wso2.com> wrote:
>> >>>>
>> >>>>
>> >>>>
>> >>>> On Tue, Mar 20, 2012 at 11:52 AM, Amila Suriarachchi <amila at wso2.com
>> >
>> >>>> wrote:
>> >>>>>
>> >>>>> After the last conversation, I believe we need to define the stream
>> at
>> >>>>> the server side. i.e Some one should have defined the stream at the
>> server
>> >>>>> side and client should know the stream definitions and send the
>> message
>> >>>>> accordingly.
>> >>>>>
>> >>>>> Current design look like this.
>> >>>>>
>> >>>>> BAM data publisher ---> Agent --------------- send message using
>> thrift
>> >>>>> -----------> Agent Server -----------> CEP/BAM server.
>> >>>>>
>> >>>>> Client can define a stream and send data (events). But who is going
>> to
>> >>>>> process them? The only thing it can do is store both stream
>> definition and
>> >>>>> events.
>> >>>>>
>> >>>>> For CEP case client defining stream is not useful since some one
>> should
>> >>>>> have already define the stream and should have written the CEP
>> queries based
>> >>>>> on that to process the events. Even on the BAM side BAM analysers
>> should be
>> >>>>> stream aware. In a production scenario someone should have written
>> the
>> >>>>> Analysers assuming the stream definitions.
>> >>>>>
>> >>>>> Therefore I think defining and storing stream definitions should
>> happen
>> >>>>> at the Agent Server component. Then both BAM analysers and CEP
>> buckets if
>> >>>>> required retrieved it from there.  Client should know the stream
>> ids and
>> >>>>> definitions and simple send the data accordingly.
>> >>>>>
>> >>>>> This is similar to sending message to a topic. Publisher should know
>> >>>>> the message format and publish the message. Subscribes expect the
>> message in
>> >>>>> some format and have to logic to process it.
>> >>>>>
>> >>>>> WDYT?
>> >>>>
>> >>>>
>> >>>> I believe its better to give the flexibility to the client to also
>> set
>> >>>> the stream definitions.
>> >>>> Now the agent server cannot persist the stream definition and hence
>> >>>> client has to defined the streams every time.
>> >>>> But if we can make the Actual backend server(CEP/BAM) to persist the
>> >>>> stream definitions
>> >>>> then only for the first time client need to send the event stream
>> >>>> definition, and on the 2nd day he doesn't need to send the
>> definitions as
>> >>>> the Agent Server can request its Backend to give the stream
>> definition for
>> >>>> the streams it dont know about.
>> >>>
>> >>>
>> >>> I think Agent Server Component has to persists the stream
>> definitions. If
>> >>> BAM or CEP components required it can get the definitions from there.
>> >>>
>> >>> What I am thinking is instead of client defining event streams at
>> >>> runtime, some one has to define the stream ids and stream definitions
>> at the
>> >>> Agent Server component. There can be a GUI for Agent component to do
>> that.
>> >>>
>> >>> Now at the consumer side both CEP and BAM components can subscribes to
>> >>> event streams and get the events as well as the stream definitions.
>> >>> At the publisher side (i.e client ) they need to publish the events
>> to a
>> >>> stream through the Agent (Agent send the message to Agent server). If
>> the
>> >>> publisher needs stream definition again it can be retrieved through
>> the
>> >>> Agent as well.
>> >>>
>> >>> This solves the problems like same publisher send events to same
>> stream
>> >>> in different times, and different publishers send events to same
>> stream id.
>> >>>
>> >>>>
>> >>>>
>> >>>> This also handles the case where user sending events which are not
>> >>>> defined or redefining the same stream in a different way, here the
>> Agent
>> >>>> server will get to know the erroneous situation when it inquires
>> about the
>> >>>> stream definition from the Backend server and it can send an
>> appropriate
>> >>>> error to the client.
>> >>>
>> >>>
>> >>> There are two ways to handle the errors. If we assume client define
>> the
>> >>> stream then it has to be in the way you mentioned. In here if two
>> clients
>> >>> send two different stream definitions for same stream id, which
>> client we
>> >>> going to assume wrong?
>> >>>
>> >>> But if you think other way around i.e server define the stream, client
>> >>> can retrieve the stream definition from the server and validate at the
>> >>> client side.
>> >>>
>> >>
>> >> Here we are assuming event streams will come to both BAM and CEP only
>> via
>> >> the Agent server. But there are use cases, like CEP defining an output
>> >> stream and sending events via JMS, in this case is if the agent server
>> has a
>> >> different stream definition for the same stream id there might be
>> issues.
>> >>
>> >> Thanks
>> >> Suho
>> >>
>> >>
>> >>> thanks,
>> >>> Amila.
>> >>>
>> >>>>
>> >>>>
>> >>>> Thanks
>> >>>> Suho
>> >>>>>
>> >>>>>
>> >>>>> thanks,
>> >>>>> Amila.
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> On Tue, Mar 20, 2012 at 10:54 AM, Suhothayan Sriskandarajah
>> >>>>> <suho at wso2.com> wrote:
>> >>>>>>
>> >>>>>> Review notes and decisions made at the meeting
>> >>>>>>
>> >>>>>> 1. Rename TypeDef to EventStreamDefinition
>> >>>>>> 2. For the simplest case, user need not to be bothered about Agent
>> but
>> >>>>>> rather should only work with DataPublisher
>> >>>>>> 3. When publishing events its streamId should be mentioned
>> explicitly.
>> >>>>>> 4. DataPublisher should not have connect()
>> >>>>>> 5. Use a url to specify the endpoint
>> >>>>>> 6. As suggested at the meeting Agent Client is not using the Event
>> >>>>>> Stream Definition for event conversion but rather its inferring the
>> >>>>>> definition from the Data Object Arrays
>> >>>>>>     (E.g metaData,   CorrelationData and   PayloadDate Object
>> Arrays)
>> >>>>>> hence no changes needed there.
>> >>>>>> 7. StreamId need to be unique per tenant
>> >>>>>> 8. There should be a mechanism to persist Event Stream Definitions
>> at
>> >>>>>> the server side (need more discussion)
>> >>>>>> 9. Event Stream Definition could be a json like in Apache Avro
>> >>>>>> 10. This should be designed to support multiple backends E.g
>> Thrift,
>> >>>>>> Avro
>> >>>>>>
>> >>>>>>
>> >>>>>> From the discussions we had, I have written a sample using the
>> >>>>>> modified API
>> >>>>>> Please comment and improve this, if I have missed any.
>> >>>>>>
>> >>>>>> For the simple case where we have only one DataPublisher per JVM
>> >>>>>>
>> >>>>>>      //according to the convention the authentication port will be
>> >>>>>> 7611+100= 7711 and its host will be the same as of its endpoint
>> >>>>>>      DataPublisher dataPublisher = new
>> >>>>>> DataPublisher("tcp://localhost:7611", "admin at wso2", "admin");
>> >>>>>>      dataPublisher.eventStreamDefinition("{'" +
>> >>>>>>                                          "
>> streamId':'StockQuart'," +
>> >>>>>>                                          "  'metaData':[" +
>> >>>>>>                                          "
>> >>>>>> {'name':'ipAdd','type':'STRING'}" +
>> >>>>>>                                          "  ]," +
>> >>>>>>                                          "  'payloadData':[" +
>> >>>>>>                                          "
>> >>>>>> {'name':'symbol','type':'string'}," +
>> >>>>>>                                          "
>> >>>>>> {'name':'price','type':'double'}," +
>> >>>>>>                                          "
>> >>>>>> {'name':'volume','type':'int'}," +
>> >>>>>>                                          "
>> >>>>>> {'name':'max','type':'double'}," +
>> >>>>>>                                          "
>> >>>>>> {'name':'min','type':'double'}" +
>> >>>>>>                                          "  ]" +
>> >>>>>>                                          "}");
>> >>>>>>
>> >>>>>>      //for sending event data to a stream, and in this case its
>> >>>>>> correlation data is null
>> >>>>>>      dataPublisher.publish("StockQuart",new EventData(new
>> >>>>>> Object[]{"127.0.0.1"},null,new
>> Object[]{"IBM",96.8,300,120.6,70.4});
>> >>>>>>      dataPublisher.publish("StockQuart",new
>> >>>>>> EventData(System.currentTimeMillis(),new
>> Object[]{"127.0.0.1"},null, new
>> >>>>>> Object[] {"WSO2",100.8,200,110.4,74.7});
>> >>>>>>
>> >>>>>>      //for publishing the full event, this contains the StreamId
>> >>>>>> within itself
>> >>>>>>      dataPublisher.publish(new
>> >>>>>> Event("StockQuart",System.currentTimeMillis(),new
>> >>>>>> Object[]{"127.0.0.1"},null, new
>> Object[]{"WSO2",100.8,200,110.4,74.7});
>> >>>>>>      dataPublisher.stop();
>> >>>>>>
>> >>>>>> Note, Since Event has a field StreamId and since we need this for
>> >>>>>> backend processing, I'm letting the user to define EventData and
>> allowing
>> >>>>>> them to separately defining
>> >>>>>> the event stream for unfriendliness as in,
>> >>>>>>         dataPublisher.publish("StockQuart",new EventData());
>> >>>>>> and within the publish method I'm building the actual Event object.
>> >>>>>>
>> >>>>>> When we need to use many data publishers in a single JVM,
>> >>>>>>
>> >>>>>>         Agent agent= new Agent()
>> >>>>>>         DataPublisher dataPublisher1 = new
>> >>>>>> DataPublisher("tcp://localhost:7611", "admin at wso2", "admin",
>> agent);
>> >>>>>>
>> >>>>>>         //or for assigning Agent separately and using different
>> >>>>>> authenticator url from the convention
>> >>>>>>         DataPublisher dataPublisher2 = new
>> >>>>>> DataPublisher("tcp://localhost:7611","tcp://localhost:7111",
>> "admin at wso2",
>> >>>>>> "admin");
>> >>>>>>         dataPublisher2.setAgent(agent));
>> >>>>>>
>> >>>>>> Regards
>> >>>>>> Suho
>> >>>>>>
>> >>>>>> --
>> >>>>>> S. Suhothayan
>> >>>>>> Software Engineer,
>> >>>>>> Data Technologies Team,
>> >>>>>> WSO2, Inc. http://wso2.com
>> >>>>>> lean.enterprise.middleware.
>> >>>>>>
>> >>>>>> email: suho at wso2.com cell: (+94) 779 756 757
>> >>>>>> blog: http://suhothayan.blogspot.com/
>> >>>>>> twitter: http://twitter.com/suhothayan
>> >>>>>> linked-in: http://lk.linkedin.com/in/suhothayan
>> >>>>>>
>> >>>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> --
>> >>>>> Amila Suriarachchi
>> >>>>>
>> >>>>> Software Architect
>> >>>>> WSO2 Inc. ; http://wso2.com
>> >>>>> lean . enterprise . middleware
>> >>>>>
>> >>>>> phone : +94 71 3082805
>> >>>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> --
>> >>>> S. Suhothayan
>> >>>> Software Engineer,
>> >>>> Data Technologies Team,
>> >>>> WSO2, Inc. http://wso2.com
>> >>>> lean.enterprise.middleware.
>> >>>>
>> >>>> email: suho at wso2.com cell: (+94) 779 756 757
>> >>>> blog: http://suhothayan.blogspot.com/
>> >>>> twitter: http://twitter.com/suhothayan
>> >>>> linked-in: http://lk.linkedin.com/in/suhothayan
>> >>>>
>> >>>>
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> Amila Suriarachchi
>> >>>
>> >>> Software Architect
>> >>> WSO2 Inc. ; http://wso2.com
>> >>> lean . enterprise . middleware
>> >>>
>> >>> phone : +94 71 3082805
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >> S. Suhothayan
>> >> Software Engineer,
>> >> Data Technologies Team,
>> >> WSO2, Inc. http://wso2.com
>> >> lean.enterprise.middleware.
>> >>
>> >> email: suho at wso2.com cell: (+94) 779 756 757
>> >> blog: http://suhothayan.blogspot.com/
>> >> twitter: http://twitter.com/suhothayan
>> >> linked-in: http://lk.linkedin.com/in/suhothayan
>> >>
>> >>
>> >
>>
>>
>>
>> --
>> ============================
>> Srinath Perera, Ph.D.
>>    http://www.cs.indiana.edu/~hperera/
>>    http://srinathsview.blogspot.com/
>>
>
>
>
> --
> *S. Suhothayan
> *
> Software Engineer,
> Data Technologies Team,
>  *WSO2, Inc. **http://wso2.com
>  <http://wso2.com/>*
> *lean.enterprise.middleware.*
>
> *email: **suho at wso2.com* <suho at wso2.com>* cell: (+94) 779 756 757
> blog: **http://suhothayan.blogspot.com/* <http://suhothayan.blogspot.com/>
> *
> twitter: **http://twitter.com/suhothayan* <http://twitter.com/suhothayan>*
> linked-in: **http://lk.linkedin.com/in/suhothayan*
> *
> *
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.wso2.org/pipermail/architecture/attachments/20120326/215181fb/attachment-0001.html>


More information about the Architecture mailing list