Project Example: wissen.de
Software Architecture VO/KU (707.023/707.024)
Roman Kern
KMI, TU Graz
January 24, 2014

Outline
1. Example - wissen.de
2. Software Architecture Overview
3. Software Architecture Styles
4. Project Management

Example - wissen.de
- Project type: web site

Wissen.de - Web Site of the Year 2012
- Elected Web Site of the Year 2012
- Winner in the category: Education
- Both the best and the most popular web site
- http://www.websitedesjahres.de/

[Figure: Wissen.de - The Web Site]

Wissen.de - Host
- Wissen.de is a web site hosted by wissenmedia
- Wissenmedia is owned by Bertelsmann SE & Co. KGaA
- Wissenmedia owns the brands Brockhaus, Bertelsmann, WAHRIG, CHRONIK, JollyBooks
- The Brockhaus brand is over 200 years old and is known by 93% of people in Germany

Wissen.de - Scope
- Wissen.de is a free service
- Content is added and curated by editors
  - Does not follow the Wikipedia model
- Free content is not taken from Brockhaus
- wissen.de articles differ from printed articles
  - In their style and their life cycle

Wissen.de - Motivation
The project started out as an innovation platform:
- Be innovative in terms of business models
  - Wissen.de is just a single portal to a complex system
  - Another example is a cooperation with a set-top box manufacturer
- Be innovative in terms of technologies
  - Try out new functionality

Wissen.de - Software Architecture
In terms of software architecture, this puts an emphasis on specific quality attributes:
- Flexibility
  - Quickly try out new features
- Evolvability
  - Add new features without interfering with existing infrastructure
- Scalability
  - Need to manage millions of articles (more than the German Wikipedia)
  - Need to serve many users

Wissen.de - Software Architecture
The focus on specific quality attributes has implications for others:
- Configurability
  - Needs to be high as well
- Testability
  - Suffers, as the system is changing at a high pace

Wissen.de - Software Architecture
- More importantly, the architecture needs to be flexible and foresee possible directions
  - Typically, YAGNI ("You ain't gonna need it") is used as a guideline
- Complexity
  - The system has a high level of complexity
  - Very hard for developers who are new to the project

Wissen.de - Software Architecture
High flexibility is achieved by:
- Loose coupling
  - Individual components do not depend on other components
- Generic interfaces and protocols
  - Thus components can be easily swapped out and replaced
But this has an impact on:
- Performance
  - System needs to be as generic as possible, so there is no option to fine-tune algorithms

Wissen.de - Backend
- Wissen.de is only one of multiple web sites
- The whole infrastructure contains many sub-systems and components
- Another part is the interface to the other systems (e.g. editor systems)
- It is embedded into an existing landscape of tools, which calls for integrability

Wissen.de - Team
- Developed by separate teams
- Teams are from different companies
  - Know-Center, Key-Tec, EDELWEISS72, wissenmedia, arvato, Nionex...
- Teams are geographically dispersed
  - Graz, Munich, Gütersloh

[Figure: Wissen.de - Overview]

Wissen.de - Detail
- The following slides focus on the backend part only
- It runs on multiple (virtual) machines
- It is used by multiple components
- Main tasks: store articles, index articles, present articles

Articles
- Articles are stored as XML
- Combination of data and meta-data
  - Meta-data are title, date, category,...
  - Data is XML, not restricted to a single format
- Links between articles

Software Architecture Overview
- Main components

Main Architecture
- Web service as main interface
- Client-server architecture
- Main architecture: n-tier style
- Typical example of a heterogeneous architecture style

n-tier Architecture
Conceptual architecture: 3-tier
- Database layer
- Application logic layer
- Presentation layer

[Figure: Architecture: 3-layer applications]

n-tier Architecture
Implementation architecture: 2-tier
- Framework library
- Presentation libraries
  - E.g. web-service library, command line library

Database Layer
- Object-relational mapping (ORM)
- No direct interaction with the relational database
- Schema can be derived from the business objects

Example of ORM

    @Entity
    @Table(name = "ARTIFACT")
    @NamedQueries({
        @NamedQuery(name = "artifactById",
            query = "SELECT x FROM ARTIFACT x WHERE x.ARTIFACT_ID = :artifactIdParam AND ...")
    })
    @XmlJavaTypeAdapter(XmlArticle.Adapter.class)
    public class Artifact implements Serializable {
        @EmbeddedId
        private ArtifactPK id;

        @ManyToOne
        @MapsId
        @JoinColumn(name = "BOOK_ID", referencedColumnName = "BOOK_ID")
        private Book book;

        @ManyToOne(fetch = FetchType.LAZY)
        @JoinColumn(name = "CONTENT_ID", referencedColumnName = "CONTENT...")
        private Content content;
        // ...
    }

Presentation Layer
- XSLT scripts transform the output into the target media
- Not only articles are transformed
  - E.g. search results, error messages
- Different output target media
  - E.g. mobile version, version for set-top boxes, product-specific renderings

[Figure: Presentation Layer]

Example of XSLT

    <xsl:stylesheet version="2.0"
        xmlns="http://www.w3.org/1999/xhtml"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xpath-default-namespace="http://www.w3.org/1999/xhtml">
      <xsl:template match="/">
        <html xmlns="http://www.w3.org/1999/xhtml">
          <head>
            <link type="text/css" rel="stylesheet" href="{$urlprefix}..."/>
          </head>
          <body>
            <div class="artikelinhalt">
              <xsl:copy-of select="$textbausteinheader"/>
              <!-- The actual article is rendered here -->
              <xsl:apply-templates/>
              <!-- Table of contents for chunked articles -->
              <xsl:if test="not(/*[1]/ws:kontext/ws:ws-intern) and /...">
                <xsl:call-template name="chunks-table"/>
              </xsl:if>
              <xsl:copy-of select="$textbausteinfooter"/>
            </div>
          </body>
        </html>
        ...

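The idea of one article source and several target media can be tried out with the XSLT processor that ships with the JDK. The following is a minimal sketch, not the project's code: the article format, both stylesheets and the class name `XsltRenderer` are assumptions made up for illustration, and the stylesheets use XSLT 1.0 because the built-in processor does not support 2.0.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltRenderer {
    // Hypothetical stylesheet for a web rendering of the article.
    static final String WEB_STYLE =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/article'>"
        + "<div class='article'><h1><xsl:value-of select='title'/></h1>"
        + "<p><xsl:value-of select='body'/></p></div>"
        + "</xsl:template></xsl:stylesheet>";

    // Hypothetical stylesheet for a plain-text target (e.g. a very simple device).
    static final String TEXT_STYLE =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/article'>"
        + "<xsl:value-of select='title'/>: <xsl:value-of select='body'/>"
        + "</xsl:template></xsl:stylesheet>";

    /** Applies the given stylesheet to the given XML document and returns the result. */
    static String render(String xml, String stylesheet) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String article = "<article><title>Graz</title><body>City in Austria</body></article>";
        System.out.println(render(article, WEB_STYLE));
        System.out.println(render(article, TEXT_STYLE));
    }
}
```

Adding a further target medium then only requires another stylesheet; the article XML and the calling code stay unchanged.
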
Web Service Interface
- Multiple interfaces, for different use cases (e.g. read-only access, administrative access,...)
- Stateless
- Hybrid of REST and RPC style service
- Output is either XML or JSON

Software Architecture Styles
- Patterns found in the architecture

Information Extraction - Preprocessing
- Task: transform an XML document into a textual representation
- Three stages:
  - Input XML
  - transformed into XHTML
  - transformed into plain text

Information Extraction - Preprocessing
- Style: pipeline
- Batch-sequential: the next filter starts once the previous one has finished
- The output of the previous filter is the input to the next

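A batch-sequential pipeline of this kind can be sketched as a list of filters that are applied one after the other, each receiving the complete output of its predecessor. The two filters below are hypothetical stand-ins based on simple string replacement (the element names `article`, `title` and `para` are invented); a real system would more likely use XSLT or an XML parser for these stages.

```java
import java.util.List;
import java.util.function.UnaryOperator;

class PreprocessingPipeline {
    // Stage 1 (assumed article format): map the article XML onto XHTML elements.
    static final UnaryOperator<String> XML_TO_XHTML = xml -> xml
            .replace("<article>", "<div>").replace("</article>", "</div>")
            .replace("<title>", "<h1>").replace("</title>", "</h1>")
            .replace("<para>", "<p>").replace("</para>", "</p>");

    // Stage 2: turn block ends into line breaks, then strip all remaining markup.
    static final UnaryOperator<String> XHTML_TO_TEXT = xhtml -> xhtml
            .replaceAll("</(h1|p)>", "\n")
            .replaceAll("<[^>]+>", "")
            .trim();

    /** Runs the filters strictly in sequence; each filter sees the full output of the previous one. */
    static String run(String input, List<UnaryOperator<String>> filters) {
        String data = input;
        for (UnaryOperator<String> filter : filters) {
            data = filter.apply(data);
        }
        return data;
    }

    public static void main(String[] args) {
        String xml = "<article><title>Graz</title><para>City in Austria.</para></article>";
        System.out.println(run(xml, List.of(XML_TO_XHTML, XHTML_TO_TEXT)));
    }
}
```

Because every filter has the same signature, stages can be added, removed or reordered without touching the others.
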
[Figure: Pipes and filters style]

Information Extraction - Execute
- Task: extract information out of text
- Multiple sub-tasks:
  - Split the text into sentences
  - Split a sentence into tokens (words)
  - Mark certain words as stop-words (should be ignored)
  - Assign word groups to individual tokens
  - Detect named entities (e.g. person names)

Information Extraction - Execute
- Realisation: pipeline with shared repository
- First, the text is filled into a special data structure
- Each filter (sentence chunker, stop-word detection,...) modifies the data structure
  - Using so-called annotations
  - Each annotation is a span (start, end) with additional features
- Caveat: filters depend on the output of preceding filters

Example of Information Extraction Pipeline

    public List<ExtractedInformationAnnotation> process(String text) ... {
        AnnotatedDocument doc = new DefaultDocument();
        doc.setText(text);
        for (Annotator annotator : annotators) {
            annotator.annotate(doc);
        }
        // ...
    }

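To make the shared-repository idea more concrete, here is a small self-contained sketch in which three annotators add span annotations to one document. All class names (`AnnotationPipeline`, `Document`, `Annotation`, `Annotator`) and the naive splitting rules are assumptions for illustration and are not the project's actual API. Note how the tokenizer reads the sentence annotations and the stop-word detector reads the token annotations, which is exactly the dependency between filters mentioned as a caveat.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class AnnotationPipeline {
    /** A typed span (start inclusive, end exclusive) over the document text. */
    static class Annotation {
        final String type; final int start; final int end;
        Annotation(String type, int start, int end) { this.type = type; this.start = start; this.end = end; }
    }

    /** The shared repository: the text plus all annotations added so far. */
    static class Document {
        final String text;
        final List<Annotation> annotations = new ArrayList<>();
        Document(String text) { this.text = text; }

        /** Returns the text covered by each annotation of the given type. */
        List<String> covered(String type) {
            List<String> out = new ArrayList<>();
            for (Annotation a : annotations)
                if (a.type.equals(type)) out.add(text.substring(a.start, a.end));
            return out;
        }
    }

    interface Annotator { void annotate(Document doc); }

    // Naive sentence splitter: a sentence ends at every '.'.
    static final Annotator SENTENCES = doc -> {
        int start = 0;
        for (int i = 0; i < doc.text.length(); i++) {
            if (doc.text.charAt(i) == '.') {
                doc.annotations.add(new Annotation("sentence", start, i + 1));
                start = i + 1;
                while (start < doc.text.length() && doc.text.charAt(start) == ' ') start++;
            }
        }
    };

    // Tokenizer: depends on the sentence annotations of the preceding filter.
    static final Annotator TOKENS = doc -> {
        List<Annotation> sentences = new ArrayList<>();
        for (Annotation a : doc.annotations) if (a.type.equals("sentence")) sentences.add(a);
        for (Annotation s : sentences) {
            int tokenStart = -1;
            for (int i = s.start; i <= s.end; i++) {
                boolean inWord = i < s.end && Character.isLetterOrDigit(doc.text.charAt(i));
                if (inWord && tokenStart < 0) tokenStart = i;
                if (!inWord && tokenStart >= 0) {
                    doc.annotations.add(new Annotation("token", tokenStart, i));
                    tokenStart = -1;
                }
            }
        }
    };

    static final Set<String> STOP_LIST = Set.of("the", "is", "a");

    // Stop-word detection: depends on the token annotations.
    static final Annotator STOP_WORDS = doc -> {
        List<Annotation> found = new ArrayList<>();
        for (Annotation a : doc.annotations)
            if (a.type.equals("token")
                    && STOP_LIST.contains(doc.text.substring(a.start, a.end).toLowerCase()))
                found.add(new Annotation("stopword", a.start, a.end));
        doc.annotations.addAll(found);
    };

    static Document process(String text) {
        Document doc = new Document(text);
        for (Annotator annotator : List.of(SENTENCES, TOKENS, STOP_WORDS)) annotator.annotate(doc);
        return doc;
    }

    public static void main(String[] args) {
        Document doc = process("Graz is a city. The river is the Mur.");
        System.out.println(doc.covered("sentence"));
        System.out.println(doc.covered("stopword"));
    }
}
```
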
Event Framework
- Components can register to listen for events
- Components can trigger events
- Typically, all events should be handled asynchronously (the sender is not blocked)
- Architectural style: publish-and-subscribe

[Figure: Notification architecture]

Example of an Event Listener

    eventListener = new EventListener() {
        @Override
        public void onEvent(Event event, Task task) {
            if (event instanceof MediaAddedEvent) {
                isDirty = true;
            }
        }
    };
    eventManager.registerEventListener(eventListener);

    // somewhere else
    eventManager.fireEventAsync(new MediaAddedEvent(name));

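The listener above only shows the subscriber side. A possible shape of the event manager behind it is sketched below under several assumptions: the method names are taken from the slide, but the implementation, the single dispatcher thread, the simplified `onEvent(Event)` signature without the `Task` parameter, and the `shutdownAndWait` helper are invented for this example.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

class EventManagerSketch {
    interface Event {}
    interface EventListener { void onEvent(Event event); }

    static class MediaAddedEvent implements Event {
        final String name;
        MediaAddedEvent(String name) { this.name = name; }
    }

    private final List<EventListener> listeners = new CopyOnWriteArrayList<>();
    // One daemon thread delivers the events, so the sender is never blocked.
    private final ExecutorService dispatcher = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "event-dispatcher");
        t.setDaemon(true);
        return t;
    });

    void registerEventListener(EventListener listener) { listeners.add(listener); }

    /** Queues the event and returns immediately; delivery happens on the dispatcher thread. */
    void fireEventAsync(Event event) {
        dispatcher.submit(() -> {
            for (EventListener listener : listeners) listener.onEvent(event);
        });
    }

    /** Stops accepting events and waits until all queued events have been delivered. */
    boolean shutdownAndWait() {
        dispatcher.shutdown();
        try {
            return dispatcher.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Small end-to-end run: returns true if the subscriber saw the published event. */
    static boolean demo() {
        EventManagerSketch manager = new EventManagerSketch();
        AtomicBoolean isDirty = new AtomicBoolean(false);
        manager.registerEventListener(event -> {
            if (event instanceof MediaAddedEvent) isDirty.set(true);
        });
        manager.fireEventAsync(new MediaAddedEvent("cover.png"));
        manager.shutdownAndWait();
        return isDirty.get();
    }

    public static void main(String[] args) {
        System.out.println("listener notified: " + demo());
    }
}
```

The publisher knows only the event manager, never the subscribers, which is what keeps the components decoupled.
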
Cluster Communication
- Need to scale out (horizontally) to cope with the demand
- Add redundancy to increase the availability
  - Instead of a single machine, have a cluster of machines
- Works transparently with the event framework

Cluster Communication
- Dynamically detect all cluster members on start-up (discovery)
- Communication is based on either broadcast/multicast (UDP) or direct communication (TCP)
- All cluster nodes need to know each other
- Architectural style: peer-to-peer

[Figure: Peer to peer]

Cluster Communication - Synchronous
- The cluster offers only asynchronous communication facilities
- Synchronous communication is created via callbacks
- Each synchronous message contains a unique ID and is sent asynchronously
- Once the message has been processed by the remote node, a notification is sent back, passing the ID
- Processing can then continue on the sender side

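The callback scheme described above can be sketched with a map of pending requests keyed by message ID. Everything in this example is an assumption made for illustration: the class `SyncOverAsync`, the in-process executor that stands in for the remote node, and the five second timeout.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class SyncOverAsync {
    // Requests that were sent but not yet answered, keyed by their unique message ID.
    private final Map<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    // Stand-in for the remote node: a daemon thread that processes messages later.
    private final ExecutorService remoteNode = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "remote-node");
        t.setDaemon(true);
        return t;
    });

    /** Asynchronous send: returns immediately, the remote side answers with a notification. */
    private void sendAsync(String id, String payload) {
        remoteNode.submit(() -> onNotification(id, "processed:" + payload));
    }

    /** Callback for incoming notifications; the ID identifies the waiting sender. */
    private void onNotification(String id, String result) {
        CompletableFuture<String> waiting = pending.get(id);
        if (waiting != null) waiting.complete(result);
    }

    /** Synchronous call built on the asynchronous primitives above. */
    String sendAndWait(String payload) {
        String id = UUID.randomUUID().toString();
        CompletableFuture<String> reply = new CompletableFuture<>();
        pending.put(id, reply);            // register before sending to avoid a lost notification
        sendAsync(id, payload);
        try {
            return reply.get(5, TimeUnit.SECONDS);   // sender continues once the ID comes back
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for " + id, e);
        } catch (ExecutionException | TimeoutException e) {
            throw new IllegalStateException("no notification for " + id, e);
        } finally {
            pending.remove(id);
        }
    }

    public static void main(String[] args) {
        System.out.println(new SyncOverAsync().sendAndWait("reindex"));
    }
}
```
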
Indexing
- The search index needs to be updated once articles have been changed
- The component responsible for updating the content of articles fires an event as soon as an article has changed
- The index component listens for these events
- Decoupling of components, as one component does not know the other components
- Disadvantage: no direct control of the process flow, hard to track the progress of operations

Request Tracking
- Track long-running operations
  - For example: batch import of articles, which might take hours
- Idea: collect all information regarding an operation in one place, called a task
- Store this information in the database
- Notify the user once the operation is done

Request Tracking - Task
A task consists of:
- ID: unique ID of the task
- Status: running, finished
- Result: success, failed, cancelled
- Messages: list of messages for the user
- Attributes: track the progress (progress bar in the UI)
- Properties: store internal state information

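The fields listed above map naturally onto a small data class. The sketch below is only one possible reading: the class name `TaskSketch`, the method names and the way progress is stored in the attributes are assumptions, and persistence to the database is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

class TaskSketch {
    enum Status { RUNNING, FINISHED }
    enum Result { SUCCESS, FAILED, CANCELLED }

    final String id = UUID.randomUUID().toString();           // unique ID of the task
    private Status status = Status.RUNNING;
    private Result result;                                     // stays null while the task is running
    final List<String> messages = new ArrayList<>();           // messages for the user
    final Map<String, Integer> attributes = new HashMap<>();   // progress information for the UI
    final Map<String, String> properties = new HashMap<>();    // internal state information

    Status getStatus() { return status; }
    Result getResult() { return result; }

    /** Records how many of the total work items have been handled so far. */
    void progress(int done, int total) {
        attributes.put("done", done);
        attributes.put("total", total);
    }

    /** Progress in percent, as a progress bar would display it. */
    int percentDone() {
        int total = attributes.getOrDefault("total", 0);
        return total == 0 ? 0 : 100 * attributes.getOrDefault("done", 0) / total;
    }

    /** Marks the task as finished and leaves a final message for the user. */
    void finish(Result result, String message) {
        this.status = Status.FINISHED;
        this.result = result;
        messages.add(message);
    }

    public static void main(String[] args) {
        TaskSketch task = new TaskSketch();
        task.progress(250, 1000);
        System.out.println(task.id + ": " + task.percentDone() + "% done");
        task.finish(Result.SUCCESS, "Imported 1000 articles");
        System.out.println(task.getStatus() + " / " + task.getResult() + " / " + task.messages);
    }
}
```
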
Request Tracking - Task
- A single task might span multiple machines
  - Synchronisation via the database
- The administration console lists all tasks
  - Helps to detect the root of problems

Logging
- Common logging infrastructure
- Logging is also collected in the tasks
- Logging output also contains the task ID
- Log output is collected in files
- Log files are rotated

Error Handling
- Each layer produces its own type system of errors
- The presentation layer is responsible for reporting the error to the user
- For each error, a unique ID is generated
- The ID is reported to the user and logged
- Thus no internal state is reported to the outside

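The error ID scheme can be sketched in a few lines with `java.util.logging`. The class `ErrorReporter`, the wording of the user message and the choice of a random UUID as the ID are assumptions for this example; the slides do not say how the IDs are generated.

```java
import java.util.UUID;
import java.util.logging.Level;
import java.util.logging.Logger;

class ErrorReporter {
    private static final Logger LOG = Logger.getLogger(ErrorReporter.class.getName());

    /**
     * Logs the full internal error together with a fresh ID and returns a message
     * for the user that contains only the ID, not the internal details.
     */
    static String report(Exception internalError) {
        String errorId = UUID.randomUUID().toString();
        LOG.log(Level.SEVERE, "error-id=" + errorId, internalError);
        return "An internal error occurred. Please quote error ID " + errorId + ".";
    }

    public static void main(String[] args) {
        Exception fromDatabaseLayer = new IllegalStateException("connection to db-host-3 refused");
        System.out.println(report(fromDatabaseLayer));
    }
}
```

With the ID from the user's report, an administrator can search the log files for the matching entry and its stack trace.
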
Monitoring
- Monitor the current state of the system
- Web-based tool to monitor the state
  - Current resource consumption, e.g. memory used
  - List of recent error logs
- Support of administrative/analysis operations

[Figure: Monitoring]

Runtime Performance
- Improve performance through caching
- Caches need to be kept in sync across the multiple machines
- Therefore all changes need to be reported to all machines
- The event framework propagates these changes to all nodes and components
- Changes in the file system need to be detected as well

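The interplay of per-node caches and change events can be illustrated with a toy model in a single process. The classes `ChangeBus` and `Node` are hypothetical; in particular, `ChangeBus` delivers changes with plain method calls and merely stands in for the event framework and the cluster communication.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class CacheSyncSketch {
    /** Stand-in for the event framework: forwards every change to all registered nodes. */
    static class ChangeBus {
        private final List<Node> nodes = new ArrayList<>();
        void register(Node node) { nodes.add(node); }
        void articleChanged(String articleId) {
            for (Node node : nodes) node.invalidate(articleId);
        }
    }

    /** One machine of the cluster with its own local cache in front of the shared store. */
    static class Node {
        private final Map<String, String> cache = new ConcurrentHashMap<>();
        private final Map<String, String> store;
        Node(Map<String, String> store) { this.store = store; }

        /** Returns the cached article, loading it from the store on a cache miss. */
        String get(String articleId) {
            return cache.computeIfAbsent(articleId, store::get);
        }

        /** Drops the cached copy so that the next read fetches the current version. */
        void invalidate(String articleId) { cache.remove(articleId); }
    }

    public static void main(String[] args) {
        Map<String, String> store = new ConcurrentHashMap<>();
        store.put("graz", "version 1");
        ChangeBus bus = new ChangeBus();
        Node a = new Node(store);
        Node b = new Node(store);
        bus.register(a);
        bus.register(b);

        System.out.println(b.get("graz"));   // loads and caches the article
        store.put("graz", "version 2");      // the article is changed in the store
        System.out.println(b.get("graz"));   // still the cached, stale copy
        bus.articleChanged("graz");          // the change is reported to all nodes
        System.out.println(b.get("graz"));   // reloaded from the store
    }
}
```
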
Improve Flexibility
- Improve flexibility by increasing configurability
- The level of configurability rises with the power of the configuration language
- The highest level is reached if the configuration itself is some sort of programming language
- Architectural style: interpreter

Project Management
- Topics related to the development

Support Infrastructure
- Version control system
- Bug/issue tracking system
- Continuous integration system
- Documentation system

Rollout
- Development system
  - Virtual machine with all tools installed
- Staging system
  - Replica of the production system
- Production system
  - Only versions that have been tested on the staging system are deployed on the production system
  - Only a few people are allowed to deploy on the production system

Project Management
- Agile project development
  - Short cycles, working software
- Project communication via periodic conference calls
  - Additionally via e-mail and the issue tracker

The End
- Next: Project example EEXCESS