CSE 100: BST AVERAGE CASE AND HUFFMAN CODES


Recap: Average case analysis of a successful find in a BST. For a tree with N nodes, let D(N) be the expected total depth over all BSTs with N nodes.

Recap: Probability of having i nodes in the left subtree. P_N(i): the probability that T_L has i nodes. D(N | i): the expected total depth of BSTs with i nodes in T_L (so T_L has i nodes and T_R has N−i−1 nodes). Then

D(N) = Σ_{i=0}^{N−1} P_N(i) · D(N | i)

What determines the number of nodes in the left subtree (T_L)? The first key inserted: whichever key becomes the root, its rank splits the remaining keys between T_L and T_R.

Towards a recurrence relation for average BST total depth. What is D(N | i) in terms of D(i) and D(N−i−1)? Hint: all nodes in each subtree are 1 deeper in tree T.
A. D(i) + D(N−i−1)
B. D(i) + D(N−i−1) + 1
C. D(i) + D(N−i−1) + N
(The hint gives it away: each of the N−1 subtree nodes is one level deeper, and the root itself contributes depth 1, so the answer is C.)
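To make the recurrence concrete, here is a minimal sketch (my illustration, not from the slides) that computes D(N) numerically, assuming each left-subtree size is equally likely, P_N(i) = 1/N (true for random insertion orders), and using answer C, D(N | i) = D(i) + D(N−i−1) + N:

# Sketch: expected total depth D(N) of a BST built from a random
# insertion order, using D(N) = (1/N) * sum_i [D(i) + D(N-1-i) + N].
def expected_total_depth(n_max):
    D = [0.0] * (n_max + 1)          # D[0] = 0: the empty tree
    for n in range(1, n_max + 1):
        D[n] = sum(D[i] + D[n - 1 - i] + n for i in range(n)) / n
    return D

D = expected_total_depth(1000)
for n in (10, 100, 1000):
    print(n, D[n] / n)   # average depth per node, roughly 1.39*log2(n) - O(1)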

Average total depth of a BST with N nodes:

D(N) = Σ_{i=0}^{N−1} P_N(i) · D(N | i)

True or false: the term in the blue box is equal to the term in the red box (the boxes highlight terms of the expanded sum on the slide). A. True B. False

N·D(N) = (N+1)·D(N−1) + 2N − 1

How does this help us again? A. We can solve it to yield a formula for D(N) that does not involve N! B. We can use it to compute D(N) directly C. I have no idea, I'm totally lost

Through unwinding and some not-so-complicated algebra (which you can find in your reading, a.k.a. Paul's slides) we arrive at a closed-form expression for D(N). No N! to be seen! Yay! And with a little more algebra, we can even show an approximation: D(N) ≈ 1.39 · N · log2 N. Conclusion: The average time to find an element in a BST with no restrictions on shape is Θ(log N).
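For the curious, here is a sketch of one way the unwinding goes (my derivation, not copied from Paul's slides): divide both sides of the recurrence by N(N+1) so it telescopes into harmonic numbers H_N = Σ_{k=1}^{N} 1/k.

\begin{align*}
\frac{D(N)}{N+1} &= \frac{D(N-1)}{N} + \frac{2N-1}{N(N+1)}
                  = \frac{D(N-1)}{N} + \frac{3}{N+1} - \frac{1}{N} \\
\frac{D(N)}{N+1} &= \sum_{k=1}^{N}\left(\frac{3}{k+1} - \frac{1}{k}\right)
                  = 3(H_{N+1} - 1) - H_N \approx 2\ln N \\
D(N) &\approx 2(N+1)\ln N \approx 1.39\, N \log_2 N
\end{align*}

Dividing by N gives an average node depth of about 1.39 log2 N, which is Θ(log N).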

The importance of being balanced. BSTs' average time for find: Θ(log2 N). What does this tell us? On average, things are not so bad, provided assumptions 1 and 2 hold. But the probabilistic assumptions we made often don't hold in practice:
Assumption #1 may not hold: we may search some keys many more times than others.
Assumption #2 may not hold: approximately sorted input is actually quite likely, leading to unbalanced trees with worst-case cost closer to O(N) when N is large.
We would like our search trees to be balanced.
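A quick illustration of assumption #2 failing (my sketch, not from the slides): build a plain BST from sorted keys and from shuffled keys, and compare heights.

# Sketch: unbalanced BST height under sorted vs. shuffled insertion.
import random

def insert(root, key):
    # Iterative insertion into a plain BST; node = [key, left, right].
    if root is None:
        return [key, None, None]
    cur = root
    while True:
        i = 1 if key < cur[0] else 2   # 1 = left child, 2 = right child
        if cur[i] is None:
            cur[i] = [key, None, None]
            return root
        cur = cur[i]

def height(node):
    # Iterative height (sorted input makes the tree too deep for recursion).
    best, stack = 0, [(node, 0)]
    while stack:
        n, d = stack.pop()
        if n is not None:
            best = max(best, d + 1)
            stack.append((n[1], d + 1))
            stack.append((n[2], d + 1))
    return best

N = 1000
sorted_root = None
for k in range(N):
    sorted_root = insert(sorted_root, k)

keys = list(range(N))
random.shuffle(keys)
shuffled_root = None
for k in keys:
    shuffled_root = insert(shuffled_root, k)

print(height(sorted_root))    # 1000: the tree is a path, so find is O(N)
print(height(shuffled_root))  # typically ~20-30: O(log N) behavior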

The importance of being balanced. We would like our search trees to be balanced. There are two kinds of approaches:
Deterministic methods guarantee balance, but operations are somewhat complicated to implement (AVL trees, red-black trees).
Randomized methods (treaps, randomized search trees): from our analysis, deliberate randomness in constructing the tree helps! Operations are simpler to implement; balance is not absolutely guaranteed, but is achieved with high probability.
We will return to this topic later in the course.

Changing gears: the Data Compression Problem. What is the encoding scheme that would result in the shortest binary representation? Why is this important? (Slide diagram: data is fed through an encoding scheme, producing a binary representation such as 101010010101001010101010...)

How do we encode data?
Step 1: Figure out the symbols that constitute the data (the dictionary).
Step 2: Determine the binary code word for each symbol.
Step 3: Replace each symbol by its code word.
For example, if the alphabet were s, p, a, m, we might define an encoding mapping each symbol to a code word (the slide's table lists the symbols s, p, a, m alongside their code words; a concrete fixed-length example follows on the next slide).
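As a minimal sketch of steps 1-3 (my illustration; the code words are hypothetical, borrowed from Code A two slides below):

# Sketch: encode text by replacing each symbol with its code word.
code = {'s': '00', 'p': '01', 'a': '10', 'm': '11'}

def encode(text, code):
    return ''.join(code[ch] for ch in text)

print(encode('spam', code))   # -> '00011011'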

Fixed-length encoding. In a fixed-length encoding, each symbol is represented using a fixed number of bits. For example, if the alphabet were s, p, a, m, we might define the following encoding:

Symbol  Code word
s       00
p       01
a       10
m       11

For a dictionary consisting of M symbols, what is the minimum number of bits needed to encode each symbol (assume fixed-length binary codes)? A. 2^M B. M C. M/2 D. ceil(log2 M) E. None of these
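Why ceil(log2 M) (answer D)? With k bits you can form 2^k distinct code words, so you need the smallest k with 2^k >= M. A quick check, using only the standard library:

import math

for M in (2, 4, 5, 26):
    print(M, math.ceil(math.log2(M)))   # -> 1, 2, 3, 5 bits respectively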

Variable-length codes. Text file: ssssssssssssssssssssspppamppam

Symbol frequencies: s 0.6, p 0.2, a 0.1, m 0.1

Code A: s 00, p 01, a 10, m 11
Code B: s 0, p 1, a 10, m 11

Is Code B better than Code A? A. Yes B. No C. Depends

Variable-length codes (continued). With those frequencies:

Average length (Code A) = 2 bits/symbol
Average length (Code B) = 0.6·1 + 0.2·1 + 0.1·2 + 0.1·2 = 1.2 bits/symbol
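The same computation as a sketch (mirroring the slide's tables):

# Sketch: average code length in bits/symbol, weighted by frequency.
freq   = {'s': 0.6, 'p': 0.2, 'a': 0.1, 'm': 0.1}
code_a = {'s': '00', 'p': '01', 'a': '10', 'm': '11'}
code_b = {'s': '0',  'p': '1',  'a': '10', 'm': '11'}

def avg_length(code, freq):
    return sum(freq[sym] * len(word) for sym, word in code.items())

print(avg_length(code_a, freq))   # 2.0 bits/symbol
print(avg_length(code_b, freq))   # 1.2 bits/symbol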

Decoding variable-length codes. Using the same text file, frequencies, and Codes A and B as on the previous slides: decode the binary pattern 0110 using Code B. A. spa B. sms C. Not enough information to decode
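The answer is C: 0110 parses as 0|1|10 = spa, as 0|11|0 = sms, and even as 0|1|1|0 = spps, because Code B is not prefix-free (p = 1 is a prefix of both a = 10 and m = 11). A brute-force sketch (my illustration) that enumerates every parse:

# Sketch: list all decodings of a bit string under a possibly
# ambiguous code by trying every code word at each position.
code_b = {'s': '0', 'p': '1', 'a': '10', 'm': '11'}

def decodings(bits, code, prefix=''):
    if not bits:
        return [prefix]
    results = []
    for sym, word in code.items():
        if bits.startswith(word):
            results += decodings(bits[len(word):], code, prefix + sym)
    return results

print(decodings('0110', code_b))   # ['spps', 'spa', 'sms'] -- ambiguous!

A prefix-free code, such as the Huffman codes coming up, guarantees that every bit string has at most one parse.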