High Performance Python, 2nd Edition

Book description

Your Python code may run correctly, but you need it to run faster. Updated for Python 3, this expanded edition shows you how to locate performance bottlenecks and significantly speed up your code in high-data-volume programs. By exploring the fundamental theory behind design choices, High Performance Python helps you gain a deeper understanding of Python’s implementation.

How do you take advantage of multicore architectures or clusters? Or build a system that scales up and down without losing reliability? Experienced Python programmers will learn concrete solutions to many issues, along with war stories from companies that use high-performance Python for social media analytics, productionized machine learning, and more.

Table of contents

  1. Foreword
  2. Preface
    1. Who This Book Is For
    2. Who This Book Is Not For
    3. What You’ll Learn
    4. Python 3
    5. Changes from Python 2.7
    6. License
    7. How to Make an Attribution
    8. Errata and Feedback
    9. Conventions Used in This Book
    10. Using Code Examples
    11. O’Reilly Online Learning
    12. How to Contact Us
    13. Acknowledgments
    1. The Fundamental Computer System
      1. Computing Units
      2. Memory Units
      3. Communications Layers
      1. Idealized Computing Versus the Python Virtual Machine
      1. Good Working Practices
      2. Some Thoughts on Good Notebook Practice
      3. Getting the Joy Back into Your Work
      1. Profiling Efficiently
      2. Introducing the Julia Set
      3. Calculating the Full Julia Set
      4. Simple Approaches to Timing—print and a Decorator
      5. Simple Timing Using the Unix time Command
      6. Using the cProfile Module
      7. Visualizing cProfile Output with SnakeViz
      8. Using line_profiler for Line-by-Line Measurements
      9. Using memory_profiler to Diagnose Memory Usage
      10. Introspecting an Existing Process with PySpy
      11. Bytecode: Under the Hood
        1. Using the dis Module to Examine CPython Bytecode
        2. Different Approaches, Different Complexity
        1. No-op @profile Decorator
        1. A More Efficient Search
        2. Lists Versus Tuples
          1. Lists as Dynamic Arrays
          2. Tuples as Static Arrays
          1. How Do Dictionaries and Sets Work?
            1. Inserting and Retrieving
            2. Deletion
            3. Resizing
            4. Hash Functions and Entropy
            1. Iterators for Infinite Series
            2. Lazy Generator Evaluation
            3. Wrap-Up
            1. Introduction to the Problem
            2. Aren’t Python Lists Good Enough?
              1. Problems with Allocating Too Much
              1. Understanding perf
              2. Making Decisions with perf’s Output
              3. Enter numpy
              1. Memory Allocations and In-Place Operations
              2. Selective Optimizations: Finding What Needs to Be Fixed
              1. Pandas’s Internal Model
              2. Applying a Function to Many Rows of Data
              3. Building DataFrames and Series from Partial Results Rather than Concatenating
              4. There’s More Than One (and Possibly a Faster) Way to Do a Job
              5. Advice for Effective Pandas Development
              1. What Sort of Speed Gains Are Possible?
              2. JIT Versus AOT Compilers
              3. Why Does Type Information Help the Code Run Faster?
              4. Using a C Compiler
              5. Reviewing the Julia Set Example
              6. Cython
                1. Compiling a Pure Python Version Using Cython
                1. Cython Annotations to Analyze a Block of Code
                2. Adding Some Type Annotations
                1. Parallelizing the Solution with OpenMP on One Machine
                1. Numba to Compile NumPy for Pandas
                1. Garbage Collection Differences
                2. Running PyPy and Installing Modules
                1. Other Upcoming Projects
                1. Dynamic Graphs: PyTorch
                2. Basic GPU Profiling
                3. Performance Considerations of GPUs
                4. When to Use GPUs
                1. ctypes
                2. cffi
                3. f2py
                4. CPython Module
                1. Introduction to Asynchronous Programming
                2. How Does async/await Work?
                  1. Serial Crawler
                  2. Gevent
                  3. tornado
                  4. aiohttp
                  1. Serial
                  2. Batched Results
                  3. Full Async
                  1. An Overview of the multiprocessing Module
                  2. Estimating Pi Using the Monte Carlo Method
                  3. Estimating Pi Using Processes and Threads
                    1. Using Python Objects
                    2. Replacing multiprocessing with Joblib
                    3. Random Numbers in Parallel Systems
                    4. Using numpy
                    1. Queues of Work
                    1. Serial Solution
                    2. Naive Pool Solution
                    3. A Less Naive Pool Solution
                    4. Using Manager.Value as a Flag
                    5. Using Redis as a Flag
                    6. Using RawValue as a Flag
                    7. Using mmap as a Flag
                    8. Using mmap as a Flag Redux
                    1. File Locking
                    2. Locking a Value
                    1. Benefits of Clustering
                    2. Drawbacks of Clustering
                      1. $462 Million Wall Street Loss Through Poor Cluster Upgrade Strategy
                      2. Skype’s 24-Hour Global Outage
                      1. Using IPython Parallel to Support Research
                      2. Parallel Pandas with Dask
                      1. Queues
                      2. Pub/sub
                      3. Distributed Prime Calculation
                      1. Docker’s Performance
                      2. Advantages of Docker
                      1. Objects for Primitives Are Expensive
                        1. The array Module Stores Many Primitive Objects Cheaply
                        2. Using Less RAM in NumPy with NumExpr
                        1. Trying These Approaches on 11 Million Tokens
                        1. Comparing DictVectorizer and FeatureHasher on a Real Problem
                        1. Very Approximate Counting with a 1-Byte Morris Counter
                        2. K-Minimum Values
                        3. Bloom Filters
                        4. LogLog Counter
                        5. Real-World Example
                        1. Streamlining Feature Engineering Pipelines with Feature-engine
                          1. Feature Engineering for Machine Learning
                          2. The Hard Task of Deploying Feature Engineering Pipelines
                          3. Leveraging the Power of Open Source Python Libraries
                          4. Feature-engine Smooths Building and Deployment of Feature Engineering Pipelines
                          5. Helping with the Adoption of a New Open Source Package
                          6. Developing, Maintaining, and Encouraging Contribution to Open Source Libraries
                          1. How Long Will It Take?
                          2. Discovery and Planning
                          3. Managing Expectations and Delivery
                          1. A Simple Example
                          2. Best Practices and Recommendations
                          3. Getting Help
                          1. Python at Adaptive Lab
                          2. SoMA’s Design
                          3. Our Development Methodology
                          4. Maintaining SoMA
                          5. Advice for Fellow Engineers
                          1. The Sweet Spot
                          2. Lessons in Optimizing
                          3. Conclusion
                          1. Cluster Design
                          2. Code Evolution in a Fast-Moving Start-Up
                          3. Building the Recommendation Engine
                          4. Reporting and Monitoring
                          5. Some Advice
                          1. Python’s Role at Smesh
                          2. The Platform
                          3. High Performance Real-Time String Matching
                          4. Reporting, Monitoring, Debugging, and Deployment
                          1. Prerequisites
                          2. The Database
                          3. The Web Application
                          4. OCR and Translation
                          5. Task Distribution and Workers
                          6. Conclusion
                          1. Python’s Role at Lanyrd
                          2. Making the Task Queue Performant
                          3. Reporting, Monitoring, Debugging, and Deployment
                          4. Advice to a Fellow Developer

Product information

  • Title: High Performance Python, 2nd Edition
  • Author(s): Micha Gorelick, Ian Ozsvald
  • Release date: April 2020
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781492055020
