High Performance Python, 2nd Edition

Book description

Your Python code may run correctly, but you need it to run faster. Updated for Python 3, this expanded edition shows you how to locate performance bottlenecks and significantly speed up your code in high-data-volume programs. By exploring the fundamental theory behind design choices, High Performance Python helps you gain a deeper understanding of Python’s implementation.

How do you take advantage of multicore architectures or clusters? Or build a system that scales up and down without losing reliability? Experienced Python programmers will learn concrete solutions to many issues, along with war stories from companies that use high-performance Python for social media analytics, productionized machine learning, and more.

Get a better grasp of NumPy, Cython, and profilers
Learn how Python abstracts the underlying computer architecture
Use profiling to find bottlenecks in CPU time and memory usage
Write efficient programs by choosing appropriate data structures
Speed up matrix and vector computations
Use tools to compile Python down to machine code
Manage multiple I/O and computational operations concurrently
Convert multiprocessing code to run on local or remote clusters
Deploy code faster using tools like Docker

Table of contents

Foreword
Preface
1. Who This Book Is For
2. Who This Book Is Not For
3. What You’ll Learn
4. Python 3
5. Changes from Python 2.7
6. License
7. How to Make an Attribution
8. Errata and Feedback
9. Conventions Used in This Book
10. Using Code Examples
11. O’Reilly Online Learning
12. How to Contact Us
13. Acknowledgments
1. The Fundamental Computer System
  1. Computing Units
  2. Memory Units
  3. Communications Layers
  1. Idealized Computing Versus the Python Virtual Machine
  1. Good Working Practices
  2. Some Thoughts on Good Notebook Practice
  3. Getting the Joy Back into Your Work
  1. Profiling Efficiently
  2. Introducing the Julia Set
  3. Calculating the Full Julia Set
  4. Simple Approaches to Timing—print and a Decorator
  5. Simple Timing Using the Unix time Command
  6. Using the cProfile Module
  7. Visualizing cProfile Output with SnakeViz
  8. Using line_profiler for Line-by-Line Measurements
  9. Using memory_profiler to Diagnose Memory Usage
  10. Introspecting an Existing Process with PySpy
  11. Bytecode: Under the Hood
    1. Using the dis Module to Examine CPython Bytecode
    2. Different Approaches, Different Complexity
    1. No-op @profile Decorator
    1. A More Efficient Search
    2. Lists Versus Tuples
      1. Lists as Dynamic Arrays
      2. Tuples as Static Arrays
      1. How Do Dictionaries and Sets Work?
        Inserting and Retrieving
        Deletion
        Resizing
        Hash Functions and Entropy
        Iterators for Infinite Series
        Lazy Generator Evaluation
        Wrap-Up
        Introduction to the Problem
        Aren’t Python Lists Good Enough?
        Problems with Allocating Too Much
        Understanding perf
        Making Decisions with perf’s Output
        Enter numpy
        Memory Allocations and In-Place Operations
        Selective Optimizations: Finding What Needs to Be Fixed
        Pandas’s Internal Model
        Applying a Function to Many Rows of Data
        Building DataFrames and Series from Partial Results Rather than Concatenating
        There’s More Than One (and Possibly a Faster) Way to Do a Job
        Advice for Effective Pandas Development
        What Sort of Speed Gains Are Possible?
        JIT Versus AOT Compilers
        Why Does Type Information Help the Code Run Faster?
        Using a C Compiler
        Reviewing the Julia Set Example
        Cython
        Compiling a Pure Python Version Using Cython
        Cython Annotations to Analyze a Block of Code
        Adding Some Type Annotations
        Parallelizing the Solution with OpenMP on One Machine
        Numba to Compile NumPy for Pandas
        Garbage Collection Differences
        Running PyPy and Installing Modules
        Other Upcoming Projects
        Dynamic Graphs: PyTorch
        Basic GPU Profiling
        Performance Considerations of GPUs
        When to Use GPUs
        ctypes
        cffi
        f2py
        CPython Module
        Introduction to Asynchronous Programming
        How Does async/await Work?
        Serial Crawler
        Gevent
        tornado
        aiohttp
        Serial
        Batched Results
        Full Async
        An Overview of the multiprocessing Module
        Estimating Pi Using the Monte Carlo Method
        Estimating Pi Using Processes and Threads
        Using Python Objects
        Replacing multiprocessing with Joblib
        Random Numbers in Parallel Systems
        Using numpy
        Queues of Work
        Serial Solution
        Naive Pool Solution
        A Less Naive Pool Solution
        Using Manager.Value as a Flag
        Using Redis as a Flag
        Using RawValue as a Flag
        Using mmap as a Flag
        Using mmap as a Flag Redux
        File Locking
        Locking a Value
        Benefits of Clustering
        Drawbacks of Clustering
        $462 Million Wall Street Loss Through Poor Cluster Upgrade Strategy
        Skype’s 24-Hour Global Outage
        Using IPython Parallel to Support Research
        Parallel Pandas with Dask
        Queues
        Pub/sub
        Distributed Prime Calculation
        Docker’s Performance
        Advantages of Docker
        Objects for Primitives Are Expensive
        The array Module Stores Many Primitive Objects Cheaply
        Using Less RAM in NumPy with NumExpr
        Trying These Approaches on 11 Million Tokens
        Comparing DictVectorizer and FeatureHasher on a Real Problem
        Very Approximate Counting with a 1-Byte Morris Counter
        K-Minimum Values
        Bloom Filters
        LogLog Counter
        Real-World Example
        Streamlining Feature Engineering Pipelines with Feature-engine
        Feature Engineering for Machine Learning
        The Hard Task of Deploying Feature Engineering Pipelines
        Leveraging the Power of Open Source Python Libraries
        Feature-engine Smooths Building and Deployment of Feature Engineering Pipelines
        Helping with the Adoption of a New Open Source Package
        Developing, Maintaining, and Encouraging Contribution to Open Source Libraries
        How Long Will It Take?
        Discovery and Planning
        Managing Expectations and Delivery
        A Simple Example
        Best Practices and Recommendations
        Getting Help
        Python at Adaptive Lab
        SoMA’s Design
        Our Development Methodology
        Maintaining SoMA
        Advice for Fellow Engineers
        The Sweet Spot
        Lessons in Optimizing
        Conclusion
        Cluster Design
        Code Evolution in a Fast-Moving Start-Up
        Building the Recommendation Engine
        Reporting and Monitoring
        Some Advice
        Python’s Role at Smesh
        The Platform
        High Performance Real-Time String Matching
        Reporting, Monitoring, Debugging, and Deployment
        Prerequisites
        The Database
        The Web Application
        OCR and Translation
        Task Distribution and Workers
        Conclusion
        Python’s Role at Lanyrd
        Making the Task Queue Performant
        Reporting, Monitoring, Debugging, and Deployment
        Advice to a Fellow Developer
        
Product information

Title: High Performance Python, 2nd Edition
Author(s): Micha Gorelick, Ian Ozsvald
Release date: April 2020
Publisher(s): O'Reilly Media, Inc.
ISBN: 9781492055020
        