This repository provides example projects demonstrating the usage of the termux-web-scraper framework, which enables running Selenium-based web scraping tasks on Android devices using Termux.
Before you can use the Termux Web Scraper, you need to have the following installed on your Android device:
- Termux: You can download Termux from the F-Droid or Google Play store.
- Git: You can install Git in Termux by running:
  pkg install git
You also need to:
- Disable Battery Optimization: Disable battery optimization for Termux to prevent it from being killed by the Android system.
- Acquire a Wakelock: Acquire a wakelock in Termux to prevent the device from sleeping while your scraper is running.
- Address Phantom Process Killing (Android 12+): On Android 12 and newer, you may need to disable phantom process killing to prevent Termux from being killed. You can do this by running the following command in an ADB shell:
./adb shell "settings put global settings_enable_monitor_phantom_procs false"
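The wakelock step above can also be tied to the lifetime of a scraper run. The following is a minimal sketch, not part of the framework: it assumes the standard Termux commands termux-wake-lock and termux-wake-unlock are on the PATH, and the wakelock helper and its run and which parameters are hypothetical names introduced here for illustration.

```python
import shutil
import subprocess
from contextlib import contextmanager


@contextmanager
def wakelock(run=subprocess.run, which=shutil.which):
    """Hold a Termux wakelock for the duration of the with-block.

    Hypothetical helper: if termux-wake-lock is not found (for example
    when running outside Termux), the block runs without a wakelock.
    """
    have_tools = which("termux-wake-lock") is not None
    if have_tools:
        # Ask Termux to keep the device awake.
        run(["termux-wake-lock"], check=False)
    try:
        # Tell the caller whether a wakelock was actually requested.
        yield have_tools
    finally:
        if have_tools:
            # Always release the wakelock, even if the block raised.
            run(["termux-wake-unlock"], check=False)
```

Wrapping the scraping code in a with wakelock(): block releases the wakelock even when the scraper exits with an error. Running termux-wake-lock by hand before ./run.sh achieves the same effect without any Python.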
.
├── simple
│ ├── run.sh
│ └── ...
├── loop_error_ignore
│ ├── run.sh
│ └── ...
├── ...
This repository contains the following example projects:
- simple: A basic example that demonstrates how to perform a search on DuckDuckGo.
- loop_error_ignore: An example that shows how to run the scraper in a loop and gracefully handle errors.
- Get started by launching Termux on your Android device and cloning the repository:
  git clone https://github.com/kpliuta/termux-web-scraper-example.git
- Then, navigate to a project like simple and run the script:
  cd termux-web-scraper-example/simple
  ./run.sh
The simple example demonstrates a basic web scraping scenario. It navigates to DuckDuckGo, searches for "Python Selenium Example", and saves a screenshot of the results.
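To give a rough idea of what such a scenario looks like in code, here is a minimal sketch. It is not the code shipped in the simple project: the function names build_search_url and search_and_capture are hypothetical, the driver is assumed to be an already configured Selenium WebDriver, and the sketch takes the shortcut of opening the results URL directly instead of typing into the search box.

```python
from urllib.parse import urlencode


def build_search_url(query):
    """Return a DuckDuckGo results URL for the given query."""
    # urlencode escapes the query; spaces become "+".
    return "https://duckduckgo.com/?" + urlencode({"q": query})


def search_and_capture(driver, query, screenshot_path):
    """Open the results page for `query` and save a screenshot of it.

    `driver` is assumed to be a Selenium WebDriver; only its get() and
    save_screenshot() methods are used here.
    """
    driver.get(build_search_url(query))
    # save_screenshot() returns True if the file was written.
    return driver.save_screenshot(screenshot_path)
```

Given a driver, calling search_and_capture(driver, "Python Selenium Example", "results.png") would load the results page and write the screenshot to results.png.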
The loop_error_ignore example showcases how to run the scraper in a continuous loop while ignoring any errors that may occur during the process. This is useful for long-running scraping tasks that need to be resilient to network issues or unexpected page changes. The script attempts to navigate to a non-existent URL, and the --loop-error-ignore flag ensures that the loop continues to run even when the navigation fails.
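The idea behind looping while ignoring errors can be sketched in a few lines of Python. This is an illustration only and says nothing about how the framework implements --loop-error-ignore: the run_in_loop function and its parameters are hypothetical, and the loop is bounded by an iteration count so that the sketch terminates.

```python
import time


def run_in_loop(task, iterations, ignore_errors=True, delay=0.0, sleep=time.sleep):
    """Call task() `iterations` times and return the exceptions that were caught.

    With ignore_errors=True a failing iteration is recorded and the loop
    moves on; with ignore_errors=False the first failure is re-raised.
    """
    errors = []
    for _ in range(iterations):
        try:
            task()
        except Exception as exc:
            if not ignore_errors:
                raise
            # Remember the failure, then continue with the next iteration.
            errors.append(exc)
        if delay:
            # Optional pause between iterations.
            sleep(delay)
    return errors
```

A task that fails on navigation, as in the example project, would add one entry to the returned list per failed iteration while the remaining iterations still run.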
This project is licensed under the MIT License. See the LICENSE file for details.