
Merge branch 'master' of ssh://github.com/richardg867/WaybackProxy

RichardG867, 2 years ago
Commit 0769bec43b
6 changed files with 121 additions and 60 deletions
  1. Dockerfile: +15 -17
  2. README.md: +34 -18
  3. config.json: +11 -0
  4. config_handler.py: +25 -9
  5. startup.sh: +34 -15
  6. waybackproxy.py: +2 -1

+ 15 - 17
Dockerfile

@@ -8,23 +8,21 @@ FROM python:3
 
 MAINTAINER richardg867
 LABEL description = "HTTP Proxy for tunneling requests through the Internet Archive Wayback Machine"
+WORKDIR /app
+COPY . /app
 
-# Setup config.py
-ENV LISTEN_PORT=8888
-ENV DATE='20011025'
-ENV DATE_TOLERANCE=365
-ENV GEOCITIES_FIX=True
-ENV QUICK_IMAGES=True
-ENV WAYBACK_API=True
-ENV CONTENT_TYPE_ENCODING=True
-ENV SILENT=False
-ENV SETTINGS_PAGE=True
+# Setup config.json
+ARG LISTEN_PORT=8888
+ARG DATE=20011025
+ARG DATE_TOLERANCE=365
+ARG GEOCITIES_FIX=true
+ARG QUICK_IMAGES=true
+ARG WAYBACK_API=true
+ARG CONTENT_TYPE_ENCODING=true
+ARG SILENT=false
+ARG SETTINGS_PAGE=true
 
-ADD startup.sh /
-ADD error.html /
-ADD lrudict.py /
-ADD waybackproxy.py /
+EXPOSE ${LISTEN_PORT}
 
-EXPOSE 8080
-
-CMD [ "sh" , "/startup.sh" ]
+CMD [ "sh" , "/app/startup.sh" ]
+#CMD [ "python" , "/app/waybackproxy.py" ]

+ 34 - 18
README.md

@@ -6,35 +6,23 @@ WaybackProxy is a retro-friendly HTTP proxy which retrieves pages from the [Inte
 
 ## Setup
 
-1. Edit `config.py` to your liking
+1. Edit `config.json` to your liking
 2. Start `waybackproxy.py` (Python 3 is required)
 3. Set up your retro browser:
 	* If your browser supports proxy auto-configuration, set the auto-configuration URL to `http://ip:port/proxy.pac` where `ip` is the IP of the system running WaybackProxy and `port` is the proxy's port (8888 by default).
 	* If proxy auto-configuration is not supported or fails to work, set the browser to use an HTTP proxy at that IP and port instead.
 	* Transparent proxying is also supported for advanced users, with no configuration to WaybackProxy itself required.
 		* The easiest way to set up a transparent WaybackProxy is to run it on port 80 ([this cannot be done on Linux without security implications](https://unix.stackexchange.com/questions/87348/capabilities-for-a-script-on-linux)\), set up a fake DNS server - such as `dnsmasq -A "/#/ip"` where `ip` is the IP of the system running WaybackProxy - to redirect all requests to the proxy, and point client machines at that DNS server.
-4. Try it out! You can edit most settings that are in `config.py` by browsing to http://web.archive.org while on the proxy, although you must edit `config.py` to make them permanent.
+4. Try it out! You can edit most settings that are in `config.json` by browsing to http://web.archive.org while on the proxy, although you must edit `config.json` to make them permanent.
 5. Press Ctrl+C to stop the proxy
 
-## Known issues and limitations
-
-* The Wayback Machine itself is not 100% reliable. Known issues include:
-  * Pages newer than the specified date (setting a specific YYYYMMDD date instead of a wider YYYYMM or YYYY helps with that);
-  * Random broken images;
-  * Strange 404 errors caused by bad server responses or incorrect URL capitalization at archival time;
-  * Infinite redirect loops;
-  * Server errors when it's having a bad day.
-* WaybackProxy will work around some redirection scripts (example: `http://example.com/redirect?to=http://...`) which are not archived by the Wayback Machine, but the destination URLs are sometimes not archived either.
-* WaybackProxy is not a generic proxy. The POST and CONNECT methods are not implemented.
-* Transparent proxying mode requires HTTP/1.1 and therefore cannot be used with some really old (pre-1996) browsers. Use standard mode with such browsers.
-
 ## Docker Container
 
 A Dockerfile is included that allows you to run WaybackProxy from a docker container. 
 
 ### Environment Variables
 
-When deploying via Docker, the config.py script can be customized by specifying environment variables when creating the docker container. The environment variables match the example config.py script in this repository. Below is a complete list:
+When deploying via Docker, the config.json can be customized by specifying environment variables when creating the docker container. The environment variables match the example config.json in this repository. Below is a complete list:
 
 | Parameter        | Default | Description                            |
 |------------------|---------|----------------------------------------|
@@ -48,20 +36,48 @@ When deploying via Docker, the config.py script can be customized by specifying
 | `SILENT` | True | Disables logging to STDOUT if set to True |
 | `SETTINGS_PAGE` | True | Enables the settings page on http://web.archive.org if set to True |
 
-### Example docker commands
+### How to run in Docker
+
+#### Using Docker Registry
+
+To pull:
+
+```bash
+docker pull cttynul/waybackproxy:latest
+```
+To run:
+
+```bash
+docker run -d -e DATE=20011025 -p 8888:8888 cttynul/waybackproxy
+```
+
+#### Build locally
 
 To build:
 
 ```bash
-docker build --no-cache -t waybackproxy .
+docker build --no-cache -f Dockerfile -t waybackproxy .
 ```
 To run:
 
 ```bash
-docker run --rm -it -e DATE=20011225 -p 8888:8888 waybackproxy
+docker run -d -e DATE=20011025 -p 8888:8888 waybackproxy
 ```
 
+## Known issues and limitations
+
+* The Wayback Machine itself is not 100% reliable. Known issues include:
+  * Pages newer than the specified date (setting a specific YYYYMMDD date instead of a wider YYYYMM or YYYY helps with that);
+  * Random broken images;
+  * Strange 404 errors caused by bad server responses or incorrect URL capitalization at archival time;
+  * Infinite redirect loops;
+  * Server errors when it's having a bad day.
+* WaybackProxy will work around some redirection scripts (example: `http://example.com/redirect?to=http://...`) which are not archived by the Wayback Machine, but the destination URLs are sometimes not archived either.
+* WaybackProxy is not a generic proxy. The POST and CONNECT methods are not implemented.
+* Transparent proxying mode requires HTTP/1.1 and therefore cannot be used with some really old (pre-1996) browsers. Use standard mode with such browsers.
+
 ## Other links
 
 * [Donate to the Internet Archive](https://archive.org/donate/), they need your help to keep the Wayback Machine and its petabytes upon petabytes of data available to everyone for free with no ads.
 * [Check out 86Box](https://86box.github.io/), the emulator I use for testing WaybackProxy on older browsers.
+* [WaybackProxy](https://hub.docker.com/r/cttynul/waybackproxy) on Docker Hub
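The Setup steps above point a retro browser at the proxy. To check from a modern machine that a proxy started with one of the `docker run` commands is answering, a minimal Python sketch can route plain-HTTP requests through it with the standard library's `urllib.request.ProxyHandler`. It assumes the proxy listens on `localhost:8888`, and the helper name `make_proxy_opener` is hypothetical:

```python
import urllib.request

def make_proxy_opener(host="localhost", port=8888):
    """Return an opener that sends plain-HTTP requests through WaybackProxy.

    host and port are assumptions: point them at wherever the proxy (or the
    container published with -p 8888:8888) is listening.
    """
    proxy_url = f"http://{host}:{port}"
    # Only the http scheme is routed: WaybackProxy does not implement CONNECT,
    # so https:// URLs cannot be tunnelled through it.
    handler = urllib.request.ProxyHandler({"http": proxy_url})
    # Passing our own ProxyHandler replaces the default one build_opener would add.
    return urllib.request.build_opener(handler)
```

With a running proxy, `make_proxy_opener().open("http://example.com/", timeout=60).read()` should return the Wayback Machine's copy of that page for the configured `DATE`. Archive lookups can be slow, hence the generous timeout.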

+ 11 - 0
config.json

@@ -0,0 +1,11 @@
+{
+    "LISTEN_PORT": 8888,
+    "DATE": "20011025",
+    "DATE_TOLERANCE": 365,
+    "GEOCITIES_FIX": true,
+    "QUICK_IMAGES": true,
+    "WAYBACK_API": true,
+    "CONTENT_TYPE_ENCODING": true,
+    "SILENT": false,
+    "SETTINGS_PAGE": true
+}
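Because the new config.json is plain JSON, booleans must be lowercase `true`/`false` and `DATE` stays a quoted string. Below is a minimal validation sketch. The expected types are inferred from this file and from the comments in config_handler.py and are assumptions: `2` is accepted for `QUICK_IMAGES` because a comment describes that experimental mode, and JSON `null` is assumed to stand in for Python's `None` in `DATE_TOLERANCE`. The names `validate_config` and `EXPECTED_TYPES` are hypothetical:

```python
import json

# Expected JSON types per key. Inferred from config.json and the comments in
# config_handler.py; an assumption, not a schema shipped with the project.
EXPECTED_TYPES = {
    "LISTEN_PORT": int,
    "DATE": str,                          # YYYY, YYYYMM or YYYYMMDD
    "DATE_TOLERANCE": (int, type(None)),  # null assumed to mean "no restriction"
    "GEOCITIES_FIX": bool,
    "QUICK_IMAGES": (bool, int),          # true/false, or 2 for the experimental mode
    "WAYBACK_API": bool,
    "CONTENT_TYPE_ENCODING": bool,
    "SILENT": bool,
    "SETTINGS_PAGE": bool,
}

def validate_config(text):
    """Parse config.json text and return a list of problems (empty if valid)."""
    data = json.loads(text)  # raises json.JSONDecodeError on e.g. a capitalised True
    problems = []
    for key, types in EXPECTED_TYPES.items():
        if key not in data:
            problems.append(f"missing key: {key}")
            continue
        value = data[key]
        allowed = types if isinstance(types, tuple) else (types,)
        # bool is a subclass of int in Python, so reject true/false where a number is expected.
        if isinstance(value, bool) and bool not in allowed:
            problems.append(f"{key}: expected a number, got a boolean")
        elif not isinstance(value, allowed):
            problems.append(f"{key}: unexpected type {type(value).__name__}")
        elif key == "DATE" and not (value.isdigit() and len(value) in (4, 6, 8)):
            problems.append(f"DATE: {value!r} is not YYYY, YYYYMM or YYYYMMDD")
    return problems
```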

+ 25 - 9
config.py → config_handler.py

@@ -1,23 +1,24 @@
+import json
 # Listen port for the HTTP proxy.
-LISTEN_PORT = 8888
+global LISTEN_PORT
 
 # Date to get pages from Wayback. YYYYMMDD, YYYYMM and YYYY formats are
 # accepted, the more specific the better.
-DATE = '20011025' # <- Windows XP release date in case you're wondering
+global DATE 
 
 # Allow the client to load pages and assets up to X days after DATE.
 # Set to None to disable this restriction.
-DATE_TOLERANCE = 365
+global DATE_TOLERANCE
 
 # Send Geocities requests to oocities.org if set to True.
-GEOCITIES_FIX = True
+global GEOCITIES_FIX
 
 # Use the original Wayback Machine URL as a shortcut when loading images.
 # May result in faster page loads, but all images will point to
 # http://web.archive.org/... as a side effect. Set this value to 2 to enable an
 # experimental mode using authentication on top of the original URLs instead
 # (which is not supported by Internet Explorer and some other browsers).
-QUICK_IMAGES = True
+global QUICK_IMAGES
 
 # Use the Wayback Machine Availability API to find the closest available
 # snapshot to the desired date, instead of directly requesting that date. Helps
@@ -25,15 +26,30 @@ QUICK_IMAGES = True
 # is available at an earlier date. As a side effect, pages will take longer to
 # load due to the added API call. If enabled, this option will disable the
 # QUICK_IMAGES bypass mechanism built into the PAC file.
-WAYBACK_API = True
+global WAYBACK_API
 
 # Allow the Content-Type header to contain an encoding. Some old browsers
 # (Mosaic?) don't understand that and fail to load anything - set this to
 # False if you're using one of them.
-CONTENT_TYPE_ENCODING = True
+global CONTENT_TYPE_ENCODING
 
 # Disables logging if set to True.
-SILENT = False
+global SILENT
 
 # Enables the settings page on http://web.archive.org if set to True.
-SETTINGS_PAGE = True
+global SETTINGS_PAGE
+
+try:
+	with open("config.json") as f:
+		data = json.loads(f.read())
+		LISTEN_PORT = data["LISTEN_PORT"]
+		DATE = data["DATE"]
+		DATE_TOLERANCE = data["DATE_TOLERANCE"]
+		GEOCITIES_FIX = data["GEOCITIES_FIX"]
+		QUICK_IMAGES = data["QUICK_IMAGES"]
+		WAYBACK_API = data["WAYBACK_API"]
+		CONTENT_TYPE_ENCODING = data["CONTENT_TYPE_ENCODING"]
+		SILENT = data["SILENT"]
+		SETTINGS_PAGE = data["SETTINGS_PAGE"]
+except EnvironmentError as e:
+	print("Wops! Error opening config.json")
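A few points about config_handler.py as committed. Its module-level `global` statements have no effect in Python. If config.json cannot be opened, the error is printed but none of the settings get defined, so `main()` in waybackproxy.py later fails with a `NameError` on `LISTEN_PORT`. In addition, `open("config.json")` is resolved against the current working directory, which works in the container only because `WORKDIR /app` is set. A minimal alternative sketch follows. It assumes the built-in defaults should match the config.json added in this commit and that the file should be looked up next to the module. The names `load_config` and `DEFAULTS` are hypothetical:

```python
import json
import os

# Built-in defaults, copied from the config.json added in this commit (assumed).
DEFAULTS = {
    "LISTEN_PORT": 8888,
    "DATE": "20011025",
    "DATE_TOLERANCE": 365,
    "GEOCITIES_FIX": True,
    "QUICK_IMAGES": True,
    "WAYBACK_API": True,
    "CONTENT_TYPE_ENCODING": True,
    "SILENT": False,
    "SETTINGS_PAGE": True,
}

def load_config(path=None):
    """Return a settings dict: DEFAULTS overlaid with whatever config.json provides.

    When path is None, config.json is looked up next to this module rather than
    in the current working directory (an assumption, not the committed behaviour).
    """
    if path is None:
        path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "config.json")
    config = dict(DEFAULTS)
    try:
        with open(path, encoding="utf-8") as f:
            # Malformed JSON deliberately raises json.JSONDecodeError here:
            # silently ignoring a broken file would hide configuration mistakes.
            config.update(json.load(f))
    except OSError as e:  # EnvironmentError is an alias of OSError in Python 3
        print(f"Could not open {path} ({e}); using built-in defaults")
    return config
```

config_handler.py could then keep exposing the same names as before, e.g. `_config = load_config()` followed by `LISTEN_PORT = _config["LISTEN_PORT"]`, so that `from config_handler import *` in waybackproxy.py keeps working unchanged.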

+ 34 - 15
startup.sh

@@ -1,16 +1,35 @@
 #!/bin/sh
-
-echo LISTEN_PORT=$LISTEN_PORT > /config.py
-echo DATE=$DATE >> /config.py
-echo DATE_TOLERANCE=$DATE_TOLERANCE >> /config.py
-echo GEOCITIES_FIX=$GEOCITIES_FIX  >> /config.py
-echo QUICK_IMAGES=$QUICK_IMAGES  >> /config.py
-echo WAYBACK_API=$WAYBACK_API  >> /config.py
-echo CONTENT_TYPE_ENCODING=$CONTENT_TYPE_ENCODING  >> /config.py
-echo SILENT=$SILENT  >> /config.py
-echo SETTINGS_PAGE=$SETTINGS_PAGE  >> /config.py
-
-echo config.py:
-cat /config.py
-
-python /waybackproxy.py
+if [ "${LISTEN_PORT}" ]; then
+    sed -i -e "s/\"LISTEN_PORT\":[^,]*/\"LISTEN_PORT\": ${LISTEN_PORT}/g" /app/config.json
+fi
+if [ "${DATE}" ]; then
+    sed -i -e "s/\"DATE\":[^,]*/\"DATE\": \"${DATE}\"/g" /app/config.json
+fi
+if [ "${DATE_TOLERANCE}" ]; then
+    sed -i -e "s/\"DATE_TOLERANCE\":[^,]*/\"DATE_TOLERANCE\": ${DATE_TOLERANCE}/g" /app/config.json
+fi
+if [ "${GEOCITIES_FIX}" ]; then
+    sed -i -e "s/\"GEOCITIES_FIX\":[^,]*/\"GEOCITIES_FIX\": $GEOCITIES_FIX/g" /app/config.json
+fi
+if [ "${QUICK_IMAGES}" ]; then
+    sed -i -e "s/\"QUICK_IMAGES\":[^,]*/\"QUICK_IMAGES\": $QUICK_IMAGES/g" /app/config.json
+fi
+if [ "${WAYBACK_API}" ]; then
+    sed -i -e "s/\"WAYBACK_API\":[^,]*/\"WAYBACK_API\": $WAYBACK_API/g" /app/config.json
+fi
+if [ "${QUICK_IMAGES}" ]; then
+    sed -i -e "s/\"QUICK_IMAGES\":[^,]*/\"QUICK_IMAGES\": $QUICK_IMAGES/g" /app/config.json
+fi
+if [ "${CONTENT_TYPE_ENCODING}" ]; then
+    sed -i -e "s/\"CONTENT_TYPE_ENCODING\":[^,]*/\"CONTENT_TYPE_ENCODING\": $CONTENT_TYPE_ENCODING/g" /app/config.json
+fi
+if [ "${SILENT}" ]; then
+    sed -i -e "s/\"SILENT\":[^,]*/\"SILENT\": $SILENT/g" /app/config.json
+fi
+if [ "${SETTINGS_PAGE}" ]; then
+    sed -i -e "s/\"SETTINGS_PAGE\":[^,]*/\"SETTINGS_PAGE\": $SETTINGS_PAGE/g" /app/config.json
+fi
+echo "[-] Using this config.json file:"
+cat /app/config.json
+echo "\n[-] Starting proxy server"
+python /app/waybackproxy.py
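startup.sh overrides config.json values with `sed` and writes each environment variable into the file verbatim. Only `DATE` is wrapped in quotes, and the `QUICK_IMAGES` block appears twice, which is harmless. Because values are copied as-is, booleans must be passed in lowercase. Passing `-e SILENT=True`, as the capitalised values in the README's Default column might suggest, would write `True` into config.json, and Python's `json` module rejects that. Below is a minimal sketch of the same override step done with a JSON parser instead of `sed`. It assumes the same rule as the script, where an unset or empty variable keeps the file's value. The name `apply_env_overrides` is hypothetical:

```python
import json

# Keys whose values startup.sh writes as JSON strings; every other value is
# inserted verbatim, so it must already be a valid JSON literal (8888, true, ...).
STRING_KEYS = {"DATE"}

def apply_env_overrides(config, env):
    """Return a copy of config with values overridden from env (e.g. os.environ)."""
    updated = dict(config)
    for key in config:
        raw = env.get(key)
        if not raw:
            # Mirrors `if [ "${VAR}" ]`: unset or empty variables change nothing.
            continue
        # json.loads raises json.JSONDecodeError for values such as "True",
        # instead of producing an unparseable config.json as the sed version would.
        updated[key] = raw if key in STRING_KEYS else json.loads(raw)
    return updated
```

In the container, this would be called with `dict(os.environ)`, and the result would be written back with `json.dump(updated, f, indent=4)` before waybackproxy.py starts.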

+ 2 - 1
waybackproxy.py

@@ -1,6 +1,6 @@
 #!/usr/bin/env python3
 import base64, datetime, json, lrudict, re, socket, socketserver, string, sys, threading, traceback, urllib.request, urllib.error, urllib.parse
-from config import *
+from config_handler import *
 
 class ThreadingTCPServer(socketserver.ThreadingMixIn, socketserver.TCPServer):
 	"""TCPServer with ThreadingMixIn added."""
@@ -583,6 +583,7 @@ def main():
 	"""Starts the server."""
 	server = ThreadingTCPServer(('', LISTEN_PORT), Handler)
 	_print('[-] Now listening on port', LISTEN_PORT)
+	_print('[-] Date set to', DATE)
 	try:
 		server.serve_forever()
 	except KeyboardInterrupt: # Ctrl+C to stop