#archiveteam-bs 2017-04-12,Wed

Time Nickname Message
00:01 🔗 BlueMaxim has joined #archiveteam-bs
00:23 🔗 pizzaiolo has joined #archiveteam-bs
00:25 🔗 pizzaiolo has quit IRC (Client Quit)
00:42 🔗 BlueMaxim has quit IRC (Read error: Operation timed out)
00:43 🔗 BlueMaxim has joined #archiveteam-bs
02:31 🔗 pnJay has quit IRC (Leaving)
02:32 🔗 sep332_ has quit IRC (Read error: Operation timed out)
02:51 🔗 pizzaiol1 has left
02:59 🔗 dashcloud has quit IRC (Quit: No Ping reply in 180 seconds.)
03:01 🔗 dashcloud has joined #archiveteam-bs
04:01 🔗 BlueMaxim has quit IRC (Read error: Operation timed out)
04:02 🔗 BlueMaxim has joined #archiveteam-bs
04:27 🔗 Sk1d has quit IRC (Ping timeout: 250 seconds)
04:34 🔗 Sk1d has joined #archiveteam-bs
06:27 🔗 wickedpla is now known as wp494
06:33 🔗 DudesonMc has joined #archiveteam-bs
06:53 🔗 Stiletto has quit IRC (Read error: Connection reset by peer)
06:53 🔗 kniffy has quit IRC (Read error: Operation timed out)
06:55 🔗 Stilett0 has joined #archiveteam-bs
07:07 🔗 kniffy has joined #archiveteam-bs
07:07 🔗 Jonison has joined #archiveteam-bs
07:12 🔗 GE has joined #archiveteam-bs
07:23 🔗 CHRONO is now known as notabot
07:23 🔗 notabot is now known as chrono
07:44 🔗 chrono is now known as CHRONO
07:58 🔗 schbirid has joined #archiveteam-bs
08:02 🔗 GE has quit IRC (Remote host closed the connection)
08:13 🔗 GE has joined #archiveteam-bs
08:29 🔗 wp494 has quit IRC (Read error: Connection reset by peer)
08:36 🔗 GE has quit IRC (Remote host closed the connection)
08:39 🔗 CHRONO has quit IRC (Quit: ZNC 1.6.3+deb1 - http://znc.in)
08:39 🔗 chrono- has joined #archiveteam-bs
08:42 🔗 chrono- is now known as chrono
08:42 🔗 chrono is now known as SENDQ
08:46 🔗 SENDQ is now known as CHRONO
08:51 🔗 Stilett0 has quit IRC (Read error: Operation timed out)
08:53 🔗 Stilett0 has joined #archiveteam-bs
09:09 🔗 johtso has joined #archiveteam-bs
09:14 🔗 DudesonMc has quit IRC (Quit: http://www.mibbit.com ajax IRC Client)
10:35 🔗 BlueMaxim has quit IRC (Quit: Leaving)
10:46 🔗 BartoCH has quit IRC (Remote host closed the connection)
10:50 🔗 bsmith093 has quit IRC (Ping timeout: 260 seconds)
10:52 🔗 godane SketchCow: i'm uploading some old ezboard i grabbed from kbskorea
10:52 🔗 godane https://archive.org/details/kbskorea.net-bbs-ezboard-k_chuncheontv1-20151216
10:54 🔗 godane this a full list of ones i got in the past: https://archive.org/search.php?query=subject%3A%22kbskorea.net%22&sort=-publicdate
11:23 🔗 fie has joined #archiveteam-bs
11:53 🔗 BartoCH has joined #archiveteam-bs
12:28 🔗 Lord_Nigh has quit IRC (Ping timeout: 250 seconds)
12:58 🔗 RichardG has quit IRC (Read error: Operation timed out)
13:03 🔗 Lord_Nigh has joined #archiveteam-bs
13:04 🔗 Lord_Nigh has quit IRC (Excess Flood)
13:04 🔗 Lord_Nigh has joined #archiveteam-bs
13:12 🔗 midas2 has joined #archiveteam-bs
13:13 🔗 midas has quit IRC (Ping timeout: 244 seconds)
13:15 🔗 Jonison2 has joined #archiveteam-bs
13:18 🔗 Jonison has quit IRC (Ping timeout: 260 seconds)
13:20 🔗 midas2 is now known as midas
13:24 🔗 Jonison has joined #archiveteam-bs
13:24 🔗 Jonison has quit IRC (Read error: Connection reset by peer)
13:27 🔗 Jonison2 has quit IRC (Ping timeout: 260 seconds)
13:32 🔗 pizzaiolo has joined #archiveteam-bs
15:11 🔗 RichardG has joined #archiveteam-bs
15:50 🔗 bsmith093 has joined #archiveteam-bs
16:08 🔗 Petri152 has quit IRC (Read error: Operation timed out)
16:18 🔗 Petri152 has joined #archiveteam-bs
16:39 🔗 zhongfu has quit IRC (Remote host closed the connection)
16:40 🔗 Lord_Nigh has quit IRC (Read error: Operation timed out)
16:41 🔗 zhongfu has joined #archiveteam-bs
17:04 🔗 Lord_Nigh has joined #archiveteam-bs
17:11 🔗 wp494 has joined #archiveteam-bs
17:15 🔗 JAA has joined #archiveteam-bs
17:22 🔗 Pudsey has joined #archiveteam-bs
17:22 🔗 odemg has joined #archiveteam-bs
17:26 🔗 Pudsey has quit IRC (Remote host closed the connection)
17:27 🔗 cf has quit IRC (Ping timeout: 260 seconds)
17:42 🔗 cf has joined #archiveteam-bs
17:43 🔗 Lord_Nigh has quit IRC (Read error: Operation timed out)
17:44 🔗 fie has quit IRC (Read error: Operation timed out)
17:57 🔗 GE has joined #archiveteam-bs
17:59 🔗 fie has joined #archiveteam-bs
18:23 🔗 Lord_Nigh has joined #archiveteam-bs
18:48 🔗 mls has quit IRC (Ping timeout: 250 seconds)
18:54 🔗 Kaz anyone got anything on a gbit link with ipv6? france/eu if possible, would like to do a quick iperf test
19:08 🔗 JAA_ has joined #archiveteam-bs
19:11 🔗 JAA has quit IRC (Ping timeout: 268 seconds)
19:11 🔗 bwn has quit IRC (Read error: Connection reset by peer)
19:12 🔗 mls has joined #archiveteam-bs
19:30 🔗 bwn has joined #archiveteam-bs
19:31 🔗 GE has quit IRC (Remote host closed the connection)
19:50 🔗 odemg has quit IRC (Remote host closed the connection)
19:51 🔗 odemg has joined #archiveteam-bs
19:51 🔗 pizzaiolo has quit IRC (Read error: Operation timed out)
19:52 🔗 odemg2 has joined #archiveteam-bs
19:52 🔗 odemg2 has quit IRC (Connection closed)
19:53 🔗 odemg2 has joined #archiveteam-bs
19:53 🔗 woktenna has joined #archiveteam-bs
19:56 🔗 odemg has quit IRC (Ping timeout: 245 seconds)
19:56 🔗 woktenna Guys... could something be done about domain parkers?
19:56 🔗 woktenna They run nginx on canonical domain... serving robots.txt
19:57 🔗 woktenna And autoredirect to ww1.example.com
19:57 🔗 woktenna Which CNAMEs to various parking teams
19:58 🔗 woktenna Sorry, I know you are archivists, not retrievers...
19:58 🔗 woktenna But just in case you know... please tell
20:01 🔗 JAA_ is now known as JAA
20:02 🔗 odemg2 has quit IRC (Read error: Operation timed out)
20:03 🔗 odemg has joined #archiveteam-bs
20:06 🔗 pizzaiolo has joined #archiveteam-bs
20:12 🔗 woktenna Try yourself: curl http://survey-winner.net/robots.txt
20:13 🔗 woktenna And curl http://survey-winner.net/ 302s to curl http://ww1.survey-winner.net/
20:25 🔗 schbirid well what would you want to do about it?
20:33 🔗 woktenna cammon
20:33 🔗 woktenna I would want to access the Archive!
20:34 🔗 schbirid ?
20:35 🔗 woktenna Look, the guys behind http://survey-winner.net/ have set up an nginx
20:35 🔗 woktenna on multiple IP addresses
20:35 🔗 woktenna Thousands of domains resolve to those IP addresses
20:36 🔗 woktenna Those are long-ago _expired_ domains, which previously belonged to old websites
20:37 🔗 woktenna But they hold them as hostages
20:37 🔗 woktenna So they could (presumably) make money on domain parking
20:38 🔗 woktenna To clear this a bit:
20:38 🔗 woktenna They do not provide domain parking themselves
20:39 🔗 woktenna They just set up a server to redirect
20:39 🔗 woktenna to ww1.*whatever*, which CNAMEs to actual domain parkers
20:40 🔗 woktenna BUT
20:41 🔗 woktenna www.survey-winner.net or whatever points to a stub webserver
20:41 🔗 woktenna which unfortunately hosts robots.txt
20:41 🔗 woktenna Any suggestions?
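
The redirect pattern woktenna describes (the canonical domain answers with a 302 to ww1.<same domain>) can be checked without following the redirect. This is only a rough sketch: the function names are hypothetical, the "ww1." prefix is taken from the messages above, and the fetch helper handles plain HTTP only.

```python
import http.client
from urllib.parse import urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)


def fetch_status_and_location(url, timeout=10):
    """Issue one GET over plain HTTP and return (status, Location header).

    http.client never follows redirects, so the first response is what
    we see. Hypothetical helper; not used by the tests (needs network).
    """
    parts = urlsplit(url)
    conn = http.client.HTTPConnection(parts.hostname, parts.port or 80,
                                      timeout=timeout)
    try:
        conn.request("GET", parts.path or "/")
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def is_parking_redirect(requested_url, status, location, prefix="ww1."):
    """True if the response redirects to <prefix><requested host>.

    The "ww1." prefix is an assumption based on the chat messages.
    """
    if status not in REDIRECT_CODES or not location:
        return False
    original_host = (urlsplit(requested_url).hostname or "").lower()
    target_host = (urlsplit(location).hostname or "").lower()
    return bool(original_host) and target_host == prefix + original_host
```

Usage would be something like is_parking_redirect(url, *fetch_status_and_location(url)); whether a given domain still behaves this way can only be found out by trying it.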
20:46 🔗 woktenna @schbirid, I wonder if you addressed me in particular
20:46 🔗 woktenna Sorry if I jumped the conversation
20:46 🔗 schbirid you are describing domain squatting
20:46 🔗 icedice has joined #archiveteam-bs
20:46 🔗 woktenna Sort of
20:46 🔗 schbirid but not what your problem is that you want to solve
20:48 🔗 woktenna @schbirid Web Archive _prohibits_ browsing of websites with robots.txt
20:48 🔗 schbirid ah, we are not the Internet Archive
20:48 🔗 schbirid and yes that is a known and well disliked feature
20:48 🔗 woktenna Look, I already said
20:48 🔗 woktenna > Sorry, I know you are archivists, not retrievers...
20:49 🔗 woktenna But just in case you know... please tell
20:49 🔗 schbirid no way around it
20:49 🔗 schbirid :}
20:49 🔗 schbirid err -> :\
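
To illustrate why a robots.txt on the parked domain hides the old snapshots, here is a small sketch using Python's standard urllib.robotparser. The robots.txt bodies are assumptions (the log does not show what the squatters actually serve), and "ia_archiver" is used as the crawler name on the assumption that it is the agent the Wayback Machine looked for at the time.

```python
from urllib.robotparser import RobotFileParser

# Assumed content: a blanket block for every crawler.
ASSUMED_BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# Hypothetical variant with an exemption for one crawler
# (an empty Disallow line means "nothing is disallowed").
ASSUMED_WITH_EXEMPTION = """\
User-agent: ia_archiver
Disallow:

User-agent: *
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Parse a robots.txt body and ask whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "http://survey-winner.net/"
    print(is_allowed(ASSUMED_BLOCK_ALL, "ia_archiver", url))
    print(is_allowed(ASSUMED_WITH_EXEMPTION, "ia_archiver", url))
```

With the blanket rule every path is off limits to every agent; a per-agent entry placed alongside it would be one conceivable way to exempt a single crawler.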
20:55 🔗 xmc hm, my corpweb proxy is smart enough to block http://web.archive.org/web/*/http://survey-winner.net/ under the survey-winner.net block
20:55 🔗 xmc in other news thanks for making me trip my corporate web proxy, woktenna
20:55 🔗 woktenna Try changing to https:*
20:55 🔗 xmc -_-
20:56 🔗 xmc yes that's fine
20:56 🔗 GE has joined #archiveteam-bs
20:56 🔗 xmc still yet another log of me doing something that's not work
21:01 🔗 HCross2 xmc: HTTPS blocked?
21:02 🔗 xmc no https works fine but the "view this site's robots.txt" link goes to a plaintext link
21:02 🔗 HCross2 Nvm. Can't read
21:02 🔗 xmc on the target domain
21:03 🔗 woktenna The http://survey-winner.net/ is nothing but another example of this practice
21:04 🔗 woktenna It's peculiar though
21:04 🔗 xmc there's no history of it in the archive prior to its domain squatting
21:04 🔗 woktenna Because if you try to access it with changed 'Host:' header
21:05 🔗 woktenna The webserver will still point to http://ww1.survey-winner.net/
21:05 🔗 woktenna In other words
21:05 🔗 woktenna It is a stub in case no such domain is in their database
21:06 🔗 woktenna I will come up with another domain, wait a mo
21:07 🔗 woktenna Try curl http://1papercraft.com/robots.txt
21:07 🔗 woktenna It's the same people
21:07 🔗 xmc ah
21:07 🔗 woktenna IPs are different, though
21:08 🔗 woktenna But their webserver config is the same
21:11 🔗 woktenna No point in enumerating all the domains
21:13 🔗 woktenna Many are just spam, some are priceless (belonged to websites in past)
21:22 🔗 woktenna If you want to look further, I used www.robtex.com to reverse IPs to domains
21:22 🔗 woktenna Try this: 51.254.28.162
21:29 🔗 schbirid has quit IRC (Quit: Leaving)
21:34 🔗 woktenna To be precise: how can we convince them to add the exception to robots.txt?
21:35 🔗 woktenna I'm fine with their profits on expired domains
21:35 🔗 woktenna Not all evil could be rooted today
21:37 🔗 xmc squatters will not listen to you. ia is doing something about it, slowly.
21:38 🔗 woktenna @xmc Are you with IA?
21:38 🔗 xmc no
21:38 🔗 woktenna How could you know then?
21:38 🔗 xmc because i talk to people who are
21:51 🔗 woktenna Is it possible to find admin of those webservers?
21:52 🔗 woktenna Chances are the squatters outsource their operation
21:52 🔗 woktenna And only point their A records
22:26 🔗 kristian_ has joined #archiveteam-bs
22:40 🔗 GE has quit IRC (Remote host closed the connection)
22:50 🔗 JAA Update on InterfaceLIFT: I now have a functional wpull hook script which retrieves all sensible resources not accessible directly (images in all resolutions and the portfolio/submission browsers).
22:50 🔗 JAA Note that ArchiveBot did pick up images in some resolutions, but I'm almost certain it'll only be able to find about half of them; it'll also miss the portfolio and submission browsers (which are actually pretty redundant but still nice to have for a fully functional archive; they won't work in the Wayback Machine though).
22:55 🔗 JAA Unfortunately, based on a very rough estimate, the full archive will be several hundred GB, which is more space than I have available currently. If anyone of you wants to run it or has other suggestions, let me know.
22:55 🔗 JAA tammy_: ^
22:58 🔗 tammy_ JAA: If you're willing to help me, I'll run it. I storage for days.
22:58 🔗 tammy_ *I got
23:01 🔗 JAA tammy_: Sure. Do you have a functioning wpull?
23:02 🔗 JAA Version 1.2.3, that is.
23:03 🔗 tammy_ nope, never even heard of it
23:04 🔗 tammy_ I stick to wget
23:05 🔗 JAA I see. Do you have Python and pip?
23:06 🔗 tammy_ I can acquire anything. in fact I'm standing up a new VM for this
23:08 🔗 tammy_ single core good enough?
23:11 🔗 JAA I guess so, yeah. The limiting factor is time (not overloading the server) and network anyway.
23:12 🔗 JAA You'll need Python 3.2+ (including the dev headers) and pip. Which OS are you using?
23:14 🔗 tammy_ I'm gigabit, if need be, but I'd rather work through my vpn server. I don't mind cutting over to my personal network if time requires it.
23:14 🔗 tammy_ Debian 8.7
23:15 🔗 tammy_ I can stand up something different if that's an issue
23:16 🔗 JAA It's the server's network which is slow. Gbit or 10 Mbit probably doesn't make any difference.
23:16 🔗 JAA Debian's perfect. :-)
23:20 🔗 JAA So the required Python packages are python3 and python3-dev. If you want to install pip system-wide, python3-pip; I normally install it per-user on my machines using https://bootstrap.pypa.io/get-pip.py (wget, then python3 get-pip.py --user).
23:21 🔗 tammy_ I am root, I'll just apt it :)
23:21 🔗 JAA Ok
23:22 🔗 JAA Then: pip install html5lib==0.9999999 (wpull hasn't been updated to deal with the newest version, and the dependencies haven't been fixed either...)
23:22 🔗 JAA Followed by: pip install wpull==1.2.3 psutil (I think everything else gets pulled automatically)
23:23 🔗 JAA Add a --user flag if you want to do that in the user's directory instead.
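
Since the install depends on exact version pins (html5lib==0.9999999 and wpull==1.2.3, per JAA's messages), a quick sanity check on the output of pip freeze may help before starting a long crawl. This is a minimal sketch; the function names are hypothetical, and it only understands plain name==version lines.

```python
# Pins taken from the chat above; psutil is unpinned there, so it is not checked.
REQUIRED_PINS = {"html5lib": "0.9999999", "wpull": "1.2.3"}


def parse_freeze(text):
    """Turn `pip freeze` style output into {lowercased name: version}.

    Lines that are not of the form name==version (comments, editable
    installs, blanks) are skipped.
    """
    installed = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, _, version = line.partition("==")
        installed[name.strip().lower()] = version.strip()
    return installed


def pin_problems(installed, pins=REQUIRED_PINS):
    """Return human-readable messages for missing or mismatched pins."""
    problems = []
    for name in sorted(pins):
        wanted = pins[name]
        found = installed.get(name)
        if found is None:
            problems.append("{} is not installed (want {})".format(name, wanted))
        elif found != wanted:
            problems.append("{} is {} (want {})".format(name, found, wanted))
    return problems
```

Feeding it the text of pip freeze (captured however is convenient) and getting an empty list back would suggest the pins are in place.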
23:29 🔗 JAA Here's the hook script and the wpull command I used for testing: https://gist.github.com/anonymous/c752b52901d6688d8b677e759c694896
23:53 🔗 BlueMaxim has joined #archiveteam-bs
23:57 🔗 WIDOW has joined #archiveteam-bs
