Time |
Nickname |
Message |
00:01
🔗
|
|
BlueMaxim has joined #archiveteam-bs |
00:23
🔗
|
|
pizzaiolo has joined #archiveteam-bs |
00:25
🔗
|
|
pizzaiolo has quit IRC (Client Quit) |
00:42
🔗
|
|
BlueMaxim has quit IRC (Read error: Operation timed out) |
00:43
🔗
|
|
BlueMaxim has joined #archiveteam-bs |
02:31
🔗
|
|
pnJay has quit IRC (Leaving) |
02:32
🔗
|
|
sep332_ has quit IRC (Read error: Operation timed out) |
02:51
🔗
|
|
pizzaiol1 has left |
02:59
🔗
|
|
dashcloud has quit IRC (Quit: No Ping reply in 180 seconds.) |
03:01
🔗
|
|
dashcloud has joined #archiveteam-bs |
04:01
🔗
|
|
BlueMaxim has quit IRC (Read error: Operation timed out) |
04:02
🔗
|
|
BlueMaxim has joined #archiveteam-bs |
04:27
🔗
|
|
Sk1d has quit IRC (Ping timeout: 250 seconds) |
04:34
🔗
|
|
Sk1d has joined #archiveteam-bs |
06:27
🔗
|
|
wickedpla is now known as wp494 |
06:33
🔗
|
|
DudesonMc has joined #archiveteam-bs |
06:53
🔗
|
|
Stiletto has quit IRC (Read error: Connection reset by peer) |
06:53
🔗
|
|
kniffy has quit IRC (Read error: Operation timed out) |
06:55
🔗
|
|
Stilett0 has joined #archiveteam-bs |
07:07
🔗
|
|
kniffy has joined #archiveteam-bs |
07:07
🔗
|
|
Jonison has joined #archiveteam-bs |
07:12
🔗
|
|
GE has joined #archiveteam-bs |
07:23
🔗
|
|
CHRONO is now known as notabot |
07:23
🔗
|
|
notabot is now known as chrono |
07:44
🔗
|
|
chrono is now known as CHRONO |
07:58
🔗
|
|
schbirid has joined #archiveteam-bs |
08:02
🔗
|
|
GE has quit IRC (Remote host closed the connection) |
08:13
🔗
|
|
GE has joined #archiveteam-bs |
08:29
🔗
|
|
wp494 has quit IRC (Read error: Connection reset by peer) |
08:36
🔗
|
|
GE has quit IRC (Remote host closed the connection) |
08:39
🔗
|
|
CHRONO has quit IRC (Quit: ZNC 1.6.3+deb1 - http://znc.in) |
08:39
🔗
|
|
chrono- has joined #archiveteam-bs |
08:42
🔗
|
|
chrono- is now known as chrono |
08:42
🔗
|
|
chrono is now known as SENDQ |
08:46
🔗
|
|
SENDQ is now known as CHRONO |
08:51
🔗
|
|
Stilett0 has quit IRC (Read error: Operation timed out) |
08:53
🔗
|
|
Stilett0 has joined #archiveteam-bs |
09:09
🔗
|
|
johtso has joined #archiveteam-bs |
09:14
🔗
|
|
DudesonMc has quit IRC (Quit: http://www.mibbit.com ajax IRC Client) |
10:35
🔗
|
|
BlueMaxim has quit IRC (Quit: Leaving) |
10:46
🔗
|
|
BartoCH has quit IRC (Remote host closed the connection) |
10:50
🔗
|
|
bsmith093 has quit IRC (Ping timeout: 260 seconds) |
10:52
🔗
|
godane |
SketchCow: i'm uploading some old ezboard i grabbed from kbskorea |
10:52
🔗
|
godane |
https://archive.org/details/kbskorea.net-bbs-ezboard-k_chuncheontv1-20151216 |
10:54
🔗
|
godane |
this a full list of ones i got in the past: https://archive.org/search.php?query=subject%3A%22kbskorea.net%22&sort=-publicdate |
11:23
🔗
|
|
fie has joined #archiveteam-bs |
11:53
🔗
|
|
BartoCH has joined #archiveteam-bs |
12:28
🔗
|
|
Lord_Nigh has quit IRC (Ping timeout: 250 seconds) |
12:58
🔗
|
|
RichardG has quit IRC (Read error: Operation timed out) |
13:03
🔗
|
|
Lord_Nigh has joined #archiveteam-bs |
13:04
🔗
|
|
Lord_Nigh has quit IRC (Excess Flood) |
13:04
🔗
|
|
Lord_Nigh has joined #archiveteam-bs |
13:12
🔗
|
|
midas2 has joined #archiveteam-bs |
13:13
🔗
|
|
midas has quit IRC (Ping timeout: 244 seconds) |
13:15
🔗
|
|
Jonison2 has joined #archiveteam-bs |
13:18
🔗
|
|
Jonison has quit IRC (Ping timeout: 260 seconds) |
13:20
🔗
|
|
midas2 is now known as midas |
13:24
🔗
|
|
Jonison has joined #archiveteam-bs |
13:24
🔗
|
|
Jonison has quit IRC (Read error: Connection reset by peer) |
13:27
🔗
|
|
Jonison2 has quit IRC (Ping timeout: 260 seconds) |
13:32
🔗
|
|
pizzaiolo has joined #archiveteam-bs |
15:11
🔗
|
|
RichardG has joined #archiveteam-bs |
15:50
🔗
|
|
bsmith093 has joined #archiveteam-bs |
16:08
🔗
|
|
Petri152 has quit IRC (Read error: Operation timed out) |
16:18
🔗
|
|
Petri152 has joined #archiveteam-bs |
16:39
🔗
|
|
zhongfu has quit IRC (Remote host closed the connection) |
16:40
🔗
|
|
Lord_Nigh has quit IRC (Read error: Operation timed out) |
16:41
🔗
|
|
zhongfu has joined #archiveteam-bs |
17:04
🔗
|
|
Lord_Nigh has joined #archiveteam-bs |
17:11
🔗
|
|
wp494 has joined #archiveteam-bs |
17:15
🔗
|
|
JAA has joined #archiveteam-bs |
17:22
🔗
|
|
Pudsey has joined #archiveteam-bs |
17:22
🔗
|
|
odemg has joined #archiveteam-bs |
17:26
🔗
|
|
Pudsey has quit IRC (Remote host closed the connection) |
17:27
🔗
|
|
cf has quit IRC (Ping timeout: 260 seconds) |
17:42
🔗
|
|
cf has joined #archiveteam-bs |
17:43
🔗
|
|
Lord_Nigh has quit IRC (Read error: Operation timed out) |
17:44
🔗
|
|
fie has quit IRC (Read error: Operation timed out) |
17:57
🔗
|
|
GE has joined #archiveteam-bs |
17:59
🔗
|
|
fie has joined #archiveteam-bs |
18:23
🔗
|
|
Lord_Nigh has joined #archiveteam-bs |
18:48
🔗
|
|
mls has quit IRC (Ping timeout: 250 seconds) |
18:54
🔗
|
Kaz |
anyone got anything on a gbit link with ipv6? france/eu if possible, would like to do a quick iperf test |
19:08
🔗
|
|
JAA_ has joined #archiveteam-bs |
19:11
🔗
|
|
JAA has quit IRC (Ping timeout: 268 seconds) |
19:11
🔗
|
|
bwn has quit IRC (Read error: Connection reset by peer) |
19:12
🔗
|
|
mls has joined #archiveteam-bs |
19:30
🔗
|
|
bwn has joined #archiveteam-bs |
19:31
🔗
|
|
GE has quit IRC (Remote host closed the connection) |
19:50
🔗
|
|
odemg has quit IRC (Remote host closed the connection) |
19:51
🔗
|
|
odemg has joined #archiveteam-bs |
19:51
🔗
|
|
pizzaiolo has quit IRC (Read error: Operation timed out) |
19:52
🔗
|
|
odemg2 has joined #archiveteam-bs |
19:52
🔗
|
|
odemg2 has quit IRC (Connection closed) |
19:53
🔗
|
|
odemg2 has joined #archiveteam-bs |
19:53
🔗
|
|
woktenna has joined #archiveteam-bs |
19:56
🔗
|
|
odemg has quit IRC (Ping timeout: 245 seconds) |
19:56
🔗
|
woktenna |
Guys... could something be done about domain parkers? |
19:56
🔗
|
woktenna |
They run nginx on canonical domain... serving robots.txt |
19:57
🔗
|
woktenna |
And autoredirect to ww1.example.com |
19:57
🔗
|
woktenna |
Which CNAMEs to various parking teams |
19:58
🔗
|
woktenna |
Sorry, I know you are archivists, not retrievers... |
19:58
🔗
|
woktenna |
But just in case you know... please tell |
20:01
🔗
|
|
JAA_ is now known as JAA |
20:02
🔗
|
|
odemg2 has quit IRC (Read error: Operation timed out) |
20:03
🔗
|
|
odemg has joined #archiveteam-bs |
20:06
🔗
|
|
pizzaiolo has joined #archiveteam-bs |
20:12
🔗
|
woktenna |
Try yourself: curl http://survey-winner.net/robots.txt |
20:13
🔗
|
woktenna |
And curl http://survey-winner.net/ 302s to http://ww1.survey-winner.net/ |
20:25
🔗
|
schbirid |
well what would you want to do about it? |
20:33
🔗
|
woktenna |
c'mon |
20:33
🔗
|
woktenna |
I would want to access the Archive! |
20:34
🔗
|
schbirid |
? |
20:35
🔗
|
woktenna |
Look, the guys behind http://survey-winner.net/ have set up an nginx |
20:35
🔗
|
woktenna |
on multiple IP addresses |
20:35
🔗
|
woktenna |
Thousands of domains resolve to those IP addresses |
20:36
🔗
|
woktenna |
Those are long-ago _expired_ domains, which previously belonged to old websites |
20:37
🔗
|
woktenna |
But they hold them as hostages |
20:37
🔗
|
woktenna |
So they could (presumably) make money on domain parking |
20:38
🔗
|
woktenna |
To clear this a bit: |
20:38
🔗
|
woktenna |
They do not provide domain parking themselves |
20:39
🔗
|
woktenna |
They just set up a server to redirect |
20:39
🔗
|
woktenna |
to ww1.*whatever*, which CNAMEs to actual domain parkers |
20:40
🔗
|
woktenna |
BUT |
20:41
🔗
|
woktenna |
www.survey-winner.net or whatever points to a stub webserver |
20:41
🔗
|
woktenna |
which unfortunately hosts robots.txt |
20:41
🔗
|
woktenna |
Any suggestions? |
20:46
🔗
|
woktenna |
@schbirid, I wonder if you addressed me in particular |
20:46
🔗
|
woktenna |
Sorry if I jumped the conversation |
20:46
🔗
|
schbirid |
you are describing domain squatting |
20:46
🔗
|
|
icedice has joined #archiveteam-bs |
20:46
🔗
|
woktenna |
Sort of |
20:46
🔗
|
schbirid |
but not what problem you want to solve |
20:48
🔗
|
woktenna |
@schbirid Web Archive _prohibits_ browsing of websites with robots.txt |
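As a rough illustration of the robots.txt problem described here, the sketch below checks whether a given robots.txt body would shut out a crawler. It assumes the Wayback Machine applies rules addressed to the `ia_archiver` user agent (or to `*`); the sample robots.txt text is hypothetical and was not fetched from any of the domains mentioned.

```python
from urllib.robotparser import RobotFileParser


def blocks_agent(robots_txt, agent, url="/"):
    """Return True if the robots.txt text disallows `agent` from fetching `url`."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


# Hypothetical robots.txt of the kind a parking stub might serve (assumption).
SAMPLE_ROBOTS = "User-agent: *\nDisallow: /\n"
```

Calling `blocks_agent(SAMPLE_ROBOTS, "ia_archiver")` on the body returned by `curl http://<domain>/robots.txt` would show whether a blanket disallow is in place.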
20:48
🔗
|
schbirid |
ah, we are not the Internet Archive |
20:48
🔗
|
schbirid |
and yes that is a known and well disliked feature |
20:48
🔗
|
woktenna |
Look, I already said |
20:48
🔗
|
woktenna |
> Sorry, I know you are archivists, not retrievers... |
20:49
🔗
|
woktenna |
But just in case you know... please tell |
20:49
🔗
|
schbirid |
no way around it |
20:49
🔗
|
schbirid |
:} |
20:49
🔗
|
schbirid |
err -> :\ |
20:55
🔗
|
xmc |
hm, my corpweb proxy is smart enough to block http://web.archive.org/web/*/http://survey-winner.net/ under the survey-winner.net block |
20:55
🔗
|
xmc |
in other news thanks for making me trip my corporate web proxy, woktenna |
20:55
🔗
|
woktenna |
Try changing to https:* |
20:55
🔗
|
xmc |
-_- |
20:56
🔗
|
xmc |
yes that's fine |
20:56
🔗
|
|
GE has joined #archiveteam-bs |
20:56
🔗
|
xmc |
still yet another log of me doing something that's not work |
21:01
🔗
|
HCross2 |
xmc: HTTPS blocked? |
21:02
🔗
|
xmc |
no https works fine but the "view this site's robots.txt" link goes to a plaintext link |
21:02
🔗
|
HCross2 |
Nvm. Can't read |
21:02
🔗
|
xmc |
on the target domain |
21:03
🔗
|
woktenna |
The http://survey-winner.net/ is nothing but another example of this practice |
21:04
🔗
|
woktenna |
It's peculiar though |
21:04
🔗
|
xmc |
there's no history of it in the archive prior to its domain squatting |
21:04
🔗
|
woktenna |
Because if you try to access it with a changed 'Host:' header |
21:05
🔗
|
woktenna |
The webserver will still point to http://ww1.survey-winner.net/ |
21:05
🔗
|
woktenna |
In other words |
21:05
🔗
|
woktenna |
It is a stub in case no such domain is in their database |
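The Host-header experiment described above could be sketched roughly as follows. This is only an illustration: `probe` and `is_parking_redirect` are made-up helper names, and the only behaviour assumed is what the messages describe (a 302 to a `ww1.` host).

```python
import http.client
from urllib.parse import urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)


def is_parking_redirect(status, location, host):
    """True if the response redirects `host` to its own ww1. subdomain."""
    if status not in REDIRECT_CODES or not location:
        return False
    target = urlsplit(location).hostname or ""
    return target == "ww1." + host.lower()


def probe(ip, host, timeout=10.0):
    """GET / on `ip` with an arbitrary Host header; return (status, Location)."""
    conn = http.client.HTTPConnection(ip, 80, timeout=timeout)
    try:
        # Passing our own Host header stops http.client from adding one.
        conn.request("GET", "/", headers={"Host": host})
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location", "")
    finally:
        conn.close()
```

For a hostname the server does not know, `is_parking_redirect(*probe(ip, host), host)` would come back False if the redirect still points at ww1.survey-winner.net, which is the fallback behaviour reported here.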
21:06
🔗
|
woktenna |
I will come up with another domain, wait a mo |
21:07
🔗
|
woktenna |
Try curl http://1papercraft.com/robots.txt |
21:07
🔗
|
woktenna |
It's the same people |
21:07
🔗
|
xmc |
ah |
21:07
🔗
|
woktenna |
IPs are different, though |
21:08
🔗
|
woktenna |
But their webserver config is the same |
21:11
🔗
|
woktenna |
No point in enumerating all the domains |
21:13
🔗
|
woktenna |
Many are just spam, some are priceless (belonged to websites in the past) |
21:22
🔗
|
woktenna |
If you want to look further, I used www.robtex.com to reverse IPs to domains |
21:22
🔗
|
woktenna |
Try this: 51.254.28.162 |
21:29
🔗
|
|
schbirid has quit IRC (Quit: Leaving) |
21:34
🔗
|
woktenna |
To be precise: how can we convince them to add the exception to robots.txt? |
21:35
🔗
|
woktenna |
I'm fine with their profits on expired domains |
21:35
🔗
|
woktenna |
Not all evil can be rooted out today |
21:37
🔗
|
xmc |
squatters will not listen to you. ia is doing something about it, slowly. |
21:38
🔗
|
woktenna |
@xmc Are you with IA? |
21:38
🔗
|
xmc |
no |
21:38
🔗
|
woktenna |
How could you know then? |
21:38
🔗
|
xmc |
because i talk to people who are |
21:51
🔗
|
woktenna |
Is it possible to find admin of those webservers? |
21:52
🔗
|
woktenna |
Chances are the squatters outsource their operation |
21:52
🔗
|
woktenna |
And only point their A records |
22:26
🔗
|
|
kristian_ has joined #archiveteam-bs |
22:40
🔗
|
|
GE has quit IRC (Remote host closed the connection) |
22:50
🔗
|
JAA |
Update on InterfaceLIFT: I now have a functional wpull hook script which retrieves all sensible resources not accessible directly (images in all resolutions and the portfolio/submission browsers). |
22:50
🔗
|
JAA |
Note that ArchiveBot did pick up images in some resolutions, but I'm almost certain it'll only be able to find about half of them; it'll also miss the portfolio and submission browsers (which are actually pretty redundant but still nice to have for a fully functional archive; they won't work in the Wayback Machine though). |
22:55
🔗
|
JAA |
Unfortunately, based on a very rough estimate, the full archive will be several hundred GB, which is more space than I have available currently. If anyone of you wants to run it or has other suggestions, let me know. |
22:55
🔗
|
JAA |
tammy_: ^ |
22:58
🔗
|
tammy_ |
JAA: If you're willing to help me, I'll run it. I storage for days. |
22:58
🔗
|
tammy_ |
*I got |
23:01
🔗
|
JAA |
tammy_: Sure. Do you have a functioning wpull? |
23:02
🔗
|
JAA |
Version 1.2.3, that is. |
23:03
🔗
|
tammy_ |
nope, never even heard of it |
23:04
🔗
|
tammy_ |
I stick to wget |
23:05
🔗
|
JAA |
I see. Do you have Python and pip? |
23:06
🔗
|
tammy_ |
I can acquire anything. In fact, I'm standing up a new VM for this |
23:08
🔗
|
tammy_ |
single core good enough? |
23:11
🔗
|
JAA |
I guess so, yeah. The limiting factor is time (not overloading the server) and network anyway. |
23:12
🔗
|
JAA |
You'll need Python 3.2+ (including the dev headers) and pip. Which OS are you using? |
23:14
🔗
|
tammy_ |
I'm gigabit, if need be, but I'd rather work through my vpn server. I don't mind cutting over to my personal network if time requires it. |
23:14
🔗
|
tammy_ |
Debian 8.7 |
23:15
🔗
|
tammy_ |
I can stand up something different if that's an issue |
23:16
🔗
|
JAA |
It's the server's network which is slow. Gbit or 10 Mbit probably doesn't make any difference. |
23:16
🔗
|
JAA |
Debian's perfect. :-) |
23:20
🔗
|
JAA |
So the required Python packages are python3 and python3-dev. If you want to install pip system-wide, python3-pip; I normally install it per-user on my machines using https://bootstrap.pypa.io/get-pip.py (wget, then python3 get-pip.py --user). |
23:21
🔗
|
tammy_ |
I am root, I'll just apt it :) |
23:21
🔗
|
JAA |
Ok |
23:22
🔗
|
JAA |
Then: pip install html5lib==0.9999999 (wpull hasn't been updated to deal with the newest version, and the dependencies haven't been fixed either...) |
23:22
🔗
|
JAA |
Followed by: pip install wpull==1.2.3 psutil (I think everything else gets pulled automatically) |
23:23
🔗
|
JAA |
Add a --user flag if you want to do that in the user's directory instead. |
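After those two pip commands, a quick sanity check of the pinned versions might look like the sketch below. It only compares `pip freeze` style output against the two pins mentioned above (html5lib 0.9999999 and wpull 1.2.3); the helper names are made up for illustration, and it avoids newer syntax so it should also run on an older Python 3.

```python
import subprocess
import sys

# Pins taken from the install steps above.
PINS = {"html5lib": "0.9999999", "wpull": "1.2.3"}


def parse_freeze(text):
    """Map lower-cased package name -> version from `pip freeze` style lines."""
    versions = {}
    for line in text.splitlines():
        name, sep, version = line.strip().partition("==")
        if sep:
            versions[name.lower()] = version
    return versions


def pin_problems(freeze_text, pins=PINS):
    """Return {package: installed version or None} for every unmet pin."""
    installed = parse_freeze(freeze_text)
    return {name: installed.get(name)
            for name, wanted in pins.items()
            if installed.get(name) != wanted}


def current_freeze():
    """Run `pip freeze` for the interpreter executing this script."""
    return subprocess.check_output(
        [sys.executable, "-m", "pip", "freeze"], universal_newlines=True)
```

`pin_problems(current_freeze())` returning an empty dict would mean both pins are satisfied.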
23:29
🔗
|
JAA |
Here's the hook script and the wpull command I used for testing: https://gist.github.com/anonymous/c752b52901d6688d8b677e759c694896 |
23:53
🔗
|
|
BlueMaxim has joined #archiveteam-bs |
23:57
🔗
|
|
WIDOW has joined #archiveteam-bs |