Time |
Nickname |
Message |
00:02
🔗
|
|
BlueMaxim has joined #archiveteam |
00:54
🔗
|
|
SimpBrain has quit IRC (Read error: Operation timed out) |
01:01
🔗
|
|
xk_id has joined #archiveteam |
01:02
🔗
|
|
godane has quit IRC (Read error: Operation timed out) |
01:23
🔗
|
|
SimpBrain has joined #archiveteam |
01:31
🔗
|
|
Ghost_of_ has quit IRC (Quit: Leaving) |
01:38
🔗
|
|
zenguy_pc has quit IRC (Read error: Operation timed out) |
01:53
🔗
|
|
zenguy_pc has joined #archiveteam |
02:20
🔗
|
|
bwn has quit IRC (Read error: Operation timed out) |
02:34
🔗
|
|
JesseW has joined #archiveteam |
02:35
🔗
|
|
Ravenloft has quit IRC (Remote host closed the connection) |
02:47
🔗
|
|
primus104 has quit IRC (Leaving.) |
03:13
🔗
|
|
JesseW has quit IRC (Leaving.) |
03:41
🔗
|
|
zenguy_pc has quit IRC (Read error: Operation timed out) |
03:52
🔗
|
|
wyatt8750 has quit IRC (Remote host closed the connection) |
03:52
🔗
|
|
zenguy_pc has joined #archiveteam |
03:53
🔗
|
|
wyatt8750 has joined #archiveteam |
04:00
🔗
|
|
bwn has joined #archiveteam |
04:10
🔗
|
|
dashcloud has quit IRC (Ping timeout: 252 seconds) |
04:14
🔗
|
|
dashcloud has joined #archiveteam |
04:38
🔗
|
|
aaaaaaaaa has quit IRC (Leaving) |
04:43
🔗
|
|
JesseW has joined #archiveteam |
05:42
🔗
|
|
zenguy_pc has quit IRC (Read error: Operation timed out) |
05:53
🔗
|
|
zenguy_pc has joined #archiveteam |
06:02
🔗
|
|
WinterFox has joined #archiveteam |
06:08
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
06:11
🔗
|
|
dashcloud has joined #archiveteam |
06:42
🔗
|
|
cvb has joined #archiveteam |
07:04
🔗
|
|
godane has joined #archiveteam |
07:12
🔗
|
|
insane_al has quit IRC (Remote host closed the connection) |
07:40
🔗
|
|
zenguy_pc has quit IRC (Read error: Operation timed out) |
07:46
🔗
|
|
primus104 has joined #archiveteam |
07:54
🔗
|
|
zenguy_pc has joined #archiveteam |
07:56
🔗
|
|
zer0c00l has joined #archiveteam |
08:47
🔗
|
|
Cameron_D has joined #archiveteam |
08:52
🔗
|
|
JesseW has quit IRC (Leaving.) |
08:54
🔗
|
|
bzc6p_ has joined #archiveteam |
08:57
🔗
|
|
bzc6p has quit IRC (Read error: Operation timed out) |
08:57
🔗
|
|
xk_id has quit IRC (Remote host closed the connection) |
09:05
🔗
|
|
bwn has quit IRC (Read error: Operation timed out) |
09:30
🔗
|
|
atomotic has joined #archiveteam |
09:31
🔗
|
|
xk_id has joined #archiveteam |
09:32
🔗
|
|
Ghost_of_ has joined #archiveteam |
09:36
🔗
|
|
schbirid has joined #archiveteam |
09:36
🔗
|
arkiver |
New items added to the wiki grab! |
09:37
🔗
|
arkiver |
I'll do my best to get the yuku project running with new items today |
09:39
🔗
|
|
primus104 has quit IRC (Leaving.) |
09:39
🔗
|
|
zenguy_pc has quit IRC (Read error: Operation timed out) |
09:52
🔗
|
|
bwn has joined #archiveteam |
10:00
🔗
|
|
BlueMaxim has quit IRC (Read error: Connection reset by peer) |
10:47
🔗
|
|
ohhdemgir has quit IRC (Ping timeout: 252 seconds) |
11:25
🔗
|
Kazzy |
spinning up some wiki pipelines, been a while since i've thrown some power around |
11:46
🔗
|
|
primus104 has joined #archiveteam |
11:49
🔗
|
|
Ungstein has quit IRC (Quit: Leaving.) |
11:50
🔗
|
|
Ungstein has joined #archiveteam |
12:00
🔗
|
Atluxity |
I redirected my juice to wikis-grab too |
12:06
🔗
|
|
atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com) |
12:06
🔗
|
|
atomotic has joined #archiveteam |
12:28
🔗
|
|
WinterFox has quit IRC (Remote host closed the connection) |
12:48
🔗
|
|
Ghost_of_ has quit IRC (Quit: Leaving) |
13:56
🔗
|
|
Darkstar has quit IRC (Ping timeout: 506 seconds) |
14:05
🔗
|
|
Darkstar has joined #archiveteam |
14:32
🔗
|
|
scyther has joined #archiveteam |
14:35
🔗
|
SketchCow |
The new IA search engine is working better than expected, and collections (when I move them) show up within a couple minutes now. |
14:40
🔗
|
|
atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com) |
14:45
🔗
|
|
Deewiant has quit IRC (Ping timeout: 198 seconds) |
14:46
🔗
|
|
Atluxity has quit IRC (Ping timeout: 370 seconds) |
14:47
🔗
|
|
Deewiant has joined #archiveteam |
14:47
🔗
|
|
Atluxity has joined #archiveteam |
14:49
🔗
|
Atluxity |
nice |
14:50
🔗
|
|
nertzy has joined #archiveteam |
14:54
🔗
|
godane |
cool |
15:08
🔗
|
Jonimus |
woot |
15:08
🔗
|
SketchCow |
Yeah, it's blazing. I'm throwing a lot of shit at it. |
15:15
🔗
|
|
Start has quit IRC (Quit: Disconnected.) |
15:15
🔗
|
phuzion |
I've got a wiki that I'm grabbing external items for that's got 540K pages. This should be a fun grab. |
15:16
🔗
|
phuzion |
My entire tmux backlog for that page is "Found and queued 500 URLs, continuing..." |
15:16
🔗
|
Atluxity |
wow |
15:18
🔗
|
|
primus104 has quit IRC (Leaving.) |
15:25
🔗
|
|
nertzy has quit IRC (Quit: This computer has gone to sleep) |
15:29
🔗
|
arkiver |
phuzion: which wiki is that? |
15:29
🔗
|
arkiver |
or which item? |
15:31
🔗
|
|
xk_id has quit IRC (Remote host closed the connection) |
15:34
🔗
|
phuzion |
arkiver: http://lt.biologija.wikia.com/wiki/ |
15:34
🔗
|
phuzion |
583K pages, rather |
15:34
🔗
|
arkiver |
so many external links.... http://lt.biologija.wikia.com/wiki/Ragys |
15:35
🔗
|
phuzion |
Hooooooooly shit ok maybe I won't be able to handle that item. |
15:36
🔗
|
arkiver |
if you have enough time and enough space, why not? |
15:36
🔗
|
phuzion |
100GB disk on that box. |
15:36
🔗
|
phuzion |
I could easily see the WARC being 100GB+ for that. |
15:36
🔗
|
arkiver |
We'll see I guess |
15:37
🔗
|
phuzion |
Yeah, I'll keep an eye on my disk usage for a day or two and see how it goes. |
15:37
🔗
|
arkiver |
ok |
15:42
🔗
|
|
vitzli has joined #archiveteam |
15:54
🔗
|
|
Start has joined #archiveteam |
16:05
🔗
|
|
Start has quit IRC (Read error: Operation timed out) |
16:11
🔗
|
godane |
https://archive.org/details/pris-the-world |
16:12
🔗
|
arkiver |
godane: nice |
16:29
🔗
|
|
Start has joined #archiveteam |
16:31
🔗
|
|
scyther has quit IRC (Leaving) |
16:38
🔗
|
|
JesseW has joined #archiveteam |
16:40
🔗
|
|
bithippo has joined #archiveteam |
16:41
🔗
|
bithippo |
Hello #archiveteam! Would someone be able to drop http://www.mfat.govt.nz/Treaties-and-International-Law/01-Treaties-for-which-NZ-is-Depositary/0-Trans-Pacific-Partnership-Text.php into archivebot when they have a moment? Thank you!! |
16:42
🔗
|
DFJustin |
bithippo: done |
16:43
🔗
|
|
arkiver2 has joined #archiveteam |
16:43
🔗
|
bithippo |
@DFJustin: :bow: |
16:44
🔗
|
bithippo |
Thanks again, ArchiveBot dashboard shows it being done super quick. |
16:45
🔗
|
|
arkiver2 has quit IRC (Client Quit) |
16:57
🔗
|
|
JesseW has quit IRC (Leaving.) |
17:00
🔗
|
|
Start has quit IRC (Quit: Disconnected.) |
17:02
🔗
|
|
bithippo has quit IRC (Quit: Page closed) |
17:14
🔗
|
Atluxity |
do we know who "chip" in the wikis-grab is? |
17:14
🔗
|
achip |
howdy |
17:14
🔗
|
Atluxity |
right |
17:14
🔗
|
Atluxity |
you just added a lot of instances to the grab? |
17:15
🔗
|
achip |
20x10 |
17:15
🔗
|
achip |
10 concurrent that is |
17:18
🔗
|
|
Start has joined #archiveteam |
17:19
🔗
|
Atluxity |
so I am wondering why the items per hour is less now |
17:20
🔗
|
achip |
looked like there was a little bit of an rsync bottleneck for a bit |
17:20
🔗
|
Atluxity |
ah, ok |
17:29
🔗
|
|
primus104 has joined #archiveteam |
17:29
🔗
|
|
vitzli has quit IRC (Quit: Leaving) |
17:35
🔗
|
Kazzy |
yeah sorry, that was caused by me :( |
17:44
🔗
|
Atluxity |
Kazzy: how? |
17:44
🔗
|
Atluxity |
just trying to understand stuff here |
17:45
🔗
|
Kazzy |
rsync host is capped at 25 connections (FOS i think, logs flying too fast for me to check at this minute) |
17:45
🔗
|
Atluxity |
sounds right |
17:45
🔗
|
Kazzy |
i'm running multiple instances, accidentally had 4 upload slots running for each, ended up eating a ton of the connections |
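A rough sketch of the arithmetic behind this, assuming each upload slot holds one rsync connection at a time. The cap of 25 and the 4 slots per instance come from the chat; the instance count of 5 is hypothetical, since the chat only says "multiple".

```python
def rsync_connections(instances: int, upload_slots: int) -> int:
    """Connections one user holds if every upload slot is busy at once."""
    return instances * upload_slots


RSYNC_CAP = 25  # connection cap on the rsync host, per the chat

# Hypothetical instance count: the chat only says "multiple instances".
used = rsync_connections(instances=5, upload_slots=4)
print(f"{used} of {RSYNC_CAP} connections, {RSYNC_CAP - used} left for everyone else")
```

With one upload slot per instance, the same five instances would hold 5 connections instead of 20.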
17:45
🔗
|
Atluxity |
aha |
17:49
🔗
|
yipdw |
fos has 4.9 TB free on the gamefront mount, so we might be able to ramp it up again |
17:49
🔗
|
yipdw |
SketchCow's call |
17:49
🔗
|
yipdw |
its load average is also back to "oh that looks ok" |
17:50
🔗
|
|
afics has quit IRC (Read error: Operation timed out) |
17:50
🔗
|
|
Apathy has quit IRC (Read error: Operation timed out) |
17:51
🔗
|
|
cadbury has quit IRC (Read error: Operation timed out) |
17:54
🔗
|
|
matthusby has quit IRC (Ping timeout: 606 seconds) |
17:54
🔗
|
|
matthusby has joined #archiveteam |
17:56
🔗
|
achip |
if you need a log file to look at http://54.85.211.51/, and search "rsync error" |
17:56
🔗
|
Kazzy |
25 indeed, thanks achip |
17:58
🔗
|
|
Cameron_D has quit IRC (Ping timeout: 606 seconds) |
18:09
🔗
|
|
Apathy has joined #archiveteam |
18:09
🔗
|
|
afics has joined #archiveteam |
18:10
🔗
|
SketchCow |
So, FOS is really getting hammered. |
18:10
🔗
|
SketchCow |
Yes, we have 5tb free back on 1 |
18:10
🔗
|
SketchCow |
We got that by me basically shoving all the gamefront into 0 |
18:11
🔗
|
SketchCow |
So the thing is really grinding through wikis, two gamefront threads, and a couple ZIP operations for the minecraft drive I took in. |
18:11
🔗
|
|
cadbury has joined #archiveteam |
18:23
🔗
|
|
zenguy_pc has joined #archiveteam |
18:25
🔗
|
|
bwn has quit IRC (Ping timeout: 255 seconds) |
18:27
🔗
|
Atluxity |
SketchCow: Do you feel it's best to wait with gamefront, or engage? |
18:33
🔗
|
phuzion |
Does this error mean "FOS has too many open connections" or something else? |
18:33
🔗
|
phuzion |
http://pastebin.com/3Fwpnbze |
18:34
🔗
|
phuzion |
I see it timed out, but is FOS refusing to accept the connection because it's overloaded or is there something else going on with my machine? |
18:38
🔗
|
|
Start has quit IRC (Quit: Disconnected.) |
18:56
🔗
|
SketchCow |
Well, I'm mostly trying to bring the thing down to a more reasonable load. |
18:57
🔗
|
SketchCow |
Is there something OTHER than gamefront in need of the machine? |
18:57
🔗
|
SketchCow |
Otherwise, yeah, turn it on and let's see if it chokes |
18:58
🔗
|
SketchCow |
Can someone help me pull http://martyy92.hyves.nl out of our WARCs? |
18:58
🔗
|
arkiver |
SketchCow: everything that is currently going to FOS has no deadline |
18:58
🔗
|
SketchCow |
OK, then turn on gamefront |
18:59
🔗
|
arkiver |
Ok, can we also raise the max rsync connections? |
18:59
🔗
|
SketchCow |
Just quadrupled it |
19:00
🔗
|
arkiver |
Some projects with a deadline are coming up, like screenr |
19:00
🔗
|
arkiver |
Will pause GameFront for that if needed |
19:00
🔗
|
|
bwn has joined #archiveteam |
19:00
🔗
|
arkiver |
So, for the FTP grab project. |
19:01
🔗
|
arkiver |
If anyone here has some FTPs that need to be saved, please create a list |
19:01
🔗
|
godane |
i went after Rainn Wilson's twitter account using archivebot |
19:02
🔗
|
godane |
also grabbed a google cache of suckington account |
19:05
🔗
|
|
aaaaaaaaa has joined #archiveteam |
19:05
🔗
|
arkiver |
I think we can use filemare and other FTP indexers or lists on the internet to find FTPs for the grab |
19:06
🔗
|
arkiver |
Maybe give a higher priority to edu and government FTPs? |
19:07
🔗
|
aaaaaaaaa |
yes, who knows how many professors are serving FTPs for their little projects |
19:07
🔗
|
aaaaaaaaa |
not to mention how much software started as university research projects |
19:08
🔗
|
phuzion |
I have a small zmap scan of the internet on port 21 from some time ago, let me see if I can track that down |
19:08
🔗
|
phuzion |
Basically, I ran zmap on a DO instance until I got banned from DO lol |
19:08
🔗
|
xmc |
i do too, i got about 60% of ipv4 |
19:09
🔗
|
xmc |
and then i got a phonecall from upstream |
19:09
🔗
|
phuzion |
"Hey, you're generating a shitload of abuse reports" right? |
19:10
🔗
|
xmc |
yeah |
19:10
🔗
|
aaaaaaaaa |
He just wanted to see how many blacklists he could appear on |
19:15
🔗
|
godane |
i'm starting to upload my www.700wlw.com/media/play urls |
19:16
🔗
|
godane |
turns out wayback machine only has 15 urls in that path |
19:18
🔗
|
Atluxity |
I'll start moving my guns over to gamefront |
19:29
🔗
|
arkiver |
aaaaaaaaa: yes, and the very large amount of scientific data |
19:29
🔗
|
arkiver |
vk.nl |
19:29
🔗
|
arkiver |
oops, sorry |
19:44
🔗
|
|
MMovie has quit IRC (Ping timeout: 310 seconds) |
19:46
🔗
|
arkiver |
more items added to the wikis project! |
19:49
🔗
|
|
SilSte has quit IRC (Read error: Connection reset by peer) |
19:49
🔗
|
|
Silvan has joined #archiveteam |
19:51
🔗
|
|
MMovie has joined #archiveteam |
20:07
🔗
|
phuzion |
arkiver: nice, thanks. |
20:08
🔗
|
phuzion |
Damn, achip, how many threads are you running right now on wikis? |
20:09
🔗
|
phuzion |
Oh |
20:09
🔗
|
phuzion |
I should read the backlog. 200 threads. |
20:17
🔗
|
DFJustin |
http://piratepad.net/effteepees is the ftp list we were working from in #effteepee |
20:26
🔗
|
schbirid |
DFJustin: random stuff from my bookmarks ftp://ftp.cmdl.noaa.gov/ ftp://ftp.fbo.gov/ ftp://de.aminet.net/ ftp://ftp.uni-erlangen.de/ ftp://lvlmirror.mhgaming.com/ ftp://www.artfiles.org/ |
20:26
🔗
|
schbirid |
some huge, some small, some fast, some slow, some mirrors |
20:26
🔗
|
|
zenguy_pc has quit IRC (Read error: Operation timed out) |
20:33
🔗
|
|
bzc6p__ has joined #archiveteam |
20:39
🔗
|
|
zenguy_pc has joined #archiveteam |
20:39
🔗
|
|
bzc6p_ has quit IRC (Read error: Operation timed out) |
20:48
🔗
|
|
Start has joined #archiveteam |
20:54
🔗
|
schbirid |
archivebot on http://data.deutschebahn.com/ would be nice, ~30MB total. thanks! |
20:58
🔗
|
Atluxity |
it has been done |
21:00
🔗
|
schbirid |
thanks |
21:02
🔗
|
|
scyther has joined #archiveteam |
21:03
🔗
|
SketchCow |
I've now spent 5 years cleaning up the godane inbox |
21:03
🔗
|
SketchCow |
A force of nature |
21:17
🔗
|
ersi |
Is he over 500k items yet? I know he was getting near |
21:19
🔗
|
Kazzy |
494k as of yesterday, iirc |
21:22
🔗
|
|
Nemo_bis has joined #archiveteam |
21:22
🔗
|
|
jspiros has quit IRC (Ping timeout: 186 seconds) |
21:22
🔗
|
|
Nemo_bis_ has quit IRC (Ping timeout: 186 seconds) |
21:23
🔗
|
|
jspiros has joined #archiveteam |
21:24
🔗
|
godane |
495k now |
21:26
🔗
|
myself |
clue in a newbie, what's that about? |
21:26
🔗
|
DFJustin |
godane uploads a lot of things to archive.org |
21:26
🔗
|
|
bzc6p_ has joined #archiveteam |
21:27
🔗
|
myself |
oh so this is a file inbox, not email.. |
21:27
🔗
|
myself |
makes sense now, got it :) |
21:28
🔗
|
DFJustin |
A LOT of things |
21:28
🔗
|
DFJustin |
I'm not sure when he eats and sleeps |
21:28
🔗
|
godane |
i still have to do more crazy stuff later |
21:29
🔗
|
godane |
i'm about to eat right now |
21:29
🔗
|
DFJustin |
myself: https://archive.org/details/@chris85 |
21:30
🔗
|
myself |
like a true archivist, his keyboard has a repository of food crumbs going back several years, each tagged with metadata |
21:30
🔗
|
arkiver |
SketchCow: what would be a good size per item for the ftp grab? |
21:31
🔗
|
|
bzc6p__ has quit IRC (Read error: Operation timed out) |
21:40
🔗
|
|
Start has quit IRC (Quit: Disconnected.) |
21:41
🔗
|
|
jleclanch has quit IRC (Remote host closed the connection) |
21:42
🔗
|
schbirid |
godane: have a good appetite! |
21:42
🔗
|
schbirid |
as we say in germany at least |
21:44
🔗
|
|
schbirid has quit IRC (Quit: Leaving) |
21:44
🔗
|
|
Start has joined #archiveteam |
21:45
🔗
|
|
jleclanch has joined #archiveteam |
21:48
🔗
|
phuzion |
arkiver: can you do me a favor real quick and release that wiki we were talking about earlier and re-assign it to the nick phuzion-1 for me? |
21:49
🔗
|
arkiver |
sure |
21:49
🔗
|
aaaaaaaaa |
get too big? |
21:49
🔗
|
phuzion |
No, I wanna be able to monitor it in its own window. |
21:50
🔗
|
phuzion |
and I think the thread it was running under died anyways |
21:50
🔗
|
aaaaaaaaa |
may also want to run "df -h" in a loop too. |
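One way to get the effect of running "df -h" in a loop, as a minimal sketch using only the Python standard library. The function names, the 10 GiB warning threshold, and the polling interval are all assumptions chosen for illustration, not anything used in the project.

```python
import shutil
import time


def free_gib(path: str = ".") -> float:
    """Free space on the filesystem holding `path`, in GiB."""
    return shutil.disk_usage(path).free / 2**30


def watch(path: str = ".", warn_below_gib: float = 10.0,
          interval_s: float = 60.0, rounds: int = 3) -> None:
    """Print free space `rounds` times, flagging it when it runs low."""
    for _ in range(rounds):
        free = free_gib(path)
        flag = "  <-- LOW" if free < warn_below_gib else ""
        print(f"{path}: {free:.1f} GiB free{flag}")
        time.sleep(interval_s)


# Single immediate check; raise `rounds` and `interval_s` to keep polling.
watch(rounds=1, interval_s=0)
```

Pointing `path` at the directory where the WARC is being written reports on the filesystem that will actually fill up.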
21:50
🔗
|
arkiver |
mediawikieu:lt.biologija.wikia.com/api.php:lt.biologija.wikia.com/wiki/ added to user phuzion-1 |
21:50
🔗
|
phuzion |
arkiver: thanks :) |
21:50
🔗
|
arkiver |
;) |
22:05
🔗
|
|
jleclanch has quit IRC (Remote host closed the connection) |
22:10
🔗
|
|
jleclanch has joined #archiveteam |
22:12
🔗
|
|
bwn has quit IRC (Read error: Operation timed out) |
22:15
🔗
|
|
bzc6p_ is now known as bzc6p |
22:16
🔗
|
|
Start has quit IRC (Quit: Disconnected.) |
22:20
🔗
|
arkiver |
SketchCow: I'm thinking items of 200 MB for the FTP grab |
22:24
🔗
|
|
zenguy_pc has quit IRC (Read error: Operation timed out) |
22:28
🔗
|
|
bwn has joined #archiveteam |
22:30
🔗
|
|
scyther has quit IRC (Read error: Connection reset by peer) |
22:30
🔗
|
phuzion |
I have a file with 23 million IP addresses that responded on port 21, is that useful to us? |
22:30
🔗
|
phuzion |
lol |
22:30
🔗
|
phuzion |
Granted, the data is from August of 2014 |
22:31
🔗
|
phuzion |
But maybe we can start working with that? |
22:35
🔗
|
|
BlueMaxim has joined #archiveteam |
22:39
🔗
|
|
zenguy_pc has joined #archiveteam |
22:44
🔗
|
|
rxhivert has joined #archiveteam |
22:44
🔗
|
|
rxhivert has quit IRC (Connection closed) |
22:45
🔗
|
|
rxhivert has joined #archiveteam |
22:45
🔗
|
rxhivert |
can somebody clear the queue for rxhivert, rxhivert2 and rxhivert5? |
22:45
🔗
|
rxhivert |
had to reboot server |
22:46
🔗
|
rxhivert |
(game front grab) |
22:52
🔗
|
|
jleclanch has quit IRC (Remote host closed the connection) |
22:53
🔗
|
|
jleclanch has joined #archiveteam |
22:55
🔗
|
phuzion |
In case anyone wants 23 million IPs that responded on port 21 about 14 months ago, check this file out: http://irc.teh-server.com/files/ftpsites.gz |
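A hedged sketch of how a list like this could be cleaned before use. The layout of ftpsites.gz is not described in the chat, so this assumes one IPv4 address per line; `valid_ipv4` and `read_ftp_hosts` are hypothetical helper names.

```python
import gzip
import ipaddress
from typing import Iterable, Iterator


def valid_ipv4(lines: Iterable[str]) -> Iterator[str]:
    """Yield each distinct, well-formed IPv4 address in first-seen order."""
    seen = set()
    for line in lines:
        try:
            addr = ipaddress.IPv4Address(line.strip())
        except ValueError:
            continue  # blank line or something that is not an IPv4 address
        if addr not in seen:
            seen.add(addr)
            yield str(addr)


def read_ftp_hosts(path: str) -> Iterator[str]:
    """Stream addresses from a gzip file (assumed: one address per line)."""
    with gzip.open(path, "rt", encoding="ascii", errors="replace") as handle:
        yield from valid_ipv4(handle)


# Documentation-range addresses stand in for real data here.
sample = ["192.0.2.10\n", "not-an-ip\n", "192.0.2.10\n", "198.51.100.7\n", "\n"]
print(list(valid_ipv4(sample)))  # ['192.0.2.10', '198.51.100.7']
```

The deduplication set holds every address seen so far, so for 23 million entries it needs a fair amount of memory; if the file is already unique, the set can be dropped.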
23:18
🔗
|
|
Ravenloft has joined #archiveteam |
23:30
🔗
|
SketchCow |
I want that. |
23:31
🔗
|
|
rxhivert has quit IRC (Quit: rxhivert) |
23:32
🔗
|
joepie91 |
lol |
23:36
🔗
|
phuzion |
SketchCow: Take it and enjoy :) |
23:56
🔗
|
|
Start has joined #archiveteam |