#archiveteam 2015-11-05,Thu

Time Nickname Message
00:02 🔗 BlueMaxim has joined #archiveteam
00:54 🔗 SimpBrain has quit IRC (Read error: Operation timed out)
01:01 🔗 xk_id has joined #archiveteam
01:02 🔗 godane has quit IRC (Read error: Operation timed out)
01:23 🔗 SimpBrain has joined #archiveteam
01:31 🔗 Ghost_of_ has quit IRC (Quit: Leaving)
01:38 🔗 zenguy_pc has quit IRC (Read error: Operation timed out)
01:53 🔗 zenguy_pc has joined #archiveteam
02:20 🔗 bwn has quit IRC (Read error: Operation timed out)
02:34 🔗 JesseW has joined #archiveteam
02:35 🔗 Ravenloft has quit IRC (Remote host closed the connection)
02:47 🔗 primus104 has quit IRC (Leaving.)
03:13 🔗 JesseW has quit IRC (Leaving.)
03:41 🔗 zenguy_pc has quit IRC (Read error: Operation timed out)
03:52 🔗 wyatt8750 has quit IRC (Remote host closed the connection)
03:52 🔗 zenguy_pc has joined #archiveteam
03:53 🔗 wyatt8750 has joined #archiveteam
04:00 🔗 bwn has joined #archiveteam
04:10 🔗 dashcloud has quit IRC (Ping timeout: 252 seconds)
04:14 🔗 dashcloud has joined #archiveteam
04:38 🔗 aaaaaaaaa has quit IRC (Leaving)
04:43 🔗 JesseW has joined #archiveteam
05:42 🔗 zenguy_pc has quit IRC (Read error: Operation timed out)
05:53 🔗 zenguy_pc has joined #archiveteam
06:02 🔗 WinterFox has joined #archiveteam
06:08 🔗 dashcloud has quit IRC (Read error: Operation timed out)
06:11 🔗 dashcloud has joined #archiveteam
06:42 🔗 cvb has joined #archiveteam
07:04 🔗 godane has joined #archiveteam
07:12 🔗 insane_al has quit IRC (Remote host closed the connection)
07:40 🔗 zenguy_pc has quit IRC (Read error: Operation timed out)
07:46 🔗 primus104 has joined #archiveteam
07:54 🔗 zenguy_pc has joined #archiveteam
07:56 🔗 zer0c00l has joined #archiveteam
08:47 🔗 Cameron_D has joined #archiveteam
08:52 🔗 JesseW has quit IRC (Leaving.)
08:54 🔗 bzc6p_ has joined #archiveteam
08:57 🔗 bzc6p has quit IRC (Read error: Operation timed out)
08:57 🔗 xk_id has quit IRC (Remote host closed the connection)
09:05 🔗 bwn has quit IRC (Read error: Operation timed out)
09:30 🔗 atomotic has joined #archiveteam
09:31 🔗 xk_id has joined #archiveteam
09:32 🔗 Ghost_of_ has joined #archiveteam
09:36 🔗 schbirid has joined #archiveteam
09:36 🔗 arkiver New items added to the wiki grab!
09:37 🔗 arkiver I'll do my best to get the yuku project running with new items today
09:39 🔗 primus104 has quit IRC (Leaving.)
09:39 🔗 zenguy_pc has quit IRC (Read error: Operation timed out)
09:52 🔗 bwn has joined #archiveteam
10:00 🔗 BlueMaxim has quit IRC (Read error: Connection reset by peer)
10:47 🔗 ohhdemgir has quit IRC (Ping timeout: 252 seconds)
11:25 🔗 Kazzy spinning up some wiki pipelines, been a while since i've thrown some power around
11:46 🔗 primus104 has joined #archiveteam
11:49 🔗 Ungstein has quit IRC (Quit: Leaving.)
11:50 🔗 Ungstein has joined #archiveteam
12:00 🔗 Atluxity I redirected my juice to wikis-grab too
12:06 🔗 atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
12:06 🔗 atomotic has joined #archiveteam
12:28 🔗 WinterFox has quit IRC (Remote host closed the connection)
12:48 🔗 Ghost_of_ has quit IRC (Quit: Leaving)
13:56 🔗 Darkstar has quit IRC (Ping timeout: 506 seconds)
14:05 🔗 Darkstar has joined #archiveteam
14:32 🔗 scyther has joined #archiveteam
14:35 🔗 SketchCow The new IA search engine is working better than expected, and collections (when I move them) show up within a couple minutes now.
14:40 🔗 atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
14:45 🔗 Deewiant has quit IRC (Ping timeout: 198 seconds)
14:46 🔗 Atluxity has quit IRC (Ping timeout: 370 seconds)
14:47 🔗 Deewiant has joined #archiveteam
14:47 🔗 Atluxity has joined #archiveteam
14:49 🔗 Atluxity nice
14:50 🔗 nertzy has joined #archiveteam
14:54 🔗 godane cool
15:08 🔗 Jonimus woot
15:08 🔗 SketchCow Yeah, it's blazing. I'm throwing a lot of shit at it.
15:15 🔗 Start has quit IRC (Quit: Disconnected.)
15:15 🔗 phuzion I've got a wiki that I'm grabbing external items for that's got 540K pages. This should be a fun grab.
15:16 🔗 phuzion My entire tmux backlog for that page is "Found and queued 500 URLs, continuing..."
15:16 🔗 Atluxity wow
15:18 🔗 primus104 has quit IRC (Leaving.)
15:25 🔗 nertzy has quit IRC (Quit: This computer has gone to sleep)
15:29 🔗 arkiver phuzion: which wiki is that?
15:29 🔗 arkiver or which item?
15:31 🔗 xk_id has quit IRC (Remote host closed the connection)
15:34 🔗 phuzion arkiver: http://lt.biologija.wikia.com/wiki/
15:34 🔗 phuzion 583K pages, rather
15:34 🔗 arkiver so many external links.... http://lt.biologija.wikia.com/wiki/Ragys
15:35 🔗 phuzion Hooooooooly shit ok maybe I won't be able to handle that item.
15:36 🔗 arkiver if you have enough time and enough space, why not?
15:36 🔗 phuzion 100GB disk on that box.
15:36 🔗 phuzion I could easily see the WARC being 100GB+ for that.
15:36 🔗 arkiver We'll see I guess
15:37 🔗 phuzion Yeah, I'll keep an eye on my disk usage for a day or two and see how it goes.
15:37 🔗 arkiver ok
15:42 🔗 vitzli has joined #archiveteam
15:54 🔗 Start has joined #archiveteam
16:05 🔗 Start has quit IRC (Read error: Operation timed out)
16:11 🔗 godane https://archive.org/details/pris-the-world
16:12 🔗 arkiver godane: nice
16:29 🔗 Start has joined #archiveteam
16:31 🔗 scyther has quit IRC (Leaving)
16:38 🔗 JesseW has joined #archiveteam
16:40 🔗 bithippo has joined #archiveteam
16:41 🔗 bithippo Hello #archiveteam! Would someone be able to drop http://www.mfat.govt.nz/Treaties-and-International-Law/01-Treaties-for-which-NZ-is-Depositary/0-Trans-Pacific-Partnership-Text.php into archivebot when they have a moment? Thank you!!
16:42 🔗 DFJustin bithippo: done
16:43 🔗 arkiver2 has joined #archiveteam
16:43 🔗 bithippo @DFJustin: :bow:
16:44 🔗 bithippo Thanks again, ArchiveBot dashboard shows it being done super quick.
16:45 🔗 arkiver2 has quit IRC (Client Quit)
16:57 🔗 JesseW has quit IRC (Leaving.)
17:00 🔗 Start has quit IRC (Quit: Disconnected.)
17:02 🔗 bithippo has quit IRC (Quit: Page closed)
17:14 🔗 Atluxity do we know who "chip" in the wikis-grab is?
17:14 🔗 achip howdy
17:14 🔗 Atluxity right
17:14 🔗 Atluxity you just added a lot of instances to the grab?
17:15 🔗 achip 20x10
17:15 🔗 achip 10 concurrent that is
17:18 🔗 Start has joined #archiveteam
17:19 🔗 Atluxity so I am wondering why the items per hour is less now
17:20 🔗 achip looked like there was a little bit of an rsync bottleneck for a bit
17:20 🔗 Atluxity ah, ok
17:29 🔗 primus104 has joined #archiveteam
17:29 🔗 vitzli has quit IRC (Quit: Leaving)
17:35 🔗 Kazzy yeah sorry, that was caused by me :(
17:44 🔗 Atluxity Kazzy: how?
17:44 🔗 Atluxity just trying to understand stuff here
17:45 🔗 Kazzy rsync host is capped at 25 connections (FOS i think, logs flying too fast for me to check at this minute)
17:45 🔗 Atluxity sounds right
17:45 🔗 Kazzy i'm running multiple instances, accidentally had 4 upload slots running for each, ended up eating a ton of the connections
17:45 🔗 Atluxity aha
17:49 🔗 yipdw fos has 4.9 TB free on the gamefront mount, so we might be able to ramp it up again
17:49 🔗 yipdw SketchCow's call
17:49 🔗 yipdw its load average is also back to "oh that looks ok"
17:50 🔗 afics has quit IRC (Read error: Operation timed out)
17:50 🔗 Apathy has quit IRC (Read error: Operation timed out)
17:51 🔗 cadbury has quit IRC (Read error: Operation timed out)
17:54 🔗 matthusby has quit IRC (Ping timeout: 606 seconds)
17:54 🔗 matthusby has joined #archiveteam
17:56 🔗 achip if you need a log file to look at http://54.85.211.51/, and search "rsync error"
17:56 🔗 Kazzy 25 indeed, thanks achip
17:58 🔗 Cameron_D has quit IRC (Ping timeout: 606 seconds)
18:09 🔗 Apathy has joined #archiveteam
18:09 🔗 afics has joined #archiveteam
18:10 🔗 SketchCow So, FOS is really getting hammered.
18:10 🔗 SketchCow Yes, we have 5tb free back on 1
18:10 🔗 SketchCow We got that by me basically shoving all the gamefront into 0
18:11 🔗 SketchCow So the thing is really grinding through wikis, two gamefront threads, and a couple ZIP operations for the minecraft drive I took in.
18:11 🔗 cadbury has joined #archiveteam
18:23 🔗 zenguy_pc has joined #archiveteam
18:25 🔗 bwn has quit IRC (Ping timeout: 255 seconds)
18:27 🔗 Atluxity SketchCow: Do you feel it's best to wait with gamefront, or engage?
18:33 🔗 phuzion Does this error mean "FOS has too many open connections" or something else?
18:33 🔗 phuzion http://pastebin.com/3Fwpnbze
18:34 🔗 phuzion I see it timed out, but is FOS refusing to accept the connection because it's overloaded or is there something else going on with my machine?
18:38 🔗 Start has quit IRC (Quit: Disconnected.)
18:56 🔗 SketchCow Well, I'm mostly trying to bring the thing down to a more reasonable load.
18:57 🔗 SketchCow Is there something OTHER than gamefront in need of the machine?
18:57 🔗 SketchCow Otherwise, yeah, turn it on and let's see if it chokes
18:58 🔗 SketchCow Can someone help me pull http://martyy92.hyves.nl out of our WARCs?
18:58 🔗 arkiver SketchCow: everything that is currently going to FOS has no deadline
18:58 🔗 SketchCow OK, then turn on gamefront
18:59 🔗 arkiver Ok, can we also raise the max rsync connections?
18:59 🔗 SketchCow Just quadrupled it
19:00 🔗 arkiver Some projects with a deadline are coming up, like screenr
19:00 🔗 arkiver Will pause GameFront for that if needed
19:00 🔗 bwn has joined #archiveteam
19:00 🔗 arkiver So, for the FTP grab project.
19:01 🔗 arkiver If anyone here has some FTPs that need to be saved, please create a list
19:01 🔗 godane i went after rain wilson twitter account using archivebox
19:02 🔗 godane also grabbed a google cache of suckington account
19:05 🔗 aaaaaaaaa has joined #archiveteam
19:05 🔗 arkiver I think we can use filemare and other FTP indexers or lists on the internet to find FTPs for the grab
19:06 🔗 arkiver Maybe give a higher priority to edu and government FTPs?
19:07 🔗 aaaaaaaaa yes, who knows how many professors are serving FTPs for their little projects
19:07 🔗 aaaaaaaaa not to mention how much in software started as university research projects
19:08 🔗 phuzion I have a small zmap scan of the internet on port 21 from some time ago, let me see if I can track that down
19:08 🔗 phuzion Basically, I ran zmap on a DO instance until I got banned from DO lol
19:08 🔗 xmc i do too, i got about 60% of ipv4
19:09 🔗 xmc and then i got a phonecall from upstream
19:09 🔗 phuzion "Hey, you're generating a shitload of abuse reports" right?
19:10 🔗 xmc yeah
19:10 🔗 aaaaaaaaa He just wanted to see how many blacklists he could appear on
19:15 🔗 godane i'm starting to upload my www.700wlw.com/media/play urls
19:16 🔗 godane turns out wayback machine only has 15 urls in that path
19:18 🔗 Atluxity I'll start moving my guns over to gamefront
19:29 🔗 arkiver aaaaaaaaa: yes, and the very large amount of scientific data
19:29 🔗 arkiver vk.nl
19:29 🔗 arkiver oops, sorry
19:44 🔗 MMovie has quit IRC (Ping timeout: 310 seconds)
19:46 🔗 arkiver more items added to the wikis project!
19:49 🔗 SilSte has quit IRC (Read error: Connection reset by peer)
19:49 🔗 Silvan has joined #archiveteam
19:51 🔗 MMovie has joined #archiveteam
20:07 🔗 phuzion arkiver: nice, thanks.
20:08 🔗 phuzion Damn, achip, how many threads are you running right now on wikis?
20:09 🔗 phuzion Oh
20:09 🔗 phuzion I should read the backlog. 200 threads.
20:17 🔗 DFJustin http://piratepad.net/effteepees is the ftp list we were working from in #effteepee
20:26 🔗 schbirid DFJustin: random stuff from my bookmarks ftp://ftp.cmdl.noaa.gov/ ftp://ftp.fbo.gov/ ftp://de.aminet.net/ ftp://ftp.uni-erlangen.de/ ftp://lvlmirror.mhgaming.com/ ftp://www.artfiles.org/
20:26 🔗 schbirid some huge, some small, some fast, some slow, some mirrors
20:26 🔗 zenguy_pc has quit IRC (Read error: Operation timed out)
20:33 🔗 bzc6p__ has joined #archiveteam
20:39 🔗 zenguy_pc has joined #archiveteam
20:39 🔗 bzc6p_ has quit IRC (Read error: Operation timed out)
20:48 🔗 Start has joined #archiveteam
20:54 🔗 schbirid archivebot on http://data.deutschebahn.com/ would be nice, ~30MB total. thanks!
20:58 🔗 Atluxity it has been done
21:00 🔗 schbirid thanks
21:02 🔗 scyther has joined #archiveteam
21:03 🔗 SketchCow I've now spent 5 years cleaning up the godane inbox
21:03 🔗 SketchCow A force of nature
21:17 🔗 ersi Is he over 500k items yet? I know he was getting near
21:19 🔗 Kazzy 494k as of yesterday, iirc
21:22 🔗 Nemo_bis has joined #archiveteam
21:22 🔗 jspiros has quit IRC (Ping timeout: 186 seconds)
21:22 🔗 Nemo_bis_ has quit IRC (Ping timeout: 186 seconds)
21:23 🔗 jspiros has joined #archiveteam
21:24 🔗 godane 495k now
21:26 🔗 myself clue in a newbie, what's that about?
21:26 🔗 DFJustin godane uploads a lot of things to archive.org
21:26 🔗 bzc6p_ has joined #archiveteam
21:27 🔗 myself oh so this is a file inbox, not email..
21:27 🔗 myself makes sense now, got it :)
21:28 🔗 DFJustin A LOT of things
21:28 🔗 DFJustin I'm not sure when he eats and sleeps
21:28 🔗 godane i still have to do more crazy stuff later
21:29 🔗 godane i'm about to eat right now
21:29 🔗 DFJustin myself: https://archive.org/details/@chris85
21:30 🔗 myself like a true archivist, his keyboard has a repository of food crumbs going back several years, each tagged with metadata
21:30 🔗 arkiver SketchCow: what would be a good size per item for the ftp grab?
21:31 🔗 bzc6p__ has quit IRC (Read error: Operation timed out)
21:40 🔗 Start has quit IRC (Quit: Disconnected.)
21:41 🔗 jleclanch has quit IRC (Remote host closed the connection)
21:42 🔗 schbirid godane: have a good appetite!
21:42 🔗 schbirid as we say in germany at least
21:44 🔗 schbirid has quit IRC (Quit: Leaving)
21:44 🔗 Start has joined #archiveteam
21:45 🔗 jleclanch has joined #archiveteam
21:48 🔗 phuzion arkiver: can you do me a favor real quick and release that wiki we were talking about earlier and re-assign it to the nick phuzion-1 for me?
21:49 🔗 arkiver sure
21:49 🔗 aaaaaaaaa get too big?
21:49 🔗 phuzion No, I wanna be able to monitor it in its own window.
21:50 🔗 phuzion and I think the thread it was running under died anyways
21:50 🔗 aaaaaaaaa may also want to run "df -h" in a loop too.
21:50 🔗 arkiver mediawikieu:lt.biologija.wikia.com/api.php:lt.biologija.wikia.com/wiki/ added to user phuzion-1
21:50 🔗 phuzion arkiver: thanks :)
21:50 🔗 arkiver ;)
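
Editorial sketch, not part of the log: the suggestion above to run "df -h" in a loop could look roughly like this in Python. The function names, the 10 GiB threshold and the 60 second interval are illustrative assumptions, not anything the channel specified.

```python
import shutil
import time


def free_gib(path="/"):
    """Return free space on the filesystem holding `path`, in GiB."""
    return shutil.disk_usage(path).free / 2**30


def watch_disk(path="/", min_free_gib=10.0, interval_s=60,
               max_checks=None, sleep=time.sleep):
    """Print free space periodically, much like `df -h` in a loop.

    Returns False as soon as free space drops below `min_free_gib`
    (an assumed threshold), or True after `max_checks` healthy checks.
    With max_checks=None it keeps checking until the threshold is hit.
    """
    checks = 0
    while max_checks is None or checks < max_checks:
        free = free_gib(path)
        print(f"{path}: {free:.1f} GiB free")
        if free < min_free_gib:
            print("low on disk, consider pausing the grab")
            return False
        checks += 1
        sleep(interval_s)
    return True
```

Calling watch_disk("/") on the grab box would poll once a minute until free space falls under the threshold; the path to watch is whatever mount holds the WARC output.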
22:05 🔗 jleclanch has quit IRC (Remote host closed the connection)
22:10 🔗 jleclanch has joined #archiveteam
22:12 🔗 bwn has quit IRC (Read error: Operation timed out)
22:15 🔗 bzc6p_ is now known as bzc6p
22:16 🔗 Start has quit IRC (Quit: Disconnected.)
22:20 🔗 arkiver SketchCow: I'm thinking items of 200 MB for the FTP grab
22:24 🔗 zenguy_pc has quit IRC (Read error: Operation timed out)
22:28 🔗 bwn has joined #archiveteam
22:30 🔗 scyther has quit IRC (Read error: Connection reset by peer)
22:30 🔗 phuzion I have a file with 23 million IP addresses that responded on port 21, is that useful to us?
22:30 🔗 phuzion lol
22:30 🔗 phuzion Granted, the data is from August of 2014
22:31 🔗 phuzion But maybe we can start working with that?
22:35 🔗 BlueMaxim has joined #archiveteam
22:39 🔗 zenguy_pc has joined #archiveteam
22:44 🔗 rxhivert has joined #archiveteam
22:44 🔗 rxhivert has quit IRC (Connection closed)
22:45 🔗 rxhivert has joined #archiveteam
22:45 🔗 rxhivert can somebody clear the queue for rxhivert, rxhivert2 and rxhivert5?
22:45 🔗 rxhivert had to reboot server
22:46 🔗 rxhivert (game front grab)
22:52 🔗 jleclanch has quit IRC (Remote host closed the connection)
22:53 🔗 jleclanch has joined #archiveteam
22:55 🔗 phuzion In case anyone wants 23 million IPs that responded on port 21 about 14 months ago, check this file out: http://irc.teh-server.com/files/ftpsites.gz
23:18 🔗 Ravenloft has joined #archiveteam
23:30 🔗 SketchCow I want that.
23:31 🔗 rxhivert has quit IRC (Quit: rxhivert)
23:32 🔗 joepie91 lol
23:36 🔗 phuzion SketchCow: Take it and enjoy :)
23:56 🔗 Start has joined #archiveteam
