#archiveteam 2016-10-31,Mon


Time Nickname Message
00:01 🔗 benuski has quit IRC (Quit: Leaving)
00:18 🔗 maelstrom has quit IRC (Quit: Leaving)
00:22 🔗 maelstrom has joined #archiveteam
00:26 🔗 hive-mind has quit IRC (Ping timeout: 260 seconds)
00:26 🔗 hive-mind has joined #archiveteam
00:31 🔗 jrwr has quit IRC (Remote host closed the connection)
00:33 🔗 jrwr has joined #archiveteam
00:49 🔗 powerKitt has joined #archiveteam
00:59 🔗 powerKitt has quit IRC (Quit: Page closed)
01:06 🔗 JesseW has joined #archiveteam
01:45 🔗 kristian_ has quit IRC (Quit: Leaving)
02:01 🔗 ravetcofx has quit IRC (Ping timeout: 506 seconds)
02:10 🔗 ravetcofx has joined #archiveteam
02:24 🔗 rudolphos has joined #archiveteam
02:25 🔗 jrwr has quit IRC (Remote host closed the connection)
02:29 🔗 rudolphos has quit IRC (Leaving)
02:52 🔗 ndiddy has quit IRC (Quit: Leaving)
02:53 🔗 Froggypwn has quit IRC (Read error: Operation timed out)
02:53 🔗 Froggypwn has joined #archiveteam
02:53 🔗 BlueMaxim has quit IRC (Read error: Operation timed out)
02:54 🔗 BlueMaxim has joined #archiveteam
03:54 🔗 GLaDOS has quit IRC (Quit: Oh crap, I died.)
04:18 🔗 maelstrom has quit IRC (Remote host closed the connection)
05:12 🔗 balrog has quit IRC (Ping timeout: 260 seconds)
05:21 🔗 Sk1d has quit IRC (Ping timeout: 194 seconds)
05:27 🔗 Sk1d has joined #archiveteam
05:52 🔗 Start has quit IRC (Quit: Disconnected.)
05:55 🔗 Start has joined #archiveteam
06:15 🔗 balrog has joined #archiveteam
06:15 🔗 swebb sets mode: +o balrog
06:30 🔗 JesseW has quit IRC (Ping timeout: 370 seconds)
08:00 🔗 Observer has quit IRC (Ping timeout: 268 seconds)
08:15 🔗 WinterFox has joined #archiveteam
08:36 🔗 khaoohs_ has quit IRC (Read error: Connection reset by peer)
08:37 🔗 khaoohs_ has joined #archiveteam
08:42 🔗 W1nterFox has joined #archiveteam
08:48 🔗 WinterFox has quit IRC (Read error: Operation timed out)
09:31 🔗 Medowar0 who was doing the home.arcor.de discovery? Some more google scraping, again, raw output, no dedup etc. https://www.medowar.de/lab/at/arcor/liste2.txt
10:04 🔗 PurpleSym That would be me, Medowar0.
10:05 🔗 ravetcofx has quit IRC (Ping timeout: 506 seconds)
10:19 🔗 BlueMaxim has quit IRC (Quit: Leaving)
10:43 🔗 antomati_ has joined #archiveteam
10:43 🔗 swebb sets mode: +o antomati_
10:49 🔗 antomatic has quit IRC (Read error: Operation timed out)
11:39 🔗 Budgiebra has joined #archiveteam
11:53 🔗 Medowar0 rip. DNShistory is now officially offline. I was crawling it very slowly, but it is now officially dead.
12:25 🔗 Budgiebra has left
12:32 🔗 khaoohs__ has joined #archiveteam
12:34 🔗 khaoohs_ has quit IRC (Read error: Operation timed out)
12:57 🔗 tatata has joined #archiveteam
12:58 🔗 tatata has quit IRC (Client Quit)
13:23 🔗 bRick5772 has joined #archiveteam
13:39 🔗 W1nterFox has quit IRC (Read error: Operation timed out)
13:51 🔗 sep332 has joined #archiveteam
14:25 🔗 ndiddy has joined #archiveteam
14:26 🔗 ndizzle has joined #archiveteam
14:26 🔗 ndizzle has quit IRC (Read error: Connection reset by peer)
15:25 🔗 arkiver sets mode: +o HCross
15:48 🔗 JesseW has joined #archiveteam
16:08 🔗 RichardG has joined #archiveteam
16:09 🔗 atomotic has joined #archiveteam
16:33 🔗 atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
16:48 🔗 JesseW has quit IRC (Ping timeout: 370 seconds)
17:16 🔗 SketchCow Define "Megawarc seems stuck"
17:19 🔗 SketchCow Greetings, I'm home
17:19 🔗 BartoCH has quit IRC (Ping timeout: 260 seconds)
17:19 🔗 SketchCow I expect to drink 45 5-hour energy drinks and go through our (and other) backlogs
17:19 🔗 BartoCH has joined #archiveteam
17:19 🔗 SketchCow POSTIMAGE
17:19 🔗 SketchCow HELLO POSTIMAGE
17:20 🔗 kristian_ has joined #archiveteam
17:21 🔗 SketchCow Hello ArchiveTeam,
17:21 🔗 SketchCow Our project hosts over 140 million images used in ~450k websites all over the web, including a number of vibrant communities and bulletin boards.
17:21 🔗 SketchCow We have recently found ourselves in financial dire straits, and would like to investigate the opportunities for archiving our collection in case we do not survive after all [although there's still a good chance that we do]. Our total image database is nearly 100Tb large, but almost 40% of that is adult imagery that we believe can be safely sacrificed.
17:21 🔗 SketchCow What do you think about this?
17:21 🔗 xmc let's take it
17:21 🔗 xmc it's uh
17:22 🔗 SketchCow I'm going to write them about it
17:22 🔗 SketchCow And request a phone call, etc.
17:22 🔗 xmc 20x as big as gitorious, and i was the only person willing to host gitorious
17:22 🔗 SketchCow I just want hard drives
17:22 🔗 xmc aye
17:27 🔗 JW_work has joined #archiveteam
17:33 🔗 SketchCow Anyway, that's on the mantle
17:39 🔗 powerKitt has joined #archiveteam
17:39 🔗 arkiver SketchCow: I think they announced on their page they're not in trouble anymore
17:40 🔗 SketchCow Which, Postimage?
17:47 🔗 DFJustin won't somebody think of the adult imagery
17:49 🔗 arkiver http://postimage.org/
17:49 🔗 arkiver yeah
17:50 🔗 arkiver but it looks like it's changed/removed now
17:50 🔗 arkiver do we still want to grab it?
17:58 🔗 powerKitt Looks like they're still saying they're in danger of closing.
18:00 🔗 Aranje has joined #archiveteam
18:04 🔗 PepsiMax has joined #archiveteam
18:09 🔗 bwn has quit IRC (Read error: Operation timed out)
18:13 🔗 SketchCow They've mailed me and I'm working to get a con call
18:13 🔗 xmc rad
18:16 🔗 Kenshin SketchCow: i'm going to discuss internally to see if we can step in to help postimage with our cdn
18:20 🔗 SketchCow You got it.
18:20 🔗 SketchCow But ideally they just send us 50 hard drives.
18:20 🔗 SketchCow We have tons of hard drives.
18:21 🔗 Kaz jesus, reading their blog post
18:21 🔗 Kenshin true, but then project goes into read only mode over at archive.org. not a bad thing if it's kept alive instead
18:22 🔗 Kenshin Kaz: there's plenty of other traffic heavy sites that are hiding behind cloudflare, just a matter of time before they get snuffed out
18:22 🔗 Kaz I guess
18:22 🔗 Kenshin most people just assume cloudflare would offer them free bw forever
18:22 🔗 xmc lol
18:22 🔗 Kaz just the fact that they really had no plans at all, "We didn't pay enough attention to making money off Postimage"
18:22 🔗 Kenshin they're probably developers > business people
18:30 🔗 bwn has joined #archiveteam
18:40 🔗 bwn has quit IRC (Ping timeout: 244 seconds)
18:47 🔗 bwn has joined #archiveteam
18:51 🔗 Nemo_bis Kenshin: do you mean heroku can provide a cheaper option when cloudflare stops paying all the bills?
18:51 🔗 Kenshin Nemo_bis: don't get you
18:53 🔗 Yoshimura There are a lot of cheaper providers.
18:53 🔗 Yoshimura The only question is if they can handle the load.
18:57 🔗 cadbury_ has quit IRC (Read error: Operation timed out)
18:57 🔗 yipdw i have a lot of nice things to say about Heroku but "cheap" is not one of them
18:58 🔗 yipdw well, maybe at company budgets it is
18:58 🔗 yipdw on individual scale though
19:05 🔗 ravetcofx has joined #archiveteam
19:06 🔗 bsmith093 has quit IRC (Read error: Operation timed out)
19:08 🔗 cadbury_ has joined #archiveteam
19:25 🔗 BlackoutI has joined #archiveteam
19:27 🔗 BlackoutI has left
19:27 🔗 Blackout has joined #archiveteam
19:28 🔗 Blackout So is there any point in setting my warrior to vine rn?
19:31 🔗 Yoshimura Blackout: Nope. Use Archiveteam's choice
19:32 🔗 Yoshimura That's always the best, unless you are around your instance all the time and obsessed.
19:37 🔗 Blackout @Yoshimura I have a gigabit line and I figured I'd probably want to run multiple projects. I'm assuming a ton of warriors is inefficient?
19:39 🔗 Yoshimura Glad you ask, running a lot of VMs, is inefficient, yes. Warrior itself is inefficient, yes. So you got it squared.
19:39 🔗 Yoshimura Simplest thing is to run multiple warriors with modified code in a docker.
19:39 🔗 xmc also. you shouldn't run more than one warrior per IP
19:39 🔗 Yoshimura xmc: Why not?
19:39 🔗 Blackout Even for different projects?
19:39 🔗 xmc if you stack multiple warriors on the same ip address, you're twice as likely to get ip-banned
19:39 🔗 xmc etc
19:40 🔗 xmc for the same project, that is
19:40 🔗 Yoshimura Yeah, ip bans are obvious thing.
19:40 🔗 Blackout So basically only have one on auto
19:40 🔗 xmc because we try to make it so that each warrior sneaks in under IP bans by whoever we're archiving
19:40 🔗 Yoshimura Blackout: Is that Linux?
19:40 🔗 Blackout Well my main box is Win 10 With hyperV but I have an ESXi host as well on my lan
19:42 🔗 Blackout I would say I'm new here but I've been around once before for a tracker I can't quite recall the name of
19:42 🔗 Yoshimura Linux host and Warrior dockers (just forward UI to different port), one per project. And use mounting parameters: relatime. And ideally also writeback (need filesystem tweak first). Or mount /data in the containers to a ramdisk.
19:42 🔗 Yoshimura Each wget thread competes for IO, plus syslog, so it is pretty inefficient without tweaks (which the VM has)
19:43 🔗 Blackout How much data do they download before shipping it back?
19:48 🔗 bsmith093 has joined #archiveteam
19:49 🔗 Blackout o/
19:49 🔗 Kaz you could also just run the scripts for each project if you're comfortable with that. means you can run a lot higher in terms of concurrency etc
19:50 🔗 Yoshimura You can just modify single number in code and it will work for Warrior also, but you need to watch the IP bans.
19:51 🔗 Yoshimura Like Panoramio will not care, but with more threads and process context switching you do not get much performance above about 10-20 threads.
19:52 🔗 Blackout What kind of disk space should I allocate though?
19:57 🔗 Yoshimura Depends on project. Panoramio items are small, so running that off tmpfs is fine. tmpfs can swap if needed. I run on gigabytes, but keep close eye, and I run on 100Mbit line.
19:58 🔗 Yoshimura Panoramio needs few dozen MB per thread.
19:58 🔗 SketchCow The Internet Archive S3 infrastructure just got a boost
19:58 🔗 Blackout They use S3?
20:00 🔗 Yoshimura Great to hear.
20:00 🔗 Yoshimura Blackout: S3 is API. S3 compatible products are not a rare thing.
20:01 🔗 Blackout Oh ok right that's a widely adopted api. Gotcha
20:05 🔗 SketchCow We tend to use "S3-like" but most people in here get it. It's the moving of the term from S3 as a Amazon brand and "S3" as a format.
20:05 🔗 SketchCow There was an FTP company once, after all
20:05 🔗 SketchCow Fuck those guys
20:07 🔗 Frogging TIL
20:09 🔗 Kaz what's the 'boost'?
20:13 🔗 SketchCow Additional 16cpu machine with 10gig connection
20:13 🔗 SketchCow I mean, you and Kenshin are going to assault it to within an inch of its life anyway
20:13 🔗 SketchCow but there it is
20:14 🔗 Kaz whee
20:22 🔗 Blackout Is that an ingest server you're talking about @SketchCow ?
20:22 🔗 godane i thought was just me 'assault it to within an inch of its life' :P
20:23 🔗 SketchCow I think you all can be blamed
20:23 🔗 SketchCow You're all monsters
20:34 🔗 xmc the kleenexing of amazon
20:46 🔗 SketchCow http://prawfsblawg.blogs.com/.a/6a00d8341c6a7953ef0134851907f7970c-500wi
20:48 🔗 Aoede https://www.adobe.com/legal/permissions/trademarks.html
20:48 🔗 xmc Aoede: what?
20:49 🔗 Aoede Adobe has same problem with trademarks
20:49 🔗 xmc oh
20:49 🔗 powerKitt Specifically, with the usage of "photoshop" to mean "edit an image"
20:49 🔗 Aoede "Correct: The image was enhanced using Adobe® Photoshop® software."
20:50 🔗 Aoede " Incorrect: The image was photoshopped."
20:50 🔗 powerKitt "Incorrect: My hobby is photoshopping."
20:50 🔗 Blackout I love that
20:50 🔗 SketchCow Adobe: OUR NEW BARN DOOR IS GOING TO COMPLETELY CONTAIN THE ESCAPED HORSE
20:50 🔗 Blackout Good luck Adobe
20:50 🔗 powerKitt "Incorrect: The photoshop pokes fun at the Senator."
20:50 🔗 xmc is it better or worse if i call it a shoop
20:51 🔗 SketchCow Better
20:51 🔗 xmc gr8
20:52 🔗 powerKitt shoop the woop
20:54 🔗 maelstrom has joined #archiveteam
21:01 🔗 SketchCow Postimage guy gave me his skype.
21:01 🔗 SketchCow We'll talk
21:10 🔗 Blackout How do you set max rsync jobs with run-pipeline?
21:12 🔗 Start has quit IRC (Read error: Connection reset by peer)
21:13 🔗 Start has joined #archiveteam
21:19 🔗 Nemo_bis Blackout: that's just a legal safeguard for the trademark
21:20 🔗 Nemo_bis Just like kleenex tries not to lose the trademark due to the word becoming a common noun
21:21 🔗 Nemo_bis Wikimedia Foundation makes the same stupid request for that reason. </endlegalintermezzo>
21:24 🔗 xmc woop woop woop off-topic siren
21:27 🔗 HCross2 SketchCow: are things still bumpy? 40Mbps up atm, 20tb to shift
21:56 🔗 BlueMaxim has joined #archiveteam
22:03 🔗 SketchCow Yes
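For a rough sense of scale on HCross2's figures (40Mbps up, 20tb to shift), here is a minimal sketch of the arithmetic. It assumes decimal terabytes, a link that stays saturated the whole time, and no protocol overhead, none of which is stated in the log. The function name transfer_days is hypothetical.

```python
def transfer_days(size_tb: float, rate_mbps: float) -> float:
    """Days needed to move size_tb terabytes at rate_mbps megabits per second.

    Assumes decimal units (1 TB = 1e12 bytes, 1 Mbps = 1e6 bits/s) and a
    fully saturated link with no overhead; real uploads will take longer.
    """
    bits = size_tb * 1e12 * 8              # terabytes -> bits
    seconds = bits / (rate_mbps * 1e6)     # bits / (bits per second)
    return seconds / 86400                 # seconds -> days


if __name__ == "__main__":
    # Figures quoted in the channel: 20 TB at 40 Mbps upstream.
    print(round(transfer_days(20, 40), 1))  # 46.3
```

Under those assumptions the upload runs for roughly a month and a half, so any bumpiness on the receiving side stretches it further.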
22:16 🔗 db48x has joined #archiveteam
22:20 🔗 maelstrom has quit IRC (Remote host closed the connection)
22:29 🔗 RichardG_ has joined #archiveteam
22:29 🔗 RichardG has quit IRC (Ping timeout: 370 seconds)
22:33 🔗 maelstrom has joined #archiveteam
22:35 🔗 maelstrom has quit IRC (Client Quit)
22:51 🔗 SketchCow https://pbs.twimg.com/media/CwIL6ezWAAAC0id.jpg:large
22:51 🔗 SketchCow Everyone got that?
22:52 🔗 Blackout Nice
22:54 🔗 powerKitt has quit IRC (Ping timeout: 268 seconds)
23:03 🔗 xmc hrmph
23:07 🔗 atomotic has joined #archiveteam
23:13 🔗 JW_work That's a lot of broken links, though...
23:16 🔗 JW_work more detail: https://www.whitehouse.gov/participate/opening-our-data-public
23:16 🔗 JW_work https://www.whitehouse.gov/blog/2016/10/31/digital-transition-how-presidential-transition-works-social-media-age
23:18 🔗 xmc so i'm going to assume they are getting special dispensation from twitter to enable them to migrate tweets from one account to another
23:18 🔗 xmc it would be the only way to keep ids and timestamps reasonably the same, which is necessary for any archival at all amio
23:18 🔗 xmc *imo
23:19 🔗 xmc but we should def throw a scraper or two at them
23:20 🔗 joepie91 https://www.reddit.com/r/trackers/comments/5aew97/sciencehd_says_farewell_on_november_31/
23:21 🔗 joepie91 apparently big private(-ish?) torrent tracker closing with sciencey(?) stuff
23:21 🔗 joepie91 enabled site-wide freeleech until shutdown
23:21 🔗 joepie91 unsure if within scope, I'd imagine there's a lot of rare materials
23:21 🔗 joepie91 sounds sciencey, no idea what it really is
23:21 🔗 joepie91 seems it's not free signup though
23:24 🔗 Yoshimura Would need someone with account
23:24 🔗 Yoshimura The applications are closed.
23:24 🔗 Yoshimura https://sciencehd.me/applications.php
23:27 🔗 RichardG_ has quit IRC (Read error: Connection reset by peer)
23:27 🔗 RichardG has joined #archiveteam
23:29 🔗 bRick5772 has quit IRC (Quit: Leaving.)
23:34 🔗 atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
23:38 🔗 godane do people not know that there is no november 31st
23:38 🔗 godane thats the second time a closing site has november 31st in the closing post
23:39 🔗 xmc yeah! there was one last week, too
