#archiveteam 2016-08-28,Sun

Time Nickname Message
00:16 zout has joined #archiveteam
00:21 Aranje has joined #archiveteam
00:44 zout is anybody actively tackling archive.is? there's a wiki page for it, but that seems to be the extent.
00:47 zout unfortunately they seem to have removed the "all domains" listing, and the original search tool has been replaced with Google Custom Search.
00:52 zout the URLs are 31 bits long which is too much for an exhaustive search.
00:56 arkiver I'm concerned about archive.is
00:57 arkiver It has a lot of data, and it looks like it's run by only one person
00:57 zout I'm going to enumerate the top 1M domains from alexa's list, pending any better ideas.
00:57 arkiver Please do
00:57 arkiver We won't start a project for it though, since it's not in danger
00:58 arkiver joepie91: bzc6p: I'll be fixing nujij tomorrow, the way it's currently being done is too slow
01:16 BlueMaxim has joined #archiveteam
01:26 zout running, though their host is awful slow to return results.
01:51 rchrch has quit IRC (Ping timeout: 244 seconds)
01:59 rchrch has joined #archiveteam
02:07 dashcloud has joined #archiveteam
02:15 WinterFox has joined #archiveteam
02:32 JesseW has joined #archiveteam
02:35 dashcloud has quit IRC (Read error: Operation timed out)
02:35 alembic has joined #archiveteam
03:14 SirCmpwn has quit IRC (Ping timeout: 260 seconds)
03:21 filippo__ has quit IRC (Ping timeout: 244 seconds)
03:25 filippo__ has joined #archiveteam
03:34 ndiddy has quit IRC (Read error: Connection reset by peer)
03:47 Ymgve has quit IRC ()
04:11 Sk1d has quit IRC (Ping timeout: 194 seconds)
04:19 Sk1d has joined #archiveteam
04:33 SketchCow gmane knows to contact me
04:33 SketchCow But he also I think did the "stop or I'll shoot" and got some money to go on
04:34 Ymgve has joined #archiveteam
04:34 JesseW well, for the news-server part, yes -- but the web interface is still down
04:38 zout there's an awful lot of item URLs on archive.is, and I'm surely not getting all of them.
04:40 JesseW "item URLs"?
04:40 zout individual archives pages.
04:49 JesseW zout: still not understanding you
04:50 JesseW archives of what?
04:50 JesseW are you trying to mirror archive.is?
04:52 zout JesseW: I'm enumerating as many of the archives on archive.is as I can to gauge feasibility.
04:52 zout just discovering the URLs, not downloading the content itself.
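
To put zout's remark about the 31-bit URL space in numbers, here is a rough back-of-the-envelope sketch. The 31-bit figure comes from the channel; the request rate is purely an assumption for illustration, not a measured property of archive.is.

```python
# Rough cost of requesting every identifier in a 31-bit space once.
KEYSPACE_BITS = 31          # figure quoted in the channel
REQUESTS_PER_SECOND = 5     # assumption: a polite rate for a single client
SECONDS_PER_DAY = 86_400


def exhaustive_search_days(bits: int, rate: float) -> float:
    """Days needed to try all 2**bits identifiers at `rate` requests per second."""
    candidates = 2 ** bits
    return candidates / rate / SECONDS_PER_DAY


if __name__ == "__main__":
    days = exhaustive_search_days(KEYSPACE_BITS, REQUESTS_PER_SECOND)
    print(f"{2 ** KEYSPACE_BITS:,} candidates, roughly {days:,.0f} days at "
          f"{REQUESTS_PER_SECOND} requests/s")
```

At any rate a slow host would tolerate, the total runs to years, which is why enumerating known domains looks like the more practical route.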
05:23 RichardG has quit IRC (Ping timeout: 370 seconds)
05:42 JesseW has quit IRC (Ping timeout: 370 seconds)
06:27 Aranje has quit IRC (Quit: Three sheets to the wind)
07:04 nicolas17 has quit IRC (Quit: U+1F634)
07:05 Honno has joined #archiveteam
07:25 tomwsmf has quit IRC (Read error: Operation timed out)
09:20 marvinw FYI http://googleappsdeveloper.blogspot.com/2015/08/deprecating-web-hosting-support-in.html
09:57 RichardG has joined #archiveteam
10:16 GLaDOS has quit IRC (Oh crap, I died.)
10:34 AlexLehm has joined #archiveteam
10:51 BlueMaxim has quit IRC (Quit: Leaving)
10:52 tomaspark has quit IRC (Ping timeout: 255 seconds)
11:17 SirCmpwn has joined #archiveteam
11:46 VADemon has joined #archiveteam
12:08 Jeroen__u has joined #archiveteam
12:43 morbus_ has joined #archiveteam
12:44 Morbus has quit IRC (Read error: Operation timed out)
12:47 Jeroen__u Hey, just started a Warrior using VirtualBox and selected a project, but I don't think that it is actually doing anything. It is stuck on "The warrior is beginning work on a project."
13:03 Jeroen__u Sorry, wrong channel, going to #Warrior.
13:20 GLaDOS has joined #archiveteam
13:45 WinterFox has quit IRC (Ping timeout: 501 seconds)
13:47 ravetcofx has quit IRC (Ping timeout: 501 seconds)
14:05 dashcloud has joined #archiveteam
14:14 VADemon has quit IRC (Read error: Connection reset by peer)
14:38 dashcloud has quit IRC (Read error: Operation timed out)
14:41 dashcloud has joined #archiveteam
14:46 dashcloud has quit IRC (Remote host closed the connection)
15:46 AlexLehm has quit IRC (Remote host closed the connection)
16:22 SketchCow Got an alert: Bioware forums to be locked and deleted soon
16:37 JesseW has joined #archiveteam
16:41 metalcamp has joined #archiveteam
16:54 JesseW zout: oops, sorry I missed where you explained what you were doing (now read through the scrollback). Good idea, thank you for doing it.
16:57 Frogging SketchCow: How soon? The official date was October or such
16:59 VADemon has joined #archiveteam
17:08 metalcamp has quit IRC (Ping timeout: 244 seconds)
17:11 kristian_ has joined #archiveteam
17:15 ndiddy has joined #archiveteam
17:36 joepie91 advance notice: Imgur may be at some amount of risk: https://www.reddit.com/r/undelete/comments/4zx28b/imgur_removed_the_infamous_comcast_swastika_from/d6zpksd?context=1
17:36 joepie91 probably somewhat longer-term, but bad news for its longevity regardless, if that goes through
17:36 metalcamp has joined #archiveteam
17:41 Frogging I don't think banning it from /r/pics in itself would really matter. But Imgur are starting to "crack" around the edges, and they're huge and important, so they're definitely something to watch closely
17:42 joepie91 Frogging: it would, they get a ton of traffic from there
17:43 Frogging yeah? hm, it's mostly hotlinks though surely
17:44 joepie91 Frogging: don't think so
17:44 joepie91 and even if it were, everybody recognizes an imgur link when they see it
17:45 joepie91 if /r/pics were to be full of otherhost.com, that's where users would flock to over time
17:45 Frogging yeah, true
17:47 joepie91 so especially given the importance of imgur, I think we should treat this as an early warning sign, especially since the reasons why imgur might be banned there are also likely to drive users away in other ways
17:47 joepie91 er
17:47 joepie91 given the importance and size of imgur *
17:47 joepie91 it's -not- going to be an easy one to archive
18:03 SketchCow I think this is weird fringe response
18:04 SketchCow I think the thing to do with imgur is archive the most popular images
18:07 Frogging as a start, yes. I wonder if a full grab would even be feasible if shit were to hit the fan, hypothetically
18:08 SketchCow No, of course not.
18:08 SketchCow It's got to be in the petabyte range now
18:08 Frogging yeah :s
18:09 tomwsmf has joined #archiveteam
18:10 SketchCow What I WOULD say is that if people wanted to whip all these reddit nerds into some storing frenzy there could be a distributed saving effort
18:11 PurpleSym Their sitemap would be a good start: https://imgur.com/gallery/sitemap.xml
18:12 Frogging I am amazed and pleased that this exists. cool
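
As a sketch of how a sitemap like the one PurpleSym links could seed a URL list, the snippet below pulls every `<loc>` entry out of a sitemap document using only the Python standard library. The sample XML and its example.com URLs are made up for illustration; the real Imgur sitemap's layout is not shown in the log and may differ (it could, for instance, be an index pointing at further sitemaps).

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sample document, not real Imgur data.
SAMPLE = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/gallery/abc123</loc></url>
  <url><loc>https://example.com/gallery/def456</loc></url>
</urlset>
"""


def sitemap_locations(xml_text: str) -> list[str]:
    """Return the text of every <loc> element, in document order."""
    root = ET.fromstring(xml_text.strip())
    return [loc.text.strip()
            for loc in root.iterfind(".//sm:loc", SITEMAP_NS)
            if loc.text]


if __name__ == "__main__":
    for url in sitemap_locations(SAMPLE):
        print(url)
```

The same function works on a sitemap index file, since those also wrap their entries in `<loc>` elements under the same namespace.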
18:14 PurpleSym Unfortunately reddit’s domain listing is disabled: https://www.reddit.com/domain/i.imgur.com/
18:15 Frogging maybe there's an API way to do it
18:16 joepie91 protip: add .json after basically any Reddit URL
18:16 joepie91 :)
18:16 joepie91 (it's still disabled for that one though)
18:17 Sanqui maybe reddit itself would be willing to assist?
18:17 Sanqui didn't they provide some database dumps in the past?
18:22 PurpleSym Comments only: https://archive.org/details/2015_reddit_comments_corpus
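
joepie91's ".json" tip can be sketched as a small URL rewrite. The helper name `reddit_json_url` is hypothetical, and the snippet only builds the URL; it makes no request and says nothing about which listings Reddit actually serves (the domain listing above was reported as disabled either way).

```python
from urllib.parse import urlsplit, urlunsplit


def reddit_json_url(url: str) -> str:
    """Append .json to the path of a URL, keeping query string and fragment."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    if not path.endswith(".json"):
        path += ".json"
    return urlunsplit((parts.scheme, parts.netloc, path,
                       parts.query, parts.fragment))


if __name__ == "__main__":
    print(reddit_json_url("https://www.reddit.com/r/pics/"))
    print(reddit_json_url("https://www.reddit.com/r/pics/top?t=day"))
```

The suffix goes on the path rather than the end of the whole string, so a query such as `?context=1` stays intact after the rewrite.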
18:33 joepie91 https://twitter.com/ServerBear/status/765034545703813121
18:33 joepie91 serverbear is dead
18:33 alembic :0
18:33 joepie91 nuked historical hardware and performance stats of hosting providers
18:33 joepie91 fuckssake
18:34 * joepie91 is only slightly bitter about this
18:51 SirCmpwn has quit IRC (Read error: Operation timed out)
18:56 bRick5772 has joined #archiveteam
18:56 kristian_ has quit IRC (Leaving)
19:11 JesseW has quit IRC (Read error: Operation timed out)
19:44 kristian_ has joined #archiveteam
19:46 notjack has joined #archiveteam
19:46 notjack Hey everyone, great to be here again ;)
20:05 Kaz Hi, Jack
20:06 notjack Hey! ;)
20:24 tomaspark has joined #archiveteam
20:26 tomaspar1 has joined #archiveteam
20:32 tomaspark has quit IRC (Quit: ChatZilla 0.9.92 [Firefox 48.0/20160728203720])
20:36 SirCmpwn has joined #archiveteam
21:00 VADemon has quit IRC (Quit: left4dead)
21:07 dashcloud has joined #archiveteam
21:15 arkiver Update on the tumblr and flickr projects. I've written some URL-agnostic WARC deduplication scripts. Some example WARCs will be uploaded here and sent to Internet Archive
21:15 arkiver To see if they are made correctly (they already play back good), or if anything is missing
21:16 arkiver If they are good they will be implemented in the flickr and tumblr (and possibly yahooanswers) projects.
21:16 arkiver Flickr is the first one to start.
21:17 arkiver CC flickr images will be done first. Over these CC flickr images we're going to do two samples of 100000 images to know what size flickr will be in total
21:18 arkiver One sample will be with all versions of the images and a second sample will be with only the original size and the size shown on the webpage of an image
21:18 arkiver From there we will decide on what we're going to grab exactly from flickr.
21:18 arkiver After CC images we're going to have a look at non-CC images and possibly do those too.
21:19 arkiver That was the little update on where we are at the moment with these projects.
21:20 arkiver if you have any suggestions or questions regarding the above, please post them
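
arkiver's scripts are not shown in the log, so the following is only a guess at the general idea behind "URL-agnostic" deduplication: key each response on a digest of its payload rather than on its URL, store the body once, and emit a lightweight pointer for every later copy (in WARC terms, a revisit record). The `Record` and `Output` types are hypothetical stand-ins for real WARC records, and SHA-1 is chosen here only because payload digests in WARC files are commonly SHA-1.

```python
import hashlib
from typing import Dict, Iterable, Iterator, NamedTuple, Optional


class Record(NamedTuple):
    url: str
    payload: bytes


class Output(NamedTuple):
    url: str
    digest: str
    payload: Optional[bytes]    # None marks a duplicate whose body is not stored again
    refers_to: Optional[str]    # URL of the first record that carried this payload


def deduplicate(records: Iterable[Record]) -> Iterator[Output]:
    """Keep the first copy of each payload; later copies become pointers."""
    first_seen: Dict[str, str] = {}    # payload digest -> URL of first occurrence
    for rec in records:
        digest = hashlib.sha1(rec.payload).hexdigest()
        if digest in first_seen:
            yield Output(rec.url, digest, None, first_seen[digest])
        else:
            first_seen[digest] = rec.url
            yield Output(rec.url, digest, rec.payload, None)


if __name__ == "__main__":
    sample = [
        Record("https://example.com/a.jpg", b"same image bytes"),
        Record("https://cdn.example.com/a_copy.jpg", b"same image bytes"),
        Record("https://example.com/b.jpg", b"different bytes"),
    ]
    for out in deduplicate(sample):
        kind = "stored" if out.payload is not None else f"duplicate of {out.refers_to}"
        print(out.url, kind)
```

Because the lookup ignores the URL entirely, the same image served from several hosts or paths is stored only once, which is where the savings would come from on image-heavy sites.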
21:27 metalcamp has quit IRC (Ping timeout: 244 seconds)
21:48 dashcloud has quit IRC (Ping timeout: 260 seconds)
22:15 RichardG has quit IRC (Keyboard not found, press F1 to continue)
22:15 RichardG has joined #archiveteam
22:42 logchfoo0 starts logging #archiveteam at Sun Aug 28 22:42:31 2016
22:42 logchfoo0 has joined #archiveteam
22:46 JonimusP has joined #archiveteam
22:46 swebb sets mode: +o JonimusP
22:47 jk[[SVP]] is now known as jk[SVP]
22:47 LordNigh2 is now known as Lord_Nigh
22:47 Kaz| is now known as Kaz
22:53 JesseW The update on tumblr & flickr sounds good
22:56 VonGuard_ has quit IRC (Ping timeout: 260 seconds)
22:57 AlexLehm has joined #archiveteam
23:02 kevin has quit IRC (Ping timeout: 260 seconds)
23:09 VonGuard_ has joined #archiveteam
23:14 Honno has quit IRC (Read error: Operation timed out)
23:15 sHATNER has joined #archiveteam
23:15 espes__ has joined #archiveteam
23:15 xhdr has joined #archiveteam
23:15 PepsiMax has joined #archiveteam
23:15 tephra has joined #archiveteam
23:26 kevin has joined #archiveteam
23:28 ErkDog has quit IRC (Read error: Operation timed out)
23:28 ErkDog has joined #archiveteam
23:28 ErkDog has quit IRC (Remote host closed the connection!)
23:29 ErkDog has joined #archiveteam
23:32 dashcloud has quit IRC (Remote host closed the connection)
23:41 cadbury_ has joined #archiveteam
23:43 ErkDog has quit IRC (Read error: Operation timed out)
23:44 dserodio has quit IRC (Read error: Operation timed out)
23:45 Zialus has quit IRC (Read error: Operation timed out)
23:46 ErkDog has joined #archiveteam
23:46 dserodio has joined #archiveteam
23:50 Zialus has joined #archiveteam
23:59 dserodio has quit IRC (Read error: Operation timed out)
