Time |
Nickname |
Message |
00:16
|
|
zout has joined #archiveteam |
00:21
|
|
Aranje has joined #archiveteam |
00:44
|
zout |
is anybody actively tackling archive.is? there's a wiki page for it, but that seems to be the extent. |
00:47
|
zout |
unfortunately they seem to have removed the "all domains" listing, and the original search tool has been replaced with Google Custom Search. |
00:52
|
zout |
the URLs are 31 bits long, which is too much for an exhaustive search. |
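For scale: 31 bits is 2^31, about 2.1 billion possible IDs. A rough sketch of how long a brute-force scan would take, assuming a hypothetical constant rate of 100 requests per second (an illustrative figure, not anything measured against archive.is):

```python
# Size of a 31-bit identifier space and the time needed to probe every ID once.
# REQUESTS_PER_SECOND is a hypothetical crawl rate, not a measured archive.is limit.
SPACE = 2 ** 31
REQUESTS_PER_SECOND = 100


def scan_days(space: int, rate: float) -> float:
    """Days needed to request every ID exactly once at a constant rate."""
    return space / rate / 86_400  # 86,400 seconds per day


if __name__ == "__main__":
    print(f"{SPACE:,} possible IDs")
    print(f"{scan_days(SPACE, REQUESTS_PER_SECOND):.1f} days at {REQUESTS_PER_SECOND} req/s")
```

At that assumed rate the scan alone runs well over eight months, before accounting for a slow host.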
00:56
|
arkiver |
I'm concerned about archive.is |
00:57
|
arkiver |
It has a lot of data, and it looks like it's run by only one person |
00:57
|
zout |
I'm going to enumerate the top 1M domains from alexa's list, pending any better ideas. |
00:57
|
arkiver |
Please do |
00:57
|
arkiver |
We won't start a project for it though, since it's not in danger |
00:58
|
arkiver |
joepie91: bzc6p: I'll be fixing nujij tomorrow, the way it's currently being done is too slow |
01:16
|
|
BlueMaxim has joined #archiveteam |
01:26
|
zout |
running, though their host is awfully slow to return results. |
01:51
|
|
rchrch has quit IRC (Ping timeout: 244 seconds) |
01:59
|
|
rchrch has joined #archiveteam |
02:07
|
|
dashcloud has joined #archiveteam |
02:15
|
|
WinterFox has joined #archiveteam |
02:32
|
|
JesseW has joined #archiveteam |
02:35
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
02:35
|
|
alembic has joined #archiveteam |
03:14
|
|
SirCmpwn has quit IRC (Ping timeout: 260 seconds) |
03:21
|
|
filippo__ has quit IRC (Ping timeout: 244 seconds) |
03:25
|
|
filippo__ has joined #archiveteam |
03:34
|
|
ndiddy has quit IRC (Read error: Connection reset by peer) |
03:47
|
|
Ymgve has quit IRC () |
04:11
|
|
Sk1d has quit IRC (Ping timeout: 194 seconds) |
04:19
|
|
Sk1d has joined #archiveteam |
04:33
|
SketchCow |
gmane knows to contact me |
04:33
|
SketchCow |
But I think he also did the "stop or I'll shoot" and got some money to go on |
04:34
|
|
Ymgve has joined #archiveteam |
04:34
|
JesseW |
well, for the news-server part, yes -- but the web interface is still down |
04:38
|
zout |
there's an awful lot of item URLs on archive.is, and I'm surely not getting all of them. |
04:40
|
JesseW |
"item URLs"? |
04:40
|
zout |
individual archive pages. |
04:49
|
JesseW |
zout: still not understanding you |
04:50
|
JesseW |
archives of what? |
04:50
|
JesseW |
are you trying to mirror archive.is? |
04:52
|
zout |
JesseW: I'm enumerating as many of the archives on archive.is as I can to gauge feasibility. |
04:52
|
zout |
just discovering the URLs, not downloading the content itself. |
05:23
|
|
RichardG has quit IRC (Ping timeout: 370 seconds) |
05:42
|
|
JesseW has quit IRC (Ping timeout: 370 seconds) |
06:27
|
|
Aranje has quit IRC (Quit: Three sheets to the wind) |
07:04
|
|
nicolas17 has quit IRC (Quit: U+1F634) |
07:05
|
|
Honno has joined #archiveteam |
07:25
|
|
tomwsmf has quit IRC (Read error: Operation timed out) |
09:20
|
marvinw |
FYI http://googleappsdeveloper.blogspot.com/2015/08/deprecating-web-hosting-support-in.html |
09:57
|
|
RichardG has joined #archiveteam |
10:16
|
|
GLaDOS has quit IRC (Oh crap, I died.) |
10:34
|
|
AlexLehm has joined #archiveteam |
10:51
|
|
BlueMaxim has quit IRC (Quit: Leaving) |
10:52
|
|
tomaspark has quit IRC (Ping timeout: 255 seconds) |
11:17
|
|
SirCmpwn has joined #archiveteam |
11:46
|
|
VADemon has joined #archiveteam |
12:08
|
|
Jeroen__u has joined #archiveteam |
12:43
|
|
morbus_ has joined #archiveteam |
12:44
|
|
Morbus has quit IRC (Read error: Operation timed out) |
12:47
|
Jeroen__u |
Hey, just started a Warrior using VirtualBox and selected a project, but I don't think that it is actually doing anything. It is stuck on "The warrior is beginning work on a project." |
13:03
|
Jeroen__u |
Sorry, wrong channel, going to #Warrior. |
13:20
|
|
GLaDOS has joined #archiveteam |
13:45
|
|
WinterFox has quit IRC (Ping timeout: 501 seconds) |
13:47
|
|
ravetcofx has quit IRC (Ping timeout: 501 seconds) |
14:05
|
|
dashcloud has joined #archiveteam |
14:14
|
|
VADemon has quit IRC (Read error: Connection reset by peer) |
14:38
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
14:41
|
|
dashcloud has joined #archiveteam |
14:46
|
|
dashcloud has quit IRC (Remote host closed the connection) |
15:46
|
|
AlexLehm has quit IRC (Remote host closed the connection) |
16:22
|
SketchCow |
Got an alert that the Bioware forums are being locked and deleted soon |
16:37
|
|
JesseW has joined #archiveteam |
16:41
|
|
metalcamp has joined #archiveteam |
16:54
|
JesseW |
zout: oops, sorry I missed where you explained what you were doing (now read through the scrollback). Good idea, thank you for doing it. |
16:57
|
Frogging |
SketchCow: How soon? The official date was October or such |
16:59
|
|
VADemon has joined #archiveteam |
17:08
|
|
metalcamp has quit IRC (Ping timeout: 244 seconds) |
17:11
|
|
kristian_ has joined #archiveteam |
17:15
|
|
ndiddy has joined #archiveteam |
17:36
|
joepie91 |
advance notice: Imgur may be at some amount of risk: https://www.reddit.com/r/undelete/comments/4zx28b/imgur_removed_the_infamous_comcast_swastika_from/d6zpksd?context=1 |
17:36
|
joepie91 |
probably somewhat longer-term, but bad news for its longevity regardless, if that goes through |
17:36
|
|
metalcamp has joined #archiveteam |
17:41
|
Frogging |
I don't think banning it from /r/pics in itself would really matter. But Imgur are starting to "crack" around the edges, and they're huge and important, so they're definitely something to watch closely |
17:42
|
joepie91 |
Frogging: it would, they get a ton of traffic from there |
17:43
|
Frogging |
yeah? hm, it's mostly hotlinks though surely |
17:44
|
joepie91 |
Frogging: don't think so |
17:44
|
joepie91 |
and even if it were, everybody recognizes an imgur link when they see it |
17:45
|
joepie91 |
if /r/pics were to be full of otherhost.com, that's where users would flock to over time |
17:45
|
Frogging |
yeah, true |
17:47
|
joepie91 |
so especially given the importance of imgur, I think we should treat this as an early warning sign, especially since the reasons why imgur might be banned there are also likely to drive users away in other ways |
17:47
|
joepie91 |
er |
17:47
|
joepie91 |
given the importance and size of imgur * |
17:47
|
joepie91 |
it's -not- going to be an easy one to archive |
18:03
|
SketchCow |
I think this is a weird fringe response |
18:04
|
SketchCow |
I think the thing to do with imgur is archive the most popular images |
18:07
|
Frogging |
as a start, yes. I wonder if a full grab would even be feasible if shit were to hit the fan, hypothetically |
18:08
|
SketchCow |
No, of course not. |
18:08
|
SketchCow |
It's got to be in the petabyte range now |
18:08
|
Frogging |
yeah :s |
18:09
|
|
tomwsmf has joined #archiveteam |
18:10
|
SketchCow |
What I WOULD say is that if people wanted to whip all these reddit nerds into some storing frenzy there could be a distributed saving effort |
18:11
|
PurpleSym |
Their sitemap would be a good start: https://imgur.com/gallery/sitemap.xml |
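Reading the URLs out of a sitemap only needs a standard XML parser. A minimal sketch using Python's standard library: it collects every <loc> element in the sitemaps.org namespace, so it handles both a plain URL set and a sitemap index. The helper names are hypothetical, and the sample document is made up, not taken from imgur:

```python
import urllib.request
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def fetch(url: str) -> str:
    """Download a sitemap as text (network access required; not used in the demo)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8")


def extract_locs(xml_text: str) -> list[str]:
    """Return every <loc> URL from a <urlset> or <sitemapindex> document."""
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.iter(f"{SITEMAP_NS}loc") if el.text]


# Made-up sample in the sitemaps.org format, for illustration only.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://imgur.com/gallery/example1</loc></url>
  <url><loc>https://imgur.com/gallery/example2</loc></url>
</urlset>"""

if __name__ == "__main__":
    for loc in extract_locs(SAMPLE):
        print(loc)
```

If the real file turns out to be a sitemap index, each returned URL is itself a sitemap that can be fetched and passed through extract_locs again.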
18:12
|
Frogging |
I am amazed and pleased that this exists. cool |
18:14
|
PurpleSym |
Unfortunately reddit's domain listing is disabled: https://www.reddit.com/domain/i.imgur.com/ |
18:15
|
Frogging |
maybe there's an API way to do it |
18:16
|
joepie91 |
protip: add .json after basically any Reddit URL |
18:16
|
joepie91 |
:) |
18:16
|
joepie91 |
(it's still disabled for that one though) |
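The .json tip amounts to a simple URL rewrite. A minimal sketch using only the standard library (the helper name is hypothetical): it inserts .json at the end of the path and keeps the query string, so comment permalinks with ?context=1 still work. Listings that are disabled, like the domain listing, stay disabled in JSON form too:

```python
from urllib.parse import urlsplit, urlunsplit


def reddit_json_url(url: str) -> str:
    """Append .json to the path of a Reddit URL, preserving query and fragment."""
    parts = urlsplit(url)
    # "/r/pics/" -> "/r/pics.json"; a bare "/" becomes "/.json"
    path = parts.path.rstrip("/") + ".json"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


if __name__ == "__main__":
    print(reddit_json_url("https://www.reddit.com/r/pics/"))
```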
18:17
|
Sanqui |
maybe reddit itself would be willing to assist? |
18:17
|
Sanqui |
didn't they provide some database dumps in the past? |
18:22
|
PurpleSym |
Comments only: https://archive.org/details/2015_reddit_comments_corpus |
18:33
|
joepie91 |
https://twitter.com/ServerBear/status/765034545703813121 |
18:33
|
joepie91 |
serverbear is dead |
18:33
|
alembic |
:0 |
18:33
|
joepie91 |
nuked historical hardware and performance stats of hosting providers |
18:33
|
joepie91 |
fuckssake |
18:34
|
* |
joepie91 is only slightly bitter about this |
18:51
|
|
SirCmpwn has quit IRC (Read error: Operation timed out) |
18:56
|
|
bRick5772 has joined #archiveteam |
18:56
|
|
kristian_ has quit IRC (Leaving) |
19:11
|
|
JesseW has quit IRC (Read error: Operation timed out) |
19:44
|
|
kristian_ has joined #archiveteam |
19:46
|
|
notjack has joined #archiveteam |
19:46
|
notjack |
Hey everyone, great to be here again ;) |
20:05
|
Kaz |
Hi, Jack |
20:06
|
notjack |
Hey! ;) |
20:24
|
|
tomaspark has joined #archiveteam |
20:26
|
|
tomaspar1 has joined #archiveteam |
20:32
|
|
tomaspark has quit IRC (Quit: ChatZilla 0.9.92 [Firefox 48.0/20160728203720]) |
20:36
|
|
SirCmpwn has joined #archiveteam |
21:00
|
|
VADemon has quit IRC (Quit: left4dead) |
21:07
|
|
dashcloud has joined #archiveteam |
21:15
|
arkiver |
Update on the tumblr and flickr projects. I've written some URL-agnostic WARC deduplication scripts. Some example WARCs will be uploaded here and sent to the Internet Archive |
21:15
|
arkiver |
To see if they are made correctly (they already play back well), or if anything is missing |
21:16
|
arkiver |
If they are good they will be implemented in the flickr and tumblr (and possibly yahooanswers) projects. |
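arkiver's scripts are not shown in the log, so the following is only an illustration of the general idea, not his implementation. URL-agnostic deduplication keys the index on the payload digest alone, so identical bytes served under different URLs become revisit records pointing at the first full copy. The header names follow the WARC 1.1 identical-payload-digest revisit profile; writing actual WARC files is left out, and the Capture type, helper names, and sample URIs are hypothetical:

```python
import base64
import hashlib
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Capture:
    uri: str
    date: str  # WARC-Date style timestamp, e.g. "2016-08-28T21:00:00Z"
    payload: bytes


def payload_digest(payload: bytes) -> str:
    """SHA-1 digest in the base32 form commonly used for WARC-Payload-Digest."""
    return "sha1:" + base64.b32encode(hashlib.sha1(payload).digest()).decode("ascii")


def dedupe(captures: Iterable[Capture]) -> Iterator[dict]:
    """Yield WARC-style header dicts; repeated payloads become revisit records.

    The index is keyed on the digest only (not URI + digest), which is what
    makes the deduplication URL-agnostic."""
    first_seen: dict[str, tuple[str, str]] = {}  # digest -> (uri, date) of first copy
    for cap in captures:
        digest = payload_digest(cap.payload)
        headers = {
            "WARC-Target-URI": cap.uri,
            "WARC-Date": cap.date,
            "WARC-Payload-Digest": digest,
        }
        if digest in first_seen:
            orig_uri, orig_date = first_seen[digest]
            headers.update({
                "WARC-Type": "revisit",
                "WARC-Profile": "http://netpreserve.org/warc/1.1/revisit/identical-payload-digest",
                "WARC-Refers-To-Target-URI": orig_uri,
                "WARC-Refers-To-Date": orig_date,
            })
        else:
            first_seen[digest] = (cap.uri, cap.date)
            headers["WARC-Type"] = "response"
        yield headers


if __name__ == "__main__":
    demo = [
        Capture("https://example.com/a.jpg", "2016-08-28T21:00:00Z", b"same bytes"),
        Capture("https://example.org/b.jpg", "2016-08-28T21:05:00Z", b"same bytes"),
    ]
    for rec in dedupe(demo):
        print(rec["WARC-Type"], rec["WARC-Target-URI"])
```

Keying on URI + digest instead would only catch the same file fetched twice from the same address; dropping the URI from the key is what lets the same image reached through different URLs be stored once.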
21:16
|
arkiver |
Flickr is the first one to start. |
21:17
|
arkiver |
CC flickr images will be done first. From these CC flickr images we're going to take two samples of 100000 images to estimate what size flickr will be in total |
21:18
|
arkiver |
One sample will be with all versions of the images and a second sample will be with only the original size and the size shown on the webpage of an image |
21:18
|
arkiver |
From there we'll decide on what exactly we're going to grab from flickr. |
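Estimating flickr's total size from a 100,000-image sample is ordinary sample-based extrapolation. A minimal sketch, assuming the sample is random, the total image count is known from elsewhere (the log doesn't give one), and a normal-approximation 95% interval is good enough; the function name and inputs are hypothetical:

```python
import math
import statistics


def estimate_total_bytes(sample_sizes: list[int], total_items: int) -> tuple[float, float]:
    """Extrapolate total storage from a random sample of per-item sizes in bytes.

    Returns (estimate, half_width): estimate +/- half_width is an approximate
    95% confidence interval. Needs at least two samples for the stdev."""
    n = len(sample_sizes)
    mean_size = statistics.mean(sample_sizes)
    std_err = statistics.stdev(sample_sizes) / math.sqrt(n)
    return mean_size * total_items, 1.96 * std_err * total_items


if __name__ == "__main__":
    # Made-up sizes and item count, purely to show the call.
    est, half = estimate_total_bytes([120_000, 450_000, 80_000, 300_000], 1_000_000)
    print(f"~{est / 1e12:.2f} TB +/- {half / 1e12:.2f} TB")
```

Running it once on the all-versions sample and once on the original-plus-page-size sample gives the two totals to compare.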
21:18
|
arkiver |
After CC images we're going to have a look at non-CC images and possibly do those too. |
21:19
|
arkiver |
That was the little update on where we are at the moment with these projects. |
21:20
|
arkiver |
if you have any suggestions or questions regarding the above, please post them |
21:27
|
|
metalcamp has quit IRC (Ping timeout: 244 seconds) |
21:48
|
|
dashcloud has quit IRC (Ping timeout: 260 seconds) |
22:15
|
|
RichardG has quit IRC (Keyboard not found, press F1 to continue) |
22:15
|
|
RichardG has joined #archiveteam |
22:42
|
|
logchfoo0 starts logging #archiveteam at Sun Aug 28 22:42:31 2016 |
22:42
|
|
logchfoo0 has joined #archiveteam |
22:46
|
|
JonimusP has joined #archiveteam |
22:46
|
|
swebb sets mode: +o JonimusP |
22:47
|
|
jk[[SVP]] is now known as jk[SVP] |
22:47
|
|
LordNigh2 is now known as Lord_Nigh |
22:47
|
|
Kaz| is now known as Kaz |
22:53
|
JesseW |
The update on tumblr & flickr sounds good |
22:56
|
|
VonGuard_ has quit IRC (Ping timeout: 260 seconds) |
22:57
|
|
AlexLehm has joined #archiveteam |
23:02
|
|
kevin has quit IRC (Ping timeout: 260 seconds) |
23:09
|
|
VonGuard_ has joined #archiveteam |
23:14
|
|
Honno has quit IRC (Read error: Operation timed out) |
23:15
|
|
sHATNER has joined #archiveteam |
23:15
|
|
espes__ has joined #archiveteam |
23:15
|
|
xhdr has joined #archiveteam |
23:15
|
|
PepsiMax has joined #archiveteam |
23:15
|
|
tephra has joined #archiveteam |
23:26
|
|
kevin has joined #archiveteam |
23:28
|
|
ErkDog has quit IRC (Read error: Operation timed out) |
23:28
|
|
ErkDog has joined #archiveteam |
23:28
|
|
ErkDog has quit IRC (Remote host closed the connection!) |
23:29
|
|
ErkDog has joined #archiveteam |
23:32
|
|
dashcloud has quit IRC (Remote host closed the connection) |
23:41
|
|
cadbury_ has joined #archiveteam |
23:43
|
|
ErkDog has quit IRC (Read error: Operation timed out) |
23:44
|
|
dserodio has quit IRC (Read error: Operation timed out) |
23:45
|
|
Zialus has quit IRC (Read error: Operation timed out) |
23:46
|
|
ErkDog has joined #archiveteam |
23:46
|
|
dserodio has joined #archiveteam |
23:50
|
|
Zialus has joined #archiveteam |
23:59
|
|
dserodio has quit IRC (Read error: Operation timed out) |