Time |
Nickname |
Message |
00:15
🔗
|
|
Ungstein has joined #archiveteam |
00:28
🔗
|
|
Ungstein has quit IRC (Read error: Connection reset by peer) |
00:30
🔗
|
|
Ungstein has joined #archiveteam |
00:34
🔗
|
|
Ungstein has quit IRC (Read error: Connection reset by peer) |
00:44
🔗
|
|
primus104 has quit IRC (Read error: Connection reset by peer) |
00:46
🔗
|
|
primus104 has joined #archiveteam |
00:47
🔗
|
|
Ungstein has joined #archiveteam |
00:48
🔗
|
|
JesseW has joined #archiveteam |
00:53
🔗
|
|
RichardG has joined #archiveteam |
00:57
🔗
|
|
Ungstein has quit IRC (Read error: Connection reset by peer) |
01:14
🔗
|
|
vitzli has joined #archiveteam |
01:15
🔗
|
SketchCow |
A mass of stuff is leaving FOS, as usual - but this is definitely some stuff that's been sitting around for upwards of a year or two in some cases. |
01:19
🔗
|
* |
JesseW 's gnoming tendencies perk up |
01:20
🔗
|
aaaaaaaaa |
you are going to steal a garden gnome and take pictures of it? |
01:21
🔗
|
|
Ungstein has joined #archiveteam |
01:24
🔗
|
JesseW |
aaaaaaaaa: no, but I might work on categorizing the 93 images in https://commons.wikimedia.org/wiki/Category:Garden_gnomes ... |
01:25
🔗
|
|
primus104 has quit IRC (Leaving.) |
01:41
🔗
|
|
Froggypwn has quit IRC (Ping timeout: 252 seconds) |
01:42
🔗
|
|
Froggypwn has joined #archiveteam |
01:43
🔗
|
|
JesseW has quit IRC (Read error: Operation timed out) |
01:48
🔗
|
|
vitzli has quit IRC (Quit: Leaving) |
01:51
🔗
|
wp494 |
plug.dj is in fact going dead: http://blog.plug.dj/2015/09/its-time-to-say-goodbye/ |
01:55
🔗
|
xmc |
they're actively dead |
02:20
🔗
|
|
phuzion has joined #archiveteam |
02:42
🔗
|
|
Sue_ has quit IRC (Remote host closed the connection) |
02:59
🔗
|
|
phuzion has quit IRC (Read error: Operation timed out) |
03:05
🔗
|
|
robink has quit IRC (Ping timeout: 492 seconds) |
03:09
🔗
|
|
robink has joined #archiveteam |
03:40
🔗
|
|
JesseW has joined #archiveteam |
03:44
🔗
|
|
Ungstein has quit IRC (Quit: Leaving.) |
03:50
🔗
|
|
phuzion has joined #archiveteam |
03:57
🔗
|
|
Ungstein has joined #archiveteam |
04:10
🔗
|
|
Emcy has joined #archiveteam |
04:15
🔗
|
|
zenguy_pc has quit IRC (Read error: Connection reset by peer) |
04:16
🔗
|
|
Cr33per83 has joined #archiveteam |
04:21
🔗
|
Cr33per83 |
Hey, in my Warrior, when I try to select Google Code to work on, it says (I forget what it says, something about it was unable to get tasks and that it would retry in a few seconds, I know helpful) |
04:22
🔗
|
Cr33per83 |
But when I select another project it works. Is this just me, or what? |
04:22
🔗
|
yipdw_ |
it means that project has no tasks available for you, please retry again later |
04:23
🔗
|
|
Ungstein has quit IRC (Read error: Connection reset by peer) |
04:23
🔗
|
Cr33per83 |
But I already did that. |
04:23
🔗
|
aaaaaaaaa |
google code hasn't started yet |
04:24
🔗
|
aaaaaaaaa |
i believe it even says so on the description |
04:24
🔗
|
Cr33per83 |
What do you mean? |
04:24
🔗
|
Cr33per83 |
I don't recall seeing that on the wiki page... |
04:24
🔗
|
yipdw_ |
http://tracker.archiveteam.org/googlecode/ |
04:24
🔗
|
yipdw_ |
no tasks in the todo queue, so nothing to do |
04:25
🔗
|
|
Ungstein has joined #archiveteam |
04:28
🔗
|
Cr33per83 |
But I think that Google Code is the only project on that list that is definitely closing soon, so thus should be the most important... |
04:30
🔗
|
yipdw_ |
it went read-only in August 2015, the source control systems are online until January 2016, and many things are moving to other systems |
04:30
🔗
|
yipdw_ |
there is much time available |
04:30
🔗
|
yipdw_ |
if you would like to help, one thing you can do is identify inactive but referenced repositories |
04:30
🔗
|
aaaaaaaaa |
The code to actually grab the data isn't quite ready yet. But our top men are on that. |
04:31
🔗
|
Cr33per83 |
If I copypaste one here will it automatically be saved? |
04:31
🔗
|
yipdw_ |
no |
04:32
🔗
|
|
zenguy_pc has joined #archiveteam |
04:32
🔗
|
Cr33per83 |
Where can I add URLs? |
04:32
🔗
|
yipdw_ |
if you have known repositories try adding an issue at start a list in a pastebin or gist or something and |
04:32
🔗
|
yipdw_ |
oops |
04:32
🔗
|
yipdw_ |
if you have known repositories try adding an issue at https://github.com/ArchiveTeam/googlecode-items |
04:32
🔗
|
yipdw_ |
someone will triage it |
04:32
🔗
|
bentpins |
yipdw_: What do you mean inactive but referenced? |
04:33
🔗
|
bentpins |
referenced by what? |
04:33
🔗
|
yipdw_ |
bentpins: maintainer hasn't moved it but there are clearly people using it |
04:33
🔗
|
bentpins |
ah gotcha |
04:34
🔗
|
yipdw_ |
IMO there is more value in copying those than there is in copying active projects |
04:34
🔗
|
Cr33per83 |
Wait, people using it? What about projects that are mostly abandoned and not very much used? |
04:34
🔗
|
* |
yipdw_ sighs |
04:35
🔗
|
yipdw_ |
I'm not asking you to get a precise reference count |
04:35
🔗
|
yipdw_ |
the abandoned one is fine too |
04:36
🔗
|
Cr33per83 |
I'm not trying to, but there are some projects that are kinda the opposite of "but there are clearly people using it" |
04:36
🔗
|
aaaaaaaaa |
The goal is to get all of them, but that doesn't mean that some aren't more important than others. right now, the more the better. |
04:36
🔗
|
yipdw_ |
then add them to the list |
04:37
🔗
|
aaaaaaaaa |
Just create a list of ones you want and link to it on the wiki or github issues page |
04:37
🔗
|
Cr33per83 |
Okay, thanks. Glad I can help. |
04:40
🔗
|
|
Ungstein has quit IRC (Read error: Connection reset by peer) |
04:40
🔗
|
|
aaaaaaaaa has quit IRC (Leaving) |
04:41
🔗
|
Cr33per83 |
Should I NOT provide links to projects that have migrated? |
04:41
🔗
|
Cr33per83 |
I'm guessing yes. |
04:42
🔗
|
Cr33per83 |
*been migrated to another service |
04:44
🔗
|
yipdw_ |
I mean you can but it'd be nice to know that anyway |
04:44
🔗
|
yipdw_ |
if a project has been migrated and has a canonical home elsewhere, then it's still alive and isn't as huge a loss if we don't get it |
04:45
🔗
|
Cr33per83 |
Nice to know what? The URL? |
04:46
🔗
|
Cr33per83 |
To know if it's been migrated? (ie put the URL and then say it was migrated)? |
04:47
🔗
|
|
JesseW has quit IRC (Read error: Operation timed out) |
04:48
🔗
|
yipdw_ |
whichever is fine so long as we can look at it later and understand the intent |
04:50
🔗
|
Cr33per83 |
Just to be sure, you WANT me to, if I have one, put links to migrated projects? In particular, one that hosts downloads for a particular project where the source has been moved elsewhere? |
04:50
🔗
|
|
yipdw_ is now known as yipdw |
04:53
🔗
|
|
JesseW has joined #archiveteam |
04:58
🔗
|
|
Ungstein has joined #archiveteam |
05:08
🔗
|
|
Ungstein1 has joined #archiveteam |
05:09
🔗
|
|
Ungstein has quit IRC (Ping timeout: 252 seconds) |
05:13
🔗
|
|
Cr33per83 has quit IRC (Quit: Leaving) |
05:22
🔗
|
|
Sue_ has joined #archiveteam |
05:23
🔗
|
|
Ungstein1 has quit IRC (Read error: Connection reset by peer) |
05:25
🔗
|
|
Ungstein has joined #archiveteam |
05:34
🔗
|
|
xk_id has joined #archiveteam |
06:01
🔗
|
|
BlueMaxim has joined #archiveteam |
06:12
🔗
|
SketchCow |
-------------------------- |
06:13
🔗
|
SketchCow |
I was given a new Jamendo download script. WHO WANTS IT. |
06:13
🔗
|
SketchCow |
-------------------------- |
06:19
🔗
|
|
Dark_Star has quit IRC (Read error: Operation timed out) |
06:19
🔗
|
|
vitzli has joined #archiveteam |
06:24
🔗
|
|
xk_id has quit IRC (Remote host closed the connection) |
06:45
🔗
|
|
schbirid has joined #archiveteam |
07:08
🔗
|
|
vitzli has quit IRC (Quit: Leaving) |
07:10
🔗
|
|
PurpleSym has joined #archiveteam |
07:16
🔗
|
|
xk_id has joined #archiveteam |
07:22
🔗
|
|
primus104 has joined #archiveteam |
07:35
🔗
|
|
atomotic has joined #archiveteam |
07:39
🔗
|
|
primus104 has quit IRC (Leaving.) |
07:54
🔗
|
|
JesseW has quit IRC (Read error: Operation timed out) |
08:12
🔗
|
|
xk_id has quit IRC (Remote host closed the connection) |
08:27
🔗
|
arkiver |
Cr33per83: just any link to any google code project |
08:33
🔗
|
arkiver |
Start: do you think we need a warrior project for comcast? or will archivebot be able to handle it? |
08:39
🔗
|
|
underscor has joined #archiveteam |
08:40
🔗
|
|
brayden has joined #archiveteam |
08:50
🔗
|
|
primus104 has joined #archiveteam |
09:42
🔗
|
|
atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com) |
09:44
🔗
|
joepie91 |
SketchCow: this sounds relevant to my interests |
09:44
🔗
|
joepie91 |
:P |
09:48
🔗
|
SketchCow |
Sent |
09:49
🔗
|
|
underscor has quit IRC (Ping timeout: 483 seconds) |
09:50
🔗
|
joepie91 |
thanks |
09:52
🔗
|
SketchCow |
Feel free to do the uploading |
09:52
🔗
|
SketchCow |
I'll happily put it in the collection |
09:55
🔗
|
joepie91 |
SketchCow: PM for a moment |
10:11
🔗
|
|
superkuh has quit IRC (Quit: the neuronal action potential is an electrical manipulation of reversible abrupt phase changes in the lipid bilaye) |
10:11
🔗
|
|
Emcy_ has joined #archiveteam |
10:14
🔗
|
|
Wyatts has quit IRC (Remote host closed the connection) |
10:15
🔗
|
|
Emcy has quit IRC (Ping timeout: 306 seconds) |
10:15
🔗
|
|
lytv has quit IRC (Ping timeout: 306 seconds) |
10:16
🔗
|
|
Wyatts has joined #archiveteam |
10:17
🔗
|
|
Emcy has joined #archiveteam |
10:17
🔗
|
|
Emcy_ has quit IRC (Ping timeout: 306 seconds) |
10:18
🔗
|
|
lytv has joined #archiveteam |
10:21
🔗
|
|
xk_id has joined #archiveteam |
10:22
🔗
|
|
primus104 has quit IRC (Leaving.) |
10:23
🔗
|
|
xk_id has quit IRC (Remote host closed the connection) |
10:23
🔗
|
|
xk_id has joined #archiveteam |
11:34
🔗
|
|
xk_id_ has joined #archiveteam |
11:34
🔗
|
|
xk_id has quit IRC (Read error: Connection reset by peer) |
11:40
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
11:42
🔗
|
|
dashcloud has joined #archiveteam |
12:09
🔗
|
|
atomotic has joined #archiveteam |
12:17
🔗
|
|
BlueMaxim has quit IRC (Read error: Connection reset by peer) |
13:34
🔗
|
Start |
arkiver: it depends how many sites we discover |
13:34
🔗
|
Start |
warrior might be better for saving everything |
13:37
🔗
|
|
primus104 has joined #archiveteam |
13:39
🔗
|
|
Aranje has quit IRC (Quit: Three sheets to the wind) |
13:43
🔗
|
|
Start has quit IRC (Quit: Disconnected.) |
13:50
🔗
|
|
scyther has joined #archiveteam |
13:50
🔗
|
|
Stilett0 is now known as Stiletto |
13:52
🔗
|
|
PurpleSym has quit IRC (Remote host closed the connection) |
14:02
🔗
|
|
primus104 has quit IRC (Leaving.) |
14:12
🔗
|
|
PurpleSym has joined #archiveteam |
14:19
🔗
|
|
atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com) |
14:32
🔗
|
|
Start has joined #archiveteam |
14:41
🔗
|
|
Start has quit IRC (Quit: Disconnected.) |
14:48
🔗
|
|
nertzy has joined #archiveteam |
14:51
🔗
|
|
beardicus has quit IRC (Quit: bye now) |
14:54
🔗
|
|
beardicus has joined #archiveteam |
15:02
🔗
|
|
atomotic has joined #archiveteam |
15:13
🔗
|
|
glyph_ has joined #archiveteam |
15:13
🔗
|
glyph_ |
WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD |
15:13
🔗
|
sep332 |
glyph_: yahoosucks |
15:15
🔗
|
glyph_ |
Just a heads up it looks like Myvideo.de is going to be closing VERY soon. Just got a message reading : |
15:15
🔗
|
glyph_ |
Lieber MyVideo-Nutzer, wir werden übersichtlicher und besser - MyVideo möchte Dir in Zukunft nur noch lizensierte Inhalte zeigen. Ab dem 01.09.2015 dürfen nur noch Premium Partner von MyVideo Videos hochladen. Wie werde ich Premium Partner von MyVideo? Für nicht Premium Partner werden alle Videos am 30.09.2015 automatisch gelöscht. |
15:16
🔗
|
arkiver |
tomorrow? |
15:16
🔗
|
glyph_ |
which translates (according to Google) as Dear MyVideo users, we are clearer and better - MyVideo just want to show you in the future licensed content. As of 01.09.2015 Premium Partner of MyVideo may only upload videos. How do I become a Premium Partner of MyVideo? For non-Premium Partner All videos are automatically deleted on 30/09/2015. |
15:16
🔗
|
glyph_ |
I'm pretty sure, and I'm guessing that's tomorrow Eastern European Time Zone |
15:16
🔗
|
arkiver |
they sent that today???? |
15:17
🔗
|
glyph_ |
I didn't see a public notice anywhere, until I tried to upload a video and that popped up |
15:17
🔗
|
|
Start has joined #archiveteam |
15:18
🔗
|
arkiver |
I'm afraid we can't save it |
15:18
🔗
|
arkiver |
not if it's shutting down tomorrow |
15:19
🔗
|
|
JesseW has joined #archiveteam |
15:19
🔗
|
glyph_ |
I suspect, just wanted to toss that out there |
15:23
🔗
|
glyph_ |
On the upside, the Internet Archive has 185,963 URLs captured for the myvideo.de domain, which is something |
15:23
🔗
|
arkiver |
They very likely did not capture the videos |
15:23
🔗
|
glyph_ |
I know they didn't |
15:24
🔗
|
|
glyph_ has left |
15:33
🔗
|
|
nertzy has quit IRC (Quit: This computer has gone to sleep) |
15:42
🔗
|
|
xk_id_ has quit IRC (Remote host closed the connection) |
15:42
🔗
|
|
Start has quit IRC (Quit: Disconnected.) |
15:47
🔗
|
|
Swolepeng has joined #archiveteam |
15:47
🔗
|
Swolepeng |
Sup y'all |
15:50
🔗
|
|
Atom__ has quit IRC (Ping timeout: 483 seconds) |
15:54
🔗
|
|
Swolepeng has quit IRC (Ping timeout: 240 seconds) |
15:55
🔗
|
|
JesseW has quit IRC (Read error: Operation timed out) |
15:56
🔗
|
|
Ymgve has quit IRC () |
15:57
🔗
|
|
Start has joined #archiveteam |
15:58
🔗
|
|
atomotic has quit IRC (Quit: My Mac has gone to sleep. ZZZzzz…) |
16:07
🔗
|
|
Start has quit IRC (Quit: Disconnected.) |
16:09
🔗
|
|
Dark_Star has joined #archiveteam |
16:12
🔗
|
slang |
how many videos on this site are considered "non-premium"? |
16:14
🔗
|
|
beardicus has quit IRC (Quit: bye now) |
16:16
🔗
|
|
scyther has quit IRC (Read error: Connection reset by peer) |
16:30
🔗
|
|
underscor has joined #archiveteam |
16:41
🔗
|
|
philpem has joined #archiveteam |
16:46
🔗
|
|
atomotic has joined #archiveteam |
16:46
🔗
|
|
atomotic has quit IRC (Client Quit) |
16:48
🔗
|
|
beardicus has joined #archiveteam |
16:49
🔗
|
|
pgoetz has joined #archiveteam |
16:52
🔗
|
|
primus104 has joined #archiveteam |
16:53
🔗
|
|
pgoetz has quit IRC (Remote host closed the connection) |
16:59
🔗
|
|
Ymgve has joined #archiveteam |
17:20
🔗
|
|
Start has joined #archiveteam |
17:24
🔗
|
|
atomotic has joined #archiveteam |
17:28
🔗
|
Dark_Star |
someone should do a grab of all the T11.org documents (starting point http://www.t11.org/t11/docreg.nsf/indproj?OpenView or http://www.t11.org/t11/docreg.nsf). Many of the links are broken or need slight editing (like inserting '/pub/'). It's all a mess over there |
17:38
🔗
|
|
Start has quit IRC (Quit: Disconnected.) |
17:47
🔗
|
|
underscor has quit IRC (Read error: Operation timed out) |
18:01
🔗
|
|
Zandro has left Leaving |
18:02
🔗
|
|
scyther has joined #archiveteam |
18:05
🔗
|
arkiver |
Right now projects are not very well sorted. |
18:05
🔗
|
arkiver |
If you have a website you'd like to have archived, please add it under the Proposed projects section: http://archiveteam.org/index.php?title=Current_Projects |
18:07
🔗
|
arkiver |
----------------------------------------------------------- |
18:07
🔗
|
arkiver |
Proposing a future project? |
18:07
🔗
|
arkiver |
List it under "Proposed projects": |
18:07
🔗
|
arkiver |
http://archiveteam.org/index.php?title=Current_Projects |
18:07
🔗
|
arkiver |
----------------------------------------------------------- |
18:07
🔗
|
|
aaaaaaaaa has joined #archiveteam |
18:07
🔗
|
arkiver |
Please remind people of that when they post something here |
18:13
🔗
|
swebb |
I'm ready for an rsync target for the soundcloud crawl. Can someone set me up? |
18:13
🔗
|
aaaaaaaaa |
and give that man ops |
18:18
🔗
|
|
underscor has joined #archiveteam |
18:26
🔗
|
|
SketchCow sets mode: +ooo bai Baljem brayden |
18:26
🔗
|
|
SketchCow sets mode: +oooo sep332 swebb yipdw aaaaaaaaa |
18:26
🔗
|
|
swebb sets mode: +o DFJustin |
18:26
🔗
|
|
swebb sets mode: +o aaaaaaaaa |
18:26
🔗
|
|
swebb sets mode: +o antomatic |
18:26
🔗
|
|
swebb sets mode: +o edsu_ |
18:26
🔗
|
|
swebb sets mode: +o ersi |
18:27
🔗
|
|
aaaaaaaaa sets mode: +oo chfoo chfoo- |
18:55
🔗
|
|
beardicus has quit IRC (Read error: Connection reset by peer) |
19:00
🔗
|
|
beardicus has joined #archiveteam |
19:00
🔗
|
|
atlogbot has quit IRC (Quit: atlogbot) |
19:18
🔗
|
|
xk_id has joined #archiveteam |
19:36
🔗
|
|
atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com) |
19:53
🔗
|
|
Start has joined #archiveteam |
19:54
🔗
|
|
schbirid has quit IRC (Quit: Leaving) |
20:00
🔗
|
|
underscor has quit IRC (Remote host closed the connection) |
20:03
🔗
|
|
SimpBrain has joined #archiveteam |
20:03
🔗
|
|
habi has joined #archiveteam |
20:05
🔗
|
|
habi has left |
20:24
🔗
|
|
garyrh has quit IRC (Remote host closed the connection) |
20:30
🔗
|
|
wm_ has quit IRC (Ping timeout: 240 seconds) |
20:36
🔗
|
|
wm_ has joined #archiveteam |
20:37
🔗
|
|
RedType has quit IRC (Quit: leaving) |
20:39
🔗
|
|
K4k has joined #archiveteam |
20:42
🔗
|
|
RedType has joined #archiveteam |
21:04
🔗
|
|
scyther has quit IRC (Read error: Connection reset by peer) |
21:05
🔗
|
|
garyrh has joined #archiveteam |
21:07
🔗
|
|
K4k has quit IRC (Read error: Operation timed out) |
21:07
🔗
|
|
K4k has joined #archiveteam |
21:07
🔗
|
|
slyphic is now known as slyphic|a |
21:13
🔗
|
|
K4k has quit IRC (Ping timeout: 258 seconds) |
21:16
🔗
|
|
Start has quit IRC (Quit: Disconnected.) |
21:20
🔗
|
|
PurpleSym has quit IRC (Remote host closed the connection) |
21:21
🔗
|
|
SimpBrain has quit IRC (Leaving) |
21:24
🔗
|
|
zenguy_pc has quit IRC (Read error: Operation timed out) |
21:25
🔗
|
|
zenguy_pc has joined #archiveteam |
21:37
🔗
|
marvinw |
https://torrentfreak.com/rutracker-says-copyright-holders-can-moderate-its-torrents-150929/ |
21:37
🔗
|
|
mksplg has quit IRC (WeeChat 1.0.1) |
21:40
🔗
|
Spring |
chances the studios won't just remove most torrents? Seems like a giant leap of faith. |
21:40
🔗
|
Spring |
that is, removing torrents that they don't hold copyright to. They would be mods after all. |
21:41
🔗
|
|
mksplg has joined #archiveteam |
21:43
🔗
|
|
mksplg has quit IRC (Client Quit) |
21:47
🔗
|
|
mksplg has joined #archiveteam |
21:53
🔗
|
|
mksplg has quit IRC (WeeChat 1.0.1) |
21:56
🔗
|
|
aaaaaaaa_ has joined #archiveteam |
21:56
🔗
|
|
aaaaaaaaa has quit IRC (Read error: Connection reset by peer) |
21:56
🔗
|
|
swebb sets mode: +o aaaaaaaa_ |
22:03
🔗
|
arkiver |
swebb: a soundcloud grab? |
22:04
🔗
|
|
aaaaaaaa_ is now known as aaaaaaaaa |
22:05
🔗
|
arkiver |
I didn't know we want to grab soundcloud |
22:06
🔗
|
Sanqui |
https://twitter.com/ImageShack/status/648856178273837056 |
22:06
🔗
|
Sanqui |
https://imageshack.us/ |
22:07
🔗
|
Sanqui |
imageshack is not 'stable' by any definition of the word |
22:08
🔗
|
swebb |
arkiver: Yea. |
22:09
🔗
|
arkiver |
so, where are the scripts? |
22:09
🔗
|
swebb |
I've not checked them in anywhere yet. |
22:09
🔗
|
arkiver |
you want to make this a warrior project? |
22:09
🔗
|
swebb |
not yet. |
22:09
🔗
|
arkiver |
is that a yes or a no? :P |
22:09
🔗
|
swebb |
I've got some good resources at home and a fast internet connection. I'll see how well I can do by myself. |
22:10
🔗
|
arkiver |
hmm |
22:10
🔗
|
arkiver |
do you think I can have a look at the scripts? just making sure everything is downloaded |
22:10
🔗
|
arkiver |
:) |
22:11
🔗
|
arkiver |
swebb: what will be the total size do you think? |
22:11
🔗
|
swebb |
I can tell you what I'm grabbing. metadata for user profile, tracks and track comments. Images for user profile and tracks. MP3s for tracks. I'm using the scdl utility to download the MP3s, but the rest is hand-written stuff. |
22:11
🔗
|
arkiver |
so no warcs? |
22:12
🔗
|
swebb |
Not sure. I've got about 35M user profiles scraped to-date and I started downloading the users yesterday. after about 550 users, I had 50G of data after compression. |
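A rough answer to the total-size question can be extrapolated from the figures quoted above (about 550 users downloaded, about 50G after compression, about 35M profiles scraped). The sketch below assumes the first 550 users are representative and that size scales linearly with user count; neither assumption is confirmed in the log, and the function name is hypothetical.

```python
# Back-of-the-envelope size estimate from the figures quoted in the chat.
# Assumption (not from the log): the sampled users are representative and
# total size scales linearly with the number of users.

SAMPLE_USERS = 550          # users downloaded so far
SAMPLE_GB = 50              # compressed size of that sample, in GB
TOTAL_USERS = 35_000_000    # profiles scraped to date


def estimate_total_gb(sample_users: int, sample_gb: float, total_users: int) -> float:
    """Scale the sample size linearly up to the full user count."""
    return sample_gb / sample_users * total_users


total_gb = estimate_total_gb(SAMPLE_USERS, SAMPLE_GB, TOTAL_USERS)
total_pb = total_gb / 1_000_000  # decimal units: 1 PB = 1,000,000 GB

print(f"about {total_pb:.2f} PB")  # prints "about 3.18 PB"
```

Under those assumptions the grab would land in the low petabytes. The real figure depends heavily on how track counts are distributed across users.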
22:12
🔗
|
swebb |
no, no warcs. |
22:12
🔗
|
arkiver |
hmm, I think if we're going to grab this it should be in warcs, but that's just my opinion |
22:12
🔗
|
swebb |
I don't know how to make warcs. |
22:13
🔗
|
aaaaaaaaa |
Sanqui: judging by all the yellow frogs I see on the internet, they haven't been "stable" for a long time |
22:13
🔗
|
swebb |
other than using the IA crawler. |
22:13
🔗
|
swebb |
I'm not grabbing web pages. Just json and raw data. |
22:13
🔗
|
swebb |
Soundcloud is a single-page-app, so it's not easy to crawl using a web crawler and it won't be viewable in the wayback machine. |
22:14
🔗
|
arkiver |
it might be viewable in the future |
22:14
🔗
|
|
xk_id has quit IRC (Read error: Operation timed out) |
22:14
🔗
|
Sanqui |
aaaaaaaaa: yep |
22:14
🔗
|
arkiver |
wayback is always improving |
22:14
🔗
|
Sanqui |
I don't know what can be done though |
22:14
🔗
|
|
xk_id has joined #archiveteam |
22:14
🔗
|
arkiver |
if you'd like I can write some scripts to grab the stuff in warcs |
22:15
🔗
|
arkiver |
and some scripts to download the mp3's from IA while they're not playable yet in the wayback machine |
22:15
🔗
|
arkiver |
that way the mp3's can just be embedded with a link to the url in the wayback machine |
22:16
🔗
|
arkiver |
we save the request and response headers and make future playback possible in the wayback machine |
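As an illustration of what saving the response headers in a WARC means, here is a minimal sketch that wraps one raw HTTP response in a single WARC/1.0 `response` record using only the standard library. It is not the tooling discussed in the channel; the function name, URL, and sample response are hypothetical, and a real grab would normally use an established WARC writer and also store the matching `request` record.

```python
import uuid
from datetime import datetime, timezone


def warc_response_record(url: str, http_response: bytes) -> bytes:
    """Wrap a raw HTTP response (status line, headers, body) in one WARC/1.0 record."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    header_lines = [
        "WARC/1.0",
        "WARC-Type: response",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {now}",
        f"WARC-Target-URI: {url}",
        "Content-Type: application/http; msgtype=response",
        f"Content-Length: {len(http_response)}",
    ]
    # Header block ends with a blank line; the record ends with two CRLFs.
    head = "\r\n".join(header_lines).encode("utf-8") + b"\r\n\r\n"
    return head + http_response + b"\r\n\r\n"


# Hypothetical captured response: the HTTP headers are kept verbatim in the record.
raw = b"HTTP/1.1 200 OK\r\nContent-Type: audio/mpeg\r\nContent-Length: 3\r\n\r\nID3"
record = warc_response_record("http://example.com/track.mp3", raw)
```

Because the original status line and headers are stored byte for byte, a replay tool can later serve the capture as the server originally sent it.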
22:19
🔗
|
arkiver |
and I think we can make it easier that way to find the mp3 files people are looking for |
22:21
🔗
|
arkiver |
^ well depending on how you look at it |
22:23
🔗
|
arkiver |
though the wayback machine indeed totally sucks at playback of the data |
22:23
🔗
|
|
Start has joined #archiveteam |
22:24
🔗
|
arkiver |
swebb: maybe grab everything in warcs, also grab the metadata in non-warcs and have links to the mp3 files in the wayback machine for people to download? |
22:24
🔗
|
arkiver |
We can write a search index so people can search on name, artist and such things to find what they're looking for |
22:25
🔗
|
arkiver |
we then have all the data as warcs and have support for the download of mp3s |
22:29
🔗
|
|
gibigiana has quit IRC (Read error: Operation timed out) |
22:30
🔗
|
|
xk_id has quit IRC (Remote host closed the connection) |
22:30
🔗
|
|
RichardG_ has joined #archiveteam |
22:30
🔗
|
|
RichardG has quit IRC (Read error: Connection reset by peer) |
22:31
🔗
|
|
Smiley has quit IRC (Read error: Operation timed out) |
22:33
🔗
|
|
lytv has quit IRC (Read error: Operation timed out) |
22:34
🔗
|
|
gibigiana has joined #archiveteam |
22:34
🔗
|
|
odie5533 has quit IRC (Read error: Operation timed out) |
22:34
🔗
|
|
slyphic|a has quit IRC (Read error: Operation timed out) |
22:34
🔗
|
|
Smiley has joined #archiveteam |
22:35
🔗
|
|
ohhdemgir has quit IRC (Read error: Operation timed out) |
22:35
🔗
|
|
dserodio has quit IRC (Read error: Operation timed out) |
22:35
🔗
|
|
lytv has joined #archiveteam |
22:35
🔗
|
|
zenguy_pc has quit IRC (Read error: Operation timed out) |
22:35
🔗
|
|
Infreq has quit IRC (Read error: Operation timed out) |
22:36
🔗
|
|
swebb has quit IRC (Read error: Operation timed out) |
22:36
🔗
|
|
Infreq has joined #archiveteam |
22:36
🔗
|
|
Laverne has quit IRC (Read error: Operation timed out) |
22:38
🔗
|
|
oli has quit IRC (Read error: Operation timed out) |
22:38
🔗
|
|
chazchaz has quit IRC (Read error: Operation timed out) |
22:38
🔗
|
|
Spring has quit IRC (Ping timeout: 369 seconds) |
22:39
🔗
|
|
Spring has joined #archiveteam |
22:39
🔗
|
|
oli has joined #archiveteam |
22:39
🔗
|
|
odie5533 has joined #archiveteam |
22:40
🔗
|
arkiver |
I paused blingee, something weird is going on there, will have a look tomorrow |
22:41
🔗
|
arkiver |
actually, going on at thingiverse too |
22:41
🔗
|
garyrh |
What's weird? |
22:42
🔗
|
arkiver |
no items are reporting done |
22:42
🔗
|
arkiver |
and items are still being handed out |
22:43
🔗
|
arkiver |
nvm about thingiverse |
22:44
🔗
|
garyrh |
For blingee, I see a bunch of "Server returned 0. Sleeping." |
22:45
🔗
|
garyrh |
and 5xx codes too |
22:45
🔗
|
arkiver |
I see |
22:46
🔗
|
arkiver |
will investigate and fix that later |
22:46
🔗
|
* |
arkiver is afk for the night |
22:52
🔗
|
|
zenguy_pc has joined #archiveteam |
22:59
🔗
|
|
chazchaz has joined #archiveteam |
23:00
🔗
|
|
Laverne has joined #archiveteam |
23:01
🔗
|
|
swebb has joined #archiveteam |
23:02
🔗
|
|
dserodio has joined #archiveteam |
23:04
🔗
|
|
slyphic has joined #archiveteam |
23:14
🔗
|
|
Dark_Star has quit IRC (Read error: Operation timed out) |
23:18
🔗
|
|
Dark_Star has joined #archiveteam |
23:29
🔗
|
|
robink has quit IRC (Ping timeout: 492 seconds) |
23:31
🔗
|
|
robink has joined #archiveteam |
23:32
🔗
|
|
xmc has quit IRC (Ping timeout: 483 seconds) |
23:38
🔗
|
|
xmc has joined #archiveteam |
23:45
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
23:48
🔗
|
|
dashcloud has joined #archiveteam |
23:48
🔗
|
|
xmc has quit IRC (Ping timeout: 483 seconds) |
23:50
🔗
|
|
xmc has joined #archiveteam |
23:54
🔗
|
|
philpem has quit IRC (Ping timeout: 252 seconds) |