00:24 <godane> Arkiver2: the only subtitles i found are in dutch
03:42 <dx> hey #archiveteam! bukkit, a minecraft server / modding platform, is dead/dying due to a licensing conflict. the conflicting software is all gone due to DMCA, but their websites might still have plenty of data useful for the rest of the modding community
03:44 <yipdw> dx: we got a copy of the bukkit wiki via archivebot
03:44 <yipdw> if you've got other sites, please feel free to drop by #archivebot
03:44 <dx> neat
03:44 <dx> what about this? http://dev.bukkit.org/bukkit-plugins/
03:45 <dx> pages in there could be archived through archivebot, but what about plugin .jars?
03:45 <dx> pages would be useful because there's plenty of documentation in that site
03:45 <yipdw> one sec, checking
03:46 <yipdw> if the jars are linked, we can get them
03:47 <dx> hmm, they seem like direct links http://dev.bukkit.org/bukkit-plugins/nametags/files/
03:49 <balrog> the forums
03:49 <balrog> the plugins would be useful
03:49 <balrog> I have clones of the DMCA'd repos
03:49 <balrog> but I don't know what the legality on that is :p
03:50 <dx> as illegal as it always was
03:50 <dx> it's only down now because they are acting on it
07:35 <dx> so uhhh, there's an api for dev.bukkit.org, i scraped all the plugin download links with it
07:36 <dx> so now i've got this huge json file with some metadata and a bunch of direct download links, sample of the first 100: http://dump.dequis.org/mF7xj.txt
07:37 <dx> no filesize info or uploaded timestamps, could grab that with HEAD requests maybe.
07:37 <dx> full dump http://dequis.org/bukkitdev-releases.json.gz - 3.4mb gzipped, 27mb uncompressed
07:41 <dx> all file urls one per line, no filtering at all, http://dequis.org/bukkitdev_all_file_urls.txt - 4.5mb, 75494 lines
07:42 <dx> it's an absurd amount of files, most of them not worth saving
07:43 <dx> how do i reduce this? for some projects it's not as trivial as grabbing the latest release, they have several parallel releases for a single version
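One way to thin the list while respecting the "parallel releases" problem is to keep the newest upload per (project, game version) pair rather than one file per project. The record fields below are hypothetical — the real schema of bukkitdev-releases.json may differ:

```python
# Hypothetical record shape; the real dump's field names may differ.
releases = [
    {"project": "nametags", "game_version": "1.7.9", "uploaded": "2014-06-01", "url": "u1"},
    {"project": "nametags", "game_version": "1.7.9", "uploaded": "2014-07-15", "url": "u2"},
    {"project": "nametags", "game_version": "1.6.4", "uploaded": "2014-01-10", "url": "u3"},
]

def newest_per_version(releases):
    """Keep one file per (project, game_version): the most recent upload."""
    best = {}
    for rel in releases:
        key = (rel["project"], rel["game_version"])
        # ISO-8601 date strings compare correctly as plain strings.
        if key not in best or rel["uploaded"] > best[key]["uploaded"]:
            best[key] = rel
    return sorted(best.values(), key=lambda r: (r["project"], r["game_version"]))
```

Since the dump carries no timestamps, the `uploaded` field would first have to be filled in (e.g. via the HEAD requests mentioned earlier, or the API's own metadata).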
07:46 <Litus1960> Q: just installed Virtualbox and want to import a fine (appliance) how do I get the files?
07:46 <Litus1960> a file
08:47 <midas> Litus1960: http://tracker.archiveteam.org/ click the download link
08:55 <Litus1960> @Midas I got the machine how do I import the files?
08:58 <Rotab> http://archiveteam.org/index.php?title=Warrior
09:00 <Litus1960> In VirtualBox, click File > Import Appliance and open the file. -> what file do I open?
09:00 <Rotab> ....?
09:00 <Rotab> the file you just downloaded
09:01 <Litus1960> I import the file that I just installed the virtual machine with?
09:03 <Rotab> you import the .ova
09:04 <midas> Litus1960: http://archive.org/download/archiveteam-warrior/archiveteam-warrior-v2-20121008.ova
09:04 <midas> then in virtualbox > file > import
09:07 <Litus1960> ok did that now how do I run a job?
09:07 <midas> it shows an IP address in the console
09:08 <midas> it explains it here: http://tracker.archiveteam.org/
09:11 <Litus1960> got it
09:11 <schbirid> Litus1960: welcome and thanks for helping out :)
09:15 <Litus1960> sorry bud, thank you for helping out :D was so busy trying to comprehend the poop, not a nerdy person ;-)
09:18 <schbirid> http://www.classiccomputing.com/CC/Blog/Entries/2014/9/7_Computer_History_Nostalgia_Podcasts.html
09:23 <Litus1960> followed with @FlexMind
09:26 <Litus1960> bye for now
09:50 <danneh_> y'all have probably seen this, but: http://blog.twitpic.com/2014/09/twitpic-is-shutting-down/
09:51 <ersi> danneh_: Yeah. There's a project channel at #quitpic
09:52 <danneh_> ersi: Ah, fair enough. Thanks for being so quick and awesome!
09:53 <ersi> np!
09:53 <Muad-Dib> is anyone going to those Dutch meetups about digital preservation tomorrow? http://www.dezwijger.nl/115739/nl/tegenlicht-meet-up-23-digitaal-geheugenverlies
09:54 <Muad-Dib> joepie91, midas ^
09:58 <midas> I wish i could, but work and job interviews
10:03 <Muad-Dib> I might be able to go, but I'm a bit reluctant about walking around there and going "Hey guys, I'm with Archive Team!"
10:04 <Muad-Dib> I don't want to be mistaken for a representative, as I'm not really a good PR person
10:05 <midas> simple answer, don't :p
10:07 <Muad-Dib> but I also want to stir up discussion about digital preservation, and if people ask "how are you connected to this stuff?"
10:09 <Muad-Dib> Shall I stick with Jason's "Teenagers and crazy people" description for bands of rogue archivists like AT? https://decorrespondent.nl/1695/Waarom-deze-man-het-hele-internet-downloadt/56475705-f10825bc
10:12 <antomatic> I believe he said "maniacs" :)
10:12 <Muad-Dib> antomatic: figures, it was translated into Dutch in that interview
10:13 <Muad-Dib> haven't watched the entire documentary yet
14:36 <Arkiver2> Guys, I'm going to the meetup tomorrow evening in Amsterdam
14:37 <Arkiver2> ^Muad-Dib
14:39 <Muad-Dib> If I can free the time, I'll be there too
14:41 <midas> i'd like to be there, but the day after that i have a job interview abroad.. so ill pass
14:42 <midas> but do remind them that we grabbed the announcement already Muad-Dib ;)
14:43 <Arkiver2> midas: I'd like to tell people about the projects we are currently doing
14:43 <Arkiver2> and the problems the archiveteam usually experiences when archiving a website with the warrior
14:44 <Arkiver2> However, it would be great if someone of you can also be there
14:44 <Arkiver2> :)
14:51 <vantec> I'd like to go but I'm 6.5k km away
14:51 <phuzion> dx: I'm getting the sizes of everything right now, just so you know
14:53 <Arkiver2> dx: There is nothing that is not worth saving!!
14:54 <Arkiver2> If the list of files is below, say, 200 GB we can do it with archivebot
14:58 <Arkiver2> added it to archivebot
14:58 <Arkiver2> should be done in a day
14:59 <dx> whoa!
14:59 <dx> just came home and you're already archiving the whole thing :D
14:59 <Arkiver2> dx: http://archivebot.com/ first one
15:00 <dx> Arkiver2: thanks :D
15:00 <Arkiver2> if you have any other list of files/pages please give it and it will be saved
15:01 <dx> Arkiver2: all the pages under http://dev.bukkit.org/bukkit-plugins/ - mostly documentation of those plugins, they also link to the jars in that list so you'll want to exclude that.
15:02 <Arkiver2> dx: all of dev.bukkit.org is being saved: http://archivebot.com/ (third one)
15:02 <dx> Arkiver2: it won't download jars twice, right?
15:03 <dx> also, :D
15:04 <yipdw> dx: if the JARs have N URLs, they will be downloaded N times
15:05 <dx> yipdw: each url is a different version, but they are being downloaded from both my file list and the dev.bukkit.org recursive task
15:06 <Arkiver2> I'd say just leave it as it's going now, it won't be too big
15:06 <dx> hmm
15:08 <jules> hi
15:08 <Arkiver2> jules: hello
15:08 <dx> yesterday i took a random sample of 50 of them, got average 140kbytes, max 4mbytes, so uhhh, somewhere between 10gb and 300gb
15:08 <dx> but i have no idea what's "too big" for you guys :P
15:09 <Arkiver2> nothing is too big for us
15:09 <Arkiver2> well, there is a limit kind of
15:09 <dx> yeah i saw the twitch wiki page
15:09 <Arkiver2> MobileMe was more than 200 TB
15:09 <dx> whoa.
15:10 <Arkiver2> so it should be fine ;)
15:10 <dx> :D
15:10 <dx> thanks a lot!
15:10 <yipdw> dx: if that's the case, please ignore all jars from the dev.bukkit.org job
15:10 <dx> Arkiver2: ^
15:10 <yipdw> there's "having lots of space" and "not being brain-dead", and two identical copies are the latter
15:10 <dx> yup!
15:11 <Arkiver2> we can do that too
15:11 <yipdw> hopefully dev.bukkit.org doesn't use Java applets
15:11 <dx> haha no
15:11 <yipdw> that'll make the obvious ignore pattern a bit trickier
15:11 <dx> it's not a website from the 90s luckily
15:13 <Arkiver2> yipdw: maybe ignore everything from servermods.cursecdn.com/
15:13 <yipdw> done
15:13 <Arkiver2> as all files are from there
15:13 <Arkiver2> ah that's even better
15:13 <Arkiver2> thanks
17:53 <pluesch> i hate sites that are impossible to archive...
18:01 <joepie91> nothing is impossible ;)
18:01 <joepie91> it's just a challenge!
18:02 <dx> unless they are already dead
18:03 <joepie91> ... okay, point taken.
18:03 <joepie91> :P
18:17 <pluesch> joepie91: tumblr.com :D
18:17 <pluesch> they limit everything
18:17 <pluesch> for example: notes are limited to 4980 and I don't know why
18:18 <pluesch> so if there is a post with, let's say, 10000 notes, it is impossible to find out all likes, reblogs
18:19 <xmc> sounds like they tried to limit it to 250 pages of 20, but got it wrong
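xmc's guess fits the arithmetic exactly: 20-note pages capped one page short of an intended 250 would produce the observed 4980 limit. (The 20-per-page size and 250-page cap are xmc's assumptions, not confirmed tumblr internals.)

```python
PAGE_SIZE = 20          # assumed notes per page of tumblr's notes widget
INTENDED_PAGES = 250    # xmc's guess at the intended cap

observed_limit = 4980
assert INTENDED_PAGES * PAGE_SIZE == 5000                    # what they meant
assert (INTENDED_PAGES - 1) * PAGE_SIZE == observed_limit    # the off-by-one
```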
18:23 <joepie91> off-by-one, whoo
18:23 <joepie91> pluesch: how is the remainder normally visible?
18:23 <joepie91> surely they don't *completely* hide them
18:40 <pluesch> joepie91: what do you mean?
18:40 <pluesch> it's not visible, the data just gets lost in their database
18:40 <pluesch> uhhh I hate it when data gets lost
18:43 <midas> did someone say mongodb on 32bit?
18:49 <joepie91> pluesch: I suspect it can still be accessed
18:49 <joepie91> just through a different method
18:51 <pluesch> joepie91: I've tried many ways ... api v1, api v2, normal page
18:52 <pluesch> but yeah ... still have to check some things (tumblr android app api calls, "undocumented" api functions)
19:12 <espes__> of general interest: https://github.com/espes/jdget
19:12 <espes__> if anyone ever needs to pull a bunch of links from file lockers
19:13 <espes__> ping me and maybe I'll update / maintain it
19:23 <pluesch> even with the tumblr android app the 4980 limit is there....
19:23 <pluesch> -.-
19:54 <joepie91> pluesch: is it possible that the notes displayed differ depending on what reblog you're looking at
19:54 <xmc> I think they don't
19:54 <joepie91> similar to how twitter sometimes shows an entirely different conversation flow on a different reply
19:55 <xmc> tumblr notes are attached to the thread root
19:55 <joepie91> bah
19:55 * joepie91 crumples up paper and aims at circular filing bin
20:05 <DFJustin> espes__: ooooh
20:06 <DFJustin> does it do youtube? jdownloader2 is one of the few tools I've found that will correctly grab 1080p videos
20:06 <pluesch> joepie91: nope, checked
20:06 <pluesch> so
20:06 <pluesch> what should i do now?
20:06 <DFJustin> but I can't really recommend it to people because it's such a heap
20:06 <pluesch> ask tumblr to improve there api? XD
20:07 <pluesch> their*
20:07 <pluesch> DFJustin: youtube-dl can do 1080p too?
20:07 <pluesch> bestvideo+bestaudio
20:12 <joepie91> DFJustin: huh? youtube-dl does 1080p fine
20:12 <DFJustin> yeah but it's not the default and you need ffmpeg
20:12 <joepie91> also, so does freerapid afaik
20:12 <joepie91> ...?
20:12 <joepie91> DFJustin: link me a 1080p video?
20:13 <xmc> youtube-dl usually works for me
20:15 <pluesch> tumblr has just blocked my ip :/
20:15 <pluesch> damn.
20:15 <pluesch> I just wanted to mirror 160 blogs
20:16 <RedType> did you try saying you're googlebot?
20:16 <DFJustin> https://www.youtube.com/watch?v=t8YXut6_56c
20:17 <RedType> "oh what's that, you need me to register before viewing or my IP is 2much4u?" -A "Googlebot/2.1 (+http://www.google.com/bot.html)"
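RedType's trick, sketched with urllib (whether tumblr actually whitelisted that User-Agent is untested here; many sites also verify Googlebot by reverse DNS, so this may not work):

```python
import urllib.request

GOOGLEBOT_UA = "Googlebot/2.1 (+http://www.google.com/bot.html)"

def googlebot_request(url):
    """Build a request that claims to be Googlebot; pass it to urlopen()."""
    return urllib.request.Request(url, headers={"User-Agent": GOOGLEBOT_UA})
```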
20:18 <DFJustin> youtube-dl with default parameters gets 720p (silently!)
20:20 <pluesch> will try that
20:23 <DFJustin> ah --format bestvideo+bestaudio looks like it works, must be new
20:24 <DFJustin> before you had to manually specify the correct numbers
20:25 <danneh_> yeah, that'll get 1080p with youtube-dl, but you need to merge the audio/video streams with some other tool later
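The silent 720p default falls out of how YouTube served formats at the time: the best *muxed* (audio+video) stream topped out at 720p, while 1080p was only available as separate video-only and audio-only streams. Here is a toy model of what `-f bestvideo+bestaudio` selects; the real youtube-dl selector also ranks by bitrate, codec, and container, so this is an illustration of the idea only:

```python
# Toy format table; "height" 0 marks an audio-only stream.
formats = [
    {"id": "22",  "height": 720,  "vcodec": "avc1", "acodec": "aac"},  # best muxed: 720p
    {"id": "137", "height": 1080, "vcodec": "avc1", "acodec": None},   # video only
    {"id": "140", "height": 0,    "vcodec": None,   "acodec": "aac"},  # audio only
]

def bestvideo_plus_bestaudio(formats):
    """Pick the tallest video-only stream plus an audio-only stream to merge."""
    video = max((f for f in formats if f["vcodec"] and not f["acodec"]),
                key=lambda f: f["height"])
    audio = next(f for f in formats if f["acodec"] and not f["vcodec"])
    return video, audio

video, audio = bestvideo_plus_bestaudio(formats)
```

The two picks are then merged into a single file, which is why ffmpeg becomes a dependency.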
20:25 <DFJustin> the dependency on ffmpeg is also problematic, when I tried to get this working with sketchcow before, his system ffmpeg was broken somehow
20:26 <DFJustin> and in my case (windows) it doesn't seem to like unicode filenames
20:26 <danneh_> I believe, last time I checked you had to merge it manually at least
20:26 <DFJustin> it does call out to ffmpeg to merge them
20:27 <danneh_> oh, that's shiny
20:27 <danneh_> pluesch: what're you using to download tumblr blogs?
20:28 <danneh_> I've downloaded a few and haven't really had any issues, just going through and wgetting the thing
20:29 <danneh_> had to modify the wget source code to grab another tag properly though (could probably be done via the lua extensions, but I haven't looked into that yet)
20:30 <pluesch> danneh_: https://github.com/bbolli/tumblr-utils tumblr_backup.py with a few modifications
20:30 <pluesch> will release it soon
20:31 <danneh_> ah, fair enough
20:31 <pluesch> and i have created a script that goes through all notes to create a list of blogs
20:31 <pluesch> ^^
20:32 <pluesch> "all notes" == the 4980 notes per post
20:32 <danneh_> aha, nice
20:32 <danneh_> if you don't hit the api they let you download at full speed in my experience
20:33 <pluesch> it hits the old api (blogname.tumblr.com/api/read) and that's the problem i guess
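The old v1 endpoint pages with `start`/`num` query parameters (with `num` capped at 50 posts per request, as I recall the legacy API — treat both parameter details as assumptions). That means one request per 50 posts, which adds up fast across 160 blogs and plausibly explains the rate-limit block. A sketch of the URLs involved, fetching itself omitted:

```python
def page_offsets(total_posts, per_page=50):
    """Offsets for paging through a blog's posts, 0, 50, 100, ..."""
    return list(range(0, total_posts, per_page))

def page_urls(blog, total_posts, per_page=50):
    """One legacy-API URL per page: blogname.tumblr.com/api/read?start=&num="""
    return ["http://%s.tumblr.com/api/read?start=%d&num=%d" % (blog, off, per_page)
            for off in page_offsets(total_posts, per_page)]
```

Sleeping a few seconds between pages is probably what keeps the block away, as danneh_'s full-speed experience with non-API fetches suggests the limit is endpoint-specific.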
20:33 <pluesch> media download isn't a problem yeah :)
20:34 <RedType> hmm... is there a tumblr screenreading based api or something?
20:34 <RedType> or just hitting their ajax endpoint?
20:34 <RedType> err what im asking is if anyone has written such a thing
20:35 <danneh_> I'll try to clean up my script and throw it online sometime
20:35 <danneh_> also includes a little webserver to host blogs after they've been downloaded for fun
20:57 <raylee> LOL
20:57 <raylee> http://partners.disney.com/throwback