Time |
Nickname |
Message |
00:04
🔗
|
|
Mayonaise has quit IRC (Read error: Operation timed out) |
00:15
🔗
|
|
Kenshin has quit IRC (Ping timeout: 260 seconds) |
00:18
🔗
|
|
Start has joined #archiveteam-bs |
00:20
🔗
|
|
Mayonaise has joined #archiveteam-bs |
00:24
🔗
|
|
espes___ has quit IRC (Ping timeout: 250 seconds) |
00:25
🔗
|
|
Kenshin has joined #archiveteam-bs |
00:31
🔗
|
dxrt |
We are writing to let you know that with effect from 27 January 2016, the Slashdot Media business, which provides online services through various web sites including Slashdot.org and SourceForge.net (the "Slashdot Media Services") has been purchased by SourceForge Media LLC of 1660 Logan Avenue, San Diego, California, 92113, USA ("we" or "us"). |
00:48
🔗
|
|
vtyl has quit IRC (Read error: Operation timed out) |
00:48
🔗
|
|
vtyl has joined #archiveteam-bs |
00:49
🔗
|
|
espes__ has joined #archiveteam-bs |
00:50
🔗
|
Frogging |
dxrt: yep, that happened |
00:54
🔗
|
|
JesseW has joined #archiveteam-bs |
00:57
🔗
|
Sk1d |
is there a collection on archive.org for youtube videos I downloaded and reuploaded there via the tubeup.py script? |
01:29
🔗
|
|
ndiddy has joined #archiveteam-bs |
01:29
🔗
|
|
xXx_ndidd has quit IRC (Read error: Connection reset by peer) |
01:38
🔗
|
Sk1d |
SketchCow: ^ I uploaded the alphaGo videos from https://www.youtube.com/channel/UCP7jMXSY2xbc3KCAE0MHQ-A/videos |
01:53
🔗
|
JesseW |
MrRadar: so I've got the fanfiction archive uncompressed -- you recommended the following arguments to p7zip -mx=9 -ms=128m -m0=LZMA2 |
01:53
🔗
|
MrRadar |
Before you do that, ZIP might be better if the IA can browse inside of them |
01:53
🔗
|
JesseW |
but also said it might be worth trying PPMd compression; I need to figure out what the proper arguments are for that. |
01:53
🔗
|
JesseW |
Ah, true -- that's a good reason to go for zip |
01:54
🔗
|
|
ko_ has joined #archiveteam-bs |
01:54
🔗
|
MrRadar |
I didn't think of that when I originally was for 7z |
01:54
🔗
|
ErkDog |
well I was trying to help that guy with the fan fiction file |
01:54
🔗
|
ErkDog |
but the warc extract kept crashing @ 13,000,000 items |
01:54
🔗
|
JesseW |
I'm not sure I'd trust IA's zip viewer with a zip file containing millions of items, though |
01:55
🔗
|
JesseW |
ErkDog: I tried to find the file he was referring to, but couldn't find it in the WARCs. |
01:56
🔗
|
* |
JesseW will be AFK |
01:56
🔗
|
MrRadar |
The AT wiki actually specifically says to prefer .zip for this reason: http://archiveteam.org/index.php?title=Internet_Archive#Uploading_to_archive.org |
01:57
🔗
|
MrRadar |
When uploading to IA |
01:57
🔗
|
JesseW |
Actually, I think probably the right idea is a number of zip files, each containing no more than, say, 1000 files... |
01:57
🔗
|
MrRadar |
Yeah |
01:58
🔗
|
JesseW |
maybe organized by folder, or first few letters of folder |
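A minimal sketch of the batching idea discussed above, assuming the uncompressed grab sits in a local directory tree. The function names, the "_root" group for files directly under the root, and the "<folder>-NNNN.zip" naming are hypothetical choices for illustration, not something agreed in the channel; the 1000-file cap is the figure JesseW floated.

```python
import zipfile
from collections import defaultdict
from pathlib import Path


def plan_batches(root, max_files=1000):
    """Group files by top-level folder, then split each group into
    batches of at most max_files relative paths."""
    root = Path(root)
    by_folder = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root)
            # Files directly under root go into a hypothetical "_root" group.
            folder = rel.parts[0] if len(rel.parts) > 1 else "_root"
            by_folder[folder].append(rel)
    batches = []
    for folder in sorted(by_folder):
        files = by_folder[folder]
        for i in range(0, len(files), max_files):
            batches.append((folder, files[i:i + max_files]))
    return batches


def write_batches(root, out_dir, max_files=1000):
    """Write one zip per batch and return the zip paths created.
    out_dir should live outside root so the zips are not re-scanned."""
    root, out_dir = Path(root), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    counters = defaultdict(int)
    created = []
    for folder, files in plan_batches(root, max_files):
        counters[folder] += 1
        zip_path = out_dir / f"{folder}-{counters[folder]:04d}.zip"
        with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
            for rel in files:
                # Keep the path relative to root inside the archive.
                zf.write(root / rel, arcname=rel.as_posix())
        created.append(zip_path)
    return created
```

Grouping by the first few letters of the folder name, the other option mentioned, would only change how the `folder` key is computed.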
01:58
🔗
|
JesseW |
but I also want to wait on advice from SketchCow about recommended ways to handle this, too. |
01:58
🔗
|
MrRadar |
That would be good |
01:59
🔗
|
MrRadar |
Though he said earlier that he would be travelling today and tomorrow |
01:59
🔗
|
JesseW |
yep, so I expect to be waiting till probably sometime in the weekend. That works fine for me. |
02:01
🔗
|
JesseW |
up till then, I can still do various analyses on the files, figuring out how big various directories are, etc. |
02:03
🔗
|
|
ko_ has quit IRC (Quit: Page closed) |
02:04
🔗
|
JesseW |
for now, I'm re-generating the inventory file (all ~ 800MB of it) |
02:07
🔗
|
|
JesseW has quit IRC (Quit: Leaving.) |
03:02
🔗
|
|
JesseW has joined #archiveteam-bs |
03:05
🔗
|
JesseW |
The inventory of the fanfictionNet grab found 6,930,546 files. |
03:05
🔗
|
JesseW |
and took half an hour to generate |
03:13
🔗
|
JesseW |
Interestingly, there are only 7 top-level folders (i.e. fandoms) with over 100,000 files: http://0bin.net/paste/fQ3AlxKaYJkp7b36#d4iFB00G0pA7HODIlLHWbp9cby1KCw-7jEQvbTc0ZI2 |
03:26
🔗
|
JesseW |
and this is the distribution of titles of completed Harry Potter stories by initial letter: http://0bin.net/paste/jt2SMbh92gtdr+dQ#n7fLpZfPbWNynm7kAPz9dfVoi3RZP23LeyDOg1T0erG |
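A small sketch of how a by-initial-letter distribution like this could be computed, assuming the titles are already available as a list of strings. How JesseW actually extracted the titles is not stated in the log; the function name and the "#" bucket for titles that start with a non-letter are hypothetical.

```python
from collections import Counter


def initial_distribution(titles):
    """Count titles by their first character, case-insensitively."""
    counts = Counter()
    for title in titles:
        stripped = title.strip()
        if not stripped:
            continue  # skip blank titles
        first = stripped[0].upper()
        # Digits and punctuation share a single hypothetical "#" bucket.
        counts[first if first.isalpha() else "#"] += 1
    return counts
```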
03:50
🔗
|
JesseW |
And those top 7 folders -- contain a total of about 88 GB (out of ~307 GB for the whole thing) |
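A rough sketch of this kind of per-fandom inventory, assuming each top-level folder of the uncompressed grab is one fandom as described above. The function names are hypothetical, and this is not the tool JesseW used to build the inventory file.

```python
import os
from collections import Counter


def inventory(root):
    """Return (file_counts, byte_totals), each keyed by top-level folder."""
    counts, sizes = Counter(), Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        # Files sitting directly in root are grouped under ".".
        top = "." if rel == "." else rel.split(os.sep)[0]
        for name in filenames:
            counts[top] += 1
            # lstat does not follow symlinks, so broken links cannot raise here.
            sizes[top] += os.lstat(os.path.join(dirpath, name)).st_size
    return counts, sizes


def folders_over(counts, threshold):
    """Names of top-level folders holding more than threshold files."""
    return sorted(name for name, n in counts.items() if n > threshold)
```

With the counts in hand, `folders_over(counts, 100000)` would list the folders above the 100,000-file mark, and summing `sizes` over those names gives their combined size.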
04:02
🔗
|
|
robink has quit IRC (Ping timeout: 633 seconds) |
04:04
🔗
|
|
tomwsmf-a has joined #archiveteam-bs |
04:23
🔗
|
Asparagir |
JesseW: I would have guessed Sherlock, LOTR, and the Marvel Movies (Avengers, etc.) would be in the top for sure. But maybe that's a reflection on AO3 biases rather than FanFictionNet. |
04:23
🔗
|
JesseW |
yeah, different era, I think. |
04:24
🔗
|
Asparagir |
Also, boy bands. |
04:24
🔗
|
Asparagir |
One Direction, K-Pop. And Teen Wolf. |
04:25
🔗
|
Asparagir |
Someday, if AO3 ever opens up its database as an API, I would love to figure out how to write a recommendation engine for it. Fans who like THIS also like THAT. |
04:26
🔗
|
|
robink has joined #archiveteam-bs |
04:26
🔗
|
|
bwn_ has quit IRC (Read error: Operation timed out) |
04:27
🔗
|
JesseW |
Well, with the new Board for AO3, maybe things will move more on that front. |
04:28
🔗
|
JesseW |
I've been keeping an eye on the meeting minutes, and making sure they're in Wayback |
04:35
🔗
|
JesseW |
bsmith094: In the Fanfiction grab, there's a path: Fanfiction/Fanfiction, with 8904 subdirectories -- many of which are empty. Any idea what was up with that? |
04:36
🔗
|
|
yipdw_ has quit IRC (Ping timeout: 260 seconds) |
04:43
🔗
|
JesseW |
Of the 169,569 directories in the grab, 8,250 of them are empty. |
04:45
🔗
|
JesseW |
and all but 36 of those are under Fanfiction/Fanfiction |
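A minimal sketch of how the empty-directory counts could be reproduced. It assumes "empty" means a directory with no files and no subdirectories; the log does not say whether a directory holding only empty directories was also counted. The function names are hypothetical.

```python
import os
from pathlib import Path


def empty_directories(root):
    """Yield every directory under root that has no files and no subdirectories."""
    for dirpath, dirnames, filenames in os.walk(root):
        if not dirnames and not filenames:
            yield dirpath


def split_by_prefix(root, prefix):
    """Split the empty directories into those under root/prefix and the rest."""
    prefix_parts = Path(prefix).parts
    inside, outside = [], []
    for d in empty_directories(root):
        rel_parts = Path(d).relative_to(root).parts
        if rel_parts[:len(prefix_parts)] == prefix_parts:
            inside.append(d)
        else:
            outside.append(d)
    return inside, outside
```

Calling `split_by_prefix(root, "Fanfiction/Fanfiction")` would separate the empty directories under the nested path from the remainder.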
04:52
🔗
|
|
tomwsmf-a has quit IRC (Read error: Operation timed out) |
05:20
🔗
|
|
yipdw has joined #archiveteam-bs |
05:28
🔗
|
|
decay has quit IRC (Read error: Operation timed out) |
05:32
🔗
|
|
Lord_Nigh has quit IRC (Read error: Operation timed out) |
05:35
🔗
|
|
decay has joined #archiveteam-bs |
05:45
🔗
|
|
bwn_ has joined #archiveteam-bs |
05:55
🔗
|
JesseW |
It looks like the Fanfiction/Fanfiction one is a partial copy of the rest. |
05:55
🔗
|
JesseW |
totaling about 3 GB |
05:56
🔗
|
|
phuzion has quit IRC (Read error: Operation timed out) |
06:00
🔗
|
|
Lord_Nigh has joined #archiveteam-bs |
06:00
🔗
|
|
Sk1d has quit IRC (Ping timeout: 250 seconds) |
06:07
🔗
|
|
Sk1d has joined #archiveteam-bs |
06:08
🔗
|
JesseW |
Well, it's not an *exact* copy, there's at least one story that got re-written between the times it was grabbed. |
06:14
🔗
|
|
bwn_ has quit IRC (Read error: Operation timed out) |
06:18
🔗
|
yipdw |
strangely, today I was looking into hosting an instance of otwarchive |
06:19
🔗
|
JesseW |
otwarchive? |
06:19
🔗
|
yipdw |
the software behind AO3 |
06:19
🔗
|
JesseW |
ah, I didn't know the name of the software |
06:19
🔗
|
JesseW |
how painful is it to set up an instance? |
06:20
🔗
|
yipdw |
I got roped into a Secret Project and decided "well let's see what OTW did" |
06:20
🔗
|
yipdw |
well, it's a Rails 3.2.22 app and I'm looking into Docker |
06:20
🔗
|
yipdw |
on a scale from Go to de Sade, I'd say Christian Grey |
06:20
🔗
|
JesseW |
whew... |
06:21
🔗
|
JesseW |
are we talking about Go the game, or go as in "start" |
06:21
🔗
|
yipdw |
Go the language which tends to produce stuff that's easy to work with operationally |
06:21
🔗
|
JesseW |
ah |
06:21
🔗
|
JesseW |
Interesting -- I hadn't actually heard that about Go-lang |
06:21
🔗
|
yipdw |
outputs end up being statically linked |
06:22
🔗
|
yipdw |
this is nice in some ways and terrible in others |
06:22
🔗
|
yipdw |
people who develop in golang tend to have other stories, but I don't, so I don't have any stories of my own |
06:23
🔗
|
* |
JesseW googled for go-lang dependency hell, and it's bringing up some interesting bits |
06:24
🔗
|
yipdw |
honestly the hardest part of getting otwarchive up and running (for me) is that it's written using MySQL and Ruby 2.0.0, neither of which I have installed |
06:24
🔗
|
yipdw |
once you get past that I expect it is just like any other Rails 3 app |
06:25
🔗
|
JesseW |
IDK much about Ruby, or its operational challenges |
06:26
🔗
|
yipdw |
my biggest pain has come from Ruby and attendant libraries moving faster than most distros will keep up with |
06:26
🔗
|
yipdw |
the libraries bit can be solved (mostly) with bundler and app-local bundles |
06:27
🔗
|
yipdw |
the Ruby bit though is a bit more annoying; sometimes I work around it with rvm/rbenv/chruby but that adds another thing to get into the environment |
06:28
🔗
|
yipdw |
at present I work around this by using Docker images that are preconfigured with all the right stuff in the env, but it's not like I've simplified the stack by doing this |
06:28
🔗
|
|
phuzion has joined #archiveteam-bs |
06:28
🔗
|
yipdw |
Piss Your Devops Staff Off With These Five Weird Tricks |
06:29
🔗
|
JesseW |
lol |
06:40
🔗
|
JesseW |
https://nathany.com/go-packages/ |
06:44
🔗
|
JesseW |
So, of the over 56,000 files in Fanfiction/Fanfiction, all but 19 of them are identical to copies in just Fanfiction/ |
06:44
🔗
|
JesseW |
I am not at all sure what is the right way to handle this. :-/ |
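One way a comparison like this could be done is by hashing each file in the nested copy and checking it against the file at the same relative path in the parent tree. This is a sketch under that assumption; the log does not say how JesseW matched the copies, and the function names and the choice of SHA-256 are illustrative only.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large files stay out of memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_trees(nested, parent):
    """Classify each file under nested against the same relative path under parent."""
    nested, parent = Path(nested), Path(parent)
    result = {"identical": [], "different": [], "missing": []}
    for path in sorted(nested.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(nested)
        other = parent / rel
        if not other.is_file():
            result["missing"].append(rel)      # no counterpart in parent
        elif file_digest(path) == file_digest(other):
            result["identical"].append(rel)    # byte-for-byte duplicate
        else:
            result["different"].append(rel)    # same path, changed content
    return result
```

Only the "different" and "missing" entries would need a decision; the "identical" ones are redundant copies.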
06:58
🔗
|
|
metalcamp has joined #archiveteam-bs |
07:08
🔗
|
|
godane has quit IRC (Ping timeout: 260 seconds) |
07:18
🔗
|
|
Asparagir has quit IRC (Read error: Connection reset by peer) |
07:19
🔗
|
|
achip has quit IRC (Ping timeout: 258 seconds) |
07:20
🔗
|
|
RedType has quit IRC (Read error: Operation timed out) |
07:22
🔗
|
|
RedType has joined #archiveteam-bs |
07:28
🔗
|
|
achip has joined #archiveteam-bs |
07:44
🔗
|
|
JesseW has quit IRC (Quit: Leaving.) |
07:52
🔗
|
|
bwn has joined #archiveteam-bs |
08:08
🔗
|
|
K0 has joined #archiveteam-bs |
08:32
🔗
|
|
K0 has quit IRC (Quit: Page closed) |
08:48
🔗
|
|
pgoetz has quit IRC (Quit: No Ping reply in 180 seconds.) |
08:48
🔗
|
|
pgoetz has joined #archiveteam-bs |
09:14
🔗
|
|
metalcamp has quit IRC (Ping timeout: 244 seconds) |
09:35
🔗
|
|
vtyl has quit IRC (Read error: Operation timed out) |
09:39
🔗
|
|
lytv has joined #archiveteam-bs |
10:41
🔗
|
|
Spilverga has joined #archiveteam-bs |
11:05
🔗
|
|
schbirid has joined #archiveteam-bs |
12:23
🔗
|
|
vitzli has joined #archiveteam-bs |
13:04
🔗
|
|
TheKiwi has quit IRC (Ping timeout: 260 seconds) |
13:26
🔗
|
|
pgoetz has quit IRC (Quit: No Ping reply in 180 seconds.) |
13:27
🔗
|
|
VADemon has joined #archiveteam-bs |
14:07
🔗
|
|
metalcamp has joined #archiveteam-bs |
14:12
🔗
|
|
Start has quit IRC (Quit: Disconnected.) |
14:43
🔗
|
|
Start has joined #archiveteam-bs |
15:15
🔗
|
|
toad2 has joined #archiveteam-bs |
15:16
🔗
|
|
toad1 has quit IRC (Read error: Operation timed out) |
15:28
🔗
|
|
ohhdemgir has quit IRC (Quit: True) |
15:48
🔗
|
|
JesseW has joined #archiveteam-bs |
15:57
🔗
|
|
vitzli has quit IRC (Leaving) |
16:02
🔗
|
|
ohhdemgir has joined #archiveteam-bs |
16:05
🔗
|
|
pgoetz has joined #archiveteam-bs |
16:06
🔗
|
|
Start has quit IRC (Quit: Disconnected.) |
16:15
🔗
|
|
JesseW has quit IRC (Quit: Leaving.) |
17:34
🔗
|
|
bwn has quit IRC (Ping timeout: 246 seconds) |
17:39
🔗
|
|
JW_work has quit IRC (Read error: Operation timed out) |
18:02
🔗
|
|
bwn has joined #archiveteam-bs |
18:07
🔗
|
|
godane has joined #archiveteam-bs |
18:19
🔗
|
|
Start has joined #archiveteam-bs |
18:45
🔗
|
joepie91 |
http://www.rtlnieuws.nl/editienl/hema-klanten-krijgen-6-maanden-om-fotoalbums-af-te-maken |
19:04
🔗
|
|
toad2 has quit IRC (Read error: Operation timed out) |
19:07
🔗
|
|
toad1 has joined #archiveteam-bs |
19:15
🔗
|
|
Smiley has joined #archiveteam-bs |
19:31
🔗
|
|
Start has quit IRC (Quit: Disconnected.) |
19:50
🔗
|
|
xXx_ndidd has joined #archiveteam-bs |
19:50
🔗
|
|
ndiddy has quit IRC (Read error: Connection reset by peer) |
21:08
🔗
|
|
bwn has quit IRC (Leaving) |
21:26
🔗
|
|
schbirid has quit IRC (Quit: Leaving) |
21:29
🔗
|
|
pgoetz_ has joined #archiveteam-bs |
21:29
🔗
|
|
Baljem has joined #archiveteam-bs |
21:33
🔗
|
|
pgoetz has quit IRC (hub.se efnet.portlane.se) |
21:33
🔗
|
|
RedType has quit IRC (hub.se efnet.portlane.se) |
21:33
🔗
|
|
Baljem_ has quit IRC (hub.se efnet.portlane.se) |
21:33
🔗
|
|
midas has quit IRC (hub.se efnet.portlane.se) |
21:33
🔗
|
|
BnA-Rob1n has quit IRC (hub.se efnet.portlane.se) |
21:33
🔗
|
|
Fletcher_ has quit IRC (hub.se efnet.portlane.se) |
21:33
🔗
|
|
bsmith094 has quit IRC (hub.se efnet.portlane.se) |
21:35
🔗
|
|
bsmith093 has joined #archiveteam-bs |
21:36
🔗
|
|
RedType_ has joined #archiveteam-bs |
21:40
🔗
|
|
Stiletto is now known as Stilett0 |
21:45
🔗
|
|
midas1 has joined #archiveteam-bs |
21:46
🔗
|
|
RichardG has quit IRC (Ping timeout: 244 seconds) |
22:19
🔗
|
|
Fletcher_ has joined #archiveteam-bs |
22:37
🔗
|
|
Stilett0 has quit IRC (Read error: Operation timed out) |
22:47
🔗
|
bsmith093 |
JesseW |
22:48
🔗
|
bsmith093 |
The Fanfiction/Fanfiction thing was me being stupid and not clearing out the old grab before starting a new one, after finally settling on a file name format |
22:52
🔗
|
|
Spilverga has quit IRC (Ping timeout: 268 seconds) |
23:01
🔗
|
|
ErkDog has quit IRC (Read error: Operation timed out) |
23:01
🔗
|
|
ErkDog has joined #archiveteam-bs |
23:01
🔗
|
|
dashcloud has quit IRC (Ping timeout: 260 seconds) |
23:04
🔗
|
|
dashcloud has joined #archiveteam-bs |
23:22
🔗
|
|
Start has joined #archiveteam-bs |
23:40
🔗
|
|
Rickster has quit IRC (Ping timeout: 260 seconds) |
23:52
🔗
|
|
Stiletto has joined #archiveteam-bs |
23:55
🔗
|
|
Rickster has joined #archiveteam-bs |