Time |
Nickname |
Message |
00:05
🔗
|
|
mismatch has joined #archiveteam-bs |
00:58
🔗
|
|
JesseW has joined #archiveteam-bs |
01:01
🔗
|
|
xXx_ndidd has quit IRC (Ping timeout: 633 seconds) |
01:20
🔗
|
bsmith093 |
JesseW: how goes the upload? and the csv |
01:20
🔗
|
JesseW |
the csvs have finished -- I need to load them into a database. |
01:22
🔗
|
JesseW |
the upload has also finished, after a total of about 63 hours. |
01:26
🔗
|
JesseW |
I broke the csvs into separate files per directory -- I'm calculating the total size now. |
01:34
🔗
|
godane |
so i'm close to having all of gawker.com sitemap |
01:37
🔗
|
|
bwn_ has joined #archiveteam-bs |
01:39
🔗
|
JesseW |
godane: nice! |
01:49
🔗
|
|
bwn has quit IRC (Read error: Operation timed out) |
01:57
🔗
|
JesseW |
bsmith093: so the total size of the CSVs is 3.5GB |
01:58
🔗
|
bsmith093 |
holy crap. will anything even read that massive sql thing?! |
02:06
🔗
|
JesseW |
3.5GB of data in a sql database isn't particularly large. |
02:07
🔗
|
JesseW |
It might be painful for sqlite (maybe), but not for other databases. |
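A minimal sketch of JesseW's point that a few GB of CSV is manageable even for sqlite if rows are inserted in batches. The log does not show the CSV layout, so the table name `files`, the two columns (`path`, `size_bytes`), and the helper `load_csv` are assumptions for illustration only.

```python
import csv
import io
import sqlite3


def load_csv(conn, rows, batch_size=10000):
    """Insert (path, size_bytes) rows in batches. The schema is hypothetical."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files (path TEXT, size_bytes INTEGER)"
    )
    batch = []
    for row in rows:
        batch.append((row[0], int(row[1])))
        if len(batch) >= batch_size:
            conn.executemany("INSERT INTO files VALUES (?, ?)", batch)
            batch.clear()
    if batch:
        # Flush the final partial batch.
        conn.executemany("INSERT INTO files VALUES (?, ?)", batch)
    conn.commit()


# Made-up sample data standing in for one per-directory CSV file.
sample = io.StringIO("dir/a.txt,10\ndir/b.txt,20\ndir/c.txt,30\n")
conn = sqlite3.connect(":memory:")
load_csv(conn, csv.reader(sample), batch_size=2)
row_count, total_bytes = conn.execute(
    "SELECT COUNT(*), SUM(size_bytes) FROM files"
).fetchone()
```

Batching keeps memory use flat regardless of file size, and the same loop can be run once per CSV file against an on-disk database instead of `:memory:`.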
02:35
🔗
|
* |
JesseW would like help/a listening chatroom (does that make sense?) in figuring out how to filter leaf nodes out of an adjacency list... |
02:36
🔗
|
JesseW |
I have a list of IA identifier -> collection it is in, and I want to filter out non-collections (i.e. leaf nodes) without loading the whole thing into a graph system |
03:02
🔗
|
Frogging |
being a CS student that sounds like something I should know |
03:02
🔗
|
Frogging |
but alas |
03:39
🔗
|
JesseW |
heh |
04:07
🔗
|
|
tomwsmf-a has quit IRC (Ping timeout: 258 seconds) |
04:09
🔗
|
JesseW |
Amusing item: https://archive.org/metadata/ia-das/metadata -- a collection which is a member of itself. |
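A self-membership like this shows up as a row whose identifier equals its own collection. A small sketch of spotting such rows, assuming the data is a list of (identifier, collection) pairs; the function name `self_members` and the sample rows are made up for illustration, and only the identifier `ia-das` comes from the message above.

```python
def self_members(edges):
    """Return identifiers that list themselves as their own collection."""
    return sorted({child for child, parent in edges if child == parent})


# Illustrative rows only, not real census data.
edges = [
    ("ia-das", "ia-das"),
    ("item-a", "coll-1"),
    ("coll-1", "coll-root"),
]
loops = self_members(edges)
```

Self-loops are worth finding before walking parent chains, since a naive walk up the hierarchy would never terminate on them.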
04:38
🔗
|
|
bwn has joined #archiveteam-bs |
04:48
🔗
|
|
bwn_ has quit IRC (Read error: Operation timed out) |
05:17
🔗
|
remsen |
.j #justsolve |
05:17
🔗
|
remsen |
Shit. |
05:34
🔗
|
JesseW |
remsen: ? |
05:35
🔗
|
* |
JesseW has solved the graph problem I mentioned; it turned out just listing all the internal nodes, then running over the list again worked fine. |
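The two-pass approach described here can be sketched as follows, without building a graph. This assumes the input is a sequence of (identifier, collection) pairs, as in the earlier message. The function names and sample rows are hypothetical, not JesseW's actual script.

```python
def internal_nodes(edges):
    """Pass 1: anything that appears as a parent is a collection."""
    return {parent for _child, parent in edges}


def collection_edges(edges, collections):
    """Pass 2: keep only rows whose identifier is itself a collection."""
    for child, parent in edges:
        if child in collections:
            yield child, parent


# Illustrative rows only: (identifier, collection it is in).
edges = [
    ("item-a", "coll-1"),
    ("item-b", "coll-1"),
    ("coll-1", "coll-root"),
    ("coll-2", "coll-root"),
    ("item-c", "coll-2"),
]
collections = internal_nodes(edges)
kept = list(collection_edges(edges, collections))
```

Only the set of internal nodes has to fit in memory (about 16,000 entries, going by the count mentioned in this log), so the full list can be streamed from disk on both passes. One caveat under this assumption: a collection with no members never appears as a parent, so it would be treated as a leaf.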
05:35
🔗
|
JesseW |
I was going through the IA census collections data -- it turns out there are about 16,000 collections. |
05:35
🔗
|
remsen |
JesseW, command fuckup! I need now to log off and toss my modem into the garbage. |
05:38
🔗
|
JesseW |
sympathy. modems -> :-( |
05:40
🔗
|
remsen |
My new modem/router combo (!!!) from TWC is actually an upgrade from the Linksys one I bought myself. |
05:41
🔗
|
remsen |
Well, the one that was purchased for the household. |
05:43
🔗
|
remsen |
It actually has decent default security too. Good on Arris. |
05:44
🔗
|
remsen |
It's obviously leased so I can't flash it. |
05:53
🔗
|
|
Sk1d has quit IRC (Ping timeout: 250 seconds) |
05:55
🔗
|
|
JesseW has quit IRC (Remote host closed the connection) |
05:56
🔗
|
|
Honno has joined #archiveteam-bs |
05:57
🔗
|
|
JesseW has joined #archiveteam-bs |
06:01
🔗
|
|
Sk1d has joined #archiveteam-bs |
06:26
🔗
|
godane |
i'm starting to upload Sky & Telescope: https://archive.org/details/Sky_and_Telescope_1941-11-cbr |
06:27
🔗
|
godane |
the cbr files |
06:28
🔗
|
godane |
you're going to get cbr and pdf collections of it |
06:29
🔗
|
godane |
this is mostly cause there could be gaps in both collections |
07:09
🔗
|
|
Start has quit IRC (Ping timeout: 260 seconds) |
07:36
🔗
|
|
JesseW has quit IRC (Quit: Leaving.) |
07:41
🔗
|
|
JesseW has joined #archiveteam-bs |
08:05
🔗
|
|
schbirid has joined #archiveteam-bs |
08:09
🔗
|
|
mismatch has quit IRC (Ping timeout: 633 seconds) |
08:16
🔗
|
|
JesseW has quit IRC (Quit: Leaving.) |
08:54
🔗
|
|
marvinw has quit IRC (Ping timeout: 633 seconds) |
09:21
🔗
|
godane |
I'm starting to upload gawker.com sitemap for 2005 |
10:01
🔗
|
|
marvinw has joined #archiveteam-bs |
10:04
🔗
|
|
bwn has quit IRC (Read error: Operation timed out) |
10:12
🔗
|
|
bwn has joined #archiveteam-bs |
10:20
🔗
|
|
marvinw has quit IRC (Read error: Connection reset by peer) |
10:29
🔗
|
|
marvinw has joined #archiveteam-bs |
10:45
🔗
|
|
BlueMaxim has quit IRC (Quit: Leaving) |
10:59
🔗
|
godane |
SketchCow: the collection for this item is an item: https://archive.org/details/Lifehacker_Extra_17 |
11:00
🔗
|
godane |
https://archive.org/details/lifehacker |
11:00
🔗
|
godane |
may want to change it to lifehacker-extra or rev3-lifehacker-extra |
11:02
🔗
|
godane |
SketchCow: also some got darked after being marked as spam: https://archive.org/details/Lifehacker_Extra_1 |
11:16
🔗
|
arkiver |
godane: that's awesome! https://archive.org/details/Sky_and_Telescope_1941-11-cbr |
11:16
🔗
|
arkiver |
will you upload all years? |
11:23
🔗
|
godane |
up to 2009 |
11:25
🔗
|
godane |
i'm this far so far: https://archive.org/details/Sky_and_Telescope_1960-12-cbr |
11:29
🔗
|
godane |
so looks like cbr files have no gaps |
11:29
🔗
|
godane |
it was only 1949 in the pdf format that had a gap |
11:29
🔗
|
godane |
mostly cause there is only 1 1949 magazine in pdf format |
13:18
🔗
|
|
Stiletto has quit IRC (Ping timeout: 260 seconds) |
13:19
🔗
|
|
Famicoma1 has quit IRC (Ping timeout: 260 seconds) |
13:19
🔗
|
|
Muad-Dib has quit IRC (Ping timeout: 260 seconds) |
13:19
🔗
|
|
Stiletto has joined #archiveteam-bs |
13:21
🔗
|
|
Famicoma1 has joined #archiveteam-bs |
13:26
🔗
|
|
vitzli has joined #archiveteam-bs |
13:31
🔗
|
|
Famicoma1 has quit IRC (Remote host closed the connection) |
13:31
🔗
|
|
Famicoma1 has joined #archiveteam-bs |
13:32
🔗
|
|
Muad-Dib has joined #archiveteam-bs |
14:37
🔗
|
|
metalcamp has joined #archiveteam-bs |
15:24
🔗
|
|
JesseW has joined #archiveteam-bs |
15:38
🔗
|
|
JesseW has quit IRC (Quit: Leaving.) |
16:05
🔗
|
|
altlabel has joined #archiveteam-bs |
16:11
🔗
|
|
zino has quit IRC (Read error: Operation timed out) |
16:12
🔗
|
|
Start has joined #archiveteam-bs |
16:17
🔗
|
|
vitzli has quit IRC (Leaving) |
19:36
🔗
|
|
bwn has quit IRC (Read error: Operation timed out) |
20:03
🔗
|
|
bwn has joined #archiveteam-bs |
20:31
🔗
|
|
schbirid has quit IRC (Quit: Leaving) |
21:02
🔗
|
|
mksplg has quit IRC (Ping timeout: 260 seconds) |
21:05
🔗
|
|
Rickster has quit IRC (Remote host closed the connection) |
21:07
🔗
|
|
Rickster has joined #archiveteam-bs |
21:14
🔗
|
|
mksplg has joined #archiveteam-bs |
21:15
🔗
|
|
zino has joined #archiveteam-bs |
21:22
🔗
|
|
Stiletto is now known as Stilett0 |
21:39
🔗
|
|
bauruine has quit IRC (Ping timeout: 260 seconds) |
21:42
🔗
|
|
Stilett0 has quit IRC (Read error: Operation timed out) |
21:48
🔗
|
|
BlueMaxim has joined #archiveteam-bs |
21:53
🔗
|
|
bauruine has joined #archiveteam-bs |
22:02
🔗
|
|
Stiletto has joined #archiveteam-bs |
22:06
🔗
|
|
BlueMaxim has quit IRC (Quit: Leaving) |
22:06
🔗
|
|
BlueMaxim has joined #archiveteam-bs |
22:18
🔗
|
|
Honno has quit IRC (Ping timeout: 492 seconds) |
23:14
🔗
|
|
toad2 has quit IRC (Read error: Operation timed out) |
23:14
🔗
|
|
toad1 has joined #archiveteam-bs |
23:27
🔗
|
|
Stiletto has quit IRC (Ping timeout: 260 seconds) |
23:32
🔗
|
|
metalcamp has quit IRC (Ping timeout: 244 seconds) |
23:39
🔗
|
|
JetBalsa has joined #archiveteam-bs |
23:59
🔗
|
|
ndiddy has joined #archiveteam-bs |