Time |
Nickname |
Message |
00:03
|
|
atphoenix has quit IRC (Read error: Connection reset by peer) |
00:06
|
|
atphoenix has joined #archiveteam-bs |
00:44
|
|
atphoenix has quit IRC (Read error: Connection reset by peer) |
00:45
|
|
atphoenix has joined #archiveteam-bs |
01:19
|
|
d5f4a3622 has quit IRC (Read error: Connection reset by peer) |
01:21
|
|
d5f4a3622 has joined #archiveteam-bs |
02:09
|
|
RichardG_ is now known as RichardG |
02:15
|
|
HP_Archiv has joined #archiveteam-bs |
02:22
|
|
OrIdow6 has quit IRC (Ping timeout: 276 seconds) |
02:28
|
|
OrIdow6 has joined #archiveteam-bs |
03:05
|
|
chr1sm has quit IRC (Quit: Connection closed for inactivity) |
04:12
|
|
qw3rty__ has joined #archiveteam-bs |
04:16
|
|
kahuna has quit IRC (Read error: Connection reset by peer) |
04:20
|
|
qw3rty_ has quit IRC (Read error: Operation timed out) |
04:34
|
|
d5f4a3622 has quit IRC (https://i.imgur.com/xacQ09F.mp4) |
05:25
|
|
Ryz has quit IRC (Remote host closed the connection) |
05:26
|
|
kiska18 has joined #archiveteam-bs |
05:26
|
|
fuzzy8021 has quit IRC (Read error: Operation timed out) |
05:27
|
|
Ryz has joined #archiveteam-bs |
05:31
|
|
fuzzy8021 has joined #archiveteam-bs |
06:50
|
|
HP_Archiv has quit IRC (Ping timeout: 610 seconds) |
07:01
|
|
HP_Archiv has joined #archiveteam-bs |
07:23
|
|
fuzzy8021 has quit IRC (Read error: Connection reset by peer) |
07:23
|
|
fuzzy8021 has joined #archiveteam-bs |
09:02
|
|
icedice2 has joined #archiveteam-bs |
09:08
|
|
icedice has quit IRC (Read error: Operation timed out) |
10:25
|
|
BlueMax has quit IRC (Read error: Connection reset by peer) |
11:18
|
|
icedice2 has quit IRC (Quit: Leaving) |
12:01
|
|
HP_Archiv has quit IRC (Quit: Leaving) |
13:54
|
|
kahuna has joined #archiveteam-bs |
14:21
|
alex73 |
Is there any way to archive a Facebook page/user feed? I don't see anything helpful on https://www.archiveteam.org/index.php?title=Facebook |
14:21
|
JAA |
alex73: Yes, snscrape and ArchiveBot/chromebot, most easily used through socialbot in #archivebot. |
14:59
|
alex73 |
JAA: And where is the result? I see the command "snscrape facebook-user ellenmilliongraphics" in #archivebot, but I don't see anything at https://web.archive.org/web/20200222233121/https://www.facebook.com/ellenmilliongraphics/ |
15:08
|
JAA |
alex73: Hmm, yeah, looks like the pipeline it ran on was banned from Facebook. The posts are there though, e.g. https://web.archive.org/web/20200222000546/https://www.facebook.com/ellenmilliongraphics/photos/1003312429705331/ |
15:10
|
JAA |
Actually no, the profile page was also archived: https://web.archive.org/web/20200221234402/https://www.facebook.com/ellenmilliongraphics/ |
15:10
|
JAA |
The snapshot you linked was not done by ArchiveBot. |
15:21
|
alex73 |
JAA: Do you mean snscrape archived only some posts, not the full post feed? There is no https://web.archive.org/web/*/https://www.facebook.com/ellenmilliongraphics/posts/ page. |
15:23
|
JAA |
alex73: First, terminology: snscrape doesn't archive anything, it just scrapes the feed; the archival happens by running snscrape to discover the URLs for each post/photo/etc. and then feeding that into an actual archiving tool. |
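To make the two stages concrete, here is a minimal Python sketch of the discovery half only. It assumes the `snscrape` CLI is installed and prints one item URL per line, using the `facebook-user` mode from the #archivebot command quoted earlier in the log. `dedupe_urls`, `scrape_feed_urls` and `write_url_list` are hypothetical helpers; the output is just a plain URL list that would still have to be fed to a separate archiving tool.

```python
import subprocess


def dedupe_urls(lines):
    """Keep http(s) URLs only, dropping blanks and duplicates but preserving order."""
    seen = set()
    urls = []
    for line in lines:
        url = line.strip()
        if not url.startswith(("http://", "https://")):
            continue  # skip blank lines or anything that isn't a URL
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def scrape_feed_urls(username):
    """Discovery step: run `snscrape facebook-user USERNAME` and collect its output.

    Assumes the snscrape CLI is on PATH and prints one item URL per line.
    """
    result = subprocess.run(
        ["snscrape", "facebook-user", username],
        capture_output=True, text=True, check=True,
    )
    return dedupe_urls(result.stdout.splitlines())


def write_url_list(urls, path):
    """Write a plain one-URL-per-line list for a separate archiving tool to fetch."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("".join(url + "\n" for url in urls))
```

Nothing in this sketch fetches or stores the pages themselves, which is exactly the distinction JAA is drawing.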
15:24
|
JAA |
And what was archived is the profile page plus the page for each entry on the feed. Scrolling doesn't work on the WBM, comments won't load, and the page you mentioned was never retrieved either. |
15:26
|
JAA |
Try https://web.archive.org/web/*/https://www.facebook.com/ellenmilliongraphics/* to discover what was archived. |
15:28
|
|
mtntmnky has quit IRC (Remote host closed the connection) |
15:28
|
|
mtntmnky has joined #archiveteam-bs |
15:37
|
alex73 |
Hm. Understood. In that case, I can't see a list of entries like on Facebook itself, only by parsing https://web.archive.org/web/*/https://www.facebook.com/ellenmilliongraphics/* with some tool. Do you plan to fix/implement that somehow? |
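Rather than parsing the HTML of the `web/*/…*` listing, the same kind of prefix listing can be fetched in machine-readable form from the Wayback Machine's CDX server API. A minimal Python sketch, assuming the public endpoint `https://web.archive.org/cdx/search/cdx` with its `matchType`, `output=json`, `fl` and `collapse` parameters; the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(prefix):
    """Build a CDX query listing captures of every URL under `prefix`."""
    params = {
        "url": prefix,
        "matchType": "prefix",      # like the trailing "*" in the WBM listing URL
        "output": "json",           # JSON array; first row holds the field names
        "fl": "timestamp,original,statuscode",
        "collapse": "urlkey",       # one row per distinct URL
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def parse_cdx_json(payload):
    """Turn the JSON array response into a list of dicts keyed by field name."""
    rows = json.loads(payload) if payload.strip() else []
    if not rows:
        return []
    header, *data = rows
    return [dict(zip(header, row)) for row in data]


def list_captures(prefix):
    """Fetch and parse the listing (performs a network request)."""
    with urlopen(cdx_query_url(prefix)) as resp:
        return parse_cdx_json(resp.read().decode("utf-8"))
```

For example, `list_captures("facebook.com/ellenmilliongraphics/")` would return one dict per archived URL under that profile; large prefixes may need the API's paging options.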
15:37
|
JAA |
Due to how Facebook works and sucks, it's virtually impossible to implement that. |
15:43
|
alex73 |
Does snscrape use Chrome to load Facebook pages? If so, it could write the final DOM as HTML. |
15:45
|
JAA |
No, it's a Python tool that emulates the scrolling. |
15:45
|
JAA |
I.e. constructs the XHR that are needed to load more content etc. |
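The general shape of "emulating the scrolling" is a cursor loop: request a page, collect its items, and repeat with the cursor the response hands back until none is returned. A minimal, generic Python sketch; `fetch_page` is a hypothetical stand-in for building and sending the request, and this does not model Facebook's actual request parameters or snscrape's real code.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# One "page" of results: the items it contained and the cursor for the next
# request, or None when the server reports there is nothing more to load.
Page = Tuple[List[str], Optional[str]]


def iterate_feed(fetch_page: Callable[[Optional[str]], Page],
                 max_pages: int = 1000) -> Iterator[str]:
    """Emulate infinite scrolling by repeatedly requesting the next page.

    fetch_page(cursor) is a hypothetical stand-in for the XHR a browser would
    issue on scroll; cursor is None for the first page. max_pages guards
    against a server that never stops returning cursors.
    """
    cursor: Optional[str] = None
    for _ in range(max_pages):
        items, cursor = fetch_page(cursor)
        yield from items
        if cursor is None:
            break
```

The hard part in practice is not this loop but producing request parameters the server will accept, which is where JS-derived values come in.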
15:46
|
JAA |
I never made any attempt to actually reverse-engineer the scrolling code entirely. Facebook for example includes a variety of extra parameters in the scrolling URL, including a hash of the list of all JS "modules" that are active. That's essentially impossible to do without actually executing the JS. |
15:47
|
JAA |
While it would be possible to archive the pagination as snscrape retrieves it, that would be useless for the scrolling in the WBM. |
15:50
|
alex73 |
Yes, I understand. My idea was to archive the DOM after a real Chrome instance has retrieved it. It could be something like Selenium+Chrome, or some other setup where Chrome acts as a real browser but is controlled by our script. |
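A minimal Python sketch of the Selenium+Chrome idea, assuming the `selenium` package, a local Chrome, and a Chrome version that accepts `--headless=new`. The scroll-until-the-height-stops-growing heuristic is an assumption for illustration, not how chromebot works, and Facebook may block or require login for automated sessions. `scroll_and_capture` only needs an object with `get`, `execute_script` and `page_source`, so it can be exercised without a browser.

```python
import time


def make_headless_chrome():
    # Imported lazily so the rest of this sketch works without Selenium installed.
    from selenium import webdriver
    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    return webdriver.Chrome(options=options)


def scroll_and_capture(driver, url, max_rounds=20, pause=2.0):
    """Load url, scroll until the page height stops growing, return the final DOM."""
    driver.get(url)
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the page's own XHRs time to append content
        height = driver.execute_script("return document.body.scrollHeight")
        if height == last_height:
            break  # nothing new was loaded by the last scroll
        last_height = height
    # page_source serializes the current DOM, including content added by JS
    return driver.page_source
```

Typical use is `driver = make_headless_chrome()`, then `scroll_and_capture(driver, url)`, and `driver.quit()` in a `finally` block.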
15:52
|
alex73 |
Or some converter to an RSS-like feed with its own rendering. |
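One way to read the RSS idea: take a list of archived entry URLs with their capture timestamps (for example from a CDX listing) and render an RSS 2.0 document whose item links point at the Wayback Machine playback URLs. A minimal sketch using only the standard library; `build_rss` and `wayback_url` are hypothetical names, and `pubDate` is omitted to keep it short.

```python
import xml.etree.ElementTree as ET


def wayback_url(timestamp, original):
    """Playback URL for a capture in the Wayback Machine."""
    return f"https://web.archive.org/web/{timestamp}/{original}"


def build_rss(title, link, captures):
    """Render captures (dicts with "timestamp" and "original") as RSS 2.0, newest first."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link and description are the required RSS 2.0 channel elements
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = f"Archived entries for {link}"
    for cap in sorted(captures, key=lambda c: c["timestamp"], reverse=True):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = cap["original"]
        ET.SubElement(item, "link").text = wayback_url(cap["timestamp"], cap["original"])
        ET.SubElement(item, "guid").text = cap["original"]
    return ET.tostring(rss, encoding="unicode")
```

Since it is derived rather than original data, such a feed would live outside the Wayback Machine itself.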
15:54
|
alex73 |
Okay, I understand that posts can be archived now, but if we want a Facebook-like list of entries, that would need to be implemented, in case anybody besides me needs it. |
16:09
|
JAA |
chromebot does that. |
16:10
|
JAA |
But it's not in the Wayback Machine because only original data goes in there, not something derived like this. |
16:15
|
alex73 |
Does anybody need such functionality? If so, it could be some 'pseudo-page' like https://web.archive.org/web/*/https://www.facebook.com/ellenmilliongraphics/posts-list/, or, if the WBM should contain only original data, a site like http://facebook-mirror.archiveteam.org/ellenmilliongraphics/posts/ that could also be archived. |
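For the separate-mirror-site option, the simplest form is a static HTML index per profile linking each archived entry to its Wayback Machine capture. A minimal sketch; `render_index` is a hypothetical name, the input shape (dicts with `timestamp` and `original`) is an assumption, and the `facebook-mirror` host from the message is only a proposal, not an existing site.

```python
from html import escape


def render_index(profile, captures):
    """Render a static HTML list of archived entries, newest capture first.

    captures is a list of dicts with "timestamp" (14-digit WBM timestamp) and
    "original" (the archived URL) keys, e.g. rows from a CDX query.
    """
    items = []
    for cap in sorted(captures, key=lambda c: c["timestamp"], reverse=True):
        playback = f"https://web.archive.org/web/{cap['timestamp']}/{cap['original']}"
        day = cap["timestamp"][:8]  # YYYYMMDD part of the timestamp
        items.append(
            f'<li><a href="{escape(playback)}">{escape(cap["original"])}</a> ({escape(day)})</li>'
        )
    title = escape(f"{profile}: archived posts")
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{title}</title></head>\n'
        f"<body><h1>{title}</h1>\n<ul>\n" + "\n".join(items) + "\n</ul>\n</body></html>\n"
    )
```

Because the page is plain static HTML, it could itself be archived like any other site, which is the point of the proposal.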
16:32
|
JAA |
If someone wants to do it, feel free to set up the latter. |
17:24
|
godane |
Vogue Italia Archive is free : https://www.instagram.com/p/B91DUriK9qS/ |
17:52
|
|
asdf0101 has quit IRC (The Lounge - https://thelounge.chat) |
17:53
|
|
asdf0101 has joined #archiveteam-bs |
17:59
|
|
anarchat has quit IRC (Read error: Connection reset by peer) |
18:00
|
|
anarcat has joined #archiveteam-bs |
18:04
|
alex73 |
Could anybody please run archiving of some sites for me? https://lit-bel.org/, http://dziejaslou.by/, http://www.baradulin.by/ |
18:21
|
Ryz |
Hello alex73, what are the reasons for archiving those 3 websites? |
18:55
|
alex73 |
Ryz: some important Belarusian literature sites |
19:02
|
Ryz |
I'll see what I can do, alex73; the problem is we're in the middle of archiving coronavirus stuff~ |
19:03
|
alex73 |
thank you |
19:05
|
Ryz |
When archiving http://dziejaslou.by/ - AB won't be able to grab the Issuu content; I'm not sure if it'll grab the embeds properly |
19:08
|
Ryz |
alex73 ^ |
19:31
|
|
Dallas has joined #archiveteam-bs |
19:51
|
alex73 |
Ryz: it's okay. The Issuu content is not an important part |
20:44
|
|
d5f4a3622 has joined #archiveteam-bs |
21:36
|
|
schbirid has joined #archiveteam-bs |
21:51
|
|
dxrt has quit IRC (Ping timeout: 276 seconds) |
21:59
|
|
Datechnom has quit IRC (Read error: Operation timed out) |
22:01
|
|
dxrt has joined #archiveteam-bs |
22:02
|
|
Iglooop1 sets mode: +o dxrt |
22:38
|
|
dxrt has quit IRC (Ping timeout: 276 seconds) |
22:41
|
|
dxrt has joined #archiveteam-bs |
22:41
|
|
svchfoo1 sets mode: +o dxrt |
22:52
|
|
Datechnom has joined #archiveteam-bs |
23:00
|
|
BlueMax has joined #archiveteam-bs |